## Scraper for Crypto Prices

This notebook looks at crypto coins.

The website https://coinmarketcap.com/ lists market data for the hundred most important coins.

We will collect this data with a simple scraper and run a rudimentary analysis on it.

## Setup

In [1]:
import requests
from bs4 import BeautifulSoup

In [2]:
import pandas as pd

In [39]:
import time

## Scraper

In [17]:
path = ''

### List of all cryptocurrencies

First, we check the site to see which are the 100 largest cryptocurrencies and download their names and links.

In [4]:
base_url = 'https://coinmarketcap.com/'

In [5]:
response = requests.get(base_url)
doc = BeautifulSoup(response.text, "html.parser")

In [6]:
currencies = doc.find_all('a', class_='currency-name-container link-secondary')

In [8]:
currencies[0]

<a class="currency-name-container link-secondary" href="/currencies/bitcoin/">Bitcoin</a>

In [9]:
len(currencies)

100

In [10]:
currency_list = []

In [11]:
for currency in currencies:
    this_currency = {}
    this_currency['name'] = currency.text
    this_currency['link'] = currency['href']
    currency_list.append(this_currency)

In [12]:
df_currencies = pd.DataFrame(currency_list)

In [13]:
df_currencies.head(2)

Unnamed: 0,name,link
0,Bitcoin,/currencies/bitcoin/
1,Ethereum,/currencies/ethereum/


In [14]:
df_currencies['link'] = df_currencies['link'].str.extract(r'/currencies/(.+)/', expand=False)

In [15]:
df_currencies.head(2)

Unnamed: 0,name,link
0,Bitcoin,bitcoin
1,Ethereum,ethereum
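
The pattern `/currencies/(.+)/` captures whatever sits between `/currencies/` and the final slash of a link. The following minimal sketch uses the standard-library `re` module to show what that capture does for a single link. The helper name `slug_from_href` is made up for this illustration and is not part of the notebook.

```python
import re

# Same pattern as in the extract cell above; note that (.+) is greedy.
SLUG_PATTERN = re.compile(r'/currencies/(.+)/')

def slug_from_href(href):
    """Return the part between '/currencies/' and the last '/', or None."""
    match = SLUG_PATTERN.search(href)
    return match.group(1) if match else None

print(slug_from_href('/currencies/bitcoin/'))   # bitcoin
print(slug_from_href('/about/'))                # None
```

Because `(.+)` is greedy, a longer path such as `/currencies/bitcoin/historical-data/` would yield `bitcoin/historical-data`. If such links ever appeared in the list, `([^/]+)` would be a stricter alternative.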


In [18]:
df_currencies.to_csv(path + 'currencies.csv', index=False)

### Data for the individual currencies

First, we use a sample currency to test how to get at the information.

In [19]:
base_url = 'https://coinmarketcap.com/currencies/bitcoin/historical-data/?start=20181012&end=20191012'
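
The `start` and `end` query parameters in this URL appear to be dates in `YYYYMMDD` form. Assuming that reading is correct, the URL could be assembled from `datetime.date` objects instead of being typed by hand. `historical_url` is a hypothetical helper, not something the site or the notebook provides.

```python
from datetime import date

# Template derived from the URL used in the cell above
URL_TEMPLATE = ('https://coinmarketcap.com/currencies/{slug}'
                '/historical-data/?start={start}&end={end}')

def historical_url(slug, start, end):
    """Build the historical-data URL for one currency slug and a date range."""
    return URL_TEMPLATE.format(
        slug=slug,
        start=start.strftime('%Y%m%d'),
        end=end.strftime('%Y%m%d'),
    )

url = historical_url('bitcoin', date(2018, 10, 12), date(2019, 10, 12))
print(url)
```

With these arguments the result matches the hand-written `base_url` above, and a different period only requires different `date` values.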

In [20]:
response = requests.get(base_url)
doc = BeautifulSoup(response.text, "html.parser")

In [21]:
days = doc.find_all('tr', class_='text-right')

In [22]:
days_list = []

In [23]:
cells = days[0].find_all('td')

In [24]:
cells

[<td class="text-left">Oct 12, 2019</td>,
 <td data-format-fiat="" data-format-value="8315.66465289">8315.66</td>,
 <td data-format-fiat="" data-format-value="8415.24222171">8415.24</td>,
 <td data-format-fiat="" data-format-value="8313.34130028">8313.34</td>,
 <td data-format-fiat="" data-format-value="8336.55527417">8336.56</td>,
 <td data-format-market-cap="" data-format-value="14532641604.5">14,532,641,605</td>,
 <td data-format-market-cap="" data-format-value="1.49965767624e+11">149,965,767,624</td>]

In [25]:
this_day = {}

In [26]:
this_day['date'] = cells[0].text
this_day['open'] = cells[1].text
this_day['high'] = cells[2].text
this_day['low'] = cells[3].text
this_day['close'] = cells[4].text
this_day['volume'] = cells[5].text
this_day['marketcap'] = cells[6].text

In [27]:
this_day

{'date': 'Oct 12, 2019',
 'open': '8315.66',
 'high': '8415.24',
 'low': '8313.34',
 'close': '8336.56',
 'volume': '14,532,641,605',
 'marketcap': '149,965,767,624'}
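
All values in `this_day` are still strings, including the date and the numbers with thousands separators. For any later analysis they would probably need to be converted. The sketch below shows one possible way using only the standard library. `parse_day` is a hypothetical helper, and the `%b` format code assumes English month abbreviations (the default C locale).

```python
from datetime import datetime

def parse_day(raw):
    """Convert the scraped strings of one day into a date and floats."""
    parsed = {'date': datetime.strptime(raw['date'], '%b %d, %Y').date()}
    for key in ('open', 'high', 'low', 'close', 'volume', 'marketcap'):
        # Drop thousands separators such as in '14,532,641,605'
        parsed[key] = float(raw[key].replace(',', ''))
    return parsed

# Values copied from the output above
this_day = {'date': 'Oct 12, 2019',
            'open': '8315.66',
            'high': '8415.24',
            'low': '8313.34',
            'close': '8336.56',
            'volume': '14,532,641,605',
            'marketcap': '149,965,767,624'}

print(parse_day(this_day))
```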

In [28]:
for day in days:
    this_day = {}
    cells = day.find_all('td')
    this_day['date'] = cells[0].text
    this_day['open'] = cells[1].text
    this_day['high'] = cells[2].text
    this_day['low'] = cells[3].text
    this_day['close'] = cells[4].text
    this_day['volume'] = cells[5].text
    this_day['marketcap'] = cells[6].text
    days_list.append(this_day)
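
The seven assignments in the loop follow the column order of the table. As an alternative sketch, the column names can be paired with the cell texts via `zip`, which also makes it easy to skip rows that do not have the expected number of cells. `COLUMNS` and `row_to_dict` are names introduced here for illustration, and a plain list of strings stands in for `[cell.text for cell in cells]`.

```python
COLUMNS = ['date', 'open', 'high', 'low', 'close', 'volume', 'marketcap']

def row_to_dict(cell_texts):
    """Map the texts of one table row onto the column names.

    Returns None if the row does not have exactly one text per column.
    """
    if len(cell_texts) != len(COLUMNS):
        return None
    return dict(zip(COLUMNS, cell_texts))

# Texts as they would come from [cell.text for cell in cells]
texts = ['Oct 12, 2019', '8315.66', '8415.24', '8313.34',
         '8336.56', '14,532,641,605', '149,965,767,624']
print(row_to_dict(texts))
```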

In [29]:
df = pd.DataFrame(days_list)

In [30]:
df.head(2)

Unnamed: 0,date,open,high,low,close,volume,marketcap
0,"Oct 12, 2019",8315.66,8415.24,8313.34,8336.56,14532641605,149965767624
1,"Oct 11, 2019",8585.26,8721.78,8316.18,8321.76,19604381101,149685618275


Now we apply the scraper to all currencies.

In [33]:
df_currencies = pd.read_csv(path + 'currencies.csv')

In [34]:
df_currencies.head(2)

Unnamed: 0,name,link
0,Bitcoin,bitcoin
1,Ethereum,ethereum


In [35]:
len(df_currencies)

100

In [36]:
currencies = df_currencies.to_dict(orient='records')

In [37]:
url_start = 'https://coinmarketcap.com/currencies/'
url_end = '/historical-data/?start=20181012&end=20191012'

In [42]:
for currency in currencies:
    print('working on: ' + currency['name'])

    # Fetch and parse the historical-data page for this currency
    url = url_start + currency['link'] + url_end
    response = requests.get(url)
    doc = BeautifulSoup(response.text, "html.parser")

    # One table row per day
    days = doc.find_all('tr', class_='text-right')
    days_list = []

    for day in days:
        this_day = {}
        cells = day.find_all('td')
        this_day['date'] = cells[0].text
        this_day['open'] = cells[1].text
        this_day['high'] = cells[2].text
        this_day['low'] = cells[3].text
        this_day['close'] = cells[4].text
        this_day['volume'] = cells[5].text
        this_day['marketcap'] = cells[6].text
        days_list.append(this_day)

    # Write one CSV per currency; the data/ folder must already exist
    df = pd.DataFrame(days_list)
    filename = currency['name'] + '.csv'
    df.to_csv(path + 'data/' + filename, index=False)

    # Pause between requests to go easy on the server
    time.sleep(10)

print('Done')

working on: Bitcoin
working on: Ethereum
working on: XRP
working on: Tether
working on: Bitcoin Cash
working on: Litecoin
working on: EOS
working on: Binance Coin
working on: Bitcoin SV
working on: Stellar
working on: TRON
working on: Cardano
working on: UNUS SED LEO
working on: Monero
working on: Chainlink
working on: Huobi Token
working on: IOTA
working on: Dash
working on: Tezos
working on: Ethereum Classic
working on: Cosmos
working on: NEO
working on: Maker
working on: USD Coin
working on: Crypto.com Coin
working on: NEM
working on: Ontology
working on: Dogecoin
working on: Zcash
working on: Basic Attenti...
working on: Paxos Standard
working on: HedgeTrade
working on: VeChain
working on: TrueUSD
working on: Qtum
working on: Decred
working on: Ravencoin
working on: 0x
working on: V Systems
working on: ZB
working on: Bitcoin Gold
working on: Holo
working on: ABBC Coin
working on: EDUCare
working on: OmiseGO
working on: Swipe
working on: DigiByte
working on: Centrality
working on: A
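
The file names in the loop are built from the display name, which can contain spaces and dots or be shortened (for example `Basic Attenti...` in the output above). One possible alternative, sketched here under the assumption that the slug in the `link` column is unique per currency, is to name the files after the slug and to fall back to a cleaned-up name only if no slug is present. `csv_path` is a hypothetical helper.

```python
import re
from pathlib import Path

def csv_path(currency, folder='data'):
    """Return the CSV path for one currency record (a dict with 'name'/'link')."""
    stem = currency.get('link')
    if not stem:
        # Fallback: replace runs of non-alphanumeric characters with '_'
        stem = re.sub(r'[^A-Za-z0-9]+', '_', currency['name']).strip('_').lower()
    return Path(folder) / f'{stem}.csv'

print(csv_path({'name': 'Bitcoin', 'link': 'bitcoin'}).as_posix())   # data/bitcoin.csv
print(csv_path({'name': 'Basic Attenti...'}).as_posix())             # data/basic_attenti.csv
```

Calling `Path('data').mkdir(parents=True, exist_ok=True)` once before the loop would also make sure the target folder exists.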

In the end we have a set of files: for each cryptocurrency there is a table with the market data for the defined period.

The data is stored in the `data/` subfolder.
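
As a first, rudimentary look at the data, one of these files could be read back with the standard-library `csv` module. The sketch below averages the `close` column. It uses an in-memory sample built from the two Bitcoin rows shown earlier in place of an actual file, and `mean_close` is a hypothetical helper rather than part of the notebook.

```python
import csv
import io

def mean_close(fileobj):
    """Average the 'close' column of a CSV written by the scraper."""
    closes = [float(row['close'].replace(',', ''))
              for row in csv.DictReader(fileobj)]
    return sum(closes) / len(closes) if closes else None

# In-memory stand-in for a file such as data/Bitcoin.csv
sample = io.StringIO(
    'date,open,high,low,close,volume,marketcap\n'
    '"Oct 12, 2019",8315.66,8415.24,8313.34,8336.56,"14,532,641,605","149,965,767,624"\n'
    '"Oct 11, 2019",8585.26,8721.78,8316.18,8321.76,"19,604,381,101","149,685,618,275"\n'
)
print(mean_close(sample))
```

For a real file, the same function could be called inside `with open('data/Bitcoin.csv', newline='') as f:`.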