# Project : The price fluctuations of bitcoin

### (1) Download Data from Internet

First, we download bitcoin price data from [https://blockchain.info/charts/] and keep it in memory as Python objects. The URL returns JSON-formatted data, which we parse into a Pandas DataFrame that is easy to work with.

The downloaded JSON data is parsed into a Python `dict`, which contains the following items:
```
data['unit']
data['description']
data['status']
data['period']
data['name']
data['values'][0]['x']  # time
data['values'][0]['y']  # value
```
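
As a minimal sketch of the parsing step, assuming the payload has the shape listed above and that `x` is a Unix timestamp in seconds, the `values` list can be loaded directly into a DataFrame. The sample payload below is made up for illustration and is not real market data.

```python
import pandas as pd

# Hypothetical payload with the same shape as the parsed JSON (values are made up).
sample = {
    'name': 'Market Price (USD)',
    'unit': 'USD',
    'values': [
        {'x': 1493683200, 'y': 1.0},
        {'x': 1493769600, 'y': 2.0},
    ],
}

# Each element of 'values' is a dict with keys 'x' and 'y', so it maps to one row.
frame = pd.DataFrame(sample['values']).rename(columns={'x': 'time', 'y': 'value'})

# Assumption: 'x' counts seconds since the Unix epoch (UTC).
frame['date'] = pd.to_datetime(frame['time'], unit='s').dt.strftime('%Y-%m-%d')
print(frame)
```
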
We develop a `download_data_from_blockchain(..)` function to make this download process much easier.

In [1]:
import requests
import re
import json

import numpy as np
import pandas as pd

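def download_data_from_blockchain(timespan, item):
    """Download one chart from blockchain.info and return (timestamps, values)."""
    # Use the 'timespan' argument rather than the global 'time_span' variable.
    url = 'https://api.blockchain.info/charts/' + item + '?timespan=' + timespan + '&format=json'
    json_webpage = requests.get(url).content
    json_obj = json.loads(json_webpage.decode('utf8'))
    re_time = []
    re_data = []
    for data in json_obj['values']:
        re_time.append(data['x'])  # Unix timestamp of the data point
        re_data.append(data['y'])  # chart value at that time
    return re_time, re_data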


time_span = '1year'

# 1. Download 'Currency statistics' data from https://blockchain.info/charts/
# 
# download market-price of bitcoin
market_price_time, market_price_data = download_data_from_blockchain(time_span, 'market-price')

# download the total number of bitcoins that have already been mined;
# in other words, the current supply of bitcoins on the network.
supply_time, supply_data = download_data_from_blockchain(time_span, 'total-bitcoins')

# download the total USD value of bitcoin supply in circulation, 
# as calculated by the daily average market price across major exchanges.
market_cap_time, market_cap_data = download_data_from_blockchain(time_span, 'market-cap')

# download the total USD value of trading volume on major bitcoin exchanges.
trade_vol_time, trade_vol_data = download_data_from_blockchain(time_span, 'trade-volume')

# check whether the downloaded series share the same timestamps.
# if supply_time == market_price_time:
#     print("Yes, the 'supply' data matches the 'market_price' data well.")
# if market_cap_time == market_price_time:
#     print("Yes, the 'market_cap' data matches the 'market price' data well.")
# if trade_vol_time == market_price_time:
#     print("Yes, the 'trade-vol' data matches the 'market price' data well.")

# print(market_price_time[0])  # the format of time needs to be transformed. 
# print()


# Also we can download other features available on this website, such as:
#
#
# 2. Mining information 
#
# (1) difficulty : A relative measure of how difficult it is to find a new block. 
#                  The difficulty is adjusted periodically as a function of how 
#                  much hashing power has been deployed by the network of miners.
# (2) miners-revenue : Total value of coinbase block rewards and transaction fees paid to miners.
# (3) hash-rate : The estimated number of tera hashes per second 
#                 (trillions of hashes per second) the Bitcoin network is performing.
# (4) transaction-fees : The total value of all transaction fees paid to miners 
#                 (not including the coinbase value of block rewards).
# (5) transaction-fees-usd
# (6) cost-per-transaction-percent : A chart showing miners revenue as percentage 
#                 of the transaction volume.
# (7) cost-per-transaction : A chart showing miners revenue divided by the number of transactions.
# 
#
# 3. Network activity 
#
#
# 4. Blockchain Wallet Activity 
#
#
# 5. Block details 
# 
# ......
# 
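
The commented-out consistency check above can be sketched as a pair of small helpers. The function names `timestamps_match` and `report_mismatch` are hypothetical, and the timestamps below are made up for illustration. The second helper shows which timestamps are missing from either series when a plain equality test fails.

```python
def timestamps_match(reference, other):
    """Return True when both series hold the same timestamps in the same order."""
    return list(reference) == list(other)


def report_mismatch(reference, other):
    """Return the timestamps that appear in only one of the two series."""
    ref_set, other_set = set(reference), set(other)
    return sorted(ref_set - other_set), sorted(other_set - ref_set)


# Made-up timestamps: the 'supply' series lacks the middle data point.
price_time = [1493683200, 1493769600, 1493856000]
supply_time = [1493683200, 1493856000]

print(timestamps_match(price_time, supply_time))
print(report_mismatch(price_time, supply_time))
```
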

In [2]:
import time

f_time = []
for day in market_price_time:
    time_array_day = time.localtime(day)
    f_time_day = time.strftime("%Y-%m-%d", time_array_day)
    f_time.append(f_time_day)
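
Note that `time.localtime` depends on the time zone of the machine running the notebook, so the same timestamp may map to different calendar days on different machines. A possible alternative, assuming the timestamps are Unix seconds, is to convert them in one step with pandas, which interprets them as UTC. The sample timestamps below are made up.

```python
import pandas as pd

# Made-up Unix timestamps (seconds); in the notebook this would be market_price_time.
sample_times = [1493683200, 1493769600]

# pd.to_datetime with unit='s' treats the numbers as seconds since the epoch in UTC,
# so the resulting dates do not depend on the local time zone.
f_time_utc = pd.to_datetime(sample_times, unit='s').strftime('%Y-%m-%d').tolist()
print(f_time_utc)
```
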

In [3]:
bc_data = list(zip(f_time, market_price_data, supply_data, market_cap_data, trade_vol_data))
df = pd.DataFrame(data = bc_data, columns = ['date', 'price', 'supply', 'capital', 'trade'])

# Memory Clear:
#
# market_price_time = None
# market_price_data = None
# supply_time = None
# supply_data = None
# market_cap_time = None
# market_cap_data = None
# trade_vol_time = None
# trade_vol_data = None
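
Because `zip` pairs items by position and stops at the shortest list, a series with a missing day would be silently misaligned. The sketch below shows one way to align the series on their timestamps instead. The helper `to_series` is hypothetical and the data is made up.

```python
import pandas as pd


def to_series(times, values, name):
    """Wrap one downloaded chart as a Series indexed by its timestamps."""
    return pd.Series(values, index=times, name=name)


# Made-up data: the 'supply' series is missing the middle timestamp.
price = to_series([1493683200, 1493769600, 1493856000], [1.0, 2.0, 3.0], 'price')
supply = to_series([1493683200, 1493856000], [10.0, 30.0], 'supply')

# join='inner' keeps only the timestamps that are present in every series.
aligned = pd.concat([price, supply], axis=1, join='inner')
print(aligned)
```
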

Secondly, it would be interesting to look at how often the Wikipedia article on Bitcoin is viewed. We can download the page-view data from [https://tools.wmflabs.org/pageviews].

In [4]:
page = '?pages=Bitcoin'
dates_range = '&range=all-time'
agent = '&agent=user'                 # 'all', 'spider', 'bot'
platform = '&platform=all-access'     # 'Desktop', 'Mobile app', 'Mobile web'
project = '&project=en.wikipedia.org'

url = 'https://tools.wmflabs.org/pageviews/'+page+dates_range+project+platform+agent
print(url)

## Note: the data on this page is generated dynamically with JavaScript, so
## fetching it automatically would require a crawler, which is beyond my
## ability. Thus, I chose to download the data manually into a csv file.

https://tools.wmflabs.org/pageviews/?pages=Bitcoin&range=all-time&project=en.wikipedia.org&platform=all-access&agent=user
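
The same URL can also be assembled from a dictionary of query parameters, which avoids managing the `?` and `&` separators by hand. This is only a sketch of an alternative to the string concatenation above; the parameter names and values are the ones used in the cell.

```python
from urllib.parse import urlencode

base_url = 'https://tools.wmflabs.org/pageviews/'

# Same parameters as in the cell above, in the same order.
params = {
    'pages': 'Bitcoin',
    'range': 'all-time',
    'project': 'en.wikipedia.org',
    'platform': 'all-access',
    'agent': 'user',
}

# urlencode joins the pairs with '&' and escapes any special characters.
pageviews_url = base_url + '?' + urlencode(params)
print(pageviews_url)
```
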


In [5]:
wiki_data = pd.read_csv("./pageviews-20150701-20180501.csv")
wiki_data.columns = ['date', 'wiki']
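
Since the merge key is compared as text, the dates in the CSV file need the same `YYYY-MM-DD` format as the `date` column built earlier. The sketch below normalises the column explicitly. The in-memory CSV stands in for the downloaded file, and its header names and values are assumptions.

```python
import io

import pandas as pd

# Stand-in for the manually downloaded file; header names and values are made up.
sample_csv = io.StringIO("Date,Bitcoin\n2017-05-02,100\n2017-05-03,200\n")

wiki_sample = pd.read_csv(sample_csv)
wiki_sample.columns = ['date', 'wiki']

# Parse the dates and write them back as 'YYYY-MM-DD' strings so the keys
# compare equal to the strings produced by time.strftime("%Y-%m-%d", ...).
wiki_sample['date'] = pd.to_datetime(wiki_sample['date']).dt.strftime('%Y-%m-%d')
print(wiki_sample)
```
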

Also, there are many other data sources that we can use. For example, we can download data from [https://www.quandl.com/search], which is a free financial database where we can find finance-related data. I will not show every detail of this data collection process.

### (2) Merge DataFrames and save data in a file

First, we need to merge the two data frames on their common attribute "date":

In [6]:
df = df.merge(wiki_data, on = 'date', how = 'inner')
df.head()

| | date | price | supply | capital | trade | wiki |
|---|---|---|---|---|---|---|
| 0 | 2017-05-02 | 1452.076288 | 16307212.5 | 23679320000.0 | 86306470.0 | 17143 |
| 1 | 2017-05-03 | 1507.576857 | 16308862.5 | 24586860000.0 | 98768080.0 | 17277 |
| 2 | 2017-05-04 | 1508.292125 | 16310962.5 | 24601700000.0 | 178681000.0 | 19238 |
| 3 | 2017-05-05 | 1533.335071 | 16312575.0 | 25012640000.0 | 136654800.0 | 16900 |
| 4 | 2017-05-06 | 1560.4102 | 16314675.0 | 25457590000.0 | 68907280.0 | 17793 |
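
An inner merge silently drops every date that is missing from either frame. To see which rows would be lost, one option is an outer merge with `indicator=True`, sketched below on made-up frames.

```python
import pandas as pd

# Made-up frames for illustration; only '2017-05-03' appears in both.
left = pd.DataFrame({'date': ['2017-05-02', '2017-05-03'], 'price': [1.0, 2.0]})
right = pd.DataFrame({'date': ['2017-05-03', '2017-05-04'], 'wiki': [200, 300]})

# how='outer' keeps every date; indicator=True adds a '_merge' column that says
# whether a row came from the left frame, the right frame, or both.
check = left.merge(right, on='date', how='outer', indicator=True)
unmatched = check[check['_merge'] != 'both']
print(unmatched[['date', '_merge']])
```
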


Then, save the data frame into a file:

In [7]:
df.to_csv("./Bitcoin_data.csv", index = False)
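
Passing `index = False` leaves the row index out of the file, so reading it back does not produce an extra `Unnamed: 0` column. A small round trip on a made-up frame, written to a temporary directory, illustrates this.

```python
import os
import tempfile

import pandas as pd

# Made-up frame for illustration.
sample = pd.DataFrame({'date': ['2017-05-02', '2017-05-03'], 'price': [1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.csv')
    sample.to_csv(path, index=False)   # write without the row index
    restored = pd.read_csv(path)       # read the file back

print(restored)
```
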