# Project: The price fluctuations of Bitcoin

### (1) Download Data from Internet

First, we download Bitcoin price data from [https://blockchain.info/charts/] and keep it in memory as Python objects. The site serves the data in JSON format, which we parse into a pandas DataFrame that is easy to work with.

The downloaded JSON data is decoded into a Python `dict`, which contains the following items:
```
data['unit']
data['description']
data['status']
data['period']
data['name']
data['values'][0]['x']  # time (Unix timestamp in seconds)
data['values'][0]['y']  # value of the series at that time (e.g. the price)
```
We develop a `download_data_from_blockchain(..)` function to make the download process easier.
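
A minimal sketch of the parsing step, assuming the payload has the structure listed above and that `x` holds Unix timestamps in seconds. The `sample` dictionary is hand-written illustrative data, not a real API response, and `values_to_frame` is a hypothetical helper name.

```python
import pandas as pd

# Hand-written payload that mimics the structure listed above (not real API output).
sample = {
    "status": "ok",
    "name": "Market Price (USD)",
    "unit": "USD",
    "period": "day",
    "values": [
        {"x": 1494115200, "y": 1535.87},
        {"x": 1494201600, "y": 1640.62},
    ],
}


def values_to_frame(payload, column):
    """Turn payload['values'] into a DataFrame with a 'date' column and one value column."""
    frame = pd.DataFrame(payload["values"])  # columns: x (assumed Unix seconds), y (value)
    frame["date"] = pd.to_datetime(frame["x"], unit="s").dt.strftime("%Y-%m-%d")
    return frame.rename(columns={"y": column})[["date", column]]


price_frame = values_to_frame(sample, "price")
print(price_frame)
```

Building the frame straight from the list of `{x, y}` records avoids the two intermediate lists, at the cost of one extra rename.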

In [7]:
import requests
import re
import json

import numpy as ny
import pandas as pd

# helper that downloads one chart from the Blockchain.info charts API
def download_data_from_blockchain(timespan, item):
    # use the function argument here, not the global `time_span`
    url = 'https://api.blockchain.info/charts/' + item + '?timespan=' + timespan + '&format=json'
    json_webpage = requests.get(url).content
    json_obj = json.loads(json_webpage.decode('utf8'))
    re_time = []  # Unix timestamps (the 'x' values)
    re_data = []  # chart values (the 'y' values)
    for data in json_obj['values']:
        re_time.append(data['x'])
        re_data.append(data['y'])
    return re_time, re_data


time_span = '1year' # our time span is 1 year and ranges from May, 2017 to May, 2018

# 1. Download 'Currency statistics' data from https://blockchain.info/charts/
# 
# download market-price of bitcoin on a daily basis
market_price_time, market_price_data = download_data_from_blockchain(time_span, 'market-price')

# download the total number of bitcoins that have already been mined; 
# in other words, the current supply of bitcoins on the network.
# Initially we also considered other variables, including the number of bitcoin addresses,
# the number of transactions, and days destroyed (the number of bitcoins used in transactions
# multiplied by the number of days since they were last traded). After exploring and plotting
# the data, we chose "total bitcoins" to measure the supply of bitcoins.
supply_time, supply_data = download_data_from_blockchain(time_span, 'total-bitcoins')

# download the total USD value of bitcoin supply in circulation, 
# as calculated by the daily average market price across major exchanges.
market_cap_time, market_cap_data = download_data_from_blockchain(time_span, 'market-cap')

# download the total USD value of trading volume on major bitcoin exchanges,
# which stands for the liquidity of bitcoin trading
trade_vol_time, trade_vol_data = download_data_from_blockchain(time_span, 'trade-volume')

# check whether the downloaded series share the same time axis.
# if supply_time == market_price_time:
#     print("Yes, the 'supply' data matches the 'market_price' data well.")
# if market_cap_time == market_price_time:
#     print("Yes, the 'market_cap' data matches the 'market price' data well.")
# if trade_vol_time == market_price_time:
#     print("Yes, the 'trade-vol' data matches the 'market price' data well.")

# print(market_price_time[0])  # the format of time needs to be transformed. 
# print()


# Also we can download other features available on this website, such as:
#
#
# 2. Mining information 
#
# (1) difficulty : A relative measure of how difficult it is to find a new block. 
#                  The difficulty is adjusted periodically as a function of how 
#                  much hashing power has been deployed by the network of miners.
# (2) miners-revenue : Total value of coinbase block rewards and transaction fees paid to miners.
# (3) hash-rate : The estimated number of tera hashes per second 
#                 (trillions of hashes per second) the Bitcoin network is performing.
# (4) transaction-fees : The total value of all transaction fees paid to miners 
#                 (not including the coinbase value of block rewards).
# (5) transaction-fees-usd
# (6) cost-per-transaction-percent : A chart showing miners revenue as percentage 
#                 of the transaction volume.
# (7) cost-per-transaction : A chart showing miners revenue divided by the number of transactions.
# 
#
# 3. Network activity 
#
#
# 4. Blockchain Wallet Activity 
#
#
# 5. Block details 
# 
# We mainly use the data from group 1 (currency statistics) this time. In the future, we'll try adding other variables to improve our model.
# 
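
The commented-out consistency check above can be packaged as a small helper that also reports where two time axes diverge. This is only a sketch: `first_mismatch` is a hypothetical name, and the timestamps are made-up values standing in for the downloaded lists.

```python
def first_mismatch(reference, other):
    """Return the first index where two timestamp lists differ, or None if they are identical."""
    for i, (a, b) in enumerate(zip(reference, other)):
        if a != b:
            return i
    # All shared positions agree; the lists can still differ in length.
    if len(reference) != len(other):
        return min(len(reference), len(other))
    return None


# Made-up daily timestamps (Unix seconds), for illustration only.
price_time_demo = [1494115200, 1494201600, 1494288000]
supply_time_demo = [1494115200, 1494201600, 1494288000]
volume_time_demo = [1494115200, 1494288000]  # one day missing

print(first_mismatch(price_time_demo, supply_time_demo))
print(first_mismatch(price_time_demo, volume_time_demo))
```

Knowing the position of the first mismatch makes it easier to tell a missing day from a series that is shifted as a whole.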

In [8]:
# create time variables for future time-series analysis
import time

f_time = []
for day in market_price_time:
    time_array_day = time.localtime(day)
    f_time_day = time.strftime("%Y-%m-%d", time_array_day)
    f_time.append(f_time_day)
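
`time.localtime` converts with the time zone of the machine that runs the notebook, so the same timestamp can map to different calendar days on different machines. A sketch of a zone-independent variant, assuming the API timestamps are Unix seconds in UTC (an assumption, not confirmed by the source); `to_utc_date` is a hypothetical helper and the two timestamps are illustrative.

```python
from datetime import datetime, timezone


def to_utc_date(unix_seconds):
    """Format a Unix timestamp as a YYYY-MM-DD string in UTC."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).strftime("%Y-%m-%d")


# Illustrative timestamps: midnight UTC on two consecutive days.
f_time_utc = [to_utc_date(ts) for ts in [1494115200, 1494201600]]
print(f_time_utc)
```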

In [9]:
bc_data = list(zip(f_time, market_price_data, supply_data, market_cap_data, trade_vol_data))
df = pd.DataFrame(data = bc_data, columns = ['date', 'price', 'supply', 'capital', 'trade'])

# Memory Clear:
#
# market_price_time = None
# market_price_data = None
# supply_time = None
# supply_data = None
# market_cap_time =None
# market_cap_data = None
# trade_vol_time = None
# trade_vol_data = None
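
`zip` pairs values purely by position and stops at the shortest list, so a series with a missing day would shift every later row without any warning. A sketch of an alternative that aligns on the timestamps instead, using made-up numbers (the `_demo` names are hypothetical stand-ins for the downloaded lists):

```python
import pandas as pd

# Made-up series; the supply series deliberately lacks the middle day.
price_time_demo = [1494115200, 1494201600, 1494288000]
price_demo = [1535.87, 1640.62, 1721.28]
supply_time_demo = [1494115200, 1494288000]
supply_demo = [16316762.5, 16320950.0]

# Each Series carries its own timestamps as index; pandas aligns them on that index.
aligned = pd.DataFrame({
    "price": pd.Series(price_demo, index=price_time_demo),
    "supply": pd.Series(supply_demo, index=supply_time_demo),
})
aligned.index.name = "timestamp"
print(aligned)
```

With this construction a gap shows up as `NaN` in the affected column rather than as a silently shifted row.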

Second, it is interesting to look at how often Bitcoin is looked up on Wikipedia. The page-view data can be downloaded from [https://tools.wmflabs.org/pageviews].

In [10]:
page = '?pages=Bitcoin'
dates_range = '&range=all-time'
agent = '&agent=user'                 # 'all', 'spider', 'bot'
platform = '&platform=all-access'     # 'Desktop', 'Mobile app', 'Mobile web'
project = '&project=en.wikipedia.org'

url = 'https://tools.wmflabs.org/pageviews/'+page+dates_range+project+platform+agent
print(url)

## Unfortunately, the data on that page is generated dynamically with JavaScript, so
## downloading it automatically would require a crawler, which is beyond
## my ability. Thus, I chose to download the data manually into a CSV file.

# We didn't adopt Google search data in our model because it's on a weekly basis, 
# whereas bitcoin data is on a daily basis.

https://tools.wmflabs.org/pageviews/?pages=Bitcoin&range=all-time&project=en.wikipedia.org&platform=all-access&agent=user
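
As a possible alternative to the manual download, Wikimedia also exposes page-view counts through a public REST API. The sketch below only builds the request URL and parses a response, without making a network call. The endpoint layout and the `items` / `timestamp` / `views` field names are written from memory of that API and should be checked against its documentation; the sample payload is hand-written, and `pageviews_url` and `parse_pageviews` are hypothetical helper names.

```python
def pageviews_url(article, start, end, project="en.wikipedia",
                  access="all-access", agent="user"):
    """Build a daily per-article page-view URL; start and end are YYYYMMDD strings."""
    base = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
    return "/".join([base, project, access, agent, article, "daily", start, end])


def parse_pageviews(payload):
    """Convert the assumed response shape into (date, views) pairs."""
    rows = []
    for item in payload["items"]:
        stamp = item["timestamp"]  # assumed format: YYYYMMDDHH
        date = f"{stamp[0:4]}-{stamp[4:6]}-{stamp[6:8]}"
        rows.append((date, item["views"]))
    return rows


# Hand-written payload in the assumed response shape (not real API output).
sample_payload = {"items": [{"timestamp": "2017050700", "views": 17777},
                            {"timestamp": "2017050800", "views": 19666}]}

print(pageviews_url("Bitcoin", "20150701", "20180501"))
print(parse_pageviews(sample_payload))
```

Fetching the URL would still need an HTTP client such as `requests`. To my understanding, Wikimedia also asks API clients to send an identifying `User-Agent` header.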


In [11]:
wiki_data = pd.read_csv("./pageviews-20150701-20180501.csv")
wiki_data.columns = ['date', 'wiki']

There are also many other data sources that we could use. For example, [https://www.quandl.com/search] is a free financial database with a range of finance-related data. I will not show every detail of that data collection process here.

### (2) Merge DataFrames and save data in a file

First, we need to merge the two data frames on their common column "date":

In [12]:
df = df.merge(wiki_data, on = 'date', how = 'inner')
df.head()

|   | date | price | supply | capital | trade | wiki |
|---|---|---|---|---|---|---|
| 0 | 2017-05-07 | 1535.868429 | 16316762.5 | 25060400000.0 | 62492470.0 | 17777 |
| 1 | 2017-05-08 | 1640.619225 | 16319012.5 | 26773290000.0 | 139627600.0 | 19666 |
| 2 | 2017-05-09 | 1721.284971 | 16320950.0 | 28093010000.0 | 167512000.0 | 23935 |
| 3 | 2017-05-10 | 1762.88625 | 16322800.0 | 28775240000.0 | 131817400.0 | 24385 |
| 4 | 2017-05-11 | 1820.990562 | 16324487.5 | 29726740000.0 | 151505800.0 | 25185 |
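
An inner join drops every date that appears in only one of the two frames, without reporting it. A sketch for inspecting what would be lost, using `how='outer'` with `indicator=True` on small made-up frames (`left_demo` and `right_demo` are hypothetical stand-ins for `df` and `wiki_data`):

```python
import pandas as pd

# Made-up frames whose date ranges overlap only partly.
left_demo = pd.DataFrame({"date": ["2017-05-07", "2017-05-08", "2017-05-09"],
                          "price": [1535.87, 1640.62, 1721.28]})
right_demo = pd.DataFrame({"date": ["2017-05-08", "2017-05-09", "2017-05-10"],
                           "wiki": [19666, 23935, 24385]})

# The outer join keeps all dates; the '_merge' column records where each row came from.
check = left_demo.merge(right_demo, on="date", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["date", "_merge"]])
```

If `unmatched` is empty, the inner join loses nothing; otherwise it lists the dates to investigate.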


Then, save the data frame into a file:

In [13]:
df.to_csv("./Bitcoin_data.csv", index = False)
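
When the file is read back later, the `date` column comes in as plain strings unless it is parsed explicitly. A sketch of the round trip, using an in-memory buffer and a two-row made-up frame in place of the real file:

```python
import io

import pandas as pd

# Made-up frame standing in for the merged data.
demo = pd.DataFrame({"date": ["2017-05-07", "2017-05-08"],
                     "price": [1535.87, 1640.62]})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
demo.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates turns the 'date' column into datetime values on load.
restored = pd.read_csv(buffer, parse_dates=["date"])
print(restored.dtypes)
```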