# Data Extraction Documentation
## BS Data Science Project - Recurrent Bitcoin Network

This notebook briefly documents how data is retrieved from [Messari](https://messari.io/) through its [API](https://messari.io/api/docs). Data is collected through a REST API that returns `json` responses.

### Data

The expected output is a data frame of daily time series, with one column per metric value, which is later split, normalized, and fed to the model for training and testing. In this project, daily data from 2016 to 2020 serves as the training and validation set, while 2021 serves as the test set. Note that this notebook does not retrieve the test set.
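The split and normalization described above could look roughly like the following. This is a minimal sketch on synthetic data, not the project's actual preprocessing: the helper names `chronological_split` and `minmax_scale`, the 2020-01-01 validation cut-off, and the choice of min-max scaling fitted on the training rows are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def chronological_split(df, valid_start, test_start):
    """Split a date-indexed frame into train/validation/test without shuffling."""
    valid_start = pd.Timestamp(valid_start)
    test_start = pd.Timestamp(test_start)
    train = df[df.index < valid_start]
    valid = df[(df.index >= valid_start) & (df.index < test_start)]
    test = df[df.index >= test_start]
    return train, valid, test


def minmax_scale(train, *others):
    """Scale every part with the min/max of the training part only."""
    lo, hi = train.min(), train.max()
    span = (hi - lo).replace(0, 1)  # avoid division by zero for constant columns
    return [(part - lo) / span for part in (train, *others)]


# Synthetic daily series standing in for the real data (assumption)
index = pd.date_range("2016-01-01", "2021-12-31", freq="D")
demo = pd.DataFrame({"close": np.linspace(400.0, 60000.0, len(index))}, index=index)

# Hypothetical cut-offs: 2016-2019 train, 2020 validation, 2021 test
train, valid, test = chronological_split(demo, "2020-01-01", "2021-01-01")
train_s, valid_s, test_s = minmax_scale(train, valid, test)
```

Fitting the scaler on the training rows only keeps information from later periods out of the training inputs; scaled validation and test values can therefore fall outside the 0 to 1 range.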

### Prerequisites

Before running this notebook, install the Python packages imported in the cells below. The working directory is also changed to the `src/` directory of the local repository.

In [1]:
%%capture
# Change the current working directory to the repository's `src/` directory (Windows-style path)
%cd ..\src
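The `%cd` magic above uses a Windows-style path. A platform-independent alternative could be sketched with `pathlib`, as below; `change_to_src` is a hypothetical helper and is not part of the project code.

```python
import os
from pathlib import Path


def change_to_src(start=None):
    """Change into the sibling `src` directory of `start` (default: current directory)."""
    base = Path(start) if start is not None else Path.cwd()
    target = (base / ".." / "src").resolve()
    if not target.is_dir():
        raise FileNotFoundError(f"No src directory found at {target}")
    os.chdir(target)
    return target


# Usage inside the notebook (assumes it is started one level below the repository root):
# change_to_src()
```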

In [2]:
# Standard library
import json
import time
from datetime import datetime

# Third-party packages (need to be installed)
import pandas as pd
from requests_futures.sessions import FuturesSession
from tqdm import tqdm

# Local python functions
from collect import Collector

### Retrieve Metrics Data From Messari

This section retrieves the complete list of metrics that can be queried as time series.

In [3]:
collector = Collector()

In [4]:
metrics = collector.get_metrics()

In [5]:
freemetrics = metrics.get_free_metrics(return_df=True)
freemetrics.reset_index()['metric_id'].duplicated().sum()

0

In [6]:
freemetrics

metric_id,name,description,values_schema,minimum_interval,source_attribution,role_restriction
price,Price,Volume weighted average price computed using M...,{'open': 'The price of the asset at the beginn...,1m,"[{'name': 'Kaiko', 'url': 'https://www.kaiko.c...",
txn.fee.med.ntv,Median Transaction Fees (Native Units),The median fee per transaction in native units...,{'transaction_fee_median': 'The median fee per...,1d,"[{'name': 'Coinmetrics', 'url': 'https://coinm...",
txn.tfr.erc721.cnt,ERC-721 Transfer Count,The sum count of ERC-721 transfers in that int...,{'transaction_transfer_count_erc721': 'The sum...,1d,"[{'name': 'Coinmetrics', 'url': 'https://coinm...",
exch.flow.in.usd,Deposits on Exchanges,The sum USD value sent to exchanges that inter...,{'flow_in_usd': 'The sum USD value sent to exc...,1d,"[{'name': 'Coinmetrics', 'url': 'https://coinm...",
txn.tfr.erc20.cnt,ERC-20 Transfer Count,The sum count of ERC-20 transfers in that inte...,{'transaction_transfer_count_erc20': 'The sum ...,1d,"[{'name': 'Coinmetrics', 'url': 'https://coinm...",
new.iss.ntv,New Issuance (Native Units),The sum of new native units issued that interv...,{'issuance_native': 'The sum of new native uni...,1d,"[{'name': 'Coinmetrics', 'url': 'https://coinm...",
iss.rate,Annual Issuance Rate,The percentage of new native units (continuous...,{'issuance_rate': 'The percentage of new nativ...,1d,"[{'name': 'Coinmetrics', 'url': 'https://coinm...",
mcap.realized,Realized Marketcap,The sum USD value based on the USD closing pri...,{'marketcap_realized': 'The sum USD value base...,1d,"[{'name': 'Coinmetrics', 'url': 'https://coinm...",
exch.flow.in.usd.incl,Deposits on Exchanges - Inclusive,The sum USD value sent to exchanges that inter...,{'flow_in_usd': 'The sum USD value sent to exc...,1d,"[{'name': 'Coinmetrics', 'url': 'https://coinm...",
sply.circ,Circulating Supply,The circulating supply acknowledges that token...,{'circulating_supply': 'The circulating supply...,1d,"[{'name': 'CoinGecko', 'url': 'https://coingec...",


In [20]:
freemetrics.loc['blk.cnt','description']

'The sum count of blocks created each day'

In [23]:
freemetrics.loc['blk.size.byte','description']

'The sum of the size (in bytes) of all blocks created each day'

In [24]:
freemetrics.loc['blk.size.bytes.avg','description']

'The mean size (in bytes) of all blocks created'

In [7]:
freemetrics.to_csv('../raw/freemetrics.csv')

In [8]:
with open('../raw/bitcoin_metrics.json', 'w') as f:
    json.dump(metrics.get_bitcoin_metrics(), f, indent=4)

### Retrieve Bitcoin Time Series

In this section, we retrieve the time series for the metrics listed above. The `tqdm` progress bar shows the progress of the data collection.
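For orientation, the request URLs that appear in the error messages of the collection loop's output follow a regular pattern. The sketch below rebuilds such a URL with the standard library; `timeseries_url` is a hypothetical helper, and how `Collector` constructs its requests internally is not shown in this notebook.

```python
from urllib.parse import urlencode

# Base URL as it appears in the logged request URLs
BASE_URL = "https://data.messari.io/api/v1"


def timeseries_url(assetkey, metric_id, start, end, interval="1d"):
    """Build a time-series request URL with the query parameters seen in the logs."""
    query = urlencode({
        "start": start,
        "end": end,
        "interval": interval,
        "columns": "",
        "format": "json",
        "timestamp-format": "rfc3339",
    })
    return f"{BASE_URL}/assets/{assetkey}/metrics/{metric_id}/time-series?{query}"


url = timeseries_url("BTC", "exch.sply", "2016-01-01", "2020-12-31")
print(url)
```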

In [9]:
y = collector.get_timeseries('BTC','price')
y

<messari.Timeseries at 0x19f598251f0>

In [10]:
y.data

,timestamp,open,high,low,close,volume
0,2021-02-26T00:00:00Z,46844.230807,48453.878092,44107.496843,46309.909584,1.461488e+10
1,2021-02-27T00:00:00Z,46317.629780,48373.380427,45041.180621,46168.276611,7.630471e+09
2,2021-02-28T00:00:00Z,46138.784058,46676.901247,43030.466959,45223.360320,1.029078e+10
3,2021-03-01T00:00:00Z,45169.630736,49807.308916,45033.589248,49611.571197,1.104579e+10
4,2021-03-02T00:00:00Z,49628.987746,50256.036957,47067.573885,48500.723574,8.906216e+09
...,...,...,...,...,...,...
250,2021-11-03T00:00:00Z,63258.522394,63561.906871,60530.193377,62927.542965,6.718036e+09
251,2021-11-04T00:00:00Z,62936.750955,63126.171802,60713.376141,61456.704150,3.483813e+09
252,2021-11-05T00:00:00Z,61443.058359,62635.308486,60778.112164,61016.505544,2.599436e+09
253,2021-11-06T00:00:00Z,61038.386278,61600.546046,60102.705539,61528.835302,2.792822e+09


In [11]:
y.get_structured_data()

,Price,Price,Price,Price,Price
,open,high,low,close,volume
timestamp,,,,,
2021-02-26,46844.230807,48453.878092,44107.496843,46309.909584,1.461488e+10
2021-02-27,46317.629780,48373.380427,45041.180621,46168.276611,7.630471e+09
2021-02-28,46138.784058,46676.901247,43030.466959,45223.360320,1.029078e+10
2021-03-01,45169.630736,49807.308916,45033.589248,49611.571197,1.104579e+10
2021-03-02,49628.987746,50256.036957,47067.573885,48500.723574,8.906216e+09
...,...,...,...,...,...
2021-11-03,63258.522394,63561.906871,60530.193377,62927.542965,6.718036e+09
2021-11-04,62936.750955,63126.171802,60713.376141,61456.704150,3.483813e+09
2021-11-05,61443.058359,62635.308486,60778.112164,61016.505544,2.599436e+09
2021-11-06,61038.386278,61600.546046,60102.705539,61528.835302,2.792822e+09
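Comparing the two outputs above, the structured form uses the dates as index and puts the metric name on top of the value columns. One way such a reshaping could be done in pandas is sketched below; `to_structured` is a hypothetical function written for illustration and is not the implementation behind `get_structured_data`.

```python
import pandas as pd


def to_structured(flat, metric_name):
    """Index a flat frame by date and nest its columns under the metric name."""
    out = flat.copy()
    # Parse the RFC 3339 timestamps, then drop the timezone and the time of day
    stamps = pd.to_datetime(out["timestamp"], utc=True)
    out["timestamp"] = stamps.dt.tz_localize(None).dt.normalize()
    out = out.set_index("timestamp")
    out.columns = pd.MultiIndex.from_product([[metric_name], out.columns])
    return out


# Two rows with rounded values from the output above
flat = pd.DataFrame({
    "timestamp": ["2021-02-26T00:00:00Z", "2021-02-27T00:00:00Z"],
    "open": [46844.23, 46317.63],
    "close": [46309.91, 46168.28],
})
structured = to_structured(flat, "Price")
```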


In [12]:
import time
from tqdm import tqdm

# Metrics available for Bitcoin
btc_metrics = metrics.get_bitcoin_metrics()
X = []  # one structured data frame per metric

with tqdm(total=len(btc_metrics)) as pbar:
    pbar.set_description('Response [200]')

    for metric in btc_metrics:
        # Retry the same metric until the request succeeds
        while True:
            try:
                result = collector.get_timeseries(
                    assetkey='BTC', metric_id=metric,
                    start='2016-01-01', end='2020-12-31'
                )
                X.append(result.get_structured_data())
                pbar.update(1)
                break
            except Exception as err:
                # Usually a rate-limit error: log it, wait 5 seconds, then retry
                pbar.write(str(err))
                time.sleep(5)


Response [200]:  33%|███▎      | 17/51 [00:48<01:06,  1.97s/it]

429 Client Error: Too Many Requests for url: https://data.messari.io/api/v1/assets/BTC/metrics/exch.sply/time-series?start=2016-01-01&end=2020-12-31&interval=1d&columns=&format=json&timestamp-format=rfc3339


Response [200]:  33%|███▎      | 17/51 [00:54<01:06,  1.97s/it]

429 Client Error: Too Many Requests for url: https://data.messari.io/api/v1/assets/BTC/metrics/exch.sply/time-series?start=2016-01-01&end=2020-12-31&interval=1d&columns=&format=json&timestamp-format=rfc3339


Response [200]:  71%|███████   | 36/51 [01:45<00:55,  3.72s/it]

429 Client Error: Too Many Requests for url: https://data.messari.io/api/v1/assets/BTC/metrics/nvt.adj/time-series?start=2016-01-01&end=2020-12-31&interval=1d&columns=&format=json&timestamp-format=rfc3339


Response [200]:  71%|███████   | 36/51 [01:50<00:55,  3.72s/it]

429 Client Error: Too Many Requests for url: https://data.messari.io/api/v1/assets/BTC/metrics/nvt.adj/time-series?start=2016-01-01&end=2020-12-31&interval=1d&columns=&format=json&timestamp-format=rfc3339


Response [200]:  71%|███████   | 36/51 [01:56<00:55,  3.72s/it]

429 Client Error: Too Many Requests for url: https://data.messari.io/api/v1/assets/BTC/metrics/nvt.adj/time-series?start=2016-01-01&end=2020-12-31&interval=1d&columns=&format=json&timestamp-format=rfc3339


Response [200]: 100%|██████████| 51/51 [02:53<00:00,  3.40s/it]
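The log above shows the API answering with `429 Too Many Requests` several times before a retry succeeds. The loop in this notebook waits a fixed 5 seconds and retries indefinitely. A bounded alternative with exponential backoff could be sketched as follows; `call_with_backoff` and its parameters are hypothetical and not part of `Collector`.

```python
import time


def call_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the last allowed attempt
            sleep(base_delay * 2 ** attempt)


# Demonstration with a stand-in request that fails twice, then succeeds
attempts = {"count": 0}
waits = []


def flaky_fetch():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("429 Client Error: Too Many Requests")
    return "ok"


# The waits are recorded instead of slept so the demo finishes instantly
result = call_with_backoff(flaky_fetch, sleep=waits.append)
```

In the collection loop, `fetch` would be a small function wrapping the `collector.get_timeseries(...)` call for one metric.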


In [14]:
data = pd.concat(X, axis=1)
data

,Price,Price,Price,Price,Price,Median Transaction Fees (Native Units),Deposits on Exchanges,New Issuance (Native Units),Annual Issuance Rate,Realized Marketcap,...,Average Transfer Value,Average Difficulty,Circulating Marketcap,Marketcap Dominance,Total Fees,Average Transaction Fees (Native Units),Adjusted NVT 90-days MA,Block Count,Average Transfer Value (Native Units),Active Addresses Count
,open,high,low,close,volume,transaction_fee_median,flow_in_usd,issuance_native,issuance_rate,marketcap_realized,...,average_transfer_value_usd,mean_difficulty,circulating_marketcap,marketcap_dominance,fees_total_usd,transaction_fee_avg_ntv,nvt_adjusted_90d_ma,block_count,transfer_value_avg,active_addresses
timestamp,,,,,,,,,,,,,,,,,,,,,
2016-01-01,430.206620,437.841344,426.068766,433.960345,2.099646e+07,0.000100,1.308937e+07,3375.0,8.194534,4.531462e+09,...,1181.005400,1.038803e+11,6.468191e+09,94.142223,8.638224e+03,0.000160,41.147256,135,2.716966,316781
2016-01-02,434.015456,436.820867,431.542401,433.221343,1.393225e+07,0.000100,1.066508e+07,3625.0,8.799415,4.533731e+09,...,767.118818,1.038803e+11,6.527223e+09,94.131509,1.356996e+04,0.000211,41.048731,145,1.765765,417966
2016-01-03,433.212063,433.727409,422.757469,429.132867,2.396948e+07,0.000100,1.552670e+07,3625.0,8.797294,4.537363e+09,...,793.981305,1.038803e+11,6.516692e+09,94.116662,1.052675e+04,0.000169,40.644184,145,1.845884,398443
2016-01-04,429.401480,435.231337,428.233668,433.152291,2.528594e+07,0.000100,1.800722e+07,4525.0,10.978147,4.541544e+09,...,954.420051,1.038803e+11,6.459280e+09,94.080515,1.316932e+04,0.000170,40.918827,181,2.201920,413159
2016-01-05,432.978697,434.548442,428.386082,432.001391,1.956450e+07,0.000100,2.088151e+07,3925.0,9.519997,4.545016e+09,...,1242.632112,1.038803e+11,6.516855e+09,94.131122,1.406663e+04,0.000177,40.754906,157,2.872012,435291
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-12-27,26453.928239,28375.198389,25748.450880,26250.457794,1.043023e+10,0.000158,1.213820e+09,925.0,1.816875,1.618915e+11,...,17367.771562,1.866826e+13,4.917348e+11,70.425326,2.370802e+06,0.000295,114.994093,148,0.657102,1049825
2020-12-28,26248.564564,27469.396137,26069.019129,27032.289152,5.653047e+09,0.000164,1.016468e+09,1000.0,1.964065,1.656552e+11,...,21894.585318,1.859959e+13,4.873379e+11,69.689198,2.778372e+06,0.000327,115.311281,160,0.809730,1154354
2020-12-29,27036.832984,27387.310760,25832.269524,27360.005185,5.558736e+09,0.000164,8.085709e+08,950.0,1.865880,1.683068e+11,...,20852.655461,1.859959e+13,5.028255e+11,69.442467,2.854272e+06,0.000313,114.333016,152,0.765763,1146131
2020-12-30,27363.633892,28999.010850,27337.600922,28886.315853,7.442424e+09,0.000171,1.309471e+09,1075.0,2.111160,1.702719e+11,...,21850.191708,1.859959e+13,5.081546e+11,69.728087,3.099548e+06,0.000315,119.111340,172,0.757514,1221579
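`pd.concat(X, axis=1)` places the per-metric frames side by side and aligns their rows on the shared date index. The toy example below illustrates that behaviour with two small frames (values rounded from the table above); dropping the last block-count row is an assumption made only to show that dates missing from one frame end up as `NaN`.

```python
import pandas as pd

dates = pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03"])

# Tuple keys produce two-level (metric name, value name) columns
price = pd.DataFrame({("Price", "close"): [433.96, 433.22, 429.13]}, index=dates)
blocks = pd.DataFrame({("Block Count", "block_count"): [135, 145]}, index=dates[:2])
price.index.name = blocks.index.name = "timestamp"

# Rows are matched by date; 2016-01-03 has no block count in this toy input
combined = pd.concat([price, blocks], axis=1)
print(combined)
```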


In [15]:
data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1827 entries, 2016-01-01 to 2020-12-31
Data columns (total 61 columns):
 #   Column                                                                     Non-Null Count  Dtype  
---  ------                                                                     --------------  -----  
 0   (Price, open)                                                              1827 non-null   float64
 1   (Price, high)                                                              1827 non-null   float64
 2   (Price, low)                                                               1827 non-null   float64
 3   (Price, close)                                                             1827 non-null   float64
 4   (Price, volume)                                                            1827 non-null   float64
 5   (Median Transaction Fees (Native Units), transaction_fee_median)           1827 non-null   float64
 6   (Deposits on Exchanges, flow_in_usd)  

In [25]:
data.to_csv('../raw/data.csv')