# Get Blockchain Data
> API: https://www.blockchain.com/api/charts_api    
> Github: https://github.com/blockchain/api-v1-client-python

* **Description**: Get blockchain data from "https://www.blockchain.com/"
* **Author**: Aaron
* **Version**: 0.2

**Updates**:
N/A

**Issues:**
1. Each blockchain metric can differ in sampling interval, start date, and timestamp format, so it is difficult to write a single unified **Func()** that collects all of them.
2. The **mempool** data only start in 2016, so they are not covered here.
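Before writing a shared collector, it can help to measure how irregular a given series actually is. The following is a minimal sketch, not part of the original workflow: `describe_sampling` is a hypothetical helper, and the three timestamps are copied from the `market-cap` output in this notebook.

```python
import pandas as pd

def describe_sampling(timestamps):
    """Summarize the spacing between consecutive observations (hypothetical helper)."""
    ts = pd.to_datetime(pd.Series(timestamps)).sort_values()
    gaps = ts.diff().dropna()  # time elapsed between neighbouring rows
    return {
        "first": ts.iloc[0],
        "last": ts.iloc[-1],
        "median_gap": gaps.median(),
        "max_gap": gaps.max(),
    }

# Irregularly spaced timestamps, as seen in the market-cap series
sample = ["2012-01-01 00:00:01", "2012-01-02 13:34:31", "2012-01-04 00:14:03"]
summary = describe_sampling(sample)
print(summary["median_gap"], summary["max_gap"])
```

A median gap far from one day is a hint that the series needs to be normalized to daily dates before it is merged with the other metrics.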

In [65]:
# Python Packages
import pandas as pd
import functools

In [66]:
# ALL the metrics from “https://www.blockchain.com/charts/”.
# Can be used for further Analysis

all_metrics = \
[
##Currency Statistics##
    "total-bitcoins", #Total Circulating Bitcoin: The total number of mined bitcoin that are currently circulating on the network.
    "market-price", #Market Price: The average USD market price across major bitcoin exchanges.
    "market-cap", #Market Capitalization (USD): The total USD value of bitcoin in circulation.
    "trade-volume",#Exchange Trade Volume (USD): The total USD value of trading volume on major bitcoin exchanges.
    
##Block Details##
    "blocks-size",#Blockchain Size (MB): The total size of the blockchain minus database indexes in megabytes.
    "avg-block-size",#Average Block Size (MB): The average block size over the past 24 hours in megabytes.
    "n-transactions-per-block",#Average Transactions Per Block: The average number of transactions per block over the past 24 hours.
    "n-payments-per-block",#Average Payments Per Block: The average number of payments per block over the past 24 hours.
    "n-transactions-total",#Total Number of Transactions: The total number of transactions on the blockchain.
    "median-confirmation-time",#Median Confirmation Time: The median time for a transaction with miner fees to be included in a mined block and added to the public ledger.
    "avg-confirmation-time",#Average Confirmation Time: The average time for a transaction with miner fees to be included in a mined block and added to the public ledger.
    
##Mining Information##
    "hash-rate",#Total Hash Rate (TH/s): The estimated number of terahashes per second the bitcoin network is performing in the last 24 hours.
    "difficulty",#Network Difficulty: A relative measure of how difficult it is to mine a new block for the blockchain.
    "miners-revenue",#Miners Revenue (USD): Total value of coinbase block rewards and transaction fees paid to miners.
    "transaction-fees-usd",#Total Transaction Fees (USD): The total USD value of all transaction fees paid to miners. This does not include coinbase block rewards.
    "fees-usd-per-transaction",#Fees Per Transaction (USD): Average transaction fees in USD per transaction.
    "cost-per-transaction-percent",#Cost % of Transaction Volume: A chart showing miners revenue as percentage of the transaction volume.
    "cost-per-transaction",#A chart showing miners revenue divided by the number of transactions.
    
##Network Activity##
    "n-unique-addresses",#The total number of unique addresses used on the blockchain.
    "n-transactions",#Confirmed Transactions Per Day: The total number of confirmed transactions per day.
    "n-payments",#Confirmed Payments Per Day: The total number of confirmed payments per day.
    "transactions-per-second",#Transaction Rate Per Second: The number of transactions added to the mempool per second.
    "output-volume",#Output Value Per Day: The total value of all transaction outputs per day. This includes coins returned to the sender as change.
    "mempool-count",#Mempool Transaction Count: The total number of unconfirmed transactions in the mempool.
    "mempool-growth",#Mempool Size Growth: The rate at which the mempool is growing in bytes per second.
    "mempool-size",#The aggregate size in bytes of transactions waiting to be confirmed.
    "utxo-count",#Unspent Transaction Outputs: The total number of valid unspent transactions outputs. This excludes invalid UTXOs with opcode OP_RETURN
    "n-transactions-excluding-popular",#Transactions Excluding Popular Addresses: The total number of transactions excluding those involving the network's 100 most popular addresses.
    "estimated-transaction-volume-usd",#Estimated Transaction Value (USD): The total estimated value in USD of transactions on the blockchain. This does not include coins returned as change.

##Blockchain.com Wallet Activity##
    "my-wallet-n-users",#Blockchain.com Wallets: The total number of unique Blockchain.com wallets created.

##Market Signals##
    "mvrv",#Market Value to Realised Value: MVRV is calculated by dividing Market Value by Realised Value. In Realised Value, BTC prices are taken at the time they last moved, instead of the current price like in Market Value
    "nvt",#Network Value to Transactions: NVT is computed by dividing the Network Value (= Market Value) by the total transactions volume in USD over the past 24hour.
    "nvts"#Network Value to Transactions Signal: NVTS is a more stable measure of NVT, with the denominator being the moving average over the last 90 days of NVT's denominator
]
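Each metric name above is used as a path segment in the chart request. As a rough sketch, the URL can be assembled with the standard library instead of a hand-written f-string. `build_chart_url` and `BASE_URL` are hypothetical names; the host and the `timespan`, `start`, and `format` parameters are the ones used by the crawler in this notebook.

```python
from urllib.parse import urlencode

# Host and path used by the crawler in this notebook
BASE_URL = "https://api.blockchain.info/charts"

def build_chart_url(metric, timespan, start, fmt="csv"):
    """Build the request URL for one chart metric (hypothetical helper)."""
    query = urlencode({"timespan": timespan, "start": start, "format": fmt})
    return f"{BASE_URL}/{metric}?{query}"

url = build_chart_url("hash-rate", "6years", "2012-01-01")
print(url)
```

Using `urlencode` keeps the query string well-formed if a parameter ever contains characters that need escaping.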

In [67]:
def blkChainCrawler(timespan, metrics, start_date, continue_date):
    '''
    Description: Fetch one blockchain metric from the Blockchain.com Charts API
    Args:
        timespan: Duration of each request (if <= 6years, the data usually come at a 1-day interval)
        metrics: Name of the chart metric to fetch (e.g. "hash-rate")
        start_date: Start date of the first request
        continue_date: Start date of the second request (the max timespan is 6years, so this date is calculated manually)
    Return:
        all_data: Both requests concatenated into one pandas DataFrame with columns Timestamp and <metrics>
    '''
    
    # API Info
    url1 = f'https://api.blockchain.info/charts/{metrics}?timespan={timespan}&start={start_date}&format=csv'
    url2 = f'https://api.blockchain.info/charts/{metrics}?timespan={timespan}&start={continue_date}&format=csv'
    
    # Obtain Data
    data1 = pd.read_csv(url1,names=['Timestamp',metrics])
    data2 = pd.read_csv(url2,names=['Timestamp',metrics])
    
    # Concat by rows
    all_data = pd.concat([data1,data2])
    
    # Transform "Timestamp" to datetime type
    all_data['Timestamp'] = pd.to_datetime(all_data["Timestamp"])
    
    # Keep the same end date as the Bitcoin data (Lurr's work); the cutoff date is set manually
    all_data = all_data[(all_data['Timestamp'] < '2021-04-01')]
    
    return all_data
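The function takes a manually calculated `continue_date`. One possible way to derive such dates is sketched below, assuming each request covers exactly `years` calendar years from its start date (an assumption about the API, not something confirmed here). `window_starts` is a hypothetical helper.

```python
import pandas as pd

def window_starts(start, end, years=6):
    """Return the start date of each consecutive window covering [start, end)."""
    current = pd.Timestamp(start)
    stop = pd.Timestamp(end)
    step = pd.DateOffset(years=years)
    starts = []
    while current < stop:
        starts.append(current.strftime("%Y-%m-%d"))
        current = current + step  # jump to the start of the next window
    return starts

print(window_starts("2012-01-01", "2021-04-01"))
```

This yields `2018-01-01` as the second start date, one day later than the `2017-12-31` chosen by hand in this notebook. Starting a day early overlaps the two windows slightly, which is safer against gaps but can produce duplicated timestamps.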

In [68]:
# Define the parameters
timespan = "6years"
start_date = "2012-01-01"
continue_date = "2017-12-31"

# Metrics suggested by the literature review.
metrics = [
            "hash-rate",#Hash Rate
            "difficulty",#Mining Difficulty
            "n-transactions",#Confirmed transactions per day
            "estimated-transaction-volume-usd",#Estimated transaction value
            "transaction-fees-usd",#Total transaction fees
            "miners-revenue"#Miners Revenue (USD)
]

In [69]:
# Merge the data
merge = functools.partial(pd.merge, on='Timestamp')
df1 = functools.reduce(merge, [blkChainCrawler(timespan, metric, start_date, continue_date) for metric in metrics])
df1

Unnamed: 0,Timestamp,hash-rate,difficulty,n-transactions,estimated-transaction-volume-usd,transaction-fees-usd,miners-revenue
0,2012-01-01,8.591401e+00,1.159929e+06,5001.0,1.016110e+06,1.851638e+01,4.260652e+04
1,2012-01-02,8.764382e+00,1.159929e+06,5410.0,7.508830e+05,3.598932e+01,6.301249e+04
2,2012-01-03,9.340986e+00,1.159929e+06,5773.0,6.037982e+05,3.056013e+01,4.662806e+04
3,2012-01-04,8.879703e+00,1.159929e+06,5731.0,7.495462e+05,7.808277e+01,4.706558e+04
4,2012-01-05,8.476080e+00,1.159929e+06,6994.0,1.614569e+06,4.469720e+01,5.369470e+04
...,...,...,...,...,...,...,...
3373,2021-03-27,1.728239e+08,2.186556e+13,288259.0,3.738512e+09,3.697781e+06,5.921297e+07
3374,2021-03-28,1.728239e+08,2.186556e+13,248605.0,2.746380e+09,3.241129e+06,5.877202e+07
3375,2021-03-29,1.641284e+08,2.186556e+13,303305.0,6.604751e+09,4.649751e+06,5.973204e+07
3376,2021-03-30,1.586936e+08,2.186556e+13,321090.0,6.286007e+09,5.655379e+06,5.897879e+07
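The `functools.reduce` call folds the list of per-metric frames into one table by merging them pairwise on `Timestamp`. A small sketch with made-up frames (column names `a`, `b`, `c` and their values are illustrative only) shows that the default inner join keeps only dates present in every frame:

```python
import functools
import pandas as pd

dates = pd.to_datetime(["2012-01-01", "2012-01-02", "2012-01-03"])
frames = [
    pd.DataFrame({"Timestamp": dates, "a": [1.0, 2.0, 3.0]}),
    pd.DataFrame({"Timestamp": dates[:2], "b": [10.0, 20.0]}),  # one date missing
    pd.DataFrame({"Timestamp": dates, "c": [100.0, 200.0, 300.0]}),
]

# Same pattern as above: fix the join key, then fold the list pairwise
merge = functools.partial(pd.merge, on="Timestamp")
merged = functools.reduce(merge, frames)
print(merged)
```

Because `pd.merge` defaults to `how="inner"`, a date missing from any single metric is dropped from the combined frame.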


In [70]:
# Check duplicated rows
len(df1['Timestamp'].unique())

3378

In [71]:
#Market capitalization - "market-cap"
df2 = blkChainCrawler(timespan, 'market-cap', start_date, continue_date)
df2

Unnamed: 0,Timestamp,market-cap
0,2012-01-01 00:00:01,4.032958e+07
1,2012-01-02 13:34:31,4.223035e+07
2,2012-01-04 00:14:03,4.309479e+07
3,2012-01-05 15:23:53,4.661373e+07
4,2012-01-06 23:04:03,5.311746e+07
...,...,...
1438,2021-03-28 13:25:45,1.038887e+12
1439,2021-03-29 08:44:35,1.050932e+12
1440,2021-03-30 05:27:23,1.073203e+12
1441,2021-03-30 23:59:47,1.095385e+12


In [72]:
# Check duplicated rows
len(df2['Timestamp'].unique())

2943

In [73]:
# Strip the h:m:s part of Timestamp so that only the date remains.
df2['Timestamp'] = pd.to_datetime(df2["Timestamp"]).dt.normalize()
# Drop the duplicates in column "Timestamp", keep the last value
df2.drop_duplicates(subset="Timestamp", keep="last", inplace=True)
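The effect of this step can be seen on three rows taken from the `market-cap` output above, two of which fall on 2021-03-30. This is a minimal sketch; `raw` and `daily` are illustrative names.

```python
import pandas as pd

raw = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2021-03-29 08:44:35",
        "2021-03-30 05:27:23",
        "2021-03-30 23:59:47",
    ]),
    "market-cap": [1.050932e12, 1.073203e12, 1.095385e12],
})

# Truncate to midnight, then keep the latest observation of each day
daily = raw.assign(Timestamp=raw["Timestamp"].dt.normalize())
daily = daily.drop_duplicates(subset="Timestamp", keep="last")
print(daily)
```

With `keep="last"`, the value closest to the end of the day represents that date, and the original row labels are preserved, which is why the index in the output can skip numbers.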

In [74]:
df2

Unnamed: 0,Timestamp,market-cap
0,2012-01-01,4.032958e+07
1,2012-01-02,4.223035e+07
2,2012-01-04,4.309479e+07
3,2012-01-05,4.661373e+07
4,2012-01-06,5.311746e+07
...,...,...
1437,2021-03-27,1.041476e+12
1438,2021-03-28,1.038887e+12
1439,2021-03-29,1.050932e+12
1441,2021-03-30,1.095385e+12


In [75]:
# Check duplicated rows
len(df2['Timestamp'].unique())

2684

In [76]:
# Add the Market capitalization data
all_data = pd.merge(df1, df2, how="left", on='Timestamp')
all_data = all_data.interpolate()
all_data

Unnamed: 0,Timestamp,hash-rate,difficulty,n-transactions,estimated-transaction-volume-usd,transaction-fees-usd,miners-revenue,market-cap
0,2012-01-01,8.591401e+00,1.159929e+06,5001.0,1.016110e+06,1.851638e+01,4.260652e+04,4.032958e+07
1,2012-01-02,8.764382e+00,1.159929e+06,5410.0,7.508830e+05,3.598932e+01,6.301249e+04,4.223035e+07
2,2012-01-03,9.340986e+00,1.159929e+06,5773.0,6.037982e+05,3.056013e+01,4.662806e+04,4.266257e+07
3,2012-01-04,8.879703e+00,1.159929e+06,5731.0,7.495462e+05,7.808277e+01,4.706558e+04,4.309479e+07
4,2012-01-05,8.476080e+00,1.159929e+06,6994.0,1.614569e+06,4.469720e+01,5.369470e+04,4.661373e+07
...,...,...,...,...,...,...,...,...
3373,2021-03-27,1.728239e+08,2.186556e+13,288259.0,3.738512e+09,3.697781e+06,5.921297e+07,1.041476e+12
3374,2021-03-28,1.728239e+08,2.186556e+13,248605.0,2.746380e+09,3.241129e+06,5.877202e+07,1.038887e+12
3375,2021-03-29,1.641284e+08,2.186556e+13,303305.0,6.604751e+09,4.649751e+06,5.973204e+07,1.050932e+12
3376,2021-03-30,1.586936e+08,2.186556e+13,321090.0,6.286007e+09,5.655379e+06,5.897879e+07,1.095385e+12
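The interpolated `market-cap` for 2012-01-03 in the table above can be reproduced by hand. The sketch below uses the neighbouring values from this notebook and interpolates only the numeric column; `left`, `right`, and `filled` are illustrative names.

```python
import pandas as pd

left = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2012-01-02", "2012-01-03", "2012-01-04"]),
    "n-transactions": [5410.0, 5773.0, 5731.0],
})
right = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2012-01-02", "2012-01-04"]),  # 2012-01-03 is missing
    "market-cap": [4.223035e7, 4.309479e7],
})

# The left join keeps every date of `left`; the missing day becomes NaN
merged = pd.merge(left, right, how="left", on="Timestamp")

# Linear interpolation fills the gap with the midpoint of its neighbours
filled = merged.copy()
filled["market-cap"] = merged["market-cap"].interpolate()
print(filled)
```

The default `linear` method treats rows as equally spaced, so a single missing day receives the average of the two adjacent values, here about `4.266257e+07`.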


In [77]:
# Check for rows that contain NaN values
all_data[all_data.isnull().T.any()]

Unnamed: 0,Timestamp,hash-rate,difficulty,n-transactions,estimated-transaction-volume-usd,transaction-fees-usd,miners-revenue,market-cap


In [None]:
# Check duplicated rows
len(all_data['Timestamp'].unique())
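The NaN and duplicate checks can also be expressed as explicit counts, which are easier to compare than a unique-value total. A minimal sketch with a hypothetical `sanity_report` helper and made-up data:

```python
import pandas as pd

def sanity_report(df, key="Timestamp"):
    """Count rows, repeated keys, and rows with at least one NaN (hypothetical helper)."""
    return {
        "rows": len(df),
        "duplicate_keys": int(df[key].duplicated().sum()),
        "rows_with_nan": int(df.isna().any(axis=1).sum()),
    }

# Toy frame with one repeated date and one missing value
toy = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2012-01-01", "2012-01-01", "2012-01-02"]),
    "value": [1.0, None, 3.0],
})
report = sanity_report(toy)
print(report)
```

A clean frame should report zero for both `duplicate_keys` and `rows_with_nan`.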

In [88]:
all_data

Timestamp,hash-rate,difficulty,n-transactions,estimated-transaction-volume-usd,transaction-fees-usd,miners-revenue,market-cap
2012-01-01,8.591401e+00,1.159929e+06,5001.0,1.016110e+06,1.851638e+01,4.260652e+04,4.032958e+07
2012-01-02,8.764382e+00,1.159929e+06,5410.0,7.508830e+05,3.598932e+01,6.301249e+04,4.223035e+07
2012-01-03,9.340986e+00,1.159929e+06,5773.0,6.037982e+05,3.056013e+01,4.662806e+04,4.266257e+07
2012-01-04,8.879703e+00,1.159929e+06,5731.0,7.495462e+05,7.808277e+01,4.706558e+04,4.309479e+07
2012-01-05,8.476080e+00,1.159929e+06,6994.0,1.614569e+06,4.469720e+01,5.369470e+04,4.661373e+07
...,...,...,...,...,...,...,...
2021-03-27,1.728239e+08,2.186556e+13,288259.0,3.738512e+09,3.697781e+06,5.921297e+07,1.041476e+12
2021-03-28,1.728239e+08,2.186556e+13,248605.0,2.746380e+09,3.241129e+06,5.877202e+07,1.038887e+12
2021-03-29,1.641284e+08,2.186556e+13,303305.0,6.604751e+09,4.649751e+06,5.973204e+07,1.050932e+12
2021-03-30,1.586936e+08,2.186556e+13,321090.0,6.286007e+09,5.655379e+06,5.897879e+07,1.095385e+12


In [85]:
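# Upsample to a 1-minute frequency by linear interpolation
# resample() needs a DatetimeIndex, so set it if Timestamp is still a column
if 'Timestamp' in all_data.columns:
    all_data = all_data.set_index('Timestamp')
result = all_data.resample('1min').interpolate()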

In [89]:
# Export to csv
result.to_csv("blockChain_10y_1min_interpolate.csv")
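To get a feel for what the upsampling produces, the sketch below applies the same `resample(...).interpolate()` pattern to two made-up daily values. `daily` and `minute` are illustrative names, and the numbers are not real market data.

```python
import pandas as pd

daily = pd.DataFrame(
    {"market-cap": [100.0, 244.0]},
    index=pd.to_datetime(["2012-01-01", "2012-01-02"]),
)
daily.index.name = "Timestamp"

# One row per minute between the first and last timestamp, filled linearly
minute = daily.resample("1min").interpolate()
print(len(minute))
print(minute["market-cap"].iloc[720])  # value at 12:00 on the first day
```

Two daily rows expand to 1441 minute rows, so roughly nine years of daily data grow to several million rows. The interpolated points lie on a straight line between daily values and carry no intraday information of their own.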