# Exploring Crypto Data Using the Coin Metrics Python API Client 
## Using Coin Metrics Community Network and Market Data

*February 17, 2022*

![CM](https://cdn.substack.com/image/fetch/w_96,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4430351a-a92c-4505-8c8f-3822d76715df_256x256.png)

This notebook walks through examples that use the Coin Metrics Python API Client to analyze both on-chain (network) data and market data. For more information on Coin Metrics, the API client, and Python in general, see the resources below. Make sure to follow us on Twitter [@coinmetrics](https://twitter.com/coinmetrics) and check out our free [charting tools](https://charts.coinmetrics.io/network-data/).

## Resources

- The [Coin Metrics API v4](https://docs.coinmetrics.io/api/v4) website contains the full set of endpoints and data offered by Coin Metrics.
- The [API Spec](https://coinmetrics.github.io/api-client-python/site/api_client.html) contains a full list of functions, documentation, and the syntax for the Python API client.
- The [Coin Metrics Knowledge Base](https://docs.coinmetrics.io/info) provides a list of available assets, metrics, and what is available to Community and Pro users.
- The [Coin Metrics Data Encyclopedia](https://docs.coinmetrics.io/) gives detailed, conceptual explanations of the data and metrics that Coin Metrics offers.

## Coin Metrics Research 
- [Coin Metrics State of the Network](https://coinmetrics.substack.com/), our weekly data-driven newsletter highlighting on-chain (network) data
- [Coin Metrics State of the Market](https://coinmetrics.io/insights/state-of-the-market/), weekly report contextualizing the week’s crypto market movement
- [Other Original Research](https://coinmetrics.io/insights/original-research/) and Long-form Reports

## General Python / Data Science Resources
- [Kaggle Courses](https://www.kaggle.com/learn) on Python & Pandas 
- [Pandas Tutorials](https://pandas.pydata.org/pandas-docs/stable/getting_started/tutorials.html)

## Setup

In [None]:
# Standard library
from os import environ
import logging
from datetime import datetime

# Data handling and plotting
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.image as image
import matplotlib.dates as mdates
from matplotlib.ticker import FuncFormatter
import plotly.express as px

# Coin Metrics API client
from coinmetrics.api_client import CoinMetricsClient
%matplotlib inline

In [None]:
# for plotting
sns.set_theme()
sns.set(rc={'figure.figsize':(12,8)})

In [None]:
logging.basicConfig(
    format='%(asctime)s %(levelname)-8s %(message)s',
    level=logging.INFO,
    datefmt='%Y-%m-%d %H:%M:%S'
)

In [None]:
# We recommend privately storing your API key in your local environment.
try:
    api_key = environ["CM_API_KEY"]
    logging.info("Using API key found in environment")
except KeyError:
    api_key = ""
    logging.info("API key not found. Using community client")

# Initialize the client
client = CoinMetricsClient(api_key)

## Example 1: Pulling Crypto Prices using Coin Metrics' Reference Rates

Reliable price data is the foundation of most crypto asset analysis. CM's Reference Rates provide prices for over 300 crypto assets, calculated in U.S. dollars and euros using a transparent and independent methodology that is robust to manipulation and derived from high-quality constituent markets.

The code below shows how to pull CM reference rates for a given asset and time period using the [`get_asset_metrics` endpoint](https://coinmetrics.github.io/api-client-python/site/api_client.html#get_asset_metrics).

In [None]:
# Query API for bitcoin (BTC) prices, daily CM reference rates as dataframe
metrics    = "ReferenceRateUSD"
frequency  = "1d"
start_time = "2021-01-01"
end_time   = "2022-02-16"
asset      = ["btc"]

logging.info("Getting prices...")
df_prices = client.get_asset_metrics(assets    =asset,
                                     metrics   =metrics,
                                     frequency =frequency,
                                     start_time=start_time,
                                     end_time  =end_time).to_dataframe()
# Assign datatypes
df_prices["ReferenceRateUSD"] = df_prices["ReferenceRateUSD"].astype(float)
df_prices["time"] = pd.to_datetime(df_prices["time"])
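
The type-assignment step above can be tried offline. The sketch below uses a small hand-made frame (the values and the timestamp format are assumptions, not Coin Metrics output) whose columns start out as strings, which is what the casts above suggest the client may return.

```python
import pandas as pd

# Hypothetical sample shaped like the get_asset_metrics output; values are illustrative only
sample = pd.DataFrame({
    "asset": ["btc", "btc", "btc"],
    "time": ["2021-01-01T00:00:00Z", "2021-01-02T00:00:00Z", "2021-01-03T00:00:00Z"],
    "ReferenceRateUSD": ["29000.5", "29400.25", "32100.0"],
})

# Cast the string columns to numeric and datetime types
sample["ReferenceRateUSD"] = pd.to_numeric(sample["ReferenceRateUSD"])
sample["time"] = pd.to_datetime(sample["time"])

print(sample.dtypes)
```

Casting up front means later arithmetic and date-based plotting operate on real numbers and timestamps rather than on text.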

In [None]:
# Let's take a look at the data. Note that values are as of midnight UTC.
df_prices.head()

In [None]:
# Plot the daily BTC reference rate
df_prices.plot(kind="line", x="time", y="ReferenceRateUSD",
               color="orange", linewidth=3, fontsize=20,
               xlabel="", ylabel="Price ($)",
               title="BTC Price (USD), Coin Metrics Reference Rate")

## Example 2: Getting Returns for Many Assets

One of the key benefits of using the API client is the ability to pull large amounts of data quickly and manipulate it efficiently. The example below shows how to pull reference rates for many more assets.

In [None]:
#Query API for prices, daily CM reference rates as dataframe
assets = ['1INCH','ADA', 'ALGO', 'ATOM', 'AVAX','AXS','BCH',
         'BNB','BSV','BTC','CELO','COMP','CRO','CRV','DASH','DOGE',
         'DOT','ETC','ETH','FIL','FTM','FTT','GRT','HBAR','HNT',
         'HT','LINK','LTC', 'LUNA','MANA','MATIC','MKR', 'NEAR',
         'SAND','SNX','SOL','SUSHI','TRX','UNI','XLM','XMR','XRP','XTZ','ZEC']

# To see every asset that has a reference rate, use the catalog endpoints.
# The two commented lines below retrieve that list.
# assets_refrate = client.catalog_metrics("ReferenceRateUSD")
# asset_with_ref_rates = assets_refrate[0]["frequencies"][0]["assets"]

metrics = "ReferenceRateUSD"
frequency = "1d"
start_time = "2021-01-01"
end_time = "2021-12-31"

logging.info("Getting prices...")
df_prices = client.get_asset_metrics(assets=assets,
                                     metrics=metrics,
                                     frequency=frequency,
                                     start_time=start_time,
                                     end_time=end_time).to_dataframe()
# Assign datatypes
df_prices["time"] = pd.to_datetime(df_prices.time)
df_prices["ReferenceRateUSD"] = df_prices.ReferenceRateUSD.astype(float)
# Sort on time and asset
df_prices.sort_values(["asset","time"],inplace=True)

In [None]:
df_prices.shape

In [None]:
# Manipulate data set, get returns from prices

# Reshape dataset so assets are in columns, date is row, and values are prices
df_prices_pivot = df_prices.pivot(index="time",
                                  columns="asset",
                                  values="ReferenceRateUSD")

# Create monthly and full year returns
# Group on the index month directly so no helper "month" column ends up among the price columns
months = df_prices_pivot.index.month
monthly_returns = (df_prices_pivot.groupby(months).last() / df_prices_pivot.groupby(months).first()) - 1
year_returns = (df_prices_pivot.iloc[-1] / df_prices_pivot.iloc[0]) - 1
monthly_returns.loc["Year"] = year_returns
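
To make the reshaping and return logic above concrete, here is a minimal sketch on a toy dataset. The asset names (`aaa`, `bbb`) and prices are made up for illustration; only standard pandas calls are used.

```python
import pandas as pd

# Hypothetical long-format prices for two made-up assets (not real market data)
toy = pd.DataFrame({
    "time": pd.to_datetime(["2021-01-01", "2021-01-31", "2021-02-01", "2021-02-28"] * 2),
    "asset": ["aaa"] * 4 + ["bbb"] * 4,
    "ReferenceRateUSD": [100.0, 110.0, 110.0, 99.0, 10.0, 10.0, 10.0, 20.0],
})

# One row per date, one column per asset
toy_pivot = toy.pivot(index="time", columns="asset", values="ReferenceRateUSD")

# Last price of each month divided by the first price of that month, minus one
by_month = toy_pivot.groupby(toy_pivot.index.month)
toy_monthly = by_month.last() / by_month.first() - 1

# Full-period return from the first and last rows
toy_period = toy_pivot.iloc[-1] / toy_pivot.iloc[0] - 1

print(toy_monthly.round(2))
print(toy_period.round(2))
```

Because each monthly figure runs from the first to the last observation within the month, the move between the last day of one month and the first day of the next is not captured, so monthly returns do not compound exactly to the full-period return.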

The following heat map of returns by month in 2021 was used in [*State of the Network* Issue 134](https://coinmetrics.substack.com/p/state-of-the-network-issue-134) for our 2021 year in review summary.

In [None]:
sns.set(font_scale=1.9)
d = monthly_returns.transpose()    
d.columns = ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec","2021"]

plt.subplots(figsize=(24,20))
sns.heatmap(d,
            annot=True,
            linewidths=.5,
            cbar=False,
            cmap="RdYlGn",
            vmin=-1,
            vmax=1,
            fmt=",.0%")
plt.xticks(fontsize=30)
plt.yticks(fontsize=26)
plt.ylabel("")
plt.title("Selected Crypto Asset Returns by Month, 2021",fontsize=35,y=1.04,fontweight='bold')

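# Load the logo with Pillow: recent Matplotlib releases no longer accept a URL in imread
import io
import urllib.request
from PIL import Image

g_fig = plt.gcf()
logo_url = 'https://cdn.substack.com/image/fetch/w_96,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4430351a-a92c-4505-8c8f-3822d76715df_256x256.png'
with urllib.request.urlopen(logo_url) as resp:
    im = np.array(Image.open(io.BytesIO(resp.read())))
g_fig.figimage(im, 1320, 1130, zorder=3)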

## Example 3: Analyzing the Bitcoin Mining Crackdown in China

The Python API client is useful for pulling CM network data to research important events and put them in context. In this example, we pull Bitcoin mining data and analyze the period around May 2021, when miners were fleeing China due to new regulations. This was one of the most significant developments in crypto in 2021: it redrew the map of Bitcoin mining and stress-tested the Bitcoin network's response to abrupt, sweeping change.

![alt](https://miro.medium.com/max/1400/1*VtWg4MwkOsNlig3NVCGauQ.png)
*Photo by [郑 无忌](https://unsplash.com/@godslar)*

We again use the `get_asset_metrics` endpoint to pull `HashRate`, `HashRate30d` (30d Moving Average), and `DiffLast`.

Hash rate is the speed at which miners compute hashes, and it provides an estimate of the total computational resources allocated to a proof-of-work network. Difficulty is a network parameter that sets how hard it is to find a new block. It adjusts every 2,016 Bitcoin blocks (roughly every 2 weeks) to target a 10-minute block time.
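
The retargeting arithmetic described above can be sketched as follows. This is a simplified illustration rather than the exact consensus code: the function name `retarget` and the sample inputs are hypothetical, and the factor-of-4 limit on each adjustment is included as a known protocol rule.

```python
# Sketch of Bitcoin's difficulty retarget arithmetic (simplified)
BLOCKS_PER_PERIOD = 2016      # blocks between adjustments
TARGET_BLOCK_MINUTES = 10     # target block interval


def retarget(old_difficulty, actual_minutes):
    """Return the new difficulty after one retarget period (hypothetical helper)."""
    target_minutes = BLOCKS_PER_PERIOD * TARGET_BLOCK_MINUTES
    # Each adjustment is limited to a factor of 4 in either direction
    clamped = min(max(actual_minutes, target_minutes / 4), target_minutes * 4)
    return old_difficulty * target_minutes / clamped


# 2,016 blocks at 10 minutes each is exactly 14 days
target_days = BLOCKS_PER_PERIOD * TARGET_BLOCK_MINUTES / (60 * 24)
print(target_days)

# Hypothetical period in which blocks averaged 14 minutes instead of 10
new_diff = retarget(100.0, actual_minutes=2016 * 14)
print(round(new_diff / 100.0 - 1, 4))
```

When hash rate leaves the network, blocks arrive more slowly than the 10-minute target, and the next adjustment lowers difficulty in proportion to the slowdown.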

In [None]:
metrics = ["HashRate","HashRate30d","DiffLast"]
frequency = "1d"
assets  = "btc"

logging.info("Getting mining data...")
df_btc_mining_data = client.get_asset_metrics(assets=assets, #note that omitting start/end time will return full history
                                              metrics=metrics,
                                              frequency=frequency).to_dataframe()

# Assign Data Types
df_btc_mining_data["time"]=pd.to_datetime(df_btc_mining_data.time)
df_btc_mining_data["HashRate"]=df_btc_mining_data["HashRate"].astype(float)
df_btc_mining_data["HashRate30d"]=df_btc_mining_data["HashRate30d"].astype(float)
df_btc_mining_data["DiffLast"]   = df_btc_mining_data["DiffLast"].astype(float)
# Sort on time
df_btc_mining_data.sort_values("time",inplace=True)

In [None]:
df_btc_mining_data.tail()

The graph below shows the significant drop in Bitcoin hash rate around the May 2021 mining crackdown in China.

In [None]:
def human_format(num,pos):
    """ Return human-readable suffixed data"""
    magnitude = 0
    while abs(num) >= 1000:
        magnitude += 1
        num /= 1000.0
    # add more suffixes if you need them
    return '%.0f%s' % (num, ['', 'K', 'M', 'B', 'T', 'P'][magnitude])

sns.set(font_scale=1)
sns.set(rc={'figure.figsize':(12,8)})
formatter = FuncFormatter(human_format)

fig,ax = plt.subplots(figsize=(12,8))

# Plot zoomed in 
plt.plot(df_btc_mining_data.time,df_btc_mining_data.HashRate,label="Hash Rate (1d)",linewidth=3,alpha=0.6)
plt.plot(df_btc_mining_data.time,df_btc_mining_data.HashRate30d, label="Hash Rate (30d Moving Avg.)",linewidth=3)
plt.xlim(datetime(2021,1,1), datetime(2021,12,31))
plt.ylim(.5e8,2e8)
plt.title("Bitcoin Hash Rate, 2021",fontsize=20)
plt.ylabel('TH/s',fontsize=20)
plt.xticks(fontsize=20)
plt.yticks(fontsize=20)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
ax.yaxis.set_major_formatter(formatter)
plt.legend(fontsize=18)

# Create the zoomed-out inset with the full history
# add_axes takes [left, bottom, width, height] as fractions of the figure; left=1 places the inset to the right of the main plot
ax_new = fig.add_axes([1, 0.4, 0.4, 0.4])
plt.plot(df_btc_mining_data.time,df_btc_mining_data.HashRate)
plt.plot(df_btc_mining_data.time,df_btc_mining_data.HashRate30d)
plt.fill_between([datetime(2021,1,1), datetime(2021,12,31)],2.5e8,alpha=0.4,color='tan',label="2021")
plt.title("Bitcoin Hash Rate, 2009-2022")
ax_new.yaxis.set_major_formatter(formatter)
plt.ylabel('TH/s')
plt.legend(fontsize=16)

Note that hash rate is inferred from on-chain data and is statistically noisy over short intervals, so it is best for researchers to look at **moving averages**. For more information, see [the article](https://medium.com/coinmetrics/bitcoin-miners-are-escaping-china-d3937e8f018c) by Coin Metrics' Lucas Nuzzi on estimating hash rate and the implications of the exodus of Bitcoin mining out of China.
![*hash rate formula*](https://miro.medium.com/max/1400/1*TxF9YP2Vpq5WNFOSkw8AFg.png)
*hash rate formula*
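
To illustrate why short-interval estimates are noisy, here is a minimal sketch of a commonly used approximation: the expected number of hashes per block is roughly difficulty × 2^32, so hash rate can be inferred from difficulty and the number of blocks observed in a window. This may differ in detail from the formula pictured above and from Coin Metrics' exact methodology; the function name and inputs are illustrative assumptions.

```python
def implied_hash_rate_ths(difficulty, blocks_found, window_seconds):
    """Estimate network hash rate in TH/s from difficulty and observed block count."""
    hashes_per_block = difficulty * 2**32       # expected hashes needed to find one block
    hashes_per_second = hashes_per_block * blocks_found / window_seconds
    return hashes_per_second / 1e12             # convert H/s to TH/s


DAY = 24 * 60 * 60

# Illustrative inputs: a difficulty of 25 trillion and one day of blocks
on_target = implied_hash_rate_ths(25e12, blocks_found=144, window_seconds=DAY)
lucky_day = implied_hash_rate_ths(25e12, blocks_found=160, window_seconds=DAY)

print(f"{on_target:.3e} TH/s vs {lucky_day:.3e} TH/s")
```

With difficulty unchanged, a day with 160 blocks instead of the expected 144 yields an estimate about 11% higher, even though block arrival is random and the true hash rate may not have moved. Averaging over a longer window dampens this effect.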

In [None]:
# Get days where mining difficulty changed
df_btc_mining_data["DiffLast_lag"] = df_btc_mining_data["DiffLast"].shift(1) 
df_btc_mining_data["DiffLast_chg_pct"] = (df_btc_mining_data["DiffLast"] / df_btc_mining_data["DiffLast_lag"])-1
df_btc_difficulty_change_days = df_btc_mining_data[(df_btc_mining_data["DiffLast_chg_pct"] != 0)].iloc[1:].copy()
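
The filter above can be checked on a toy series. The sketch below uses made-up difficulty values and the built-in `pct_change`, which is equivalent to dividing by the one-day lag and subtracting one; it drops the undefined first row with `dropna` rather than by position.

```python
import pandas as pd

# Hypothetical daily difficulty values with two adjustments (made-up numbers)
toy = pd.DataFrame({
    "time": pd.date_range("2021-06-29", periods=6, freq="D"),
    "DiffLast": [100.0, 100.0, 100.0, 72.0, 72.0, 90.0],
})

# Day-over-day percent change; the first row is NaN because it has no prior day
toy["DiffLast_chg_pct"] = toy["DiffLast"].pct_change()

# Keep only the days on which difficulty actually moved
change_days = toy.dropna(subset=["DiffLast_chg_pct"])
change_days = change_days[change_days["DiffLast_chg_pct"] != 0]

print(change_days)
```

Only the two adjustment days survive the filter, which is what makes the later chart show one point per difficulty change instead of long flat stretches at zero.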

In [None]:
# Plot change in mining difficulty
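# The 'seaborn-whitegrid' style name was renamed in newer Matplotlib releases; seaborn's own whitegrid style avoids the dependency on that name
sns.set_style("whitegrid")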
ax = df_btc_difficulty_change_days.plot(kind='line', x='time',y='DiffLast_chg_pct', label="% Change in Difficulty", figsize=(10, 7.5), linewidth=2, alpha=0.9, color=[247/255,147/255,26/255])

plt.title("Percent Change in Bitcoin Mining Difficulty",fontsize=20)
plt.xlabel("",fontsize=20)
plt.ylabel('',fontsize=20)
plt.ylim(-.4,.4)
plt.xlim(datetime(2010,1,1),datetime(2022,2,15))
plt.hlines(0,datetime(2010,1,1),datetime(2022,2,15),color='black')

# Format y ticks as percentages with a formatter so labels stay in sync with tick positions
plt.gca().yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:,.0f}%'.format(y*100)))
plt.xticks(fontsize=15,rotation = 45)
plt.yticks(fontsize=15)

#biggest decline 
point = (datetime(2021,7,3),-.28)
circle_rad = 15  # This is the radius, in points
ax.plot(point[0], point[1], 'o',
        ms=circle_rad * 2, mec='r', mfc='none', mew=2)
ax.annotate('July 3, 2021\n28% Decline', xy=point, xytext=(30, 60),
            textcoords='offset points',
            color='black', size='large',
            arrowprops=dict(
                arrowstyle='simple,tail_width=0.3,head_width=0.8,head_length=0.8',
                facecolor='black', shrinkB=circle_rad * 1.2)
)

plt.gca().xaxis.grid(False)
plt.legend(fontsize=15)
plt.show()

## Example 4: Analyzing Market Candles of Listed Assets on Coinbase

The Coin Metrics Market Data Feed offers highly detailed data on specific exchanges and asset pairs. In 2021, we [launched](https://coinmetrics.io/coin-metrics-launches-a-market-data-community-offering/) a market data community offering covering many different types of market data including:

- Market trades, order books, quotes, open interest, funding rates, and liquidations are available for all markets for the past 24 hours
- Market candles are available for all markets with full history at daily frequency and the past 24 hours for sub-daily frequencies
- Reference rates are available for all assets in our coverage universe with full history at daily frequency and the past 24 hours for sub-daily frequencies
- CMBI index levels are available with full history at daily frequency and the past 24 hours for sub-daily frequencies
- CMBI constituents and weighting data is available at hourly frequency for the past 60 days

In the example below, we use the `get_market_candles` endpoint to analyze the breakdown of daily volume on Coinbase by asset.

In [None]:
candles_coinbase = client.get_market_candles(
    markets="coinbase-*-spot", # wildcards can be passed to get all asset pairs 
    start_time="2022-01-01",
    end_time="2022-02-15",
    frequency="1d"
).to_dataframe()
candles_coinbase["candle_usd_volume"] = candles_coinbase.candle_usd_volume.astype(float)
candles_coinbase["time"] = pd.to_datetime(candles_coinbase.time)
candles_coinbase.sort_values(["market","time"],inplace=True)

# Create additional columns by splitting the market ID (exchange-base-quote-type)
candles_coinbase['exchange'] = candles_coinbase.market.apply(lambda x: x.split("-")[0])
candles_coinbase['exchange-base'] = candles_coinbase.market.apply(lambda x: x.split("-")[0]+"-"+x.split("-")[1])
candles_coinbase['market_type'] = candles_coinbase.market.apply(lambda x: x.split("-")[-1])
candles_coinbase['base'] = candles_coinbase.market.apply(lambda x: x.split("-")[1])
candles_coinbase['quote'] = candles_coinbase.market.apply(lambda x: x.split("-")[2])
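
As an alternative to the repeated `split` calls above, the market identifier can be split once. This is a sketch that assumes every identifier has exactly four dash-separated parts, which matches the spot markets requested here; other market types may use a different layout.

```python
import pandas as pd

# Example spot market identifiers following the exchange-base-quote-type pattern
markets = pd.Series(["coinbase-btc-usd-spot", "coinbase-eth-usd-spot", "coinbase-eth-btc-spot"])

# Split once and name the pieces instead of calling split() separately for every column
parts = markets.str.split("-", expand=True)
parts.columns = ["exchange", "base", "quote", "market_type"]

print(parts)
```

The resulting columns can be joined back onto the candles frame by index if the four-part assumption holds for the data at hand.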

In [None]:
candles_coinbase.tail()

In [None]:
# Get volume by base asset by day

# Get top 10 assets by volume
total_vol_by_base = candles_coinbase.groupby('base',as_index=False).candle_usd_volume.sum()
total_vol_by_base.sort_values(by="candle_usd_volume",inplace=True)
base_top_list = total_vol_by_base.tail(10).base.tolist()
candles_coinbase["base2"] = np.where(candles_coinbase.base.isin(base_top_list),candles_coinbase.base,f"{len(total_vol_by_base)-10} others")
    
# Get sum by base asset by day
df_vol_by_base = candles_coinbase.groupby(["time","base2"],as_index=False).candle_usd_volume.sum()
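# Use the string alias "sum": passing the Python builtin to transform triggers a warning in recent pandas
df_vol_by_base['total_vol'] = df_vol_by_base.groupby("time").candle_usd_volume.transform("sum")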
df_vol_by_base.columns=["time","base_asset","vol","total_vol"]
df_vol_by_base["vol_pct"]=df_vol_by_base.vol/df_vol_by_base.total_vol
df_vol_by_base.sort_values(["base_asset","time"],inplace=True)
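
The bucketing logic above (keep the largest assets, fold the rest into an "others" group, then compute daily shares) can be sketched on toy data. The volumes below are made up, and `TOP_N` is set to 2 only to keep the example small.

```python
import numpy as np
import pandas as pd

# Hypothetical daily USD volume per base asset (made-up numbers)
toy = pd.DataFrame({
    "time": pd.to_datetime(["2022-01-01"] * 4 + ["2022-01-02"] * 4),
    "base": ["btc", "eth", "sol", "ada"] * 2,
    "candle_usd_volume": [50.0, 30.0, 15.0, 5.0, 40.0, 40.0, 10.0, 10.0],
})

TOP_N = 2  # the notebook uses 10

# Rank assets by total volume and keep the TOP_N largest
totals = toy.groupby("base")["candle_usd_volume"].sum().sort_values()
top = totals.tail(TOP_N).index.tolist()
others_label = f"{len(totals) - TOP_N} others"
toy["bucket"] = np.where(toy["base"].isin(top), toy["base"], others_label)

# Daily volume per bucket and each bucket's share of that day's total
daily = toy.groupby(["time", "bucket"], as_index=False)["candle_usd_volume"].sum()
daily["vol_pct"] = daily["candle_usd_volume"] / daily.groupby("time")["candle_usd_volume"].transform("sum")

print(daily)
```

Each day's shares sum to one, which is what allows the stacked area chart in the next section to fill the full 0 to 100% range.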

### Volume Broken Down by Asset, %

In [None]:
# Pivot back to assets in columns
df_vol_pivot = df_vol_by_base.pivot(index='time',
                                 columns="base_asset",
                                 values="vol_pct")

In [None]:
color_map = {"btc": "#ff9900","eth":  "#37367b",f"{len(total_vol_by_base)-10} others":"lightsteelblue","ada":  "palegoldenrod","link": "blue",
            "ltc":  "cyan","matic":"deeppink","sol":  "darkmagenta","xlm":  "dimgrey","bch":  "springgreen",
            "algo": "maroon","shib": "red","doge":"gold","etc": "darkgreen","xrp": "darkgrey",
             "grt": "indigo","mana":"teal","dot": "purple"}
top_assets  = df_vol_by_base[~df_vol_by_base.base_asset.isin(["btc","eth",f"{len(total_vol_by_base)-10} others"])].groupby("base_asset").vol.sum().sort_values(ascending=False).index.tolist()
top_assets  = ["btc","eth",f"{len(total_vol_by_base)-10} others"] + top_assets

fig = px.area(df_vol_pivot,color_discrete_map=color_map,category_orders={"base_asset":top_assets},title="Percentage of Daily USD Spot Volume on Coinbase by Asset")
fig.update_yaxes(tickformat="%",range=[0,1],title="% of Daily USD Spot Volume")
fig.update_xaxes(title="")
fig.show()

### Volume Broken Down by Asset, All USD

In [None]:
# Pivot back to assets in columns
df_vol_pivot = df_vol_by_base.pivot(index='time',
                                 columns="base_asset",
                                 values="vol")

In [None]:
color_map = {"btc": "#ff9900","eth":  "#37367b",f"{len(total_vol_by_base)-10} others":"lightsteelblue","ada":  "palegoldenrod","link": "blue",
            "ltc":  "cyan","matic":"deeppink","sol":  "darkmagenta","xlm":  "dimgrey","bch":  "springgreen",
            "algo": "maroon","shib": "red","doge":"gold","etc": "darkgreen","xrp": "darkgrey",
             "grt": "indigo","mana":"teal","dot": "purple"}
top_assets  = df_vol_by_base[~df_vol_by_base.base_asset.isin(["btc","eth",f"{len(total_vol_by_base)-10} others"])].groupby("base_asset").vol.sum().sort_values(ascending=False).index.tolist()
top_assets  = ["btc","eth",f"{len(total_vol_by_base)-10} others"] + top_assets

fig = px.bar(df_vol_pivot,color_discrete_map=color_map,category_orders={"base_asset":top_assets},title="Daily USD Spot Volume on Coinbase by Asset")
fig.update_yaxes(title="Daily USD Spot Volume")
fig.update_xaxes(title="")
fig.show()

With so much community data, there is plenty to work with. In the past, members of the CM community have created dashboards (such as [BitcoinKPIs](http://www.bitcoinkpis.com/security)) and used CM data in original research.

Our data may also be useful to journalists, academics, and others.

For any additional questions, feel free to contact us [here](https://coinmetrics.io/support/) or give us a shout on Twitter @coinmetrics.