# Data Source Comparison
This Jupyter notebook compares two well-known sources of cryptocurrency price data: **CoinMetrics** and **CoinGecko**. The goal is to check how closely the daily Bitcoin (BTC) prices reported by the two platforms agree, as a measure of their reliability, accuracy and consistency.

## Data Sources

1. **CoinMetrics**: CoinMetrics offers a daily update of their data through a [GitHub repository](https://github.com/coinmetrics/data/tree/master), organized into CSV files. We will be fetching and processing this data for our comparative analysis.

2. **CoinGecko**: CoinGecko provides information about a large number of cryptocurrencies through their [API](https://www.coingecko.com/en/api/documentation?). We'll be using the `pycoingecko` Python client to fetch the necessary data.

In [1]:
from portfolio_optimization.data_processing import *
from portfolio_optimization.data_collection import *
from datetime import datetime, timezone, timedelta
from tokens.get_assets import *
import pandas as pd
import numpy as np

Fetching data from CoinMetrics is straightforward because the CoinMetrics data is included as a submodule of the current repository.

In [2]:
asset_list = ["btc"]
_df = get_historical_prices_for_assets(asset_list, time_range=timedelta(days=365 * 3), interested_columns=["ReferenceRate", "CapMrktEstUSD"])

# Filter out all columns containing `_` in their name
df = _df.loc[:, ~_df.columns.str.contains("_")]

# Get all the market caps (copy first, so the fill below does not act on a slice of `_df`)
mcaps = _df.loc[:, _df.columns.str.contains("CapMrktEstUSD")].copy()
mcaps.columns = mcaps.columns.str.replace("_CapMrktEstUSD", "")
mcaps = mcaps.fillna(0)

# Display the price DataFrame
df


date,btc
2020-08-15,11774.408252
2020-08-16,11866.910361
2020-08-17,11899.642754
2020-08-18,12315.762417
2020-08-19,11992.695996
...,...
2023-08-11,29430.159912
2023-08-12,29399.784740
2023-08-13,29415.733902
2023-08-14,29286.117611
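
For readers without the `portfolio_optimization` helpers, the step above can be approximated with plain pandas. This is a minimal sketch under stated assumptions, not the repository's implementation: the function name `load_coinmetrics_csv`, the `time` column name, and the `<asset>` / `<asset>_<metric>` column naming are all inferred from the output and the filtering code above.

```python
import io

import pandas as pd


def load_coinmetrics_csv(source, asset, columns=("ReferenceRate", "CapMrktEstUSD")):
    """Hypothetical helper: read one CoinMetrics-style CSV into a date-indexed frame.

    Assumes a `time` column holding dates plus one column per metric.
    The first metric is renamed to `<asset>`; the rest become `<asset>_<metric>`.
    """
    raw = pd.read_csv(source, usecols=["time", *columns], parse_dates=["time"])
    raw = raw.set_index("time").sort_index()
    price_col, *other_cols = columns
    renamed = {price_col: asset}
    renamed.update({col: f"{asset}_{col}" for col in other_cols})
    return raw.rename(columns=renamed)


# Tiny inline sample standing in for a real per-asset CSV file (values are illustrative).
sample = io.StringIO(
    "time,ReferenceRate,CapMrktEstUSD,Other\n"
    "2020-08-15,11774.41,2.17e11,1\n"
    "2020-08-16,11866.91,,2\n"
)
frame = load_coinmetrics_csv(sample, "btc")

# Same split as in the notebook: price columns have no underscore, market caps do.
prices = frame.loc[:, ~frame.columns.str.contains("_")]
mcaps = frame.loc[:, frame.columns.str.contains("CapMrktEstUSD")].copy()
mcaps.columns = mcaps.columns.str.replace("_CapMrktEstUSD", "")
mcaps = mcaps.fillna(0)  # missing market caps become 0, as in the cell above
```

Taking a copy before renaming and filling avoids modifying a slice of the original frame.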


Here, we'll use the `pycoingecko` Python package to fetch the full BTC/USD price history from CoinGecko.

In [3]:
from pycoingecko import CoinGeckoAPI

# Connect to CoinGecko API
cg = CoinGeckoAPI()

# Get historical data for BTC
data = cg.get_coin_market_chart_by_id(id='bitcoin', vs_currency='usd', days='max')

# Prepare the data for DataFrame
price_data = data['prices']
df_gecko = pd.DataFrame(price_data, columns=['time', 'price'])
df_gecko['time'] = pd.to_datetime(df_gecko['time'], unit='ms')  # convert the timestamp data to datetime
df_gecko.set_index('time', inplace=True)  # set the datetime as index
# Rename the column to BTC_gecko
df_gecko.rename(columns={'price': 'BTC_gecko'}, inplace=True)

df_gecko

time,BTC_gecko
2013-04-28 00:00:00,135.300000
2013-04-29 00:00:00,141.960000
2013-04-30 00:00:00,135.300000
2013-05-01 00:00:00,117.000000
2013-05-02 00:00:00,103.430000
...,...
2023-10-08 00:00:00,27977.543491
2023-10-09 00:00:00,27948.103652
2023-10-10 00:00:00,27593.782534
2023-10-11 00:00:00,27392.247703
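
The merge in the next cell only matches rows whose timestamps are identical, so any CoinGecko point that does not fall exactly on midnight would be dropped. The sketch below shows one way to guard against that by keying every point on its calendar date. The helper name `gecko_prices_to_daily` is hypothetical, and the payload is synthetic; it only assumes the `[[timestamp_ms, price], ...]` shape used in the cell above.

```python
import pandas as pd


def gecko_prices_to_daily(prices, column="BTC_gecko"):
    """Hypothetical helper: turn [[timestamp_ms, price], ...] into a daily frame."""
    frame = pd.DataFrame(prices, columns=["time", column])
    frame["time"] = pd.to_datetime(frame["time"], unit="ms")
    # Drop the time-of-day part so every row is keyed by its calendar date.
    frame["time"] = frame["time"].dt.normalize()
    # Keep the first observation per day; later intraday points are discarded.
    frame = frame.drop_duplicates(subset="time", keep="first")
    return frame.set_index("time").sort_index()


# Synthetic payload: two midnight points plus one intraday point on the second day.
payload = [
    [1597449600000, 11777.39],  # 2020-08-15 00:00:00 UTC
    [1597536000000, 11864.91],  # 2020-08-16 00:00:00 UTC
    [1597579200000, 11900.00],  # 2020-08-16 12:00:00 UTC
]
daily = gecko_prices_to_daily(payload)
```

Keeping the first point per day is a design choice for this sketch; taking the last point or a daily mean would be equally defensible, as long as it matches how the other source defines its daily price.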


In [4]:
# Merge the two dataframes
df_merged = pd.merge(df, df_gecko, left_index=True, right_index=True)
df_merged

date,btc,BTC_gecko
2020-08-15,11774.408252,11777.391322
2020-08-16,11866.910361,11864.905810
2020-08-17,11899.642754,11901.776488
2020-08-18,12315.762417,12272.465808
2020-08-19,11992.695996,11949.610971
...,...,...
2023-08-11,29430.159912,29423.818916
2023-08-12,29399.784740,29396.847971
2023-08-13,29415.733902,29412.142275
2023-08-14,29286.117611,29284.969714
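
`pd.merge` performs an inner join by default, so dates that exist in only one source disappear without any message. A quick way to see how much is lost is an outer join with `indicator=True`. The sketch below uses small synthetic frames in place of `df` and `df_gecko`; the helper name `coverage_report` is an assumption made for illustration.

```python
import pandas as pd


def coverage_report(left, right):
    """Count index values found in both frames, only the left one, or only the right one."""
    merged = pd.merge(
        left, right, left_index=True, right_index=True, how="outer", indicator=True
    )
    # `_merge` is the column pandas adds when indicator=True.
    return {
        key: int((merged["_merge"] == key).sum())
        for key in ("both", "left_only", "right_only")
    }


# Synthetic stand-ins for `df` (CoinMetrics) and `df_gecko` (CoinGecko).
dates_a = pd.to_datetime(["2020-08-15", "2020-08-16", "2020-08-17"])
dates_b = pd.to_datetime(["2020-08-16", "2020-08-17", "2020-08-18"])
cm = pd.DataFrame({"btc": [11774.4, 11866.9, 11899.6]}, index=dates_a)
gecko = pd.DataFrame({"BTC_gecko": [11864.9, 11901.8, 12272.5]}, index=dates_b)

report = coverage_report(cm, gecko)
print(report)
```

A large `left_only` or `right_only` count would suggest that the two indexes are not aligned (for example, differing dtypes or timestamps that are not at midnight).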


In [5]:
# Add the absolute difference (USD) and the relative difference
# (`diff_pct` is a fraction of the CoinGecko price, not a percentage)
df_merged['diff'] = df_merged['btc'] - df_merged['BTC_gecko']
df_merged['diff_pct'] = df_merged['diff'] / df_merged['BTC_gecko']
df_merged

date,btc,BTC_gecko,diff,diff_pct
2020-08-15,11774.408252,11777.391322,-2.983071,-0.000253
2020-08-16,11866.910361,11864.905810,2.004551,0.000169
2020-08-17,11899.642754,11901.776488,-2.133735,-0.000179
2020-08-18,12315.762417,12272.465808,43.296608,0.003528
2020-08-19,11992.695996,11949.610971,43.085026,0.003606
...,...,...,...,...
2023-08-11,29430.159912,29423.818916,6.340996,0.000216
2023-08-12,29399.784740,29396.847971,2.936768,0.000100
2023-08-13,29415.733902,29412.142275,3.591628,0.000122
2023-08-14,29286.117611,29284.969714,1.147897,0.000039
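
Written out, the two columns computed above are as follows. The symbols $p^{\text{CM}}_t$ and $p^{\text{CG}}_t$ are notation introduced here for the CoinMetrics and CoinGecko prices on day $t$; they do not appear in the code.

```latex
\text{diff}_t = p^{\text{CM}}_t - p^{\text{CG}}_t,
\qquad
\text{diff\_pct}_t = \frac{\text{diff}_t}{p^{\text{CG}}_t}

% Worked example for 2020-08-18, using the values in the table above:
\text{diff} = 12315.762417 - 12272.465808 \approx 43.2966,
\qquad
\text{diff\_pct} \approx \frac{43.2966}{12272.465808} \approx 0.003528 \;(\approx 0.35\%)
```

Because `diff_pct` is stored as a fraction, it has to be multiplied by 100 before it is read as a percentage.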


In [6]:
# Statistics
df_merged.describe()

statistic,btc,BTC_gecko,diff,diff_pct
count,1096.0,1096.0,1096.0,1096.0
mean,32516.631942,32541.806866,-25.174925,-0.000606
std,14148.227447,14171.021457,108.827881,0.002651
min,10108.63545,10125.014956,-819.013038,-0.017555
25%,20570.863198,20587.141054,-50.873286,-0.001596
50%,29785.959531,29810.24593,-9.593433,-0.000399
75%,42816.358362,42972.470309,9.241147,0.000404
max,67541.755508,67617.015545,967.592898,0.01935
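
`describe()` gives quartiles, but not the share of days on which the two sources agree within a chosen tolerance. The sketch below adds that view. The helper name `agreement_summary` and the 0.5% tolerance are assumptions chosen for illustration, and the input series is synthetic; in the notebook the input would be `df_merged["diff_pct"]`.

```python
import pandas as pd


def agreement_summary(diff_pct, tolerance=0.005):
    """Summarise how often two sources agree within `tolerance` (a fraction; 0.005 = 0.5%)."""
    abs_diff = diff_pct.abs()
    within = abs_diff <= tolerance
    return {
        "share_within": float(within.mean()),  # fraction of days inside the tolerance
        "n_outliers": int((~within).sum()),    # number of days outside it
        "worst_date": abs_diff.idxmax(),       # day with the largest relative gap
    }


# Synthetic relative differences (fractions), not taken from the real data.
sample = pd.Series(
    [-0.0002, 0.0001, 0.0035, 0.0120],
    index=pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"]),
)
summary = agreement_summary(sample)
print(summary)
```

Inspecting the rows around `worst_date` is usually a quicker way to understand an outlier than looking at the extremes in the summary table.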


## Conclusion

After comparing the Bitcoin (BTC) price data from **CoinMetrics** and **CoinGecko** over 1,096 daily observations, we consider CoinMetrics a reliable source of BTC prices.

On average, the BTC price reported by CoinGecko was slightly higher than the one reported by CoinMetrics: the mean difference (CoinMetrics minus CoinGecko) is approximately -25.17 USD, or roughly -0.06% of the CoinGecko price. The largest gaps are 967.59 USD in one direction and -819.01 USD in the other, which correspond to relative differences of roughly +1.9% and -1.8%. These are extremes of the distribution rather than typical values.

Most of the time the two sources differ only marginally. The middle 50% of daily differences (25th to 75th percentile) fall between about -50.87 USD and +9.24 USD, or between about -0.16% and +0.04% in relative terms, and the median difference is -9.59 USD. Discrepancies of this size are immaterial for all but the most precise applications.

In conclusion, the CoinMetrics BTC price data matches the data from CoinGecko, a well-regarded resource in the cryptocurrency space, closely. While it is always important to stay alert to occasional discrepancies, the results support using CoinMetrics as a dependable source of BTC prices. Only BTC was compared here, so other assets would need the same check.
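
The figures quoted in this section can be re-derived from the summary statistics with a few lines of arithmetic. The sketch below copies the relevant values from the `describe()` output and converts the relative differences from fractions to percentages; the helper name `as_percent` is introduced here for illustration.

```python
# Values copied from the `describe()` output above (diff in USD, diff_pct as a fraction).
stats = {
    "mean": {"diff": -25.174925, "diff_pct": -0.000606},
    "25%": {"diff": -50.873286, "diff_pct": -0.001596},
    "75%": {"diff": 9.241147, "diff_pct": 0.000404},
    "min": {"diff": -819.013038, "diff_pct": -0.017555},
    "max": {"diff": 967.592898, "diff_pct": 0.01935},
}


def as_percent(fraction, digits=2):
    """Convert a fraction such as 0.000404 into a percentage rounded to `digits` places."""
    return round(fraction * 100, digits)


percentages = {name: as_percent(row["diff_pct"]) for name, row in stats.items()}

# Width of the interquartile range of the absolute difference, in USD.
iqr_usd = stats["75%"]["diff"] - stats["25%"]["diff"]

for name, pct in percentages.items():
    print(f"{name:>4}: {stats[name]['diff']:>10.2f} USD  ({pct:+.2f}%)")
print(f"IQR of the absolute difference: {iqr_usd:.2f} USD")
```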