In [296]:
%load_ext autoreload

import utils as ut
import plot_tools as plt
import stat_tools as st
import pandas as pd
import plotly

%autoreload 2

plotly.offline.init_notebook_mode(connected=True)



## Notes

The data is extracted from the Binance REST API (kline/candlestick endpoint):

https://github.com/binance-exchange/binance-official-api-docs/blob/master/rest-api.md#klinecandlestick-data

The connection is made via the binance.py script, which dumps the data locally.

The data resolution is 1 minute.
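
As a rough illustration of what such a local dump might contain, the sketch below turns raw kline rows into a typed DataFrame. The field order follows the kline layout described in the API docs linked above. The function `klines_to_frame` and the column names `open_time` and `buy_vol_base` are hypothetical, the sample row is only illustrative, and the actual binance.py script may work differently.

```python
import pandas as pd

# Field order of one kline row, following the Binance REST API docs.
KLINE_FIELDS = [
    "open_time", "open", "high", "low", "close", "volume",
    "close_time", "volume_quote", "num_trades",
    "buy_vol_base", "buy_vol_quote", "ignore",
]

def klines_to_frame(rows, ticker):
    """Convert raw kline rows (lists) for one ticker into a typed DataFrame."""
    df = pd.DataFrame(rows, columns=KLINE_FIELDS).drop(columns="ignore")
    # Prices and volumes arrive as strings; timestamps and trade counts as ints.
    num_cols = ["open", "high", "low", "close", "volume",
                "volume_quote", "buy_vol_base", "buy_vol_quote"]
    df[num_cols] = df[num_cols].astype(float)
    df.insert(0, "ticker", ticker)
    return df

# One illustrative row in the documented layout (values are made up).
sample = [[1499040000000, "0.01634790", "0.80000000", "0.01575800", "0.01577100",
           "148976.11427815", 1499644799999, "2434.19055334", 308,
           "1756.87402397", "28.46694368", "17928899.62484339"]]
klines = klines_to_frame(sample, "ETHBTC")
```

Note that the close time in this layout ends in 999 ms, so adding 1 ms puts it on a round timestamp.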

## Study

In [297]:
dataPath = "/home/charbel/Documents/Stanford/Project/data/"

In [298]:
data = ut.load_data(dataPath, "2019-01-01", "2019-06-30")
## PAXUSDT is a stablecoin pair (PAX is pegged to the US dollar), so we remove it
data = data[data.ticker != "PAXUSDT"]

100%|██████████| 181/181 [00:01<00:00, 115.80it/s]


In [299]:
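## Shift close_time by 1 ms so it falls on the minute boundary (Binance close times end in 999 ms), then index the data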
data['close_time'] += 1
ut.index(data)

In [300]:
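## Keep the columns we need and give them shorter names
cols = ['close_time', 'ticker', 'close', 'volume_quote', 'buy_vol_quote', 'num_trades']
## Chaining rename on the selection avoids an in-place rename on a slice
data = data[cols].rename(columns={
    'close_time': 't',
    'close': 'price',
    'volume_quote': 'volume',
    'buy_vol_quote': 'buy_vol',
})
## Sell volume is the total volume minus the buy volume
data.eval('sell_vol = volume - buy_vol', inplace=True)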

In [301]:
## Keep only the tickers whose first observation is on or before 2019-01-02
start_dates = pd.to_datetime(data.groupby('ticker').first().t, unit='ms').sort_values()
tickers = start_dates[start_dates<='2019-01-02'].index
data.query('ticker in @tickers', inplace=True)

In [302]:
## Check for duplicates (the API seems to send duplicate rows when data is not available)
#t=data.groupby(['datetime','ticker']).count()
#t[t>=2].dropna()

In [303]:
data = data.drop_duplicates()

We compute the returns over three timescales: 5 minutes, 1 hour, and 12 hours (5, 60, and 720 one-minute bars).
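
A minimal sketch of one way such a return could be computed, assuming simple (not log) returns and a long-format frame with `t`, `ticker`, and a price column. `simple_return` is a hypothetical stand-in: the real `st.compute_return` is not shown here and may differ.

```python
import pandas as pd

def simple_return(df, price_col, out_col, periods):
    """Add `out_col` in place: per-ticker simple return over `periods` rows."""
    # Sort so that a row shift within each ticker is a shift back in time.
    df.sort_values(["ticker", "t"], inplace=True)
    past = df.groupby("ticker")[price_col].shift(periods)
    df[out_col] = df[price_col] / past - 1.0

# Toy data: two tickers, three timestamps each.
toy = pd.DataFrame({
    "t": [0, 1, 2, 0, 1, 2],
    "ticker": ["A", "A", "A", "B", "B", "B"],
    "price": [100.0, 110.0, 121.0, 50.0, 50.0, 25.0],
})
simple_return(toy, "price", "r1", 1)
```

Shifting by rows equals shifting by minutes only if no 1-minute bar is missing for a ticker, which is an assumption of this sketch.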

In [305]:
st.compute_return(data, 'price', 'r5', 5)
st.compute_return(data, 'price', 'r60', 60)
st.compute_return(data, 'price', 'r720', 720)

#### Cross-sectional correlations

- Returns become more correlated as the lookback lengthens (as expected)
- Binance Coin (BNB) appears less correlated with the other coins
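
For reference, a cross-sectional correlation matrix of this kind can be sketched by pivoting the long-format returns into one column per ticker and correlating the columns. `cross_corr` is a hypothetical helper written for illustration; it is not the code behind `st.cross_correlation`, which is not shown.

```python
import pandas as pd

def cross_corr(df, ret_col):
    """Ticker-by-ticker Pearson correlation of contemporaneous returns."""
    # One row per timestamp, one column per ticker (needs unique (t, ticker) pairs).
    wide = df.pivot(index="t", columns="ticker", values=ret_col)
    return wide.corr()

# Toy data: B moves exactly with A, C moves exactly against A.
base = [0.01, -0.02, 0.03, 0.00]
toy = pd.DataFrame(
    [(t, name, scale * r)
     for name, scale in [("A", 1.0), ("B", 2.0), ("C", -1.0)]
     for t, r in enumerate(base)],
    columns=["t", "ticker", "r"],
)
corr = cross_corr(toy, "r")
```

In this toy example the A/B entry is close to +1 and the A/C entry is close to -1.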

In [306]:
plt.plot_corr_heatmap(st.cross_correlation(data, 'r5'))
plt.plot_corr_heatmap(st.cross_correlation(data, 'r60'))
plt.plot_corr_heatmap(st.cross_correlation(data, 'r720'))

#### Lagged cross-sectional correlations

- Lagged returns (5 min / 1 h / 12 h) are on the x-axis
- The blue diagonal shows each coin's own mean reversion (negative autocorrelation)
- No clear lead-lag effect of Bitcoin on the smaller cryptocurrencies
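
One way to build such a lagged correlation matrix is sketched below, assuming the lag is counted in rows of a regular time grid. Rows hold the ticker whose current return is measured, and columns hold the ticker whose lagged return is used. `lagged_corr` is a hypothetical helper, not the actual `st.lagged_correlation`.

```python
import pandas as pd

def lagged_corr(df, ret_col, lag):
    """corr(current return of row ticker, return of column ticker `lag` rows earlier)."""
    wide = df.pivot(index="t", columns="ticker", values=ret_col).sort_index()
    lagged = wide.shift(lag)  # value at time t is the return observed at t - lag
    tickers = wide.columns
    out = pd.DataFrame(index=tickers, columns=tickers, dtype=float)
    for follower in tickers:
        for leader in tickers:
            # Series.corr ignores rows where either side is NaN.
            out.loc[follower, leader] = wide[follower].corr(lagged[leader])
    return out

# Toy data: A alternates in sign, B repeats A's return one step later.
a = [0.01, -0.02, 0.03, -0.01, 0.02, -0.03]
b = [0.0] + a[:-1]
toy = pd.DataFrame({
    "t": list(range(6)) * 2,
    "ticker": ["A"] * 6 + ["B"] * 6,
    "r": a + b,
})
lc = lagged_corr(toy, "r", 1)
```

Here the diagonal entry for A is negative (mean reversion), and the entry in row B, column A is close to 1 because A leads B by one step.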

In [307]:
plt.plot_corr_heatmap(st.lagged_correlation(data, 'r5', 5), False)
plt.plot_corr_heatmap(st.lagged_correlation(data, 'r60', 60), False)
plt.plot_corr_heatmap(st.lagged_correlation(data, 'r720', 720), False)

#### PCA

These heatmaps represent the principal components of the cross-sectional returns.

The x-axis is labeled with the cumulative percentage of variance explained.

- The first component is stronger for longer lookbacks (in line with the higher observed correlations)
- BNB's idiosyncratic behavior shows up in the second component
- The first component weights all coins approximately equally (the market factor)
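
The components and the cumulative explained variance can be sketched with an eigendecomposition of the correlation matrix. This assumes PCA on the correlation (rather than covariance) matrix and drops timestamps with missing returns. `pca_components` is a hypothetical helper, and `plt.plot_pca` may be implemented differently.

```python
import numpy as np
import pandas as pd

def pca_components(df, ret_col):
    """Return (loadings per component, cumulative explained variance ratio)."""
    wide = df.pivot(index="t", columns="ticker", values=ret_col).dropna()
    corr = wide.corr().to_numpy()
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    components = pd.DataFrame(eigvecs, index=wide.columns)
    return components, explained

# Toy data: three tickers driven by a common "market" factor plus small noise.
rng = np.random.default_rng(0)
market = rng.normal(size=200)
toy = pd.DataFrame([
    {"t": t, "ticker": name, "r": market[t] + 0.1 * rng.normal()}
    for t in range(200) for name in ["A", "B", "C"]
])
components, explained = pca_components(toy, "r")
```

With a dominant common factor, the first column of `components` has loadings of the same sign and similar size across tickers, which is the market-like pattern described above.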

In [311]:
plt.plot_pca(data, 'r5')
plt.plot_pca(data, 'r60')
plt.plot_pca(data, 'r720')

In [312]:
## Cross-sectional centering of the returns
st.cross_center(data, 'r5')
st.cross_center(data, 'r60')
st.cross_center(data, 'r720')

#### Residual reversion

We now center our returns cross-sectionally:

- We take the equal-weighted average of returns at each timestamp (the market return)
- We subtract it from the individual returns
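
The two steps above can be sketched as follows. `center_cross_sectionally` is a hypothetical helper, and the `-c` suffix mirrors the `r5-c` style column names this notebook uses for the residuals. The real `st.cross_center` is not shown and may differ.

```python
import pandas as pd

def center_cross_sectionally(df, ret_col):
    """Add `<ret_col>-c` in place: return minus the equal-weighted market return."""
    # Mean over all tickers at each timestamp, broadcast back to every row.
    market = df.groupby("t")[ret_col].transform("mean")
    df[ret_col + "-c"] = df[ret_col] - market

# Toy data: two tickers, two timestamps.
toy = pd.DataFrame({
    "t": [0, 0, 1, 1],
    "ticker": ["A", "B", "A", "B"],
    "r5": [0.01, 0.03, -0.02, 0.00],
})
center_cross_sectionally(toy, "r5")
```

By construction the residuals sum to zero at every timestamp, so what remains is each coin's move relative to the market.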

The lagged correlation plot of the residuals is shown below

- Self mean reversion is much clearer on this plot for the 5 min and 1 h time frames
- The lead-lag effect of BTC becomes more visible, especially on ETH and LTC (GDAX exchange?)

In [314]:
plt.plot_corr_heatmap(st.lagged_correlation(data, 'r5-c', 5), False)
plt.plot_corr_heatmap(st.lagged_correlation(data, 'r60-c', 60), False)
plt.plot_corr_heatmap(st.lagged_correlation(data, 'r720-c', 720), False)