# Crypto forecasting


## Dataset

https://www.kaggle.com/c/g-research-crypto-forecasting

In [9]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import gc
%matplotlib inline

In [2]:
!ls ../data

asset_details.csv	       gresearch_crypto			  train.csv
example_sample_submission.csv  g-research-crypto-forecasting.zip
example_test.csv	       supplemental_train.csv


In [6]:
asset = pd.read_csv("../data/asset_details.csv")

In [7]:
asset

|   | Asset_ID | Weight | Asset_Name |
|---|---|---|---|
| 0 | 2 | 2.397895 | Bitcoin Cash |
| 1 | 0 | 4.304065 | Binance Coin |
| 2 | 1 | 6.779922 | Bitcoin |
| 3 | 5 | 1.386294 | EOS.IO |
| 4 | 7 | 2.079442 | Ethereum Classic |
| 5 | 6 | 5.894403 | Ethereum |
| 6 | 9 | 2.397895 | Litecoin |
| 7 | 11 | 1.609438 | Monero |
| 8 | 13 | 1.791759 | TRON |
| 9 | 12 | 2.079442 | Stellar |
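The training rows only carry `Asset_ID`, so a lookup built from `asset_details.csv` makes them readable. Below is a minimal sketch that uses a few rows copied from the asset table as a stand-in for the real file. The small `train` frame is a hypothetical placeholder for `train.csv`.

```python
import pandas as pd

# A few rows copied from asset_details.csv, standing in for the full file
asset = pd.DataFrame({
    "Asset_ID": [2, 0, 1, 6],
    "Weight": [2.397895, 4.304065, 6.779922, 5.894403],
    "Asset_Name": ["Bitcoin Cash", "Binance Coin", "Bitcoin", "Ethereum"],
})

# Hypothetical stand-in for train.csv: one minute of data for three assets
train = pd.DataFrame({
    "timestamp": [1514764860, 1514764860, 1514764860],
    "Asset_ID": [2, 0, 1],
    "Close": [2374.59, 8.53, 13850.176],
})

# Dictionary lookup: Asset_ID -> Asset_Name
id_to_name = dict(zip(asset.Asset_ID, asset.Asset_Name))
train["Asset_Name"] = train["Asset_ID"].map(id_to_name)

# A left merge keeps every training row and also attaches the Weight column
train_w = train.merge(asset[["Asset_ID", "Weight"]], on="Asset_ID", how="left")
print(train_w)
```

`map` is enough when only the name is needed. The merge is handier when several columns from `asset_details.csv` should travel with each row.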


In [3]:
df = pd.read_csv("../data/train.csv")

In [4]:
df.head()

|   | timestamp | Asset_ID | Count | Open | High | Low | Close | Volume | VWAP | Target |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1514764860 | 2 | 40.0 | 2376.58 | 2399.5 | 2357.14 | 2374.59 | 19.233005 | 2373.116392 | -0.004218 |
| 1 | 1514764860 | 0 | 5.0 | 8.53 | 8.53 | 8.53 | 8.53 | 78.38 | 8.53 | -0.014399 |
| 2 | 1514764860 | 1 | 229.0 | 13835.194 | 14013.8 | 13666.11 | 13850.176 | 31.550062 | 13827.062093 | -0.014643 |
| 3 | 1514764860 | 5 | 32.0 | 7.6596 | 7.6596 | 7.6567 | 7.6576 | 6626.71337 | 7.657713 | -0.013922 |
| 4 | 1514764860 | 7 | 5.0 | 25.92 | 25.92 | 25.874 | 25.877 | 121.08731 | 25.891363 | -0.008264 |


Each row holds one minute of trading data for one asset, with the following columns:

- timestamp: Unix timestamp in seconds (the number of seconds elapsed since 1970-01-01 00:00:00 UTC). All timestamps in this dataset are multiples of 60, indicating minute-by-minute data.
- Asset_ID: The ID of the cryptocurrency (e.g. Asset_ID = 1 for Bitcoin). The mapping from Asset_ID to asset name is given in asset_details.csv.
- Count: Number of trades during the one-minute interval.
- Open: Opening price of the time interval (in USD).
- High: Highest price reached during the time interval (in USD).
- Low: Lowest price reached during the time interval (in USD).
- Close: Closing price of the time interval (in USD).
- Volume: Quantity of the asset traded during the time interval, in units of the asset rather than USD (e.g. about 31.55 BTC in the first Bitcoin row, at a price near 13,800 USD).
- VWAP: Volume-weighted average price of the asset over the time interval, aggregated from the individual trades.
- Target: Residual log-return of the asset over a 15-minute horizon.
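Two of these definitions can be made concrete with small numbers. VWAP is the volume-weighted mean of trade prices, sum(price × quantity) / sum(quantity). For the target, the sketch only computes a raw forward 15-minute log-return of `Close`. The competition's `Target` is a *residual* return, so this raw value is at most its starting point and is not the exact `Target` computation. All prices, quantities and closes below are made-up values.

```python
import numpy as np
import pandas as pd

# Made-up trades inside one minute: price (USD) and quantity (asset units)
prices = np.array([100.0, 101.0, 102.0])
quantities = np.array([1.0, 2.0, 1.0])

# VWAP = sum(price * quantity) / sum(quantity)
vwap = np.sum(prices * quantities) / np.sum(quantities)
print(vwap)  # 101.0

# Made-up minute-by-minute closes for a single asset
close = pd.Series(np.linspace(100.0, 110.0, 20))

# Raw forward log-return over 15 minutes: log(Close[t+15] / Close[t]).
# The last 15 rows have no future close and come out as NaN.
# This is NOT the residualized competition Target.
fwd_ret_15 = np.log(close.shift(-15) / close)
print(fwd_ret_15.head())
```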

In [5]:
df

|   | timestamp | Asset_ID | Count | Open | High | Low | Close | Volume | VWAP | Target |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1514764860 | 2 | 40.0 | 2376.580000 | 2399.500000 | 2357.140000 | 2374.590000 | 1.923301e+01 | 2373.116392 | -0.004218 |
| 1 | 1514764860 | 0 | 5.0 | 8.530000 | 8.530000 | 8.530000 | 8.530000 | 7.838000e+01 | 8.530000 | -0.014399 |
| 2 | 1514764860 | 1 | 229.0 | 13835.194000 | 14013.800000 | 13666.110000 | 13850.176000 | 3.155006e+01 | 13827.062093 | -0.014643 |
| 3 | 1514764860 | 5 | 32.0 | 7.659600 | 7.659600 | 7.656700 | 7.657600 | 6.626713e+03 | 7.657713 | -0.013922 |
| 4 | 1514764860 | 7 | 5.0 | 25.920000 | 25.920000 | 25.874000 | 25.877000 | 1.210873e+02 | 25.891363 | -0.008264 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 24236801 | 1632182400 | 9 | 775.0 | 157.181571 | 157.250000 | 156.700000 | 156.943857 | 4.663725e+03 | 156.994319 |  |
| 24236802 | 1632182400 | 10 | 34.0 | 2437.065067 | 2438.000000 | 2430.226900 | 2432.907467 | 3.975460e+00 | 2434.818747 |  |
| 24236803 | 1632182400 | 13 | 380.0 | 0.091390 | 0.091527 | 0.091260 | 0.091349 | 2.193732e+06 | 0.091388 |  |
| 24236804 | 1632182400 | 12 | 177.0 | 0.282168 | 0.282438 | 0.281842 | 0.282051 | 1.828508e+05 | 0.282134 |  |


In [8]:
btc = df[df.Asset_ID == 1].reset_index(drop=True)

In [10]:
del df
gc.collect()

66
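Loading the full `train.csv` (over 24 million rows, per the index above) only to keep Bitcoin is memory-heavy, which is why `df` is deleted and collected afterwards. One alternative is to read the CSV in chunks and keep only the rows with `Asset_ID == 1`. In the sketch below, the in-memory CSV text stands in for `../data/train.csv`. The helper `load_asset` is hypothetical, and the chunk size of 2 is only for the toy data; a real run would use something in the millions.

```python
import io
import pandas as pd

# Placeholder for "../data/train.csv": a tiny CSV with the same columns
csv_text = """timestamp,Asset_ID,Count,Open,High,Low,Close,Volume,VWAP,Target
1514764860,2,40.0,2376.58,2399.5,2357.14,2374.59,19.233005,2373.116392,-0.004218
1514764860,0,5.0,8.53,8.53,8.53,8.53,78.38,8.53,-0.014399
1514764860,1,229.0,13835.194,14013.8,13666.11,13850.176,31.550062,13827.062093,-0.014643
1514764920,1,235.0,13835.036,14052.3,13680.0,13828.102,31.046432,13840.362591,-0.015037
"""

def load_asset(source, asset_id, chunksize=2):
    """Hypothetical helper: read a CSV in chunks, keep only one Asset_ID."""
    parts = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Filter each chunk before it is kept, so only matching rows accumulate
        parts.append(chunk[chunk["Asset_ID"] == asset_id])
    return pd.concat(parts, ignore_index=True)

btc = load_asset(io.StringIO(csv_text), asset_id=1)
print(btc[["timestamp", "Close"]])
```

Peak memory is then bounded by one chunk plus the rows kept, instead of the whole file.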

In [11]:
btc.head()

|   | timestamp | Asset_ID | Count | Open | High | Low | Close | Volume | VWAP | Target |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1514764860 | 1 | 229.0 | 13835.194 | 14013.8 | 13666.11 | 13850.176 | 31.550062 | 13827.062093 | -0.014643 |
| 1 | 1514764920 | 1 | 235.0 | 13835.036 | 14052.3 | 13680.0 | 13828.102 | 31.046432 | 13840.362591 | -0.015037 |
| 2 | 1514764980 | 1 | 528.0 | 13823.9 | 14000.4 | 13601.0 | 13801.314 | 55.06182 | 13806.068014 | -0.010309 |
| 3 | 1514765040 | 1 | 435.0 | 13802.512 | 13999.0 | 13576.28 | 13768.04 | 38.780529 | 13783.598101 | -0.008999 |
| 4 | 1514765100 | 1 | 742.0 | 13766.0 | 13955.9 | 13554.44 | 13724.914 | 108.501637 | 13735.586842 | -0.008079 |


In [12]:
# timestamp holds seconds since the epoch, so unit="s" is required
pd.to_datetime(btc.timestamp.head(), unit="s")

0   2018-01-01 00:01:00
1   2018-01-01 00:02:00
2   2018-01-01 00:03:00
3   2018-01-01 00:04:00
4   2018-01-01 00:05:00
Name: timestamp, dtype: datetime64[ns]
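Because `timestamp` counts seconds, `pd.to_datetime` needs `unit="s"`. Without it, the integers are read as nanoseconds and every row lands on 1970-01-01. Once converted, the timestamps can serve as a `DatetimeIndex`, which makes it easy to check whether any minutes are missing from a series. The sketch below uses a made-up four-row Bitcoin frame with one minute deliberately removed. It does not claim anything about gaps in the real data.

```python
import pandas as pd

# Made-up Bitcoin rows: the minute at 1514764980 (00:03 UTC) is missing
btc = pd.DataFrame({
    "timestamp": [1514764860, 1514764920, 1514765040, 1514765100],
    "Close": [13850.176, 13828.102, 13768.04, 13724.914],
})

# Seconds since the epoch -> datetimes, used as the index
btc.index = pd.to_datetime(btc["timestamp"], unit="s")

# Reindex onto a complete 1-minute grid; missing minutes become all-NaN rows
full_range = pd.date_range(btc.index.min(), btc.index.max(), freq="1min")
btc_full = btc.reindex(full_range)

missing = btc_full.index[btc_full["Close"].isna()]
print(missing)
```

Reindexing rather than just diffing timestamps also leaves a regular grid, which later resampling or rolling-window code can rely on.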