# This dataset contains information on historic trades for several cryptoassets, such as Bitcoin and Ethereum. Your challenge is to predict their future returns.

As historic cryptocurrency prices are not confidential, this will be a forecasting competition using the time series API. Furthermore, the public leaderboard targets are publicly available and are provided as part of the competition dataset. Expect to see many people submitting perfect submissions for fun. Accordingly, THE PUBLIC LEADERBOARD FOR THIS COMPETITION IS NOT MEANINGFUL and is only provided as a convenience for anyone who wants to test their code. The final private leaderboard will be determined using real market data gathered after the submission period closes.

train.csv - The training set

- timestamp - A timestamp for the minute covered by the row.
- Asset_ID - An ID code for the cryptoasset.
- Count - The number of trades that took place this minute.
- Open - The USD price at the beginning of the minute.
- High - The highest USD price during the minute.
- Low - The lowest USD price during the minute.
- Close - The USD price at the end of the minute.
- Volume - The number of cryptoasset units traded during the minute.
- VWAP - The volume-weighted average price for the minute.
- Target - 15-minute residualized returns. See the 'Prediction and Evaluation' section of this notebook for details of how the target is calculated.
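- Weight - The weight each cryptoasset receives in the metric, as defined by the competition hosts (listed in asset_details.csv).
- Asset_Name - Human-readable asset name (listed in asset_details.csv).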
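
The timestamp column holds integers rather than dates. A minimal sketch of converting it to a readable datetime, assuming the values are seconds since the Unix epoch in UTC (the magnitude of the values suggests this, but the description above does not state it). The two-row frame is a stand-in for train.csv.

```python
import pandas as pd

# Two illustrative rows; the timestamp values appear in train.csv.
sample = pd.DataFrame({
    "timestamp": [1514764860, 1514764920],
    "Asset_ID": [0, 2],
})

# Assumption: timestamps are seconds since the Unix epoch, in UTC.
sample["datetime"] = pd.to_datetime(sample["timestamp"], unit="s", utc=True)
print(sample)
```

Keeping the original integer column alongside the datetime makes it easy to join against other files that use the raw timestamp.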

example_test.csv - An example of the data that will be delivered by the time series API.

example_sample_submission.csv - An example of the data that will be delivered by the time series API. The data is just copied from train.csv.

asset_details.csv - Provides the real name of the cryptoasset for each Asset_ID and the weight each cryptoasset receives in the metric.

supplemental_train.csv - After the submission period is over, this file's data will be replaced with cryptoasset prices from the submission period. In the Evaluation phase, the train, train supplement, and test set will be contiguous in time, apart from any missing data. The current copy, which is just filled with approximately the right amount of data from train.csv, is provided as a placeholder.

    📌 There are 14 coins in the dataset

    📌 There are 4 years in the [full] dataset
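
As a rough check of the time span, the sketch below converts the difference between the smallest and largest timestamps into years. It assumes the timestamps are in seconds and uses the rounded minimum and maximum values that `describe()` reports for train.csv in this notebook, so the result is approximate.

```python
# Rounded min/max of the timestamp column, as reported by describe().
ts_min = 1514765000
ts_max = 1632182000

# Assumption: timestamps are in seconds; a year is taken as 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
span_years = (ts_max - ts_min) / SECONDS_PER_YEAR
print(f"{span_years:.2f} years")
```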


# Code

In [1]:
%matplotlib inline
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from tqdm import tqdm

import time

In [2]:
# check if CUDA is available
use_cuda = torch.cuda.is_available()
print(use_cuda)

False


# Load data

In [4]:
pd.set_option('display.max_rows', None)
train_csv = pd.read_csv("train.csv")
print(len(train_csv))
test_csv = pd.read_csv("example_test.csv")
print(len(test_csv))
samples_submission_csv = pd.read_csv("example_sample_submission.csv")

24236806
56


# Memory review

In [6]:
train_csv.info(memory_usage = "deep")

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24236806 entries, 0 to 24236805
Data columns (total 10 columns):
 #   Column     Dtype  
---  ------     -----  
 0   timestamp  int64  
 1   Asset_ID   int64  
 2   Count      float64
 3   Open       float64
 4   High       float64
 5   Low        float64
 6   Close      float64
 7   Volume     float64
 8   VWAP       float64
 9   Target     float64
dtypes: float64(8), int64(2)
memory usage: 1.8 GB


# Memory Size Reduction

In [14]:
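for column in train_csv:
    print(column)
    # Downcast 64-bit columns to the smallest dtype that can hold their values.
    # Note: float32 keeps only about 7 significant digits.
    if train_csv[column].dtype == 'float64':
        train_csv[column] = pd.to_numeric(train_csv[column], downcast='float')
    elif train_csv[column].dtype == 'int64':
        train_csv[column] = pd.to_numeric(train_csv[column], downcast='integer')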

timestamp
Asset_ID
Count
Open
High
Low
Close
Volume
VWAP
Target


In [15]:
train_csv.info(memory_usage = "deep")

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24236806 entries, 0 to 24236805
Data columns (total 10 columns):
 #   Column     Dtype  
---  ------     -----  
 0   timestamp  int32  
 1   Asset_ID   int8   
 2   Count      float32
 3   Open       float32
 4   High       float32
 5   Low        float32
 6   Close      float32
 7   Volume     float32
 8   VWAP       float32
 9   Target     float32
dtypes: float32(8), int32(1), int8(1)
memory usage: 855.2 MB
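
An alternative to downcasting after loading is to declare the narrower dtypes when reading the file, so the 64-bit frame is never materialized. The sketch below is one possible way to do that, not what this notebook ran: the dtype mapping is an assumption, `timestamp` is deliberately left at int64, and a three-row in-memory CSV stands in for train.csv.

```python
import io

import numpy as np
import pandas as pd

# Three rows copied from train.csv; the last one has an empty Target.
csv_text = """timestamp,Asset_ID,Count,Open,High,Low,Close,Volume,VWAP,Target
1514764860,0,5.0,8.53,8.53,8.53,8.53,78.38,8.53,-0.014399
1514764860,1,229.0,13835.194,14013.8,13666.11,13850.176,31.550062,13827.062093,-0.014643
1514764860,11,7.0,329.09,329.88,329.09,329.46,6.63571,329.454118,
"""

# Hypothetical dtype mapping: float32 for prices and volumes, int8 for the asset code.
float_columns = ["Count", "Open", "High", "Low", "Close", "Volume", "VWAP", "Target"]
dtypes = {name: np.float32 for name in float_columns}
dtypes["Asset_ID"] = np.int8
dtypes["timestamp"] = np.int64

df = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
df.info(memory_usage="deep")
```

With the real file, `io.StringIO(csv_text)` would be replaced by the path `"train.csv"`.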


# Dataset description

In [16]:
dtf_description = train_csv.describe()
dtf_description

Unnamed: 0,timestamp,Asset_ID,Count,Open,High,Low,Close,Volume,VWAP,Target
count,24236810.0,24236810.0,24236810.0,24236810.0,24236810.0,24236810.0,24236810.0,24236810.0,24236800.0,23486470.0
mean,1577120000.0,6.292544,286.4594,1432.64,1436.351,1429.568,1432.642,286852.9,,7.12175e-06
std,33233500.0,4.091861,867.3981,6029.605,6039.482,6020.261,6029.611,2433935.0,,0.005679042
min,1514765000.0,0.0,1.0,0.0011704,0.001195,0.0002,0.0011714,-0.3662812,-inf,-0.509351
25%,1549011000.0,3.0,19.0,0.26765,0.26816,0.2669,0.2676484,141.0725,0.2676368,-0.001694354
50%,1578372000.0,6.0,64.0,14.2886,14.3125,14.263,14.2892,1295.415,14.28769,-4.289844e-05
75%,1606198000.0,9.0,221.0,228.8743,229.3,228.42,228.8729,27297.64,228.8728,0.00160152
max,1632182000.0,13.0,165016.0,64805.95,64900.0,64670.53,64808.54,759755400.0,inf,0.9641699
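
The summary above shows `-inf` and `inf` in the VWAP column, which is why its mean and standard deviation come out as NaN, and the Target count is lower than the row count, which indicates missing targets. A minimal sketch of one way to handle this is to turn infinities into NaN and then count the missing values per column. The four-row frame is illustrative data, not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Illustrative values only: two infinite VWAP entries and one missing Target.
frame = pd.DataFrame({
    "VWAP": [8.53, np.inf, -np.inf, 7.657713],
    "Target": [-0.014399, np.nan, -0.013922, -0.008264],
})

# Treat infinities as missing so that aggregate statistics are defined again.
cleaned = frame.replace([np.inf, -np.inf], np.nan)

n_missing = cleaned.isna().sum()   # missing values per column
vwap_mean = cleaned["VWAP"].mean() # NaN entries are skipped by default
print(n_missing)
print(vwap_mean)
```

Whether the affected rows should then be dropped or imputed depends on the model; this sketch only makes the gaps visible.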


In [4]:
type(train_csv)

pandas.core.frame.DataFrame

In [5]:
train_csv.iloc[1:10]

Unnamed: 0,timestamp,Asset_ID,Count,Open,High,Low,Close,Volume,VWAP,Target
1,1514764860,0,5.0,8.53,8.53,8.53,8.53,78.38,8.53,-0.014399
2,1514764860,1,229.0,13835.194,14013.8,13666.11,13850.176,31.550062,13827.062093,-0.014643
3,1514764860,5,32.0,7.6596,7.6596,7.6567,7.6576,6626.71337,7.657713,-0.013922
4,1514764860,7,5.0,25.92,25.92,25.874,25.877,121.08731,25.891363,-0.008264
5,1514764860,6,173.0,738.3025,746.0,732.51,738.5075,335.987856,738.839291,-0.004809
6,1514764860,9,167.0,225.33,227.78,222.98,225.206667,411.896642,225.197944,-0.009791
7,1514764860,11,7.0,329.09,329.88,329.09,329.46,6.63571,329.454118,
8,1514764920,2,53.0,2374.553333,2400.9,2354.2,2372.286667,24.050259,2371.434498,-0.004079
9,1514764920,0,7.0,8.53,8.53,8.5145,8.5145,71.39,8.520215,-0.015875
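
The preview shows that not every asset has a row for every minute. One way to make such gaps explicit for a single asset is to reindex it onto a regular 60-second grid, as sketched below. This assumes timestamps are in seconds; the first two rows are taken from the preview, while the third row and its Close value are hypothetical and exist only to create a gap.

```python
import numpy as np
import pandas as pd

rows = pd.DataFrame({
    "timestamp": [1514764860, 1514764920, 1514765040],  # the minute at ...980 is absent
    "Asset_ID": [0, 0, 0],
    "Close": [8.53, 8.5145, 8.52],  # last value is hypothetical
})

# Select one asset and index it by timestamp.
asset = rows[rows["Asset_ID"] == 0].set_index("timestamp")

# Build a complete 60-second grid between the first and last observation.
full_index = np.arange(asset.index.min(), asset.index.max() + 60, 60)

# Minutes without trades appear as rows filled with NaN.
asset = asset.reindex(full_index)
print(asset)
```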


In [7]:
test_csv

Unnamed: 0,timestamp,Asset_ID,Count,Open,High,Low,Close,Volume,VWAP,group_num,row_id
0,1623542400,3,1201.0,1.478556,1.48603,1.478,1.483681,654799.6,1.481439,0,0
1,1623542400,2,1020.0,580.306667,583.89,579.91,582.276667,1227.988,581.697038,0,1
2,1623542400,0,626.0,343.7895,345.108,343.64,344.598,1718.833,344.441729,0,2
3,1623542400,1,2888.0,35554.289632,35652.46465,35502.67,35602.004286,163.8115,35583.469303,0,3
4,1623542400,4,433.0,0.312167,0.3126,0.31192,0.312208,585577.4,0.312154,0,4
5,1623542400,5,359.0,4.83255,4.8459,4.8229,4.837583,47143.55,4.836607,0,5
6,1623542400,7,541.0,55.22308,55.494,55.182,55.34468,6625.202,55.298816,0,6
7,1623542400,6,2186.0,2371.194286,2379.2,2369.67,2374.380714,1214.129,2374.335307,0,7
8,1623542400,8,35.0,1.00315,1.0198,0.9873,1.0033,7061.928,1.002936,0,8
9,1623542400,9,560.0,161.933429,162.48,161.73,162.214714,1485.009,162.23131,0,9


In [8]:
samples_submission_csv

Unnamed: 0,group_num,row_id,Target
0,0,0,0
1,0,1,0
2,0,2,0
3,0,3,0
4,0,4,0
5,0,5,0
6,0,6,0
7,0,7,0
8,0,8,0
9,0,9,0
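
The sample submission has the columns `group_num`, `row_id`, and `Target`. A minimal sketch of building a frame with the same layout from test rows follows, using a constant zero prediction as a placeholder baseline. The three-row `test_rows` frame is a stand-in for example_test.csv, and how the frame is handed to the time series API is not shown here.

```python
import pandas as pd

# Stand-in for the first rows of example_test.csv (only the columns needed here).
test_rows = pd.DataFrame({
    "Asset_ID": [3, 2, 0],
    "group_num": [0, 0, 0],
    "row_id": [0, 1, 2],
})

# Keep the identifying columns and attach one prediction per row.
submission = test_rows[["group_num", "row_id"]].copy()
submission["Target"] = 0.0  # placeholder prediction, to be replaced by model output
print(submission)
```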


In [9]:
asset_csv = pd.read_csv("asset_details.csv")

In [10]:
asset_csv

Unnamed: 0,Asset_ID,Weight,Asset_Name
0,2,2.397895,Bitcoin Cash
1,0,4.304065,Binance Coin
2,1,6.779922,Bitcoin
3,5,1.386294,EOS.IO
4,7,2.079442,Ethereum Classic
5,6,5.894403,Ethereum
6,9,2.397895,Litecoin
7,11,1.609438,Monero
8,13,1.791759,TRON
9,12,2.079442,Stellar
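
Since the asset names and weights live in a separate file, they can be attached to trade rows with a left join on `Asset_ID`. The sketch below uses three rows from each table shown in this notebook as stand-ins for the full files.

```python
import pandas as pd

# Three rows from asset_details.csv.
asset_details = pd.DataFrame({
    "Asset_ID": [2, 0, 1],
    "Weight": [2.397895, 4.304065, 6.779922],
    "Asset_Name": ["Bitcoin Cash", "Binance Coin", "Bitcoin"],
})

# Three rows from train.csv (only a subset of columns).
trades = pd.DataFrame({
    "timestamp": [1514764860, 1514764860, 1514764920],
    "Asset_ID": [0, 1, 2],
    "Close": [8.53, 13850.176, 2372.286667],
})

# A left join keeps every trade row, even if an Asset_ID had no details entry.
merged = trades.merge(asset_details, on="Asset_ID", how="left")
print(merged)
```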
