# Forecasting the Cryptocurrency Ecosystem with Deep Learning
In this notebook we collect price history data for the top 100 coins by market capitalization. We chunk the data into 2D tensors of shape 90x5, where 90 is the number of input days and 5 is the number of features per coin per day. The features are: ['Open', 'High', 'Low', 'Close', 'DayOfYear'] (the last one is stored as the `DOY` column in the code).

From this time series data, our model will learn to predict how prices will change 1, 10, 30, and 90 days in the future. Our experiments include both an LSTM and a CNN implementation, both of which take the same data as input. 

We will do three types of price forecasting:
1. Binary Prediction: 
For each coin, predict if the price of that coin will be higher or lower in n days.
2. Linear Prediction:
Predict the actual price change of a coin at a future date.
3. Stochastic Forecasting: 
Produce a stochastic evaluation over all coins. That is, produce a vector whose length equals the number of coins and whose elements are non-negative and sum to one. Each element of the vector is a scalar evaluation of a coin. This vector can be thought of as a confidence distribution over the coins: the higher the evaluation, the higher the confidence of a gain in price. The vector can also be used to simulate an investment portfolio, since a portfolio can be seen as a stochastic vector of asset weights.
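
As a rough illustration of how the three target types relate, the sketch below derives each one from a future price expressed relative to the current price. The function names and the example ratios are hypothetical, treating the "linear" target as a fractional change is an assumption, and softmax is only one way to obtain a stochastic vector.

```python
import math

def binary_target(relative_price):
    # 1 if the future close is above the current close, else 0
    return int(relative_price > 1.0)

def linear_target(relative_price):
    # Fractional price change, e.g. 0.25 means +25% (assumed convention)
    return relative_price - 1.0

def stochastic_vector(scores):
    # Softmax: maps arbitrary real scores to positive weights that sum to one
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical future/current close ratios for three coins
relative_prices = [1.25, 0.8, 1.0]
binary = [binary_target(r) for r in relative_prices]
linear = [linear_target(r) for r in relative_prices]
weights = stochastic_vector(linear)
```

Here `weights` ranks the coins in the same order as their relative gains, which is what lets it be read as a portfolio allocation.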

In [209]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import numpy as np
import pandas as pd
import xarray as xr
import seaborn as sns
import matplotlib.pyplot as plt
import pickle
import tqdm

DAYS_BLK  = 90 # Number of input days for learning
TARGETS   = [1, 10, 30, 90] # Number of days to predict in the future
TEST_DAYS = 30 # Number of most recent days to withhold from the training set for testing

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## 1. Data Acquisition and Cleaning

In [210]:
def compile_samples(coins, DAYS_BLK, TARGETS):
    # Compile a list of training examples, accompanied by target values for future prices
    train_samples = []
    for coin in tqdm.tqdm(coins):
        # Get all available price history for a given coin
        coin_data = coin.read_history()

        # Drop unnecessary columns
        coin_data = coin_data.drop(columns=['Volume', 'Market Cap', 'Unnamed: 0'])

        # Sort by ascending date so that each window runs forward in time
        coin_data['Date'] = pd.to_datetime(coin_data['Date'])
        coin_data = coin_data.sort_values(['Date'])

        # Convert Date to Day of Year (DOY)
        coin_data['DOY'] = coin_data['Date'].dt.dayofyear
        coin_data = coin_data.drop(columns=['Date'])

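        # Chunk coin history into input/target train pairs.
        # The most recent TEST_DAYS windows are held out (TEST_DAYS is read from the notebook's globals).
        for t in range(len(coin_data) - DAYS_BLK - TARGETS[-1] - TEST_DAYS):

            # X is a 90 day chunk of shape: (90, 5)
            X = coin_data.iloc[t:t+DAYS_BLK].copy()
            # y is a list of future closes [1, 10, 30, 90] days after the last day of X.
            # The last row of X sits at index t+DAYS_BLK-1, hence the -1 offset.
            y = [coin_data.iloc[t+DAYS_BLK-1+n]['Close'] for n in TARGETS]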

            # Normalize prices relative to the final input close price
            price_cols = ['Open', 'High', 'Low', 'Close']
            current_price = X['Close'].iloc[-1]  # select by name rather than by column position
            for col in price_cols:
                X[col] = X[col] / current_price

            # Normalize future prices by current price
            y = [future_price / current_price for future_price in y]

            # Normalize DOY column
            X['DOY'] = X['DOY'] / 366  # 366 keeps leap-year values within [0, 1]

            # Sample is ready
            train_samples.append((X.values, y))
    return train_samples
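
To make the windowing and normalization concrete, here is a library-free sketch on a toy series of closing prices. It is a simplification, not the function above: it uses a single price column, omits the `TEST_DAYS` holdout, and assumes each target is the close `n` days after the window's last day. The name `window_samples` and the toy prices are hypothetical.

```python
def window_samples(closes, days_blk, targets):
    # Slide a window of `days_blk` closes over the series. Each window is paired
    # with the closes `n` days after its last day, all divided by that last close.
    samples = []
    for t in range(len(closes) - days_blk - targets[-1] + 1):
        window = closes[t:t + days_blk]
        current = window[-1]
        x = [c / current for c in window]
        y = [closes[t + days_blk - 1 + n] / current for n in targets]
        samples.append((x, y))
    return samples

closes = [10.0, 11.0, 12.0, 15.0, 18.0, 24.0]
toy = window_samples(closes, days_blk=3, targets=[1, 2])
```

The first window is `[10, 11, 12]`, so its inputs are divided by 12 and its targets are `15/12` and `18/12`. Because every window ends at a normalized value of exactly 1.0, coins with very different absolute prices become comparable.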

In [211]:
# Get a list of coin objects for the top 100 coins from coinmarketcap.com
from Scrapers.Coinmarketcap.coinmarketcap import CoinMarketcap
cmk = CoinMarketcap()
coins = cmk.coins()

In [212]:
# Cache samples because compiling them is time-consuming
RECOMPILE_SAMPLES = True
if RECOMPILE_SAMPLES:
    samples = compile_samples(coins, DAYS_BLK, TARGETS)
    with open("dataset.pickle", "wb") as f:
        pickle.dump(samples, f)
else:
    try:
        print("Checking Cache...")
        with open("dataset.pickle", "rb") as f:
            samples = pickle.load(f)
        print("Cached samples loaded!")
    except FileNotFoundError:
        print("Cached samples not found. Fetching data and compiling samples...")
        samples = compile_samples(coins, DAYS_BLK, TARGETS)
        print("Caching samples...")
        with open("dataset.pickle", "wb") as f:
            pickle.dump(samples, f)
        print("Done!")

100%|██████████| 100/100 [01:03<00:00,  1.58it/s]


In [213]:
len(samples)

34926

## 2. Model Definitions

In [214]:
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Dense, Flatten

# Assert input shape
N = 90 # Number of Days
F = 5  # Number of Features: ['Open', 'High', 'Low', 'Close', 'DayOfYear']
assert((N, F) == samples[0][0].shape)

In [223]:
def CNN(forecast='binary'):
    # Define our Temporal Convolutional Model
    model = Sequential()
    model.add(Conv1D(filters=4, kernel_size=3, activation='selu', input_shape=[N,F]))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Conv1D(filters=4, kernel_size=3, activation='selu'))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Conv1D(filters=16, kernel_size=21, activation='selu'))
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid')) # 'linear' for price prediction, 'softmax' for stochastic forecasting
    
    # Compile our Model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy', 'mae'])
    
    return model
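
The kernel size of 21 in the last convolution can be checked with a little shape arithmetic. The sketch below traces the length of the time axis through the layers, assuming the Keras defaults of `'valid'` padding, stride 1 for `Conv1D`, and a stride equal to `pool_size` for `MaxPooling1D`. The helper names are hypothetical.

```python
def conv1d_out_len(length, kernel_size):
    # 'valid' padding, stride 1
    return length - kernel_size + 1

def maxpool1d_out_len(length, pool_size):
    # 'valid' padding, stride equal to pool_size
    return (length - pool_size) // pool_size + 1

length = 90  # number of input days
trace = [length]
layers = [('conv', 3), ('pool', 2), ('conv', 3), ('pool', 2), ('conv', 21)]
for kind, size in layers:
    if kind == 'conv':
        length = conv1d_out_len(length, size)
    else:
        length = maxpool1d_out_len(length, size)
    trace.append(length)

# The last convolution has 16 filters, so Flatten yields trace[-1] * 16 values
flattened_units = trace[-1] * 16
```

Under these assumptions the time axis shrinks as 90, 88, 44, 42, 21, 1, so the final convolution spans all remaining time steps and the dense layer receives 16 inputs.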

## 3. Binary Prediction

In [230]:
# How many days into the future would you like to predict? Options: [1,10,30,90]
n_days = 30
target_index = TARGETS.index(n_days)

# Split samples into inputs and targets
X,y = zip(*samples)

# target[target_index] is the close n_days after the input, relative to the current close
y = [target[target_index] for target in y]

# Convert target to a binary label: 1 means the price went up (ratio > 1), 0 means it did not
y = [int(target > 1) for target in y]
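
Accuracy on binary labels is only informative relative to the class balance, so it can help to compute the accuracy of always predicting the most frequent class before training. This is a minimal sketch with hypothetical labels, and `majority_baseline` is an illustrative name rather than part of the notebook.

```python
def majority_baseline(labels):
    # Accuracy obtained by always predicting the most frequent class
    positives = sum(labels)
    negatives = len(labels) - positives
    return max(positives, negatives) / len(labels)

labels = [1, 1, 0, 1, 0, 1, 1, 0]  # hypothetical up/down labels
baseline = majority_baseline(labels)
```

A model whose validation accuracy does not exceed this baseline has not learned anything beyond the label frequencies.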

In [231]:
model = CNN()
model.fit(np.array(X), np.array(y), epochs=10, batch_size=32, validation_split=0.2)

Train on 27940 samples, validate on 6986 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f655a616898>
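
According to the Keras documentation, `validation_split` takes the validation set from the last samples of the arrays, before any shuffling. Since the samples here are compiled coin by coin, the validation set consists of the windows from the last coins in the list rather than a random or time-based holdout. The sketch below reproduces the split sizes reported in the training log. `split_sizes` is a hypothetical helper, not a Keras function.

```python
def split_sizes(n_samples, validation_split):
    # The first `split_at` samples are used for training, the rest for validation
    split_at = int(n_samples * (1.0 - validation_split))
    return split_at, n_samples - split_at

n_train, n_val = split_sizes(34926, 0.2)
```

If a different holdout is wanted, one option is to split the arrays explicitly and pass the held-out part through the `validation_data` argument of `fit`.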

In [216]:
model.metrics

['mae']

### Train and Test Set Creation

### Convolutional Network

### Long Short-Term Memory Network

## 4. Stochastic Forecasting

### Train and Test Set Creation