# AC209 Project: Predicting the returns on Cryptocurrencies

by Ali Dastjerdi, Angelina Massa, Sachin Mathur & Nate Stein

### Project Goal

To predict the price return of one cryptocurrency based on the returns of other cryptocurrencies; market data, including equity indices and other benchmarks; and cryptocurrency news.

### Work Load

1. Creating basic data pipelines and evaluating different model possibilities.
2. Using NLP techniques to convert unstructured cryptocurrency news data into features that can be fed into the model.

### Cryptocurrency Scope

We focus on cryptocurrencies that have data going back to 2015 or earlier so we have more data points: btc, ltc, xrp, xlm, eth.

**SOUND FAIR?**

#### Earliest date data available
For the top 10 cryptocurrencies by market cap, the earliest dates for which closing price data is available are:

    btc	4/28/2013
    ltc	4/28/2013
    xrp	8/4/2013
    xlm	8/5/2014
    eth	8/7/2015
    neo	9/9/2016
    miota	6/13/2017
    eos	7/1/2017
    bch	7/23/2017
    ada	10/1/2017

In [1]:
import crypto_utils as crypu
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import sys

from crypto_utils import print_update

In [2]:
# Custom output options.

np.set_printoptions(precision=3, suppress=True)
pd.set_option('display.precision', 3)
sns.set_style('whitegrid')
sns.set_context('paper')
plt.rcParams['figure.figsize'] = 8, 5 # default 6x4
%matplotlib inline

# EDA

In [3]:
# Load daily rolling-returns matrix.

CRYPTO_SCOPE = ['btc', 'ltc', 'xrp', 'xlm', 'eth']

tdelta = pd.Timedelta(days=1)  # for rolling daily returns
end_date = pd.to_datetime('3/31/2018')
crypto_df = crypu.load_returns_matrix(CRYPTO_SCOPE, end_date=end_date,
                                      tdelta=tdelta, center=True, scale=True)

In [4]:
print('Data for individual currencies available as of:')
for crypto, min_date in crypu.crypto_min_dates.items():
    print('\t{0}: {1}'.format(crypto, min_date.strftime("%m/%d/%Y")))

crypto_df.head()

Data for individual currencies available as of:
	eth: 08/07/2015
	ltc: 04/28/2013
	btc: 04/28/2013
	xlm: 08/05/2014
	xrp: 08/04/2013


| date | btc | ltc | xrp | xlm | eth |
|---|---|---|---|---|---|
| 2015-08-08 | -1.707 | -1.426 | 0.326 | -0.14 | -9.351 |
| 2015-08-09 | 0.277 | 0.121 | 0.32 | -0.386 | -0.974 |
| 2015-08-10 | -0.157 | 0.118 | -0.148 | 0.416 | 0.011 |
| 2015-08-11 | 0.44 | 0.752 | -0.266 | -0.357 | 6.373 |
| 2015-08-12 | -0.46 | -0.725 | -0.461 | -0.161 | 1.673 |
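
The internals of `crypu.load_returns_matrix` are not shown here, so the following is only a minimal sketch of what a centered, scaled daily-returns matrix could look like, assuming `center=True` subtracts each column's mean and `scale=True` divides by its standard deviation. The prices and the names `close`, `returns` and `standardized` are hypothetical.

```python
import pandas as pd

# Hypothetical closing prices; the real data comes from crypto_utils.
close = pd.DataFrame(
    {'btc': [100.0, 110.0, 99.0, 108.9],
     'eth': [10.0, 10.5, 10.5, 11.55]},
    index=pd.date_range('2018-01-01', periods=4, freq='D'),
)

# Daily return: percentage change in close price from the previous day.
# The first day has no previous close, so it is dropped.
returns = close.pct_change().dropna()

# Assumed meaning of center/scale: subtract the column mean, then divide
# by the column's sample standard deviation.
standardized = (returns - returns.mean()) / returns.std()
print(standardized)
```

Under that assumption, the values in the table above would be in units of standard deviations rather than raw percentage changes.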


# Prepare model data

#### Definition of rolling returns

One cryptocurrency's return will be the $Y$ value, while the other currencies' returns will be the $X$ values.

In the example below, `btc` is the cryptocurrency whose price return we try to predict. One example row in our training data would therefore contain:
- Index: 3/30/2018
- $Y$: `btc`'s percentage change in `close` price from 3/29/2018 -> 3/30/2018.
- $X$: The other cryptocurrencies' percentage change in `close` price from 3/28/2018 -> 3/29/2018.

i.e., we want to use the rolling returns from the other cryptocurrencies on 3/29/2018 to predict the return of `btc` on the next day (3/30/2018).

For example, in the original frame the price return for `eth` on 10/3/2017 was $-0.329$. After the following code shifts the returns forward one day, the value for `eth` on 10/3/2017 becomes $-0.316$, which was the value for 10/2/2017 in the original frame.

In [5]:
y_currency = 'btc'  # crypto whose return we want to predict
x_currencies = [c for c in crypto_df.columns if c != y_currency]
crypto_df[x_currencies] = crypto_df[x_currencies].shift(periods=1, freq=tdelta)
crypto_df.dropna(axis=0, how='any', inplace=True)
crypto_df.head()

| date | btc | ltc | xrp | xlm | eth |
|---|---|---|---|---|---|
| 2015-08-09 | 0.277 | -1.426 | 0.326 | -0.14 | -9.351 |
| 2015-08-10 | -0.157 | 0.121 | 0.32 | -0.386 | -0.974 |
| 2015-08-11 | 0.44 | 0.118 | -0.148 | 0.416 | 0.011 |
| 2015-08-12 | -0.46 | 0.752 | -0.266 | -0.357 | 6.373 |
| 2015-08-13 | -0.31 | -0.725 | -0.461 | -0.161 | 1.673 |
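
To make the alignment explicit, here is a toy version of the same shift on made-up numbers (the frame `toy` and its values are hypothetical). It illustrates that `shift` with a `freq` argument moves the index forward rather than the values, and that assigning the result back aligns on the original dates.

```python
import pandas as pd

tdelta = pd.Timedelta(days=1)
toy = pd.DataFrame(
    {'btc': [0.1, 0.2, 0.3], 'eth': [1.0, 2.0, 3.0]},
    index=pd.date_range('2018-03-28', periods=3, freq='D'),
)

# With `freq`, the index moves forward one day and the values stay put,
# so the eth return of 3/28 is now labelled 3/29, and so on.
shifted = toy[['eth']].shift(periods=1, freq=tdelta)

# Assignment aligns on the original index: 3/28 gets NaN (no prior day),
# and the value that was shifted onto 3/31 is discarded.
toy['eth'] = shifted['eth']
toy = toy.dropna(axis=0, how='any')
print(toy)
```

Each remaining row pairs the `btc` return for a given day with the `eth` return from the day before, which is the layout the model expects.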


## Replacing cryptocurrency returns with PCA components

In [6]:
# Placeholder
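
Until this step is implemented, the following is one possible sketch of it, assuming scikit-learn's `PCA` is used and that enough components are kept to explain 95% of the variance. Both the 95% threshold and the synthetic matrix `X` are illustrative assumptions, not choices made in this project.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)

# Synthetic stand-in for the lagged predictor returns
# (rows = days, columns = currencies).
X = rng.normal(size=(200, 4))
# Make two columns almost identical, mimicking highly correlated currencies.
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)

# A float n_components keeps the smallest number of components whose
# cumulative explained variance reaches that fraction.
pca = PCA(n_components=0.95, svd_solver='full')
X_pca = pca.fit_transform(X)

print(X_pca.shape)
print(pca.explained_variance_ratio_)
```

In the actual pipeline, the PCA would presumably be fit on the training predictors only and then applied to the test predictors with `pca.transform`.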

## Extracting news features using NLP

In [None]:
# Placeholder

## Separate train/test

Run this step once all feature engineering is finished, i.e., once the design matrix is ready.

In [7]:
x_vars = x_currencies
train_cutoff = pd.to_datetime('6/1/2017')  # chosen arbitrarily

crypto_df_train = crypto_df[crypto_df.index <= train_cutoff]
crypto_df_test = crypto_df[crypto_df.index > train_cutoff]

X_train, X_test = crypto_df_train[x_vars], crypto_df_test[x_vars]
y_train, y_test = crypto_df_train[y_currency], crypto_df_test[y_currency]

# Build Model

## Establish Baseline Models

In [8]:
from sklearn.linear_model import LassoCV, Lasso
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import RidgeCV, Ridge

In [9]:
# Define hyperparam space for models where we will optimize hyperparams.
cv_alph_lasso = np.linspace(1e-3, 1e-1, 5)
cv_alph_ridge = range(10, 60, 10)
N_CV = 3  # k-folds over which to cross-validate

# Fit models / optimize hyperparams.
print_update('Fitting LR...')
lr = LinearRegression().fit(X_train, y_train)
print_update('Fitting Ridge...')
ridge = RidgeCV(alphas=cv_alph_ridge, cv=N_CV).fit(X_train, y_train)
print_update('Fitting Lasso...')
lasso = LassoCV(alphas=cv_alph_lasso, cv=N_CV).fit(X_train, y_train)
print_update('Finished fitting baseline models.')

Finished fitting baseline models.

In [10]:
# Evaluate model performance (R^2) on the train and test sets.

models = [(lr, 'LR'), (ridge, 'Ridge'), (lasso, 'Lasso')]
score_rows = []  # one dict per model
for (model, name) in models:
    print_update('Evaluating {}...'.format(name))
    train_score = model.score(X_train, y_train)
    test_score = model.score(X_test, y_test)
    score_rows.append({'model': name, 'train': train_score, 'test': test_score})
print_update('Finished evaluating baseline models.')

# Build the frame in one step; DataFrame.append was removed in pandas 2.0.
df_scores = pd.DataFrame(score_rows, columns=['model', 'train', 'test'])
df_scores.sort_values('test', ascending=False, inplace=True)
display(df_scores)

Finished evaluating models.      

| | model | train | test |
|---|---|---|---|
| 2 | Lasso | 0.0 | -0.0005211 |
| 1 | Ridge | 0.007 | -0.01857 |
| 0 | LR | 0.007 | -0.0207 |
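
The `train` and `test` columns come from `model.score`, which for scikit-learn regressors returns the coefficient of determination $R^2 = 1 - SS_{res} / SS_{tot}$. The sketch below recomputes it by hand on made-up numbers to show how to read the table: a score of 0 matches always predicting the mean of the evaluated data, and a negative score is worse than that. The helper `r_squared` and the arrays are hypothetical.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])         # made-up target values
mean_pred = np.full_like(y, y.mean())      # always predict the mean
bad_pred = np.array([4.0, 3.0, 2.0, 1.0])  # predictions in reversed order

print(r_squared(y, mean_pred))  # 0.0
print(r_squared(y, bad_pred))   # -3.0
```

Read this way, the Lasso train score of 0.0 would be consistent with all coefficients having been shrunk to zero, and the slightly negative test scores suggest that none of the baselines beats a constant prediction on the test period.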
