# Cryptolytic Arbitrage Model Evaluation and Selection

This notebook contains the code and analysis to select models with the best performance for the Cryptolytic project. You can find more information on data processing in this [notebook](https://github.com/Cryptolytic-app/cryptolytic/blob/master/modeling/arbitrage_data_processing.ipynb) and modeling in this [notebook](https://github.com/Cryptolytic-app/cryptolytic/blob/master/modeling/arbitrage_modeling.ipynb).

#### Background on Arbitrage Models
Arbitrage models were created to predict arbitrage 10 minutes before it happens in an active crypto market. The models are generated by taking every combination of two exchanges that support the same trading pair, engineering technical analysis features, merging the data on 'closing_time', engineering more features, and creating a target that signals an arbitrage opportunity. Each arbitrage signal predicted by a model has a direction indicating which way the arbitrage runs between the two exchanges. An arbitrage signal is only considered valid when the arbitrage lasts at least 30 minutes, because it takes time to move coins from one exchange to the other and complete the arbitrage trades.

The models predict whether there will be an arbitrage opportunity that starts 10 minutes after the prediction time and lasts for at least 30 minutes, giving a user enough time to execute trades.

More than 6,000 model iterations were generated in this notebook, and the best ones were selected for each possible arbitrage combination based on the model selection criteria outlined later in this section. The models were Random Forest classifiers, and the best model parameters varied for each dataset. The data was obtained from the respective exchanges via their APIs, and we did a 70/30 train/test split on 5-minute candlestick data that fell anywhere in the range from Jun 2015 to Oct 2019. A 2-week gap was left between the train and test sets to prevent data leakage. The models return 0 (no arbitrage), 1 (arbitrage from exchange 1 to exchange 2), or -1 (arbitrage from exchange 2 to exchange 1).
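
The chronological split with a gap can be sketched as follows. This is a minimal illustration rather than the notebook's exact code: the function name `split_with_gap`, the `train_frac` parameter, and the toy daily candles are assumptions made for the example. The notebook's own `tts` function, defined further down, picks its split row at 0.82 of the dataframe length before removing the 14-day gap.

```python
import pandas as pd

def split_with_gap(df, time_col="closing_time", train_frac=0.7,
                   gap=pd.Timedelta(days=14)):
    """Chronological split that leaves `gap` between train and test.

    Hypothetical helper for illustration only.
    """
    df = df.sort_values(time_col).reset_index(drop=True)
    # timestamp of the row that marks the start of the test period
    split_time = df[time_col].iloc[round(len(df) * train_frac)]
    # training rows end `gap` before the split; test rows start after it
    train = df[df[time_col] < split_time - gap]
    test = df[df[time_col] > split_time]
    return train, test

# Toy data: 100 daily candles (made up, not project data)
candles = pd.DataFrame({
    "closing_time": pd.date_range("2019-01-01", periods=100, freq="D"),
    "target": 0,
})
train, test = split_with_gap(candles)
print(len(train), len(test))
```

Rows that fall inside the gap are discarded, so the realized train/test proportions differ from `train_frac`.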

The profit calculation accounts for trading fees, as a real trade would. We used mean percent profit as the profitability metric. It represents the average percent profit per arbitrage trade if one were to act on every trade predicted by the model in the testing period, whether those predictions were correct or not.
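
As a rough worked example of the metric, the sketch below applies the same formula used by `get_profit` later in this notebook, including its flat 0.55% fee. The helper names `pct_profit` and `mean_pct_profit` and the example prices are made up for illustration.

```python
def pct_profit(buy_price, sell_price_later, fee_pct=0.55):
    """Percent profit of buying now on the cheaper exchange and selling
    later on the more expensive one, net of a flat percentage fee."""
    return (sell_price_later / buy_price - 1) * 100 - fee_pct

def mean_pct_profit(trades, fee_pct=0.55):
    """Average percent profit over every predicted trade, good or bad."""
    profits = [pct_profit(buy, sell, fee_pct) for buy, sell in trades]
    return sum(profits) / len(profits)

# Two hypothetical trades: one with a 2% spread, one with no spread at all
trades = [(100.0, 102.0), (100.0, 100.0)]
print(round(mean_pct_profit(trades), 2))
```

A prediction that turns out to have no spread still pays the fee, which is why incorrect predictions pull the mean down.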

#### Model Evaluation Criteria
- ROC AUC score
- Precision
- Recall
- F1 Score
- Status
- Profit
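
To make the precision, recall, and F1 criteria concrete for the three-class target (-1, 0, 1), here is a dependency-free sketch that computes them for one class at a time. The function `class_metrics` and the toy labels are assumptions for illustration; the notebook itself relies on scikit-learn's `classification_report`.

```python
def class_metrics(y_true, y_pred, label):
    """Precision, recall and F1 for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy labels (made up): 0 = no arbitrage, 1 and -1 = arbitrage directions
y_true = [0, 0, 1, 1, -1, 0]
y_pred = [0, 1, 1, 0, -1, 0]

for label in (-1, 1):
    print(label, class_metrics(y_true, y_pred, label))
```

Because the 0 class dominates the data, the metrics for the -1 and 1 classes are more informative than overall accuracy.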



#### Model Selection
From the 6000+ iterations of models trained, the best models were narrowed down based on the following criteria:
- How often the models predicted arbitrage when it didn't exist (False positives)
- How many times the models predicted arbitrage correctly (True positives)
- How profitable the model was in the real world over the period of the test set.
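
The narrowing step can be sketched as a threshold filter over the performance dataframe. The column names match `perf_df` as built later in this notebook, but the function `select_top_models`, every threshold value, and the three example rows are hypothetical; the actual thresholds are not stated here.

```python
import pandas as pd

def select_top_models(perf_df, min_correct_arb=25, min_precision=0.5,
                      min_recall=0.1, min_profit=0.2):
    """Keep models that clear every threshold, most profitable first.

    All default thresholds are placeholders, not the project's values.
    """
    mask = (
        (perf_df["correct_arb_preds"] >= min_correct_arb)
        & (perf_df["precision"] >= min_precision)
        & (perf_df["recall"] >= min_recall)
        & (perf_df["mean_pct_profit"] >= min_profit)
    )
    return perf_df[mask].sort_values("mean_pct_profit", ascending=False)

# Made-up rows for illustration only
perf = pd.DataFrame({
    "model_id": ["model_a", "model_b", "model_c"],
    "correct_arb_preds": [67, 1, 120],
    "precision": [0.8, 0.9, 0.6],
    "recall": [0.3, 0.01, 0.2],
    "mean_pct_profit": [1.2, -0.4, 2.1],
})
top = select_top_models(perf)
print(top["model_id"].tolist())
```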

#### Results and Discussion

For each of the models, show a dataframe of the logistic regression (LR) scores, default Random Forest (RF) scores, and hyperparameter-tuned RF scores.


There were 21 models that met the thresholds for the model selection criteria (details of these models can be found at the end of this notebook). The final models were all profitable, with gains ranging from 0.2% to 2.3% over their respective testing periods (note: the model with >9% mean percent profit was an outlier). Visualizations of how these models performed can be viewed at https://github.com/Lambda-School-Labs/cryptolytic-ds/blob/master/finalized_notebooks/visualization/arb_performance_visualization.ipynb


#### Directory Structure

```
├── cryptolytic/                        <-- The top-level directory for all arbitrage work
│   ├── modeling/                       <-- Directory for modeling work
│   │      ├──data/                     <-- Directory with subdirectories containing 5 min candle data
│   │      │   ├─ arb_data/             <-- Directory for csv files of arbitrage model training data
│   │      │   │   └── *.csv
│   │      │   │
│   │      │   ├─ csv_data/             <-- Directory for csv files after combining datasets and FE pt.2
│   │      │   │   └── *.csv
│   │      │   │
│   │      │   ├─ ta_data/              <-- Directory for csv files after FE pt.1 
│   │      │   │   └── *.csv
│   │      │   │
│   │      │   ├─ *.zip                 <-- ZIP files of all of the data
│   │      │   
│   │      ├──final_models/             <-- Directory for final models after model selection
│   │      │      └── *.pkl
│   │      │
│   │      ├──model_perf/               <-- Directory for performance csvs after training models
│   │      │      └── *.json
│   │      │
│   │      ├──models/                   <-- Directory for all pickle models
│   │      │      └── *.pkl
│   │      │
│   │      ├─arbitrage_data_processing.ipynb      <-- Notebook for data processing and creating csvs
│   │      │
│   │      ├─arbitrage_modeling.ipynb             <-- Notebook for baseline models and hyperparam tuning
│   │      │
│   │      ├─arbitrage_model_selection.ipynb      <-- Notebook for model selection
│   │      │
│   │      ├─arbitrage_model_evaluation.ipynb     <-- Notebook for final model evaluation
│   │      │
│   │      ├─environment.yml                      <-- yml file to create conda environment
│   │      │
│   │      ├─trade_recommender_models.ipynb       <-- Notebook for trade recommender models

```

## Imports

In [1]:
import glob
import os
import pickle
import json
import itertools
from zipfile import ZipFile
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np
import datetime as dt

from ta import add_all_ta_features

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import precision_score, recall_score, classification_report, roc_auc_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix

## Data and Models

All the arbitrage datasets that will be used in modeling

In [2]:
arb_data_paths = glob.glob('data/arb_data/*.csv')
print(len(arb_data_paths))

95


In [4]:
pd.read_csv(arb_data_paths[1], index_col=0).head()

|  | open_exchange_1 | high_exchange_1 | low_exchange_1 | close_exchange_1 | base_volume_exchange_1 | nan_ohlcv_exchange_1 | volume_adi_exchange_1 | volume_obv_exchange_1 | volume_cmf_exchange_1 | volume_fi_exchange_1 | ... | year | month | day | higher_closing_price | pct_higher | arbitrage_opportunity | window_length | arbitrage_opportunity_shift | window_length_shift | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0063 | 0.0064 | 0.0063 | 0.0064 | 25.0 | 0.0 | 26.29509 | 0.0 | 1.0 | 0.0 | ... | 2016 | 8 | 17 | 1 | 1.910828 | -1 | 5 | -1.0 | 40.0 | -1 |
| 1 | 0.0064 | 0.0064 | 0.0064 | 0.0064 | 5.0 | 0.0 | 25.0 | 0.0 | 0.833333 | 0.0 | ... | 2016 | 8 | 17 | 1 | 1.910828 | -1 | 10 | 0.0 | 5.0 | 0 |
| 2 | 0.0064 | 0.0064 | 0.0064 | 0.0064 | 0.0 | 1.0 | 0.0 | 0.0 | 0.833333 | -0.0 | ... | 2016 | 8 | 17 | 1 | 1.910828 | -1 | 15 | 0.0 | 10.0 | 0 |
| 3 | 0.0064 | 0.0064 | 0.0064 | 0.0064 | 0.0 | 1.0 | 0.0 | 0.0 | 0.833333 | -0.0 | ... | 2016 | 8 | 17 | 1 | 1.910828 | -1 | 20 | 0.0 | 15.0 | 0 |
| 4 | 0.0064 | 0.0064 | 0.0064 | 0.0064 | 0.0 | 1.0 | 0.0 | 0.0 | 0.833333 | 0.0 | ... | 2016 | 8 | 17 | 1 | 1.910828 | -1 | 25 | 0.0 | 20.0 | 0 |


In [5]:
pkls = glob.glob('models/*.pkl')
len(pkls)

3

## Modeling Functions

In [5]:
features = ['close_exchange_1','base_volume_exchange_1', 
            'nan_ohlcv_exchange_1','volume_adi_exchange_1', 'volume_obv_exchange_1',
            'volume_cmf_exchange_1', 'volume_fi_exchange_1','volume_em_exchange_1', 
            'volume_vpt_exchange_1','volume_nvi_exchange_1', 'volatility_atr_exchange_1',
            'volatility_bbhi_exchange_1','volatility_bbli_exchange_1', 
            'volatility_kchi_exchange_1', 'volatility_kcli_exchange_1',
            'volatility_dchi_exchange_1','volatility_dcli_exchange_1',
            'trend_macd_signal_exchange_1', 'trend_macd_diff_exchange_1', 
            'trend_adx_exchange_1', 'trend_adx_pos_exchange_1', 
            'trend_adx_neg_exchange_1', 'trend_vortex_ind_pos_exchange_1', 
            'trend_vortex_ind_neg_exchange_1', 'trend_vortex_diff_exchange_1', 
            'trend_trix_exchange_1', 'trend_mass_index_exchange_1', 
            'trend_cci_exchange_1', 'trend_dpo_exchange_1', 'trend_kst_sig_exchange_1',
            'trend_kst_diff_exchange_1', 'trend_aroon_up_exchange_1',
            'trend_aroon_down_exchange_1', 'trend_aroon_ind_exchange_1',
            'momentum_rsi_exchange_1', 'momentum_mfi_exchange_1',
            'momentum_tsi_exchange_1', 'momentum_uo_exchange_1',
            'momentum_stoch_signal_exchange_1', 'momentum_wr_exchange_1', 
            'momentum_ao_exchange_1', 'others_dr_exchange_1', 'close_exchange_2',
            'base_volume_exchange_2', 'nan_ohlcv_exchange_2',
            'volume_adi_exchange_2', 'volume_obv_exchange_2',
            'volume_cmf_exchange_2', 'volume_fi_exchange_2',
            'volume_em_exchange_2', 'volume_vpt_exchange_2',
            'volume_nvi_exchange_2', 'volatility_atr_exchange_2',
            'volatility_bbhi_exchange_2', 'volatility_bbli_exchange_2',
            'volatility_kchi_exchange_2', 'volatility_kcli_exchange_2',
            'volatility_dchi_exchange_2', 'volatility_dcli_exchange_2',
            'trend_macd_signal_exchange_2',
            'trend_macd_diff_exchange_2', 'trend_adx_exchange_2',
            'trend_adx_pos_exchange_2', 'trend_adx_neg_exchange_2',
            'trend_vortex_ind_pos_exchange_2',
            'trend_vortex_ind_neg_exchange_2',
            'trend_vortex_diff_exchange_2', 'trend_trix_exchange_2',
            'trend_mass_index_exchange_2', 'trend_cci_exchange_2',
            'trend_dpo_exchange_2', 'trend_kst_sig_exchange_2',
            'trend_kst_diff_exchange_2', 'trend_aroon_up_exchange_2',
            'trend_aroon_down_exchange_2',
            'trend_aroon_ind_exchange_2',
            'momentum_rsi_exchange_2', 'momentum_mfi_exchange_2',
            'momentum_tsi_exchange_2', 'momentum_uo_exchange_2',
            'momentum_stoch_signal_exchange_2',
            'momentum_wr_exchange_2', 'momentum_ao_exchange_2',
            'others_dr_exchange_2', 'year', 'month', 'day',
            'higher_closing_price', 'pct_higher', 
            'arbitrage_opportunity', 'window_length']

#### Functions for calculating profit

In [6]:
# specifying arbitrage window length to target, in minutes
interval = 30

def get_higher_closing_price(df):
    """
    Returns the exchange with the higher closing price
    """
    # exchange 1 has higher closing price
    if (df['close_exchange_1'] - df['close_exchange_2']) > 0:
        return 1
    
    # exchange 2 has higher closing price
    elif (df['close_exchange_1'] - df['close_exchange_2']) < 0:
        return 2
    
    # closing prices are equivalent
    else:
        return 0

def get_close_shift(df, interval=interval):
    """
    Shifts the closing prices by the selected interval +
    10 mins.
    
    Returns a df with new features:
    - close_exchange_1_shift
    - close_exchange_2_shift
    """
    
    rows_to_shift = int(-1*(interval/5))
    
    df['close_exchange_1_shift'] = df['close_exchange_1'].shift(
        rows_to_shift - 2)
    
    df['close_exchange_2_shift'] = df['close_exchange_2'].shift(
        rows_to_shift - 2)
    
    return df

def get_profit(df):
    """
    Calculates the percent profit of an arbitrage trade for a
    single row, net of fees.
    
    Returns the profit as a number; apply row-wise to build the
    pct_profit column.
    """
    
    # if exchange 1 has the higher closing price
    if df['higher_closing_price'] == 1:
        
        # return how much money you would make if you bought 
        # on exchange 2, sold on exchange 1, and took account 
        # of 0.55% fees
        return (((df['close_exchange_1_shift'] / 
                 df['close_exchange_2'])-1)*100)-.55
    
    # if exchange 2 has the higher closing price
    elif df['higher_closing_price'] == 2:
        
        # return how much money you would make if you bought 
        # on exchange 1, sold on exchange 2, and took account 
        # of 0.55% fees
        return (((df['close_exchange_2_shift'] / 
                 df['close_exchange_1'])-1)*100)-.55
    
    # if the closing prices are the same
    else:
        return 0 # no arbitrage

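def profit(X_test, y_preds):
    """
    Returns the mean and median percent profit over all rows
    where the model predicted arbitrage (pred != 0).
    """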
    # creating dataframe from test set to calculate profitability
    test_with_preds = X_test.copy()

    # add column with higher closing price
    test_with_preds['higher_closing_price'] = test_with_preds.apply(
            get_higher_closing_price, axis=1)

    # add column with shifted closing price
    test_with_preds = get_close_shift(test_with_preds)

    # adding column with predictions
    test_with_preds['pred'] = y_preds

    # adding column with profitability of predictions
    test_with_preds['pct_profit'] = test_with_preds.apply(
            get_profit, axis=1).shift(-2)

    # filtering out rows where no arbitrage is predicted
    test_with_preds = test_with_preds[test_with_preds['pred'] != 0]

    # calculating mean profit where arbitrage predicted...
    pct_profit_mean = round(test_with_preds['pct_profit'].mean(), 2)

    # calculating median profit where arbitrage predicted...
    pct_profit_median = round(test_with_preds['pct_profit'].median(), 2)
    
    return pct_profit_mean, pct_profit_median

#### Function for train/test split

In [14]:
def feat_n_params(pkl, filename):
    
    df = pd.read_csv(filename, index_col=0)
    
    if pkl.split('.')[0][-2:] in ['rf', 'lr']:
        params = {}
        features = df.drop(
            labels=['target', 'closing_time'], 
                axis=1).columns.to_list()
        print("no feat")

    else:
        params = pkl.split('/')[1].split('.')[0]
        params = params.split('_')[-3:]
        max_feat = params[0]
        max_depth = params[1]
        n_estimators = params[2]
        params = {
            'max_features': max_feat,
            'max_depth': max_depth,
            'n_estimators': n_estimators
        }
        features = df.drop(
            labels=['target', 'closing_time'], 
                axis=1).columns.to_list()
        # TODO: turn this into the .txt file
    return df, features, params

def tts(df, features):
    """
    Splits df chronologically into train and test sets, leaving a
    14-day gap before the test set to prevent data leakage.
    
    Returns:
        X_train
        X_test
        y_train
        y_test
    """
  
    # change 'closing_time' to datetime
    df['closing_time'] = pd.to_datetime(df['closing_time'])
    
    target = 'target'
    
    ## train test split
    tt_split_row = round(len(df)*.82)
    tt_split_time = df['closing_time'][tt_split_row]
    cutoff_time = tt_split_time - dt.timedelta(days=14)

    # train and test subsets
    train = df[df['closing_time'] < cutoff_time]
    test = df[df['closing_time'] > tt_split_time]

    # X, y matrix
    X_train = train[features]
    X_test = test[features]
    y_train = train[target]
    y_test = test[target]
    print(X_test.columns.to_list()[0])
    
    return X_train, X_test, y_train, y_test


def predictions(pkl, X_test, y_test):
    
    with open(pkl, 'rb') as f:
        model = pickle.load(f)

    # make predictions
    y_preds = model.predict(X_test)

    return y_preds

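def confusion_feat(y_test, y_preds):
    """
    Prints the confusion matrix and returns the number of
    correctly predicted arbitrage rows (classes -1 and 1).
    """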
    
    # labels for confusion matrix
    unique_y_test = y_test.unique().tolist()
    unique_y_preds = list(set(y_preds))
    labels = list(set(unique_y_test + unique_y_preds))
    labels.sort()
    columns = [f'Predicted {label}' for label in labels]
    index = [f'Actual {label}' for label in labels]

    # create confusion matrix
    conf_mat = pd.DataFrame(confusion_matrix(y_test, y_preds),
                             columns=columns, index=index)
    print(conf_mat, '\n')
    
    # Some models never predicted -1, some never predicted 1, and 
    # some never predicted either, meaning that they never predicted
    # arbitrage at all. Summing the diagonal cell of each arbitrage
    # label that is present covers every case, and label-based .loc
    # lookup avoids deprecated positional indexing on a labeled Series.
    correct_arb = 0
    for label in (-1, 1):
        if label in labels:
            correct_arb += conf_mat.loc[f'Actual {label}',
                                        f'Predicted {label}']
    
    return correct_arb
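
The filename parsing in `feat_n_params` and `performance_metrics` splits on `'/'` and keeps the hyperparameters as strings. A more portable variant could look like the sketch below. The helper name `parse_model_path` is hypothetical, and the sketch assumes the naming convention visible in the model ids of this notebook: four underscore-separated dataset tokens, followed by either `rf`/`lr` for default models or `max_features`, `max_depth`, `n_estimators` for tuned ones.

```python
import os

def parse_model_path(pkl_path):
    """Split a pickle path into (dataset name, model id, params dict).

    Hypothetical helper; assumes ids like
    '<ex1>_<ex2>_<base>_<quote>_<max_features>_<max_depth>_<n_estimators>'.
    """
    model_id = os.path.splitext(os.path.basename(pkl_path))[0]
    parts = model_id.split("_")
    dataset = "_".join(parts[:4])

    # default models carry no hyperparameters in their name
    if parts[-1] in ("rf", "lr"):
        return dataset, model_id, {}

    max_features, max_depth, n_estimators = parts[-3:]
    params = {
        "max_features": max_features,
        "max_depth": int(max_depth),         # cast so the value is numeric
        "n_estimators": int(n_estimators),
    }
    return dataset, model_id, params

print(parse_model_path("models/kraken_hitbtc_ltc_btc_auto_15_50.pkl"))
print(parse_model_path("models/kraken_bitfinex_ltc_btc_lr.pkl"))
```

Casting `max_depth` and `n_estimators` to integers matters if the parameters are later passed back into `RandomForestClassifier` for retraining.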

#### Function for performance metrics

In [15]:
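def performance_metrics(pkls, features):
    """
    Scores every pickled model in pkls on its own test set.
    
    Returns the performance dataframe, plus the predictions and
    true labels of the last model evaluated.
    """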
    # instantiate performance df
    columns = ['filename', 'model_id', 'parameters',
                'accuracy_score', 'mean_pct_profit',
                'precision', 'recall', 'f1_score',
                'support', 'correct_arb_preds']
    perf_df = pd.DataFrame(columns=columns)
    
    for pkl in pkls:
        
        # naming 
        file = '_'.join(pkl.split('/')[1].split('_')[:4])
        filepath = f'data/arb_data/{file}.csv'
        model_id = pkl.split('/')[1].split('.')[0]
        print('model_id:', model_id)
        
        # get features and parameters
        df, features, params = feat_n_params(pkl, filepath)
        
        # train/test split and predict
        X_train, X_test, y_train, y_test = tts(df, features)
        y_preds = predictions(pkl, X_test, y_test)
        
        # calculate stats
        pct_prof_mean, pct_prof_median = profit(X_test, y_preds)
        correct_arb_preds = confusion_feat(y_test, y_preds)
        cl_report = classification_report(y_test, y_preds, output_dict=True)
        print(cl_report)
        print(classification_report(y_test, y_preds))

        # append to perf_df
        perf_dict = {
            'filename': file,
            'model_id': model_id,
            'parameters': params,
            'accuracy_score': cl_report['accuracy'],
            'mean_pct_profit': pct_prof_mean,
            # NOTE: the four metrics below are placeholders and are
            # not yet populated from cl_report
            'precision': 0,
            'recall': 0,
            'f1_score': 0,
            'support': 0,
            'correct_arb_preds': correct_arb_preds
        }
        # DataFrame.append was removed in pandas 2.0; concat is equivalent
        perf_df = pd.concat([perf_df, pd.DataFrame([perf_dict])],
                            ignore_index=True)
        
    return perf_df, y_preds, y_test

arb_data_paths = glob.glob('data/arb_data/*.csv')
pkls = glob.glob('models/*.pkl')
perf_df, y_preds, y_test = performance_metrics(pkls, features)  

model_id: kraken_bitfinex_ltc_btc_lr
no feat
open_exchange_1
           Predicted -1  Predicted 0  Predicted 1
Actual -1             0            1            0
Actual 0              1         2484            0
Actual 1              0            6            0 

{'-1': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 1}, '0': {'precision': 0.9971898835808912, 'recall': 0.9995975855130784, 'f1-score': 0.9983922829581993, 'support': 2485}, '1': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'accuracy': 0.9967897271268058, 'macro avg': {'precision': 0.33239662786029706, 'recall': 0.33319919517102614, 'f1-score': 0.33279742765273307, 'support': 2492}, 'weighted avg': {'precision': 0.9943887884022933, 'recall': 0.9967897271268058, 'f1-score': 0.9955878102532605, 'support': 2492}}
              precision    recall  f1-score   support

          -1       0.00      0.00      0.00         1
           0       1.00      1.00      1.00      2485
           1       

In [17]:
perf_df

|  | filename | model_id | parameters | accuracy_score | mean_pct_profit | precision | recall | f1_score | support | correct_arb_preds |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | kraken_bitfinex_ltc_btc | kraken_bitfinex_ltc_btc_lr | {} | 0.99679 | -0.52 | 0 | 0 | 0 | 0 | 0 |
| 1 | cbpro_hitbtc_bch_btc | cbpro_hitbtc_bch_btc_lr | {} | 0.996246 |  | 0 | 0 | 0 | 0 | 0 |
| 2 | kraken_hitbtc_ltc_btc | kraken_hitbtc_ltc_btc_auto_15_50 | {'max_features': 'auto', 'max_depth': '15', 'n... | 0.998395 |  | 0 | 0 | 0 | 0 | 0 |
| 3 | kraken_hitbtc_ltc_btc | kraken_hitbtc_ltc_btc_rf | {} | 0.998395 |  | 0 | 0 | 0 | 0 | 0 |
| 4 | cbpro_bitfinex_bch_btc | cbpro_bitfinex_bch_btc_lr | {} | 0.989438 | -0.01 | 0 | 0 | 0 | 0 | 2 |
| 5 | cbpro_bitfinex_ltc_usd | cbpro_bitfinex_ltc_usd_lr | {} | 0.109485 | 0.06 | 0 | 0 | 0 | 0 | 6665 |
| 6 | cbpro_hitbtc_dash_btc | cbpro_hitbtc_dash_btc_lr | {} | 0.000402 | -0.4 | 0 | 0 | 0 | 0 | 1 |
| 7 | cbpro_bitfinex_bch_btc | cbpro_bitfinex_bch_btc_rf | {} | 0.998538 | 9.4 | 0 | 0 | 0 | 0 | 67 |
| 8 | cbpro_bitfinex_ltc_usd | cbpro_bitfinex_ltc_usd_rf | {} | 0.996847 | 3.09 | 0 | 0 | 0 | 0 | 6902 |
| 9 | cbpro_hitbtc_dash_btc | cbpro_hitbtc_dash_btc_rf | {} | 0.002012 | -0.4 | 0 | 0 | 0 | 0 | 1 |
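
High accuracy in this table does not imply useful arbitrage predictions, because the 0 class dominates the test sets. As a quick check, the sketch below compares the accuracy of `kraken_bitfinex_ltc_btc_lr` with a baseline that always predicts 0, using only the counts from the confusion matrix printed above for that model. The variable names are made up for the example.

```python
# Counts copied from the confusion matrix of kraken_bitfinex_ltc_btc_lr
support = {-1: 1, 0: 2485, 1: 6}   # actual rows per class
correct = {-1: 0, 0: 2484, 1: 0}   # diagonal of the confusion matrix

total = sum(support.values())
model_accuracy = sum(correct.values()) / total

# a baseline that always predicts the majority class (0, no arbitrage)
baseline_accuracy = max(support.values()) / total

print(f"model accuracy:    {model_accuracy:.5f}")
print(f"baseline accuracy: {baseline_accuracy:.5f}")
```

For this model the always-0 baseline scores slightly higher than the model itself, which is why `correct_arb_preds` and profit are more telling than `accuracy_score`.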


## Model Evaluation and Selection 

In [None]:
# note 1 
# the modeling function should have parameters called
# export_preds and export_model that can be set to true or 
# false (default false) so that we can use that later in the 
# evaluation notebook to actually export the preds and models

# note 2
# the modeling function should have a parameter called filename 
# that takes a filename for performance csv otherwise it'll overwrite
# the original when we retrain after model evaluation

# After hyperparameter tuning ...

# 1
# Get the best performing models
    # import performance df
    # filter for:
        # - minimum number of arb
        # - minimum precision
        # - minimum recall
        # - minimum profit
    # sort by profit
    # save as top_models_df

# 2
# function to retrain best models and export preds csv 
#     - uses filename, model_label, and params from filtered perf df
#     - for each row in df:
#         - pass that info into the original function to retrain
#             this part will happen in the modeling function by 
#                 setting export_preds and export_model to true:
#              - merge X_test, y_test, and y_preds into a df
#              - export preds csv into a new folder data/arb_preds_test_data/
#                  - this needs to have some kind of naming convention 
#                      w/ model type bc we need to train models for all 3 sets
#              - export model into models/


# 3
# function to duplicate arb csv's of best models into a new folder
#      - for row in df, move csv to arb_best_data/


# 4
# download from sagemaker:
#     - all models
#     - all good arb csv
#     - all arb preds csv
#     - performance csv


# 5
# function to create visualization (for only one model set, 1 viz):
#         - takes the base csv_name for that model set and finds the 
#             3 matching csvs in arb_preds_test_data
#         - creates visualization that has 4 lines (trading 10K):
#             - cumulative value if holding bitcoin in that time period
#             - cumulative value if trading on arbitrage preds from best model
#             - cumulative value if trading on arbitrage preds from rf default
#             - cumulative value if trading on arbitrage preds from lr default
#         - display the visualization
#         - export the visualization into assets/visualizations/
#         - doesn't need to return anything

        
# 6       
# function to create the viz for all model sets:
#         - iterate through each row in performance df 
#             - define base model
#         - call visualization function for that base model
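
The "cumulative value" lines planned in note 5 could be computed along the lines of the sketch below. It assumes that the whole balance is reinvested in every predicted trade, which the notes do not specify, and the function name `cumulative_value` and the example profits are hypothetical.

```python
from itertools import accumulate

def cumulative_value(pct_profits, start=10_000.0):
    """Balance after each trade when the full balance is reinvested.

    pct_profits: per-trade percent profits (e.g. a pct_profit column).
    Returns a list that begins with the starting balance.
    """
    growth = (1 + pct / 100 for pct in pct_profits)
    return list(accumulate(growth, lambda balance, g: balance * g,
                           initial=start))

# Two made-up trades: +1.0% and then -0.5%
print(cumulative_value([1.0, -0.5]))
```

Plotting one such series per model (best, RF default, LR default) next to a buy-and-hold series would give the four lines described in the notes.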