# In this notebook we try to predict the labels of the test set with our models:

In [1]:
import pandas as pd
import numpy as np
import lightgbm as lgb
import seaborn as sns
import pickle
import tensorflow as tf
from tensorflow import keras

# Importing Testing class:
from TestingPipeline.TestModel import *

## Loading Data:

In [2]:
test_data = pd.read_csv('Processed_data/X_test_preprocessed.csv')

In [3]:
test_traders = test_data['Trader']
X_test       = test_data.drop(columns = 'Trader')

In [4]:
X_test.head()

Unnamed: 0,OTR,OCR,OMR,10_p_time_two_events,med_time_two_events,25_p_time_two_events,75_p_time_two_events,90_p_time_two_events,max_time_two_events,min_lifetime_cancel,med_lifetime_cancel,90_p_lifetime_cancel,NbTradeVenueMic,MaxNbTradesBySecond,MeanNbTradesBySecond,mean_dt_TV1,NbSecondWithAtLeatOneTrade,Nber_shares_same_day
0,0.897059,0.014706,0.0,6.1e-05,0.00387,0.000196,0.340629,4.720414,751.59094,283.955,283.955,283.955,1,19,3.8125,514.483186,16,1
1,0.6,0.2,0.0,0.00197,0.005786,0.00197,0.008854,270.50568,270.50568,270.5223,270.5223,270.5223,1,3,3.0,0.004,1,1
2,0.918919,0.027027,0.0,0.0,0.000485,7e-05,2.182102,250.33145,272.0694,272.0694,272.0694,272.0694,1,13,5.666667,124.1945,6,1
3,0.444444,0.222222,0.0,0.0,0.004708,0.001081,230.00269,389.48483,389.48483,230.00269,230.00269,389.48483,1,3,2.0,0.261333,2,1
4,0.789474,0.052632,0.0,6.3e-05,0.000152,6.5e-05,0.339079,129.88925,549.25635,237.76761,237.76761,237.76761,1,6,3.0,4621.266333,5,1


## Testing on best gbm model:

In [5]:
clf = pickle.load(open("models/18_features_gridsearch.pkl", "rb"))

In [6]:
y_pred_test = clf.predict(X_test)

### Saving predictions in a csv:

In [7]:
testclf = TestModel(traders=test_traders, preds=y_pred_test, threshold=0, foldername="Predictions/Predictions_18_features_gridsearch2.csv")
testclf.CreatePredCSV()

Creating the Dataframe of predictions:
                 Trader  type
85299  The Magic Mirror    -1
85300  The Magic Mirror    -1
85301  The Magic Mirror    -1
85302  The Magic Mirror    -1
85303  The Magic Mirror    -1


Predicting value for each trader based on a majority vote:
              Trader  type
80           Monstro    -1
81           Morgana    -1
82      The Doorknob     1
83       The Doorman     1
84  The Magic Mirror    -1


Converting the predictions to string value:
              Trader type
80           Monstro  MIX
81           Morgana  MIX
82      The Doorknob  HFT
83       The Doorman  HFT
84  The Magic Mirror  MIX


Saving them to Predictions/Predictions_18_features_gridsearch2.csv


#### Shocasing the new labels of the flipped traders:

In [8]:
testclf.Fpreds[testclf.Fpreds['Trader'].str.contains("Armoire|Axe|Bookseller|Monstro|The Doorknob", na=False)]

Unnamed: 0,Trader,type
7,Armoire,MIX
9,Axe,MIX
23,Bookseller,HFT
80,Monstro,MIX
82,The Doorknob,HFT


In [9]:
#testclf.Ipreds.groupby('Trader').type.apply(lambda x: (x == 0).mean())

## Testing our gbm model on reduced features test data:

### Loading model:

In [9]:
clf2 = pickle.load(open("models/10_features_gridsearch.pkl", "rb"))

In [10]:
X_test_reduced = X_test.iloc[:, [0,1,2,9,17,12,7,8,10,11]]

In [11]:
y_pred_test2 = clf2.predict(X_test_reduced)
y_pred_test2

array([ 0,  0,  0, ..., -1, -1, -1])

### Saving predictions in a csv:

In [13]:
testclf = TestModel(traders=test_traders, preds=y_pred_test2, threshold=0, foldername="Predictions/Predictions_10_features_gridsearch.csv")
testclf.CreatePredCSV()

Creating the Dataframe of predictions:
                 Trader  type
85299  The Magic Mirror    -1
85300  The Magic Mirror    -1
85301  The Magic Mirror    -1
85302  The Magic Mirror    -1
85303  The Magic Mirror    -1


Predicting value for each trader based on a majority vote:
              Trader  type
80           Monstro     0
81           Morgana    -1
82      The Doorknob     1
83       The Doorman     1
84  The Magic Mirror    -1


Converting the predictions to string value:
              Trader     type
80           Monstro  NON HFT
81           Morgana      MIX
82      The Doorknob      HFT
83       The Doorman      HFT
84  The Magic Mirror      MIX


Saving them to Predictions/Predictions_10_features_gridsearch.csv


### Predictions of the best random forest model:

In [14]:
bestRF = pickle.load(open("models/best_3class_estimator.pkl", "rb"))



In [15]:
y_pred_test3 = bestRF.predict(X_test)

### Saving predictions in a csv:

In [16]:
testclf = TestModel(traders=test_traders, preds=y_pred_test3, threshold=0.85, foldername="Predictions/Predictions_randomforest_doubleflipped.csv")
testclf.CreatePredCSV()

Creating the Dataframe of predictions:
                 Trader  type
85299  The Magic Mirror    -1
85300  The Magic Mirror    -1
85301  The Magic Mirror    -1
85302  The Magic Mirror    -1
85303  The Magic Mirror    -1


Predicting value for each trader based on a majority vote:
Flipped Armoire
Flipped Axe
Flipped Bookseller
Flipped Dijon the Thief
Flipped Monstro
Flipped The Doorknob
              Trader  type
80           Monstro     1
81           Morgana    -1
82      The Doorknob    -1
83       The Doorman     1
84  The Magic Mirror    -1


Converting the predictions to string value:
              Trader type
80           Monstro  HFT
81           Morgana  MIX
82      The Doorknob  MIX
83       The Doorman  HFT
84  The Magic Mirror  MIX


Saving them to Predictions/Predictions_randomforest_doubleflipped.csv


#### Shocasing the new labels of the flipped traders:

In [17]:
testclf.Fpreds[testclf.Fpreds['Trader'].str.contains("Armoire|Axe|Bookseller|Dijon the Thief|Monstro|The Doorknob", na=False)]

Unnamed: 0,Trader,type
7,Armoire,HFT
9,Axe,HFT
23,Bookseller,MIX
38,Dijon the Thief,HFT
80,Monstro,HFT
82,The Doorknob,MIX


## Testing the MLP

In [18]:
loaded_model = keras.models.load_model("models/MLP_Batchnorm_2Hidden")

In [19]:
y_pred_test3 = loaded_model.predict(X_test)
y_pred_test3

array([[ 0.3257315 ,  1.1209224 ,  0.18258204],
       [ 0.16294184,  0.85154235,  0.7588287 ],
       [ 0.41316128,  0.9337405 ,  0.4397935 ],
       ...,
       [ 6.3699093 , -1.6792765 ,  0.60651857],
       [ 3.789208  , -5.281334  ,  1.0269186 ],
       [ 5.331189  , -5.0209804 , -0.17655973]], dtype=float32)

In [20]:
testmod = TestModel(traders=test_traders, preds=y_pred_test2, threshold=None, foldername="Predictions/Predictions_MLP_drop_batchnorm.csv")
testmod.CreatePredCSV()

Creating the Dataframe of predictions:
                 Trader  type
85299  The Magic Mirror    -1
85300  The Magic Mirror    -1
85301  The Magic Mirror    -1
85302  The Magic Mirror    -1
85303  The Magic Mirror    -1


Predicting value for each trader based on a majority vote:
              Trader  type
80           Monstro     0
81           Morgana    -1
82      The Doorknob     1
83       The Doorman     1
84  The Magic Mirror    -1


Converting the predictions to string value:
              Trader     type
80           Monstro  NON HFT
81           Morgana      MIX
82      The Doorknob      HFT
83       The Doorman      HFT
84  The Magic Mirror      MIX


Saving them to Predictions/Predictions_MLP_drop_batchnorm.csv
