# Simple Predictor for the Jane Street Market Prediction competition

Here I show how to build a very simple neural network model in PyTorch and train it on a GPU.

Pre-processing of data is done as described in [this notebook](https://www.kaggle.com/andreasthomasen/preprocessing-and-feature-selection). The main difference is that we only do PCA here and retain a lot of features. The reason is that we do not use RNNs, but instead only rely on instantaneous feature values. So this model can be trained with quite a lot of features included.

If you read this notebook from start to finish, you will learn how to
* Load data into pandas
* Do feature reduction using PCA
* Define a neural network model in PyTorch
* Train the model and save it using `torch.save` and pickle

Thanks for reading. If you like it, feel free to copy it; there is nothing revolutionary in this notebook.

UPDATE: With the training step included, this notebook took too long to run for submission, so it now saves the model at the end instead. You can load the saved model later in a private submission.
Enjoy!

In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

if torch.cuda.is_available():
    dev = torch.device("cuda")
else:
    dev = torch.device("cpu")

import pickle
    
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


/kaggle/input/jane-street-market-prediction/example_sample_submission.csv
/kaggle/input/jane-street-market-prediction/features.csv
/kaggle/input/jane-street-market-prediction/example_test.csv
/kaggle/input/jane-street-market-prediction/train.csv
/kaggle/input/jane-street-market-prediction/janestreet/competition.cpython-37m-x86_64-linux-gnu.so
/kaggle/input/jane-street-market-prediction/janestreet/__init__.py


# Load data and reduce dimensions

In [2]:
train = pd.read_csv('/kaggle/input/jane-street-market-prediction/train.csv')
batch_size = len(train)

We will train the network to predict the action for each of the five resp columns, where the action is 1 if the corresponding resp is positive and 0 otherwise. Let's create columns for these targets.

In [3]:
train['act'] = (train['resp'] > 0).astype('int')
train['act_1'] = (train['resp_1'] > 0).astype('int')
train['act_2'] = (train['resp_2'] > 0).astype('int')
train['act_3'] = (train['resp_3'] > 0).astype('int')
train['act_4'] = (train['resp_4'] > 0).astype('int')
target = torch.tensor(train[['act','act_1','act_2','act_3','act_4']].to_numpy(),dtype=torch.float,device=dev)
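
The five assignments above follow one pattern, so they can also be written as a loop. The following is a minimal sketch on a made-up two-column frame; `toy`, `resp_cols` and `act_cols` are hypothetical names used only for illustration.

```python
import pandas as pd

# Toy stand-in for train.csv (values are made up for illustration).
toy = pd.DataFrame({
    'resp':   [0.01, -0.02, 0.00],
    'resp_1': [-0.01, 0.03, 0.02],
})

resp_cols = ['resp', 'resp_1']
# 'resp' -> 'act', 'resp_1' -> 'act_1', mirroring the naming used above.
act_cols = [c.replace('resp', 'act') for c in resp_cols]

for r, a in zip(resp_cols, act_cols):
    # The action is 1 only for a strictly positive response.
    toy[a] = (toy[r] > 0).astype('int')

print(toy[act_cols])
```

Note that a response of exactly zero maps to action 0 under this rule.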

The tensor below will be used later. We store feature_0 in a separate tensor since it is the only integer-valued feature.

In [4]:
feature_0 = train['feature_0']
itensor = torch.tensor(((train.loc[:,'feature_0']+1)//2).to_numpy(),dtype=torch.long,device=dev)
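
The expression `(x+1)//2` turns feature_0 into a valid index for a two-row embedding table. Here is a small sketch of that mapping, assuming feature_0 only takes the values -1 and 1 (which is what the two-entry embedding used later implies); the array `x` is made up.

```python
import numpy as np

# Assumed value range of feature_0: only -1 and 1.
x = np.array([-1, 1, 1, -1])

# (-1 + 1) // 2 == 0 and (1 + 1) // 2 == 1, so the result is a 0/1 index.
idx = (x + 1) // 2

print(idx)
```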

We keep all other features in a separate DataFrame, which is converted to a tensor later.

In [5]:
feature_names = ['feature_'+str(i) for i in range(1,130)]
train = train[feature_names]

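Let's remove outliers. For each feature we find its most frequent value. If that value occurs more than 100 times and lies more than one standard deviation from the column mean, we treat it as a placeholder rather than a real measurement and replace it with NaN.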

In [6]:
maxindex = np.zeros((129,3))
for i in range(129):
    counts = train[feature_names[i]].value_counts()
    mean = train[feature_names[i]].mean()
    std = train[feature_names[i]].std()
    sigmas = np.abs(counts.index[0]-mean)/std
    maxindex[i] = [counts.index[0], counts.iloc[0], sigmas]
    
for i in range(129):
    if maxindex[i,1] > 100 and maxindex[i,2] > 1:
        # replace returns a new DataFrame, so the result has to be assigned back
        train = train.replace({feature_names[i]: maxindex[i,0]},np.nan)
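
To see what this rule does, here is a minimal sketch on a made-up column in which the value -5.0 is repeated 200 times. The data and the names `toy`, `top_value` and `top_count` are assumptions for illustration; the thresholds are the ones used in the cell above.

```python
import numpy as np
import pandas as pd

# Toy column: 1000 normal samples plus 200 copies of a suspicious value.
rng = np.random.default_rng(0)
col = np.concatenate([rng.normal(0.0, 1.0, size=1000), np.full(200, -5.0)])
toy = pd.DataFrame({'feature_1': col})

# Most frequent value, its count, and its distance from the mean in sigmas.
counts = toy['feature_1'].value_counts()
top_value, top_count = counts.index[0], counts.iloc[0]
sigmas = abs(top_value - toy['feature_1'].mean()) / toy['feature_1'].std()

# Same thresholds as above: more than 100 repeats, more than 1 sigma away.
if top_count > 100 and sigmas > 1:
    # DataFrame.replace is not in-place by default; assign the result back.
    toy = toy.replace({'feature_1': top_value}, np.nan)

n_missing = int(toy['feature_1'].isna().sum())
print(n_missing)
```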

Now we need to deal with NaN. We impute those missing values with the mean of each column.

In [7]:
fill_val=train.mean()
train = train.fillna(fill_val)
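
The column means are kept in `fill_val` so that the same values can be reused on unseen data. A minimal sketch of that idea follows, with made-up frames `train_toy` and `new_row` standing in for the training set and a later test row.

```python
import numpy as np
import pandas as pd

train_toy = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [np.nan, 4.0, 8.0]})

# DataFrame.mean skips NaN by default, so these are means of observed values.
fill_val_toy = train_toy.mean()
train_toy = train_toy.fillna(fill_val_toy)

# Hypothetical unseen row: reuse the training means instead of recomputing.
new_row = pd.DataFrame({'a': [np.nan], 'b': [5.0]})
new_row = new_row.fillna(fill_val_toy)

print(train_toy)
print(new_row)
```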

We remove features whose absolute correlation with feature_0 exceeds 0.7.

In [8]:
corr = train.corrwith(feature_0)
remove_names = corr.loc[np.abs(corr) > 0.7].index
train = train.drop(remove_names,axis=1)
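
Here is a small sketch of the same filter on synthetic data, where one column is nearly a scaled copy of feature_0 and the other is independent noise. The data and the names `f0`, `toy` and `remove` are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
f0 = pd.Series(rng.choice([-1, 1], size=500), name='feature_0')

toy = pd.DataFrame({
    # Nearly a scaled copy of feature_0, so the correlation is close to 1.
    'feature_1': f0 * 2.0 + rng.normal(0, 0.1, size=500),
    # Independent noise, so the correlation is close to 0.
    'feature_2': rng.normal(0, 1, size=500),
})

# Column-wise Pearson correlation with feature_0, then the 0.7 threshold.
corr = toy.corrwith(f0)
remove = corr.loc[np.abs(corr) > 0.7].index
toy = toy.drop(remove, axis=1)

print(list(toy.columns))
```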

We standardize the features, compute the principal components, and reduce the feature space using sklearn.

In [9]:
pca_components = 60
sc = StandardScaler().fit(train.to_numpy())
train = sc.transform(train.to_numpy())
pca = PCA(n_components = pca_components).fit(train)
train=pca.transform(train)
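
One way to judge a choice such as `pca_components = 60` is to look at how much variance the kept components retain. The sketch below does this on synthetic data built from five latent factors; the data and the names `X`, `sc_toy`, `pca_toy` and `retained` are assumptions for illustration, not results from the competition data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 1000 samples, 20 features generated from 5 latent factors plus small noise.
latent = rng.normal(size=(1000, 5))
X = latent @ rng.normal(size=(5, 20)) + 0.01 * rng.normal(size=(1000, 20))

# Same order as above: standardize first, then fit and apply PCA.
sc_toy = StandardScaler().fit(X)
X_std = sc_toy.transform(X)
pca_toy = PCA(n_components=5).fit(X_std)
X_red = pca_toy.transform(X_std)

# Fraction of the total variance kept by the retained components.
retained = pca_toy.explained_variance_ratio_.sum()
print(X_red.shape, retained)
```

On the real data, the same `explained_variance_ratio_` attribute of the fitted `pca` object gives the corresponding number.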

Finally, we convert the reduced features into the tensor we will train on.

In [10]:
train = torch.tensor(train,dtype=torch.float,device=dev)

# Model

We will make a very simple model at first using PyTorch. The idea is to have fully connected layers deal with all of the floating-point features, while feature_0 is fed to an embedding layer. The two outputs have the same size, so they are summed before the final linear layer.

In [11]:
e_size = 64
fc_input = pca_components
h_dims = [512,512,256,128]
dropout_rate = 0.5
epochs = 200
minibatch_size = 100000

class MarketPredictor(nn.Module):
    def __init__(self):
        super(MarketPredictor, self).__init__()
        
        self.e = nn.Embedding(2,e_size)
        self.deep = nn.Sequential(
            nn.Linear(fc_input,h_dims[0]),
            nn.BatchNorm1d(h_dims[0]),
            nn.LeakyReLU(),
            nn.Dropout(dropout_rate),
            nn.Linear(h_dims[0],h_dims[1]),
            nn.BatchNorm1d(h_dims[1]),
            nn.LeakyReLU(),
            nn.Dropout(dropout_rate),
            nn.Linear(h_dims[1],h_dims[2]),
            nn.BatchNorm1d(h_dims[2]),
            nn.LeakyReLU(),
            nn.Dropout(dropout_rate),
            nn.Linear(h_dims[2],h_dims[3]),
            nn.BatchNorm1d(h_dims[3]),
            nn.LeakyReLU(),
            nn.Dropout(dropout_rate),
            nn.Linear(h_dims[3],e_size),
            nn.BatchNorm1d(e_size),
            nn.LeakyReLU(),
            nn.Dropout(dropout_rate)
            )
        self.reduce = nn.utils.weight_norm(nn.Linear(e_size,5))
        
    def forward(self,xi,xf):
        e_out = self.e(xi)
        f_out = self.deep(xf)
        ef_out = self.reduce(e_out+f_out)
        
        return ef_out
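
A quick way to check the wiring is to push a dummy batch through the network and inspect the output shape. The sketch below uses a scaled-down stand-in, `TinyPredictor`, with hypothetical layer sizes and without dropout or weight normalization; it is not the model defined above, but it combines the embedding and the dense branch in the same way.

```python
import torch
import torch.nn as nn

class TinyPredictor(nn.Module):
    """Scaled-down stand-in for MarketPredictor (hypothetical sizes)."""
    def __init__(self, n_features=60, hidden=32, e_size=16, n_out=5):
        super().__init__()
        self.e = nn.Embedding(2, e_size)
        self.deep = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.BatchNorm1d(hidden),
            nn.LeakyReLU(),
            nn.Linear(hidden, e_size),
        )
        self.reduce = nn.Linear(e_size, n_out)

    def forward(self, xi, xf):
        # Both branches produce (batch, e_size), so they can be summed.
        return self.reduce(self.e(xi) + self.deep(xf))

model_toy = TinyPredictor()
xi = torch.randint(0, 2, (8,))   # integer indices in {0, 1}, like itensor
xf = torch.randn(8, 60)          # 60 float features, like the PCA output
out = model_toy(xi, xf)

print(out.shape)
```

The output has one logit per target column, which is what `BCEWithLogitsLoss` expects.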
        

Now we train it. Let's define the loss function first. In the competition we're told that the return on day $i$ is
\begin{equation}
p_i = \sum_j (\mathit{weight}_{ij}*\mathit{resp}_{ij}*\mathit{action}_{ij})
\end{equation}
This is essentially a problem of predicting the correct actions. The output of our model defines a probability distribution over the action, which is either $0$ or $1$. There are a few possible training strategies. One is to optimize the return directly, that is, to maximize $p_i$. However, it is more efficient to minimize the [cross-entropy](https://machinelearningmastery.com/cross-entropy-for-machine-learning/) between the probability distribution defined by our network and the target values in our data. During training we can then, if we wish, weight each sample by the magnitude of its resp and by its weight.

In [12]:
loss = torch.nn.BCEWithLogitsLoss().to(dev)
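
Two properties of this loss are worth seeing on a toy example. First, a model that outputs a logit of zero everywhere (probability 0.5) has a loss of $\ln 2 \approx 0.693$, which is the chance-level baseline for reading the training log. Second, per-sample weighting can be added by using `reduction='none'`. The weights `w` below are made up; in practice they might come from weight and the magnitude of resp, which is an assumption and not something done in this notebook.

```python
import math
import torch

# An uninformed model: logit 0 means probability 0.5 for every action.
logits = torch.zeros(4, 5)
targets = torch.randint(0, 2, (4, 5)).float()

bce = torch.nn.BCEWithLogitsLoss()
baseline = bce(logits, targets).item()   # ln 2, whatever the targets are

# Optional per-sample weighting with hypothetical weights, one per row.
w = torch.tensor([1.0, 0.0, 2.0, 0.5]).unsqueeze(1)
per_elem = torch.nn.BCEWithLogitsLoss(reduction='none')(logits, targets)
# Weighted mean over rows, plain mean over the 5 target columns.
weighted = (w * per_elem).sum() / (w.sum() * per_elem.shape[1])

print(baseline, math.log(2), weighted.item())
```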

Let's create the model on the selected device together with an Adam optimizer, and then run the training loop on the tensors prepared above.

In [13]:
model = MarketPredictor().to(dev)
opt = optim.Adam(model.parameters())

In [14]:
minibatches = batch_size//minibatch_size

for i in range(epochs):
    permutation = torch.randperm(batch_size)
    print('Epoch is',i,'/',epochs)
    for j in range(minibatches):
        opt.zero_grad()
        # indices of the samples in minibatch j
        idx = permutation[j*minibatch_size:(j+1)*minibatch_size]
        s = model(itensor[idx],train[idx])
        c = loss(s,target[idx])
        c.backward()
        opt.step()
    print('Loss is',c.item())

Epoch is 0 / 200
Loss is 0.703585147857666
Epoch is 1 / 200
Loss is 0.6952752470970154
Epoch is 2 / 200
Loss is 0.6931472420692444
Epoch is 3 / 200
Loss is 0.6919404864311218
Epoch is 4 / 200
Loss is 0.6913748979568481
Epoch is 5 / 200
Loss is 0.6912364959716797
Epoch is 6 / 200
Loss is 0.690486490726471
Epoch is 7 / 200
Loss is 0.6906060576438904
Epoch is 8 / 200
Loss is 0.6905030012130737
Epoch is 9 / 200
Loss is 0.6901842355728149
Epoch is 10 / 200
Loss is 0.689601480960846
Epoch is 11 / 200
Loss is 0.689899742603302
Epoch is 12 / 200
Loss is 0.6898735165596008
Epoch is 13 / 200
Loss is 0.6896629929542542
Epoch is 14 / 200
Loss is 0.6893044710159302
Epoch is 15 / 200
Loss is 0.6895025968551636
Epoch is 16 / 200
Loss is 0.6891094446182251
Epoch is 17 / 200
Loss is 0.6896559596061707
Epoch is 18 / 200
Loss is 0.6893314123153687
Epoch is 19 / 200
Loss is 0.6890257596969604
Epoch is 20 / 200
Loss is 0.6888669729232788
Epoch is 21 / 200
Loss is 0.6891071200370789
Epoch is 22 / 200
Loss i
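
The training loop counts minibatches with floor division, so the samples left over after the last full minibatch are skipped in each epoch (a different subset each time, because of the permutation). If every sample should be visited, one option is to split the permutation with `torch.split`, which keeps the remainder as a shorter final chunk. This is a sketch with toy sizes `n` and `mb`, not a change made in this notebook.

```python
import torch

n, mb = 10, 4                     # toy sizes: 10 samples, minibatches of 4
perm = torch.randperm(n)

# torch.split keeps the remainder as a shorter last chunk: sizes 4, 4, 2.
chunks = torch.split(perm, mb)
sizes = [len(c) for c in chunks]

# Every sample index appears exactly once across the chunks.
covered = sorted(torch.cat(chunks).tolist())
print(sizes, covered)
```

One caveat: `BatchNorm1d` in training mode raises an error on a chunk of size 1, so a very small final chunk may still need to be dropped.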

# Saving the model
It's pretty easy to save a PyTorch model. We save the state dict of the model with `torch.save`, which uses pickle internally.

In [15]:
path = 'marketpredictor_state_dict_'+str(epochs)+'epochs.pt'
torch.save(model.state_dict(),path)
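
Loading works in the opposite direction: build a model with the same architecture, then load the saved state dict into it. The sketch below uses a single `nn.Linear` layer as a stand-in for `MarketPredictor` and a temporary file; `net`, `restored` and `path_toy` are hypothetical names.

```python
import os
import tempfile
import torch
import torch.nn as nn

net = nn.Linear(3, 2)                       # stand-in for MarketPredictor
path_toy = os.path.join(tempfile.mkdtemp(), 'toy_state_dict.pt')
torch.save(net.state_dict(), path_toy)

# The receiving model must be constructed with the same architecture.
restored = nn.Linear(3, 2)
restored.load_state_dict(torch.load(path_toy, map_location='cpu'))
# eval() switches off dropout and makes BatchNorm use its running statistics.
restored.eval()

same = all(
    torch.equal(a, b)
    for a, b in zip(net.state_dict().values(), restored.state_dict().values())
)
print(same)
```

`map_location='cpu'` lets a model trained on a GPU be loaded on a machine without one.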

We will also need the standard scaler and PCA objects, as well as maxindex, fill_val and remove_names, when we preprocess the data for submission later.

In [16]:
with open('feature_processing.pkl','wb') as f:
    pickle.dump([sc,pca,maxindex,fill_val,remove_names],f)
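
At submission time the saved objects are loaded again and applied in the same order as during training. Below is a minimal round-trip sketch on made-up data covering only the imputation, scaling and PCA steps; all names ending in `_toy` or `2`, and the file name, are hypothetical.

```python
import os
import pickle
import tempfile
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
cols = ['feature_1', 'feature_2', 'feature_3']
toy = pd.DataFrame(rng.normal(size=(200, 3)), columns=cols)

# Fit the preprocessing objects on the toy training data.
fill_val_toy = toy.mean()
sc_toy = StandardScaler().fit(toy.to_numpy())
pca_toy = PCA(n_components=2).fit(sc_toy.transform(toy.to_numpy()))

path_toy = os.path.join(tempfile.mkdtemp(), 'toy_processing.pkl')
with open(path_toy, 'wb') as f:
    pickle.dump([sc_toy, pca_toy, fill_val_toy], f)

# Later, for example in a submission notebook: load and apply in order.
with open(path_toy, 'rb') as f:
    sc2, pca2, fill2 = pickle.load(f)

new_rows = pd.DataFrame([[0.1, np.nan, -0.3]], columns=cols)  # row with a gap
x = new_rows.fillna(fill2).to_numpy()       # impute with training means
x = pca2.transform(sc2.transform(x))        # scale, then project

print(x.shape)
```

Pickled sklearn objects should be loaded with the same sklearn version that created them.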