<a href="https://colab.research.google.com/github/edenlum/Numerai/blob/main/making-your-first-submission-on-numerai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Making your first submission on Numerai

## Introduction 
This tutorial walks through creating and uploading your first submission on Numerai.

## Overview

1. Using this notebook
2. Download the datasets
3. Train your first model
4. Generate your first predictions
5. Make your first submission


---



## 1. Using this notebook 

This is an interactive notebook. You can execute the code in each cell by pressing `shift+enter`. This requires you to log in with your Google account.

To make changes, first save your own copy with `File -> Save a copy in Drive`.

Let's start off by installing and importing our dependencies.

In [1]:
# install dependencies (the PyPI package for sklearn is named scikit-learn)
!pip install pandas scikit-learn numerapi halo torch



In [1]:
# import dependencies
import pandas as pd
import numpy as np
import numerapi
import sklearn.linear_model
# `utils` and `download` are local helper modules from this repository, not pip packages
import utils
import download



## 2. Download the datasets

### Datasets 
*   `training_data` is used to train your model
*   `tournament_data` is used to evaluate your model

### Column descriptions
*   id: a randomized id that corresponds to a stock 
*   era: a period of time
*   data_type: either `train`, `validation`, `test`, or `live` 
*   feature_*: abstract financial features of the stock 
*   target: abstract measure of stock performance
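
A quick way to get a feel for these columns is to count rows per `data_type` and per `era`, and to pick out the feature columns by their prefix. The sketch below uses a small made-up frame (the values and the two feature names are illustrative assumptions, not real Numerai data) so it runs without downloading anything.

```python
import pandas as pd

# Toy frame that mimics the column layout described above (all values are made up).
toy = pd.DataFrame({
    "id": ["a1", "a2", "a3", "a4"],
    "era": ["era1", "era1", "era2", "era2"],
    "data_type": ["train", "train", "validation", "live"],
    "feature_intelligence1": [0.0, 0.5, 0.25, 1.0],
    "feature_wisdom1": [1.0, 0.75, 0.5, 0.0],
    "target": [0.5, 0.25, 0.75, 0.5],
})

# Feature columns share the `feature` prefix.
feature_cols = [c for c in toy.columns if c.startswith("feature")]

# How many rows belong to each data_type and to each era.
rows_per_type = toy["data_type"].value_counts().to_dict()
rows_per_era = toy.groupby("era").size().to_dict()

print(feature_cols)
print(rows_per_type)
print(rows_per_era)
```

The same three lines of analysis should work unchanged on `training_data` or `tournament_data` once they are loaded.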




In [2]:
# download the latest training dataset (takes around 30s)
training_data = download.download_data("https://numerai-public-datasets.s3-us-west-2.amazonaws.com/latest_numerai_training_data.csv.xz", "training_data")
training_data.head()

Data is up to date.
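
The `download` module is a local helper whose source is not shown in this notebook. As a rough idea of what such a helper might do, here is a minimal sketch that caches the CSV on disk and reuses it on later calls. The `cache_dir` parameter, the cache file name, and the caching behaviour are all assumptions; the real `download.download_data` may work differently.

```python
import os
import pandas as pd


def download_data(url, name, cache_dir="."):
    """Hypothetical stand-in for download.download_data: read a CSV once, then reuse a local copy."""
    cache_path = os.path.join(cache_dir, f"{name}.csv")
    if os.path.exists(cache_path):
        # A cached copy exists, so skip the download.
        print("Data is up to date.")
        return pd.read_csv(cache_path)
    # pandas infers the compression (for example .xz) from the file extension.
    df = pd.read_csv(url)
    df.to_csv(cache_path, index=False)
    return df
```

Because `pd.read_csv` accepts local paths as well as URLs, the sketch can be tried on any small CSV file before pointing it at the full dataset.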


Unnamed: 0,id,era,data_type,feature_intelligence1,feature_intelligence2,feature_intelligence3,feature_intelligence4,feature_intelligence5,feature_intelligence6,feature_intelligence7,...,feature_wisdom38,feature_wisdom39,feature_wisdom40,feature_wisdom41,feature_wisdom42,feature_wisdom43,feature_wisdom44,feature_wisdom45,feature_wisdom46,target
0,n000315175b67977,era1,train,0.0,0.5,0.25,0.0,0.5,0.25,0.25,...,1.0,1.0,0.75,0.5,0.75,0.5,1.0,0.5,0.75,0.5
1,n0014af834a96cdd,era1,train,0.0,0.0,0.0,0.25,0.5,0.0,0.0,...,1.0,1.0,0.0,0.0,0.75,0.25,0.0,0.25,1.0,0.25
2,n001c93979ac41d4,era1,train,0.25,0.5,0.25,0.25,1.0,0.75,0.75,...,0.25,0.5,0.0,0.0,0.5,1.0,0.0,0.25,0.75,0.25
3,n0034e4143f22a13,era1,train,1.0,0.0,0.0,0.5,0.5,0.25,0.25,...,1.0,1.0,0.75,0.75,1.0,1.0,0.75,1.0,1.0,0.25
4,n00679d1a636062f,era1,train,0.25,0.25,0.25,0.25,0.0,0.25,0.5,...,0.75,0.75,0.25,0.5,0.75,0.0,0.5,0.25,0.75,0.75




In [3]:
# download the latest tournament dataset (takes around 30s)
tournament_data = download.download_data("https://numerai-public-datasets.s3-us-west-2.amazonaws.com/latest_numerai_tournament_data.csv.xz", "tournament_data")
tournament_data.head()

Data is up to date.


Unnamed: 0,id,era,data_type,feature_intelligence1,feature_intelligence2,feature_intelligence3,feature_intelligence4,feature_intelligence5,feature_intelligence6,feature_intelligence7,...,feature_wisdom38,feature_wisdom39,feature_wisdom40,feature_wisdom41,feature_wisdom42,feature_wisdom43,feature_wisdom44,feature_wisdom45,feature_wisdom46,target
0,n0003aa52cab36c2,era121,validation,0.25,0.75,0.5,0.5,0.0,0.75,0.5,...,0.75,0.75,1.0,0.75,0.5,0.5,1.0,0.0,0.0,0.25
1,n000920ed083903f,era121,validation,0.75,0.5,0.75,1.0,0.5,0.0,0.0,...,0.5,0.5,0.75,1.0,0.75,0.5,0.5,0.5,0.5,0.5
2,n0038e640522c4a6,era121,validation,1.0,0.0,0.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.5,0.25,0.0,0.0,0.5,0.5,0.0,1.0
3,n004ac94a87dc54b,era121,validation,0.75,1.0,1.0,0.5,0.0,0.0,0.0,...,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.25,0.5
4,n0052fe97ea0c05f,era121,validation,0.25,0.5,0.5,0.25,1.0,0.5,0.5,...,0.5,0.75,0.0,0.0,0.75,1.0,0.0,0.25,1.0,0.75




## 3. Train your first model
Let's create a basic model using sklearn's linear regression, then train a small PyTorch network on the same split for comparison.

In [4]:
# find only the feature columns
feature_cols = training_data.columns[training_data.columns.str.startswith('feature')]



In [5]:
# select those columns out of the training dataset
training_features = training_data[feature_cols]



In [6]:
from sklearn.model_selection import train_test_split

x = training_features
y = training_data[['target']]

X_train, X_val, y_train, y_val = train_test_split(x, y, test_size=0.2, random_state=42)
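
The split above shuffles individual rows, so rows from the same era can land in both the training and the validation set. An alternative, which is not what this notebook does, is to hold out whole eras. The sketch below is one way to do that with pandas only; `split_by_era` is a hypothetical helper, and it assumes era labels look like `era1`, `era121`, as in the table previews above.

```python
import pandas as pd


def split_by_era(df, val_fraction=0.2):
    """Hypothetical helper: hold out the most recent eras as a validation set."""
    # Assumes labels of the form "era<number>".
    era_numbers = df["era"].str.replace("era", "", regex=False).astype(int)
    unique_eras = sorted(era_numbers.unique())
    n_val = max(1, int(round(len(unique_eras) * val_fraction)))
    val_eras = list(unique_eras[-n_val:])
    is_val = era_numbers.isin(val_eras)
    return df[~is_val], df[is_val]


# Toy data: 10 eras with 3 rows each (values are made up).
toy = pd.DataFrame({
    "era": [f"era{i}" for i in range(1, 11) for _ in range(3)],
    "feature_a": [0.25] * 30,
    "target": [0.5] * 30,
})
train_part, val_part = split_by_era(toy)
print(sorted(val_part["era"].unique()))
```

Whether an era-based split gives a more honest validation score here is a judgment call; the random split is kept for the rest of this tutorial.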



In [7]:
# create a model and fit the training data (~30 sec to run)
basic_model = sklearn.linear_model.LinearRegression()
basic_model.fit(X_train, y_train)

LinearRegression()



In [8]:
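import torch
import torch.nn as nn
import torch.utils.data as data_utils
# local helper module; expected to provide train, eval, x_y_to_dataloader and df_to_tensor
from models import *

class FeedForward1(nn.Module):
    """A single linear layer followed by a sigmoid, so outputs lie in (0, 1)."""
    def __init__(self, _in: int, _out: int):
        super().__init__()
        self.fc1 = nn.Linear(_in, _out)

    def forward(self, x):
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = torch.sigmoid(self.fc1(x))
        return x

# 310 inputs (one per feature column) and 1 output (the predicted target)
ff = FeedForward1(310, 1)
# 10 epochs, as the log below shows; 128 is presumably the batch size
train(ff, X_train, y_train, 10, 128)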

[1,  2000] loss: 0.051
[2,  2000] loss: 0.050
[3,  2000] loss: 0.050
[4,  2000] loss: 0.050
[5,  2000] loss: 0.050
[6,  2000] loss: 0.050
[7,  2000] loss: 0.050
[8,  2000] loss: 0.050
[9,  2000] loss: 0.050
[10,  2000] loss: 0.050
Finished Training
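
The helpers `train`, `x_y_to_dataloader` and `df_to_tensor` are used in this notebook without their source being shown. The following is a minimal sketch of what they might look like, assuming mean squared error loss and plain SGD. The loss, optimizer, learning rate, and return value are all assumptions, so this will not necessarily reproduce the log above.

```python
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.utils.data as data_utils


def df_to_tensor(df):
    """Convert a DataFrame (or array-like) to a float32 tensor."""
    return torch.tensor(np.asarray(df), dtype=torch.float32)


def x_y_to_dataloader(x, y, batch_size, shuffle=True):
    """Wrap features and targets in a batched DataLoader."""
    dataset = data_utils.TensorDataset(df_to_tensor(x), df_to_tensor(y))
    return data_utils.DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)


def train(model, x, y, epochs, batch_size, lr=0.01):
    """Hypothetical training loop: minimise MSE with SGD, return the mean loss per epoch."""
    loader = x_y_to_dataloader(x, y, batch_size)
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    history = []
    for epoch in range(epochs):
        running = 0.0
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
            running += loss.item()
        history.append(running / len(loader))
    return history


# Tiny made-up dataset so the sketch runs in well under a second.
torch.manual_seed(0)
toy_x = pd.DataFrame(np.random.default_rng(0).random((64, 4)))
toy_y = pd.DataFrame({"target": toy_x.mean(axis=1)})
toy_model = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())
losses = train(toy_model, toy_x, toy_y, epochs=5, batch_size=16)
print(losses)
```

Keeping the target as a one-column DataFrame, as in `y = training_data[['target']]`, gives tensors of shape `(batch, 1)` that match the network output without any reshaping.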


Save the model:

In [9]:
PATH = './my_model.pth'
torch.save(ff.state_dict(), PATH)



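Load the saved weights into a fresh network and check it on the validation split:

In [10]:
net = FeedForward1(310, 1)
net.load_state_dict(torch.load(PATH))

# `eval` here is the helper from `models`, which shadows Python's built-in `eval`
eval(net, X_val, y_val)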

Accuracy of the network on the 100384 test images: 50 %


In [11]:
loader = x_y_to_dataloader(X_val, y_val, 32)
len(loader)*32*5

501920



## 4. Generate your first predictions
Now that we have a trained model, we can use it to make predictions on the tournament data.



In [12]:
# select the feature columns from the tournament data
live_features = tournament_data[feature_cols]



In [13]:
# predict the target on the held-out validation features with both models
predictions2 = net(df_to_tensor(X_val))[:, 0].detach().numpy()  # PyTorch network
predictions = basic_model.predict(X_val)  # linear regression
# np.round(predictions*4)/4
predictions.shape

(100362, 1)



In [14]:
def get_corr(outputs, targets):
    df_outputs = pd.DataFrame(outputs)
    df_targets = pd.DataFrame(targets)
    ranked_outputs = df_outputs.rank(pct=True, method="first")
    corr = np.corrcoef(df_targets.iloc[:,0], ranked_outputs.iloc[:,0])[0, 1]
    return corr
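
Because `get_corr` ranks the outputs before correlating them with the targets, only the ordering of the predictions matters, not their scale. The sketch below repeats the function so it is self-contained and checks this on a few made-up numbers: rescaling the predictions with an increasing linear map leaves the score unchanged.

```python
import numpy as np
import pandas as pd


def get_corr(outputs, targets):
    """Pearson correlation between the targets and the percentile ranks of the outputs."""
    df_outputs = pd.DataFrame(outputs)
    df_targets = pd.DataFrame(targets)
    ranked_outputs = df_outputs.rank(pct=True, method="first")
    corr = np.corrcoef(df_targets.iloc[:, 0], ranked_outputs.iloc[:, 0])[0, 1]
    return corr


# Made-up targets and predictions, for illustration only.
targets = np.array([0.0, 0.25, 0.5, 0.75, 1.0, 0.5])
raw = np.array([0.48, 0.49, 0.52, 0.51, 0.60, 0.50])
rescaled = 10 * raw + 3  # same ordering, different scale

corr_raw = get_corr(raw, targets)
corr_rescaled = get_corr(rescaled, targets)
print(corr_raw, corr_rescaled)
```

This is why the sigmoid network and the linear regression can be compared directly with `get_corr` even though their raw outputs are on different scales.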




In [15]:
get_corr(predictions, y_val)

0.03519302770703773



In [16]:
get_corr(predictions2, y_val)

0.03339594389457344



In [18]:
for p in net.parameters():
    print(p)

Parameter containing:
tensor([[ 0.0073, -0.0274, -0.0409, -0.0144,  0.0218,  0.0088,  0.0026, -0.0057,
         -0.0256,  0.0065, -0.0170, -0.0120,  0.0211,  0.0264, -0.0047,  0.0071,
          0.0185,  0.0174,  0.0224,  0.0132,  0.0104,  0.0020,  0.0024,  0.0043,
          0.0124,  0.0044, -0.0149, -0.0159, -0.0137, -0.0027, -0.0130, -0.0106,
          0.0044,  0.0164, -0.0056, -0.0255, -0.0158,  0.0059, -0.0371,  0.0166,
         -0.0153,  0.0281,  0.0045, -0.0012, -0.0038, -0.0078,  0.0090, -0.0047,
          0.0287, -0.0262,  0.0035,  0.0018,  0.0072,  0.0152,  0.0114,  0.0056,
          0.0076,  0.0335,  0.0136,  0.0068,  0.0059, -0.0036, -0.0202,  0.0004,
         -0.0100,  0.0007,  0.0010, -0.0016,  0.0113,  0.0080, -0.0056, -0.0224,
          0.0099,  0.0021,  0.0155,  0.0102,  0.0037, -0.0232,  0.0018,  0.0267,
         -0.0224, -0.0076,  0.0169, -0.0220, -0.0065, -0.0050, -0.0092, -0.0115,
          0.0462,  0.0107, -0.0057, -0.0021,  0.0075, -0.0200, -0.0205, -0.0187,
      

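In [None]:
# predictions must have an `id` column and a `prediction` column
# predict on the tournament features so there is exactly one prediction per tournament row
predictions = basic_model.predict(live_features).ravel()
predictions_df = tournament_data["id"].to_frame()
predictions_df["prediction"] = predictions
predictions_df.head()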

Unnamed: 0,id,prediction
0,n0003aa52cab36c2,0.472981
1,n000920ed083903f,0.492854
2,n0038e640522c4a6,0.556868
3,n004ac94a87dc54b,0.496384
4,n0052fe97ea0c05f,0.497034


## 5. Make your first submission
To enter the tournament, we must submit the predictions back to Numerai. We will use the `numerapi` library to do this.
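
Before uploading, it can help to sanity-check the submission frame locally. The sketch below checks the two required columns mentioned earlier, plus a few conditions that are assumptions on my part rather than documented rules: unique ids, no missing predictions, and predictions within [0, 1]. `check_submission` is a hypothetical helper, not part of `numerapi`.

```python
import pandas as pd


def check_submission(df):
    """Hypothetical pre-upload check; returns a list of problems (empty if none were found)."""
    if list(df.columns) != ["id", "prediction"]:
        return [f"expected columns ['id', 'prediction'], got {list(df.columns)}"]
    problems = []
    if df["id"].duplicated().any():
        problems.append("duplicate ids")
    if df["prediction"].isna().any():
        problems.append("missing predictions")
    else:
        # Assumption: predictions are expected to lie in the range [0, 1].
        if not df["prediction"].between(0, 1).all():
            problems.append("predictions outside [0, 1]")
    return problems


# Made-up examples: one clean frame and one with two problems.
good = pd.DataFrame({"id": ["a", "b"], "prediction": [0.4, 0.6]})
bad = pd.DataFrame({"id": ["a", "a"], "prediction": [0.4, 1.5]})
print(check_submission(good))
print(check_submission(bad))
```

Calling `check_submission(predictions_df)` just before writing the CSV would catch a shape or range mistake before it reaches the API.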

In [12]:
# Get your API keys and model_id from https://numer.ai/notebook
# Paste your own values here and keep the secret key out of shared or public notebooks
public_id = "YOUR_PUBLIC_ID"
secret_key = "YOUR_SECRET_KEY"
model_id = "YOUR_MODEL_ID"
napi = numerapi.NumerAPI(public_id=public_id, secret_key=secret_key)

In [13]:
# Upload your predictions
predictions_df.to_csv("predictions.csv", index=False)
submission_id = napi.upload_predictions("predictions.csv", model_id=model_id)

2021-11-05 21:17:14,966 INFO numerapi.base_api: uploading predictions...


# Done 🚀
Good job! You just made your first submission on Numerai!

Head back over to https://numer.ai/notebook to continue.