<a href="https://colab.research.google.com/github/edenlum/Numerai/blob/main/making-your-first-submission-on-numerai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Making your first submission on Numerai

## Introduction 
This tutorial will go over how to create your first submission on Numerai.

## Overview

1. Using this notebook
2. Download the datasets
3. Train your first model
4. Generate your first predictions
5. Make your first submission


---



## 1. Using this notebook 

This is an interactive notebook. You can execute the code in each cell by pressing `shift+enter`; this requires you to log in with your Google account.

To keep your changes, first make your own copy via `File -> Save a copy in Drive`.

Let's start off by installing and importing our dependencies.

```python
# install dependencies (scikit-learn is the PyPI package name; the old "sklearn" alias no longer installs)
!pip install pandas scikit-learn numerapi halo torch
```



```python
# import dependencies
import numpy as np
import numerapi
import pandas as pd
import sklearn.linear_model

import utils  # helper module assumed to ship with this notebook's GitHub repository
```




## 2. Download the datasets

### Datasets 
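*   `training_data` is used to train your model
*   `tournament_data` holds the rows you generate predictions for; its `validation` rows also include targets, so you can use them to evaluate your model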

### Column descriptions
*   id: a randomized id that corresponds to a stock 
*   era: a period of time
*   data_type: either `train`, `validation`, `test`, or `live` 
*   feature_*: abstract financial features of the stock 
*   target: abstract measure of stock performance
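A minimal sketch of how these columns relate, on a tiny made-up DataFrame (the ids, eras, feature names, and values are invented; the real files have many more rows and feature columns). It picks out the feature columns by their `feature` prefix, the same way the training step does, and uses `data_type` to separate rows. In this toy frame the live rows have no target, mirroring live data whose outcome is not yet known.

```python
import pandas as pd

# Toy stand-in for the Numerai data: every value here is made up.
toy = pd.DataFrame({
    "id": ["n001", "n002", "n003", "n004", "n005", "n006"],
    "era": ["era1", "era1", "era2", "era2", "era3", "era3"],
    "data_type": ["train", "train", "validation", "validation", "live", "live"],
    "feature_a": [0.0, 0.25, 0.5, 0.75, 1.0, 0.5],
    "feature_b": [1.0, 0.75, 0.5, 0.25, 0.0, 0.5],
    "target": [0.25, 0.5, 0.5, 0.75, None, None],  # None becomes NaN: no target yet
})

# Feature columns are identified by their "feature" prefix.
feature_cols = toy.columns[toy.columns.str.startswith("feature")]

# data_type tells you how each row is meant to be used.
rows_per_type = toy["data_type"].value_counts()
live_rows = toy[toy["data_type"] == "live"]

print(list(feature_cols))
print(rows_per_type.to_dict())
print(live_rows["target"].isna().all())
```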




```python
# download the latest training dataset (takes around 30s)
training_data = pd.read_csv("https://numerai-public-datasets.s3-us-west-2.amazonaws.com/latest_numerai_training_data.csv.xz")
training_data.head()
```

```python
# download the latest tournament dataset (takes around 30s)
tournament_data = pd.read_csv("https://numerai-public-datasets.s3-us-west-2.amazonaws.com/latest_numerai_tournament_data.csv.xz")
tournament_data.head()
```

## 3. Train your first model
Let's create a basic model using sklearn's linear regression.

```python
# find only the feature columns
feature_cols = training_data.columns[training_data.columns.str.startswith('feature')]
```

```python
# select those columns out of the training dataset
training_features = training_data[feature_cols]
```

```python
from sklearn.model_selection import train_test_split

x = training_features
y = training_data[['target']]

# hold out 20% of the training rows (chosen at random) for validation
X_train, X_val, y_train, y_val = train_test_split(x, y, test_size=0.2, random_state=42)
```

```python
# create a model and fit it on all of the training data (~30 sec to run)
basic_model = sklearn.linear_model.LinearRegression()
basic_model.fit(training_features, training_data.target)
```
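Because `basic_model` is fit on all of `training_features`, including the rows held out in `X_val`, any score it gets on `X_val` is in-sample. A minimal sketch of the fit-on-train, score-on-validation pattern, using synthetic data instead of the Numerai download (the feature names, row count, and noise level are invented for the example):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for training_data: 1,000 rows, 5 made-up features,
# where only feature_0 carries signal.
rng = np.random.default_rng(0)
features = pd.DataFrame(rng.random((1000, 5)), columns=[f"feature_{i}" for i in range(5)])
target = 0.5 * features["feature_0"] + 0.1 * rng.standard_normal(1000)

# Hold out 20% of the rows, mirroring the split used in the notebook.
X_tr, X_va, y_tr, y_va = train_test_split(features, target, test_size=0.2, random_state=42)

# Fit only on the training portion so the validation score is out-of-sample.
model = LinearRegression()
model.fit(X_tr, y_tr)
val_r2 = model.score(X_va, y_va)  # coefficient of determination (R^2) on held-out rows

print(len(X_tr), len(X_va), round(val_r2, 3))
```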

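```python
import torch
import torch.utils.data as data_utils

# FeedForward, train, eval and x_y_to_dataloader come from this repository's models.py
from models import *

# a feed-forward network with one input per feature column and a single output
ff = FeedForward(len(feature_cols), 1)
# the last two arguments appear to be the number of epochs (2) and the batch size (128)
train(ff, X_train, y_train, 2, 128)
```

```
[1,  2000] loss: 0.052
[2,  2000] loss: 0.050
Finished Training
```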
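`FeedForward` and `train` come from this repository's `models.py`, which is not reproduced here. As a rough guide only, a module and training loop with the same call shape might look like the sketch below. Everything in it is an assumption rather than the repository's code: the hidden size, the ReLU activation, the Adam optimizer and learning rate, the MSE loss, and reading the last two arguments of `train` as epochs and batch size.

```python
import numpy as np
import pandas as pd
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class FeedForward(nn.Module):
    """Hypothetical stand-in: a small MLP mapping n_features inputs to n_outputs."""

    def __init__(self, n_features, n_outputs, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_outputs),
        )

    def forward(self, x):
        return self.net(x)


def train(model, X, y, epochs, batch_size, lr=1e-3):
    """Hypothetical training loop: MSE loss, Adam, shuffled mini-batches from DataFrames."""
    dataset = TensorDataset(
        torch.tensor(np.asarray(X), dtype=torch.float32),
        torch.tensor(np.asarray(y), dtype=torch.float32).reshape(-1, 1),
    )
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    epoch_losses = []
    for _ in range(epochs):
        total = 0.0
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            optimizer.step()
            total += loss.item() * len(xb)
        epoch_losses.append(total / len(dataset))  # mean training loss over the epoch
    return epoch_losses


# Toy usage on random data (not Numerai data): the target is simply the first column.
rng = np.random.default_rng(0)
X_toy = pd.DataFrame(rng.random((256, 10)))
y_toy = pd.DataFrame({"target": X_toy[0].to_numpy()})
torch.manual_seed(0)
losses = train(FeedForward(10, 1), X_toy, y_toy, epochs=5, batch_size=32)
print(losses)
```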


Save the model's weights:

```python
PATH = './my_model.pth'
# state_dict() holds only the learned parameters, not the model class itself
torch.save(ff.state_dict(), PATH)
```



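Reload the saved weights into a fresh network and evaluate it on the held-out split:

```python
# rebuild the same architecture, then load the saved weights into it
net = FeedForward(len(feature_cols), 1)
net.load_state_dict(torch.load(PATH))

# eval here is the helper imported from models.py, which shadows Python's built-in eval
eval(net, X_val, y_val)
```

```
Accuracy of the network on the 10000 test images: 50 %
```

The wording of this message appears to come from an image-classification template inside the `eval` helper; `X_val` holds rows of stock features, not images.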
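`torch.save(model.state_dict(), ...)` stores only the parameters, which is why the notebook rebuilds a `FeedForward` before calling `load_state_dict`. A minimal sketch of that round trip, with a plain `nn.Linear` standing in for `FeedForward` (an assumption, since `models.py` is not shown) and an in-memory buffer in place of `./my_model.pth`:

```python
import io

import torch
from torch import nn

torch.manual_seed(0)
original = nn.Linear(310, 1)  # stand-in for FeedForward(310, 1)

# Save only the parameters (the state_dict), not the Python object.
buffer = io.BytesIO()
torch.save(original.state_dict(), buffer)
buffer.seek(0)

# To load, first rebuild a model with the same architecture, then copy the weights in.
restored = nn.Linear(310, 1)
restored.load_state_dict(torch.load(buffer))
restored.eval()  # nn.Module.eval(): switch the module to inference mode

x = torch.rand(4, 310)
with torch.no_grad():
    same = torch.equal(original(x), restored(x))
print(same)
```

Loading into a module with a different shape (for example a different number of input features) makes `load_state_dict` raise an error, so the architecture used at load time has to match the one used at save time.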


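```python
# sanity check: validation batches x batch size (32) x 5 approximates the number of
# training rows, since X_val holds 20% of them
loader = x_y_to_dataloader(X_val, y_val, 32)
len(loader) * 32 * 5
```

```
501920
```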
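A small sketch of the arithmetic behind that check. It assumes that `x_y_to_dataloader` (from `models.py`, not shown) builds a standard `DataLoader` that keeps the last partial batch, so its length is rounded up, and that `train_test_split` rounds a fractional `test_size` up, as scikit-learn does. Under those assumptions the estimate can exceed the true row count by fewer than 5 × 32 = 160 rows. The function name `estimate_total_rows` is hypothetical.

```python
import math


def estimate_total_rows(n_total, test_size=0.2, batch_size=32):
    """Hypothetical helper reproducing the notebook's len(loader) * 32 * 5 estimate."""
    n_val = math.ceil(n_total * test_size)     # validation rows from train_test_split
    n_batches = math.ceil(n_val / batch_size)  # DataLoader length, last batch may be partial
    return n_batches * batch_size * round(1 / test_size)


# The estimate overshoots slightly because the last batch is counted as full.
for n in (1000, 1234):
    print(n, estimate_total_rows(n))
```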



## 4. Generate your first predictions
Now that we have a trained model, we can use it to make predictions on the tournament data.



```python
# select the feature columns from the tournament data
# (this covers the validation and test rows as well as the live rows)
live_features = tournament_data[feature_cols]
```



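```python
# predict the target for every row of the tournament data
predictions = basic_model.predict(live_features)
# optional experiment (disabled): snap predictions to the nearest multiple of 0.25
# predictions = np.round(predictions * 4) / 4
```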



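```python
def get_corr(outputs, targets):
    """Pearson correlation between the targets and the percentile ranks of the outputs."""
    df_outputs = pd.DataFrame(outputs)
    df_targets = pd.DataFrame(targets)
    # rank predictions into (0, 1]; ties are broken by order of appearance
    ranked_outputs = df_outputs.rank(pct=True, method="first")
    corr = np.corrcoef(df_targets.iloc[:, 0], ranked_outputs.iloc[:, 0])[0, 1]
    return corr
```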
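`get_corr` converts the predictions to percentile ranks before taking the Pearson correlation with the target, so only the ordering of the predictions matters. A self-contained check on made-up numbers, using a compact version of the function that behaves the same way for one-dimensional (or single-column) inputs:

```python
import numpy as np
import pandas as pd


def get_corr(outputs, targets):
    """Pearson correlation between the targets and the percentile ranks of the outputs."""
    ranked = pd.Series(np.asarray(outputs).ravel()).rank(pct=True, method="first")
    return np.corrcoef(np.asarray(targets).ravel(), ranked)[0, 1]


targets = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
preds = np.array([0.1, 0.2, 0.3, 0.4, 0.5])

print(get_corr(preds, targets))          # perfectly ordered predictions
print(get_corr(np.exp(preds), targets))  # monotone transform: same ranks, same score
print(get_corr(preds[::-1], targets))    # reversed ordering
```

Rescaling or otherwise monotonically transforming the predictions leaves the score unchanged; only reordering rows changes it.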




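```python
# score the predictions on the validation rows of the tournament data (live rows have no target)
validation_mask = (tournament_data["data_type"] == "validation").to_numpy()
get_corr(predictions[validation_mask], tournament_data.loc[validation_mask, "target"])
```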
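Since rows are grouped into eras, a single correlation over all rows is only a rough summary; it is common to compute the correlation separately within each era and look at the mean and spread. A minimal sketch on made-up data, where `per_era_corr` is a hypothetical helper and `get_corr` is repeated in compact form so the snippet runs on its own:

```python
import numpy as np
import pandas as pd


def get_corr(outputs, targets):
    """Pearson correlation between the targets and the percentile ranks of the outputs."""
    ranked = pd.Series(np.asarray(outputs).ravel()).rank(pct=True, method="first")
    return np.corrcoef(np.asarray(targets).ravel(), ranked)[0, 1]


def per_era_corr(df, pred_col="prediction", target_col="target", era_col="era"):
    """Hypothetical helper: get_corr computed separately within each era."""
    return df.groupby(era_col)[[pred_col, target_col]].apply(
        lambda g: get_corr(g[pred_col], g[target_col])
    )


# Made-up data: predictions are well ordered in era1 and reversed in era2.
toy = pd.DataFrame({
    "era": ["era1"] * 4 + ["era2"] * 4,
    "prediction": [0.1, 0.2, 0.3, 0.4, 0.4, 0.3, 0.2, 0.1],
    "target": [0.0, 0.25, 0.75, 1.0, 0.0, 0.25, 0.75, 1.0],
})
scores = per_era_corr(toy)
print(scores)
print(scores.mean(), scores.std())
```

In this toy case the pooled view hides that the model is strongly right in one era and strongly wrong in the other; the per-era scores make that visible.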

```python
# predictions must have an `id` column and a `prediction` column
predictions_df = tournament_data["id"].to_frame()
predictions_df["prediction"] = predictions
predictions_df.head()
```

|   | id | prediction |
|---|----|------------|
| 0 | n0003aa52cab36c2 | 0.472981 |
| 1 | n000920ed083903f | 0.492854 |
| 2 | n0038e640522c4a6 | 0.556868 |
| 3 | n004ac94a87dc54b | 0.496384 |
| 4 | n0052fe97ea0c05f | 0.497034 |
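Before uploading, it can help to check the frame locally. The helper below is hypothetical (it is not part of `numerapi`). It checks the two columns this notebook requires plus two basic sanity conditions; Numerai's own validation may apply further rules.

```python
import numpy as np
import pandas as pd


def check_submission(df):
    """Hypothetical pre-upload checks; returns a list of problems (empty if none found)."""
    problems = []
    if list(df.columns) != ["id", "prediction"]:
        problems.append(f"expected columns ['id', 'prediction'], got {list(df.columns)}")
        return problems  # the remaining checks need both columns
    if df["id"].duplicated().any():
        problems.append("duplicate ids")
    if df["prediction"].isna().any():
        problems.append("missing predictions")
    return problems


# Usage on a frame shaped like predictions_df (values copied from the output above).
good = pd.DataFrame({
    "id": ["n0003aa52cab36c2", "n000920ed083903f", "n0038e640522c4a6"],
    "prediction": [0.472981, 0.492854, 0.556868],
})
bad = good.assign(prediction=[0.472981, np.nan, 0.556868])
print(check_submission(good))
print(check_submission(bad))
```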


## 5. Make your first submission
To enter the tournament, we must submit the predictions back to Numerai. We will use the `numerapi` library to do this.

```python
# Get your API keys and model_id from https://numer.ai/notebook
# Keep these private: do not share them or commit them to a public repository.
public_id = "YOUR_PUBLIC_ID"
secret_key = "YOUR_SECRET_KEY"
model_id = "YOUR_MODEL_ID"
napi = numerapi.NumerAPI(public_id=public_id, secret_key=secret_key)
```
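Rather than pasting keys into the notebook, you can read them from environment variables. A minimal sketch; the helper `load_numerai_credentials` and the three variable names are this example's own choices, not something `numerapi` defines:

```python
import os


def load_numerai_credentials(env=os.environ):
    """Hypothetical helper: read Numerai credentials from environment variables."""
    names = ("NUMERAI_PUBLIC_ID", "NUMERAI_SECRET_KEY", "NUMERAI_MODEL_ID")
    missing = [name for name in names if name not in env]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return tuple(env[name] for name in names)


# Usage with a stand-in mapping instead of the real environment.
fake_env = {"NUMERAI_PUBLIC_ID": "PUBLIC", "NUMERAI_SECRET_KEY": "SECRET", "NUMERAI_MODEL_ID": "MODEL"}
public_id, secret_key, model_id = load_numerai_credentials(fake_env)
print(public_id, model_id)
```

With real values set in the environment, `load_numerai_credentials()` returns the same three strings, which can then be passed to `numerapi.NumerAPI(public_id=public_id, secret_key=secret_key)` exactly as in the cell above.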

```python
# Upload your predictions
predictions_df.to_csv("predictions.csv", index=False)
submission_id = napi.upload_predictions("predictions.csv", model_id=model_id)
```

```
2021-11-05 21:17:14,966 INFO numerapi.base_api: uploading predictions...
```


# Done 🚀
Good job! You just made your first submission on Numerai!

Head back over to https://numer.ai/notebook to continue.