
# Making your first submission on Numerai

## Introduction 
This tutorial walks through creating your first submission on Numerai, from downloading the data to uploading predictions.

## Overview

1. Using this notebook
2. Download the datasets
3. Train your first model
4. Generate your first predictions
5. Make your first submission


---



## 1. Using this notebook 

This is an interactive notebook. You can run the code in each cell by pressing `shift+enter`; running cells requires you to log in with your Google account.

To make changes, first save your own copy via `File -> Save a copy in Drive`.

Let's start off by installing and importing our dependencies.

In [None]:
# install dependencies
!pip install pandas scikit-learn numerapi

Collecting numerapi
  Downloading https://files.pythonhosted.org/packages/e9/b0/4992d6de584c82297b6060260a21b94c05c37fb64438746057b612e10f07/numerapi-2.3.4-py3-none-any.whl
Installing collected packages: numerapi
Successfully installed numerapi-2.3.4


In [None]:
# import dependencies
import pandas as pd
import numerapi
import sklearn.linear_model

## 2. Download the datasets

### Datasets 
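*   `training_data` is used to train your model
*   `tournament_data` holds the rows you submit predictions for (`validation`, `test`, and `live`); its validation rows include targets, so they can also be used to evaluate your model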

### Column descriptions
*   id: a randomized id that corresponds to a stock 
*   era: a period of time
*   data_type: either `train`, `validation`, `test`, or `live` 
*   feature_*: abstract financial features of the stock 
*   target_kazutsugi: abstract measure of stock performance
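The `data_type` column is what separates the splits inside `tournament_data`. The sketch below uses a tiny made-up frame with the same column layout (every id, era, and value is invented) to show how to count and filter rows by split. That only the validation rows carry a target is an assumption of this toy example; check the real file, for instance with `tournament_data.groupby("data_type")["target_kazutsugi"].count()`.

```python
import pandas as pd

# Hypothetical miniature stand-in for tournament_data: same column layout
# (id, era, data_type, feature_*, target_kazutsugi) but invented values.
# Assumption for this toy frame: test and live rows have no target (NaN).
tournament_like = pd.DataFrame({
    "id": ["n001", "n002", "n003", "n004", "n005"],
    "era": ["era121", "era121", "era122", "era130", "era131"],
    "data_type": ["validation", "validation", "validation", "test", "live"],
    "feature_wisdom1": [0.25, 0.5, 1.0, 0.75, 0.0],
    "target_kazutsugi": [0.5, 0.75, 0.25, None, None],
})

# Count how many rows fall into each split.
print(tournament_like["data_type"].value_counts())

# Keep only the validation rows, which have a known target here.
validation = tournament_like[tournament_like["data_type"] == "validation"]
print(validation[["id", "era", "target_kazutsugi"]])
```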




In [None]:
# download the latest training dataset (takes around 30s)
training_data = pd.read_csv("https://numerai-public-datasets.s3-us-west-2.amazonaws.com/latest_numerai_training_data.csv.xz")
training_data.head()

Unnamed: 0,id,era,data_type,feature_intelligence1,feature_intelligence2,feature_intelligence3,feature_intelligence4,feature_intelligence5,feature_intelligence6,feature_intelligence7,feature_intelligence8,feature_intelligence9,feature_intelligence10,feature_intelligence11,feature_intelligence12,feature_charisma1,feature_charisma2,feature_charisma3,feature_charisma4,feature_charisma5,feature_charisma6,feature_charisma7,feature_charisma8,feature_charisma9,feature_charisma10,feature_charisma11,feature_charisma12,feature_charisma13,feature_charisma14,feature_charisma15,feature_charisma16,feature_charisma17,feature_charisma18,feature_charisma19,feature_charisma20,feature_charisma21,feature_charisma22,feature_charisma23,feature_charisma24,feature_charisma25,...,feature_wisdom8,feature_wisdom9,feature_wisdom10,feature_wisdom11,feature_wisdom12,feature_wisdom13,feature_wisdom14,feature_wisdom15,feature_wisdom16,feature_wisdom17,feature_wisdom18,feature_wisdom19,feature_wisdom20,feature_wisdom21,feature_wisdom22,feature_wisdom23,feature_wisdom24,feature_wisdom25,feature_wisdom26,feature_wisdom27,feature_wisdom28,feature_wisdom29,feature_wisdom30,feature_wisdom31,feature_wisdom32,feature_wisdom33,feature_wisdom34,feature_wisdom35,feature_wisdom36,feature_wisdom37,feature_wisdom38,feature_wisdom39,feature_wisdom40,feature_wisdom41,feature_wisdom42,feature_wisdom43,feature_wisdom44,feature_wisdom45,feature_wisdom46,target_kazutsugi
0,n000315175b67977,era1,train,0.0,0.5,0.25,0.0,0.5,0.25,0.25,0.25,0.75,0.75,0.25,0.25,1.0,0.75,0.5,1.0,0.5,0.0,0.5,0.5,0.0,0.0,0.0,1.0,0.25,0.0,0.5,0.25,0.75,0.5,1.0,0.75,0.75,0.5,0.5,0.75,0.5,...,0.75,0.75,0.75,0.5,1.0,1.0,0.5,0.75,0.5,0.25,0.25,0.75,0.5,1.0,0.5,0.75,0.75,0.25,0.5,1.0,0.75,0.5,0.5,1.0,0.25,0.5,0.5,0.5,0.75,1.0,1.0,1.0,0.75,0.5,0.75,0.5,1.0,0.5,0.75,0.75
1,n0014af834a96cdd,era1,train,0.0,0.0,0.0,0.25,0.5,0.0,0.0,0.25,0.5,0.5,0.0,0.5,0.0,0.5,0.5,0.5,0.5,0.5,0.25,0.25,0.5,0.0,1.0,0.5,0.5,0.5,0.75,0.5,0.5,0.75,0.25,0.5,0.75,0.5,0.25,0.75,0.5,...,0.25,0.25,0.25,1.0,1.0,0.5,0.5,0.5,0.0,0.25,1.0,0.5,1.0,1.0,0.5,0.5,0.5,1.0,0.25,0.75,1.0,0.25,0.25,1.0,0.5,0.5,0.5,0.75,0.75,0.75,1.0,1.0,0.0,0.0,0.75,0.25,0.0,0.25,1.0,0.25
2,n001c93979ac41d4,era1,train,0.25,0.5,0.25,0.25,1.0,0.75,0.75,0.25,0.0,0.25,0.5,1.0,0.5,0.75,0.5,0.5,1.0,0.5,0.5,0.5,0.25,0.0,0.25,0.75,0.75,0.75,0.5,0.75,0.5,0.25,0.5,0.75,0.25,0.5,0.5,0.75,0.5,...,0.25,1.0,1.0,1.0,0.5,1.0,1.0,1.0,0.5,1.0,0.0,1.0,1.0,0.5,1.0,0.75,1.0,0.0,0.5,0.75,0.0,1.0,0.5,0.5,0.75,1.0,0.75,1.0,0.25,0.5,0.25,0.5,0.0,0.0,0.5,1.0,0.0,0.25,0.75,0.0
3,n0034e4143f22a13,era1,train,1.0,0.0,0.0,0.5,0.5,0.25,0.25,0.75,0.25,0.5,0.5,0.5,0.75,0.5,1.0,0.5,0.5,0.0,1.0,0.0,0.75,0.0,0.5,0.5,0.5,0.5,0.0,0.5,0.5,0.75,0.75,0.5,0.25,0.5,0.5,0.5,0.5,...,1.0,1.0,0.75,0.75,1.0,0.75,0.75,0.75,1.0,0.75,1.0,0.75,1.0,0.75,1.0,0.0,0.5,0.75,1.0,0.75,1.0,0.75,1.0,1.0,0.0,0.5,0.75,0.75,1.0,0.75,1.0,1.0,0.75,0.75,1.0,1.0,0.75,1.0,1.0,0.0
4,n00679d1a636062f,era1,train,0.25,0.25,0.25,0.25,0.0,0.25,0.5,0.25,0.25,0.5,0.25,0.25,0.75,0.5,0.0,0.5,0.5,0.25,0.0,0.5,0.0,0.5,0.25,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.75,0.5,0.25,0.5,0.5,0.5,0.5,...,1.0,0.25,0.75,1.0,0.75,0.0,0.0,0.75,0.5,1.0,0.5,0.75,0.25,0.5,0.0,0.5,0.5,0.5,0.75,0.75,0.5,0.75,0.25,0.75,0.5,0.5,0.25,0.25,0.75,0.5,0.75,0.75,0.25,0.5,0.75,0.0,0.5,0.25,0.75,0.75
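The download cells pass a `.csv.xz` URL straight to `pd.read_csv`; pandas infers the xz (LZMA) compression from the file suffix because `compression="infer"` is the default. This small round trip, using a made-up frame and a temporary file, shows the same inference when writing and reading.

```python
import os
import tempfile

import pandas as pd

# Small made-up frame with a couple of Numerai-style columns.
df = pd.DataFrame({"id": ["n001", "n002"], "feature_wisdom1": [0.25, 0.75]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv.xz")
    # compression="infer" (the default) selects xz from the ".xz" suffix.
    df.to_csv(path, index=False)
    restored = pd.read_csv(path)
    # Read the first bytes to confirm the file really is xz-compressed.
    with open(path, "rb") as fh:
        magic = fh.read(6)

print(magic == b"\xfd7zXZ\x00")  # xz files start with these magic bytes
print(restored.equals(df))
```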


In [None]:
# download the latest tournament dataset (takes around 30s)
tournament_data = pd.read_csv("https://numerai-public-datasets.s3-us-west-2.amazonaws.com/latest_numerai_tournament_data.csv.xz")
tournament_data.head()

Unnamed: 0,id,era,data_type,feature_intelligence1,feature_intelligence2,feature_intelligence3,feature_intelligence4,feature_intelligence5,feature_intelligence6,feature_intelligence7,feature_intelligence8,feature_intelligence9,feature_intelligence10,feature_intelligence11,feature_intelligence12,feature_charisma1,feature_charisma2,feature_charisma3,feature_charisma4,feature_charisma5,feature_charisma6,feature_charisma7,feature_charisma8,feature_charisma9,feature_charisma10,feature_charisma11,feature_charisma12,feature_charisma13,feature_charisma14,feature_charisma15,feature_charisma16,feature_charisma17,feature_charisma18,feature_charisma19,feature_charisma20,feature_charisma21,feature_charisma22,feature_charisma23,feature_charisma24,feature_charisma25,...,feature_wisdom8,feature_wisdom9,feature_wisdom10,feature_wisdom11,feature_wisdom12,feature_wisdom13,feature_wisdom14,feature_wisdom15,feature_wisdom16,feature_wisdom17,feature_wisdom18,feature_wisdom19,feature_wisdom20,feature_wisdom21,feature_wisdom22,feature_wisdom23,feature_wisdom24,feature_wisdom25,feature_wisdom26,feature_wisdom27,feature_wisdom28,feature_wisdom29,feature_wisdom30,feature_wisdom31,feature_wisdom32,feature_wisdom33,feature_wisdom34,feature_wisdom35,feature_wisdom36,feature_wisdom37,feature_wisdom38,feature_wisdom39,feature_wisdom40,feature_wisdom41,feature_wisdom42,feature_wisdom43,feature_wisdom44,feature_wisdom45,feature_wisdom46,target_kazutsugi
0,n0003aa52cab36c2,era121,validation,0.25,0.75,0.5,0.5,0.0,0.75,0.5,0.25,0.5,0.5,0.25,0.0,0.25,0.5,0.25,0.0,0.25,1.0,1.0,0.25,1.0,1.0,0.25,0.25,0.0,0.5,0.25,0.75,0.0,0.5,0.25,0.25,0.25,0.5,0.0,0.5,1.0,...,0.0,0.0,0.25,0.5,0.25,0.25,0.0,0.25,0.0,0.25,0.5,0.5,0.5,0.5,0.0,0.25,0.75,0.25,0.25,0.5,0.25,0.0,0.25,0.5,0.25,0.5,0.25,0.25,1.0,0.75,0.75,0.75,1.0,0.75,0.5,0.5,1.0,0.0,0.0,0.0
1,n000920ed083903f,era121,validation,0.75,0.5,0.75,1.0,0.5,0.0,0.0,0.75,0.25,0.0,0.75,0.5,0.0,0.25,0.5,0.0,1.0,0.25,0.25,1.0,1.0,0.25,0.75,0.0,0.0,0.75,1.0,1.0,0.0,0.25,0.0,0.0,0.25,0.25,0.25,0.0,1.0,...,0.5,0.5,0.25,1.0,0.5,0.25,0.0,0.25,0.5,0.25,1.0,0.25,0.0,0.5,0.75,0.75,0.5,1.0,1.0,0.25,0.5,0.25,0.5,0.5,0.5,0.5,0.25,0.25,0.75,0.5,0.5,0.5,0.75,1.0,0.75,0.5,0.5,0.5,0.5,0.25
2,n0038e640522c4a6,era121,validation,1.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,0.5,0.5,1.0,1.0,1.0,0.75,0.5,0.5,1.0,1.0,0.5,0.5,0.0,1.0,0.5,1.0,0.5,1.0,0.5,1.0,0.25,1.0,1.0,1.0,0.5,1.0,1.0,0.75,1.0,...,0.25,0.5,0.0,0.0,0.0,0.25,0.25,0.0,0.5,0.0,0.0,0.0,0.25,0.0,0.25,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.75,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.5,0.25,0.0,0.0,0.5,0.5,0.0,1.0
3,n004ac94a87dc54b,era121,validation,0.75,1.0,1.0,0.5,0.0,0.0,0.0,0.5,0.75,1.0,0.75,0.0,0.5,0.0,0.5,0.75,0.5,0.75,0.25,0.75,0.25,0.75,0.25,0.75,1.0,0.5,0.5,0.75,0.5,1.0,0.5,0.25,0.75,0.25,0.75,0.25,0.75,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.75,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.25,0.75
4,n0052fe97ea0c05f,era121,validation,0.25,0.5,0.5,0.25,1.0,0.5,0.5,0.25,0.25,0.5,0.5,1.0,1.0,1.0,1.0,0.75,0.5,0.5,0.5,0.75,0.0,0.0,0.0,0.25,0.0,0.0,0.75,0.25,1.0,0.25,1.0,0.75,0.0,1.0,0.75,0.75,0.75,...,0.0,0.5,0.5,0.0,0.75,0.5,0.75,0.25,0.25,0.25,0.0,0.25,0.5,0.25,1.0,1.0,1.0,0.0,0.25,0.0,0.0,0.25,0.25,0.75,1.0,1.0,0.75,0.75,0.5,0.5,0.5,0.75,0.0,0.0,0.75,1.0,0.0,0.25,1.0,1.0


## 3. Train your first model
Let's create a basic model using sklearn's linear regression.

In [None]:
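# select the feature columns (this notebook uses only the `feature_wisdom` group; the prefix 'feature_' would select every feature)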
feature_cols = training_data.columns[training_data.columns.str.startswith('feature_wisdom')]
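`Index.str.startswith` returns a boolean mask over the column names, which is then used to index the columns. The toy frame below (an invented column list following the same naming scheme) contrasts the `feature_wisdom` prefix used here with the broader `feature_` prefix, which keeps every feature group while still excluding `id`, `era`, `data_type`, and the target.

```python
import pandas as pd

# Empty frame whose (invented) column names mimic the Numerai layout.
df = pd.DataFrame(columns=[
    "id", "era", "data_type",
    "feature_intelligence1", "feature_charisma1",
    "feature_wisdom1", "feature_wisdom2",
    "target_kazutsugi",
])

# Prefix used in this notebook: only the wisdom feature group.
wisdom_cols = df.columns[df.columns.str.startswith("feature_wisdom")]

# Broader prefix: all feature groups, still excluding id/era/data_type/target.
all_feature_cols = df.columns[df.columns.str.startswith("feature_")]

print(list(wisdom_cols))
print(list(all_feature_cols))
```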

In [None]:
print(feature_cols)

Index(['feature_wisdom1', 'feature_wisdom2', 'feature_wisdom3',
       'feature_wisdom4', 'feature_wisdom5', 'feature_wisdom6',
       'feature_wisdom7', 'feature_wisdom8', 'feature_wisdom9',
       'feature_wisdom10', 'feature_wisdom11', 'feature_wisdom12',
       'feature_wisdom13', 'feature_wisdom14', 'feature_wisdom15',
       'feature_wisdom16', 'feature_wisdom17', 'feature_wisdom18',
       'feature_wisdom19', 'feature_wisdom20', 'feature_wisdom21',
       'feature_wisdom22', 'feature_wisdom23', 'feature_wisdom24',
       'feature_wisdom25', 'feature_wisdom26', 'feature_wisdom27',
       'feature_wisdom28', 'feature_wisdom29', 'feature_wisdom30',
       'feature_wisdom31', 'feature_wisdom32', 'feature_wisdom33',
       'feature_wisdom34', 'feature_wisdom35', 'feature_wisdom36',
       'feature_wisdom37', 'feature_wisdom38', 'feature_wisdom39',
       'feature_wisdom40', 'feature_wisdom41', 'feature_wisdom42',
       'feature_wisdom43', 'feature_wisdom44', 'feature_wisdom45',
    

In [None]:
# select those columns out of the training dataset
training_features = training_data[feature_cols]

In [None]:
training_features.head()


Unnamed: 0,feature_wisdom1,feature_wisdom2,feature_wisdom3,feature_wisdom4,feature_wisdom5,feature_wisdom6,feature_wisdom7,feature_wisdom8,feature_wisdom9,feature_wisdom10,feature_wisdom11,feature_wisdom12,feature_wisdom13,feature_wisdom14,feature_wisdom15,feature_wisdom16,feature_wisdom17,feature_wisdom18,feature_wisdom19,feature_wisdom20,feature_wisdom21,feature_wisdom22,feature_wisdom23,feature_wisdom24,feature_wisdom25,feature_wisdom26,feature_wisdom27,feature_wisdom28,feature_wisdom29,feature_wisdom30,feature_wisdom31,feature_wisdom32,feature_wisdom33,feature_wisdom34,feature_wisdom35,feature_wisdom36,feature_wisdom37,feature_wisdom38,feature_wisdom39,feature_wisdom40,feature_wisdom41,feature_wisdom42,feature_wisdom43,feature_wisdom44,feature_wisdom45,feature_wisdom46
0,0.25,1.0,0.75,0.5,0.75,0.75,0.75,0.75,0.75,0.75,0.5,1.0,1.0,0.5,0.75,0.5,0.25,0.25,0.75,0.5,1.0,0.5,0.75,0.75,0.25,0.5,1.0,0.75,0.5,0.5,1.0,0.25,0.5,0.5,0.5,0.75,1.0,1.0,1.0,0.75,0.5,0.75,0.5,1.0,0.5,0.75
1,0.5,1.0,0.0,0.25,0.0,1.0,1.0,0.25,0.25,0.25,1.0,1.0,0.5,0.5,0.5,0.0,0.25,1.0,0.5,1.0,1.0,0.5,0.5,0.5,1.0,0.25,0.75,1.0,0.25,0.25,1.0,0.5,0.5,0.5,0.75,0.75,0.75,1.0,1.0,0.0,0.0,0.75,0.25,0.0,0.25,1.0
2,1.0,0.5,1.0,0.75,0.0,1.0,0.75,0.25,1.0,1.0,1.0,0.5,1.0,1.0,1.0,0.5,1.0,0.0,1.0,1.0,0.5,1.0,0.75,1.0,0.0,0.5,0.75,0.0,1.0,0.5,0.5,0.75,1.0,0.75,1.0,0.25,0.5,0.25,0.5,0.0,0.0,0.5,1.0,0.0,0.25,0.75
3,1.0,1.0,1.0,1.0,0.75,0.75,1.0,1.0,1.0,0.75,0.75,1.0,0.75,0.75,0.75,1.0,0.75,1.0,0.75,1.0,0.75,1.0,0.0,0.5,0.75,1.0,0.75,1.0,0.75,1.0,1.0,0.0,0.5,0.75,0.75,1.0,0.75,1.0,1.0,0.75,0.75,1.0,1.0,0.75,1.0,1.0
4,0.25,0.75,0.0,0.5,0.5,1.0,0.75,1.0,0.25,0.75,1.0,0.75,0.0,0.0,0.75,0.5,1.0,0.5,0.75,0.25,0.5,0.0,0.5,0.5,0.5,0.75,0.75,0.5,0.75,0.25,0.75,0.5,0.5,0.25,0.25,0.75,0.5,0.75,0.75,0.25,0.5,0.75,0.0,0.5,0.25,0.75


In [None]:
# create a model and fit the training data (~30 sec to run)
model = sklearn.linear_model.LinearRegression()
model.fit(training_features, training_data.target_kazutsugi)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
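A fitted `LinearRegression` is an intercept plus one coefficient per feature column, and `predict` returns `intercept_ + X @ coef_`. The sketch below checks this on synthetic data (random features drawn from {0, 0.25, 0.5, 0.75, 1}, like the Numerai features, and an invented linear target), so the coefficient values mean nothing outside this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for training_features: 200 rows, 3 features.
X = rng.choice([0.0, 0.25, 0.5, 0.75, 1.0], size=(200, 3))
# Invented target: a known linear function plus a little noise.
y = 0.5 + 0.1 * X[:, 0] - 0.05 * X[:, 1] + rng.normal(0, 0.01, size=200)

model = LinearRegression().fit(X, y)

# The model's prediction is just the intercept plus a weighted sum of features.
manual = model.intercept_ + X @ model.coef_
print(np.allclose(model.predict(X), manual))
print(model.coef_.round(2), round(model.intercept_, 2))
```

The recovered coefficients should land close to the 0.1, -0.05, and 0 used to generate the target, which is a quick way to see what the model learned.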

## 4. Generate your first predictions
Now that we have a trained model, we can use it to make predictions on the tournament data.



In [None]:
# select the feature columns from the tournament data
live_features = tournament_data[feature_cols]

In [None]:
# predict the target for every tournament row (validation, test and live)
predictions = model.predict(live_features)

In [None]:
# predictions must have an `id` column and a `prediction_kazutsugi` column
predictions_df = tournament_data["id"].to_frame()
predictions_df["prediction_kazutsugi"] = predictions
predictions_df.head()

Unnamed: 0,id,prediction_kazutsugi
0,n0003aa52cab36c2,0.505695
1,n000920ed083903f,0.509867
2,n0038e640522c4a6,0.503935
3,n004ac94a87dc54b,0.489139
4,n0052fe97ea0c05f,0.50162
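Because the validation rows of `tournament_data` come with `target_kazutsugi`, you can estimate how the predictions might score before uploading. The sketch below computes, within each era, the correlation between the target and percentile-ranked predictions, then summarises across eras. This is an approximation chosen for illustration, not necessarily Numerai's official scoring formula; the helper name `era_correlation` and all data are hypothetical.

```python
import numpy as np
import pandas as pd


def era_correlation(df: pd.DataFrame,
                    pred_col: str = "prediction_kazutsugi",
                    target_col: str = "target_kazutsugi") -> float:
    """Pearson correlation between the target and rank-transformed predictions."""
    ranked = df[pred_col].rank(pct=True, method="first")
    return np.corrcoef(df[target_col], ranked)[0, 1]


# Made-up validation rows: two eras with predictions and targets.
validation = pd.DataFrame({
    "era": ["era121"] * 4 + ["era122"] * 4,
    "prediction_kazutsugi": [0.49, 0.51, 0.50, 0.52, 0.48, 0.50, 0.53, 0.51],
    "target_kazutsugi": [0.25, 0.75, 0.5, 1.0, 0.5, 0.25, 0.75, 0.5],
})

# One correlation per era, then the mean and spread across eras.
per_era = (validation.groupby("era")[["prediction_kazutsugi", "target_kazutsugi"]]
           .apply(era_correlation))
print(per_era)
print("mean:", per_era.mean(), "std:", per_era.std())
```

On the real data you would take the `validation` rows of `tournament_data`, attach the model's predictions as a `prediction_kazutsugi` column, and apply the same groupby. Looking at the spread across eras, not just the mean, shows how consistent the model is over time.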


## 5. Make your first submission
To enter the tournament, we must submit the predictions back to Numerai. We will use the `numerapi` library to do this.
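Before uploading, a local check can catch a malformed file early. `check_submission` below is a hypothetical helper, not part of `numerapi`. It enforces the `id` and `prediction_kazutsugi` columns stated in the predictions cell, unique ids matching the tournament ids, and no missing values. The `[0, 1]` range check is an assumption about what the tournament accepts.

```python
import pandas as pd


def check_submission(predictions_df: pd.DataFrame, tournament_ids: pd.Series) -> None:
    """Raise ValueError if the submission frame looks malformed (hypothetical helper)."""
    expected_cols = ["id", "prediction_kazutsugi"]
    if list(predictions_df.columns) != expected_cols:
        raise ValueError(f"columns must be {expected_cols}, got {list(predictions_df.columns)}")
    if predictions_df["id"].duplicated().any():
        raise ValueError("duplicate ids in submission")
    if set(predictions_df["id"]) != set(tournament_ids):
        raise ValueError("submission ids do not match the tournament ids")
    preds = predictions_df["prediction_kazutsugi"]
    if preds.isna().any():
        raise ValueError("submission contains missing predictions")
    # Assumption: predictions are expected to lie in [0, 1].
    if not preds.between(0, 1).all():
        raise ValueError("predictions outside [0, 1]")


# Usage on a made-up two-row frame.
ids = pd.Series(["n001", "n002"])
good = pd.DataFrame({"id": ids, "prediction_kazutsugi": [0.49, 0.51]})
check_submission(good, ids)  # passes silently
```

In this notebook the call would be `check_submission(predictions_df, tournament_data["id"])`.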

In [None]:
# Get your API keys and model_id from https://numer.ai/submit
# (placeholders: paste your own values and never share a notebook that contains them)
public_id = "YOUR_PUBLIC_ID"
secret_key = "YOUR_SECRET_KEY"
model_id = "YOUR_MODEL_ID"
napi = numerapi.NumerAPI(public_id=public_id, secret_key=secret_key)
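Keys pasted into a notebook are easy to leak when the notebook is shared. One alternative is to read them from environment variables; the variable names and the `load_numerai_credentials` helper below are hypothetical choices, not something `numerapi` requires.

```python
import os


def load_numerai_credentials() -> tuple:
    """Read (public_id, secret_key, model_id) from environment variables.

    The variable names are this sketch's own convention.
    """
    names = ("NUMERAI_PUBLIC_ID", "NUMERAI_SECRET_KEY", "NUMERAI_MODEL_ID")
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return tuple(os.environ[name] for name in names)
```

You would then call `public_id, secret_key, model_id = load_numerai_credentials()` before constructing `numerapi.NumerAPI(public_id=public_id, secret_key=secret_key)`, so the notebook itself never contains the secret.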

In [None]:
# Upload your predictions
predictions_df.to_csv("predictions.csv", index=False)
submission_id = napi.upload_predictions("predictions.csv", model_id=model_id)

2020-09-27 07:16:47,794 INFO numerapi.base_api: uploading predictions...
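The `index=False` argument in the upload cell matters because `to_csv` otherwise writes the DataFrame index as an extra, unnamed first column, so the file would no longer contain only `id` and `prediction_kazutsugi`. The sketch compares both variants in memory, using the first two predictions from the preview above.

```python
import io

import pandas as pd

predictions_df = pd.DataFrame({
    "id": ["n0003aa52cab36c2", "n000920ed083903f"],
    "prediction_kazutsugi": [0.505695, 0.509867],
})

with_index = io.StringIO()
predictions_df.to_csv(with_index)  # default: the index is written as the first column
without_index = io.StringIO()
predictions_df.to_csv(without_index, index=False)

print(with_index.getvalue().splitlines()[0])     # ,id,prediction_kazutsugi
print(without_index.getvalue().splitlines()[0])  # id,prediction_kazutsugi

# Reading the indexed file back yields an extra column named "Unnamed: 0".
print(pd.read_csv(io.StringIO(with_index.getvalue())).columns[0])  # Unnamed: 0
```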


# Done 🚀
Good job! You just made your first submission on Numerai!

Head back over to https://numer.ai/submit to continue.