# Numerai Round 252

## Parameter Vector Model

Goals this round:
- Train the model with correlation, not average distance from target
- Use validation data before submission
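
The difference between the two scoring ideas in the first goal can be sketched on made-up numbers (the arrays below are illustrative, not Numerai data). A prediction that orders every row correctly but is squeezed toward 0.5 has a large average distance from the target, yet correlates with it perfectly:

```python
import numpy as np

# Illustrative values only, not Numerai data.
target = np.array([0.0, 0.25, 0.5, 0.75, 1.0])

# Same ordering as the target, but compressed toward 0.5.
pred = 0.1 * target + 0.45

# Average distance penalizes the compressed scale.
mean_abs_distance = float(np.mean(np.abs(pred - target)))

# Pearson correlation ignores shift and positive scaling.
correlation = float(np.corrcoef(pred, target)[0, 1])

print("mean absolute distance:", round(mean_abs_distance, 4))
print("correlation:", round(correlation, 4))
```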

Train the model to find a "golden" parameter vector with the same dimensionality as the feature vector, plus one extra dimension for a weight.

To predict the target, calculate the angle (cosine distance) and the relative magnitude between the parameter vector and the feature vector. Let the weight
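
As a minimal sketch of the two quantities named above, the helper below computes the cosine similarity (one minus the cosine distance) and the ratio of magnitudes for two equal-length vectors. The function name `angle_and_relative_magnitude` and the sample vectors are hypothetical, and how the weight would combine the two values is not specified here, so it is left out:

```python
import numpy as np

def angle_and_relative_magnitude(features, params):
    """Return (cosine similarity, |features| / |params|) for two equal-length vectors."""
    features = np.asarray(features, dtype=float)
    params = np.asarray(params, dtype=float)
    f_norm = np.linalg.norm(features)
    p_norm = np.linalg.norm(params)
    if f_norm == 0.0 or p_norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    cosine_similarity = float(np.dot(features, params) / (f_norm * p_norm))
    relative_magnitude = float(f_norm / p_norm)
    return cosine_similarity, relative_magnitude

# Hypothetical 3-feature row and parameter vector pointing in the same direction.
cos_sim, rel_mag = angle_and_relative_magnitude([0.25, 0.5, 1.0], [0.5, 1.0, 2.0])
print(cos_sim, rel_mag)
```

`scipy.spatial.distance.cosine` returns the cosine distance directly if that form is preferred.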

### 1. Prepare Notebook

#### Column descriptions
*   id: a randomized id that corresponds to a stock 
*   era: a period of time
*   data_type: either `train`, `validation`, `test`, or `live` 
*   feature_*: abstract financial features of the stock 
*   target: abstract measure of stock performance
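
The column layout above can be explored on a tiny made-up frame. This is a sketch only: the rows and the two feature names are illustrative, chosen to mirror the real column naming.

```python
import pandas as pd

# Tiny made-up frame with the same column layout (values are illustrative).
toy = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "era": ["era1", "era1", "era121", "eraX"],
    "data_type": ["train", "train", "validation", "live"],
    "feature_intelligence1": [0.0, 0.25, 0.5, 1.0],
    "feature_wisdom1": [1.0, 0.75, 0.5, 0.0],
    "target": [0.25, 0.5, 0.75, None],  # live rows have no target yet
})

# Feature columns share the "feature" prefix.
feature_cols = [c for c in toy.columns if c.startswith("feature")]

# Rows can be selected by their data_type label.
validation_rows = toy[toy["data_type"] == "validation"]

# Count how many rows each data_type contributes.
rows_per_type = toy["data_type"].value_counts().to_dict()
print(feature_cols, len(validation_rows), rows_per_type)
```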

##### Install Dependencies and Import Packages

In [1]:
# !pip install pandas scikit-learn numerapi
# !pip install tqdm npm ipywidgets
# !jupyter nbextension enable --py widgetsnbextension
# !jupyter labextension install @jupyter-widgets/jupyterlab-manager
# !pip install ipympl
# %matplotlib widget
# !jupyter labextension install jupyter-matplotlib
# !jupyter labextension install @jupyter-widgets/jupyterlab-manager jupyter-matplotlib
# !pip3 install papermill

# import dependencies
import pandas as pd
from scipy.spatial import distance
from os import path
from tqdm.notebook import tqdm, trange
import numpy
import math
import random as rand
import numerapi
import time
import sys
import matplotlib.pyplot as plt
from IPython.display import clear_output
from sklearn.ensemble import RandomForestClassifier
import papermill

##### General Functions

In [2]:
def clamp(my_value, min_value, max_value):
    return max(min(my_value, max_value), min_value)

def clamparray(array, min_val, max_val):
    for i in range(len(array)):
        array[i] = clamp(array[i], min_val, max_val)
    return array

def format_y(df): return (4 * df.target).astype(int)
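
For larger arrays, the element-by-element loop in `clamparray` could be swapped for NumPy's vectorized `numpy.clip`. The sketch below uses a hypothetical name, `clamparray_np`, and unlike the loop version it returns a new array rather than modifying its input:

```python
import numpy as np

def clamparray_np(array, min_val, max_val):
    """Vectorized counterpart of clamparray; returns a new array."""
    return np.clip(np.asarray(array, dtype=float), min_val, max_val)

# Values below 0 and above 1 are pulled back to the bounds.
clipped = clamparray_np([-0.5, 0.25, 1.5], 0.0, 1.0)
print(clipped)
```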

##### Download Latest Data

In [3]:
# download the latest training dataset (takes around 30s)
training_data = pd.read_csv("https://numerai-public-datasets.s3-us-west-2.amazonaws.com/latest_numerai_training_data.csv.xz")
# download the latest tournament dataset (takes around 30s)
tournament_data = pd.read_csv("https://numerai-public-datasets.s3-us-west-2.amazonaws.com/latest_numerai_tournament_data.csv.xz")
tournament_data.head()

| | id | era | data_type | feature_intelligence1 | feature_intelligence2 | feature_intelligence3 | feature_intelligence4 | feature_intelligence5 | feature_intelligence6 | feature_intelligence7 | ... | feature_wisdom38 | feature_wisdom39 | feature_wisdom40 | feature_wisdom41 | feature_wisdom42 | feature_wisdom43 | feature_wisdom44 | feature_wisdom45 | feature_wisdom46 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | n0003aa52cab36c2 | era121 | validation | 0.25 | 0.75 | 0.5 | 0.5 | 0.0 | 0.75 | 0.5 | ... | 0.75 | 0.75 | 1.0 | 0.75 | 0.5 | 0.5 | 1.0 | 0.0 | 0.0 | 0.25 |
| 1 | n000920ed083903f | era121 | validation | 0.75 | 0.5 | 0.75 | 1.0 | 0.5 | 0.0 | 0.0 | ... | 0.5 | 0.5 | 0.75 | 1.0 | 0.75 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| 2 | n0038e640522c4a6 | era121 | validation | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 0.0 | 0.0 | 0.5 | 0.25 | 0.0 | 0.0 | 0.5 | 0.5 | 0.0 | 1.0 |
| 3 | n004ac94a87dc54b | era121 | validation | 0.75 | 1.0 | 1.0 | 0.5 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.25 | 0.0 | 0.0 | 0.0 | 0.25 | 0.25 | 0.5 |
| 4 | n0052fe97ea0c05f | era121 | validation | 0.25 | 0.5 | 0.5 | 0.25 | 1.0 | 0.5 | 0.5 | ... | 0.5 | 0.75 | 0.0 | 0.0 | 0.75 | 1.0 | 0.0 | 0.25 | 1.0 | 0.75 |
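
Because each download takes around 30 seconds, a local cache can save time when the notebook is re-run. The sketch below is one way to do that, assuming a hypothetical helper name `load_cached_csv` and cache file names of your choosing. It relies only on `os.path.exists` and `pandas.read_csv`:

```python
from os import path
import pandas as pd

def load_cached_csv(source, cache_file):
    """Read cache_file if it exists; otherwise read source and save a local copy."""
    if path.exists(cache_file):
        return pd.read_csv(cache_file)
    df = pd.read_csv(source)          # source may be a URL or a local path
    df.to_csv(cache_file, index=False)
    return df
```

For example, passing the training data URL from the cell above as `source` and `"training_data.csv"` as `cache_file` would download once and read from disk afterwards. Delete the cache file at the start of each round so that stale data is not reused.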


### 2. Train the Model

In [4]:
# find only the feature columns
feature_cols = training_data.columns[training_data.columns.str.startswith('feature')]
training_features = training_data[feature_cols]

model = RandomForestClassifier()
X = training_features
y = format_y(training_data)
model.fit(X, y)

RandomForestClassifier()

### 3. Validate Model

In [5]:
# get rows from the tournament data that include answers (i.e., drop rows where target = NaN)
feature_cols = tournament_data.columns[tournament_data.columns.str.startswith('feature')]
validation_data = tournament_data.dropna(subset=["target"]).copy()
validation_features = validation_data[feature_cols]

# make predictions on validation data
validation_targets = format_y(validation_data)
validation_predictions = model.predict(validation_features)

# calculate the correlation and mean accuracy of predictions vs targets
corr = round(numpy.corrcoef(validation_targets, validation_predictions)[0, 1], 4)
print("Validation Correlation = " + str(corr))
acc = model.score(validation_features, validation_targets)
print("Validation mean accuracy = " + str(acc))

Validation Correlation = -0.0015


Validation mean accuracy = 0.49560528092089506
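
The cell above reports a single Pearson correlation over all validation rows. Another view, sketched here under assumptions, is to score each era separately with a rank (Spearman) correlation, which shows how consistent the model is across time periods. The helper name `per_era_rank_correlation`, the `prediction` column name, and the toy rows are hypothetical:

```python
import pandas as pd

def per_era_rank_correlation(df, pred_col="prediction", target_col="target", era_col="era"):
    """Spearman correlation per era, computed as the Pearson correlation of ranks."""
    scores = {}
    for era, group in df.groupby(era_col):
        scores[era] = group[pred_col].rank().corr(group[target_col].rank())
    return pd.Series(scores, name="correlation")

# Made-up rows: era1 is ordered correctly, era2 is ordered in reverse.
toy = pd.DataFrame({
    "era": ["era1"] * 3 + ["era2"] * 3,
    "prediction": [0.1, 0.2, 0.3, 0.3, 0.2, 0.1],
    "target": [0.0, 0.5, 1.0, 0.0, 0.5, 1.0],
})
era_scores = per_era_rank_correlation(toy)
print(era_scores)
print("mean per-era correlation:", era_scores.mean())
```

An era in which every prediction is identical has no defined rank correlation, so its score comes out as NaN.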


### 4. Generate Predictions

In [6]:
# select the feature columns from the tournament data
live_features = tournament_data[feature_cols]

# predict the target for every tournament row (validation, test, and live)
predictions = model.predict(live_features) / 4.0  # convert class labels 0..4 back to 0.25 increments

# predictions must have an `id` column and a `prediction_kazutsugi` column
predictions_df = tournament_data["id"].to_frame()
predictions_df["prediction_kazutsugi"] = predictions
predictions_df.head()

| | id | prediction_kazutsugi |
|---|---|---|
| 0 | n0003aa52cab36c2 | 0.5 |
| 1 | n000920ed083903f | 0.5 |
| 2 | n0038e640522c4a6 | 0.5 |
| 3 | n004ac94a87dc54b | 0.5 |
| 4 | n0052fe97ea0c05f | 0.5 |
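
Before uploading, a quick sanity check on the predictions frame can catch obvious mistakes. The checks below are a hypothetical sketch (the function name `check_submission` and the specific rules are assumptions, not Numerai's official validation):

```python
import pandas as pd

def check_submission(df, id_col="id", pred_col="prediction_kazutsugi"):
    """Return a list of problems found in a submission frame (empty means none found)."""
    problems = []
    for col in (id_col, pred_col):
        if col not in df.columns:
            problems.append(f"missing column: {col}")
    if problems:
        return problems
    if df[id_col].duplicated().any():
        problems.append("duplicate ids")
    if df[pred_col].isna().any():
        problems.append("NaN predictions")
    if not df[pred_col].dropna().between(0.0, 1.0).all():
        problems.append("predictions outside [0, 1]")
    return problems

# Made-up frame in the same shape as predictions_df.
example = pd.DataFrame({"id": ["a", "b"], "prediction_kazutsugi": [0.5, 0.75]})
print(check_submission(example))
```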


### 5. Submit Predictions

In [7]:
# Get API keys and model_id from https://numer.ai/submit
# Credentials are read from environment variables (the variable names are this notebook's choice)
# so that secrets are never stored in the notebook itself.
import os
public_id = os.environ["NUMERAI_PUBLIC_ID"]
secret_key = os.environ["NUMERAI_SECRET_KEY"]
model_id = os.environ["NUMERAI_MODEL_ID"]
napi = numerapi.NumerAPI(public_id=public_id, secret_key=secret_key)
predictions_df.to_csv("predictions.csv", index=False)
submission_id = napi.upload_predictions("predictions.csv", model_id=model_id)
print('submitted!')

2021-08-21 21:46:42,087 INFO numerapi.base_api: uploading predictions...


submitted!


##### Done!