# Numerai Models
Spring 2024 - Statistical Arbitrage Team

These models focus on data provided by Numerai for its Numerai Tournament and not on the Signals Tournament.

[Numerai Docs](https://docs.numer.ai/)

By Ethan Nguyen-Tu


# Numerai Dictionary



- `Correlation, CORR`  -  a Numerai-specific variant of the Pearson correlation between the model's predictions and the target; calculated by the function `numerai_corr`
- `era`  -  a date; consecutive eras are exactly one week apart
- `features`  -  quantitative attributes describing a stock on a given date
- `id`  -  stock id
- `max drawdown`  -  a measure of a model's risk; in finance, the largest loss suffered by an investment strategy; computed in Numerai as the maximum peak-to-trough drop in a cumulative score
- `Meta Model Contribution, MMC`  -  a measure of how uniquely additive a model is to the Numerai Meta Model; calculated by the function `correlation_contribution`
- `mean`  -  the primary measure of a model's long-term performance
- `sharpe`  -  a measure of a model's consistency; computed in Numerai as the mean divided by the standard deviation
- `sharpe ratio`  -  in finance, a measure of an investment strategy's risk-adjusted returns
- `target`  -  a measure of a stock's future returns over the next 20 business days

# Basic Model and Data Setup

Note: Numerai's dataset is `obfuscated`, which means that the underlying stock ids, feature names, and target definitions are anonymized.

In [None]:
# Check Python Version
!python --version

Python 3.10.12


In [None]:
# Install dependencies
!pip install -q numerapi pandas pyarrow matplotlib lightgbm scikit-learn cloudpickle scipy==1.10.1

# Inline plots
%matplotlib inline

In [None]:
# Import necessary basic packages
import json
import pandas as pd
import numpy as np
from numerapi import NumerAPI

In [None]:
# Initialize Numerai API client
napi = NumerAPI()

# Numerai Dataset Version
DATA_VERSION = "v4.3"

# download the feature metadata file
napi.download_dataset(f"{DATA_VERSION}/features.json");

# download the training metadata file
napi.download_dataset(f"{DATA_VERSION}/train_int8.parquet");

# load the metadata
feature_metadata = json.load(open(f"{DATA_VERSION}/features.json"))

# Data Exploration

## Check the Versions and Files
Numerai offers a few different versions and files to work with.

[More Info](https://docs.numer.ai/numerai-tournament/data#versions)

In [None]:
all_datasets = napi.list_datasets()

### Version Info
Each minor version (e.g. v4 vs v4.1 vs v4.2) generally maintains backwards compatibility with the others, which makes it easy to plug the latest data into trained models. Major versions (e.g. v3 vs v4) may have large, breaking changes to the structure and/or contents of the datasets, so it's usually best to re-train models when a major version is released.

In [None]:
# Check available versions
dataset_versions = list(set(d.split('/')[0] for d in all_datasets))
print("Available versions:\n", dataset_versions)

Available versions:
 ['v4.3', 'v4.1', 'v4.2', 'v4']


### Features Info
The main file format of the Numerai data API is Parquet, which works well for large columnar data. Files with an `int8` suffix store values as integers (0, 1, 2, 3, 4) instead of floats (0.0, 0.25, 0.5, 0.75, 1.0). Loading and training on int8 data requires fewer resources than float32, which helps given the large size of Numerai's datasets.
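As a rough illustration of the int8 encoding described above, the sketch below maps integer bins back to the float grid by dividing by 4 and compares memory use. The column name `feature_a` and the random data are hypothetical; the only assumption taken from the text is the 0-4 to 0.0-1.0 correspondence.

```python
import numpy as np
import pandas as pd

# Hypothetical int8 feature column with values in {0, 1, 2, 3, 4}
rng = np.random.default_rng(0)
int8_col = pd.Series(rng.integers(0, 5, size=1_000_000), dtype="int8", name="feature_a")

# Map integer bins back to the float grid {0.0, 0.25, 0.5, 0.75, 1.0}
float_col = (int8_col / 4).astype("float32")

# int8 uses 1 byte per value, float32 uses 4 bytes per value
int8_mb = int8_col.memory_usage(index=False) / 1e6
float_mb = float_col.memory_usage(index=False) / 1e6
print(f"int8: {int8_mb:.1f} MB, float32: {float_mb:.1f} MB")
print(sorted(float_col.unique().tolist()))
```

The same 4x ratio applies per column, which is why the int8 files are preferred on memory-limited machines such as the Colab free tier.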

Common files Numerai will usually give out with every version:

- `features.json` contains metadata about the features and feature sets; this is critical when you have limited resources (more on this below)

- `train_int8.parquet` contains the historical data with features and targets

- `validation_int8.parquet` contains more historical data with features and targets

- `live_int8.parquet` contains the live features for the current round (no targets)

- `meta_model.parquet` contains the meta model predictions of past rounds

- `live_example_preds.parquet` contains the latest live predictions of the example model

- `validation_example_preds.parquet` contains the validation predictions of the example model

In [None]:
# DATA_VERSION is already specified in "Basic Model and Data Setup"
# DATA_VERSION = "v4.3" # Uncomment to change the version

# Print all files available for download for the specified version
current_version_files = [f for f in all_datasets if f.startswith(DATA_VERSION)]
print("available", DATA_VERSION, "files:\n", current_version_files)

available v4.3 files:
 ['v4.3/features.json', 'v4.3/live_benchmark_models.parquet', 'v4.3/live_example_preds.csv', 'v4.3/live_example_preds.parquet', 'v4.3/live_int8.parquet', 'v4.3/meta_model.parquet', 'v4.3/train_benchmark_models.parquet', 'v4.3/train_int8.parquet', 'v4.3/validation_benchmark_models.parquet', 'v4.3/validation_example_preds.csv', 'v4.3/validation_example_preds.parquet', 'v4.3/validation_int8.parquet']


## Check the Metadata
The `features.json` file contains metadata about features in the dataset including:
- statistics on each feature
- helpful sets of features
- the targets available for training

In [None]:
for metadata in feature_metadata:
  print(metadata, len(feature_metadata[metadata]))

feature_stats 2376
feature_sets 17
targets 41


## Check Numerai's Feature Sets

Starter sets Numerai offers:

- `small` contains a minimal subset of features that have the highest [feature importance](https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html)

- `medium` contains all the "basic" features, each unique in some way (e.g. P/E ratios vs analyst ratings)

- `all` contains all features in `medium` and their variants (e.g. P/E by country vs P/E by sector)

Note: `all` is too large for the Google Colab free tier.

In [None]:
# View just the starter sets
feature_sets = feature_metadata["feature_sets"]
sizes = ["small", "medium", "all"]

for feature_set in sizes:
  print(feature_set, len(feature_sets[feature_set]))

small 42
medium 705
all 2376


In [None]:
# View all sets
for feature in feature_sets:
  print(feature, len(feature_sets[feature]))

small 42
medium 705
all 2376
v2_equivalent_features 304
v3_equivalent_features 1000
fncv3_features 400
intelligence 35
charisma 290
strength 135
dexterity 51
constitution 335
wisdom 140
agility 145
serenity 95
sunshine 325
rain 666
midnight 244


## Check Numerai's Feature Groups
The analysis below uses 8 feature groups: `intelligence`, `wisdom`, `charisma`, `dexterity`, `strength`, `constitution`, `agility`, and `serenity` (the `v4.3` metadata above also lists `sunshine`, `rain`, and `midnight`, which are not included here). Each group contains a different type of feature. For example, all technical signals would be in one group, while all analyst predictions and ratings would be in another.

In [None]:
groups = [
  "intelligence",
  "wisdom",
  "charisma",
  "dexterity",
  "strength",
  "constitution",
  "agility",
  "serenity",
  "all"
]

# compile the intersections of feature sets and feature groups
subgroups = {}
for size in sizes:
    subgroups[size] = {}
    for group in groups:
        subgroups[size][group] = (
            set(feature_sets[size])
            .intersection(set(feature_sets[group]))
        )

# convert to data frame and display the feature count of each intersection
pd.DataFrame(subgroups).applymap(len).sort_values(by="all", ascending=False)

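| group | small | medium | all |
|---|---|---|---|
| all | 42 | 705 | 2376 |
| constitution | 2 | 134 | 335 |
| charisma | 3 | 116 | 290 |
| agility | 2 | 58 | 145 |
| wisdom | 3 | 56 | 140 |
| strength | 1 | 54 | 135 |
| serenity | 3 | 34 | 95 |
| dexterity | 4 | 21 | 51 |
| intelligence | 2 | 14 | 35 |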


## Explore Numerai Auxiliary Targets

In [None]:
feature_set = "medium" # select feature set
feature_cols = feature_metadata["feature_sets"][feature_set]
target_cols = feature_metadata["targets"]
train = pd.read_parquet(
    f"{DATA_VERSION}/train_int8.parquet",
    columns=["era"] + feature_cols + target_cols
)

# Downsample to every 4th era to reduce memory usage and speed up model training (suggested for Colab free tier)
# Comment out the line below to use all the data (higher memory usage, slower model training, potentially better performance)
train = train[train["era"].isin(train["era"].unique()[::4])]

# Print target columns
train[["era"] + target_cols]

| id | era | target | target_tyler_v4_20 | target_tyler_v4_60 | target_victor_v4_20 | target_victor_v4_60 | target_ralph_v4_20 | target_ralph_v4_60 | target_waldo_v4_20 | target_waldo_v4_60 | ... | target_jeremy_v4_20 | target_jeremy_v4_60 | target_teager_v4_20 | target_teager_v4_60 | target_agnes_v4_20 | target_agnes_v4_60 | target_claudia_v4_20 | target_claudia_v4_60 | target_rowan_v4_20 | target_rowan_v4_60 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| n003bba8a98662e4 | 0001 | 0.25 | 0.50 | 0.25 | 0.25 | 0.00 | 0.25 | 0.25 | 0.50 | 0.25 | ... | 0.25 | 0.25 | 0.50 | 0.75 | 0.25 | 0.00 | 0.50 | 0.50 | 0.50 | 0.75 |
| n003bee128c2fcfc | 0001 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 1.00 | ... | 0.75 | 1.00 | 1.00 | 0.75 | 1.00 | 1.00 | 1.00 | 0.75 | 1.00 | 0.75 |
| n0048ac83aff7194 | 0001 | 0.25 | 0.50 | 0.50 | 0.50 | 0.25 | 0.50 | 0.25 | 0.50 | 0.25 | ... | 0.50 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 |
| n00691bec80d3e02 | 0001 | 0.75 | 0.50 | 0.75 | 0.75 | 0.50 | 0.75 | 0.50 | 0.50 | 0.50 | ... | 0.50 | 0.50 | 0.75 | 0.75 | 0.50 | 0.50 | 0.75 | 0.75 | 0.75 | 0.50 |
| n00b8720a2fdc4f2 | 0001 | 0.50 | 0.75 | 0.75 | 0.75 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | ... | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| nffc2d5e4b79a7ae | 0573 | 0.25 | 0.25 | 0.50 | 0.25 | 0.50 | 0.25 | 0.50 | 0.00 | 0.50 | ... | 0.00 | 0.25 | 0.00 | 0.25 | 0.00 | 0.50 | 0.25 | 0.50 | 0.00 | 0.25 |
| nffc7d24176548a4 | 0573 | 0.50 | 0.50 | 0.25 | 0.50 | 0.50 | 0.50 | 0.25 | 0.50 | 0.25 | ... | 0.50 | 0.25 | 0.50 | 0.50 | 0.50 | 0.25 | 0.50 | 0.25 | 0.25 | 0.50 |
| nffc9844c1c7a6a9 | 0573 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.25 | 0.50 | 0.25 | 0.50 | ... | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.25 | 0.75 |
| nffd79773f4109bb | 0573 | 0.50 | 0.75 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | ... | 0.50 | 0.50 | 0.50 | 0.50 | 0.75 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |


# Models

Each model below has a setup cell (data loading and training) followed by a `predict` function, which is later pickled for upload.

[Numerai Benchmark Models](https://forum.numer.ai/t/benchmark-models/6754)

## hello_numerai.ipynb Model

In [None]:
# Model imports
# https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html
import lightgbm as lgb

# Define the feature set
feature_set = feature_metadata["feature_sets"]["medium"]

# Load the training data
train = pd.read_parquet(
    f"{DATA_VERSION}/train_int8.parquet",
    columns=["era", "target"] + feature_set
)

# Downsample to every 4th era to reduce memory usage and speed up model training
train = train[train["era"].isin(train["era"].unique()[::4])]

# Define the model
# https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html
model = lgb.LGBMRegressor(
  n_estimators=2000,
  learning_rate=0.01,
  max_depth=5,
  num_leaves=2**5-1,
  colsample_bytree=0.1
)

# Fit the model
model.fit(
  train[feature_set],
  train["target"]
);

In [None]:
# Define the model prediction pipeline as a function
def predict(live_features: pd.DataFrame) -> pd.DataFrame:

    live_predictions = model.predict(live_features[feature_set])
    submission = pd.Series(live_predictions, index=live_features.index)

    return submission.to_frame("prediction")

## target_ensemble.ipynb Model

Forum posts on ensembling:
- https://forum.numer.ai/t/how-to-ensemble-models/4034
- https://forum.numer.ai/t/target-jerome-is-dominating-and-thats-weird/6513
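The ensembling step in the `predict` function below converts each model's predictions to percentile ranks, averages them with equal weights, and re-ranks the average. A minimal sketch on hypothetical toy predictions (the model names, ids, and values are made up) shows the mechanics, including how `method="first"` breaks ties by order of appearance so the final submission has no duplicate values.

```python
import pandas as pd

# Hypothetical raw predictions from two models on four stocks
predictions = pd.DataFrame(
    {
        "model_a": [0.10, 0.40, 0.35, 0.90],
        "model_b": [0.52, 0.49, 0.51, 0.50],
    },
    index=["id1", "id2", "id3", "id4"],
)

# Percentile-rank each model so differences in scale don't dominate the average
ranked = predictions.rank(pct=True)

# Equal-weight average of the ranks (id1 and id3 tie at 0.625 here)
ensemble = ranked.mean(axis=1)

# Re-rank; method="first" gives tied values distinct ranks by position
submission = ensemble.rank(pct=True, method="first").to_frame("prediction")
print(submission)
```

Ranking before averaging is a design choice: it makes the ensemble insensitive to each model's output scale, which matters here because models trained on different targets can produce differently distributed predictions.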

In [None]:
# Model Imports
import lightgbm as lgb

# Load data
feature_cols = feature_metadata["feature_sets"]["medium"]
target_cols = feature_metadata["targets"]
train = pd.read_parquet(
    f"{DATA_VERSION}/train_int8.parquet",
    columns=["era"] + feature_cols + target_cols
)

# Downsample to every 4th era to reduce memory usage and speed up model training
train = train[train["era"].isin(train["era"].unique()[::4])]

# `target` is identical to `target_cyrus_v4_20`; drop it (assumes it is the first entry in target_cols)
assert train["target"].equals(train["target_cyrus_v4_20"])
target_names = target_cols[1:]
targets_df = train[["era"] + target_names]

# pick a few 20-day target candidates
target_candidates = [
  "target_cyrus_v4_20",
  "target_victor_v4_20",
  "target_xerxes_v4_20",
  "target_teager_v4_20"
]

models = {}
for target in target_candidates:
    model = lgb.LGBMRegressor(
        n_estimators=2000,
        learning_rate=0.01,
        max_depth=5,
        num_leaves=2**4-1,
        colsample_bytree=0.1
    )
    model.fit(
        train[feature_cols],
        train[target]
    );
    models[target] = model

In [None]:
# Define the model prediction pipeline as a function
def predict(
    live_features: pd.DataFrame,
    live_benchmark_models: pd.DataFrame
) -> pd.DataFrame:
    favorite_targets = [
        'target_cyrus_v4_20',
        'target_teager_v4_20'
    ]
    # generate predictions from each model
    predictions = pd.DataFrame(index=live_features.index)
    for target in favorite_targets:
        predictions[target] = models[target].predict(live_features[feature_cols])
    # ensemble predictions
    ensemble = predictions.rank(pct=True).mean(axis=1)
    # format submission
    submission = ensemble.rank(pct=True, method="first")
    return submission.to_frame("prediction")

## feature_neutralization.ipynb Model

In [None]:
# Defined locally because Numerai's model upload framework does not currently include numerai-tools

def neutralize(
    df: pd.DataFrame,
    neutralizers: pd.DataFrame,
    proportion: float = 1.0,
) -> pd.DataFrame:
    """Neutralize each column of a given DataFrame by each feature in a given
    neutralizers DataFrame. Neutralization uses least-squares regression to
    find the orthogonal projection of each column onto the neutralizers, then
    subtracts the result from the original predictions.

    Arguments:
        df: pd.DataFrame - the data with columns to neutralize
        neutralizers: pd.DataFrame - the neutralizer data with features as columns
        proportion: float - the degree to which neutralization occurs

    Returns:
        pd.DataFrame - the neutralized data
    """
    assert not neutralizers.isna().any().any(), "Neutralizers contain NaNs"
    assert len(df.index) == len(neutralizers.index), "Indices don't match"
    assert (df.index == neutralizers.index).all(), "Indices don't match"
    df[df.columns[df.std() == 0]] = np.nan
    df_arr = df.values
    neutralizer_arr = neutralizers.values
    neutralizer_arr = np.hstack(
        # add a column of 1s to the neutralizer array in case neutralizer_arr is a single column
        (neutralizer_arr, np.array([1] * len(neutralizer_arr)).reshape(-1, 1))
    )
    inverse_neutralizers = np.linalg.pinv(neutralizer_arr, rcond=1e-6)
    adjustments = proportion * neutralizer_arr.dot(inverse_neutralizers.dot(df_arr))
    neutral = df_arr - adjustments
    return pd.DataFrame(neutral, index=df.index, columns=df.columns)
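To see what the projection in `neutralize` does, the sketch below applies the same least-squares idea (append a constant column, project with the pseudo-inverse, subtract `proportion` of the projection) to hypothetical random data and checks the correlation with each neutralizer afterwards. It reimplements the core computation on plain arrays instead of calling the function above so it runs on its own; `neutralize_array` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(42)
n_rows, n_features = 500, 3

# Hypothetical feature exposures and predictions that partly depend on them
exposures = rng.normal(size=(n_rows, n_features))
preds = exposures @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=0.5, size=n_rows)

def neutralize_array(y, X, proportion=1.0):
    """Subtract `proportion` of the least-squares projection of y onto [X, 1]."""
    X1 = np.hstack([X, np.ones((len(X), 1))])  # constant column, as in neutralize()
    projection = X1 @ (np.linalg.pinv(X1, rcond=1e-6) @ y)
    return y - proportion * projection

full = neutralize_array(preds, exposures)                  # proportion=1.0
half = neutralize_array(preds, exposures, proportion=0.5)  # partial neutralization

for j in range(n_features):
    corr_full = np.corrcoef(full, exposures[:, j])[0, 1]
    corr_half = np.corrcoef(half, exposures[:, j])[0, 1]
    print(f"feature {j}: corr after full = {corr_full:+.4f}, after half = {corr_half:+.4f}")
```

With `proportion=1.0` the residual is orthogonal to every neutralizer and to the constant column, so its correlation with each feature is zero up to floating-point error; smaller proportions leave some exposure in place.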

In [None]:
import lightgbm as lgb

medium_features = feature_metadata["feature_sets"]["medium"]
train = pd.read_parquet(
    f"{DATA_VERSION}/train_int8.parquet",
    columns=["era", "target"] + medium_features
)
# Downsample to every 4th era to reduce memory usage and speed up model training
train = train[train["era"].isin(train["era"].unique()[::4])]

model = lgb.LGBMRegressor(
    n_estimators=2000,
    learning_rate=0.01,
    max_depth=5,
    num_leaves=2**4-1,
    colsample_bytree=0.1
)
model.fit(
    train[medium_features],
    train["target"]
)

In [None]:
# Features to neutralize against; the "serenity" group is an illustrative
# choice (any group in `subgroups["medium"]` could be used)
feature_subset = list(subgroups["medium"]["serenity"])

# Define the model prediction pipeline as a function
def predict(live_features: pd.DataFrame) -> pd.DataFrame:
    # make predictions using all features
    predictions = pd.DataFrame(
        model.predict(live_features[medium_features]),
        index=live_features.index,
        columns=["prediction"]
    )
    # neutralize predictions to a subset of features
    neutralized = neutralize(predictions, live_features[feature_subset])
    return neutralized.rank(pct=True)

## example_model.ipynb Model

In [None]:
# Install dependencies
!pip install -q numerapi pandas lightgbm cloudpickle pyarrow scikit-learn scipy==1.10.1

In [None]:
features = feature_metadata["feature_sets"]["medium"] # use "all" for better performance. Requires more RAM.
train = pd.read_parquet(f"{DATA_VERSION}/train_int8.parquet", columns=["era"]+features+["target"])

# For better models, join train and validation data and train on all of it.
# This would cause diagnostics to be misleading though.
# napi.download_dataset(f"{DATA_VERSION}/validation_int8.parquet");
# validation = pd.read_parquet(f"{DATA_VERSION}/validation_int8.parquet", columns=["era", "data_type"]+features+["target"])
# validation = validation[validation["data_type"] == "validation"].drop(columns="data_type") # drop rows which don't have targets yet
# train = pd.concat([train, validation])

# Downsample for speed
train = train[train["era"].isin(train["era"].unique()[::4])]  # skip this step for better performance

# Train model
import lightgbm as lgb
model = lgb.LGBMRegressor(
    n_estimators=2000,  # If you want to use a larger model we've found 20_000 trees to be better
    learning_rate=0.01, # and a learning rate of 0.001
    max_depth=5, # and max_depth=6
    num_leaves=2**5-1, # and num_leaves of 2**6-1
    colsample_bytree=0.1
)
model.fit(
    train[features],
    train["target"]
)

In [None]:
# Define predict function
def predict(
    live_features: pd.DataFrame,
    live_benchmark_models: pd.DataFrame
) -> pd.DataFrame:
    live_predictions = model.predict(live_features[features])
    submission = pd.Series(live_predictions, index=live_features.index)
    return submission.to_frame("prediction")

## Model 1
**Goal:** Combine all three tutorial models.

In [None]:
# Neutralization function from feature_neutralization.ipynb, defined locally because
# Numerai's model upload framework does not currently include numerai-tools

def neutralize(
    df: pd.DataFrame,
    neutralizers: pd.DataFrame,
    proportion: float = 1.0,
) -> pd.DataFrame:
    """Neutralize each column of a given DataFrame by each feature in a given
    neutralizers DataFrame. Neutralization uses least-squares regression to
    find the orthogonal projection of each column onto the neutralizers, then
    subtracts the result from the original predictions.

    Arguments:
        df: pd.DataFrame - the data with columns to neutralize
        neutralizers: pd.DataFrame - the neutralizer data with features as columns
        proportion: float - the degree to which neutralization occurs

    Returns:
        pd.DataFrame - the neutralized data
    """
    assert not neutralizers.isna().any().any(), "Neutralizers contain NaNs"
    assert len(df.index) == len(neutralizers.index), "Indices don't match"
    assert (df.index == neutralizers.index).all(), "Indices don't match"
    df[df.columns[df.std() == 0]] = np.nan
    df_arr = df.values
    neutralizer_arr = neutralizers.values
    neutralizer_arr = np.hstack(
        # add a column of 1s to the neutralizer array in case neutralizer_arr is a single column
        (neutralizer_arr, np.array([1] * len(neutralizer_arr)).reshape(-1, 1))
    )
    inverse_neutralizers = np.linalg.pinv(neutralizer_arr, rcond=1e-6)
    adjustments = proportion * neutralizer_arr.dot(inverse_neutralizers.dot(df_arr))
    neutral = df_arr - adjustments
    return pd.DataFrame(neutral, index=df.index, columns=df.columns)

In [None]:
# Model Imports
import lightgbm as lgb

# Load data
feature_cols = feature_metadata["feature_sets"]["medium"]
target_cols = feature_metadata["targets"]
train = pd.read_parquet(
    f"{DATA_VERSION}/train_int8.parquet",
    columns=["era"] + feature_cols + target_cols
)

# Downsample to every 4th era to reduce memory usage and speed up model training
train = train[train["era"].isin(train["era"].unique()[::4])]

# `target` is identical to `target_cyrus_v4_20`; drop it (assumes it is the first entry in target_cols)
assert train["target"].equals(train["target_cyrus_v4_20"])
target_names = target_cols[1:]
targets_df = train[["era"] + target_names]

# pick 20-day target candidates
target_candidates = [
  "target_cyrus_v4_20",
  "target_victor_v4_20",
  "target_xerxes_v4_20",
  "target_teager_v4_20"
]

models = {}
# Train the ensemble models
for target in target_candidates:
    model = lgb.LGBMRegressor(
        n_estimators=2000,
        learning_rate=0.01,
        max_depth=5,
        num_leaves=2**4-1,
        colsample_bytree=0.1
    )
    model.fit(
        train[feature_cols],
        train[target]
    );
    models[target] = model


[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.153930 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3525
[LightGBM] [Info] Number of data points in the train set: 606176, number of used features: 705
[LightGBM] [Info] Start training from score 0.499979
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.168047 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3525
[LightGBM] [Info] Number of data points in the train set: 606176, number of used features: 705
[LightGBM] [Info] Start training from score 0.500008
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.149788 seconds.
You can set `force_row_wise=true` to remove the overhead.

In [None]:
# Define the model prediction pipeline as a function
def predict(
    live_features: pd.DataFrame,
    live_benchmark_models: pd.DataFrame
) -> pd.DataFrame:
    favorite_targets = [
        'target_cyrus_v4_20',
        'target_teager_v4_20'
    ]
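    # Neutralize against the medium features in the "all" group; swap in another
    # group name (e.g. "serenity") to neutralize against a smaller subset
    feature_subset = list(subgroups["medium"]["all"])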
    # generate predictions from each model
    predictions = pd.DataFrame(index=live_features.index)
    for target in favorite_targets:
        predictions[target] = models[target].predict(live_features[feature_cols])
    # neutralize predictions to a subset of features
    predictions = neutralize(predictions, live_features[feature_subset])
    # ensemble predictions
    ensemble = predictions.rank(pct=True).mean(axis=1)
    # format submission
    submission = ensemble.rank(pct=True, method="first")
    return submission.to_frame("prediction")

## Model 2
**Goal:** See whether the model improves when the `max_drawdown` feature is weighted more heavily the further it deviates from the `lowest_drawdown` feature.

# Model Performance Evaluation

In [None]:
# install Numerai's open-source scoring tools
!pip install -q --no-deps numerai-tools

# import the 2 scoring functions
from numerai_tools.scoring import numerai_corr, correlation_contribution

In [None]:
# Download validation data
napi.download_dataset(f"{DATA_VERSION}/validation_int8.parquet");

# Load the validation data, filtering for data_type == "validation"
validation = pd.read_parquet(
    f"{DATA_VERSION}/validation_int8.parquet",
    columns=["era", "data_type"] + feature_cols + target_cols
)
validation = validation[validation["data_type"] == "validation"]
del validation["data_type"]

# Downsample to every 4th era to reduce memory usage and speed up validation
validation = validation[validation["era"].isin(validation["era"].unique()[::4])]


For ensemble models:

In [None]:
# Embargo overlapping eras from training data
last_train_era = int(train["era"].unique()[-1])
eras_to_embargo = [str(era).zfill(4) for era in [last_train_era + i for i in range(4)]]
validation = validation[~validation["era"].isin(eras_to_embargo)]

# Generate validation predictions for each model
for target in target_candidates:
    validation[f"prediction_{target}"] = models[target].predict(validation[feature_cols])

pred_cols = [f"prediction_{target}" for target in target_candidates]
validation[pred_cols]

In [None]:
prediction_cols = [
    f"prediction_{target}"
    for target in target_candidates
]
correlations = validation.groupby("era").apply(
    lambda d: numerai_corr(d[prediction_cols], d["target"])
)
cumsum_corrs = correlations.cumsum()
cumsum_corrs.plot(
  title="Cumulative Correlation of validation Predictions",
  figsize=(10, 6),
  xticks=[]
)

The primary scoring metrics in Numerai are:

- `CORR` (or "Correlation"), which is calculated by the function `numerai_corr`: a Numerai-specific variant of the Pearson correlation between your model and the target.

- `MMC` (or "Meta Model Contribution"), which is calculated by the function `correlation_contribution`: a measure of how uniquely additive your model is to the Numerai Meta Model.

On the Numerai website, `CORR` is referred to as `CORR20V2`, where "20" refers to the 20-day return target and "V2" specifies the 2nd version of the scoring function.

It is important to score each historical `era` independently, so the "per era" metrics should be examined when evaluating the model's performance.
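For intuition about what `numerai_corr` computes, the sketch below follows the steps as we understand them from Numerai's scoring documentation: rank and gaussianize the predictions, center the target, raise both to the power 1.5 while keeping the sign (to accentuate the tails), then take the Pearson correlation. It is a simplified single-column sketch for one era; `numerai_corr_sketch` is a hypothetical name, and the `numerai-tools` implementation should be used for actual scoring.

```python
from statistics import NormalDist

import numpy as np
import pandas as pd

def numerai_corr_sketch(preds: pd.Series, target: pd.Series) -> float:
    # Rank predictions (ties share the average rank) and map ranks into (0, 1)
    ranked = (preds.rank(method="average").to_numpy() - 0.5) / preds.count()
    # Gaussianize: apply the inverse standard normal CDF to the ranks
    inv_cdf = NormalDist().inv_cdf
    gauss_ranked = np.array([inv_cdf(p) for p in ranked])
    # Center the target around zero
    centered_target = (target - target.mean()).to_numpy()
    # Raise both to the 1.5 power, keeping the sign, to accentuate the tails
    preds_p15 = np.sign(gauss_ranked) * np.abs(gauss_ranked) ** 1.5
    target_p15 = np.sign(centered_target) * np.abs(centered_target) ** 1.5
    # Pearson correlation of the transformed series
    return float(np.corrcoef(preds_p15, target_p15)[0, 1])

# Hypothetical single era: 5 stocks with binned targets
target = pd.Series([0.0, 0.25, 0.5, 0.75, 1.0])
print(numerai_corr_sketch(target, target))   # close to 1
print(numerai_corr_sketch(-target, target))  # close to -1
```

Because the score depends on within-era ranks, it is applied per era (e.g. with `groupby("era")`, as in the cells in this section) rather than once over the whole validation set.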


In [None]:
# Download and join in the meta_model for the validation eras
napi.download_dataset(f"{DATA_VERSION}/meta_model.parquet")
validation["meta_model"] = pd.read_parquet(
    f"{DATA_VERSION}/meta_model.parquet"
)["numerai_meta_model"]

In [None]:
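# Score Model 1's predict pipeline, applied era by era so that neutralization
# and ranking happen within each era
validation["prediction"] = validation.groupby("era", group_keys=False).apply(
    lambda d: predict(d[feature_cols], None)["prediction"]
)

# Compute the per-era corr between our predictions and the target values
per_era_corr = validation.groupby("era").apply(
    lambda x: numerai_corr(x[["prediction"]].dropna(), x["target"].dropna())
)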

# Compute the per-era mmc between our predictions, the meta model, and the target values
per_era_mmc = validation.dropna().groupby("era").apply(
    lambda x: correlation_contribution(x[["prediction"]], x["meta_model"], x["target"])
)


# Plot the per-era correlation
per_era_corr.plot(
  title="Validation CORR",
  kind="bar",
  figsize=(8, 4),
  xticks=[],
  legend=False,
  snap=False
)
per_era_mmc.plot(
  title="Validation MMC",
  kind="bar",
  figsize=(8, 4),
  xticks=[],
  legend=False,
  snap=False
)

Instead of looking at the raw score for each era, it is helpful to look at the cumulative scores.

If you are familiar with "backtesting" in quant finance where people simulate the historical performance of their investment strategies, you can roughly think of this plot as a backtest of your model performance over the historical validation period.

Notice a few things below:

- CORR gradually increases over many eras of the validation data even with this simple model on modern data.

- MMC is only available for a smaller set of recent eras because much of the validation period predates the Meta Model.

- MMC is very high early in the Meta Model's existence because the newest datasets were not yet available then, so models trained on the newest data would have been very additive in the past.

- MMC has been flat and decreasing recently because the Meta Model has started catching up to modern datasets, and getting correlation has been difficult in recent eras.

In [None]:
# Plot the cumulative per-era correlation
per_era_corr.cumsum().plot(
  title="Cumulative Validation CORR",
  kind="line",
  figsize=(8, 4),
  legend=False
)
per_era_mmc.cumsum().plot(
  title="Cumulative Validation MMC",
  kind="line",
  figsize=(8, 4),
  legend=False
)

## Performance metrics

It is also helpful to compute some summary metrics over the entire validation period:

- `Mean` is the primary measure of your model's long-term performance.

- `Sharpe` is a measure of your model's consistency. In finance, the Sharpe ratio of an investment strategy measures risk-adjusted returns. In Numerai, we compute sharpe as the mean divided by the standard deviation.

- `Max drawdown` is a measure of your model's risk. In finance, the max drawdown of an investment strategy is the largest loss suffered. In Numerai, we compute max drawdown as the maximum peak to trough drop in a cumulative score.
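A small worked example of the three summary metrics on a hypothetical series of per-era scores (the numbers are made up) may help. It uses the same formulas as the cell below: the mean, the mean divided by the population standard deviation (`ddof=0`), and the largest drop from a running peak of the cumulative score.

```python
import pandas as pd

# Hypothetical per-era CORR scores
per_era = pd.Series([0.02, 0.03, -0.01, -0.02, 0.04])

mean = per_era.mean()
sharpe = mean / per_era.std(ddof=0)

cumulative = per_era.cumsum()                             # running total of scores
running_peak = cumulative.expanding(min_periods=1).max()  # best cumulative score so far
max_drawdown = (running_peak - cumulative).max()          # largest peak-to-trough drop

print(f"mean={mean:.4f} sharpe={sharpe:.4f} max_drawdown={max_drawdown:.4f}")
```

Here the cumulative score peaks at 0.05 after the second era and falls to 0.02 after the fourth, so the max drawdown is 0.03 even though the series ends at a new high.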

In [None]:
# Compute performance metrics
corr_mean = per_era_corr.mean()
corr_std = per_era_corr.std(ddof=0)
corr_sharpe = corr_mean / corr_std
corr_max_drawdown = (per_era_corr.cumsum().expanding(min_periods=1).max() - per_era_corr.cumsum()).max()

mmc_mean = per_era_mmc.mean()
mmc_std = per_era_mmc.std(ddof=0)
mmc_sharpe = mmc_mean / mmc_std
mmc_max_drawdown = (per_era_mmc.cumsum().expanding(min_periods=1).max() - per_era_mmc.cumsum()).max()

pd.DataFrame({
    "mean": [corr_mean, mmc_mean],
    "std": [corr_std, mmc_std],
    "sharpe": [corr_sharpe, mmc_sharpe],
    "max_drawdown": [corr_max_drawdown, mmc_max_drawdown]
}, index=["CORR", "MMC"]).T

### Benchmark Models

In [None]:
# download Numerai's benchmark models
napi.download_dataset(f"{DATA_VERSION}/validation_benchmark_models.parquet")
benchmark_models = pd.read_parquet(
    f"{DATA_VERSION}/validation_benchmark_models.parquet"
)
benchmark_models

MMC over the validation period may not be truly indicative of out-of-sample performance. Models trained on newer targets perform very well, and because Numerai releases their predictions, many users will likely shift their models toward newer data and targets; by extension, the Meta Model will begin to include information from these new targets. Since the Meta Model did not have access to newer data and targets early in the validation period, MMC over that period may be misleading.
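The next cells call a `get_mmc` helper that is not defined elsewhere in this notebook. One possible definition is sketched below, assuming it should return the per-era contribution, its cumulative sum, and a summary table computed like the performance metrics above. The default prediction column (`"prediction"`) and the `contribution_fn` parameter are assumptions; by default the helper uses `correlation_contribution` from `numerai-tools`, installed earlier.

```python
import pandas as pd

def get_mmc(validation, meta_model_col, pred_cols=("prediction",), contribution_fn=None):
    """Hypothetical helper: per-era contribution of `pred_cols` to `meta_model_col`.

    Returns (per_era_mmc, cumsum_mmc, summary).
    """
    if contribution_fn is None:
        # Same scoring function imported in "Model Performance Evaluation"
        from numerai_tools.scoring import correlation_contribution as contribution_fn
    cols = list(pred_cols)
    # Score each era independently, as with the per-era MMC above
    per_era_mmc = validation.dropna().groupby("era").apply(
        lambda x: contribution_fn(x[cols], x[meta_model_col], x["target"])
    )
    cumsum_mmc = per_era_mmc.cumsum()
    # Summary statistics, using the same formulas as the performance metrics cell
    mean = per_era_mmc.mean()
    std = per_era_mmc.std(ddof=0)
    max_drawdown = (cumsum_mmc.expanding(min_periods=1).max() - cumsum_mmc).max()
    summary = pd.DataFrame({
        "mean": mean,
        "std": std,
        "sharpe": mean / std,
        "max_drawdown": max_drawdown,
    }).T
    return per_era_mmc, cumsum_mmc, summary
```

Passing a different column as `meta_model_col` (a benchmark model or an average of benchmarks) measures contribution to that reference instead of the Meta Model.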


In [None]:
validation["v42_teager_plus_cyrus"] = benchmark_models["v42_teager_plus_cyrus"]


per_era_mmc, cumsum_mmc, summary = get_mmc(validation, "v42_teager_plus_cyrus")
# plot the cumsum mmc performance
cumsum_mmc.plot(
  title="Contribution of Neutralized Predictions to Numerai's Teager Ensemble",
  figsize=(10, 6),
  xticks=[]
)

pd.set_option('display.float_format', lambda x: '%f' % x)
summary

Benchmark Model Contribution, or `BMC`, measures the contribution of a model to all of Numerai's benchmark models.

On the website, `BMC` measures a model's contribution to a weighted ensemble of all of Numerai's Benchmark Models. It tells how additive a model is to Numerai's known models and, by extension, how additive it might be to the Meta Model in the future.

##### Using an unweighted ensemble of Numerai's benchmarks to measure a model's BMC

In [None]:
# Equal-weight average of all benchmark model predictions for each stock id;
# numeric_only=True skips the non-numeric "era" column
validation["numerai_benchmark"] = benchmark_models.mean(axis=1, numeric_only=True)

per_era_mmc, cumsum_mmc, summary = get_mmc(validation, "numerai_benchmark")
# plot the cumsum mmc performance
cumsum_mmc.plot(
  title="Cumulative BMC of Neutralized Predictions",
  figsize=(10, 6),
  xticks=[]
)

pd.set_option('display.float_format', lambda x: '%f' % x)
summary

# Model Submission

### Live predictions

Numerai evaluates models based on <ins>live</ins> performance.

Every Tuesday-Saturday, new `live features` are released, which represent the current state of the stock market. The model needs to generate `live predictions` of the unknown target values, which represent stock market returns over the next 20 business days.

## Model Predictions with Live Data
View an instance of the model's predictions with live data:

In [None]:
# Download latest live features and live benchmark model predictions
napi.download_dataset(f"{DATA_VERSION}/live_int8.parquet")
napi.download_dataset(f"{DATA_VERSION}/live_benchmark_models.parquet")

# Load live data
live_features = pd.read_parquet(f"{DATA_VERSION}/live_int8.parquet", columns=feature_cols)
live_benchmark_models = pd.read_parquet(f"{DATA_VERSION}/live_benchmark_models.parquet")

# Generate and format live predictions with the same predict function that will be pickled
predict(live_features, live_benchmark_models)

## Model Upload to Numerai
To participate in the tournament, live predictions must be submitted every Tuesday-Saturday.

To automate this process:
- Define the model prediction pipeline as a function
- Serialize the prediction function using the `cloudpickle` library
- Upload the model pickle file (predict.pkl) to Numerai
- Let Numerai run the model to submit live predictions for every round (Tuesday-Saturday)

Read more about Model Uploads and other self-hosted automation options in the [docs](https://docs.numer.ai/numerai-tournament/submissions#automation).

In [None]:
# Serialize the prediction function using the cloudpickle library
import cloudpickle

# Assuming the model's prediction function is named predict
p = cloudpickle.dumps(predict)

with open("predict.pkl", "wb") as f:
    f.write(p)

In [None]:
# Download the predict.pkl file if running in Google Colab
try:
    from google.colab import files
    files.download('predict.pkl')
except ImportError:  # not running in Google Colab
    pass
