# Introduction

Previous iterations of this project attempted to reduce everything to a single command-line script that could read a configuration file, run the corresponding model or models, and output the results.

This notebook takes a different approach. It loads and uses some of the project's utilities (e.g., for loading data, making submissions, computing metrics, and handling configurations), but the training loop is defined entirely in the notebook. This allows for faster iteration and more transparent model comparison. Models that will be reused consistently can be repackaged to run from the command line if that would be beneficial.

## Imports and Setup

In [1]:
import copy
import gc
import os
import time
from importlib import import_module
from pathlib import Path

# If the notebook was started from a subdirectory, move up to the 'numerai' repo root
# so that relative paths (./input, ./src, ./output) resolve
if Path().resolve().name != 'numerai':
    os.chdir(Path().resolve().parent)

!pip install --upgrade -I git+https://github.com/djliden/config-nest.git

import confignest.confignest
import numpy as np
import pandas as pd
import yaml
from tqdm import tqdm

import src.utils.setup
import src.utils.cross_val
import src.utils.eval
import src.utils.metrics

np.random.seed(623)

Collecting git+https://github.com/djliden/config-nest.git
  Cloning https://github.com/djliden/config-nest.git to /tmp/pip-req-build-ig1rh1_j
  Running command git clone -q https://github.com/djliden/config-nest.git /tmp/pip-req-build-ig1rh1_j
Collecting pyyaml
  Using cached PyYAML-5.4.1-cp39-cp39-manylinux1_x86_64.whl (630 kB)
Building wheels for collected packages: confignest
  Building wheel for confignest (setup.py) ... done
  Created wheel for confignest: filename=confignest-0.0.1-py3-none-any.whl size=4507 sha256=7482fcc340db5d60986d73d080f7d84b08fb24add37f08a7413af8dd142e0ab6
  Stored in directory: /tmp/pip-ephem-wheel-cache-xpdivqdi/wheels/29/69/48/50dde3920f402fc55419a3f1320199ea8556e31ed2e41417a6
Successfully built confignest
Installing collected packages: pyyaml, confignest
Successfully installed confignest-0.0.1 pyyaml-5.4.1


## NumerAPI Setup and Data Download

In [2]:
src.utils.setup.credential()
napi = src.utils.setup.init_numerapi()

# `current_round` avoids shadowing Python's built-in round()
current_round = napi.get_current_round()
train = Path(f"./input/numerai_dataset_{current_round}/numerai_training_data.csv")
tourn = Path(f"./input/numerai_dataset_{current_round}/numerai_tournament_data.csv")
processed = Path(f'./input/training_processed_{current_round}.csv')
processed_pkl = Path(f'./input/training_processed_{current_round}.pkl')
output = Path("./output/")

src.utils.setup.download_current(napi=napi)
training_data, feature_cols, target_cols = src.utils.setup.process_current(processed,
                                                           processed_pkl, train, tourn)

Loaded Numerai Public Key into Global Environment!
Loaded Numerai Secret Key into Global Environment!
The dataset has already been downloaded.
You can re-download it with refresh = True
Loading the pickled training data from file
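The output shows that `process_current` reuses a pickled copy of the training data instead of re-parsing the CSV. Its implementation lives in `src.utils.setup` and is not shown here (it also takes the tournament file and returns the feature and target column lists). The sketch below illustrates only the caching pattern, using a hypothetical `load_or_process` helper that is not part of the project's code:

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_or_process(csv_path: Path, pkl_path: Path, refresh: bool = False) -> pd.DataFrame:
    """Hypothetical helper: reuse a pickled copy of a CSV if one exists, else build it."""
    if pkl_path.exists() and not refresh:
        # Unpickling is usually much faster than re-parsing a large CSV
        return pd.read_pickle(pkl_path)
    df = pd.read_csv(csv_path)
    df.to_pickle(pkl_path)  # cache for the next run
    return df


# Usage with throwaway files
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "train.csv"
    pkl_path = Path(tmp) / "train.pkl"
    pd.DataFrame({"era": ["era1", "era2"], "target": [0.5, 0.75]}).to_csv(csv_path, index=False)
    first = load_or_process(csv_path, pkl_path)   # parses the CSV and writes the pickle
    cached = pkl_path.exists()
    second = load_or_process(csv_path, pkl_path)  # reads the pickle
```

The `refresh` flag mirrors the spirit of the `REFRESH` option in the default config, but how the project actually wires that option in is not shown.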



## Initial Configuration Setup

In [3]:
default_config = Path("./src/config/default_config.yaml")
cfg = confignest.confignest.Config(default_config)
cfg

Config Object with Keys:
CV:
  GAP: 0
  TRAIN_START: 0
  TRAIN_STOP: null
  VAL_END: 210
  VAL_N_ERAS: 4
  VAL_START: 206
DATA:
  REFRESH: false
  SAVE_PROCESSED_TRAIN: true
EVAL:
  CHUNK_SIZE: 1000000
  SAVE_PREDS: true
  SUBMIT_PREDS: false
SYSTEM:
  DEBUG: false

# Model Definition
Defining a model involves two components: the "default" model configuration and the model itself. In this case, we'll look at a Lasso model (to continue exploring the unreasonable effectiveness of these models).

In [4]:
# Model Configuration
mod_cfg = {'MODEL': {
    'mod': 'Lasso',
    'alpha': .0005
}}
cfg.update_config(mod_cfg)
cfg.MODEL

Config Object with Keys:
alpha: 0.0005
mod: Lasso

In [5]:
# Model Definition
import sklearn.linear_model
mod = getattr(sklearn.linear_model, cfg.MODEL.mod)(alpha=cfg.MODEL.alpha)
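The cell above looks up the estimator class by name with `getattr`, but always passes `alpha`, so it cannot build `LinearRegression` (which has no `alpha` parameter) or set `ElasticNet`'s `l1_ratio`, both of which the training loop below checks for. One option, sketched with a hypothetical `build_model` helper and assuming the MODEL section is available as a plain dict (for example `cfg.config['MODEL']`), is to forward every key except `mod` as a constructor argument:

```python
import sklearn.linear_model


def build_model(model_cfg: dict):
    """Hypothetical helper: build a sklearn.linear_model estimator from a config dict."""
    params = dict(model_cfg)        # copy so the caller's config is left untouched
    name = params.pop("mod")        # class name, e.g. 'Lasso', 'Ridge', 'ElasticNet'
    estimator_cls = getattr(sklearn.linear_model, name)
    return estimator_cls(**params)  # every remaining key becomes a constructor kwarg


lasso = build_model({"mod": "Lasso", "alpha": 0.0005})
enet = build_model({"mod": "ElasticNet", "alpha": 0.0005, "l1_ratio": 0.5})
ols = build_model({"mod": "LinearRegression"})  # no alpha: LinearRegression does not take one
```

The trade-off is that a misspelled key now surfaces as a `TypeError` from the estimator's constructor rather than being silently ignored.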

# Training Loop
Similarly, the training loop has (or can have) its own configuration, though depending on the use case it may be more convenient to keep these settings outside the configuration object.

In [6]:
cv_config = {'CV': {
    'TRAIN_STOP': 120,
    'VAL_START': 209
}}
cfg.update_config(cv_config)
cfg.CV

Config Object with Keys:
GAP: 0
TRAIN_START: 0
TRAIN_STOP: 120
VAL_END: 210
VAL_N_ERAS: 4
VAL_START: 209
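The output shows that `update_config` merges nested keys: only `TRAIN_STOP` and `VAL_START` change, while `GAP`, `TRAIN_START`, `VAL_END`, and `VAL_N_ERAS` keep their defaults. A minimal sketch of that behavior on plain dicts, assuming (not verified against confignest's source) that it recurses into existing sections and adds new ones such as the `RESULTS` section used later; `deep_update` is a hypothetical name:

```python
from copy import deepcopy


def deep_update(base: dict, updates: dict) -> dict:
    """Return a copy of `base` with `updates` merged in, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)  # merge section key by key
        else:
            merged[key] = deepcopy(value)                  # new section or leaf value
    return merged


defaults = {"CV": {"GAP": 0, "TRAIN_START": 0, "TRAIN_STOP": None,
                   "VAL_END": 210, "VAL_N_ERAS": 4, "VAL_START": 206}}
updated = deep_update(defaults, {"CV": {"TRAIN_STOP": 120, "VAL_START": 209},
                                 "RESULTS": {"mean_corr": None}})
```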

In [7]:
# Sweep
from sklearn.model_selection import ParameterGrid
param_grid = {
    'alpha': np.linspace(.0001, .001, 5),
    'mod': ['Lasso'],
    'train_stop': np.arange(25, 132)    
}
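`ParameterGrid` sorts the grid's keys and iterates over their Cartesian product with the last key (`train_stop`) varying fastest, so every `train_stop` value runs for `alpha=0.0001` before the next alpha is tried; this is consistent with alpha staying at 0.0001 throughout the interrupted run below. With 5 alphas, 1 model, and 107 `train_stop` values, the grid has 535 combinations. A pure-Python sketch that mimics this ordering for a single dict (`expand_grid` is a hypothetical name):

```python
from itertools import product


def expand_grid(grid: dict) -> list:
    """Mimic sklearn's ParameterGrid for a single dict: sorted keys, last key varies fastest."""
    keys = sorted(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


# Same shape as param_grid above (alphas match np.linspace(.0001, .001, 5) up to rounding)
alphas = [0.0001 + i * (0.001 - 0.0001) / 4 for i in range(5)]
grid = {"alpha": alphas, "mod": ["Lasso"], "train_stop": list(range(25, 132))}
combos = expand_grid(grid)
print(len(combos))  # 535 = 5 alphas x 1 model x 107 train_stop values
print(combos[0]["train_stop"], combos[1]["train_stop"], combos[107]["train_stop"])  # 25 26 25
```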

In [10]:
era_split = src.utils.cross_val.EraCV(eras=training_data.era)

X, y, era = training_data[feature_cols], training_data.target, training_data.era
name = 'TEST'
logs = []
for params in ParameterGrid(param_grid):
    corrs = []
    sharpes = []
    # .item() turns NumPy scalars into plain Python types so the logs can be
    # dumped to YAML and read back with yaml.SafeLoader
    cfg_update = {'MODEL': {'alpha': params['alpha'].item(),
                            'mod': params['mod']},
                  'RESULTS': {'mean_corr': None,
                              'mean_sharpe': None},
                  'CV': {'TRAIN_STOP': params['train_stop'].item()}}
    cfg.update_config(cfg_update)
    mod = getattr(sklearn.linear_model, cfg.MODEL.mod)(alpha=cfg.MODEL.alpha)

    # A single validation window of VAL_N_ERAS eras starting at era 208
    for valid_era in tqdm(range(208, 209)):
        # Positional row indices of the training and validation eras
        train_idx, val_idx = era_split.get_splits(valid_start=valid_era,
                                                  valid_n_eras=cfg.CV.VAL_N_ERAS,
                                                  train_start=cfg.CV.TRAIN_START,
                                                  train_stop=cfg.CV.TRAIN_STOP)
        mod.fit(X.iloc[train_idx], y.iloc[train_idx])
        val_preds = mod.predict(X.iloc[val_idx])
        eval_df = pd.DataFrame({'prediction': val_preds,
                                'target': y.iloc[val_idx],
                                'era': era.iloc[val_idx]}).reset_index()
        corrs.append(src.utils.metrics.val_corr(eval_df))
        sharpes.append(src.utils.metrics.sharpe(eval_df))

    mean_corr = np.mean(corrs).item()
    mean_sharpe = np.mean(sharpes).item()
    model_name = mod.__class__.__name__
    print(f'\nmodel: {model_name}')
    if model_name != "LinearRegression":
        print(f'alpha: {mod.alpha}')
    if model_name == "ElasticNet":
        print(f'L1 Ratio: {mod.l1_ratio}')
    print(f'train stop: {cfg.CV.TRAIN_STOP}')
    print(f'mean validation corr: {mean_corr}')
    print(f'mean validation sharpe: {mean_sharpe}')
    cfg.update_config({'RESULTS': {'mean_corr': mean_corr,
                                   'mean_sharpe': mean_sharpe}})
    logs.append(copy.deepcopy(cfg.config))

100%|██████████| 1/1 [00:03<00:00,  3.52s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.0008642455229190497
mean validation sharpe: 0.02132967967098023


100%|██████████| 1/1 [00:02<00:00,  2.89s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.0021485006793837847
mean validation sharpe: 0.05354602460769885


100%|██████████| 1/1 [00:05<00:00,  5.02s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.0003556931830714441
mean validation sharpe: 0.00919042893814813


100%|██████████| 1/1 [00:06<00:00,  6.16s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.0012663277440892982
mean validation sharpe: 0.03291229272942421


100%|██████████| 1/1 [00:07<00:00,  7.35s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.0005675890839057507
mean validation sharpe: 0.015160143373518955


100%|██████████| 1/1 [00:07<00:00,  7.59s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.002426275310608234
mean validation sharpe: 0.05951983264223677


100%|██████████| 1/1 [00:07<00:00,  7.95s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.0007765774670524285
mean validation sharpe: 0.018899917648136987


100%|██████████| 1/1 [00:07<00:00,  7.38s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.0013817583749474362
mean validation sharpe: 0.03253933643156772


100%|██████████| 1/1 [00:04<00:00,  4.13s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.0023567132836472826
mean validation sharpe: 0.05498533597032796


100%|██████████| 1/1 [00:04<00:00,  4.48s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.0029103160404439447
mean validation sharpe: 0.07280837304510804


100%|██████████| 1/1 [00:07<00:00,  7.03s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.002484105775501507
mean validation sharpe: 0.06517909980723498


100%|██████████| 1/1 [00:13<00:00, 13.74s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.0034659905613519375
mean validation sharpe: 0.08903435171537188


100%|██████████| 1/1 [00:12<00:00, 12.72s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.0035437315642392817
mean validation sharpe: 0.08229070837775151


100%|██████████| 1/1 [00:09<00:00,  9.42s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.0026653697836884083
mean validation sharpe: 0.0641684649732623


100%|██████████| 1/1 [00:08<00:00,  8.01s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.003608655470907827
mean validation sharpe: 0.0867082277430618


100%|██████████| 1/1 [00:07<00:00,  7.93s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.0022916021974798088
mean validation sharpe: 0.05779048104210449


100%|██████████| 1/1 [00:10<00:00, 10.03s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.0014595812179043662
mean validation sharpe: 0.03894571343260078


100%|██████████| 1/1 [00:08<00:00,  8.76s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.0017284712299475286
mean validation sharpe: 0.047968020501432354


100%|██████████| 1/1 [00:08<00:00,  8.81s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.0008439358112685529
mean validation sharpe: 0.02224634317223394


100%|██████████| 1/1 [00:09<00:00,  9.17s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.0016998557498441695
mean validation sharpe: 0.04390509878621877


100%|██████████| 1/1 [00:06<00:00,  7.00s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.0022945966574472645
mean validation sharpe: 0.05654796938010011


100%|██████████| 1/1 [00:06<00:00,  6.09s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.0012982119984624833
mean validation sharpe: 0.033972940833696445


100%|██████████| 1/1 [00:05<00:00,  5.87s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.0020454985999369575
mean validation sharpe: 0.05395795749446915


100%|██████████| 1/1 [00:12<00:00, 12.68s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.004238125269086566
mean validation sharpe: 0.11395964437872405


100%|██████████| 1/1 [00:23<00:00, 23.73s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.005424658043742795
mean validation sharpe: 0.14316754398109777


100%|██████████| 1/1 [00:09<00:00,  9.35s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.006222233946375574
mean validation sharpe: 0.1527333919600322


100%|██████████| 1/1 [00:08<00:00,  8.24s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.006417801304448764
mean validation sharpe: 0.14955572962078267


100%|██████████| 1/1 [00:07<00:00,  7.23s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.006990357704402497
mean validation sharpe: 0.16898539740902482


100%|██████████| 1/1 [00:07<00:00,  7.24s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.006987881714181609
mean validation sharpe: 0.16484442710193978


100%|██████████| 1/1 [00:10<00:00, 10.39s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.006720806009109605
mean validation sharpe: 0.14630770464407747


100%|██████████| 1/1 [00:08<00:00,  8.20s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.005497978986243258
mean validation sharpe: 0.1191354763233436


100%|██████████| 1/1 [00:06<00:00,  6.23s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.004609242763983546
mean validation sharpe: 0.09752610284199328


100%|██████████| 1/1 [00:10<00:00, 10.62s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.0048046014232852995
mean validation sharpe: 0.10154017851314974


100%|██████████| 1/1 [00:07<00:00,  7.43s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.003767937955538562
mean validation sharpe: 0.08156784526645254


100%|██████████| 1/1 [00:08<00:00,  8.61s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.00430239845444875
mean validation sharpe: 0.09441182187951588


100%|██████████| 1/1 [00:08<00:00,  8.38s/it]
  0%|          | 0/1 [00:00<?, ?it/s]


model: Lasso
alpha: 0.0001
train stop: 31
mean validation corr: 0.004279841781371575
mean validation sharpe: 0.09936301035066468


  0%|          | 0/1 [00:16<?, ?it/s]


KeyboardInterrupt: 
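Each fit above trains on the eras selected by `era_split.get_splits`. `EraCV`'s implementation is not shown here, so the sketch below is a hypothetical stand-in (`SimpleEraCV`) that assumes era labels of the form `'era<N>'`, a half-open training range `[train_start, train_stop)`, and a validation window of `valid_n_eras` eras starting at `valid_start`. The real bounds, and how `GAP` is applied, may differ.

```python
import numpy as np
import pandas as pd


class SimpleEraCV:
    """Hypothetical stand-in for src.utils.cross_val.EraCV (bounds are assumptions)."""

    def __init__(self, eras: pd.Series):
        # 'era12' -> 12, so eras compare numerically ('era100' sorts before 'era2' as text)
        self.era_nums = eras.str.replace("era", "", regex=False).astype(int).to_numpy()

    def get_splits(self, valid_start, valid_n_eras, train_start=0, train_stop=None):
        if train_stop is None:
            train_stop = valid_start  # assumption: otherwise train up to the validation window
        in_train = (self.era_nums >= train_start) & (self.era_nums < train_stop)
        in_valid = (self.era_nums >= valid_start) & (self.era_nums < valid_start + valid_n_eras)
        # Positional indices, suitable for DataFrame.iloc / Series.iloc
        return np.flatnonzero(in_train), np.flatnonzero(in_valid)


eras = pd.Series(["era1", "era1", "era2", "era3", "era10", "era11", "era12"])
cv = SimpleEraCV(eras)
train_idx, val_idx = cv.get_splits(valid_start=10, valid_n_eras=2, train_start=1, train_stop=3)
print(train_idx, val_idx)  # [0 1 2] [4 5]
```

Returning positional indices rather than boolean masks matches how the loop uses the result with `.iloc`.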

# Parse Output Files
This part is still experimental.

In [42]:
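# Timestamped log file name: log_<name>_YYYY_MM_DD_HHMM_SS.yaml
current_time = time.strftime('%Y_%m_%d_%H%M_%S')
outfile = output / 'logs' / f'log_{name}_{current_time}.yaml'
outfile.parent.mkdir(parents=True, exist_ok=True)
# safe_dump_all fails fast if a NumPy scalar slipped into the logs
with open(outfile, 'w') as stream:
    yaml.safe_dump_all(logs, stream)

# Read the logs back as a list; `log_records` avoids overwriting the feature matrix X
with open(outfile, 'r') as stream:
    log_records = list(yaml.safe_load_all(stream))

In [58]:
mods = []; stop_eras = []; corrs = []; sharpes = []; alphas = []
for record in log_records:
    mods.append(record['MODEL']['mod'])
    stop_eras.append(record['CV']['TRAIN_STOP'])
    alphas.append(record['MODEL']['alpha'])
    corrs.append(record['RESULTS']['mean_corr'])
    sharpes.append(record['RESULTS']['mean_sharpe'])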
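The training loop calls `.item()` on `alpha` before storing it. This matters because `yaml.dump_all` with the default Dumper writes NumPy scalars using Python-specific tags that `yaml.SafeLoader` will not construct, and `yaml.safe_dump_all` refuses them outright. Any NumPy scalar that reaches the config (for example `train_stop`, which `np.arange` yields as `np.int64`) has the same problem. One way to guard against it is a recursive converter applied to each record before dumping; `to_builtin` is a hypothetical helper, not part of the project's utilities:

```python
import numpy as np


def to_builtin(obj):
    """Recursively replace NumPy scalars/arrays in nested dicts and lists with plain Python types."""
    if isinstance(obj, dict):
        return {key: to_builtin(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_builtin(value) for value in obj]
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, np.generic):  # np.float64, np.int64, np.bool_, ...
        return obj.item()
    return obj


record = {"MODEL": {"alpha": np.float64(0.0001), "mod": "Lasso"},
          "CV": {"TRAIN_STOP": np.int64(31)}}
clean = to_builtin(record)
print(type(clean["MODEL"]["alpha"]).__name__, type(clean["CV"]["TRAIN_STOP"]).__name__)  # float int
```

In the loop, this would amount to appending `to_builtin(copy.deepcopy(cfg.config))` to `logs`.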

In [65]:
results_summary = pd.DataFrame({'model': mods,
                               'alpha': alphas,
                               'Mean Corr': corrs,
                               'Mean Sharpe': sharpes})
results_summary.sort_values(['model', 'alpha'])

|   | model | alpha | Mean Corr | Mean Sharpe |
|---|---|---|---|---|
| 0 | Lasso | 0.000325 | 0.007055 | 0.146216 |
| 2 | Lasso | 0.00055 | 0.008793 | 0.184469 |
| 4 | Lasso | 0.000775 | 0.007247 | 0.153249 |
| 6 | Lasso | 0.001 | 0.005433 | 0.117872 |
| 1 | Ridge | 0.000325 | 0.005279 | 0.132402 |
| 3 | Ridge | 0.00055 | 0.005279 | 0.132402 |
| 5 | Ridge | 0.000775 | 0.005279 | 0.132402 |
| 7 | Ridge | 0.001 | 0.005279 | 0.132403 |
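The Mean Corr and Mean Sharpe columns come from `src.utils.metrics.val_corr` and `src.utils.metrics.sharpe`, whose implementations are not shown here. Assuming they follow the convention of Numerai's example scripts (a per-era correlation between rank-transformed predictions and the target, and a Sharpe ratio defined as the mean of those per-era scores divided by their standard deviation), a minimal sketch with hypothetical function names looks like this:

```python
import numpy as np
import pandas as pd


def per_era_corr(df: pd.DataFrame) -> pd.Series:
    """Per-era correlation between rank-transformed predictions and the target."""
    def score(group: pd.DataFrame) -> float:
        ranked = group["prediction"].rank(pct=True, method="first")
        return np.corrcoef(group["target"], ranked)[0, 1]
    return df.groupby("era")[["prediction", "target"]].apply(score)


def sharpe(era_scores: pd.Series) -> float:
    """Mean of the per-era scores divided by their (sample) standard deviation."""
    return era_scores.mean() / era_scores.std()


eval_df = pd.DataFrame({
    "era": ["era1"] * 4 + ["era2"] * 4 + ["era3"] * 4,
    "prediction": [0.1, 0.2, 0.3, 0.4, 0.4, 0.3, 0.2, 0.1, 0.1, 0.3, 0.2, 0.4],
    "target": [0.0, 0.25, 0.5, 0.75] * 3,
})
scores = per_era_corr(eval_df)  # era1: 1.0, era2: -1.0, era3: 0.8
```

Scoring era by era means the Sharpe ratio rewards consistency across eras, not just a high pooled correlation.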


In [76]:
era_split.eras.unique()

array(['era1', 'era2', 'era3', 'era4', 'era5', 'era6', 'era7', 'era8',
       'era9', 'era10', 'era11', 'era12', 'era13', 'era14', 'era15',
       'era16', 'era17', 'era18', 'era19', 'era20', 'era21', 'era22',
       'era23', 'era24', 'era25', 'era26', 'era27', 'era28', 'era29',
       'era30', 'era31', 'era32', 'era33', 'era34', 'era35', 'era36',
       'era37', 'era38', 'era39', 'era40', 'era41', 'era42', 'era43',
       'era44', 'era45', 'era46', 'era47', 'era48', 'era49', 'era50',
       'era51', 'era52', 'era53', 'era54', 'era55', 'era56', 'era57',
       'era58', 'era59', 'era60', 'era61', 'era62', 'era63', 'era64',
       'era65', 'era66', 'era67', 'era68', 'era69', 'era70', 'era71',
       'era72', 'era73', 'era74', 'era75', 'era76', 'era77', 'era78',
       'era79', 'era80', 'era81', 'era82', 'era83', 'era84', 'era85',
       'era86', 'era87', 'era88', 'era89', 'era90', 'era91', 'era92',
       'era93', 'era94', 'era95', 'era96', 'era97', 'era98', 'era99',
       'era100', 'er