## First Numerai Submission

This notebook follows the example script listed [here](https://github.com/numerai/example-scripts/blob/master/example_model.py).

In [1]:
# import dependencies
import pandas as pd
from lightgbm import LGBMRegressor
import gc
import json
from pathlib import Path

from numerapi import NumerAPI

## 1. Download Tournament Items and Data

In [2]:
# setup api
napi = NumerAPI()

# get current round details
current_round = napi.get_current_round()
print(f'Current round: {current_round}')

Current round: 411


Tournament data changes every week, so we include the round number in the live file's name. Training and validation data change only periodically, so there is no need to download them every time.

In [3]:
# download datasets
print('Downloading dataset files...')
Path("./v4.1").mkdir(parents=False, exist_ok=True)

napi.download_dataset("v4.1/train.parquet")
napi.download_dataset("v4.1/validation.parquet")
napi.download_dataset("v4.1/live.parquet", f"v4.1/live_{current_round}.parquet")
napi.download_dataset("v4.1/validation_example_preds.parquet")
napi.download_dataset("v4.1/features.json")

napi.download_dataset("v4.1/live_example_preds.parquet")
napi.download_dataset("v4.1/meta_model.parquet")

Downloading dataset files...


2023-02-01 13:07:30,441 INFO numerapi.utils: target file already exists
2023-02-01 13:07:30,443 INFO numerapi.utils: download complete
2023-02-01 13:07:31,402 INFO numerapi.utils: target file already exists
2023-02-01 13:07:31,404 INFO numerapi.utils: download complete
2023-02-01 13:07:32,315 INFO numerapi.utils: target file already exists
2023-02-01 13:07:32,318 INFO numerapi.utils: download complete
2023-02-01 13:07:33,209 INFO numerapi.utils: target file already exists
2023-02-01 13:07:33,211 INFO numerapi.utils: download complete
2023-02-01 13:07:34,044 INFO numerapi.utils: target file already exists
2023-02-01 13:07:34,046 INFO numerapi.utils: download complete
2023-02-01 13:07:34,879 INFO numerapi.utils: target file already exists
2023-02-01 13:07:34,881 INFO numerapi.utils: download complete
2023-02-01 13:07:35,731 INFO numerapi.utils: target file already exists
2023-02-01 13:07:35,734 INFO numerapi.utils: download complete
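
The log above shows that numerapi already skips files that exist on disk. Since training and validation data change only periodically, an explicit check can make that intent visible and avoid calling the client at all. The following is a minimal sketch, not part of the example script: `download_if_missing` is a hypothetical helper, and it assumes the downloader accepts `(filename, dest_path)` positionally, as `napi.download_dataset` is called in the cell above.

```python
from pathlib import Path


def download_if_missing(download, filename, dest_path=None):
    """Call download(filename, dest_path) only when the target is not on disk.

    `download` is any callable taking (filename, dest_path); this helper is
    hypothetical and not part of numerapi.
    Returns True if a download was triggered, False if the file was present.
    """
    target = Path(dest_path or filename)
    if target.exists():
        return False
    # make sure the destination folder exists before downloading
    target.parent.mkdir(parents=True, exist_ok=True)
    download(filename, str(target))
    return True
```

With the client from earlier, a call would look like `download_if_missing(napi.download_dataset, "v4.1/train.parquet")`. The live file is already named per round, so the same check downloads it again only when the round changes.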


## 2. Read Training Data

In [4]:
# load the feature metadata, then choose which features to read:
# a predefined feature set (small or medium) or all features
with open('v4.1/features.json', 'r') as f:
    feature_metadata = json.load(f)

# all_features = list(feature_metadata["feature_stats"].keys()) 
# small_feature_set = feature_metadata["feature_sets"]["small"]
# medium_feature_set = feature_metadata["feature_sets"]["medium"]
features = feature_metadata['feature_sets']['medium']

# read only those features plus the era, data_type, and target columns
read_columns = features + ['era', 'data_type', 'target_nomi_v4_20']
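
To make the choice of feature set easier to compare, the column list can be built by a small function and the set sizes inspected first. This is only a sketch: `toy_metadata` is a made-up stand-in for the parsed `features.json` (the feature names are invented), and `build_read_columns` is a hypothetical helper that mirrors the cell above.

```python
# Toy stand-in for the parsed features.json; the feature names are invented.
toy_metadata = {
    "feature_sets": {
        "small": ["feature_a", "feature_b"],
        "medium": ["feature_a", "feature_b", "feature_c", "feature_d"],
    }
}


def build_read_columns(metadata, feature_set, target="target_nomi_v4_20"):
    """Return the chosen feature columns plus era, data_type, and the target."""
    features = list(metadata["feature_sets"][feature_set])
    return features + ["era", "data_type", target]


# number of features in each predefined set
set_sizes = {name: len(cols) for name, cols in toy_metadata["feature_sets"].items()}
print(set_sizes)

toy_read_columns = build_read_columns(toy_metadata, "medium")
print(toy_read_columns)
```

Passing the real `feature_metadata` instead of `toy_metadata` would give the same `read_columns` as the cell above for the medium set.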

## 3. Read Data from Downloaded Parquet Files

Reading the downloaded data sometimes raises an error about invalid Parquet magic bytes. If this occurs, delete the file and rerun `napi.download_dataset` to replace the corrupted file.
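
A cheap way to spot such a file before reading it is to check the magic bytes directly: a Parquet file begins and ends with the 4-byte marker `PAR1`. The sketch below is an assumption-laden illustration, not part of the example script. `has_parquet_magic` and `remove_if_corrupt` are hypothetical helpers, and passing the check does not prove the file is fully valid.

```python
import os
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # 4-byte marker at the start and end of a Parquet file


def has_parquet_magic(path):
    """Return True if the file starts and ends with the Parquet magic bytes."""
    path = Path(path)
    # the file must be large enough to hold both markers
    if not path.is_file() or path.stat().st_size < 2 * len(PARQUET_MAGIC):
        return False
    with path.open("rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def remove_if_corrupt(path):
    """Delete the file when the magic-byte check fails; return True if deleted."""
    path = Path(path)
    if path.is_file() and not has_parquet_magic(path):
        path.unlink()
        return True
    return False
```

If `remove_if_corrupt('v4.1/train.parquet')` returns `True`, rerunning the matching `napi.download_dataset` call fetches a fresh copy.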

In [7]:
# read data from parquet files
training_data = pd.read_parquet('v4.1/train.parquet', columns=read_columns)
validation_data = pd.read_parquet('v4.1/validation.parquet', columns=read_columns)
live_data = pd.read_parquet(f'v4.1/live_{current_round}.parquet', columns=read_columns)

## 4. Get Per Era Correlations