In [16]:
# hide
from nbdev.showdoc import *

# Load model from Weights & Biases (wandb)

This tutorial is for people who use [Weights & Biases (wandb)](https://wandb.ai/site) in their training pipeline and want a convenient way to take models saved on W&B cloud and use them to make predictions, evaluate, and submit in a few lines of code.

To authenticate your W&B account, you have several options:
1. Run `wandb login` in a terminal and follow the instructions.
2. Set the global environment variable `WANDB_API_KEY`.
3. Run `wandb.init(project=PROJECT_NAME, entity=ENTITY_NAME)` and paste the API key from [https://wandb.ai/authorize](https://wandb.ai/authorize).
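As a rough illustration of the first two options above, here is a hypothetical helper (`find_wandb_credentials` is not part of `numerai_blocks` or `wandb`) that reports where an API key would come from. It assumes that `wandb login` stores the key in a netrc file under the host `api.wandb.ai`; the real lookup is done by the `wandb` library itself.

```python
import netrc
import os
import tempfile
from pathlib import Path
from typing import Optional


def find_wandb_credentials(netrc_path: Optional[str] = None) -> Optional[str]:
    """Return "env" or "netrc" depending on where a W&B key is found, else None.

    Hypothetical helper: checks the WANDB_API_KEY environment variable first,
    then a netrc file (assumed to be where `wandb login` writes the key).
    """
    if os.environ.get("WANDB_API_KEY"):
        return "env"
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    if not path.exists():
        return None
    try:
        auth = netrc.netrc(str(path)).authenticators("api.wandb.ai")
    except (netrc.NetrcParseError, OSError):
        return None
    return "netrc" if auth else None


# Example: a throwaway netrc file standing in for the one `wandb login` writes.
with tempfile.TemporaryDirectory() as tmp:
    fake_netrc = Path(tmp) / "netrc"
    fake_netrc.write_text("machine api.wandb.ai\n  login user\n  password not-a-real-key\n")
    # Prints "env" if WANDB_API_KEY is already set, otherwise "netrc".
    print(find_wandb_credentials(str(fake_netrc)))
```

Checking the environment variable first mirrors option 2 taking effect without any interactive step.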

Currently, only Keras models (`.h5`) are supported for wandb loading in this framework. Future versions will add support for other formats, such as PyTorch.

The first thing we do is download the current validation data and example predictions to evaluate against. This can be done in a few lines of code with `NumeraiClassicDownloader`.

In [17]:
import pandas as pd

from numerai_blocks.download import NumeraiClassicDownloader
from numerai_blocks.numerframe import create_numerframe
from numerai_blocks.model import WandbKerasModel
from numerai_blocks.evaluation import NumeraiClassicEvaluator

In [18]:
downloader = NumeraiClassicDownloader("wandb_keras_test")
# Path variables
val_file = "numerai_validation_data.parquet"
val_save_path = f"{str(downloader.dir)}/{val_file}"
# Download only validation parquet file
downloader.download_single_dataset(val_file,
                                   dest_path=val_save_path)
# Download example val preds
downloader.download_example_data()

# Initialize NumerFrame from parquet file path
dataf = create_numerframe(val_save_path)

# Add example preds to NumerFrame
example_preds = pd.read_parquet("wandb_keras_test/example_validation_predictions.parquet")
dataf['prediction_example'] = example_preds.values

2022-02-17 13:06:06,399 INFO numerapi.utils: starting download
wandb_keras_test/numerai_validation_data.parquet: 228MB [00:37, 6.09MB/s]                            


2022-02-17 13:06:45,643 INFO numerapi.utils: starting download
wandb_keras_test/example_predictions.parquet: 33.5MB [00:05, 5.82MB/s]                            


2022-02-17 13:06:52,976 INFO numerapi.utils: starting download
wandb_keras_test/example_validation_predictions.parquet: 13.0MB [00:02, 4.50MB/s]                            
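The cell above attaches the example predictions with `.values`, which relies on both files listing their rows in the same order. A minimal sketch of an index-aligned alternative, assuming (not confirmed here) that both frames are indexed by the same row id and that the predictions sit in a single column; the toy frames and the column name `prediction` are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the validation NumerFrame and the example predictions.
# The ids are deliberately in a different order in the two frames.
dataf = pd.DataFrame({"target": [0.25, 0.5, 0.75]}, index=["id_a", "id_b", "id_c"])
example_preds = pd.DataFrame({"prediction": [0.9, 0.1, 0.5]},
                             index=["id_c", "id_a", "id_b"])

# Positional values (like `.values`) ignore the index, so rows can silently mismatch.
positional = example_preds["prediction"].values

# Assigning a Series aligns on the index, so each id receives its own prediction.
dataf["prediction_example"] = example_preds["prediction"]

# Guard against ids that exist in only one of the two files.
assert dataf["prediction_example"].notna().all(), "some ids have no example prediction"
print(dataf)
```

If the two files are guaranteed to share row order, both approaches give the same result; index alignment simply makes that assumption unnecessary.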


--------------------------------------------------------------------

`WandbKerasModel` automatically downloads and loads a `.h5` model file from a specified wandb run. The path of a run, structured as `entity/project/run_id`, is shown in the ["Overview" tab](https://docs.wandb.ai/ref/app/pages/run-page#overview-tab) of the run.
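A run path has three slash-separated parts. The sketch below splits and validates one; `parse_run_path` and `RunPath` are hypothetical names for illustration, and `WandbKerasModel` may validate the path differently.

```python
from typing import NamedTuple


class RunPath(NamedTuple):
    entity: str
    project: str
    run_id: str


def parse_run_path(run_path: str) -> RunPath:
    """Split a W&B run path of the form entity/project/run_id (hypothetical helper)."""
    parts = run_path.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected 'entity/project/run_id', got {run_path!r}")
    return RunPath(*parts)


print(parse_run_path("crowdcent/cc-numerai-classic/h4pwuxwu"))
# RunPath(entity='crowdcent', project='cc-numerai-classic', run_id='h4pwuxwu')
```

Note that a full browser URL such as `https://wandb.ai/<entity>/<project>/runs/<run_id>` is not a run path and is rejected by this sketch.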

The default file name for the best model in a run is `model-best.h5`. If you saved the model you want to use under a different name, specify `file_name` when initializing `WandbKerasModel`.

The model is downloaded to your current working directory. You will be warned if this directory already contains a model with the same file name. If that file may be overwritten, specify `replace=True`.
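The download behaviour described above (default `file_name`, a warning on name clashes, overwriting only with `replace=True`) can be sketched as follows. `plan_model_download` is a hypothetical helper, not the library's code; in particular, whether `WandbKerasModel` skips the download or stops after warning is an assumption here.

```python
import tempfile
import warnings
from pathlib import Path
from typing import Tuple


def plan_model_download(file_name: str = "model-best.h5", replace: bool = False,
                        directory: str = ".") -> Tuple[Path, bool]:
    """Return the target path and whether a download should proceed (hypothetical)."""
    target = Path(directory) / file_name
    if target.exists() and not replace:
        # Existing file with the same name: warn and keep it untouched.
        warnings.warn(f"{target} already exists; pass replace=True to overwrite it.")
        return target, False
    return target, True


with tempfile.TemporaryDirectory() as tmp:
    print(plan_model_download(directory=tmp)[1])                # True: nothing to overwrite
    (Path(tmp) / "model-best.h5").touch()
    print(plan_model_download(directory=tmp)[1])                # False, with a warning
    print(plan_model_download(replace=True, directory=tmp)[1])  # True: overwrite allowed
```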

`combine_preds=True` averages all prediction columns into a single column, which is useful if you have trained a multi-target model.
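A minimal sketch of averaging along the column axis, assuming the model returns a 2-D array with one column per target; `combine_predictions` and the numbers are illustrative, not taken from `WandbKerasModel`.

```python
import numpy as np


def combine_predictions(preds) -> np.ndarray:
    """Average a (rows, targets) prediction array into one value per row (hypothetical)."""
    preds = np.asarray(preds)
    if preds.ndim == 1:
        # Already a single prediction per row: nothing to combine.
        return preds
    return preds.mean(axis=1)


# Made-up output of a 3-target model for 3 rows.
raw_preds = np.array([
    [0.2, 0.4, 0.6],
    [0.5, 0.5, 0.5],
    [0.9, 0.7, 0.8],
])
print(combine_predictions(raw_preds))  # [0.4 0.5 0.8]
```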

`autoencoder_mlp=True` is for models whose [architecture includes an autoencoder](https://forum.numer.ai/t/autoencoder-and-multitask-mlp-on-new-dataset-from-kaggle-jane-street/4338), so that the output is a tuple of 3 tensors. In this case, `WandbKerasModel` takes the third element of the tuple (the target predictions).
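The selection step can be sketched with plain NumPy arrays standing in for the three tensors. `select_target_output` is a hypothetical helper, and the shapes of the first two outputs are made up; only "take the third element" comes from the description above.

```python
import numpy as np


def select_target_output(model_output, autoencoder_mlp: bool = False) -> np.ndarray:
    """Return the array holding target predictions (hypothetical helper).

    For an autoencoder + MLP model, the output is assumed to be a sequence of
    3 arrays whose third entry contains the target predictions.
    """
    if autoencoder_mlp:
        if len(model_output) != 3:
            raise ValueError(f"Expected 3 outputs, got {len(model_output)}")
        return np.asarray(model_output[2])
    return np.asarray(model_output)


# Fake outputs for 4 rows: reconstructed features, auxiliary head, target head.
fake_output = (np.zeros((4, 10)), np.zeros((4, 1)), np.full((4, 1), 0.5))
preds = select_target_output(fake_output, autoencoder_mlp=True)
print(preds.shape)  # (4, 1)
```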



In [19]:
run_path = "crowdcent/cc-numerai-classic/h4pwuxwu"
model = WandbKerasModel(run_path=run_path,
                        replace=True, combine_preds=True, autoencoder_mlp=True)

After initialization, you can generate predictions with one line. `.predict` takes a `NumerFrame` as input and returns a `NumerFrame` with a new prediction column, named in the format `prediction_{RUN_PATH}`.

In [20]:
dataf = model.predict(dataf)

In [21]:
dataf.prediction_cols

['prediction_example', 'prediction_crowdcent/cc-numerai-classic/h4pwuxwu']

In [22]:
main_pred_col = f"prediction_{run_path}"
main_pred_col

'prediction_crowdcent/cc-numerai-classic/h4pwuxwu'

We can now evaluate the model's predictions in two lines of code. We can also submit predictions to Numerai directly from this `NumerFrame`; see the educational notebook `submitting.ipynb` for more information.

In [23]:
evaluator = NumeraiClassicEvaluator()
val_stats = evaluator.full_evaluation(dataf=dataf,
                                      target_col="target",
                                      pred_cols=[main_pred_col,
                                                 "prediction_example"],
                                      example_col="prediction_example"
                                      )



The evaluator outputs a `pd.DataFrame` with most of the main validation metrics for Numerai. We welcome new ideas and metrics for Evaluators. See `nbs/07_evaluation.ipynb` in this repository for full Evaluator source code.

In [24]:
val_stats

| | target | mean | std | sharpe | max_drawdown | apy | mmc_mean | mmc_std | mmc_sharpe | corr_with_example_preds | max_feature_exposure | feature_neutral_mean | feature_neutral_std | feature_neutral_sharpe | tb200_mean | tb200_std | tb200_sharpe | tb500_mean | tb500_std | tb500_sharpe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| prediction_crowdcent/cc-numerai-classic/h4pwuxwu | target | 0.022801 | 0.029303 | 0.778087 | -0.111327 | 189.356908 | 0.006027 | 0.0146 | 0.675001 | 0.585234 | 0.306783 | 0.012025 | 0.015307 | 0.785587 | 0.054181 | 0.082155 | 0.659504 | 0.041855 | 0.054545 | 0.767353 |
| prediction_example | target | 0.025453 | 0.026586 | 0.957381 | -0.082849 | 228.846183 | -2.6e-05 | 0.000146 | 0.955276 | 0.999934 | 0.219134 | 0.017187 | 0.013747 | 1.250211 | 0.045748 | 0.058146 | 0.786766 | 0.041661 | 0.042485 | 0.980604 |
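The `sharpe` column above is consistent with `mean / std` (0.022801 / 0.029303 ≈ 0.778 for the W&B model, 0.025453 / 0.026586 ≈ 0.957 for the example predictions). A minimal sketch of that summary on synthetic data, assuming the common Numerai convention of correlating rank-transformed predictions with the target within each era. The column names `era` and `target` and the helpers `per_era_corr` and `summarize` are assumptions; the evaluator's actual implementation is in `nbs/07_evaluation.ipynb`.

```python
import numpy as np
import pandas as pd


def per_era_corr(df: pd.DataFrame, pred_col: str, target_col: str = "target",
                 era_col: str = "era") -> pd.Series:
    """Pearson correlation of rank-transformed predictions with the target, per era."""
    def _corr(era_df: pd.DataFrame) -> float:
        ranked = era_df[pred_col].rank(pct=True, method="first")
        return np.corrcoef(ranked, era_df[target_col])[0, 1]
    return df.groupby(era_col)[[pred_col, target_col]].apply(_corr)


def summarize(era_corrs: pd.Series) -> dict:
    """Mean, sample standard deviation and Sharpe (mean / std) of per-era correlations."""
    mean, std = era_corrs.mean(), era_corrs.std()
    return {"mean": mean, "std": std, "sharpe": mean / std}


# Synthetic data: 4 eras of 200 rows, predictions weakly related to the target.
rng = np.random.default_rng(0)
n_eras, rows_per_era = 4, 200
df = pd.DataFrame({
    "era": np.repeat([f"era{i}" for i in range(n_eras)], rows_per_era),
    "target": rng.choice([0.0, 0.25, 0.5, 0.75, 1.0], size=n_eras * rows_per_era),
})
df["prediction"] = 0.1 * df["target"] + rng.normal(size=len(df))

stats = summarize(per_era_corr(df, "prediction"))
print(stats)
```

Whether the evaluator uses the sample (`ddof=1`) or population standard deviation is not stated here; this sketch uses the pandas default (`ddof=1`).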


Once we are done, the downloaded files can optionally be removed with a single call on `NumeraiClassicDownloader`.

In [25]:
# Clean up environment
downloader.remove_base_directory()

We hope this tutorial clearly explained how to load Weights & Biases (wandb) models and make predictions with them.

Below you will find the full documentation for `WandbKerasModel` and a link to its source code:

In [26]:
# hide_input
from nbdev import show_doc
show_doc(WandbKerasModel)

<h2 id="WandbKerasModel" class="doc_header"><code>class</code> <code>WandbKerasModel</code><a href="https://github.com/crowdcent/numerai_blocks/tree/main/numerai_blocks/model.py#L162" class="source_link" style="float:right">[source]</a></h2>

> <code>WandbKerasModel</code>(**`run_path`**:`str`, **`file_name`**:`str`=*`'model-best.h5'`*, **`combine_preds`**=*`False`*, **`autoencoder_mlp`**=*`False`*, **`replace`**=*`False`*) :: [`SingleModel`](/numerai_blocks/model.html#SingleModel)

Download the best .h5 model from a Weights & Biases (W&B) run to the local directory and make predictions.
More info on W&B: https://wandb.ai/site
:param run_path: W&B path structured as entity/project/run_id.
Entity, project and run ID can be copied from the Overview tab of a W&B run.
For more info: https://docs.wandb.ai/ref/app/pages/run-page#overview-tab
:param file_name: Name of .h5 file as saved in W&B run.
'model-best.h5' by default.
File name can be found under files tab of W&B run.
:param combine_preds: Whether to average predictions along column axis.
Convenient when you want to predict the main target by averaging a multi-target model.
:param autoencoder_mlp: Whether your model is an autoencoder + MLP model.
In this case, the 3rd element of the output tuple is used. Only relevant for NN models.
More info on autoencoders:
https://forum.numer.ai/t/autoencoder-and-multitask-mlp-on-new-dataset-from-kaggle-jane-street/4338
:param replace: Replace any model file saved under the same file name with the downloaded W&B run model. WARNING: Setting this to True may overwrite models in your local environment.

To authenticate your W&B account, you have several options:
1. Run wandb login in a terminal and follow the instructions.
2. Configure the global environment variable "WANDB_API_KEY".
3. Run wandb.init(project=PROJECT_NAME, entity=ENTITY_NAME) and
pass the API key from https://wandb.ai/authorize