In [None]:
#| include: false
from nbdev.showdoc import *

This tutorial is for people who use the [Weights & Biases (wandb)](https://wandb.ai/site) `WandbCallback` in their training pipeline and want a convenient way to use models saved on the W&B cloud to make predictions, evaluate, and submit in a few lines of code.

Currently, only Keras models (`.h5`) can be loaded from wandb in this framework. Future versions will add support for other formats, such as PyTorch.

---------------------------------------------------------------------
## 0. Authentication

To authenticate your W&B account, choose one of the following options:
1. Run `wandb login` in a terminal and follow the instructions.
2. Set the global environment variable `WANDB_API_KEY`.
3. Run `wandb.init(project=PROJECT_NAME, entity=ENTITY_NAME)` and pass the API key from [https://wandb.ai/authorize](https://wandb.ai/authorize).
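
For the environment variable option, it can help to check that the key is actually visible to Python before starting a run. The following is a minimal sketch; the helper name `ensure_wandb_key` and the placeholder key are hypothetical and not part of wandb or numerblox.

```python
import os


def ensure_wandb_key(env=os.environ, var_name="WANDB_API_KEY"):
    """Return True if a non-empty API key is present in the given environment mapping."""
    key = env.get(var_name, "").strip()
    return bool(key)


# Placeholder value for illustration only; never hard-code a real key in a notebook.
demo_env = {"WANDB_API_KEY": "your-api-key-here"}
print(ensure_wandb_key(demo_env))  # key present in the demo mapping
print(ensure_wandb_key({}))        # no key set
```

Calling `ensure_wandb_key()` without arguments checks the real process environment.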

-----------------------------------------------------
## 1. Download validation data

The first thing we do is download the current validation data and example predictions to evaluate against. This can be done in a few lines of code with `NumeraiClassicDownloader`.

In [None]:
#| other
import pandas as pd

from numerblox.download import NumeraiClassicDownloader
from numerblox.numerframe import create_numerframe
from numerblox.model import WandbKerasModel
from numerblox.evaluation import NumeraiClassicEvaluator

In [None]:
#| other
downloader = NumeraiClassicDownloader("wandb_keras_test")
# Path variables
val_file = "v4.1/validation.parquet"
val_save_path = f"{str(downloader.dir)}/{val_file}"
# Download only validation parquet file
downloader.download_single_dataset(val_file,
                                   dest_path=val_save_path)
# Download example val preds
downloader.download_example_data()

# Initialize NumerFrame from parquet file path
dataf = create_numerframe(val_save_path)

# Add example preds to NumerFrame
# Note: `.values` assigns by position, so rows must be in the same order as `dataf`
example_preds = pd.read_parquet("wandb_keras_test/validation_example_preds.parquet")
dataf['prediction_example'] = example_preds.values

2023-01-04 20:20:28,273 INFO numerapi.utils: target file already exists
2023-01-04 20:20:28,274 INFO numerapi.utils: download complete


2023-01-04 20:20:28,917 INFO numerapi.utils: target file already exists
2023-01-04 20:20:28,918 INFO numerapi.utils: download complete


2023-01-04 20:20:29,554 INFO numerapi.utils: target file already exists
2023-01-04 20:20:29,555 INFO numerapi.utils: download complete
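
The cell above attaches the example predictions by position. If the row order of the two files is not guaranteed to match, aligning on the index is a safer alternative. The sketch below uses made-up toy data; it assumes both frames are indexed by row id and that the example predictions live in a column called `prediction`, neither of which is confirmed by this tutorial.

```python
import pandas as pd

# Toy stand-ins for the validation data and example predictions (made-up values).
dataf = pd.DataFrame(
    {"feature_a": [0.25, 0.5, 0.75]},
    index=["id_1", "id_2", "id_3"],
)
example_preds = pd.DataFrame(
    {"prediction": [0.9, 0.1, 0.5]},
    index=["id_3", "id_1", "id_2"],  # deliberately in a different order
)

# Positional assignment (.values) would misplace rows here.
# Reindexing lines the predictions up with the ids in `dataf` first.
dataf["prediction_example"] = example_preds["prediction"].reindex(dataf.index)

print(dataf)
```

Ids missing from the example predictions would show up as `NaN` instead of silently shifting rows.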


--------------------------------------------------------------------
## 2. Predict (WandbKerasModel)

`WandbKerasModel` automatically downloads and loads a `.h5` model file from a specified wandb run. The path for a run is shown in the ["Overview" tab](https://docs.wandb.ai/ref/app/pages/run-page#overview-tab) of the run.

- `file_name`: The default name for the best model in a run is `model-best.h5`. To use a model saved under a different name, pass `file_name` when initializing `WandbKerasModel`.


- `replace`: The model is downloaded to the directory you are working in. You will be warned if this directory already contains a model with the same filename. If such files may be overwritten, specify `replace=True`.


- `combine_preds`: Setting this to `True` averages all prediction columns, which is useful if you have trained a multi-target model.


- `autoencoder_mlp`: Use this argument when your [model architecture includes an autoencoder](https://forum.numer.ai/t/autoencoder-and-multitask-mlp-on-new-dataset-from-kaggle-jane-street/4338) and the output is therefore a tuple of 3 tensors. In this case, `WandbKerasModel` takes the third element of the tuple (the target predictions).
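
To make the last two options concrete, here is a rough NumPy sketch of what they describe: picking the third element of a 3-tuple output and averaging across target columns. The arrays and variable names are made up for illustration, and this is not the numerblox implementation.

```python
import numpy as np

# Made-up stand-in for a model output that is a tuple of 3 arrays (4 rows each).
# Only the third element holds the target predictions.
first_out = np.zeros((4, 3))
second_out = np.zeros((4, 2))
target_out = np.array([[0.2, 0.4], [0.6, 0.8], [0.1, 0.3], [0.5, 0.5]])
raw_output = (first_out, second_out, target_out)

# What autoencoder_mlp=True describes: keep only the third element of the tuple.
target_preds = raw_output[2]

# What combine_preds=True describes: average across target columns,
# giving one prediction per row.
combined = target_preds.mean(axis=1)

print(combined.shape)
```

With `combine_preds=False`, the analogous result would keep one column per target.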



In [None]:
#| other
run_path = "crowdcent/cc-numerai-classic/h4pwuxwu"
model = WandbKerasModel(run_path=run_path,
                        replace=True, combine_preds=True, autoencoder_mlp=True)

After initialization, you can generate predictions with one line. `.predict` takes a `NumerFrame` as input and returns a `NumerFrame` with a new prediction column. The prediction column name has the format `prediction_{RUN_PATH}`.

In [None]:
#| other
# dataf = model.predict(dataf)
# dataf.prediction_cols

In [None]:
#| other
main_pred_col = f"prediction_{run_path}"
main_pred_col

'prediction_crowdcent/cc-numerai-classic/h4pwuxwu'
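
A run path has the structure `entity/project/run_id`, so a malformed path can be caught early, before any download is attempted. The helper below is a hypothetical sketch (`RunPath` and `parse_run_path` are not part of numerblox or wandb):

```python
from typing import NamedTuple


class RunPath(NamedTuple):
    entity: str
    project: str
    run_id: str


def parse_run_path(run_path: str) -> RunPath:
    """Split 'entity/project/run_id' into its three parts, or raise ValueError."""
    parts = run_path.strip("/").split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"Expected 'entity/project/run_id', got {run_path!r}"
        )
    return RunPath(*parts)


parsed = parse_run_path("crowdcent/cc-numerai-classic/h4pwuxwu")
print(parsed.entity, parsed.project, parsed.run_id)
```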

After we are done, downloaded files can be removed with one call on `NumeraiClassicDownloader` (optional).

In [None]:
#| other
# Clean up environment
downloader.remove_base_directory()

------------------------------------------------------------------
We hope this tutorial clearly explained how to load Weights & Biases (wandb) models and make predictions with them.

Below you will find the full docs for `WandbKerasModel` and a link to the source code:

In [None]:
#| other
#| echo: false
show_doc(WandbKerasModel)

---

[source](https://github.com/crowdcent/numerblox/tree/master/blob/master/numerblox/model.py#LNone){target="_blank" style="float:right; font-size:smaller"}

### WandbKerasModel

>      WandbKerasModel (run_path:str, file_name:str='model-best.h5',
>                       combine_preds=False, autoencoder_mlp=False,
>                       replace=False, feature_cols:list=None)

Download the best .h5 model from a Weights & Biases (W&B) run to a local directory and make predictions.
More info on W&B: https://wandb.ai/site

:param run_path: W&B path structured as entity/project/run_id.
Can be copied from the Overview tab of a W&B run.
For more info: https://docs.wandb.ai/ref/app/pages/run-page#overview-tab 

:param file_name: Name of .h5 file as saved in W&B run.
'model-best.h5' by default.
File name can be found under files tab of W&B run. 

:param combine_preds: Whether to average predictions along column axis. Convenient when you want to predict the main target by averaging a multi-target model. 

:param autoencoder_mlp: Whether your model is an autoencoder + MLP model.
In this case, the third element of the output tuple is used. Only relevant for NN models.

More info on autoencoders:
https://forum.numer.ai/t/autoencoder-and-multitask-mlp-on-new-dataset-from-kaggle-jane-street/4338 

:param replace: Replace any model files saved under the same file name with downloaded W&B run model. WARNING: Setting to True may overwrite models in your local environment. 

:param feature_cols: optional list of features to use for prediction. Selects all feature columns (i.e. column names with prefix 'feature') by default.
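
The default behaviour described for `feature_cols` amounts to a prefix filter on column names. A minimal sketch of that idea, assuming a plain list of column names; `select_feature_cols` is a hypothetical helper, not the numerblox implementation:

```python
def select_feature_cols(columns, feature_cols=None):
    """Return the explicit feature list if given, else all columns prefixed 'feature'."""
    if feature_cols is not None:
        return list(feature_cols)
    return [col for col in columns if col.startswith("feature")]


# Made-up column names for illustration.
columns = ["era", "feature_a", "feature_b", "target", "prediction_example"]

print(select_feature_cols(columns))
print(select_feature_cols(columns, feature_cols=["feature_b"]))
```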