# Ensemble with NumerBay

In this notebook we build a simple ensemble predictions file for Numerai from predictions bought on NumerBay, using the `numerbay` Python client.

Averaging the predictions of several models tends to reduce variance and often improves accuracy, both of which are desirable in the Numerai tournaments.

Replace the credentials below with your own and set `products_to_ensemble` to the full names of the prediction products you have bought on NumerBay.
The products must be listed in "File" mode so that their files can be downloaded.

In [1]:
# Install the Python client if you have not already (uncomment the line below)

# !pip install numerbay

In [2]:
import pandas as pd
from numerbay import NumerBay

api = NumerBay(username="myusername", password="mypassword")
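If you would rather not type your password into a notebook cell, one option is to read the credentials from environment variables. The sketch below is only an illustration: the helper `load_numerbay_credentials` and the variable names `NUMERBAY_USERNAME` and `NUMERBAY_PASSWORD` are assumptions made for this example, not something the notebook above relies on.

```python
import os


def load_numerbay_credentials(env=os.environ):
    """Return username/password read from environment variables.

    The variable names are hypothetical and chosen for this sketch.
    """
    try:
        return {
            "username": env["NUMERBAY_USERNAME"],
            "password": env["NUMERBAY_PASSWORD"],
        }
    except KeyError as missing:
        # Fail early with a readable message instead of a bare KeyError
        raise RuntimeError(f"Missing environment variable: {missing}") from None


# Usage in the notebook, once both variables are set:
# api = NumerBay(**load_numerbay_credentials())
```

The returned keys match the `username` and `password` keyword arguments used in the cell above, so the dictionary can be unpacked straight into `NumerBay(...)`.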

## Download predictions from NumerBay

I bought two products: `numerai-predictions-numerbay` and `numerai-predictions-numerbay2`

In [3]:
products_to_ensemble = ["numerai-predictions-numerbay", "numerai-predictions-numerbay2"]

for product_name in products_to_ensemble:
    api.download_artifact(f"{product_name}.csv", product_full_name=product_name)

2022-01-09 07:26:56,369 INFO numerbay.utils: starting download
numerai-predictions-numerbay.csv: 47.6MB [00:03, 12.7MB/s]                            
2022-01-09 07:27:04,072 INFO numerbay.utils: starting download
numerai-predictions-numerbay2.csv: 35.3MB [00:03, 9.34MB/s]                            


## Read downloaded predictions

In [4]:
# Read each file, suffixing its prediction column with the product name
all_preds = [pd.read_csv(f"{name}.csv", index_col=0).add_suffix(f"_{name}") for name in products_to_ensemble]

# Align on the id index; ids missing from a file become NaN
concat_preds = pd.concat(all_preds, axis=1)

In [5]:
concat_preds

id,prediction_numerai-predictions-numerbay,prediction_numerai-predictions-numerbay2
n0003aa52cab36c2,0.48919,
n000920ed083903f,0.49109,
n0038e640522c4a6,0.53275,
n004ac94a87dc54b,0.50717,
n0052fe97ea0c05f,0.50383,
...,...,...
nffcf1b2f7ae1bcc,0.48928,0.06814
nffcf5878d59ce3a,0.48988,0.61441
nffdfeb228cda39f,0.49429,0.37674
nffe33484b9de099,0.51979,0.31049


The two prediction files are not aligned: the first is a legacy submission file and the second is a v2 submission file, so some ids appear in only one of them and show up as NaN in the other column.

We need to drop the rows containing NaNs so that only the ids present in both files remain.

In [6]:
concat_preds = concat_preds.dropna(how='any')

In [7]:
concat_preds

id,prediction_numerai-predictions-numerbay,prediction_numerai-predictions-numerbay2
n000101811a8a843,0.50016,0.03255
n001e1318d5072ac,0.50311,0.33838
n002a9c5ab785cbb,0.50197,0.48398
n002ccf6d0e8c5ad,0.51080,0.96371
n0051ab821295c29,0.48718,0.22107
...,...,...
nffcf1b2f7ae1bcc,0.48928,0.06814
nffcf5878d59ce3a,0.48988,0.61441
nffdfeb228cda39f,0.49429,0.37674
nffe33484b9de099,0.51979,0.31049
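The same alignment can also be done in one step by asking `pd.concat` for an inner join on the index. The sketch below uses two small made-up frames (`legacy` and `v2` are toy stand-ins, not the downloaded files) to show that both routes keep only the shared ids.

```python
import pandas as pd

# Toy stand-ins for two prediction files with partly overlapping ids
legacy = pd.DataFrame(
    {"prediction_a": [0.49, 0.51, 0.50]},
    index=pd.Index(["n1", "n2", "n3"], name="id"),
)
v2 = pd.DataFrame(
    {"prediction_b": [0.10, 0.90, 0.40]},
    index=pd.Index(["n2", "n3", "n4"], name="id"),
)

# Route 1: outer concat (the default), then drop rows with any NaN
outer_then_drop = pd.concat([legacy, v2], axis=1).dropna(how="any")

# Route 2: keep only ids present in every frame while concatenating
inner = pd.concat([legacy, v2], axis=1, join="inner")

print(inner)
```

One difference to keep in mind: `dropna` also removes rows where a file itself contains a NaN prediction, whereas the inner join only filters on which ids are present.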


## Ensemble by simple average

For demonstration purposes we use a simple average here. You can of course try other methods, such as averaging ranked predictions.

In [8]:
ensemble_preds = concat_preds.mean(axis=1).rename('prediction').to_frame()

In [9]:
ensemble_preds

id,prediction
n000101811a8a843,0.266355
n001e1318d5072ac,0.420745
n002a9c5ab785cbb,0.492975
n002ccf6d0e8c5ad,0.737255
n0051ab821295c29,0.354125
...,...
nffcf1b2f7ae1bcc,0.278710
nffcf5878d59ce3a,0.552145
nffdfeb228cda39f,0.435515
nffe33484b9de099,0.415140
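In the output above, the first product's predictions sit close to 0.5 while the second product's span most of the range between 0 and 1, so a plain mean is driven mostly by the second file. Rank averaging is one way to put both on the same scale first. The sketch below runs on a small made-up frame (`preds`, `model_a` and `model_b` are illustrative names); in the notebook you would apply the same two steps to `concat_preds`.

```python
import pandas as pd

# Toy predictions on very different scales, as in the two files above
preds = pd.DataFrame(
    {
        "model_a": [0.50, 0.51, 0.49, 0.52],
        "model_b": [0.96, 0.03, 0.22, 0.34],
    },
    index=pd.Index(["n1", "n2", "n3", "n4"], name="id"),
)

# Convert each column to percentile ranks so both models share a scale
ranked = preds.rank(pct=True)

# Average the ranks row by row and name the result like a submission column
rank_avg = ranked.mean(axis=1).rename("prediction").to_frame()

print(rank_avg)
```

Because only the ordering within each column matters after ranking, a model with a narrow output range contributes as much to the ensemble as one with a wide range.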


In [10]:
ensemble_preds.to_csv('ensemble.csv')
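Before uploading the file, it can be worth running a few sanity checks on the ensemble. The helper below is a minimal sketch: `check_submission` is a hypothetical function written for this example, and the rules it enforces (a single `prediction` column, unique ids, no NaNs, values between 0 and 1) are my assumptions about a well-formed file rather than an official specification.

```python
import pandas as pd


def check_submission(df: pd.DataFrame) -> None:
    """Raise ValueError if the frame fails basic (assumed) sanity checks."""
    if list(df.columns) != ["prediction"]:
        raise ValueError("expected a single 'prediction' column")
    if not df.index.is_unique:
        raise ValueError("duplicate ids in the index")
    if df["prediction"].isna().any():
        raise ValueError("NaN predictions found")
    if not df["prediction"].between(0, 1).all():
        raise ValueError("predictions outside the range [0, 1]")


# Example on a tiny made-up frame; this one passes silently
example = pd.DataFrame(
    {"prediction": [0.27, 0.42, 0.74]},
    index=pd.Index(["n1", "n2", "n3"], name="id"),
)
check_submission(example)
```

In the notebook, calling `check_submission(ensemble_preds)` just before `to_csv` would catch these problems before the file is written.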
