In [1]:
# hide
%load_ext autoreload
%autoreload 2

In [2]:
# hide
from numerai_blocks.download import NumeraiClassicDownloader
from numerai_blocks.numerframe import create_numerframe
from numerai_blocks.postprocessing import FeatureNeutralizer
from numerai_blocks.model import SingleModel
from numerai_blocks.model_pipeline import ModelPipeline
from numerai_blocks.key import load_key_from_json
from numerai_blocks.submission import NumeraiClassicSubmittor

# Numerai Blocks

> Tools for solid Numerai pipelines

## 1. Install

`pip install`

## 2. How to use

### 2.1. Contents

Example and educational notebooks can be found in the `edu_nbs` directory. Development notebooks are in the `nbs` directory.

The library features the following tools to build your Numerai pipelines:

1. `download`
2. `numerframe`
3. `preprocessing`
4. `model`
5. `postprocessing`
6. `ModelPipeline` (and `ModelPipelineCollection`)
7. `evaluation`
8. `Key` (containing authentication info)
9. `NumeraiClassicSubmittor` and `NumeraiSignalsSubmittor`
10. `staking`
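
As a rough mental model of how these tools fit together, a pipeline can be thought of as a chain of steps that each take a dataframe and return a dataframe. The sketch below is not the library's `ModelPipeline` implementation. `run_pipeline`, `mean_model` and `rank_scaler` are hypothetical names used only for illustration, and the preprocess, predict, postprocess ordering is an assumption based on the argument names.

```python
from typing import Callable, Sequence

import pandas as pd

# A step is any callable that takes a DataFrame and returns a DataFrame.
Step = Callable[[pd.DataFrame], pd.DataFrame]


def run_pipeline(dataf: pd.DataFrame,
                 preprocessors: Sequence[Step],
                 models: Sequence[Step],
                 postprocessors: Sequence[Step]) -> pd.DataFrame:
    """Apply all steps in order: preprocessors, then models, then postprocessors."""
    for step in [*preprocessors, *models, *postprocessors]:
        dataf = step(dataf)
    return dataf


# Hypothetical stand-in for a model: predicts the mean of all feature columns.
def mean_model(dataf: pd.DataFrame) -> pd.DataFrame:
    feature_cols = [c for c in dataf.columns if c.startswith("feature_")]
    return dataf.assign(prediction_demo=dataf[feature_cols].mean(axis=1))


# Hypothetical stand-in for a postprocessor: converts predictions to percentile ranks.
def rank_scaler(dataf: pd.DataFrame) -> pd.DataFrame:
    return dataf.assign(prediction_demo=dataf["prediction_demo"].rank(pct=True))


toy = pd.DataFrame({"feature_a": [0.0, 0.5, 1.0],
                    "feature_b": [1.0, 0.5, 0.25]})
result = run_pipeline(toy, preprocessors=[], models=[mean_model],
                      postprocessors=[rank_scaler])
print(result)
```

Keeping every step to the same "dataframe in, dataframe out" shape is what makes the stages interchangeable in a sketch like this.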

### 2.2. Examples

Below we illustrate a common use case for inference pipelines. For a more in-depth look at the features of this library, check out the notebooks in the `edu_nbs` directory.

#### 2.2.1. Numerai Classic

In [1]:
#other
#hide_output

# --- 1. Download version 2 data ---
downloader = NumeraiClassicDownloader("data")
downloader.download_inference_data("current_round")

# --- 2. Initialize NumerFrame ---
metadata = {"version": 2,
            "joblib_model_name": "test",
            "joblib_model_path": "test_assets/joblib_v2_example_model.joblib",
            "numerai_model_name": "test_model1",
            "key_path": "test_assets/test_credentials.json"
            }
dataf = create_numerframe(file_path="data/current_round/numerai_tournament_data.parquet",
                          metadata=metadata)

# --- 3. Define and run pipeline ---
model1 = SingleModel(dataf.meta.joblib_model_path,
                     model_name=dataf.meta.joblib_model_name)
# No preprocessing and 0.5 feature neutralization
pipeline = ModelPipeline(preprocessors=[],
                         models=[model1],
                         postprocessors=[FeatureNeutralizer(
                             pred_name=f"prediction_{dataf.meta.joblib_model_name}",
                             proportion=0.5
                         )]
                         )
dataset = pipeline(dataf)

# --- 4. Submit ---
# Load credentials from JSON (random placeholder credentials in this example)
key = load_key_from_json(dataf.meta.key_path)
submittor = NumeraiClassicSubmittor(directory_path="sub_current_round", key=key)
# Only works with valid key credentials
submittor.full_submission(dataf=dataset,
                          cols=f"prediction_{dataf.meta.joblib_model_name}_neutralized_0.5",
                          file_name=f"{dataf.meta.numerai_model_name}.csv",
                          model_name=dataf.meta.numerai_model_name,
                          version=dataf.meta.version
                          )

# --- 5. Clean up environment (optional) ---
downloader.remove_base_directory()
submittor.remove_base_directory()
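
To give some intuition for what a 0.5 feature neutralization step does, here is a minimal NumPy sketch of the commonly used linear neutralization: project the predictions onto the features and subtract a proportion of that projection. This is an illustration under stated assumptions, not necessarily how `FeatureNeutralizer` is implemented. The `neutralize` helper and the toy data are hypothetical, and the sketch ignores per-era grouping.

```python
import numpy as np
import pandas as pd


def neutralize(dataf: pd.DataFrame, pred_name: str, feature_names: list,
               proportion: float = 0.5) -> pd.Series:
    """Remove `proportion` of the linear feature exposure from a prediction column."""
    scores = dataf[pred_name].to_numpy(dtype=float)
    exposures = dataf[feature_names].to_numpy(dtype=float)
    # Least-squares projection of the predictions onto the feature space.
    projection = exposures @ (np.linalg.pinv(exposures) @ scores)
    neutralized = scores - proportion * projection
    # Rescale so the result has unit (population) standard deviation.
    return pd.Series(neutralized / neutralized.std(), index=dataf.index)


# Toy data (assumption): predictions that lean heavily on one feature.
rng = np.random.default_rng(0)
feature_names = ["feature_a", "feature_b", "feature_c"]
toy = pd.DataFrame(rng.normal(size=(200, 3)), columns=feature_names)
toy["prediction_test"] = toy["feature_a"] + 0.1 * rng.normal(size=200)

col = "prediction_test_neutralized_0.5"
toy[col] = neutralize(toy, "prediction_test", feature_names, proportion=0.5)

# Correlation with feature_a before and after neutralization.
print(toy[["prediction_test", col]].corrwith(toy["feature_a"]))
```

With `proportion=1.0` the result has no remaining linear exposure to the given features, while `proportion=0.5` removes half of it. The output column name follows the `prediction_{model_name}_neutralized_{proportion}` pattern used in the submission step.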

In [2]:
# hide_input
from rich.console import Console
from rich.tree import Tree

console = Console(record=True, width=100)

tree = Tree(":computer: Structure before starting", guide_style="bold bright_black")
model_tree = tree.add(":file_folder: test_assets")
model_tree.add(":page_facing_up: joblib_v2_example_model.joblib")
model_tree.add(":page_facing_up: test_credentials.json")

console.print(tree)

tree2 = Tree(":computer: Structure after submitting", guide_style="bold bright_black")
data_tree = tree2.add(":file_folder: data")
current_tree = data_tree.add(":file_folder: current_round")
current_tree.add(":page_facing_up: numerai_tournament_data.parquet")
sub_tree = tree2.add(":file_folder: sub_current_round")
sub_tree.add(":page_facing_up: test_model1.csv")
model_tree = tree2.add(":file_folder: test_assets")
model_tree.add(":page_facing_up: joblib_v2_example_model.joblib")
model_tree.add(":page_facing_up: test_credentials.json")

console.print(tree2)

## Contributing

Below are a few guidelines to smooth out development of `numerai_blocks`.

Thanks a lot for wanting to help us out with this project! We use a project setup called [nbdev](https://nbdev.fast.ai/) to develop code, documentation and tests together within Jupyter notebooks. If you only use the library, you don't have to worry about this: just pip install and you are good to go!

If you are thinking of contributing and are not familiar with nbdev, it may take some time to learn nbdev development. We are happy to help out and point you to documentation or videos to learn more.

If you are interested in the full scope of what nbdev has to offer, check out this tutorial with Jeremy Howard:
[https://youtu.be/Hrs7iEYmRmg](https://youtu.be/Hrs7iEYmRmg)

To learn more about the rationale behind nbdev and why we use it:
[https://youtu.be/9Q6sLbz37gk](https://youtu.be/9Q6sLbz37gk)

For an nbdev live coding example with Hamel Husain:
[https://youtu.be/ZJTop5uqC2U](https://youtu.be/ZJTop5uqC2U)



### Bugs / Issues / Enhancements

Even though most of the components in this library are tested, the project is still in an early stage of development. If you discover bugs or other issues, or have ideas for enhancements, do not hesitate to open a GitHub issue. In the issue, describe what code was run, on what machine, and any relevant background. Add stack traces and screenshots if they are relevant for solving the issue. Also, please add appropriate labels to the GitHub issue.

### Contributing Code

There are a few small things you should do before contributing code to this project. After you clone the repository, please run `nbdev_install_git_hooks` in your terminal. This sets up git hooks that clean the notebooks by removing extraneous metadata stored in them (e.g. which cells you ran). This avoids unnecessary merge conflicts.

Before pushing code to the branch you are working in, be sure to run `nbdev_build_lib` and `nbdev_build_docs` so all code is synced.



### Branch structure


Every new feature should be implemented in a branch that branches from `dev` and follows the naming convention `feature/{FEATURE_DESCRIPTION}`. Explicit bugfixes should be named `bugfix/{FIX_DESCRIPTION}`. An example structure is given below.

In [3]:
# hide_input
console = Console(record=True, width=100)

tree = Tree("Branch structure", guide_style="bold bright_black")

main_tree = tree.add("📦 main (release)", guide_style="bright_black")
dev_tree = main_tree.add("👨‍💻 dev")
dev_tree.add(":sparkles: feature/ta-signals-features")
dev_tree.add(":sparkles: feature/news-api-downloader")
dev_tree.add(":sparkles: feature/staking-portfolio-management")
dev_tree.add(":sparkles: bugfix/evaluator-metrics-fix")

console.print(tree)
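
The naming convention can also be checked mechanically, for example in a local pre-push script. The helper below is a hypothetical sketch and not part of the repository. The assumption that descriptions are lowercase words joined by hyphens is inferred from the example branch names.

```python
import re

# Assumed pattern: "feature/" or "bugfix/" followed by lowercase
# alphanumeric words separated by single hyphens.
BRANCH_PATTERN = re.compile(r"(feature|bugfix)/[a-z0-9]+(-[a-z0-9]+)*")


def is_valid_branch_name(name: str) -> bool:
    """Return True if `name` follows the feature/ or bugfix/ convention."""
    return BRANCH_PATTERN.fullmatch(name) is not None


for branch in ["feature/news-api-downloader",
               "bugfix/evaluator-metrics-fix",
               "my-new-idea"]:
    print(branch, is_valid_branch_name(branch))
```

The long-lived branches `main` and `dev` are deliberately not matched, since the convention only covers feature and bugfix branches.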

In [4]:
# hide
# Run this cell to sync all changes with library
from nbdev.export import notebook2script

notebook2script()

Converted 00_misc.ipynb.
Converted 01_download.ipynb.
Converted 02_numerframe.ipynb.
Converted 03_preprocessing.ipynb.
Converted 04_model.ipynb.
Converted 05_postprocessing.ipynb.
Converted 06_modelpipeline.ipynb.
Converted 07_evaluation.ipynb.
Converted 08_key.ipynb.
Converted 09_submission.ipynb.
Converted 10_staking.ipynb.
Converted index.ipynb.
