This tutorial introduces the core functionality of `NumerFrame` and shows how it can make Numerai-specific applications easier to implement.

In [1]:
from numerblox.numerframe import create_numerframe, NumerFrame
from numerblox.download import NumeraiClassicDownloader


First, we download the live data using `NumeraiClassicDownloader`.

In [2]:
downloader = NumeraiClassicDownloader("numerframe_edu")
# Path variables
live_file = "v5.0/live.parquet"
live_save_path = f"{str(downloader.dir)}/{live_file}"
# Download only the live parquet file
downloader.download_single_dataset(live_file,
                                   dest_path=live_save_path)

No existing directory found at 'numerframe_edu'. Creating directory...
Downloading 'v5.0/live.parquet'.


2024-09-14 13:23:38,257 INFO numerapi.utils: starting download
numerframe_edu/v5.0/live.parquet: 7.73MB [00:02, 3.08MB/s]                            


Loading data and initializing a `NumerFrame` takes one line of code. The data format, such as `.csv` or `.parquet`, is recognized automatically.

In [3]:
# Initialize NumerFrame from parquet file path
dataf = create_numerframe(live_save_path)
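
As a rough illustration of how format recognition can work, a loader may dispatch on the file suffix. The sketch below is not taken from numerblox: the `READER_BY_SUFFIX` mapping and the `reader_name_for` helper are hypothetical names, and the mapped values are simply the names of pandas reader functions.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to the name of a pandas reader function.
# This is an assumption for illustration, not numerblox's actual lookup table.
READER_BY_SUFFIX = {
    ".csv": "read_csv",
    ".parquet": "read_parquet",
    ".json": "read_json",
}


def reader_name_for(file_path: str) -> str:
    """Return the name of the pandas reader that matches the file suffix."""
    suffix = Path(file_path).suffix.lower()
    if suffix not in READER_BY_SUFFIX:
        raise ValueError(f"Unsupported file format: '{suffix}'")
    return READER_BY_SUFFIX[suffix]


print(reader_name_for("numerframe_edu/v5.0/live.parquet"))
```

With pandas imported as `pd`, a call such as `getattr(pd, reader_name_for(path))(path)` would then load the file with the matching reader.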

All features of Pandas DataFrames can still be used in a `NumerFrame`.

In [4]:
dataf.head(2)

id,era,data_type,feature_shaded_hallucinatory_dactylology,feature_itinerant_hexahedral_photoengraver,feature_prudent_pileate_oven,feature_subalpine_apothegmatical_ajax,feature_pistachio_atypical_malison,feature_symmetrical_spongy_tricentenary,feature_ungrounded_transpontine_winder,feature_aseptic_eely_hemiplegia,...,target_teager2b_20,target_teager2b_60,target_tyler_20,target_tyler_60,target_victor_20,target_victor_60,target_waldo_20,target_waldo_60,target_xerxes_20,target_xerxes_60
n000e5c2bb49b372,X,live,4,2,0,0,4,0,2,1,...,,,,,,,,,,
n001c81b540ef5d2,X,live,1,0,2,0,3,1,0,2,...,,,,,,,,,,


`NumerFrame` extends the Pandas DataFrame with convenient features for working with Numerai data.

For example, `NumerFrame` groups columns by name prefix: for Numerai data, all feature column names start with `'feature'`, target columns start with `'target'`, etc. It also keeps track of the era column and parses it automatically for other parts of this library (`'era'` for Numerai Classic and `'date'` for Numerai Signals).

In [5]:
dataf.target_cols[-1]

'target_xerxes_60'

In [6]:
dataf.get_single_target_data.head(2)

id,target
n000e5c2bb49b372,
n001c81b540ef5d2,


In [7]:
dataf.feature_cols[0]

'feature_shaded_hallucinatory_dactylology'

In [8]:
dataf.get_feature_data.head(2)

id,feature_shaded_hallucinatory_dactylology,feature_itinerant_hexahedral_photoengraver,feature_prudent_pileate_oven,feature_subalpine_apothegmatical_ajax,feature_pistachio_atypical_malison,feature_symmetrical_spongy_tricentenary,feature_ungrounded_transpontine_winder,feature_aseptic_eely_hemiplegia,feature_elemental_easier_alkalinity,feature_cycloid_zymotic_galloway,...,feature_sprucer_godlier_assembling,feature_venturesome_jesting_characterisation,feature_unstained_anhedonic_hetty,feature_vivisectional_latvian_dispensator,feature_pantheist_interramal_episcopalianism,feature_percurrent_deontic_sectionalisation,feature_myalgic_eulogistic_propagation,feature_pressor_chiropodial_hypertension,feature_diogenic_wooden_lout,feature_pleuritic_equipotent_loudmouth
n000e5c2bb49b372,4,2,0,0,4,0,2,1,1,0,...,0,0,4,4,4,4,0,0,0,0
n001c81b540ef5d2,1,0,2,0,3,1,0,2,2,0,...,2,2,2,2,2,2,2,2,2,2


In [9]:
dataf.prediction_cols

[]

`aux_cols` are all columns that are not a feature, target or prediction column.

In [10]:
dataf.aux_cols

['era', 'data_type']
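
The column grouping shown in the cells above can be approximated with plain prefix matching. This is a minimal sketch under the assumption that grouping depends only on the column-name prefix; `group_columns` is a hypothetical helper, not part of numerblox.

```python
def group_columns(columns):
    """Split column names into feature, target, prediction and aux groups."""
    groups = {"feature": [], "target": [], "prediction": [], "aux": []}
    for col in columns:
        for prefix in ("feature", "target", "prediction"):
            if col.startswith(prefix):
                groups[prefix].append(col)
                break
        else:
            # No known prefix matched, so the column counts as auxiliary.
            groups["aux"].append(col)
    return groups


columns = [
    "era",
    "data_type",
    "feature_shaded_hallucinatory_dactylology",
    "target",
    "target_xerxes_60",
]
groups = group_columns(columns)
print(groups["aux"])     # ['era', 'data_type']
print(groups["target"])  # ['target', 'target_xerxes_60']
```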

In [11]:
dataf.get_aux_data.head(2)

id,era,data_type
n000e5c2bb49b372,X,live
n001c81b540ef5d2,X,live


In [12]:
dataf.get_prediction_data.head(2)

n000e5c2bb49b372
n001c81b540ef5d2


In [13]:
dataf.meta.era_col

'era'
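
One way to picture the era-column tracking is a simple lookup over the column names. The sketch below assumes the rule is "use `'era'` if present, otherwise `'date'`"; `detect_era_col` is a hypothetical helper, and the library's real logic may differ.

```python
def detect_era_col(columns):
    """Return 'era' (Numerai Classic) or 'date' (Numerai Signals), else None."""
    if "era" in columns:
        return "era"
    if "date" in columns:
        return "date"
    # Neither known era column is present.
    return None


print(detect_era_col(["era", "data_type", "target"]))
print(detect_era_col(["date", "ticker", "target"]))
```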

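A split of features and target(s) can be retrieved in one line of code. Here `multi_target=True` is passed, and `y` contains all target columns; their values are empty because live data has no targets yet.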

In [14]:
X, y = dataf.get_feature_target_pair(multi_target=True)

In [15]:
X.head(2)

id,feature_shaded_hallucinatory_dactylology,feature_itinerant_hexahedral_photoengraver,feature_prudent_pileate_oven,feature_subalpine_apothegmatical_ajax,feature_pistachio_atypical_malison,feature_symmetrical_spongy_tricentenary,feature_ungrounded_transpontine_winder,feature_aseptic_eely_hemiplegia,feature_elemental_easier_alkalinity,feature_cycloid_zymotic_galloway,...,feature_sprucer_godlier_assembling,feature_venturesome_jesting_characterisation,feature_unstained_anhedonic_hetty,feature_vivisectional_latvian_dispensator,feature_pantheist_interramal_episcopalianism,feature_percurrent_deontic_sectionalisation,feature_myalgic_eulogistic_propagation,feature_pressor_chiropodial_hypertension,feature_diogenic_wooden_lout,feature_pleuritic_equipotent_loudmouth
n000e5c2bb49b372,4,2,0,0,4,0,2,1,1,0,...,0,0,4,4,4,4,0,0,0,0
n001c81b540ef5d2,1,0,2,0,3,1,0,2,2,0,...,2,2,2,2,2,2,2,2,2,2


In [16]:
y.head(2)

id,target,target_agnes_20,target_agnes_60,target_alpha_20,target_alpha_60,target_bravo_20,target_bravo_60,target_caroline_20,target_caroline_60,target_charlie_20,...,target_teager2b_20,target_teager2b_60,target_tyler_20,target_tyler_60,target_victor_20,target_victor_60,target_waldo_20,target_waldo_60,target_xerxes_20,target_xerxes_60
n000e5c2bb49b372,,,,,,,,,,,...,,,,,,,,,,
n001c81b540ef5d2,,,,,,,,,,,...,,,,,,,,,,


------------------------------------------------------

When we are done, the downloaded data can be removed with a single call on the downloader.

In [21]:
# Clean up environment
downloader.remove_base_directory()

Path: '/Users/clepelaars/Desktop/crowdcent/repositories/numerblox/examples/numerframe_edu'
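
For readers who want to see what such a cleanup amounts to, the sketch below removes a directory tree with the standard library. It assumes that `remove_base_directory` behaves like a recursive delete of the download directory, which is an assumption and not confirmed here. The example works inside a temporary directory so nothing real is deleted.

```python
import shutil
import tempfile
from pathlib import Path

# Work inside a throwaway temp directory so no real data is touched.
with tempfile.TemporaryDirectory() as tmp:
    base_dir = Path(tmp) / "numerframe_edu"
    (base_dir / "v5.0").mkdir(parents=True)
    # Empty placeholder standing in for the downloaded file.
    (base_dir / "v5.0" / "live.parquet").write_bytes(b"")

    # Delete the directory and everything inside it.
    shutil.rmtree(base_dir)
    removed = not base_dir.exists()

print(removed)
```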
