In [1]:
%matplotlib inline


# Meta-Learner


First, check that you have a directory structure like this:

```text
lib
├── data                        # Store data
│   ├─ dataset                  # Store datasets
│   ├─ metafeatures             # Store metafeatures
│   └─ model                    # Store trained ML models
├── images                      # Images for presentations, README, etc.
├── src                         # Source code
│   ├─ test                     # Test code
│   ├─ utils                    # General utility code
│   ├─ out                      # Store output data
│   └─ config.py
├── main.py
├── Tutorial.ipynb
└── test.py
```
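A quick way to verify the layout is a small check like the one below. It is a sketch, not part of the project: `missing_paths`, `EXPECTED_DIRS` and `EXPECTED_FILES` are hypothetical names that simply mirror the tree above.

```python
from pathlib import Path

# Hypothetical lists mirroring the tree above, relative to the project root (`lib`).
EXPECTED_DIRS = [
    "data/dataset",
    "data/metafeatures",
    "data/model",
    "images",
    "src/test",
    "src/utils",
    "src/out",
]
EXPECTED_FILES = ["src/config.py", "main.py", "Tutorial.ipynb", "test.py"]


def missing_paths(root="."):
    """Return the expected directories and files that are missing under `root`."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

Run from the `lib` directory, `missing_paths()` should return an empty list; any entry it returns is a directory or file you still need to create.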

## Data
To train the meta-learner, we first need to prepare the data.

In [None]:
from os.path import join
from src.config import DATASET_FOLDER 
from src.utils.metalearner import data_preparation 

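# Directory where you've stored your CSV datasets
# (here, the 'prova' subfolder of DATASET_FOLDER).
prova = join(DATASET_FOLDER, 'prova')

data_preparation(
    data_path=prova,
    data_selection=False,          # skip the download: the datasets are already in `prova`
    data_preprocess=True,          # preprocess every dataset
    metafeatures_extraction=True,  # extract metafeatures (raw and preprocessed datasets)
    model_training=True,           # train the models (raw and preprocessed datasets)
    quotient=True)                 # how the delta between metrics is computed (see below)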

The `data_preparation` function can run several steps:
* **Data download**: downloads the datasets. If you already have the datasets, disable it with `data_selection=False`.
* **Data preprocessing**: preprocesses all datasets. If you have already done it, disable it with `data_preprocess=False`.
* **Metafeatures extraction**: extracts metafeatures from all datasets, both preprocessed and raw. If you have already done it, disable it with `metafeatures_extraction=False`.
* **Model training**: trains all models on all datasets, both preprocessed and raw. If you have already done it, disable it with `model_training=False`.

* **Quotient**: the `quotient` flag controls how the delta between the metrics is computed. If it is `False`, the delta is computed as a quotient; otherwise, it is computed as a subtraction.
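The two ways of computing the delta mentioned in the `quotient` bullet can be illustrated with a minimal sketch. It assumes the delta compares a metric measured on the preprocessed version of a dataset with the same metric on the raw version; `delta_difference` and `delta_ratio` are hypothetical helpers, not functions from `src.utils.metalearner`.

```python
def delta_difference(preprocessed_score, raw_score):
    """Delta as a subtraction: a value > 0 means preprocessing improved the metric."""
    return preprocessed_score - raw_score


def delta_ratio(preprocessed_score, raw_score):
    """Delta as a quotient: a value > 1 means preprocessing improved the metric."""
    if raw_score == 0:
        raise ValueError("raw_score must be non-zero to compute a ratio")
    return preprocessed_score / raw_score
```

For example, with an F1 score of 0.84 after preprocessing and 0.80 before, the subtraction gives roughly 0.04 and the quotient roughly 1.05. A difference keeps the metric's units, while a ratio expresses the change relative to the raw score and is undefined when that score is 0.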

## Train

If you want to train directly on the performance delta (the difference between the metrics obtained on the preprocessed and on the raw datasets), run:

In [None]:
from os.path import join
from src.utils.metalearner import train_metalearner
from src.config import METAFEATURES_FOLDER


delta_path = join(METAFEATURES_FOLDER, "delta.csv")

train_metalearner(
    metafeatures_path=delta_path,
    algorithm='random_forest')
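The meta-learner learns a mapping from a dataset's metafeatures to its delta. A minimal sketch of turning such a table into features and target, assuming a hypothetical layout (one row per dataset, numeric metafeature columns, a target column named `delta`, and a `dataset` identifier column); the actual columns of `delta.csv` may differ.

```python
import csv
import io


def split_features_target(csv_text, target="delta", drop=("dataset",)):
    """Split a metafeatures table into a feature matrix X and a target vector y.

    Hypothetical layout: one row per dataset, numeric metafeature columns,
    a target column (named "delta" here) and identifier columns listed in
    `drop` that are not used as features.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    X, y = [], []
    for row in reader:
        y.append(float(row[target]))
        X.append([float(value) for name, value in row.items()
                  if name != target and name not in drop])
    return X, y


# Made-up rows, only to show the shape of the output.
sample = (
    "dataset,n_rows,n_features,delta\n"
    "iris,150,4,0.02\n"
    "wine,178,13,-0.01\n"
)
X, y = split_features_target(sample)
```

A regressor, for instance scikit-learn's `RandomForestRegressor`, could then be fit on `X` and `y`; whether `train_metalearner` does exactly this internally is not shown in this tutorial.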

If you want to train on the raw performance and then compute the difference (delta) afterwards, run:

In [None]:
from os.path import join
from src.utils.metalearner import choose_performance_from_metafeatures
from src.utils.metalearner import train_metalearner
from src.config import METAFEATURES_FOLDER

metafeatures_path = join(METAFEATURES_FOLDER, "metafeatures.csv")

choose_performance_from_metafeatures(
    metafeatures_path=metafeatures_path,
    metric='f1_score',
    copy_name='new_metafeatures.csv')

new_metafeatures_path = join(METAFEATURES_FOLDER, "new_metafeatures.csv")

train_metalearner(
    metafeatures_path=new_metafeatures_path,
    algorithm='random_forest')
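Judging from its arguments, `choose_performance_from_metafeatures` appears to keep a single performance metric (`metric='f1_score'`) as the target and save the result under `copy_name`. The sketch below illustrates that idea on in-memory CSV text, plus computing the delta after prediction with the subtraction variant. `keep_metric`, `delta_after_prediction` and the column names are hypothetical, not the project's API.

```python
import csv
import io


def keep_metric(csv_text, metric, metric_columns):
    """Return CSV text with every metric column dropped except `metric`."""
    if metric not in metric_columns:
        raise ValueError(f"unknown metric: {metric}")
    reader = csv.DictReader(io.StringIO(csv_text))
    dropped = set(metric_columns) - {metric}
    fields = [name for name in reader.fieldnames if name not in dropped]
    out = io.StringIO()
    # extrasaction="ignore" skips the dropped metric columns in each row.
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


def delta_after_prediction(predict, raw_metafeatures, preprocessed_metafeatures):
    """Delta computed after training: predicted score of the preprocessed
    dataset minus predicted score of the raw dataset."""
    return predict(preprocessed_metafeatures) - predict(raw_metafeatures)


# Made-up table with two metric columns.
sample = (
    "dataset,n_rows,accuracy,f1_score\n"
    "iris,150,0.95,0.94\n"
)
filtered = keep_metric(sample, "f1_score", ["accuracy", "f1_score"])
```

Here `filtered` keeps `dataset`, `n_rows` and `f1_score`. In this approach the model predicts a raw score for each version of a dataset, and the delta only appears when the two predictions are compared.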

To check whether it is better to train on the delta metafeatures or on the raw metafeatures, use `delta_or_metafeatures`:

In [None]:
from os.path import join
from src.config import METAFEATURES_FOLDER
from src.utils.metalearner import delta_or_metafeatures

delta_path = join(METAFEATURES_FOLDER, "delta.csv")
metafeatures_path = join(METAFEATURES_FOLDER, "metafeatures.csv")
delta_or_metafeatures(delta_path=delta_path, metafeatures_path=metafeatures_path)
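The tutorial does not say which criterion `delta_or_metafeatures` uses. One plausible way to compare the two approaches is to evaluate both on the same held-out datasets with the same error measure. The sketch below assumes mean absolute error on the delta; `mean_absolute_error` and `pick_approach` are hypothetical helpers, not the project's implementation.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pick_approach(observed_delta, predicted_from_delta, predicted_from_metafeatures):
    """Return the approach with the lower MAE on the observed deltas, and that MAE.

    predicted_from_delta: deltas predicted by a model trained on delta.csv.
    predicted_from_metafeatures: deltas computed after predicting raw scores.
    """
    mae_delta = mean_absolute_error(observed_delta, predicted_from_delta)
    mae_meta = mean_absolute_error(observed_delta, predicted_from_metafeatures)
    if mae_delta <= mae_meta:
        return "delta", mae_delta
    return "metafeatures", mae_meta


# Made-up numbers for two held-out datasets.
choice = pick_approach([0.1, -0.2], [0.1, -0.1], [0.0, 0.0])
```

With these numbers the delta-trained model errs by 0.05 on average against 0.15 for the other approach, so `choice` is `("delta", 0.05)` up to floating-point rounding.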