# Notebook demonstrating the three main functionalities of *ritme*

Note: the data files referenced below are not included in this repository. Adjust the paths to point to your own files to make the examples run. An example dataset is coming soon.

## CLI example usage

Run the following commands from within the `experiments` folder:

````
ritme split-train-test data_splits data/metadata_proc_v20240323_r0_r3_le_2yrs.tsv data/all_otu_table_filt.qza --group-by-column host_id --seed 12

ritme find-best-model-config ../config/r_local_linreg.json data_splits/train_val.pkl --path-to-tax data/otu_taxonomy_all.qza --path-to-tree-phylo data/silva-138-99-rooted-tree.qza --path-store-model-logs ritme_refact_logs

ritme evaluate-tuned-models ritme_refact_logs/r_local_linreg data_splits/train_val.pkl data_splits/test.pkl
````

You can find the trained best model objects (= best model with the best feature engineering method) in this path:

`{path-store-model-logs}/{config["experiment_tag"]}/*_best_model.pkl`

where
* `path-store-model-logs` is the value passed to `--path-store-model-logs` when running `find-best-model-config`
* `config["experiment_tag"]` is the experiment tag specified in the experiment config file (in this example, `../config/r_local_linreg.json`)

Note that this path is also printed when running `find-best-model-config`, following the message "Best model configurations were saved in".
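
The path pattern above can also be resolved programmatically. The following is a minimal sketch using only the standard library. The helper name `list_best_model_paths` is hypothetical and not part of *ritme*, and the sketch assumes that the experiment config is a JSON file with an `experiment_tag` key, as described above.

```python
import glob
import json
import os


def list_best_model_paths(path_store_model_logs, config_path):
    """Return the saved best-model pickle paths for one experiment.

    Hypothetical helper for illustration, not part of ritme.
    """
    # read the experiment tag from the experiment config file
    with open(config_path) as file:
        config = json.load(file)

    # build {path-store-model-logs}/{experiment_tag}/*_best_model.pkl
    pattern = os.path.join(
        path_store_model_logs, config["experiment_tag"], "*_best_model.pkl"
    )
    return sorted(glob.glob(pattern))
```

With the commands shown above, this could be called as `list_best_model_paths("ritme_refact_logs", "../config/r_local_linreg.json")`.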

You can now use the specified best model and create predictions on the train and test set in Python with:

```
import pandas as pd
import pickle

# paths to the saved train/test splits and the best model of interest
path_to_train = "data_splits/train_val.pkl"
path_to_test = "data_splits/test.pkl"
path_to_best_linreg = "ritme_refact_logs/r_local_linreg/linreg_best_model.pkl"

# read data
train_cli = pd.read_pickle(path_to_train)
test_cli = pd.read_pickle(path_to_test)

# read best linreg model
with open(path_to_best_linreg, "rb") as file:
    best_linreg_cli = pickle.load(file)

# perform prediction with best linreg model on test and train
test_predictions = best_linreg_cli.predict(test_cli, "test")
train_predictions = best_linreg_cli.predict(train_cli, "train")
```

## Python API example usage

In [None]:
import os

from ritme.evaluate_tuned_models import evaluate_tuned_models
from ritme.find_best_model_config import (
    _load_experiment_config,
    _load_phylogeny,
    _load_taxonomy,
    find_best_model_config,
    save_best_models,
)
from ritme.split_train_test import _load_data, split_train_test

%load_ext autoreload
%autoreload 2

In [None]:
######## USER INPUTS ########
# set experiment configuration path
model_config_path = "../config/r_local_linreg_py.json"

# define path to feature table, metadata, phylogeny, and taxonomy
path_to_ft = "data/all_otu_table_filt.qza"
path_to_md = "data/metadata_proc_v20240323_r0_r3_le_2yrs.tsv"
path_to_phylo = "data/silva-138-99-rooted-tree.qza"
path_to_tax = "data/otu_taxonomy_all.qza"

# define train size
train_size = 0.8

# if you want to store the best models, define path where you want to store
# them, else set None
path_to_store_best_models = "best_models"
######## END USER INPUTS #####

In [None]:
config = _load_experiment_config(model_config_path)

### Read & split data

In [None]:
md, ft = _load_data(path_to_md, path_to_ft)
print(md.shape, ft.shape)

In [None]:
train_val, test = split_train_test(
    md,
    ft,
    group_by_column=config["group_by_column"],
    train_size=train_size,
    seed=config["seed_data"],
)

### Find best model config

In [None]:
tax = _load_taxonomy(path_to_tax)
phylo = _load_phylogeny(path_to_phylo)

best_model_dict, path_to_exp = find_best_model_config(
    config, train_val, tax, phylo, path_store_model_logs="ritme_refact_logs"
)

### Evaluate best models

In [None]:
metrics = evaluate_tuned_models(best_model_dict, config, train_val, test)
metrics

### Extracting trained best models

#### Get best models for further usage

The best models are stored in the Python dictionary `best_model_dict`, with the model type as keys and `TunedModel` objects as values. The code below shows how to extract a given best model and make predictions with it:

In [None]:
# get best linreg model
best_linreg_model = best_model_dict["linreg"]
best_linreg_model

In [None]:
# perform prediction with best linreg model
test_predictions = best_linreg_model.predict(test, "test")
train_predictions = best_linreg_model.predict(train_val, "train")
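
If several model types were tuned, the same call can be applied to every entry of the dictionary. A minimal sketch, assuming each value exposes the `predict(data, split)` method used above; the helper name `predict_all` is hypothetical and not part of *ritme*.

```python
def predict_all(best_model_dict, data, split):
    """Return predictions of every best model, keyed by model type.

    Hypothetical helper for illustration, not part of ritme. `split` is
    passed through unchanged, e.g. "train" or "test" as in the cell above.
    """
    return {
        model_type: model.predict(data, split)
        for model_type, model in best_model_dict.items()
    }
```

For example, `predict_all(best_model_dict, test, "test")` would return one set of test predictions per tuned model type.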

#### Save all best models trained with Python API to disk

In [None]:
if path_to_store_best_models is not None:
    print(f"Saving best models to {path_to_store_best_models}...")
    # create the target folder if it does not exist yet
    os.makedirs(path_to_store_best_models, exist_ok=True)

    save_best_models(best_model_dict, path_to_store_best_models)
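
To reuse the stored models in a later session, they can be read back with `pickle`, mirroring the CLI example above. The following is a minimal sketch that assumes `save_best_models` writes one `{model_type}_best_model.pkl` file per model into the given folder, which is the naming seen in the CLI workflow; check your output folder to confirm. The helper name `load_best_models` is hypothetical and not part of *ritme*.

```python
import glob
import os
import pickle

# assumed file name suffix, taken from the CLI output naming
SUFFIX = "_best_model.pkl"


def load_best_models(path_to_models):
    """Load all pickled best models in a folder, keyed by model type.

    Hypothetical helper for illustration, not part of ritme.
    """
    models = {}
    for path in sorted(glob.glob(os.path.join(path_to_models, f"*{SUFFIX}"))):
        # derive the model type from the file name, e.g. "linreg"
        model_type = os.path.basename(path)[: -len(SUFFIX)]
        with open(path, "rb") as file:
            models[model_type] = pickle.load(file)
    return models
```

Unpickling requires the classes of the stored objects to be importable, so *ritme* must be installed in the environment that loads the models.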