# Learning-Module Tutorial: Training an XGBoost classification model using the MEDimage package

@Author : [MEDomics consortium](https://github.com/medomics/)

@Email : medomics.info@gmail.com


**STATEMENT**:
This file is part of <https://github.com/MEDomics/MEDomicsLab/>,
a package providing Python programming tools for radiomics analysis.
--> Copyright (C) MEDomicsLab consortium.

**Warning**: Do not run all cells at once. Run the cells in the following order:
- Run the first two cells to create the data splits.
- Move the features folder to `path_data`.
- Run the last cell.

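```python
import os
import sys
from pathlib import Path

# Make the MEDimage package importable from this notebook
# (adjust the relative path if your local folder layout differs)
MODULE_DIR = os.path.dirname(os.path.abspath('../MEDimage/'))
sys.path.append(os.path.dirname(MODULE_DIR))

import MEDimage
```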

#### Splitting data - train & test
Using `method="random"` creates a train set, a test set, and a hold-out set, whereas `method="all_learn"` creates only a train set and a test set.
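The difference between the two methods can be illustrated with a minimal, standard-library-only sketch of a patient-level split. This is not MEDimage's implementation of `create_holdout_set`: the function `split_patients`, its `holdout_fraction` and `seed` parameters, and the patient IDs are hypothetical, and the sketch does not stratify by the outcome (here `IDH`), which a real split may do.

```python
import random

def split_patients(patient_ids, method="random", holdout_fraction=0.2, seed=0):
    """Hypothetical sketch of a patient-level hold-out split (not MEDimage code).

    method="random":    a random fraction of patients is set aside as a hold-out set;
                        the remaining patients form the learning set, which is later
                        divided into train and test sets.
    method="all_learn": every patient goes to the learning set; no hold-out set.
    """
    ids = sorted(patient_ids)
    if method == "all_learn":
        return ids, []
    if method != "random":
        raise ValueError(f"unknown method: {method!r}")
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(ids)
    n_holdout = round(len(ids) * holdout_fraction)
    # Returns (learning set, hold-out set)
    return sorted(ids[n_holdout:]), sorted(ids[:n_holdout])


patients = [f"patient_{i:03d}" for i in range(10)]  # made-up IDs
learn, holdout = split_patients(patients, method="random")
print(len(learn), len(holdout))  # 8 2
learn, holdout = split_patients(patients, method="all_learn")
print(len(learn), len(holdout))  # 10 0
```

Keeping the hold-out set out of the learning set entirely is what allows it to serve as an untouched final evaluation set.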

```python
# Path to the folder of the study (folder that will contain everything)
path_study = Path.cwd() / "learning"

# Path to the outcomes file
path_outcome_file = path_study / "Glioma__LGG_IDH__outcomes.csv"

# Create a folder for the experiments (if it does not exist yet)
path_save_experiment = path_study / 'experiments'
path_save_experiment.mkdir(exist_ok=True)

# Separate the data into splits (using the outcomes file)
path_data, _ = MEDimage.learning.ml_utils.create_holdout_set(
    path_outcome_file=path_outcome_file,
    outcome_name='IDH',
    path_save_experiments=path_save_experiment,
    method='all_learn'
)

print(f"Move the features folder to {path_data}")
```

**PS**: The FEATURES folder must be moved to the `path_data` folder before running the experiment.
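If you prefer to move the folder from Python instead of by hand, a sketch along these lines could work. The helper `move_features_folder` is hypothetical and not part of MEDimage; it assumes the destination folder should be named `FEATURES`, matching the default `path` setting described below, and it refuses to overwrite an existing folder.

```python
import shutil
import tempfile
from pathlib import Path

def move_features_folder(path_features, path_data, folder_name="FEATURES"):
    """Hypothetical helper: move the features folder into `path_data`.

    Returns the new location; raises instead of overwriting an existing folder.
    """
    path_features = Path(path_features)
    target = Path(path_data) / folder_name
    if not path_features.is_dir():
        raise FileNotFoundError(f"features folder not found: {path_features}")
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    shutil.move(str(path_features), str(target))
    return target


# Self-contained demo in a temporary directory (file and folder names are made up)
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "my_features"
    src.mkdir()
    (src / "radiomics__T2WI(rccLesion)__image.csv").write_text("")
    data = Path(tmp) / "data"
    data.mkdir()
    new_location = move_features_folder(src, data)
    print(new_location.name, new_location.is_dir(), src.exists())  # FEATURES True False
```

In the notebook itself, you would pass the location of your own features folder and the `path_data` returned by the splitting cell.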

### Setting up and running the experiment

#### Explanation of the settings files:

- **ml_settings**: This file contains the methods to be used for the experiment, including the machine learning algorithm, the feature selection method, and the feature normalization method. Currently, only one option is available for each of these methods, and therefore **the file must not be altered**.

- **ml_variables (MAIN FILE)**: This file configures the experiment's variables and their options. Inside *"var1"*, set the following parameters:

    - **nameType**: Specify "Radiomics*Something*" if the experiment utilizes only radiomics features.
    - **path**: By default, this is set to *setToFEATURESinWorkspace*, meaning that the folder containing the features is named *FEATURES*. If your folder has a different name, follow the same pattern, *setToFolderNameinWorkspace*, replacing *FolderName* with your folder's name.
    - **scans**: List the modalities to be used in the experiment, for example, ["T1C", "T2WI"].
    - **rois**: List the ROIs to be analyzed (ROIs are found in the CSV file name within parentheses). For example, the ROI name for *radiomics__T2WI(rccLesion)__image.csv* is *rccLesion*.
    - **imSpaces**: Specify the image space, i.e., the suffix at the end of the radiomics features CSV file name, typically *image* (as in *radiomics__T2WI(rccLesion)__image.csv*).
    - **combinations**: Use this option to combine different feature files for different modalities in the experiment. For instance, {"T2WI": ["texture"], "T1C": ["intensity"]} uses the *texture* CSV for *T2WI* and the *intensity* CSV for *T1C*.
    - **use_combinations**: By default, set to *False*. Change it to *True* to enable combinations.
    - **var_datacleaning**: Specify the data cleaning method to be used.
    - **var_normalization**: Specify the normalization method to be used. Set to "combat" or leave empty.
    - **var_fSetReduction**: Specify the method for feature set reduction. The only option available for now is "*FDA*".
    - Other options should not be changed.

- **ml_algorithms**: This file contains the parameters of the machine learning algorithm (threshold, variable importance, parameter tuning method, etc.).

- **ml_datacleaning**: This file includes options for feature cleaning (imputation method, variance threshold, etc.).

- **ml_design**: This file contains options for the data splitting methods (number of splits, test set proportion, etc.).

- **ml_fset_reduction**: This file holds options for the feature reduction method. Currently, only FDA is implemented, so the file is specific to FDA options (number of features to keep, inter-correlation threshold, etc.).

- **ml_fset_selection**: Not currently in use; please ignore.

- **ml_imbalance**: Not currently in use; please ignore.

- **ml_normalization**: Not currently in use; please ignore.
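The `scans`, `rois`, and `imSpaces` entries of *ml_variables* can be read off the feature file names. The sketch below assumes every file follows the pattern `radiomics__<scan>(<roi>)__<imSpace>.csv` seen in the *rccLesion* example; names that deviate from it raise `ValueError`. The helpers `parse_feature_file` and `collect_variables` are hypothetical, and the returned dict only mirrors the key names of *ml_variables*, not the file's actual layout.

```python
import re
from pathlib import Path

# Assumed naming pattern: radiomics__<scan>(<roi>)__<imSpace>.csv
FEATURE_FILE_RE = re.compile(
    r"^radiomics__(?P<scan>[^(]+)\((?P<roi>[^)]+)\)__(?P<im_space>[^.]+)\.csv$"
)

def parse_feature_file(name):
    """Return (scan, roi, im_space) parsed from a feature CSV file name."""
    match = FEATURE_FILE_RE.match(Path(name).name)
    if match is None:
        raise ValueError(f"unexpected feature file name: {name!r}")
    return match.group("scan"), match.group("roi"), match.group("im_space")

def collect_variables(file_names):
    """Gather the distinct scans, ROIs and image spaces, sorted."""
    scans, rois, im_spaces = set(), set(), set()
    for name in file_names:
        scan, roi, im_space = parse_feature_file(name)
        scans.add(scan)
        rois.add(roi)
        im_spaces.add(im_space)
    return {"scans": sorted(scans), "rois": sorted(rois), "imSpaces": sorted(im_spaces)}


files = [
    "radiomics__T2WI(rccLesion)__image.csv",
    "radiomics__T1C(rccLesion)__image.csv",
]
print(collect_variables(files))
# {'scans': ['T1C', 'T2WI'], 'rois': ['rccLesion'], 'imSpaces': ['image']}
```

In practice, you could feed it the names from your FEATURES folder (for example via `Path(...).glob("*.csv")`) and copy the result into *"var1"* by hand.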

```python
# Experiment name. The recommended norm: DatasetName_ClassificationProblem_RadiomicsType_Modality
experiment_label = "Glioma_IDH_Image_T1"

# Path to settings folder
path_settings = Path.cwd() / "learning" / "settings"

# Initialize the radiomics learner class (main machine learning class)
learner = MEDimage.learning.RadiomicsLearner(
    path_study=path_data,
    path_settings=path_settings,
    experiment_label=experiment_label
)

# Launch the experiment. Set holdout_test to True to also test the model on the
# hold-out set (only meaningful if one was created, i.e., with method="random")
learner.run_experiment(holdout_test=False)
```
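To keep experiment labels consistent with the recommended norm *DatasetName_ClassificationProblem_RadiomicsType_Modality*, a small helper can assemble and validate them. Both functions below are hypothetical and not part of MEDimage; forbidding underscores inside each part is a design choice of this sketch, made so that a label can always be split back into its four fields.

```python
def make_experiment_label(dataset, problem, radiomics_type, modality):
    """Hypothetical helper: build a label following the recommended norm
    DatasetName_ClassificationProblem_RadiomicsType_Modality."""
    parts = [dataset, problem, radiomics_type, modality]
    for part in parts:
        # Underscores are reserved as separators so the label can be split back.
        if not part or "_" in part:
            raise ValueError(f"invalid label part: {part!r}")
    return "_".join(parts)


def split_experiment_label(label):
    """Inverse of make_experiment_label: return the four named parts."""
    dataset, problem, radiomics_type, modality = label.split("_")
    return {"dataset": dataset, "problem": problem,
            "radiomics_type": radiomics_type, "modality": modality}


label = make_experiment_label("Glioma", "IDH", "Image", "T1")
print(label)  # Glioma_IDH_Image_T1
print(split_experiment_label(label)["modality"])  # T1
```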