
# Artificial intelligence (AI) for health: potentials


*   **Data mining**: finding patterns in big data
*   **Biomarker discovery**: identifying potential (compound) biomarkers
*   The **predictive nature** of machine learning strategies closely matches the aim of clinical diagnosis and prognosis **at the single-patient level**

# Models validation



In machine learning, model validation refers to the process in which a trained model is evaluated on a testing data set. The testing data set is a separate portion of the same data set from which the training set is derived, so the model never sees it during training.
Model validation is therefore carried out after model training.

Goal: estimate the **unbiased generalization performance** of the model, i.e., how well it performs on data it has not seen.
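As a minimal sketch of this idea, the model below is fitted on the training portion only, and its error is then measured on the held-out portion. The data are synthetic (not the study data), and the 75/25 split, the `random_state` values and the linear-kernel `SVR` are illustrative choices, not the settings used later in this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in data (NOT the study data): 86 "subjects", 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(86, 3))
y = 45 + 10 * X[:, 0] - 5 * X[:, 2] + rng.normal(scale=2.0, size=86)

# Hold out 25% of the subjects; the model never sees them during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = SVR(kernel="linear")
model.fit(X_train, y_train)

# The training error is typically optimistic; the error on the held-out
# subjects is the estimate of generalization performance
train_mae = mean_absolute_error(y_train, model.predict(X_train))
test_mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"Train MAE: {train_mae:.2f}  Test MAE: {test_mae:.2f}")
```

With 86 samples and `test_size=0.25`, scikit-learn places 22 samples in the test set and 64 in the training set.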

# Outline

* Holdout validation
* K-fold cross-validation (CV)
* Leave-One-Out CV (LOOCV)
* Hyperparameter tuning
* Training, validation and test sets: holdout validation
* Training, validation and test sets: nested CV
* Sampling bias
* Repetition of holdout validation
* Repetition of CV
* Unbalanced datasets

# Age prediction based on neuroimaging features



*   Data: T1-weighted images of 86 healthy subjects aged 19 to 85 years (41 males and 45 females; age 44.2 ± 17.1 years, mean ± standard deviation). The data are freely accessible [here](https://fcon_1000.projects.nitrc.org/) and are described by Mazziotta et al. (2001).
*   Features:
  * Cortical thickness (mCT)
  * Gyrification index (Pial_mean_GI)
  * Fractal dimension (FD)
* Task:
  * Regression
  * Classification ("young" vs. "old")
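The libraries loaded below include `mean_absolute_error` and `accuracy_score`, which suggests how the two tasks are scored. A toy sketch with made-up values, assuming MAE (in years) for the regression task and accuracy for the "young" vs. "old" classification task:

```python
from sklearn.metrics import mean_absolute_error, accuracy_score

# Regression: hypothetical true vs. predicted ages (years)
true_age = [30, 40, 50]
pred_age = [32, 38, 55]
mae = mean_absolute_error(true_age, pred_age)  # (2 + 2 + 5) / 3 = 3.0

# Classification: hypothetical true vs. predicted labels
true_label = ["young", "old", "old", "young"]
pred_label = ["young", "old", "young", "young"]
acc = accuracy_score(true_label, pred_label)  # 3 of 4 correct = 0.75

print(f"MAE = {mae:.1f} years, accuracy = {acc:.2f}")
```

MAE keeps the error in the units of the target (years), which makes regression results easy to interpret.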

The same data and features were previously investigated by Marzi et al. (2020).


**References**

Marzi, C., Giannelli, M., Tessa, C. et al. Toward a more reliable characterization of fractal properties of the cerebral cortex of healthy subjects during the lifespan. Sci Rep 10, 16957 (2020). https://doi.org/10.1038/s41598-020-73961-w

Mazziotta, J. et al. A probabilistic atlas and reference system for the human brain: International Consortium for Brain Mapping (ICBM). Philos Trans R Soc Lond B Biol Sci 356, 1293–1322 (2001). https://doi.org/10.1098/rstb.2001.0915

# Libraries and data loading

```python
# Libraries loading
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, KFold
from sklearn.svm import SVR, SVC
from sklearn.metrics import mean_absolute_error, accuracy_score

# NOTE: the `?token=...` query strings appear to be temporary GitHub access
# tokens and may have expired; if a download fails, try the same URL
# without the query string.

# Regression data (target: age)
url = 'https://raw.githubusercontent.com/chiaramarzi/ML-models-validation/main/data_regression.csv?token=AMEZNGPJVEPXJHPEKZBE3CDARQOUO'
reg_data = pd.read_csv(url)

# Balanced classification data ("young" vs. "old")
url = 'https://raw.githubusercontent.com/chiaramarzi/ML-models-validation/main/data_classification_balanced.csv?token=AMEZNGM6BECXUQXKH7CC2D3ARQOYC'
class_data = pd.read_csv(url)

# Unbalanced classification data ("young" vs. "old")
url = 'https://raw.githubusercontent.com/chiaramarzi/ML-models-validation/main/data_classification_unbalanced.csv?token=AMEZNGON3NZJCTLXDG66XU3ARQOZW'
unbal_class_data = pd.read_csv(url)
```
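Before modelling, it can help to check that each loaded table contains the expected features and no missing values. The sketch below assumes that the CSV headers match the feature abbreviations listed above (`mCT`, `Pial_mean_GI`, `FD`); the `check_dataset` helper, the `Age` column and all values in the in-memory demo CSV are hypothetical.

```python
import io
import pandas as pd

# Assumed column names, taken from the feature list in the data description
EXPECTED_FEATURES = ("mCT", "Pial_mean_GI", "FD")

def check_dataset(df, expected=EXPECTED_FEATURES):
    """Return (missing expected columns, number of missing cells) for df."""
    missing_cols = [col for col in expected if col not in df.columns]
    n_missing_cells = int(df.isna().sum().sum())
    return missing_cols, n_missing_cells

# Hypothetical two-row CSV standing in for a downloaded file (made-up values);
# the empty FD field in the second row is read as NaN
demo_csv = io.StringIO(
    "mCT,Pial_mean_GI,FD,Age\n"
    "2.5,2.9,2.55,30\n"
    "2.3,2.8,,70\n"
)
demo = pd.read_csv(demo_csv)
print(demo.shape, check_dataset(demo))
```

In the notebook, the same check could be run as, e.g., `check_dataset(reg_data)`, provided the column-name assumption holds.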

```python
reg_data
```

```python
class_data
```

```python
unbal_class_data
```
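The balanced and unbalanced classification tables differ in the proportion of "young" and "old" subjects. A minimal sketch for inspecting class proportions, using made-up label vectors; with the real tables, the name of the label column (not shown in this section) would need to be substituted. The largest proportion is also the accuracy of a trivial classifier that always predicts the majority class, which is why plain accuracy can be misleading on unbalanced data.

```python
import pandas as pd

def class_proportions(labels):
    """Return the fraction of samples in each class, largest first."""
    return pd.Series(labels).value_counts(normalize=True)

# Made-up label vectors (NOT the study data)
balanced = ["young"] * 40 + ["old"] * 40
unbalanced = ["young"] * 60 + ["old"] * 20

balanced_props = class_proportions(balanced)
unbalanced_props = class_proportions(unbalanced)

# Accuracy obtained by always predicting the majority class
majority_baseline = unbalanced_props.max()

print(balanced_props)
print(unbalanced_props)
print(f"Majority-class baseline accuracy: {majority_baseline:.2f}")
```

On the unbalanced example, always answering "young" already yields 75% accuracy without learning anything from the features.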

# Holdout validation