# brain_age

Predict age from brain grey matter (regression).

## Preprocessed input data

Voxel-Based Morphometry (VBM) from [cat12](http://www.neuro.uni-jena.de/cat/):

- Regions of interest (ROIs) of Gray Matter (GM), scaled by the Total Intracranial Volume (TIV)
  * `[test|train]_rois.csv`: 284 features

- VBM maps in the MNI space (3D maps)
  * `[test|train]_train_vbm.npz`: 3D images of shape (121, 145, 121).
  This npz file also contains the 3D brain mask and the affine transformation to the MNI
  reference space. Masking the brain provides 331 695 input features (voxels).
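
To illustrate how a flat vector of masked voxels relates to a 3D map, here is a minimal sketch on a tiny synthetic volume. The helper name `unmask`, the toy shapes, and the assumption that voxels are stored in the order given by NumPy boolean indexing are illustrative only; the key names inside the npz file are not stated here, so the real mask is not loaded.

```python
import numpy as np

def unmask(flat_voxels, mask):
    """Put a 1D vector of masked voxel values back into a 3D volume.

    Assumes the vector follows the order produced by `volume[mask]`
    (NumPy boolean indexing); this ordering is an assumption.
    """
    if flat_voxels.shape[0] != int(mask.sum()):
        raise ValueError("vector length must equal the number of True voxels")
    volume = np.zeros(mask.shape, dtype=flat_voxels.dtype)
    volume[mask] = flat_voxels
    return volume

# Toy stand-in for the (121, 145, 121) brain mask
mask = np.zeros((4, 5, 4), dtype=bool)
mask[1:3, 1:4, 1:3] = True                 # 2 * 3 * 2 = 12 voxels inside the "brain"

flat = np.arange(1, 13, dtype=np.float32)  # 12 masked voxel values
volume = unmask(flat, mask)
print(volume.shape, int(mask.sum()))       # (4, 5, 4) 12
```

With the real data, the same idea would map each row of 331 695 voxel features back to a (121, 145, 121) image.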

`problem.get_train_data()` returns the concatenation of the 284 ROI features with the 331 695 voxel features.
The two blocks are redundant. To select only the ROI features:

```
x_arr[:, :284]
```

To select only the VBM features (voxels within the brain):

```
x_arr[:, 284:]
```
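
The two slices can be wrapped in a small helper that also checks the expected number of columns. This is a sketch on a synthetic array standing in for the output of `problem.get_train_data()`; the names `split_blocks`, `N_ROIS` and `N_VOXELS` are hypothetical and not part of the kit.

```python
import numpy as np

N_ROIS = 284         # number of ROI features (stated above)
N_VOXELS = 331695    # number of masked VBM voxels (stated above)

def split_blocks(x_arr):
    """Split the concatenated feature array into (ROI block, VBM block)."""
    if x_arr.shape[1] != N_ROIS + N_VOXELS:
        raise ValueError("unexpected number of columns: %d" % x_arr.shape[1])
    return x_arr[:, :N_ROIS], x_arr[:, N_ROIS:]

# Synthetic stand-in with 2 samples (the real training set has 357)
x_demo = np.zeros((2, N_ROIS + N_VOXELS), dtype=np.float32)
rois, vbm = split_blocks(x_demo)
print(rois.shape, vbm.shape)  # (2, 284) (2, 331695)
```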

There are 357 samples in the training set and 90 samples in the test set.


## Links


- [RAMP-workflow’s documentation](https://paris-saclay-cds.github.io/ramp-workflow/)
- [RAMP Kits](https://github.com/ramp-kits)

In [56]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

In [57]:
import problem

In [58]:
X_train, y_train = problem.get_train_data()

## Feature Selection

Select only the 284 ROI features:

In [59]:
X_train = X_train[:, :284]

## Predictor

We propose a simple linear regression predictor based on the ROI features only:

In [60]:
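# Models
from sklearn.linear_model import LinearRegression
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# ROIsFeatureExtractor is not defined in this export; this minimal version is an
# assumption based on its name: it keeps the first 284 columns (the ROI features).
class ROIsFeatureExtractor(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[:, :284]

pipe = make_pipeline(ROIsFeatureExtractor(), StandardScaler(), LinearRegression())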

## Evaluation using CV

The pipeline is evaluated with cross-validation. The metric used is the root-mean-square error (RMSE).
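
As a reminder of what this metric computes, here is a minimal NumPy sketch. The helper name `rmse` and the toy ages are illustrative and not part of the kit. Note that scikit-learn's `neg_root_mean_squared_error` scorer returns the negative of this value, so that higher scores are better.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy ages in years: errors are -3, 4, 0, 0, so the mean squared error is 25 / 4
score = rmse([20, 30, 40, 50], [23, 26, 40, 50])
print(score)  # 2.5
```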

In [61]:
from sklearn.model_selection import cross_validate
from problem import get_cv

cv = get_cv(X_train, y_train)

# scikit-learn reports the negated RMSE (higher is better); flip the sign for display.
results = cross_validate(pipe, X_train, y_train,
                         scoring=['neg_root_mean_squared_error'], cv=cv,
                         verbose=1, return_train_score=True, n_jobs=1)

train_rmse = -results['train_neg_root_mean_squared_error']
test_rmse = -results['test_neg_root_mean_squared_error']
print("Training RMSE: {:.3f} +- {:.3f}".format(np.mean(train_rmse), np.std(train_rmse)))
print("Test RMSE: {:.3f} +- {:.3f}".format(np.mean(test_rmse), np.std(test_rmse)))

Training RMSE: 0.639 +- 0.288
Test RMSE: 47.793 +- 16.807


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:    0.1s finished
