# Which feature helps age prediction?

After the toy experiment in `../is_age_predictable/toy_regression_svm_rf_mlp.py`,
we want to know which features contribute most to age prediction.
In particular, we want to decide which DTI scalar to use in future designs.

Author: Chenyu Gao

Date: Jul 3, 2023

In [1]:
from functions import *
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_absolute_error
from tqdm import tqdm
import numpy as np
import pandas as pd  # used below for pd.get_dummies
import multiprocessing
import warnings

random_seed = 0
warnings.filterwarnings('ignore')

### Test Experiment

First, run a sanity check of the code on a single feature set.

In [2]:
# Load data
df = prepare_dataframe(atlas=['EveType3'], DTI_scalar=['FA'], value=['mean'], volume=False)
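# One-hot encode Sex, dropping the first level to avoid a redundant column
df_encoded = pd.get_dummies(df, columns=['Sex'], drop_first=True)
# Features: everything except the session identifier and the target
X = df_encoded.drop(['Session','Age'], axis=1)
y = df_encoded['Age']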

# Split the data into training and testing sets
# (an integer test_size is an absolute number of test samples, here 300)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=300, random_state=random_seed)

# Random Forest regression model
rf_model = RandomForestRegressor(n_estimators=100, random_state=random_seed)
# The scorer returns negated MAE, so flip the sign to get positive errors
rf_scores = -cross_val_score(rf_model, X_train, y_train, cv=5, scoring='neg_mean_absolute_error')

rf_model.fit(X_train, y_train)
rf_predictions = rf_model.predict(X_test)
rf_mae = mean_absolute_error(y_test, rf_predictions)

# Fully connected neural network regression model (MLP)
mlp_model = MLPRegressor(hidden_layer_sizes=(100,100), max_iter=5000, random_state=random_seed)
# Same sign flip as above: the scorer returns negated MAE
mlp_scores = -cross_val_score(mlp_model, X_train, y_train, cv=5, scoring='neg_mean_absolute_error')

mlp_model.fit(X_train, y_train)
mlp_predictions = mlp_model.predict(X_test)
mlp_mae = mean_absolute_error(y_test, mlp_predictions)

print('Random Forest\nMAE (cross validation): {}\nMAE (testing): {}'.format(rf_scores,rf_mae))
print('MLP\nMAE (cross validation): {}\nMAE (testing): {}'.format(mlp_scores,mlp_mae))

Random Forest
MAE (cross validation): [5.41105155 5.22090515 5.38831959 5.20710744 5.1403905 ]
MAE (testing): 4.918329999999997
MLP
MAE (cross validation): [5.7568732  6.22060007 5.76138283 5.66925726 5.61275005]
MAE (testing): 5.328636982908498
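
The per-fold numbers above are easier to compare once each model is reduced to a mean and a spread. The following is a minimal sketch that only re-uses the values printed above; the helper name `summarize` is hypothetical and not part of `functions`.

```python
from statistics import mean, stdev

# Values copied from the printed output above
rf_cv = [5.41105155, 5.22090515, 5.38831959, 5.20710744, 5.1403905]
rf_test = 4.918329999999997
mlp_cv = [5.7568732, 6.22060007, 5.76138283, 5.66925726, 5.61275005]
mlp_test = 5.328636982908498


def summarize(name, fold_mae, test_mae):
    """Print and return the mean and sample standard deviation of the fold MAEs."""
    m = mean(fold_mae)
    s = stdev(fold_mae)  # sample standard deviation across the 5 folds
    print('{}: CV MAE = {:.2f} +/- {:.2f}, test MAE = {:.2f}'.format(name, m, s, test_mae))
    return m, s


rf_mean, rf_std = summarize('Random Forest', rf_cv, rf_test)
mlp_mean, mlp_std = summarize('MLP', mlp_cv, mlp_test)
```

With only five folds the standard deviation is a rough indication of variability, not a formal test of the difference between the two models.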


### Combination of features

Try different combinations of input features and see which combination gives the best prediction.

The code for this experiment has been moved to the script `combination_of_features_parallel.py`.
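
For illustration, the search over input combinations could be enumerated along the following lines. This is a sketch under stated assumptions, not the contents of `combination_of_features_parallel.py`: only `'EveType3'`, `'FA'`, and `'mean'` appear in this notebook, so the other entries in the option lists are assumed candidates, and the helpers `non_empty_subsets` and `enumerate_configs` are hypothetical. Each dictionary mirrors the keyword arguments passed to `prepare_dataframe` in the sanity check.

```python
from itertools import combinations, product

# Option lists: entries other than 'EveType3', 'FA', and 'mean' are assumptions
ATLASES = ['EveType3']
DTI_SCALARS = ['FA', 'MD', 'AD', 'RD']  # assumed candidate scalars
VALUES = ['mean', 'std']                # assumed candidate statistics


def non_empty_subsets(options):
    """Return every non-empty subset of `options` as a list, smallest subsets first."""
    return [list(c)
            for r in range(1, len(options) + 1)
            for c in combinations(options, r)]


def enumerate_configs(atlases, scalars, values):
    """Build one keyword-argument dict per combination of settings."""
    configs = []
    for a, s, v, vol in product(non_empty_subsets(atlases),
                                non_empty_subsets(scalars),
                                non_empty_subsets(values),
                                [False, True]):
        configs.append({'atlas': a, 'DTI_scalar': s, 'value': v, 'volume': vol})
    return configs


configs = enumerate_configs(ATLASES, DTI_SCALARS, VALUES)
print(len(configs))   # 1 * 15 * 3 * 2 = 90 combinations
print(configs[0])     # the setting used in the sanity check above
```

Each entry could then be unpacked as `prepare_dataframe(**config)` and evaluated with the same train/test split and cross-validation used in the sanity check, so that the MAE values are comparable across combinations.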