# Hyperparameter Tuning

One of the distinctive features of **NiftyTorch** is automated hyperparameter tuning of different CNNs for neuroimaging data. Here is an example of this technique.

### AlexNet example
Here we use the `niftytorch` automated hyperparameter optimization module to optimize [AlexNet](https://en.wikipedia.org/wiki/AlexNet).

In [2]:
# import the required libraries 
import torch
import niftytorch
from niftytorch.Models.AlexNet import train_alexnet
from torchvision import transforms
from niftytorch.Loss.Losses import FocalLoss
import torch.optim as optim
import torch.nn as nn
from torch.optim import lr_scheduler

In [3]:
# define path to data and labels
data_folder = "/example/farshid/img/data/StudyName"
data_csv = "/example/farshid/img/data/StudyName/label.csv"

Note that the `DataLoader` of `niftytorch` is designed so that no changes to the existing `torch` data input/output modules are required. Users can still use their favorite `torch` commands, while `niftytorch` takes care of the 3D adaptation in the background.

In [4]:
# define tensor transformers and loss
data_transforms = transforms.Compose([transforms.ToTensor()])
train = train_alexnet()
loss_list = [nn.CrossEntropyLoss()]

### Configuration file
The cell below shows an example of a `cfgs` configuration for AlexNet. For a full description of the configuration options, see the documentation files.

Here we assume: 
* **label.csv** file contains a column `Subject` with the name of sample subjects.
* **label.csv** file contains a column `diagnosis` with labels for each class (e.g. "normal","disease").
* **label.csv** file contains demographic information, including `['age','gender','education']`.
* Study data folder contains `t1w.nii.gz` and `flair.nii.gz` for all subjects. 
* For the purpose of this demo we set the number of epochs to 10 and the number of trials to 20. Consider higher values, especially if computational resources are available.
* Define the lower and upper bounds of the learning rate using `lr_min` and `lr_max`.
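
The assumptions about `label.csv` listed above can be checked before training starts. The following is a minimal sketch using only the Python standard library; the helper name `missing_columns` and the in-memory example rows are illustrative and not part of `niftytorch`.

```python
import csv
import io

# Columns that the assumptions above expect to find in label.csv
REQUIRED_COLUMNS = ["Subject", "diagnosis", "age", "gender", "education"]

def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [name for name in required if name not in header]

# Tiny in-memory stand-in for label.csv (made-up rows, for illustration only)
example = io.StringIO(
    "Subject,diagnosis,age,gender,education\n"
    "sub-001,normal,63,F,16\n"
    "sub-002,disease,71,M,12\n"
)
print(missing_columns(example))  # an empty list means every column is present
```

For a real study, the same helper could be applied to an open file handle, for example `with open(data_csv, newline="") as f: print(missing_columns(f))`.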

The study folder should contain training (`train`) and validation (`val`) folders, organized as follows:
```
StudyName
└───train
│   └───subjectID
│   │       t1w.nii.gz
│   │       flair.nii.gz
│   └───subjectID
│   │       t1w.nii.gz
│   │       flair.nii.gz              
│   │   ...
│   
└───val
│   └───subjectID
│   │       t1w.nii.gz
│   │       flair.nii.gz
│   └───subjectID
│   │       t1w.nii.gz
│   │       flair.nii.gz             
│   │   ...

```
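
A quick way to confirm that a study folder matches the layout above is to walk it and report subjects with missing images. This is a sketch under the assumption that every subject folder must contain both files; `find_incomplete_subjects` is a hypothetical helper, not a `niftytorch` function.

```python
from pathlib import Path

def find_incomplete_subjects(study_dir,
                             file_types=("t1w.nii.gz", "flair.nii.gz"),
                             splits=("train", "val")):
    """Map 'split/subject' to the list of expected files that are missing."""
    problems = {}
    for split in splits:
        split_dir = Path(study_dir) / split
        if not split_dir.is_dir():
            # The whole split folder is absent
            problems[split] = ["<missing split folder>"]
            continue
        subject_dirs = sorted(p for p in split_dir.iterdir() if p.is_dir())
        for subject_dir in subject_dirs:
            missing = [name for name in file_types
                       if not (subject_dir / name).is_file()]
            if missing:
                problems[f"{split}/{subject_dir.name}"] = missing
    return problems

# An empty dictionary means every subject has all expected files
print(find_incomplete_subjects("/example/farshid/img/data/StudyName"))
```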

### Note on optimizer
Users can choose any optimizer available in Torch for their application. When an optimization does not progress, the optimizer may not be suited to your dataset. Torch provides a range of optimizers, including but not limited to:
```optim.ASGD, optim.Adam, optim.LBFGS, optim.Rprop, optim.AdamW, optim.SGD, optim.SparseAdam```

With the automated hyperparameter optimization module you can supply a list of optimizers to choose from using:

```'opt_list': [optim.Adam,optim.RMSprop]```

In [6]:
# set up the configurations
cfgs = {
    'num_classes':2,
    'in_channels':2,
    'data_folder':data_folder,
    'data_csv':data_csv,
    'data_transforms':data_transforms,
    'filename_label':'Subject',
    'class_label':'diagnosis',
    'channels':False,
    'channels_1':[32, 96, 144, 8],
    'channels_2':[32, 96, 144, 8],
    'channels_3':[32, 96, 144, 8],
    'channels_4':[32, 96, 144, 8],
    'channels_5':[32, 96, 144, 8],
    'l2':3e-4,
    'strides':False,
    'strides_1':[1],
    'strides_2':[1], 
    'strides_3':[2, 1], 
    'strides_4':[2, 1], 
    'strides_5':[1],
    'kernel_size':False,
    'kernel_size_1':[5, 3],
    'kernel_size_2':[5, 3], 
    'kernel_size_3':[5, 3], 
    'kernel_size_4':[5, 3], 
    'kernel_size_5':[3],
    'padding':False,
    'padding_1':[0, 1],
    'padding_2':[0, 1],
    'padding_3':[0, 2, 1],
    'padding_4':[0, 2, 1],
    'padding_5':[2, 1],
    'learning_rate': False,
    'lr_min':1e-6,
    'lr_max':1e-2,
    'loss':False,
    'demographic':['age','gender','education'],
    'loss_list':loss_list,
    'scheduler':lr_scheduler.StepLR,
    'image_scale':False,
    'num_epochs':10,
    'image_scale_list':[64,80],
    'optimizer':False,
    'opt_list': [optim.Adam,optim.RMSprop],
    'gamma':.2,
    'batch_size':8,
    'num_workers':2,
    'cuda':'cuda:2',
    'device_ids':[],
    'step_size':7,
    'file_type':('t1w.nii.gz','flair.nii.gz')
}
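
The configuration pairs `lr_scheduler.StepLR` with `step_size` 7 and `gamma` 0.2. `StepLR` multiplies the learning rate by `gamma` once every `step_size` epochs, so the schedule can be reproduced in plain Python to see what it does over 10 epochs. This is only a sketch: it assumes the scheduler is stepped once per epoch, and the starting learning rate of `1e-3` is a hypothetical value inside the `lr_min` to `lr_max` range.

```python
def step_lr(initial_lr, epoch, step_size=7, gamma=0.2):
    """Learning rate at a given epoch under a StepLR-style decay."""
    return initial_lr * gamma ** (epoch // step_size)

initial_lr = 1e-3  # hypothetical value between lr_min (1e-6) and lr_max (1e-2)
for epoch in range(10):  # num_epochs in the configuration
    print(epoch, step_lr(initial_lr, epoch))
```

Under these assumptions the learning rate stays at its initial value for epochs 0 to 6 and is reduced once, by a factor of 0.2, for epochs 7 to 9.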

Now that `cfgs` is set, hyperparameter optimization can be run with the code below. Note that this step is computationally demanding. We have tested it on a GPU server; it may be slow on local machines.

In [7]:
train.hyperopt_set_params(cfgs)
train.optimize(n_trials = 20,
    contour_plot_params = ['image_size','lr'],
    optimization_history = True,
    plot_parallel_coordinate_params = ['image_size','lr'],
    slice_plot_params = ['image_size','lr'])

The example above executes **20 trials**, as set by the `n_trials = 20` option.
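
To put the number of trials in perspective, the discrete part of the search space can be counted from the candidate lists in `cfgs`. The sketch below copies those lists and assumes that all of them are searched jointly; the optimizers are written as strings only to keep the example free of a `torch` import, and the learning rate is left out because it is continuous.

```python
from math import prod

# Candidate lists copied from the cfgs dictionary
candidates = {
    "channels_1": [32, 96, 144, 8],
    "channels_2": [32, 96, 144, 8],
    "channels_3": [32, 96, 144, 8],
    "channels_4": [32, 96, 144, 8],
    "channels_5": [32, 96, 144, 8],
    "strides_1": [1],
    "strides_2": [1],
    "strides_3": [2, 1],
    "strides_4": [2, 1],
    "strides_5": [1],
    "kernel_size_1": [5, 3],
    "kernel_size_2": [5, 3],
    "kernel_size_3": [5, 3],
    "kernel_size_4": [5, 3],
    "kernel_size_5": [3],
    "padding_1": [0, 1],
    "padding_2": [0, 1],
    "padding_3": [0, 2, 1],
    "padding_4": [0, 2, 1],
    "padding_5": [2, 1],
    "image_scale_list": [64, 80],
    "opt_list": ["Adam", "RMSprop"],  # stand-ins for optim.Adam, optim.RMSprop
}

# Number of distinct discrete configurations (learning rate excluded)
n_combinations = prod(len(values) for values in candidates.values())
print(n_combinations)  # 18874368
```

With close to 19 million discrete combinations, a run of 20 trials samples only a very small part of the space, which is one reason to increase the number of trials when resources allow.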

At the end of each trial an update of the internal progress will be printed. For example: 


```
Finished trial#0 with value: 0.7536231884057971 with parameters: {'lr': 0.00018164197161668776, 'optimizer': <class 'torch.optim.rmsprop.RMSprop'>, 'loss': CrossEntropyLoss(), 'image_size': 80, 'channels_1': 8, 'channels_2': 144, 'channels_3': 96, 'channels_4': 96, 'channels_5': 96, 'strides_1': 1, 'strides_2': 1, 'strides_3': 2, 'strides_4': 2, 'strides_5': 1, 'kernel_size_1': 3, 'kernel_size_2': 3, 'kernel_size_3': 5, 'kernel_size_4': 3, 'kernel_size_5': 3, 'padding_1': 1, 'padding_2': 0, 'padding_3': 2, 'padding_4': 0, 'padding_5': 2}. Best is trial#0 with value: 0.7536231884057971.
```


When hyperparameter tuning is complete, the optimized hyperparameters of the network are obtained.

In addition, the optimization history is saved. The code above writes a file called `results.pkl` to the working directory in which it was run. The `pkl` file contains performance results for each trial and can be accessed using the commands below:

In [1]:
import joblib
import matplotlib.pyplot as plt
results = joblib.load('results.pkl')
df = results.trials_dataframe()
df.head(1)

| | number | value | datetime_start | datetime_complete | duration | params_channels_1 | params_channels_2 | params_channels_3 | params_channels_4 | params_channels_5 | ... | params_padding_2 | params_padding_3 | params_padding_4 | params_padding_5 | params_strides_1 | params_strides_2 | params_strides_3 | params_strides_4 | params_strides_5 | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.445135 | 2020-05-18 00:54:04.486563 | 2020-05-18 01:11:33.480228 | 00:17:28.993665 | 96 | 144 | 32 | 144 | 96 | ... | 0 | 2 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | COMPLETE |


In [15]:
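# plot the objective value of each trial against its sampled learning rate
# (assumes the column is named `params_lr`, following the `params_<name>` pattern above)
plt.scatter(df.params_lr, df.value)
plt.xscale('log')
plt.xlabel('learning_rate')
plt.ylabel('value')
plt.grid()
plt.show()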