# Train a new model

In this notebook, we will learn how to train a new model for axon and myelin segmentation. It covers the following steps:

* Train a model from scratch by defining the parameters of the network
* Run inference using the trained model


### Imports

In [1]:
import json
import os
import time
import tensorflow as tf
import numpy as np
from skimage import io
import imageio  # used for image I/O; scipy.misc.imread/imsave were removed from SciPy (1.2 and later)
import matplotlib.pyplot as plt
from shutil import copy

from AxonDeepSeg.train_network import train_model
from AxonDeepSeg.apply_model import axon_segmentation

# reset the tensorflow graph for new training
tf.reset_default_graph()

%matplotlib inline

### 1. Train a new model
#### 1.1. Create subfolder to save new model and its parameters.

For simplicity, the new model will be created under the `models/` folder in the AxonDeepSeg repository. The name of the model folder is generated automatically from the current date and time, so that each training run gets its own folder.

In [2]:
# Define path to where the trained model will be saved
dir_name = time.strftime("%Y-%m-%d") + '_' + time.strftime("%H-%M-%S")
path_model = os.path.join('../models/', dir_name)

# Create directory
if not os.path.exists(path_model):
    os.makedirs(path_model)

file_config = 'config_network.json'  # file name of network configuration

#### 1.2. Define the name and path of the training set

Here we assume that the training data folder has already been created by following the guidelines detailed in [guide_dataset_building.ipynb](https://github.com/neuropoly/axondeepseg/blob/master/notebooks/guide_dataset_building.ipynb).

The expected structure of the training data folder is the following:

~~~
data_training
 └── Train
      └── image_0.png
      └── mask_0.png
      └── image_1.png
      └── mask_1.png
          ...
 └── Validation
      └── image_0.png
      └── mask_0.png
      └── image_1.png
      └── mask_1.png
          ...
~~~
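
Before launching a long training run, it can help to verify that the folder follows the structure above. The following is a minimal sketch, not part of AxonDeepSeg: the helper name `check_training_folder` is hypothetical, and it only assumes the `Train`/`Validation` subfolders and the `image_<i>.png`/`mask_<i>.png` naming shown in the tree. The demonstration at the end builds a throwaway folder with empty files.

```python
import os
import re
import tempfile


def check_training_folder(path_training):
    """Return a dict mapping each subset name to the sorted patch indices found.

    Raises ValueError if a subset folder is missing or if an image has no
    matching mask (or a mask has no matching image).
    """
    summary = {}
    for subset in ("Train", "Validation"):
        subset_dir = os.path.join(path_training, subset)
        if not os.path.isdir(subset_dir):
            raise ValueError("Missing folder: " + subset_dir)
        images, masks = set(), set()
        for name in os.listdir(subset_dir):
            match = re.fullmatch(r"(image|mask)_(\d+)\.png", name)
            if match is None:
                continue  # ignore files that do not follow the naming scheme
            target = images if match.group(1) == "image" else masks
            target.add(int(match.group(2)))
        if images != masks:
            unpaired = sorted(images ^ masks)
            raise ValueError("Unpaired patches in {}: {}".format(subset, unpaired))
        summary[subset] = sorted(images)
    return summary


# Demonstration on a temporary folder filled with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for subset in ("Train", "Validation"):
        os.makedirs(os.path.join(tmp, subset))
        for i in range(2):
            for prefix in ("image", "mask"):
                open(os.path.join(tmp, subset, "{}_{}.png".format(prefix, i)), "w").close()
    demo_summary = check_training_folder(tmp)
    print(demo_summary)
```

On a real dataset, the same call would be `check_training_folder(path_training)` with the path defined in the next cell.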

In [3]:
# trainingset_name = 'my_awesome_model'
trainingset_name = 'SEM_3c_512'
# path_training = '/home/jondoe/data_training'  #  folder containing training data
path_training = '/home/neuropoly/mikula/data/patched_data'  #  folder containing training data

#### 1.3. Define the configuration parameters of the training

The networks and training parameters (i.e. hyperparameters) used in the original [AxonDeepSeg article](https://www.nature.com/articles/s41598-018-22181-4) are defined below for TEM and SEM. **Note that these architectures might not produce satisfactory results on your data.**

Importantly: the pixel size is not defined at the training step. During inference however, the parameter `-t {SEM,TEM}` sets the resampling resolution to 0.1µm or 0.01µm respectively (i.e., implying the pixel size of the training data is 0.1µm or 0.01µm respectively). This is definitely a limitation of the current version of AxonDeepSeg, which we are planning to solve at some point (for more info, see [Issue #152](https://github.com/neuropoly/axondeepseg/issues/152)). 
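
To make the resampling concrete, here is a small arithmetic sketch. It is only an illustration of the relationship between pixel sizes described above, not AxonDeepSeg's internal code; the names `TARGET_RESOLUTION_UM`, `resampling_factor` and `resampled_shape` are hypothetical.

```python
# Target pixel sizes (in micrometers) implied by the modality, as stated above
TARGET_RESOLUTION_UM = {"SEM": 0.1, "TEM": 0.01}


def resampling_factor(acquired_resolution_um, modality):
    """Factor by which each image dimension is scaled to reach the target pixel size."""
    return acquired_resolution_um / TARGET_RESOLUTION_UM[modality]


def resampled_shape(shape, acquired_resolution_um, modality):
    """Image shape (in pixels) after resampling to the target pixel size."""
    factor = resampling_factor(acquired_resolution_um, modality)
    return tuple(int(round(dim * factor)) for dim in shape)


# An SEM image acquired at 0.2 um/pixel has pixels twice as large as the target,
# so it needs twice as many pixels along each dimension after resampling.
print(resampling_factor(0.2, "SEM"))
print(resampled_shape((1000, 800), 0.2, "SEM"))
```

If the pixel size of your training data differs from these target values, the resampled test images will not match the scale the network was trained on, which is the limitation discussed above.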

In [5]:
# Example of network configuration for TEM data
config = {
    
# General parameters:    
  "n_classes": 3,  # Number of classes. For this application, the number of classes should be set to **3** (i.e. axon pixel, myelin pixel, or background pixel).
  "thresholds": [0, 0.2, 0.8],  # Thresholds for the 3-class classification problem. Do not modify.  
  "trainingset_patchsize": 512,  # Patch size of the training set in pixels (note that the patches have the same size in both dimensions).  
  "trainingset": trainingset_name,  # Name of the training set.
  "batch_size": 8,  # Batch size, i.e. the number of training patches used in one iteration of the training. Note that a larger batch size will take more memory.

# Network architecture parameters:     
  "depth": 4,  # Depth of the network (i.e. number of blocks of the U-net).
  "convolution_per_layer": [3, 3, 3, 3],  # Number of convolution layers used at each block.
  "size_of_convolutions_per_layer": [[5, 5, 5], [3, 3, 3], [3, 3, 3], [3, 3, 3]],  # Kernel size of each convolution layer of the network.
  "features_per_convolution": [[[1, 16], [16, 16], [16, 16]], [[16, 32], [32, 32], [32, 32]], [[32, 64], [64, 64], [64, 64]], [[64, 128], [128, 128], [128, 128]]],  # Number of features of each convolution layer.
  "downsampling": "convolution",  # Type of downsampling to use in the downsampling layers of the network. Option "maxpooling" for standard max pooling layer or option "convolution" for learned convolutional downsampling.
  "dropout": 0.75,  # Dropout to use for the training. Note: In TensorFlow, the keep probability is used instead. For instance, setting this param. to 0.75 means that 75% of the neurons of the network will be kept (i.e. dropout of 25%).
     
# Learning rate parameters:    
  "learning_rate": 0.001,  # Learning rate to use in the training.  
  "learning_rate_decay_activate": True,  # Set to "True" to use a decay on the learning rate.  
  "learning_rate_decay_period": 24000,  # Period of the learning rate decay, expressed in number of images (samples) seen.
  "learning_rate_decay_type": "polynomial",  # Type of decay to use. An exponential decay will be used by default unless this param. is set to "polynomial" (to use a polynomial decay).
  "learning_rate_decay_rate": 0.99,  # Rate of the decay to use for the exponential decay. This only applies when the user does not set the decay type to "polynomial".
    
# Batch normalization parameters:     
  "batch_norm_activate": True,  # Set to "True" to use batch normalization during the training.
  "batch_norm_decay_decay_activate": True,  # Set to "True" to activate an exponential decay for the batch normalization step of the training.  
  "batch_norm_decay_starting_decay": 0.7,  # The starting decay value for the batch normalization. 
  "batch_norm_decay_ending_decay": 0.9,  # The ending decay value for the batch normalization.
  "batch_norm_decay_decay_period": 16000,  # Period of the batch normalization decay, expressed in number of images (samples) seen.
        
# Weighted cost parameters:    
  "weighted_cost-activate": True,  # Set to "True" to use weights based on the class in the cost function for the training.
  "weighted_cost-balanced_activate": True,  # Set to "True" to use weights in the cost function to correct class imbalance. 
  "weighted_cost-balanced_weights": [1.1, 1, 1.3],  # Values of the weights for the class imbalance. Typically, larger weights are assigned to classes with fewer pixels to add more penalty in the cost function when there is a misclassification. Order of the classes in the weights list: background, myelin, axon.
  "weighted_cost-boundaries_sigma": 2,  # Value to control the distribution of the boundary weights (if activated).
  "weighted_cost-boundaries_activate": False,  # Set to "True" to add weights to the boundaries (e.g. penalize more when misclassification happens in the axon-myelin interface).
    
# Data augmentation parameters:
# For each type of data augmentation, the order needs to be specified if you decide to apply more than one 
# transformation sequentially. For instance, setting the "da-0-shifting-activate" parameter to 'True' means that the 
# shifting is the first transformation that will be applied to the sample(s) during training. The default ranges of 
# transformations are:
#   Shifting: Random horizontal and vertical shifting between 0 and 10% of the patch size, sampled from a uniform distribution.
#   Rotation: Random rotation, angle between 5 and 89 degrees, sampled from a uniform distribution.
#   Rescaling: Random rescaling by a factor sampled between 1/1.2 and 1.2.
#   Flipping: Random flipping: vertical flipping or horizontal flipping.
#   Blurring: Gaussian blur with the standard deviation of the gaussian kernel being uniformly sampled between 0 and 4.
#   Elastic deformation: Random elastic deformation with uniformly sampled deformation coefficient α=[1–8] and fixed standard deviation σ=4.
# You can find more information about the range of transformations applied to the patches for each data augmentation technique in the file "data_augmentation.py".
  "da-type": "all",  # Type of data augmentation procedure. Option "all" applies all selected data augmentation transformations sequentially, while option "random" only applies one of the selected transformations (randomly) to the sample(s). List of available data augmentation transformations: 'random_rotation', 'noise_addition', 'elastic', 'shifting', 'rescaling' and 'flipping'.
  "da-2-random_rotation-activate": False,  
  "da-5-noise_addition-activate": False, 
  "da-3-elastic-activate": True, 
  "da-0-shifting-activate": True, 
  "da-4-flipping-activate": True, 
  "da-1-rescaling-activate": False    
}
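
The two learning rate decay types mentioned in the configuration above can be illustrated with plain Python. This is a sketch under stated assumptions: the formulas follow the standard (non-staircase, non-cycling) definitions of exponential and polynomial decay as found in TensorFlow 1.x, and `end_lr=0.0001` and `power=1.0` are TensorFlow's defaults. How AxonDeepSeg maps its parameters onto these formulas internally is not confirmed here, and the function names are hypothetical.

```python
def exponential_decay(lr, samples_seen, decay_period, decay_rate):
    """Exponential decay: lr * decay_rate ** (samples_seen / decay_period)."""
    return lr * decay_rate ** (samples_seen / decay_period)


def polynomial_decay(lr, samples_seen, decay_period, end_lr=0.0001, power=1.0):
    """Polynomial decay from lr down to end_lr over decay_period samples.

    end_lr and power are assumed values (TensorFlow 1.x defaults).
    """
    progress = min(samples_seen, decay_period) / decay_period
    return (lr - end_lr) * (1.0 - progress) ** power + end_lr


# Compare both schedules for the values used in the configuration above
for samples in (0, 12000, 24000):
    print(samples,
          exponential_decay(0.001, samples, 24000, 0.99),
          polynomial_decay(0.001, samples, 24000))
```

With `power=1.0` the polynomial schedule is a straight line from the initial learning rate to `end_lr`, whereas the exponential schedule multiplies the learning rate by `decay_rate` once per decay period.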

In [43]:
# Example of network configuration for SEM data
config = {
    
# General parameters:    
  "n_classes": 3,  # Number of classes. For this application, the number of classes should be set to **3** (i.e. axon pixel, myelin pixel, or background pixel).
  "thresholds": [0, 0.2, 0.8],  # Thresholds for the 3-class classification problem. Do not modify.  
  "trainingset_patchsize": 512,  # Patch size of the training set in pixels (note that the patches have the same size in both dimensions).  
  "trainingset": trainingset_name,  # Name of the training set.
  "batch_size": 8,  # Batch size, i.e. the number of training patches used in one iteration of the training. Note that a larger batch size will take more memory.

# Network architecture parameters:     
  "depth": 2,  # Depth of the network (i.e. number of blocks of the U-net). Must match the number of entries in the per-block lists below.
  "convolution_per_layer": [2, 2],  # Number of convolution layers used at each block.
  "size_of_convolutions_per_layer": [[3, 3], [3, 3]],  # Kernel size of each convolution layer of the network.
  "features_per_convolution": [[[1, 5], [5, 5]], [[5, 10], [10, 10]]],  # Number of features of each convolution layer.
  "downsampling": "maxpooling",  # Type of downsampling to use in the downsampling layers of the network. Option "maxpooling" for standard max pooling layer or option "convolution" for learned convolutional downsampling.
  "dropout": 0.75,  # Dropout to use for the training. Note: In TensorFlow, the keep probability is used instead. For instance, setting this param. to 0.75 means that 75% of the neurons of the network will be kept (i.e. dropout of 25%).
     
# Learning rate parameters:    
  "learning_rate": 0.001,  # Learning rate to use in the training.  
  "learning_rate_decay_activate": True,  # Set to "True" to use a decay on the learning rate.  
  "learning_rate_decay_period": 24000,  # Period of the learning rate decay, expressed in number of images (samples) seen.
  "learning_rate_decay_type": "polynomial",  # Type of decay to use. An exponential decay will be used by default unless this param. is set to "polynomial" (to use a polynomial decay).
  "learning_rate_decay_rate": 0.99,  # Rate of the decay to use for the exponential decay. This only applies when the user does not set the decay type to "polynomial".
    
# Batch normalization parameters:     
  "batch_norm_activate": True,  # Set to "True" to use batch normalization during the training.
  "batch_norm_decay_decay_activate": True,  # Set to "True" to activate an exponential decay for the batch normalization step of the training.  
  "batch_norm_decay_starting_decay": 0.7,  # The starting decay value for the batch normalization. 
  "batch_norm_decay_ending_decay": 0.9,  # The ending decay value for the batch normalization.
  "batch_norm_decay_decay_period": 16000,  # Period of the batch normalization decay, expressed in number of images (samples) seen.
        
# Weighted cost parameters:    
  "weighted_cost-activate": True,  # Set to "True" to use weights based on the class in the cost function for the training.
  "weighted_cost-balanced_activate": True,  # Set to "True" to use weights in the cost function to correct class imbalance. 
  "weighted_cost-balanced_weights": [1.1, 1, 1.3],  # Values of the weights for the class imbalance. Typically, larger weights are assigned to classes with fewer pixels to add more penalty in the cost function when there is a misclassification. Order of the classes in the weights list: background, myelin, axon.
  "weighted_cost-boundaries_sigma": 2,  # Value to control the distribution of the boundary weights (if activated).
  "weighted_cost-boundaries_activate": False,  # Set to "True" to add weights to the boundaries (e.g. penalize more when misclassification happens in the axon-myelin interface).
    
# Data augmentation parameters:
# For each type of data augmentation, the order needs to be specified if you decide to apply more than one 
# transformation sequentially. For instance, setting the "da-0-shifting-activate" parameter to 'True' means that the 
# shifting is the first transformation that will be applied to the sample(s) during training. The default ranges of 
# transformations are:
#   Shifting: Random horizontal and vertical shifting between 0 and 10% of the patch size, sampled from a uniform distribution.
#   Rotation: Random rotation, angle between 5 and 89 degrees, sampled from a uniform distribution.
#   Rescaling: Random rescaling by a factor sampled between 1/1.2 and 1.2.
#   Flipping: Random flipping: vertical flipping or horizontal flipping.
#   Blurring: Gaussian blur with the standard deviation of the gaussian kernel being uniformly sampled between 0 and 4.
#   Elastic deformation: Random elastic deformation with uniformly sampled deformation coefficient α=[1–8] and fixed standard deviation σ=4.
# You can find more information about the range of transformations applied to the patches for each data augmentation technique in the file "data_augmentation.py".
  "da-type": "all",  # Type of data augmentation procedure. Option "all" applies all selected data augmentation transformations sequentially, while option "random" only applies one of the selected transformations (randomly) to the sample(s). List of available data augmentation transformations: 'random_rotation', 'noise_addition', 'elastic', 'shifting', 'rescaling' and 'flipping'.
  "da-2-random_rotation-activate": False,  
  "da-5-noise_addition-activate": False, 
  "da-3-elastic-activate": True, 
  "da-0-shifting-activate": True, 
  "da-4-flipping-activate": True, 
  "da-1-rescaling-activate": False    
}
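
The architecture parameters are nested lists that are easy to get out of sync when editing them by hand. The sketch below checks a configuration for internal consistency before training. It is not part of AxonDeepSeg: `check_architecture` is a hypothetical helper, and the rules it enforces (one list entry per block, one kernel size and one `[in, out]` feature pair per convolution, each convolution's input features equal to the previous output) are assumptions inferred from the example configurations in this notebook.

```python
def check_architecture(config):
    """Return a list of human-readable inconsistencies (empty if none are found)."""
    problems = []
    depth = config["depth"]
    n_convs = config["convolution_per_layer"]
    sizes = config["size_of_convolutions_per_layer"]
    features = config["features_per_convolution"]

    # Each per-block list is expected to have exactly `depth` entries
    for key, value in (("convolution_per_layer", n_convs),
                       ("size_of_convolutions_per_layer", sizes),
                       ("features_per_convolution", features)):
        if len(value) != depth:
            problems.append("{} has {} entries, expected depth={}".format(key, len(value), depth))

    # Within each block, kernel sizes and feature pairs must match the number of convolutions
    previous_out = None
    for block, (n, block_sizes, block_features) in enumerate(zip(n_convs, sizes, features)):
        if len(block_sizes) != n or len(block_features) != n:
            problems.append("block {}: expected {} convolutions".format(block, n))
        for n_in, n_out in block_features:
            if previous_out is not None and n_in != previous_out:
                problems.append("block {}: input features {} != previous output {}".format(
                    block, n_in, previous_out))
            previous_out = n_out
    return problems


# Small two-block example used only for illustration
example = {
    "depth": 2,
    "convolution_per_layer": [2, 2],
    "size_of_convolutions_per_layer": [[3, 3], [3, 3]],
    "features_per_convolution": [[[1, 5], [5, 5]], [[5, 10], [10, 10]]],
}
print(check_architecture(example))
```

Calling `check_architecture(config)` on the dictionary defined in this notebook before saving it gives an early warning about mismatched list lengths.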

#### 1.4. Save configuration parameters of the network as configuration file (.json)

After the configuration parameters of the network to be trained are defined, they are saved into a .json file in the model folder. This .json file keeps track of the network and model parameters in a structured way.

In [6]:
# Save the configuration to the model folder, unless a config file already exists there
fname_config = os.path.join(path_model, file_config)
if not os.path.exists(fname_config):
    with open(fname_config, 'w') as f:
        json.dump(config, f, indent=2)

# Read the configuration back from disk
with open(fname_config, 'r') as fd:
    config_network = json.load(fd)

#### 1.5. Launch the training procedure

The training can be launched by calling the `train_model` function. After each epoch, the function displays the loss and accuracy of the model. The model is automatically saved every 5 epochs.

In [None]:
# reset the tensorflow graph for new training
tf.reset_default_graph()

train_model(path_training, path_model, config)

('Layer: ', 0, ' Conv: ', 0, 'Features: ', [1, 16])
('Size:', 5)
('Layer: ', 0, ' Conv: ', 1, 'Features: ', [16, 16])
('Size:', 5)
('Layer: ', 1, ' Conv: ', 0, 'Features: ', [16, 32])
('Size:', 3)
('Layer: ', 1, ' Conv: ', 1, 'Features: ', [32, 32])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 0, 'Features: ', [32, 64])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 1, 'Features: ', [64, 64])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 0, 'Features: ', [64, 128])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 1, 'Features: ', [128, 128])
('Size:', 3)
('Layer: ', 0, ' Conv: ', 0, 'Features: ', [1, 16])
('Size:', 5)
('Layer: ', 0, ' Conv: ', 1, 'Features: ', [16, 16])
('Size:', 5)
('Layer: ', 1, ' Conv: ', 0, 'Features: ', [16, 32])
('Size:', 3)
('Layer: ', 1, ' Conv: ', 1, 'Features: ', [32, 32])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 0, 'Features: ', [32, 64])
('Size:', 3)
('Layer: ', 2, ' Conv: ', 1, 'Features: ', [64, 64])
('Size:', 3)
('Layer: ', 3, ' Conv: ', 0, 'Features: ', [64, 128])
('Size:', 3)
...

2018-11-10 21:46:34.121853-epoch:58-loss:0.36253155767917633-acc:0.8915863037109375
2018-11-10 21:46:46.948825-epoch:59-loss:0.43030229210853577-acc:0.8755509853363037
2018-11-10 21:46:59.776982-epoch:60-loss:0.44730304181575775-acc:0.8778009414672852
Best accuracy model saved in file: ../models/2018-11-10_21-32-36/best_acc_model.ckpt
Best loss model saved in file: ../models/2018-11-10_21-32-36/best_loss_model.ckpt
Model saved in file: ../models/2018-11-10_21-32-36/model.ckpt
2018-11-10 21:47:14.042641-epoch:61-loss:0.5076057314872742-acc:0.8630046844482422
2018-11-10 21:47:26.883356-epoch:62-loss:0.4303636699914932-acc:0.8789882659912109
2018-11-10 21:47:39.746626-epoch:63-loss:0.4539366364479065-acc:0.8736264705657959
2018-11-10 21:47:52.600319-epoch:64-loss:0.4647601693868637-acc:0.8750269412994385
2018-11-10 21:48:05.484794-epoch:65-loss:0.3473314493894577-acc:0.8918039798736572
Best accuracy model saved in file: ../models/2018-11-10_21-32-36/best_acc_model.ckpt
...
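
The per-epoch lines printed during training can be parsed to find the best epochs or to plot the curves afterwards. The sketch below assumes the line format shown in the output above (`...-epoch:N-loss:X-acc:Y`); `parse_training_log` is a hypothetical helper, and `sample_log` reuses a few lines copied from that output.

```python
import re

# Matches the "epoch:N-loss:X-acc:Y" part of a training log line
NUMBER = r"[0-9.]+(?:[eE][+-]?\d+)?"
LOG_PATTERN = re.compile(r"epoch:(\d+)-loss:({0})-acc:({0})".format(NUMBER))


def parse_training_log(text):
    """Return a list of (epoch, loss, accuracy) tuples found in the log text."""
    records = []
    for line in text.splitlines():
        match = LOG_PATTERN.search(line)
        if match:  # lines such as "Model saved in file: ..." are skipped
            records.append((int(match.group(1)),
                            float(match.group(2)),
                            float(match.group(3))))
    return records


sample_log = """2018-11-10 21:46:34.121853-epoch:58-loss:0.36253155767917633-acc:0.8915863037109375
2018-11-10 21:46:46.948825-epoch:59-loss:0.43030229210853577-acc:0.8755509853363037
Best accuracy model saved in file: ../models/2018-11-10_21-32-36/best_acc_model.ckpt
2018-11-10 21:48:05.484794-epoch:65-loss:0.3473314493894577-acc:0.8918039798736572
"""

records = parse_training_log(sample_log)
best_loss = min(records, key=lambda r: r[1])  # record with the lowest loss
best_acc = max(records, key=lambda r: r[2])   # record with the highest accuracy
print("Lowest loss at epoch", best_loss[0])
print("Highest accuracy at epoch", best_acc[0])
```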

#### 1.6. Monitor the training with Tensorboard

TensorBoard can be used to monitor the training procedure (loss and accuracy graphs, gradients, activations, etc.) and to help identify bugs. To run TensorBoard, activate the ADS virtual environment and run:
```
tensorboard --logdir PATH_MODEL --port 6006
```
where `PATH_MODEL` corresponds to this notebook's `path_model` variable (the folder where the model is being trained), and `--port` sets the port on which the TensorBoard local web server listens (e.g., port 6006). Once the command is running, open a web browser at the address:
```
http://localhost:6006/
```

### 2. Test the trained model
#### 2.1. Set the path of the test image to be segmented with the trained model

In [19]:
path_img = '/home/neuropoly/mikula/data/ground_truths/443.png'

#### 2.2. Load the test image and the configuration of the trained model

In [None]:
img = imageio.imread(path_img)
path_folder, file_name = os.path.split(path_img)

path_configfile = os.path.join(path_model,'config_network.json')
with open(path_configfile, 'r') as fd:
    config_network = json.loads(fd.read())

#### 2.3. Launch the image segmentation.

In [18]:
# Split the image path into its folder and file name (inputs expected by axon_segmentation)
path_folder, file_name = os.path.split(path_img)

# reset the tensorflow graph for new testing
tf.reset_default_graph()

prediction = axon_segmentation(path_folder, file_name, path_model, config_network, acquired_resolution=0.1, verbosity_level=3)

Loading acquisitions ...
Rescaling acquisitions to the target resolution ...
Graph construction ...
Beginning inference ...
processing patch 1 of 6
processing patch 2 of 6
processing patch 3 of 6
processing patch 4 of 6
processing patch 5 of 6
processing patch 6 of 6


ValueError: Max value == min value, ambiguous given dtype

#### 2.4. Load the segmented image and display it.

In [None]:
pred_img_path = os.path.join(path_folder, 'AxonDeepSeg.png')  # assumes the segmentation was saved as 'AxonDeepSeg.png' next to the test image
pred_img = imageio.imread(pred_img_path)

plt.figure(figsize=(13,10))
plt.title('Prediction with the trained model')
plt.imshow(pred_img,cmap='gray')
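
Beyond visual inspection, the prediction can be split into per-class masks. The following is a minimal sketch, assuming the saved prediction encodes background as 0, myelin as 127 and axon as 255. This encoding is an assumption here and should be checked on your own output, e.g. with `np.unique(pred_img)`. The helper names are hypothetical and the toy array only illustrates the functions.

```python
import numpy as np

# Assumed gray levels of the saved prediction (verify with np.unique on your own output)
MYELIN_VALUE = 127
AXON_VALUE = 255


def split_prediction(pred):
    """Return boolean (axon_mask, myelin_mask) arrays from a labelled prediction image."""
    pred = np.asarray(pred)
    return pred == AXON_VALUE, pred == MYELIN_VALUE


def class_fractions(pred):
    """Fraction of pixels assigned to each class."""
    axon_mask, myelin_mask = split_prediction(pred)
    axon = float(axon_mask.mean())
    myelin = float(myelin_mask.mean())
    return {"axon": axon, "myelin": myelin, "background": 1.0 - axon - myelin}


# Synthetic 2x4 prediction used only to illustrate the functions
toy_pred = np.array([[0, 127, 255, 255],
                     [0, 127, 127, 0]], dtype=np.uint8)
toy_fractions = class_fractions(toy_pred)
print(toy_fractions)
```

On a real result, `class_fractions(pred_img)` gives a quick sanity check: a prediction that is entirely background usually indicates a problem with the model or with the resolution settings.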