### Goal in brief ###

Registering brain images (2D/3D) with neural networks (NNs).

### Survey of architectures / Possible ideas ###
Most recent works use a "bottleneck" structure somewhere.  
* <span style ="color: blue"> Add a VAE in front for generalization? It would also provide a way to compress the data.</span>

Some works use a `spatial transformer network` (STN) to force the net to learn spatial transforms.
* <span style ="color: blue"> Use a suitable loss as regularization (e.g. a gradient-based one; if so, apply a Gaussian filter first?) </span>

* This assumes the two templates are already properly aligned. If not, a spatial transformer layer could be stacked in front to learn the proper affine (or other) transformation. 
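A minimal NumPy sketch of the grid-generator half of such an affine spatial transformer layer. It assumes the normalized-coordinate convention of the original STN paper (x and y in [-1, 1]) and a 2x3 matrix `theta` that the layer would predict; the function name is hypothetical, and the bilinear sampling step that reads the moving image at these coordinates is omitted.

```python
import numpy as np

def affine_grid(theta, out_h, out_w):
    """Map a regular output grid through a 2x3 affine matrix.

    theta acts on homogeneous normalized coordinates (x, y, 1) in [-1, 1].
    Returns source sampling coordinates of shape (out_h, out_w, 2) as (x, y).
    """
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, out_h),
                         np.linspace(-1.0, 1.0, out_w), indexing="ij")
    homog = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (out_h, out_w, 3)
    return homog @ np.asarray(theta, dtype=float).T        # (out_h, out_w, 2)

# Identity keeps the grid; the last column adds a translation (here +0.5 in x).
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
shift_x = np.array([[1.0, 0.0, 0.5], [0.0, 1.0, 0.0]])
grid_id = affine_grid(identity, 4, 5)
grid_shift = affine_grid(shift_x, 4, 5)
```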


### Other Ideas ###
* A two-stage registration: the first stage learns the spatial relationship, the second learns to reproduce the intensities better.        
   <span style = 'color: blue'> How to decide whether the spatial relationship has been learned adequately? Using some kind of contour mask?</span>
   
* Borrowing the idea of GANs, let a discriminator judge the quality of the output image, using ANTs or a similar tool to provide a warped "ground truth". (What is the merit of this? Can it give better results than the purely unsupervised approach?)

### Questions during actual implementation ###
* In the code `spatialdeformer.py`, why 
```
deformation = tf.reshape(deformation, shape = (batch_size, 2, -1) )
```

is not the same as 

```
deformation = tf.reshape(deformation, (-1, output_height * output_width, 2))
deformation = tf.transpose(deformation, (0, 2, 1))
```

The latter gives the desired result; the former does not (its gradient updates seem to be very small).

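<span style = 'color: blue'> The cause is how `tf.reshape()` works: it reads and writes elements in row-major order, so the last dimension varies fastest. The deformation has shape (batch_size, height, width, 2), i.e. the two displacement components are interleaved along the last axis. Reshaping directly to (batch_size, 2, -1) therefore mixes both components within each row; to keep each component in its own channel, reshape to (batch_size, height * width, 2) first and then transpose, as in the latter version. This matters even though the deformation is a latent quantity learned during training.</span>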
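NumPy's `reshape` uses the same row-major order as `tf.reshape`, so the difference can be checked on a toy array without TensorFlow. The sizes and values below are made up so that the two channels are easy to tell apart.

```python
import numpy as np

batch, height, width = 1, 2, 3
# Toy deformation: channel 0 holds dx = 0..5, channel 1 holds dy = 100..105.
dx = np.arange(height * width).reshape(height, width)
dy = 100 + np.arange(height * width).reshape(height, width)
deformation = np.stack([dx, dy], axis=-1)[None]  # (batch, height, width, 2)

# Former version: the row-major reshape interleaves dx and dy.
former = deformation.reshape(batch, 2, -1)

# Latter version: keep the (dx, dy) pairs together, then swap the last two axes.
latter = deformation.reshape(-1, height * width, 2).transpose(0, 2, 1)

print(former[0, 0])  # [  0 100   1 101   2 102]
print(latter[0, 0])  # [0 1 2 3 4 5]
```

The latter is equivalent to moving the channel axis to position 1 (`np.moveaxis(deformation, -1, 1)`) before flattening the spatial axes.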

* Stacking an STN in front of an SDN produces a warning about implicitly converting a sparse tensor into a dense one.
(The output of the STN has shape (?, height, width, ?); why is the channel dimension not inferred?)

* Is it necessary to preprocess the data, e.g. subtract the channel-wise mean of the training set?

<span style = 'color: blue'> It does not seem to help much: stripes appear at the border, and the loss curve does not look nice. Other preprocessing may help; check what others do. </span>
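For reference, a minimal sketch of channel-wise mean subtraction with the mean computed on the training set only (the `(N, H, W, C)` layout and the function names are assumptions). It also illustrates one possible, unverified contributor to the border stripes: after subtraction the background is no longer 0, so any zero padding introduced when a warped grid samples outside the image would no longer match the background.

```python
import numpy as np

def channel_mean(train):
    """Per-channel mean over all training images; train has shape (N, H, W, C)."""
    return train.mean(axis=(0, 1, 2))

def subtract_channel_mean(images, mean):
    """Subtract the training-set mean; broadcasting applies it along the last axis."""
    return images - mean

# Toy data: zero background with a bright square in the middle.
train = np.zeros((2, 8, 8, 1))
train[:, 2:6, 2:6, 0] = 1.0
mean = channel_mean(train)
centered = subtract_channel_mean(train, mean)  # background becomes -mean, not 0
```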


* When the deformation is returned as an extra output, it becomes NaN under the total variation loss.
    <span style = 'color: blue'> Solved by initializing the displacement layer with nonzero values. </span>
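One plausible cause, assuming the TV term is the isotropic form $\sum \sqrt{(\partial_x u)^2 + (\partial_y u)^2}$ (the notes do not say which form is used): its derivative $\partial_x u / \sqrt{(\partial_x u)^2 + (\partial_y u)^2}$ is $0/0$ wherever the field is locally constant, which is everywhere for a zero-initialized displacement layer. Nonzero initialization moves away from that point; a small `eps` inside the square root is a common alternative. A NumPy sketch with hypothetical names:

```python
import numpy as np

def tv_magnitude_grad(dx, dy, eps=0.0):
    """Derivative of sqrt(dx^2 + dy^2 + eps) with respect to dx."""
    dx = np.asarray(dx, dtype=float)
    dy = np.asarray(dy, dtype=float)
    return dx / np.sqrt(dx ** 2 + dy ** 2 + eps)

def total_variation(u, eps=1e-8):
    """Isotropic TV of a 2D field u of shape (H, W), smoothed by eps."""
    dx = u[:-1, 1:] - u[:-1, :-1]  # horizontal forward differences, (H-1, W-1)
    dy = u[1:, :-1] - u[:-1, :-1]  # vertical forward differences, (H-1, W-1)
    return np.sum(np.sqrt(dx ** 2 + dy ** 2 + eps))
```

With `eps = 0` the gradient at a zero field is NaN; with `eps > 0` it is finite (zero), at the cost of a tiny constant offset in the loss.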
    
* Need a module for visualization/exploration.
   * Check plotly, mayavi.
   * matplotlib's `Axes3D.voxels` method seems to be very slow.
   * Check [PySurfer](https://pysurfer.github.io/install.html).
  

### Focused on losses ###

* Single loss: `mean_squared_logarithmic_error` and KL divergence seem to work well.
   
     * Their scales during training:
        * KL: negative when applied to the images directly; ~0.10 when the images are flattened and normalized by their sum
        * SobelLoss: ~0.08
        * MSE: ~0.08
        * MSE_log: ~0.005
        * BCE: ~0.3
        * TVS: ~0.02 with $\alpha = 1.25$
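The negative KL values are expected: $\sum_i p_i \log(p_i / q_i)$ is only guaranteed to be non-negative when $p$ and $q$ are probability distributions, which raw intensities are not. A minimal NumPy sketch of both variants; the clipping `eps` and the function names are assumptions, and the Keras built-in KL loss may differ in details such as clipping.

```python
import numpy as np

def kl_raw(p, q, eps=1e-7):
    """KL formula applied to raw intensities (no normalization)."""
    p = np.clip(p.ravel().astype(float), eps, None)
    q = np.clip(q.ravel().astype(float), eps, None)
    return np.sum(p * np.log(p / q))

def kl_normalized(fixed, warped, eps=1e-7):
    """Flatten each image and divide by its sum so both become distributions."""
    p = fixed.ravel().astype(float)
    q = warped.ravel().astype(float)
    p = np.clip(p / p.sum(), eps, 1.0)
    q = np.clip(q / q.sum(), eps, 1.0)
    return np.sum(p * np.log(p / q))

img = np.ones((4, 4))
# kl_raw(img, 2 * img) is negative; kl_normalized(img, 2 * img) is 0,
# because both images have the same shape after normalization.
```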

* Using perceptual losses extracted from other pretrained models.

   <span style="color: blue"> This may be a general solution when one wants to combine different features but is not sure which kinds of features to choose by hand.</span>
   
   Perceptual losses from a VGG16 net pretrained on ImageNet:
   * MSE loss from a single layer. Using only block3 produces a "water ripple" effect compared to block4.
   * MSE loss from multiple layers. The scales of the individual losses are roughly 0.5, 10, 15, 0.5, and 0.01.
     <span style = 'color: blue'> How should the losses be balanced, i.e. how should the weights be assigned? </span>
     
   * "Stylish" losses that use the Gram matrix
   * "Correlation" losses.
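A NumPy sketch of the Gram-matrix ("style") loss in the Gatys et al. sense, with random arrays standing in for VGG16 activations (the shapes and names are hypothetical). It also shows a property worth keeping in mind for registration: the Gram matrix discards where features occur, so shuffling spatial positions leaves the loss at zero and this term alone cannot penalize misalignment.

```python
import numpy as np

def gram_matrix(feat):
    """(H, W, C) feature map -> (C, C) Gram matrix, normalized by H * W."""
    h, w, c = feat.shape
    f = feat.reshape(h * w, c)
    return f.T @ f / (h * w)

def style_loss(feat_a, feat_b):
    """Mean squared difference between the two Gram matrices."""
    return np.mean((gram_matrix(feat_a) - gram_matrix(feat_b)) ** 2)

rng = np.random.default_rng(0)
feat = rng.random((6, 6, 4))
# Shuffle spatial positions while keeping each feature vector intact.
shuffled = feat.reshape(-1, 4)[rng.permutation(36)].reshape(6, 6, 4)
```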

* Current loss used: KL + SobelLoss + BCE
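`SobelLoss` is not defined in these notes. One hypothetical version compares the Sobel gradients of the fixed and warped images with an MSE; a NumPy sketch with edge padding (an assumption) follows. Because it only looks at gradients, it ignores global intensity offsets, which is one reason to combine it with an intensity term such as KL or BCE.

```python
import numpy as np

def sobel_xy(img):
    """Sobel gradients (gx, gy) of a 2D image, using edge padding."""
    p = np.pad(img.astype(float), 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return gx, gy

def sobel_loss(fixed, warped):
    """MSE between the Sobel gradient fields of two images."""
    fx, fy = sobel_xy(fixed)
    wx, wy = sobel_xy(warped)
    return np.mean((fx - wx) ** 2 + (fy - wy) ** 2)

ramp = np.tile(np.arange(6, dtype=float), (6, 1))  # intensity increases along x
gx, gy = sobel_xy(ramp)
```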

* What about mutual information (MI)? See https://matthew-brett.github.io/teaching/mutual_information.html
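A minimal histogram-based MI estimate along the lines of the linked tutorial (the bin count is an arbitrary choice). Hard histogram binning is not differentiable, so using MI as a training loss would need a soft, e.g. Parzen-window, histogram instead; this sketch is only suitable for evaluation.

```python
import numpy as np

def mutual_information(a, b, bins=20):
    """Histogram estimate of MI (in nats) between two images of the same shape."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()          # joint distribution
    px = pxy.sum(axis=1)               # marginal of a
    py = pxy.sum(axis=0)               # marginal of b
    px_py = px[:, None] * py[None, :]  # product of marginals
    nz = pxy > 0                       # skip empty bins (0 * log 0 = 0)
    return np.sum(pxy[nz] * np.log(pxy[nz] / px_py[nz]))

rng = np.random.default_rng(0)
img = rng.random((32, 32))
```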

* <span style = "color: blue"> MSE + an $H^1$ penalty on the displacement field seems to work well in general. For weight tuning, experiments on low-resolution images can guide the choice for higher resolutions.</span>
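A sketch of one way to write such a loss, assuming a weighted $H^1$ penalty with separate weights for the displacement $u$ and its gradient, discretized with means and forward differences:
$\mathcal{L} = \frac{1}{N}\lVert I_f - I_m \circ (\mathrm{Id} + u) \rVert^2 + \lambda\,(\alpha \lVert u \rVert^2 + \beta \lVert \nabla u \rVert^2)$.
The weights and names are hypothetical, and the warped image is assumed to be computed elsewhere.

```python
import numpy as np

def h1_penalty(u, alpha=1.0, beta=1.0):
    """u: displacement field (H, W, 2). alpha weights ||u||^2, beta weights ||grad u||^2."""
    grad_y = np.diff(u, axis=0)  # (H-1, W, 2) forward differences along rows
    grad_x = np.diff(u, axis=1)  # (H, W-1, 2) forward differences along columns
    return alpha * np.mean(u ** 2) + beta * (np.mean(grad_x ** 2) + np.mean(grad_y ** 2))

def mse_h1_loss(fixed, warped, u, lam, alpha=1.0, beta=1.0):
    """Image MSE plus lam times the H^1 penalty on the displacement."""
    return np.mean((fixed - warped) ** 2) + lam * h1_penalty(u, alpha, beta)
```

Setting `alpha = 0` penalizes only non-smoothness, so a pure translation costs nothing; `alpha > 0` also discourages large displacements.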


                              
### Structure of the regress net ###

* Choice of unpooling: upsampling or transposed convolution (`conv_transpose`).
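A small NumPy check of how the two options relate (single channel, no bias; the helper names are hypothetical). A stride-2 transposed convolution with a 2x2 all-ones kernel reproduces nearest-neighbour upsampling exactly, so `conv_transpose` can be seen as a learnable generalization. With a kernel size not divisible by the stride, overlapping kernels give an uneven pattern even on a constant input, the well-known "checkerboard" artifact.

```python
import numpy as np

def upsample_nearest(x, s=2):
    """Repeat each pixel s times along both axes."""
    return np.repeat(np.repeat(x, s, axis=0), s, axis=1)

def conv_transpose2d(x, k, stride=2):
    """Naive 2D transposed convolution: scatter-add x[i, j] * k into the output."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * k
    return out

x = np.array([[1.0, 2.0], [3.0, 4.0]])
same = conv_transpose2d(x, np.ones((2, 2)))                   # equals upsample_nearest(x)
uneven = conv_transpose2d(np.ones((3, 3)), np.ones((3, 3)))  # values 1, 2 and 4
```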

* Adding parallel SDN modules.

* Adding some shortcut ("highway") connections.
 
   **<span style = "color: blue"> A `multiply` layer between an earlier conv layer (detecting edges) and a later one gives an impressive improvement in learning! This may be because the feature maps from the earlier conv layer, when multiplied with the later feature maps, focus attention on "edges", so the network learns shapes better.</span>**
   
   **<span style = "color: red"> This holds in the 2D case but does not seem to in 3D.</span>**
   

* This does not seem to be a problem for the OASIS brain slices, but in case the deformation has too much freedom, a proper regularization loss should be considered.

### TODO ###

* Find a 2D dataset for evaluation with Jaccard or landmark distance.
     * http://www.bic.mni.mcgill.ca/ServicesAtlases/ICBM152NLin2009
     * LONI: need to learn how the data is organised and what formats are used.
         * Currently using SimpleITK to access the Analyze format. See this [example](http://insightsoftwareconsortium.github.io/SimpleITK-Notebooks/Python_html/10_matplotlib's_imshow.html).
         * The images do not seem to be affine-aligned; some of the heads are tilted. The ROI data is not very good (at least visually).
* Use UtilzReg (or another tool) to produce $P$ for adding a supervised learning module.
* Extend the current framework to 3d. [DONE]
    * <span style = 'color:blue'> In the translation example, adding the multiplicative shortcut actually blocks the learning...</span>
* How to write a regularization term?
     * As one of the losses
     * As a regularization method in the layer class [DONE]
* Sample (interpolate) directly from the original-resolution images using generated grids of different resolutions, instead of resampling the training images to the desired resolution beforehand?
   <span style = 'color:blue'> Need to reimplement the SDN. </span>
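A minimal NumPy sketch of that idea: bilinearly sample the original-resolution image on a coarser (or finer) grid, optionally shifted by a displacement. Pixel-unit coordinates, border clamping, and the `(dy, dx)` channel order are assumptions; the actual SDN may use normalized coordinates, and the function names are hypothetical.

```python
import numpy as np

def bilinear_sample(img, ys, xs):
    """Sample a 2D image at continuous pixel coordinates (ys, xs), clamped to the border."""
    h, w = img.shape
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy, wx = ys - y0, xs - x0
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom

def sample_on_grid(img, out_h, out_w, disp=None):
    """Resample img on an out_h x out_w grid spanning the image, plus optional disp (out_h, out_w, 2)."""
    h, w = img.shape
    ys, xs = np.meshgrid(np.linspace(0, h - 1, out_h),
                         np.linspace(0, w - 1, out_w), indexing="ij")
    if disp is not None:  # displacement in original-resolution pixels, channels (dy, dx)
        ys = ys + disp[..., 0]
        xs = xs + disp[..., 1]
    return bilinear_sample(img, ys, xs)

img = np.add.outer(np.arange(64.0), 2 * np.arange(64.0))  # img[i, j] = i + 2j
coarse = sample_on_grid(img, 16, 16)
```

Bilinear interpolation reproduces linear functions exactly, which makes a ramp image a convenient sanity check.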

### Problem Encountered ###

* The deformation network deforms the background as well. When there are relative shifts, the deformation does not seem to be very smooth. This is presumably because the network does not learn to distinguish objects but tries to change the edges so that the result "looks like" the target. What **regularization** will produce the desired results? (Less emphasis on "edges"?)
    * <span style = 'color:blue'> A finely tuned regularization strength seems to help: use different weights for the displacement and its derivatives.</span>


* **Warping small ventricles to large ones seems to be more difficult than the other way around**. 

<span style = 'color: blue'> This may be due to the way the SDN works: it samples the moving image at the warped grid points, so if the moving image has a gap only 2 or 3 pixels wide, it is hard to squeeze many grid points into that gap. </span>

* It is hard to learn local structures.

   * Idea one: use a loss function defined on `shapes`, e.g. contours.
   * Idea two: how to enforce the **"edge to edge"** relationship with a mask of the target image?
   
     Multiply the predicted image/displacement directly by the mask (or by its exponential) to "boost" values on the edges?
   * Idea three: Maybe a correlation feature map like in Rocco's paper will help?
   * <span style = "color: blue"> A smart choice of regularization loss.</span>
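A sketch of a dense correlation map in the spirit of Rocco et al.: L2-normalize the feature vectors of both maps, then take the dot product between every position of one map and every position of the other. The shapes and names are assumptions, and the paper applies a further normalization to the resulting correlation volume, which is omitted here.

```python
import numpy as np

def correlation_map(fa, fb, eps=1e-8):
    """fa, fb: (H, W, C) feature maps.

    Returns (H, W, H*W): entry [i, j, k] is the cosine similarity between
    fb at (i, j) and fa at flattened position k (row-major).
    """
    h, w, c = fa.shape
    na = fa / (np.linalg.norm(fa, axis=-1, keepdims=True) + eps)
    nb = fb / (np.linalg.norm(fb, axis=-1, keepdims=True) + eps)
    corr = nb.reshape(h * w, c) @ na.reshape(h * w, c).T  # (H*W, H*W)
    return corr.reshape(h, w, h * w)

rng = np.random.default_rng(0)
feat = rng.standard_normal((5, 5, 8))
corr = correlation_map(feat, feat)
```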

* ANTsR `imageSimilarity` does strange things...
* *Need to check that the background is 0 before feeding images to the NN!*
* Am I visualizing the warped grid correctly? The value plot does not seem to be consistent with the grid plot. What do others do?
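One way to cross-check the two plots (a sketch, not necessarily what others do): draw every `step`-th row and column of the identity grid after adding the displacement. Two common sources of mismatch are the channel order of the displacement (assumed here to be `(dy, dx)` in pixels) and the y-axis direction, since `imshow` places the origin at the top left by default. The plotting helper imports matplotlib lazily; all names are hypothetical.

```python
import numpy as np

def warped_grid_lines(disp, step=4):
    """Return (x, y) polylines of the displaced grid; disp has shape (H, W, 2) as (dy, dx)."""
    h, w, _ = disp.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    wy = ys + disp[..., 0]
    wx = xs + disp[..., 1]
    rows = [(wx[i, :], wy[i, :]) for i in range(0, h, step)]
    cols = [(wx[:, j], wy[:, j]) for j in range(0, w, step)]
    return rows, cols

def plot_warped_grid(disp, step=4):
    """Draw the warped grid with the same y direction as imshow's default."""
    import matplotlib.pyplot as plt
    rows, cols = warped_grid_lines(disp, step)
    fig, ax = plt.subplots()
    for x, y in rows + cols:
        ax.plot(x, y, "k-", linewidth=0.5)
    ax.invert_yaxis()  # row 0 at the top, like imshow(origin="upper")
    ax.set_aspect("equal")
    return fig
```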


#### Training on large dataset ####
* Using `train_on_batch` in a self-written training loop does not seem to reduce the loss on the whole dataset.
My guess was that the gradient update should not be applied after every minibatch; reducing (or increasing?) the learning rate might help. The problem was eventually fixed by switching to `fit_generator`. (Better to avoid a nested double `for` loop when implementing it.)
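A minimal sketch of a generator for `fit_generator`, written with a single `while` loop instead of nested `for` loops. It assumes unsupervised pair registration where the network takes `[moving, fixed]` and is trained to match `fixed`; the array layout, the random pairing (which may occasionally pair a volume with itself), and the names are hypothetical. Keras expects such a generator to loop indefinitely, with `steps_per_epoch` deciding where an epoch ends; in newer TensorFlow versions `fit` accepts generators directly and `fit_generator` is deprecated.

```python
import numpy as np

def pair_generator(volumes, batch_size=8, seed=0):
    """Yield ([moving, fixed], fixed) batches forever; volumes has shape (N, H, W)."""
    rng = np.random.default_rng(seed)
    n = len(volumes)
    while True:
        moving_idx = rng.integers(0, n, size=batch_size)
        fixed_idx = rng.integers(0, n, size=batch_size)
        moving = volumes[moving_idx][..., None]  # add a channel axis
        fixed = volumes[fixed_idx][..., None]
        yield [moving, fixed], fixed
```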

* If training is properly warmed up, the process can be sped up.  

* The Adam optimizer does a good job with lr = 1e-3 to 1e-4.
   * lr 1e-3 with regularization strength 1e-5 and 5 epochs works best so far.
* A batch size of 8 to 10 is used; larger sizes cause out-of-memory errors on an 11 GB GPU.

* Learned the hard way: better check the literature when tuning hyperparameters.

### Experiment Design ###

* Dataset: LPBA40_delineation_skullstripped, 40 subjects, 56 labels.
* train/test: 30/10
   * Image intensities vary a lot between subjects. Normalize each volume individually by its max value?
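A minimal sketch of per-volume max normalization, plus a percentile-based variant that is a common alternative (not from these notes) and is less sensitive to a few very bright voxels. Both keep a zero background at 0. The toy volume, the 99th-percentile choice, and the names are assumptions; the clip to [0, 1] assumes non-negative intensities.

```python
import numpy as np

def normalize_by_max(volume, eps=1e-8):
    """Scale one volume so that its maximum becomes (approximately) 1."""
    return volume / (volume.max() + eps)

def normalize_by_percentile(volume, q=99.0, eps=1e-8):
    """Scale by a high percentile and clip to [0, 1] (assumes non-negative intensities)."""
    scale = np.percentile(volume, q) + eps
    return np.clip(volume / scale, 0.0, 1.0)

rng = np.random.default_rng(0)
vol = np.zeros((8, 8, 8))
vol[2:6, 2:6, 2:6] = rng.uniform(50, 200, size=(4, 4, 4))  # toy "brain" on zero background
```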
* Metric: Jaccard/Dice on the labels. (The two are monotonically related, $D = 2J/(1+J)$, so they rank results identically; Dice seems to be used more often.)
   * train on normalized image, predict on normalized label.
   * train on normalized image, get the displacement to warp the moving label. <- (`should use this`)
   * train on normalized label, predict on normalized label
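A minimal per-label Dice/Jaccard sketch in NumPy, assuming integer label maps with 0 as background (an assumption for LPBA40 that should be checked). Labels absent from both maps are skipped, and the helper name is hypothetical.

```python
import numpy as np

def dice_jaccard_per_label(seg_a, seg_b, labels=None):
    """Return {label: (dice, jaccard)} for two integer label maps of equal shape."""
    if labels is None:
        labels = np.union1d(np.unique(seg_a), np.unique(seg_b))
        labels = labels[labels != 0]  # assume 0 is background
    scores = {}
    for lab in labels:
        a = seg_a == lab
        b = seg_b == lab
        inter = np.logical_and(a, b).sum()
        union = np.logical_or(a, b).sum()
        total = a.sum() + b.sum()
        dice = 2.0 * inter / total if total > 0 else np.nan
        jacc = inter / union if union > 0 else np.nan
        scores[int(lab)] = (dice, jacc)
    return scores

a = np.array([[1, 1, 0, 0]])
b = np.array([[1, 0, 0, 0]])
s = dice_jaccard_per_label(a, b)  # label 1: dice = 2/3, jaccard = 1/2
```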
* Compare with ANTs and with different architectures?
* What needs to be visualized?
   * Some overlays may look good.

### Possible Directions ###
*  Instead of fixed weights for the regularization strength, use spatially varying weights.
*  Since the method is unsupervised, a prediction + correction scheme?
*  A Generative model.
*  Dropout layers to make it Bayesian and assess the uncertainty of predictions.
*  Atlas construction.
*  Something like an adaptive method that focuses attention on poorly registered regions.
    * A 3D STN.
*  Train together with momentum information.
    * Use "ground truth" to supervise the produced displacement.
*  Put a STN in front?
*  Performance vs. training size?
*  Compare different architectures?
*  Compare predictions on "seen" and "unseen" samples.
*  Performance vs epochs, training size.

### Links ###

* Software here seems to be very [technical](http://resource.loni.usc.edu/resources/downloads)
* Get to know the field: 
   * some popular algorithms: DARTEL, geodesic shooting, diffeo-demons, SyN (symmetric normalization), LDDMM 
* Not sure how UtilzReg uses the displacement field; should learn to use ANTs/ANTsR.