# LUNA Train Unet

# Dependency Descriptions
1. **keras**: a high-level neural networks library that allows for easy and fast prototyping

In [1]:
from __future__ import print_function

import numpy as np
from keras.models import Model
from keras.layers import Input, merge, Convolution2D, MaxPooling2D, UpSampling2D
from keras.optimizers import Adam
from keras.optimizers import SGD
from keras.callbacks import ModelCheckpoint, LearningRateScheduler
from keras import backend as K

Using TensorFlow backend.


In [2]:
WORKING_PATH = "../../../../output/build-simple-model/"
IMG_ROWS = 512
IMG_COLS = 512

K.set_image_dim_ordering('th')  # Theano dimension ordering in this code
# dimension ordering is simply the order the array dimensions come in
# with Theano's ('th') convention, images are shaped (channels, rows, cols)

**[Dice Coefficient Loss Function](https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient)**: compares the predicted and actual node masks (a similar metric was used in the Ultrasound Nerve Segmentation challenge, which this U-net code was originally written for)
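
As a rough illustration of the metric, the sketch below computes a Dice coefficient on two flattened binary masks in plain Python. The names `dice_coefficient` and `dice_loss`, the `smooth` term, and the toy masks are assumptions made for this example; this is not the tutorial's `dice_coef` implementation.

```python
def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Dice = 2 * |A and B| / (|A| + |B|) for flattened masks.

    `smooth` is an assumed smoothing term so that two empty masks
    do not cause a division by zero.
    """
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    return (2.0 * intersection + smooth) / (sum(y_true) + sum(y_pred) + smooth)


def dice_loss(y_true, y_pred):
    # A higher Dice score is better, so the loss is its negative.
    return -dice_coefficient(y_true, y_pred)


# Toy flattened masks (hypothetical data): 1 = nodule pixel, 0 = background
true_mask = [0, 1, 1, 0, 1, 0]
pred_mask = [0, 1, 0, 0, 1, 1]

score = dice_coefficient(true_mask, pred_mask, smooth=0.0)
print("Dice:", score)
print("Loss:", dice_loss(true_mask, pred_mask))
```

Here two of the three true pixels are found and one extra pixel is predicted, so the unsmoothed Dice is 2·2 / (3 + 3) = 2/3. A perfect prediction gives 1, and no overlap gives 0.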


First get everything working exactly as in the tutorial, so you are sure you understand their code and how it works. Then slowly change the code to fit your own ideas; that way you know any new errors come from your changes and not from a mistake in copying the tutorial's code.

Therefore training and predicting will first be done on the typical train/test split that the tutorial recommends. Once the tutorial works, you can replace the split with 10-fold cross-validation to choose a model, then train that model on the entire dataset and predict.

# Understanding Of Sequential Order of Code
## Loading / Preprocessing Training Data
```python
imgs_train = np.load(working_path+"trainImages.npy").astype(np.float32)
imgs_mask_train = np.load(working_path+"trainMasks.npy").astype(np.float32)

imgs_test = np.load(working_path+"testImages.npy").astype(np.float32)
imgs_mask_test_true = np.load(working_path+"testMasks.npy").astype(np.float32)
    
mean = np.mean(imgs_train)  # mean for data centering
std = np.std(imgs_train)  # std for data normalization

imgs_train -= mean  # images should already be standardized, but just in case
imgs_train /= std
```

## Actually Creating the Unet
*goal: to understand exactly how this set of code works*
## Steps:
1. Create the initial structure of a Unet
2. Create checkpoints for the unet to save its best weights (at that time period)
3. Give the unet an initial set of weights (optional)
4. Train the unet on training data (consisting of lung image, and node mask)

## Getting the Unet
```python
# where the return of the function should give you the "model"
model = get_unet()
```

## Creating the Unet 
*using keras define the initial structure of the model (layers, nodes, etc...)*

**so how does this code create the structure of a model?**

## Research into Unet
### To Understand Everything:
1. Go through [this guide](https://keras.io/getting-started/sequential-model-guide/)
2. Go through [other guide](https://keras.io/getting-started/functional-api-guide/)

### Sequential Models ([reference](https://keras.io/getting-started/sequential-model-guide/))
- Sequential Model: linear stack of layers
- tell model what input shape to expect (first layer must receive info about input shape)
- before training a model, configure the learning process with the `compile` method, which takes:
  1. an optimizer: the name of an existing optimizer, or an instance of an optimizer class, [reference](https://keras.io/optimizers/)
  2. a loss function: the objective the model will try to minimize, either an existing loss function or a custom objective function, [reference](https://keras.io/objectives/)
     - note custom objective functions have a specific structure (they must take y_true and y_pred, and return a scalar)
  3. list of metrics: existing metrics, custom metrics must return single tensor value, [reference](https://keras.io/metrics/)
- keras models: trained on Numpy arrays of input data and labels, use the `fit` function (sequential model api [complete reference](https://keras.io/models/sequential/))

#### Examples 
- *[complete examples folder](https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py)*
- good demonstration of CNN ([here](https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py))

### Getting started with the Keras functional API [reference](https://keras.io/getting-started/functional-api-guide/#the-concept-of-layer-node)
- Keras functional API: allows you to define complex models
- layer instances are callable on a tensor, and return a tensor
  - then the input and output tensors are used to define a `Model`
  - understand tensors below in the section called "**Tensor Understanding**"
- then the model is trained exactly the same way as a `Sequential` model
  - perfect example:
    ```python
    from keras.layers import Input, Dense
    from keras.models import Model

    # this returns a tensor
    inputs = Input(shape=(784,)) # your inputs are a tensor

    # a layer instance is callable on a tensor, and returns a tensor
    x = Dense(64, activation='relu')(inputs) # modify the inputs with another tensor
    x = Dense(64, activation='relu')(x)      # and so on and so forth
    predictions = Dense(10, activation='softmax')(x) # final output layer modification

    # this creates a model that includes
    # the Input layer and three Dense layers
    model = Model(input=inputs, output=predictions) 
    # input tensor, and output tensor wraps everything together
    # the rest is directly from `Sequential` models
    model.compile(optimizer='rmsprop',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(data, labels)  # starts training
    ```
- the entire created model is itself callable, just like a layer, so it can be applied to a new tensor and used again!

#### Examples!
- [here](https://keras.io/getting-started/functional-api-guide/#more-examples)

### Tensor Understanding
- [wikipedia definition](https://en.wikipedia.org/wiki/Tensor): tensors are geometric objects that describe linear relations between geometric vectors, scalars, and other tensors (like the dot product, cross product or even linear maps)
  - Given a coordinate basis or fixed frame of reference, a tensor can be represented as an organized multidimensional array of numerical values.
  - The order (also degree or rank) of a tensor is the dimensionality of the array needed to represent it, or equivalently, the number of indices needed to label a component of that array. For example, a linear map is represented by a matrix (a 2-dimensional array) in a basis, and therefore is a 2nd-order tensor. 
  - there are many definitions (the definitions describe the same geometric concept in different languages and differing levels of abstraction)
- simple machine learning definition: multidimensional arrays (generalizing arrays and matrices), [reference](http://stats.stackexchange.com/questions/144860/how-are-tensors-used-in-neural-networks)
  - more explanations [here](http://stats.stackexchange.com/questions/198061/why-the-sudden-fascination-with-tensors)
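
To make the "multidimensional array" view concrete, here is a small sketch in plain Python that reports the shape and rank of nested lists. The helper `tensor_shape` and the sample values are made up for illustration, and the helper assumes the nesting is regular (every sub-list at a given level has the same length).

```python
def tensor_shape(nested):
    """Return the shape of a regularly nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


scalar = 5.0                          # rank 0
vector = [1.0, 2.0, 3.0]              # rank 1
matrix = [[1, 2, 3], [4, 5, 6]]       # rank 2
# rank 4: (samples, channels, rows, cols), the layout used with 'th' ordering
image_batch = [[[[0] * 4 for _ in range(4)]] for _ in range(2)]

for name, value in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("image_batch", image_batch)]:
    shape = tensor_shape(value)
    print(name, "shape:", shape, "rank:", len(shape))
```

The rank is simply the number of indices needed to pick out one number, which matches the "order of a tensor" description above.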

## Understanding Our Convolutional Network
## Clarification:
1. what are those tensors?
   1. what do they do?
   2. what are those parameters within them?
2. why those tensors in that specific order?
   1. how does it create the model we want to do segmentation?
3. how does U-net relate? 
   1. what makes U-net different than a NN, or a CNN?
4. what is the Adam optimizer?
5. what does our custom loss and metric do? how does it do it?

**look at examples, documentation (of both keras and U-net) to answer the above questions to completely understand all the code**

## Clarification Answers:
### To Learn About CNNs:
*this assumes you have understanding of neural networks from Andrew Ng's neural networks section in his machine learning course*

1. Watch all videos "Intro To CNNs" up to "Conclusion" ([here](https://classroom.udacity.com/courses/ud730/lessons/6377263405/concepts/63741833610923#))
2. Read lesson ([here](http://cs231n.github.io/convolutional-networks/))
3. Read wiki ([here](https://en.wikipedia.org/wiki/Convolutional_neural_network))

### Convolutional Neural Networks
- *A ConvNet architecture is in the simplest case a list of Layers that transform the image volume into an output volume*
- Neural network where connectivity pattern between neurons is inspired by organization of animal visual cortex (perfect for image recognition)
- Individual cortical neurons respond to stimuli in a restricted region of space known as the receptive field. The receptive fields of different neurons partially overlap such that they tile the visual field. The response of an individual neuron to stimuli within its receptive field can be approximated mathematically by a convolution operation
  - receptive fields: small neuron collections which process portions of the input image, outputs of those neurons are tiled so input regions overlap (repeated for every layer of the network)
  - pooling layers: combine the outputs of neuron clusters

#### Features:
1. 3D Volumes of Neurons: 
   - layers of neurons in 3 dimensions (width, height, depth)
   - one layer only connected to small region in layer before it (receptive field)
   - distinct layers (locally/completely connected) are stacked
2. Local Connectivity:
   - spatially local connectivity between neurons (in adjacent layers)
   - filters are built for local input patterns
   - stacking many "local layers" creates global filters (response to larger region)
   - first network creates representations of small parts of input, then larger areas
3. Shared Weights:
   - filter is replicated across entire visual field (same weight/bias)
   - all neurons in one layer detect same feature (you can detect features regardless of position in visual field)
   - allows you to find statistical invariants easily, you can detect a feature anywhere in the image
   - **only appropriate when a feature that is useful to compute at some spatial position (x,y) is also useful to compute at a different position (x2,y2)**
4. Convolution:
   - if all neurons are using the same weight vector, then forward pass can be computed as a convolution (of weights and inputs)
   - kernel / filter: commonly used to refer to sets of weights (because it is convolved with input)
- these allow better generalizations, and less training required
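
The saving from shared weights can be estimated with simple arithmetic. The sketch below compares the parameter count of a convolutional layer against a fully connected layer on a 512x512 single-channel image. The helper names and the choice of 32 dense units are assumptions for illustration only.

```python
def conv_params(n_filters, kernel_h, kernel_w, in_channels):
    # Each filter has kernel_h * kernel_w * in_channels weights plus one bias,
    # and the same filter is reused at every spatial position.
    return n_filters * (kernel_h * kernel_w * in_channels + 1)


def dense_params(n_inputs, n_units):
    # Each unit has one weight per input plus one bias.
    return n_units * (n_inputs + 1)


conv = conv_params(n_filters=32, kernel_h=3, kernel_w=3, in_channels=1)
dense = dense_params(n_inputs=512 * 512, n_units=32)

print("conv layer parameters: ", conv)
print("dense layer parameters:", dense)
```

The convolutional layer needs a few hundred parameters while the dense layer needs millions, which is one reason less training data is required.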

#### Visualization
*Refer to this video ([here](https://classroom.udacity.com/courses/ud730/lessons/6377263405/concepts/64063017560923#))*

1. Imagine the input image as a 3 dimensional volume (3rd dimension is RGB), that is the input layer
2. Second layer of neurons: the width and height of the neuron volume are smaller, because each neuron is responsible for a small region of the input layer (small patches, which overlap with those of neighboring neurons). The depth of the neuron volume is greater because you are increasing the semantic complexity of the representation (looking for bigger features than just a colored pixel, like edges, then shapes, etc...)
3. The layers progressively scan patches, getting smaller and smaller in width and height and larger in depth
4. Finally, feed the last layer to a classifier

#### Vocabulary
1. patches / kernels: the receptive field or size of each neuron
2. depth: the 3rd dimension, normally representing semantic complexity
3. feature map / filters: every single layer in the depth dimension
4. stride: number of pixels shifting by for each filter
5. 'valid' padding: no padding, the next layer of neurons does not reach past the edge of image
6. 'same' padding: pad with 0s, go off the edge so next layer of neurons is same size as previous
7. receptive field / filter size: spatial extent of connectivity from neurons in one layer to the other
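
The effect of stride and padding on the size of the next layer can be sketched with the usual output-size formulas. The function `conv_output_size` below is a hypothetical helper, and the `'same'` rule follows the TensorFlow-style convention (output = ceil(input / stride)); other libraries may pad slightly differently.

```python
import math


def conv_output_size(input_size, filter_size, stride=1, padding="valid"):
    """Output length along one spatial dimension of a convolution."""
    if padding == "same":
        # zero-pad so that, at stride 1, the output is as large as the input
        return int(math.ceil(input_size / float(stride)))
    if padding == "valid":
        # no padding: the filter never reaches past the edge of the image
        return (input_size - filter_size) // stride + 1
    raise ValueError("padding must be 'valid' or 'same'")


print(conv_output_size(512, 3, stride=1, padding="same"))
print(conv_output_size(512, 3, stride=1, padding="valid"))
print(conv_output_size(28, 5, stride=2, padding="valid"))
```

With a 3x3 filter and stride 1, `'same'` keeps a 512-pixel dimension at 512, while `'valid'` shrinks it to 510.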

#### Parts of CNN:
1. Convolutional Layer
   - layer's parameters: consist of a set of learnable filters (kernels), each with a small receptive field (but extending through the full depth of the input volume)
   - network learning: learns filters that activate when detecting a specific feature
   - model: slide (convolve) each filter across the width and height of the input volume, computing a dot product between the filter and the input at each position
   - neuron connections: each neuron only connected to small patch of input volume
     - receptive field hyperparameter: determines how big the patches are
2. Pooling Layer
   - reduce spatial extent of image (width, height) without losing as much information as large strides in a convolutional layer would (also helps control overfitting)
   - max pooling: take the max in the region to downsample (no learnable parameters, often more accurate, only pooling size and stride to choose)
   - average pooling: take average over window of pixels at location (basically a blurred, low resolution view of the image)
3. 1x1 Convolutions
   - makes models deeper and have more parameters
4. Inception Module
   - choosing between using pooling, 5x5, 3x3, 1x1 convolutions (when all of them could help you with your modeling so why choose? USE THEM ALL)
   - do all of the different possibilities in different locations, with different orders and concatenate their outputs
   - uses fewer parameters overall, chosen in such a way that it performs better than a simple convolution!
5. Hyperparameters:
   1. depth: number of filters used for a layer (each filter looking for something different)
   2. stride: number of pixels to slide the filter, determines size of output volumes
   3. zero-padding: padding input volume with 0s

#### Newest Advancements:
1. Google’s Inception architectures
2. Residual Networks from Microsoft Research Asia
   - http://arxiv.org/abs/1512.03385
3. (Find the history and more [here](http://cs231n.github.io/convolutional-networks/) in the case studies section) 

#### Good Lecture:
- [Deep Learning for Computer Vision (Andrej Karpathy, OpenAI)](https://www.youtube.com/watch?v=u6aEYuemt0M)

In [None]:
def get_unet():
    inputs = Input((1,IMG_ROWS, IMG_COLS)) # input tensor with size of imgs (1, 512, 512)
    
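    # a stack of layers: each layer is called on the previous tensor and returns a new tensor
    # each convolution applies its activation function (ReLU) to its output

    conv1 = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(inputs) 
    # 32 filters, each 3x3 (so the output depth is 32), https://keras.io/layers/convolutional/
    # stride = 1 (the default, since `subsample` is not set)
    # border_mode='same' zero-pads the input so the output keeps the same rows and cols
    # activation: ReLU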
    conv1 = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    # pooling operation (2x2 window, stride 2)
    # splits the activations into non-overlapping 2x2 squares and keeps the max of each square
    # as one neuron in the next layer, halving the rows and cols

    conv2 = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(pool1)
    conv2 = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)

    conv3 = Convolution2D(128, 3, 3, activation='relu', border_mode='same')(pool2)
    conv3 = Convolution2D(128, 3, 3, activation='relu', border_mode='same')(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)

    conv4 = Convolution2D(256, 3, 3, activation='relu', border_mode='same')(pool3)
    conv4 = Convolution2D(256, 3, 3, activation='relu', border_mode='same')(conv4)
    pool4 = MaxPooling2D(pool_size=(2, 2))(conv4)

    conv5 = Convolution2D(512, 3, 3, activation='relu', border_mode='same')(pool4)
    conv5 = Convolution2D(512, 3, 3, activation='relu', border_mode='same')(conv5)

    up6 = merge([UpSampling2D(size=(2, 2))(conv5), conv4], mode='concat', concat_axis=1)
    # open questions: why are these upsampling and merging steps needed?
    # and why does the depth increase, then decrease again?
    # finally, how does this actually segment for you? (note the sizes do match up:
    # four 2x2 poolings followed by four 2x2 upsamplings return to the input rows/cols)
    conv6 = Convolution2D(256, 3, 3, activation='relu', border_mode='same')(up6)
    conv6 = Convolution2D(256, 3, 3, activation='relu', border_mode='same')(conv6)

    up7 = merge([UpSampling2D(size=(2, 2))(conv6), conv3], mode='concat', concat_axis=1)
    conv7 = Convolution2D(128, 3, 3, activation='relu', border_mode='same')(up7)
    conv7 = Convolution2D(128, 3, 3, activation='relu', border_mode='same')(conv7)

    up8 = merge([UpSampling2D(size=(2, 2))(conv7), conv2], mode='concat', concat_axis=1)
    conv8 = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(up8)
    conv8 = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(conv8)

    up9 = merge([UpSampling2D(size=(2, 2))(conv8), conv1], mode='concat', concat_axis=1)
    conv9 = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(up9)
    conv9 = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(conv9)

    conv10 = Convolution2D(1, 1, 1, activation='sigmoid')(conv9) # final layer, tensor

    # create the Model by telling it its input/output
    # where output is a tensor of other layers
    model = Model(input=inputs, output=conv10) 

    # compile the model with our very own loss and metric, and the existing Adam optimizer
    model.compile(optimizer=Adam(lr=1.0e-5), loss=dice_coef_loss, metrics=[dice_coef])

    return model

### Understanding Activation Functions
**What is an activation function?** ~ Activation functions (in terms of neural networks) are functions applied to every individual neuron. After the neuron takes in its inputs, multiplies them by weights and sums them, it feeds that value to an activation function. Every neuron in the network has an activation function. 

**Why have an activation function?** ~ A neural network with no activation functions collapses to a single linear function, no matter how many layers it has. Therefore it cannot separate nonlinear cases (it's just a linear boundary!). Without activation functions you cannot model nonlinear functions; imagine trying to model everything in life with a line! (read [here](https://www.quora.com/What-is-the-role-of-the-activation-function-in-a-neural-network) and [here](https://www.quora.com/Why-do-neural-networks-need-an-activation-function))

**Explanation for ReLU and Sigmoid** ~ If you think about a neuron's action potential, the neuron needs to pass a certain threshold or else it does not fire. The ReLU function's output is either 0 or positive, so if a neuron with ReLU outputs 0, it is "off" and its inputs are not propagated forward. However, if the neuron is on, its input is sent to the next layers. Basically it mimics the action potential of a neuron (without it, you would not have thresholds and it would all be linear). Sigmoid is similar except that it squashes the output to between 0 and 1 (very negative inputs land very close to 0, very large inputs very close to 1). (read [here](http://stats.stackexchange.com/questions/228296/what-is-the-purpose-of-a-neural-network-activation-function))
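
A minimal sketch of these ideas on single neurons, in plain Python. The weights and function names are arbitrary values chosen for illustration. It shows ReLU and sigmoid themselves, and that two stacked linear neurons without an activation are equivalent to one linear neuron.

```python
import math


def relu(x):
    # "off" (0) below the threshold, passes the value through above it
    return max(0.0, x)


def sigmoid(x):
    # squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))


def two_linear_layers(x, w1=2.0, b1=1.0, w2=-3.0, b2=0.5):
    # no activation between the layers
    return w2 * (w1 * x + b1) + b2


def one_linear_layer(x, w=-6.0, b=-2.5):
    # w = w2 * w1 and b = w2 * b1 + b2, so this equals the stack above
    return w * x + b


def two_layers_with_relu(x, w1=2.0, b1=1.0, w2=-3.0, b2=0.5):
    # the ReLU in between breaks the collapse into a single line
    return w2 * relu(w1 * x + b1) + b2


for x in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    print(x, relu(x), round(sigmoid(x), 4),
          two_linear_layers(x), one_linear_layer(x), two_layers_with_relu(x))
```

For every input the two linear variants print the same value, while the ReLU variant differs for negative pre-activations. That is the nonlinearity the network needs.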

### Structure of Layers
CONV, RELU, CONV, RELU, POOL, 
CONV, RELU, CONV, RELU, POOL, 
CONV, RELU, CONV, RELU, POOL, 
CONV, RELU, CONV, RELU, POOL, 
CONV, RELU, CONV, RELU,
MERGE, CONV, RELU, CONV, RELU,
MERGE, CONV, RELU, CONV, RELU,
MERGE, CONV, RELU, CONV, RELU,
MERGE, CONV, RELU, CONV, RELU,
CONV, SIGMOID
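
To check how the sizes line up through this structure, the sketch below traces the tensor shape (channels, rows, cols) after each stage. The shapes are derived only from the layer arguments in `get_unet` above: `'same'` padding, 2x2 pooling, 2x2 upsampling, and concatenation along axis 1. The helper `trace_unet_shapes` is hypothetical and does not build a Keras model.

```python
def trace_unet_shapes(rows=512, cols=512, filters=(32, 64, 128, 256, 512)):
    shapes = []
    skips = []       # conv outputs saved for the later merges
    channels = 1     # single-channel input image

    # contracting path: conv block (keeps rows/cols), then 2x2 pooling (halves them)
    for depth in filters[:-1]:
        channels = depth
        shapes.append(("conv", (channels, rows, cols)))
        skips.append((channels, rows, cols))
        rows, cols = rows // 2, cols // 2
        shapes.append(("pool", (channels, rows, cols)))

    # bottom of the "U"
    channels = filters[-1]
    shapes.append(("conv", (channels, rows, cols)))

    # expanding path: upsample (doubles rows/cols), concatenate the skip, conv block
    for depth in reversed(filters[:-1]):
        rows, cols = rows * 2, cols * 2
        skip_channels, skip_rows, skip_cols = skips.pop()
        assert (skip_rows, skip_cols) == (rows, cols)  # spatial sizes must match to concatenate
        shapes.append(("merge", (channels + skip_channels, rows, cols)))
        channels = depth
        shapes.append(("conv", (channels, rows, cols)))

    # final 1x1 convolution with sigmoid: one probability per pixel
    shapes.append(("sigmoid", (1, rows, cols)))
    return shapes


for stage, shape in trace_unet_shapes():
    print(stage, shape)
```

The last shape equals the input shape, so the network outputs one value between 0 and 1 for every input pixel, which is what makes it a segmentation model. The merges reattach the higher-resolution feature maps from the contracting path to the upsampled ones.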

### Adam Optimizer
### Custom Loss and Metric
#### Dice Coefficient
#### Understanding of Code for Function
#### Difference Between Loss and Metric

## Save the trained model at checkpoints
```python
model_checkpoint = ModelCheckpoint('unet.hdf5', monitor='loss', save_best_only=True)
```
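
The effect of `save_best_only=True` with `monitor='loss'` can be pictured with the sketch below: a checkpoint is written only on epochs where the monitored loss improves on the best value seen so far. The helper `epochs_that_save` and the loss history are hypothetical, and the code only imitates the idea rather than calling Keras.

```python
def epochs_that_save(losses):
    """Return the epochs at which a 'best only' checkpoint would be written."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(losses):
        if loss < best:          # lower loss is better
            best = loss
            saved.append(epoch)  # the weights file would be overwritten here
    return saved


# hypothetical per-epoch training losses (a negative Dice loss gets more negative as it improves)
history = [-0.10, -0.25, -0.20, -0.31, -0.30]
print(epochs_that_save(history))
```

Epochs 2 and 4 are worse than an earlier epoch, so the file on disk would still hold the weights from epochs 1 and 3 respectively at those points.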

## Use weights given by tutorial
```python
if use_existing:
    model.load_weights('./unet.hdf5')
```
## Train model on training data
```python
model.fit(imgs_train, imgs_mask_train, batch_size=2, nb_epoch=20, verbose=1, shuffle=True,
              callbacks=[model_checkpoint])
```
_**The final weights are what you want: load those weights into the model and you can start making predictions**_

The same way that CNNs are an extension of NN, designed specifically for image recognition, you need to design a model that is specifically for recognizing lungs.

Domain knowledge can be integrated:
- just knowing that the input is an image already justifies receptive fields: according to domain knowledge, things that are closer together in the image have more significance to each other than everything in the image linked together. This creates the whole concept of a convolutional neural network with its receptive fields (allowing you to create a better model using domain knowledge)

For this specific problem of lung cancer you already know many things from domain knowledge, such as:
- the most likely areas for tumors are on the wall, or at the end of a blood vessel
  - so you could add extra weight for those identified near the wall, which means you need to identify the wall first
- big tumors are easy to spot
- the hardest ones are ground-glass nodules

What if you could have a crawler that crawled over just the surface of the lung walls looking for tumors, and other crawlers that crawled along blood vessels to their ends searching for tumors as well?

Instead of controlling overfitting by limiting learning complexity, you should identify where the model is learning false information and stop it from doing so.