
Last updated by JYI on 04/08/2020

-------
##### Copyright 2019 The TensorFlow Authors.

Licensed under the Apache License, Version 2.0 (the "License");



In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

- This tutorial focuses on the task of image segmentation, using a modified <a href="https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/" class="external">U-Net</a>.
- The goal is to segment the image, i.e., to assign a label to each pixel. Thus, image segmentation here means training a neural network to output a pixel-wise mask of the image.
- The dataset used in this tutorial is the [Oxford-IIIT Pet Dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/), created by Parkhi *et al*. It consists of images, their corresponding labels, and pixel-wise masks. The masks are labels for each pixel; each pixel is given one of three categories:

  - Class 1: pixel belonging to the pet.
  - Class 2: pixel bordering the pet.
  - Class 3: none of the above (surrounding pixel).
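To make the three categories concrete, here is a toy example in NumPy. The mask values are invented for illustration, not taken from the dataset. It uses the raw labels {1, 2, 3} described above and applies the same shift to {0, 1, 2} that `normalize()` performs later, since `SparseCategoricalCrossentropy` expects class ids starting at 0.

```python
import numpy as np

# Toy 4x4 mask using the dataset's raw labels (values invented for illustration):
# 1 = pet, 2 = pet border, 3 = background / surrounding pixel.
raw_mask = np.array([
    [3, 3, 3, 3],
    [3, 2, 2, 3],
    [3, 2, 1, 3],
    [3, 3, 3, 3],
], dtype=np.uint8)

# Shift labels to {0, 1, 2}, the range SparseCategoricalCrossentropy expects.
mask = raw_mask - 1

# Number of pixels in each class, ordered [pet, border, background].
class_counts = np.bincount(mask.ravel(), minlength=3)
print(class_counts)
```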

In [0]:
# Install TensorFlow Examples, which provides the pix2pix upsample blocks used for the decoder.

!pip install git+https://github.com/tensorflow/examples.git
import tensorflow as tf
from tensorflow_examples.models.pix2pix import pix2pix

import tensorflow_datasets as tfds
tfds.disable_progress_bar()

from IPython.display import clear_output
import matplotlib.pyplot as plt

## Download the Oxford-IIIT Pets dataset

The dataset is already included in TensorFlow Datasets, so all that is needed is to download it. The segmentation masks are included in version 3 and later.

In [0]:
'''
- download the dataset
- perform a simple augmentation by flipping an image horizontally; the mask is flipped under the same random condition, so it stays aligned with the image
- the dataset already contains the required train and test splits, so the same split is used here
- benefits of using @tf.function
- input_image, input_mask: both are tensors
- visualization of the image and mask
- how to use DatasetV1Adapter
- the raw image has pixels in [0, 255]; the normalized data is used for training
'''

dataset, info = tfds.load('oxford_iiit_pet:3.*.*', with_info=True) # load dataset

def normalize(input_image, input_mask):
  input_image = tf.cast(input_image, tf.float32) / 255.0 # normalized to [0,1]
  input_mask -= 1 # shift labels from {1, 2, 3} to {0, 1, 2}
  return input_image, input_mask

@tf.function
def load_image_train(datapoint):
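  input_image = tf.image.resize(datapoint['image'], (128, 128))
  # Nearest-neighbor resizing keeps every mask value a valid class label;
  # the default bilinear method would blend neighboring labels.
  input_mask = tf.image.resize(datapoint['segmentation_mask'], (128, 128),
                               method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)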

  if tf.random.uniform(()) > 0.5:
    input_image = tf.image.flip_left_right(input_image)
    input_mask = tf.image.flip_left_right(input_mask)

  input_image, input_mask = normalize(input_image, input_mask)

  return input_image, input_mask

def load_image_test(datapoint):
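  input_image = tf.image.resize(datapoint['image'], (128, 128))
  # Nearest-neighbor resizing keeps every mask value a valid class label;
  # the default bilinear method would blend neighboring labels.
  input_mask = tf.image.resize(datapoint['segmentation_mask'], (128, 128),
                               method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)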

  input_image, input_mask = normalize(input_image, input_mask)

  return input_image, input_mask

# print(info)
# print(dataset['test'])
# print(dataset['train'])
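The preprocessing cell above relies on two properties worth checking in isolation. This NumPy sketch is an illustration under simplifying assumptions (tiny hand-made arrays, and a 1-D stand-in for 2-D resizing), not TensorFlow's implementation. First, flipping the image and the mask under one shared random draw keeps every pixel paired with its label. Second, linear interpolation (bilinear in 2-D) can invent labels that never occurred in the mask, whereas nearest-neighbor resampling only copies existing labels. The helper `random_flip` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# --- 1. Paired horizontal flip ---------------------------------------------
# Toy 2x3 "image" and mask whose values are aligned column by column.
image = np.array([[10, 20, 30],
                  [40, 50, 60]], dtype=np.float32)
mask = np.array([[0, 1, 2],
                 [0, 1, 2]], dtype=np.uint8)

def random_flip(image, mask, rng):
    """Flip image and mask together using ONE random draw (hypothetical helper)."""
    if rng.uniform() > 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)
    return image, mask

flipped_image, flipped_mask = random_flip(image, mask, rng)

# Every (pixel value, label) pair survives, so the mask stays aligned.
pairs_before = set(zip(image.ravel().tolist(), mask.ravel().tolist()))
pairs_after = set(zip(flipped_image.ravel().tolist(), flipped_mask.ravel().tolist()))
print(pairs_before == pairs_after)  # True whether or not the flip happened

# --- 2. Resizing a mask ----------------------------------------------------
row = np.array([0, 0, 2, 2], dtype=np.float32)  # pet (0) next to background (2)
src_x = np.arange(row.size)                      # original positions 0..3
dst_x = np.linspace(0, row.size - 1, 7)          # resample onto 7 positions

linear = np.interp(dst_x, src_x, row)            # 1-D analogue of bilinear resizing
nearest = row[np.rint(dst_x).astype(int)]        # copy the closest original pixel

print(linear)   # contains 1.0, a "border" label that was never in the row
print(nearest)  # contains only 0.0 and 2.0
```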

In [0]:
'''
- load_image_train() runs as a graph function: it is decorated with @tf.function, and Dataset.map() traces the function it is given into a graph anyway
- DatasetV1Adapter
- dataset['train'].map() applies a function to each element of the dataset
- https://kite.com/python/docs/tensorflow.contrib.autograph.operators.control_flow.dataset_ops.DatasetV2.map
- num_parallel_calls=tf.data.experimental.AUTOTUNE: the number of parallel calls is set dynamically based on the available CPU
- get contents from tensorflow.python.data.ops.dataset_ops.DatasetV1Adapter
- train.cache() returns a dataset that caches its elements after the first pass

'''

print(info.splits['test'].num_examples)
print(info.splits['train'].num_examples)
TRAIN_LENGTH = info.splits['train'].num_examples
BATCH_SIZE = 64
BUFFER_SIZE = 1000
STEPS_PER_EPOCH = TRAIN_LENGTH // BATCH_SIZE

train = dataset['train'].map(load_image_train, num_parallel_calls=tf.data.experimental.AUTOTUNE)
test = dataset['test'].map(load_image_test)
train_dataset = train.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()
train_dataset = train_dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
test_dataset = test.batch(BATCH_SIZE) # printing a dataset shows its structure, not the images

print(train)
print(train.cache())
print(train.cache().shuffle(BUFFER_SIZE))
print(train.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE))
print(train.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat())
print(train.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat().prefetch(buffer_size=tf.data.experimental.AUTOTUNE))
print(test.batch(BATCH_SIZE))
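The printed objects only show dataset structure, so a pure-Python approximation of `cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()` may help build intuition. The helpers `shuffle_buffer`, `batch`, `repeat`, and `make_epoch` are hypothetical stand-ins written for this sketch; tf.data's real implementation differs in detail. The toy sizes are assumptions chosen so the arithmetic is easy to follow.

```python
import random
from itertools import islice

def shuffle_buffer(iterable, buffer_size, rng):
    """Approximate Dataset.shuffle(buffer_size): fill a buffer, emit a random element, refill."""
    buffer = []
    for item in iterable:
        buffer.append(item)
        if len(buffer) == buffer_size:
            idx = rng.randrange(buffer_size)
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]  # move chosen element to the end
            yield buffer.pop()
    rng.shuffle(buffer)  # drain whatever is left in random order
    yield from buffer

def batch(iterable, batch_size):
    """Approximate Dataset.batch(batch_size): the final batch may be smaller."""
    it = iter(iterable)
    while chunk := list(islice(it, batch_size)):
        yield chunk

def repeat(make_epoch):
    """Approximate Dataset.repeat() with no count: restart the pipeline forever."""
    while True:
        yield from make_epoch()

# Hypothetical toy sizes standing in for TRAIN_LENGTH, BATCH_SIZE, BUFFER_SIZE.
TRAIN_LENGTH, BATCH_SIZE, BUFFER_SIZE = 10, 3, 4
STEPS_PER_EPOCH = TRAIN_LENGTH // BATCH_SIZE  # 10 // 3 == 3, while one pass yields 4 batches
rng = random.Random(0)

def make_epoch():
    # The (cached) source is re-shuffled each time the pipeline restarts.
    return batch(shuffle_buffer(range(TRAIN_LENGTH), BUFFER_SIZE, rng), BATCH_SIZE)

one_pass = list(make_epoch())
print([len(b) for b in one_pass])  # [3, 3, 3, 1]
first_steps = list(islice(repeat(make_epoch), STEPS_PER_EPOCH))
```

With these toy numbers, one pass over the data produces four batches (the last one partial) while `STEPS_PER_EPOCH` is 3, so a Keras epoch and a full pass over the data are not the same thing; `repeat()` is what lets training keep drawing batches across that boundary.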

In [0]:
'''
- illustration of the images and their masks
- how to access the images and the corresponding masks
- how to use the masks to formulate the loss
'''

def display(display_list):
  plt.figure(figsize=(15, 15))

  title = ['Input Image', 'True Mask', 'Predicted Mask']

  for i in range(len(display_list)):
    plt.subplot(1, len(display_list), i+1)
    plt.title(title[i])
    plt.imshow(tf.keras.preprocessing.image.array_to_img(display_list[i]))
    plt.axis('off')
  plt.show()

for image, mask in train.take(1): # take(1) yields only the first (image, mask) pair
  sample_image, sample_mask = image, mask

display([sample_image, sample_mask])

print(sample_image.get_shape())
print(sample_mask.get_shape())
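`display()` can show a mask whose labels are only 0, 1, and 2 because, as far as I understand, `tf.keras.preprocessing.image.array_to_img` rescales its input to [0, 255] by default (`scale=True`). The function in this sketch is a hypothetical NumPy approximation of that min-max rescaling, not the Keras source, and the toy arrays are invented.

```python
import numpy as np

def scale_like_array_to_img(x):
    """Min-max rescale to [0, 255] (hypothetical approximation of array_to_img's scale=True)."""
    x = np.asarray(x, dtype=np.float64)
    x = x - x.min()
    x_max = x.max()
    if x_max != 0:            # a constant array stays at 0 instead of dividing by zero
        x = x / x_max
    return x * 255.0

toy_mask = np.array([[0, 1], [2, 2]])            # labels after normalize(); invented values
toy_image = np.array([[0.0, 0.5], [1.0, 0.25]])  # normalized pixel values in [0, 1]

print(scale_like_array_to_img(toy_mask))   # labels 0, 1, 2 map to 0, 127.5, 255
print(scale_like_array_to_img(toy_image))
```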

## Define the model
- The model used here is a modified U-Net. A U-Net consists of an encoder (downsampler) and a decoder (upsampler).
- A pretrained model can be used as the encoder, which provides robust features and reduces the number of trainable parameters. Here, the encoder is a pretrained model whose intermediate-layer outputs are used as features.
- The decoder is the upsample block already implemented in TensorFlow Examples in the [Pix2pix tutorial](https://github.com/tensorflow/examples/blob/master/tensorflow_examples/models/pix2pix/pix2pix.py).
- The model outputs three channels because there are three possible labels for each pixel. Think of this as multi-class classification in which each pixel is classified into one of three classes.
- The encoder is a pretrained MobileNetV2 model, available out of the box in [tf.keras.applications](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/applications). The encoder consists of specific outputs from intermediate layers of the model. Note that the encoder is not trained during the training process.
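The per-pixel multi-class view can be sketched in NumPy. A random array stands in for the network's `(height, width, 3)` output for one image (an assumption for illustration); a softmax over the last axis gives each pixel its own distribution over the three classes, and flattening shows the same numbers as `height * width` independent 3-way classification problems.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C = 4, 4, 3                         # toy spatial size; C plays the role of OUTPUT_CHANNELS
logits = rng.normal(size=(H, W, C))       # stand-in for one image's network output

# Softmax over the channel axis: each pixel gets its own 3-way distribution.
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)

# The same numbers viewed as H*W independent classification problems.
flat_probs = probs.reshape(-1, C)         # (16, 3)
flat_pred = flat_probs.argmax(axis=1)     # one class id per pixel
pred_mask = flat_pred.reshape(H, W)       # back to image layout
print(pred_mask)
```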

In [0]:
'''
- how to use different models in Keras to perform segmentation
- use encoders implemented by MobileNetV2/ResNet/Inception (GoogLeNet family)/VGG
- check the layer names in a trained network
- tf.keras.applications contains pretrained models such as MobileNetV2, ResNet50, InceptionV3, and VGG16
- each layer is an object with many attributes and member functions
- how to define and use the upsampling blocks
- down_stack only returns the outputs of the specified layers

use of tf.keras.applications.MobileNetV2
- construct model 
  base_model = tf.keras.applications.MobileNetV2(input_shape=[128, 128, 3], include_top=False)
- get configurations of constructed model
  base_model.summary()
- choose specific layers
  block_1_expand_relu, (64,64,96)
  block_3_expand_relu, (32,32,144)
  block_6_expand_relu, (16,16,192)
  block_13_expand_relu, (8,8,576)
  block_16_project, (4,4,320)
- extract output of the specified layers
  base_model.get_layer('block_1_expand_relu').output
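- the encoder (down_stack) is set to be non-trainable, so its weights stay fixed during training
- the weights are pretrained on ImageNet (weights='imagenet' is the default for MobileNetV2)
- the outputs of the selected layers serve as the skip connections of the decoder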
'''

OUTPUT_CHANNELS = 3
base_model = tf.keras.applications.MobileNetV2(input_shape=[128, 128, 3], include_top=False)

# Use the activations of these layers
layer_names = [
    'block_1_expand_relu',   # 64x64
    'block_3_expand_relu',   # 32x32
    'block_6_expand_relu',   # 16x16
    'block_13_expand_relu',  # 8x8
    'block_16_project',      # 4x4
]
layers = [base_model.get_layer(name).output for name in layer_names]

# Create the feature extraction model
down_stack = tf.keras.Model(inputs=base_model.input, outputs=layers)

down_stack.trainable = False

# print(base_model.summary())
for layer in layers:
  print(layer)

sample_layer = base_model.get_layer('block_1_expand_relu')  # inspect one encoder layer
print(sample_layer.output)
print(sample_layer.weights)  # empty list: a ReLU layer has no weights
print(sample_layer.variables)
print(sample_layer.trainable)
print(sample_layer.output_shape)
print(sample_layer.name)
print(base_model.input)

The decoder/upsampler is simply a series of upsample blocks implemented in TensorFlow Examples.

In [0]:
'''
- pix2pix.upsample() returns an upsampling block
- what is the output of the block returned by pix2pix.upsample()?
- what is the input of the block returned by pix2pix.upsample()?
- how to use the block returned by pix2pix.upsample()
- the block is a tensorflow.python.keras.engine.sequential.Sequential object
'''

up_stack = [
    pix2pix.upsample(512, 3),  # 4x4 -> 8x8
    pix2pix.upsample(256, 3),  # 8x8 -> 16x16
    pix2pix.upsample(128, 3),  # 16x16 -> 32x32
    pix2pix.upsample(64, 3),   # 32x32 -> 64x64
] # only the layers are defined, but the inputs and outputs are not specified

print(pix2pix)
print(up_stack[0])
print(up_stack[0].layers)  # sublayers; summary() would fail because the block is not built yet
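Each `pix2pix.upsample(filters, 3)` block doubles the spatial size through a stride-2 transposed convolution. This NumPy sketch illustrates the mechanism in 1-D: each input value scatters a scaled copy of the kernel into the output, spaced `stride` apart. With 'valid' padding (and a kernel at least as large as the stride) the output length is `(n - 1) * stride + k`; with `padding='same'`, which the last layer of `unet_model` uses explicitly and which, to my understanding, the pix2pix block uses too, Keras crops the result to `n * stride`. Both helper functions are hypothetical.

```python
import numpy as np

def conv_transpose_1d_valid(x, kernel, stride):
    """1-D transposed convolution, 'valid' padding: scatter-add a scaled kernel per input."""
    n, k = len(x), len(kernel)
    out = np.zeros((n - 1) * stride + k)  # output length, assuming k >= stride
    for i, value in enumerate(x):
        out[i * stride : i * stride + k] += value * kernel
    return out

def same_output_size(n, stride):
    """Output size of a transposed convolution with padding='same': input size * stride."""
    return n * stride

y = conv_transpose_1d_valid(np.array([1.0, 2.0]), np.array([1.0, 1.0, 1.0]), stride=2)
print(y)  # [1. 1. 3. 2. 2.]: the two kernel copies overlap at index 2

sizes = [4]                 # spatial size of the deepest encoder output
for _ in range(4):          # the four upsample blocks in up_stack
    sizes.append(same_output_size(sizes[-1], 2))
print(sizes)                # [4, 8, 16, 32, 64]
```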

In [0]:
'''
- why take x = skips[-1]? It is the deepest (smallest) encoder output, (4,4,320), where upsampling starts
- reversed(skips[:-1]) reverses the order of the remaining skips; the last element is excluded
- up_stack: 3x3 kernels with 512, 256, 128, and 64 filters, stride 2
- when up = up_stack[0], skip = skips[0] = (8,8,576):
  x = up(x), (8,8,512) => x = concat([x, skip]), (8,8,1088)
- when up = up_stack[1], skip = skips[1] = (16,16,192):
  x = up(x), (16,16,256) => x = concat([x, skip]), (16,16,448)
- the output of a deeper encoder layer is upsampled so that it can be
  concatenated with the output of the previous (shallower) encoder layer; this process
  repeats until the final layer restores the size of the original image
- concatenation is along the channel dimension

'''

def unet_model(output_channels):
  inputs = tf.keras.layers.Input(shape=[128, 128, 3])
  x = inputs

  # Downsampling through the model
  skips = down_stack(x)
  x = skips[-1] # (4,4,320)
  skips = reversed(skips[:-1]) # (8,8,576), (16,16,192), (32,32,144), (64,64,96)

  # Upsampling and establishing the skip connections
  for up, skip in zip(up_stack, skips):
    x = up(x)
    # print(x)
    concat = tf.keras.layers.Concatenate()
    x = concat([x, skip])
    # print(x)

  # This is the last layer of the model
  last = tf.keras.layers.Conv2DTranspose(
      output_channels, 3, strides=2,
      padding='same')  #64x64 -> 128x128

  x = last(x)

  return tf.keras.Model(inputs=inputs, outputs=x)
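The shape bookkeeping in the notes above can be checked mechanically. This plain-Python sketch traces shapes through `unet_model` using the encoder output shapes listed earlier for a 128x128 input and the filter counts in `up_stack`. It assumes each upsample block doubles height and width, and it only computes shapes without building any layers; `trace_unet_shapes` is a hypothetical helper.

```python
# Encoder output shapes (height, width, channels) for a 128x128x3 input,
# as listed in the notes for the chosen MobileNetV2 layers.
ENCODER_OUTPUTS = [(64, 64, 96), (32, 32, 144), (16, 16, 192), (8, 8, 576), (4, 4, 320)]
UP_FILTERS = [512, 256, 128, 64]  # filters of the four pix2pix.upsample blocks
OUTPUT_CHANNELS = 3

def trace_unet_shapes(encoder_outputs, up_filters, output_channels):
    """Return (stage, shape) pairs following the data flow of unet_model."""
    x = encoder_outputs[-1]                       # deepest feature map, like skips[-1]
    skips = list(reversed(encoder_outputs[:-1]))  # like reversed(skips[:-1])
    trace = [("bottleneck", x)]
    for filters, skip in zip(up_filters, skips):
        h, w, _ = x
        x = (2 * h, 2 * w, filters)               # upsample block: double H and W
        if x[:2] != skip[:2]:
            raise ValueError(f"cannot concatenate {x} with {skip}")
        x = (x[0], x[1], x[2] + skip[2])          # Concatenate along the channel axis
        trace.append(("upsample + concat", x))
    h, w, _ = x
    trace.append(("last Conv2DTranspose", (2 * h, 2 * w, output_channels)))
    return trace

for stage, shape in trace_unet_shapes(ENCODER_OUTPUTS, UP_FILTERS, OUTPUT_CHANNELS):
    print(f"{stage:22s} {shape}")
```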

## Train the model
- The loss used here is `losses.SparseCategoricalCrossentropy(from_logits=True)`. This loss fits because the network assigns each pixel a label, just like multi-class prediction.
- In the true segmentation mask, each pixel has a label in {0, 1, 2}. The network outputs three channels; essentially, each channel learns to predict one class, and `losses.SparseCategoricalCrossentropy(from_logits=True)` is the recommended loss for such a scenario. It is "sparse" because the labels are integer class ids rather than one-hot vectors, and `from_logits=True` is needed because the last layer has no softmax, so its outputs are raw logits.
- Given the network output, the label assigned to each pixel is the channel with the highest value. This is what the `create_mask` function does.
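For intuition about what `SparseCategoricalCrossentropy(from_logits=True)` computes on a segmentation output, here is a NumPy sketch: a numerically stable log-softmax over the channel axis, the log-probability of the true class at every pixel, and the mean over all pixels. Averaging over every pixel is my understanding of what Keras' default reduction amounts to for a `(batch, height, width)` loss; `sparse_ce_from_logits` is a hypothetical name, and the toy logits are invented.

```python
import numpy as np

def sparse_ce_from_logits(logits, labels):
    """Mean per-pixel cross-entropy. logits: (..., C) floats; labels: (...) ints in [0, C)."""
    # Numerically stable log-softmax over the channel axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Log-probability assigned to the true class at each pixel.
    true_log_probs = np.take_along_axis(log_probs, labels[..., np.newaxis], axis=-1)[..., 0]
    return -true_log_probs.mean()

labels = np.array([[[0, 1], [2, 2]]])        # (batch=1, 2, 2) toy mask
uniform_logits = np.zeros((1, 2, 2, 3))      # a network with no preference
confident_logits = 10.0 * np.eye(3)[labels]  # large logit on the true class

print(sparse_ce_from_logits(uniform_logits, labels))    # log(3), about 1.0986
print(sparse_ce_from_logits(confident_logits, labels))  # close to 0
```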

In [0]:
'''
- tf.keras.utils.plot_model plots the model graph (requires pydot and graphviz)
'''
model = unet_model(OUTPUT_CHANNELS)
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
tf.keras.utils.plot_model(model, show_shapes=True)

def create_mask(pred_mask):
  pred_mask = tf.argmax(pred_mask, axis=-1)
  pred_mask = pred_mask[..., tf.newaxis]
  return pred_mask[0]

def show_predictions(dataset=None, num=1):
  if dataset:
    for image, mask in dataset.take(num):
      pred_mask = model.predict(image)
      display([image[0], mask[0], create_mask(pred_mask)])
  else:
    display([sample_image, sample_mask,
             create_mask(model.predict(sample_image[tf.newaxis, ...]))])
    
show_predictions()
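`create_mask` and the `'accuracy'` metric can be mirrored in NumPy for a toy batch. `create_mask_np` follows the same three steps as `create_mask` (argmax over channels, add a channel axis, take the first image), and `pixel_accuracy` computes the fraction of matching pixels, which is, as I understand it, what Keras reports as accuracy here when it is paired with a sparse categorical loss. Both functions and the logits are hypothetical.

```python
import numpy as np

def create_mask_np(pred_logits):
    """NumPy analogue of create_mask: argmax over channels, keep a channel axis, take item 0."""
    pred = np.argmax(pred_logits, axis=-1)  # (B, H, W): winning channel per pixel
    pred = pred[..., np.newaxis]            # (B, H, W, 1): same layout as the true mask
    return pred[0]                          # first image in the batch

def pixel_accuracy(pred_mask, true_mask):
    """Fraction of pixels whose predicted label equals the true label."""
    return float(np.mean(pred_mask == true_mask))

true_mask = np.array([[[0], [1]], [[2], [2]]])            # (H=2, W=2, 1) toy labels
pred_logits = np.array([[[[5., 1., 0.], [0., 3., 1.]],
                         [[0., 1., 4.], [2., 0., 1.]]]])  # (1, 2, 2, 3) toy logits
pred_mask = create_mask_np(pred_logits)
print(pred_mask.shape)                       # (2, 2, 1)
print(pixel_accuracy(pred_mask, true_mask))  # 0.75: the bottom-right pixel is wrong
```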

Let's observe how the model improves while it is training. To do this, a callback that displays a sample prediction at the end of each epoch is defined below.

In [0]:
print(info.splits)
print(info.splits['test'])
print(info.splits['test'].num_examples)
print(train_dataset)
print(type(train_dataset))

In [0]:
'''
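- VAL_SUBSPLITS: each epoch validates on only about 1/VAL_SUBSPLITS of the test set,
  since VALIDATION_STEPS = num_test_examples // BATCH_SIZE // VAL_SUBSPLITS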
'''

class DisplayCallback(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs=None):
    clear_output(wait=True)
    show_predictions()
    print ('\nSample Prediction after epoch {}\n'.format(epoch+1))

EPOCHS = 10 # the original tutorial uses 20
VAL_SUBSPLITS = 5
VALIDATION_STEPS = info.splits['test'].num_examples//BATCH_SIZE//VAL_SUBSPLITS

model_history = model.fit(train_dataset, epochs=EPOCHS,
                          steps_per_epoch=STEPS_PER_EPOCH,
                          validation_steps=VALIDATION_STEPS,
                          validation_data=test_dataset,
                          callbacks=[DisplayCallback()])
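`VALIDATION_STEPS` means each epoch evaluates only part of the test set. The arithmetic is sketched here assuming a test split of 3,669 examples; that number is an assumption, so compare it with the value printed from `info.splits['test'].num_examples`. `validation_steps` is a hypothetical helper that mirrors the expression in the cell above.

```python
def validation_steps(num_test_examples, batch_size, val_subsplits):
    """Mirror VALIDATION_STEPS = num_test_examples // BATCH_SIZE // VAL_SUBSPLITS."""
    return num_test_examples // batch_size // val_subsplits

NUM_TEST = 3669     # assumed test-split size; check info.splits['test'].num_examples
BATCH_SIZE = 64
VAL_SUBSPLITS = 5

steps = validation_steps(NUM_TEST, BATCH_SIZE, VAL_SUBSPLITS)
examples_per_epoch = steps * BATCH_SIZE   # images evaluated per epoch
fraction = examples_per_epoch / NUM_TEST  # share of the test set used
print(steps, examples_per_epoch, f"{fraction:.1%}")
```

Under that assumption, each epoch validates on 11 batches (704 images), roughly a fifth of the test set, which matches the intent of `VAL_SUBSPLITS = 5`.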

In [0]:
loss = model_history.history['loss']
val_loss = model_history.history['val_loss']

epochs = range(EPOCHS)

plt.figure()
plt.plot(epochs, loss, 'r', label='Training loss')
plt.plot(epochs, val_loss, 'bo', label='Validation loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss Value')
plt.ylim([0, 1])
plt.legend()
plt.show()

show_predictions(test_dataset, 3)