# Artificial Neural Networks & Deep Learning
# Homework 2 - Image Segmentation
# Selected subset : Bipbip - Haricot

**Development Team:**
- Acquati Marco - 10583134 
- Brugali Giorgio - 10794550
- Puoti Francesco - 10595640 


# *1. Data acquisition and augmentation*
> We tile each image into 256 x 256 sub-images, which yields 48 sub-images per image in the dataset.
We then handle data augmentation ourselves:
- Each image of the dataset is included in resized form;
- To each extracted tile we apply one of three operations: random-angle rotation, vertical flip, or horizontal flip. Tiles whose mask contains only black (background) pixels are not augmented, so that we do not run out of RAM.
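
As a rough check of the tile count, the sketch below cuts a dummy image into non-overlapping 256 x 256 tiles with plain NumPy. The 1536 x 2048 image size is an assumption (it is the size that gives 48 tiles and matches the `1536//256` split used in the reconstruction code), and `tile_image` is a hypothetical helper, not the notebook's `cropImg`.

```python
import numpy as np

TILE = 256

def tile_image(img, tile=TILE):
    """Split an (H, W, C) image into non-overlapping (tile, tile, C) tiles, row by row.

    Assumes H and W are exact multiples of `tile`.
    """
    h, w, c = img.shape
    rows, cols = h // tile, w // tile
    # (rows, tile, cols, tile, C) -> (rows, cols, tile, tile, C)
    tiles = img.reshape(rows, tile, cols, tile, c).swapaxes(1, 2)
    return tiles.reshape(rows * cols, tile, tile, c)

# Dummy image with the assumed size; one tile is marked so it can be located afterwards.
image = np.zeros((1536, 2048, 3), dtype=np.uint8)
image[256:512, 512:768] = 7      # tile at grid row 1, grid column 2
tiles = tile_image(image)
print(tiles.shape)               # (48, 256, 256, 3): 6 rows x 8 columns of tiles
```

With 8 tiles per row, the marked tile ends up at index 1 * 8 + 2 = 10.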

> Further details are given as annotations alongside the code, to make the correspondence between the explanations and the code snippets clearer.

> **1.1. MaskCleaner**
>> Mask acquisition deserves a note. Masks are loaded in grey-scale mode and then passed through the maskCleaner function, which removes noise by mapping every pixel to a class value: 0 stays 0 (background), 255 becomes 1 (crop), and any other grey level becomes 2 (weed). The resulting masks therefore contain only values in {0, 1, 2} (one value per class).
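
The mapping can be illustrated on a tiny array. This is a minimal sketch; `clean_mask` is a hypothetical stand-in for `maskCleaner`, and the grey levels 66 and 67 are arbitrary values chosen only to represent "anything that is neither 0 nor 255".

```python
import numpy as np

def clean_mask(gray):
    """Map a grey-scale mask to class ids: 0 = background, 1 = crop, 2 = weed."""
    cleaned = np.full_like(gray, 2)   # default: every other grey level -> weed
    cleaned[gray == 0] = 0            # black -> background
    cleaned[gray == 255] = 1          # white -> crop
    return cleaned

gray = np.array([[0, 255, 67],
                 [66, 0, 255]], dtype=np.int32)
cleaned = clean_mask(gray)
print(cleaned)
```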



# *2. Model overview*


> **2.1. The Network**
>> We based our model on the U-Net. It is composed of 7 convolutional blocks (conv2D_block), each consisting of two convolutional layers. The number of filters starts at 64 and doubles at each level up to 512, which is the bottleneck of our net. We also tried 1024, but found that the 512 U-Net is less prone to noise effects.
For upsampling, we opted for a transpose convolution instead of plain UpSampling, since a transpose convolutional layer learns how to fill in details during training.
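
To make the difference concrete, here is a simplified single-channel sketch in NumPy (no bias, no channels, kernel orientation conventions ignored). It is not the Keras layer itself. With a 2x2 kernel and stride 2, each input pixel expands into a scaled copy of the kernel. An all-ones kernel reproduces nearest-neighbour upsampling, while any other kernel, such as one found by training, fills the 2x2 block differently.

```python
import numpy as np

def transpose_conv_2x2(x, kernel):
    """Single-channel transpose convolution with a 2x2 kernel and stride 2.

    Since the stride equals the kernel size, output blocks do not overlap:
    each input pixel x[i, j] becomes the 2x2 block x[i, j] * kernel.
    """
    return np.kron(x, kernel)

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Fixed all-ones kernel: identical to nearest-neighbour upsampling.
fixed = transpose_conv_2x2(x, np.ones((2, 2)))
nearest = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

# Illustrative (made-up) kernel values standing in for learned weights.
learned = transpose_conv_2x2(x, np.array([[1.0, 0.5],
                                          [0.5, 0.25]]))
print(np.array_equal(fixed, nearest))
print(learned)
```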

> **2.2. Classification and Prediction process**
>> As the output layer, we use a convolutional layer that classifies each pixel by means of the softmax activation function. Since we crop the images before feeding them into the network, after the prediction we reconstruct the mask, taking the 48 sub-images at once.
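
The prediction step can be sketched in NumPy as follows: take the most probable class per pixel, then stitch the tiles back into one map. `reconstruct_mask` is a hypothetical helper (the notebook's own `reconstruct` uses TensorFlow ops), and the tile side is shrunk to 2 pixels and filled with random values purely to keep the example light.

```python
import numpy as np

def reconstruct_mask(tile_probs, rows, cols):
    """Turn per-tile class probabilities (N, t, t, classes) into one (rows*t, cols*t) class map."""
    labels = tile_probs.argmax(axis=-1)                      # (N, t, t) class ids
    t = labels.shape[1]
    # (rows, cols, t, t) -> (rows, t, cols, t) -> full image
    grid = labels.reshape(rows, cols, t, t).swapaxes(1, 2)
    return grid.reshape(rows * t, cols * t)

rows, cols, t, n_classes = 6, 8, 2, 3                        # 6 x 8 = 48 tiles
rng = np.random.default_rng(0)
probs = rng.random((rows * cols, t, t, n_classes))
probs /= probs.sum(axis=-1, keepdims=True)                   # each pixel sums to 1, like a softmax output

full = reconstruct_mask(probs, rows, cols)
print(full.shape)                                            # (12, 16)
```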

> **2.3. Optimizer & LossFunction** 
>> - Adam, with a starting learning rate of 1e-3 and amsgrad = True (the AMSGrad variant of Adam's adaptive learning rate), so as to prevent the network from getting stuck in a suboptimal solution.
>> - Loss function : Sparse Categorical Crossentropy.
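
For a single pixel, sparse categorical cross-entropy is the negative log of the probability the network assigns to the true class, with the label given as an integer rather than a one-hot vector. The following is a small hand-rolled illustration with made-up probabilities, not the Keras implementation.

```python
import math

def sparse_categorical_crossentropy(true_class, probs):
    """Loss for one pixel: negative log of the probability assigned to the true class."""
    return -math.log(probs[true_class])

# Softmax outputs for three pixels, in the order (background, crop, weed).
pixels = [
    (0, [0.90, 0.05, 0.05]),   # confident and correct
    (1, [0.30, 0.60, 0.10]),   # correct, but less confident
    (2, [0.70, 0.20, 0.10]),   # wrong: a weed pixel predicted as background
]
losses = [sparse_categorical_crossentropy(y, p) for y, p in pixels]
mean_loss = sum(losses) / len(losses)   # averaged over pixels
print(losses, mean_loss)
```

The misclassified pixel dominates the average, which is the behaviour that pushes the network toward the correct class.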

> **2.4. Metrics** 
>> As the metric, we chose Keras's MeanIoU, with a weight for each class:
- weight = 1 for both crop and weed;
- weight = 0.3 for the background.
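
The effect of the weights can be sketched with a weighted confusion matrix in NumPy. This is meant to mirror the idea of passing per-pixel sample weights to a mean-IoU metric, under the assumption that each pixel contributes its weight instead of 1 to the confusion matrix. `weighted_mean_iou` is a hypothetical helper and the labels are made up.

```python
import numpy as np

def weighted_mean_iou(y_true, y_pred, weights, num_classes=3):
    """Mean IoU from a confusion matrix in which each pixel counts with its weight."""
    cm = np.zeros((num_classes, num_classes))
    np.add.at(cm, (y_true.ravel(), y_pred.ravel()), weights.ravel())
    tp = np.diag(cm)
    denom = cm.sum(axis=0) + cm.sum(axis=1) - tp   # TP + FP + FN per class
    valid = denom > 0                              # skip classes that never occur
    return (tp[valid] / denom[valid]).mean()

# Classes: 0 = background, 1 = crop, 2 = weed.
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 0])
weights = np.where(y_true == 0, 0.3, 1.0)          # background 0.3, crop and weed 1.0

plain = weighted_mean_iou(y_true, y_pred, np.ones_like(weights))
weighted = weighted_mean_iou(y_true, y_pred, weights)
print(plain, weighted)
```

In this toy case the background pixel mislabelled as crop costs the crop class less once background is down-weighted, so the crop IoU rises.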

> **2.5. Further information about the implementation process**
>> No EarlyStopping is used in the final model: in our trials it stopped training even though further epochs would have led to noteworthy improvements.
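
The kind of behaviour described here can be reproduced with a simplified model of patience-based stopping. This is a sketch, not Keras's `EarlyStopping` callback, and the validation losses are invented for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which patience-based stopping would halt,
    or None if training runs through all epochs."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return None

# Made-up validation losses: a plateau from epoch 3 to 6, then further improvement.
val_losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.60, 0.45, 0.40]
stop_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch)   # 6: training halts before the later improvement at epochs 7 and 8
```

A larger patience would survive this particular plateau, at the cost of more wasted epochs when no improvement follows.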

>> We initially implemented checkpoints but, as the network became more complex, they occupied more and more disk space.
We therefore avoid checkpoints; otherwise we would have run out of disk space on both Kaggle and Colab.

>> The number of epochs is set to 75, since we noticed that the longer the network trains beyond that, the more it produces only distorted images.

>> To save RAM, we explicitly call the garbage collector and use the del keyword to delete image arrays that are no longer needed.
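
The pattern looks roughly like this. `TileBuffer` is a hypothetical stand-in for a large image array, and the weak reference is only there to observe that the object has really been freed. The behaviour shown assumes CPython.

```python
import gc
import weakref

class TileBuffer:
    """Hypothetical stand-in for a large array of image tiles."""
    def __init__(self, n_bytes):
        self.data = bytearray(n_bytes)

buffer = TileBuffer(10_000_000)
probe = weakref.ref(buffer)   # does not keep the object alive

del buffer                    # drop the only strong reference
gc.collect()                  # additionally reclaim anything held by reference cycles

freed = probe() is None
print(freed)
```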

In [None]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

import os
import gc
import tensorflow as tf
import numpy as np

# Set the seed for random operations.
# This makes our experiments reproducible.
SEED = 1234
tf.random.set_seed(SEED)  
np.random.seed(SEED)
cwd = os.getcwd()

In [None]:
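import PIL.Image  # import the submodule explicitly; a bare `import PIL` does not guarantee PIL.Image is loaded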

def cropImg(img):
    """Split one (H, W, 3) image into 48 tiles of shape (256, 256, 3)."""
    img = np.expand_dims(img, axis=0)  # add the batch dimension: (1, H, W, 3)
    crop = tf.image.extract_patches(images=img,
                                    sizes=[1, 256, 256, 1],
                                    strides=[1, 256, 256, 1],
                                    rates=[1, 1, 1, 1],
                                    padding='SAME')
    crop = np.array(crop)
    # Each patch comes back flattened; restore the (256, 256, 3) tile shape.
    crop = crop.reshape(48 * len(img), 256, 256, 3)
    return crop

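def cropMask(img):
    """Split one (H, W) mask into 48 tiles of shape (256, 256, 1)."""
    img = np.expand_dims(img, axis=0)   # add the batch dimension: (1, H, W)
    img = np.expand_dims(img, axis=-1)  # add the channel dimension: (1, H, W, 1)
    # The first and last entries of sizes/strides must be 1 (here len(img) is always 1).
    crop = tf.image.extract_patches(images=img,
                                    sizes=[1, 256, 256, 1],
                                    strides=[1, 256, 256, 1],
                                    rates=[1, 1, 1, 1],
                                    padding='SAME')
    crop = np.array(crop)
    crop = crop.reshape(48 * len(img), 256, 256, 1)
    return crop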

def maskCleaner(img): #Function used to convert the masks' pixel values into our class values (0, 1, 2).
    mask = np.ones_like(img) *2 #weed
    mask[np.where(img == 0)] = 0 #background
    mask[np.where(img == 255)] = 1 #haricot
    return mask


#GET IMAGES:
#This function loads all the images from the specified path and calls the functions above.
#It returns two lists, one with the cropped images and the other with the cropped masks.
def loadImages(idx):
    images = []
    masks = []
    paths =[ '../input/development-datasetzip/Development_Dataset/Training/Bipbip/Haricot'] 
    for dirs,_,files in os.walk(str(paths[idx])  +'/Images'):
        files.sort()
        for f in files:
            img = PIL.Image.open(os.path.join(paths[idx] + '/Images',f))
            img = np.array(img)
            img = cropImg(img)
            images.append(img)
    for dirs,_,files in os.walk(paths[idx] + '/Masks'):        
        files.sort()
        for f in files:
            mask = PIL.Image.open(os.path.join(paths[idx] + '/Masks',f)).convert('I')
            mask = np.array(mask)
            mask = maskCleaner(mask)
            mask= cropMask(mask)
            masks.append(mask)
    return images, masks





In [None]:
images , masks = loadImages(0)

In [None]:
#Reshape the arrays from (90,48,256,256,channels) to (90*48,256,256,channels)
images = np.array(images)
images = images.reshape([90*48,256,256,3])
masks = np.array(masks)
masks = masks.reshape([90*48,256,256,1])

In [None]:
from sklearn.model_selection import train_test_split

# Data set split in training and validation sets
# with the latter having a size equal to the 20% of the entire data set.
#-----------------------------------------------------------------------
images_train, images_valid, masks_train, masks_valid = train_test_split(images, masks, test_size=0.2, shuffle = True, random_state = SEED)


del images
del masks

gc.collect()

In [None]:
#These two functions are used to perform DataAugmentation on both images and masks.

from scipy import ndimage
def DataAugmetation(img_, mask_):
  
  aug_imgs = [img_]
  aug_masks = [mask_]
  if np.all(np.unique(mask_) == 0): #if the mask is composed of black pixels only, data augmentation is skipped
    return aug_imgs, aug_masks

  operation = np.random.randint(low = 0, high=3) #low (inclusive) to high (exclusive)
  if operation == 0:
      #rotation
      angle = int(np.random.uniform(0, 360)) #draw a scalar; converting a size-1 array to int is deprecated in recent NumPy
      aug_imgs.append(ndimage.rotate(img_, angle, reshape=False, order=0, mode = 'nearest'))
      aug_masks.append(ndimage.rotate(mask_, angle, reshape=False, order=0, mode = 'nearest'))
  elif operation == 1:
      #vertical flip
      aug_imgs.append(np.flipud(img_))
      aug_masks.append(np.flipud(mask_))
  else:
      #horizontal flip
      aug_imgs.append(np.fliplr(img_))
      aug_masks.append(np.fliplr(mask_))
  return aug_imgs, aug_masks


def dataAugment():
  augImages = []
  augMasks = []
  for i in range(len(images_train)):
    x,y =DataAugmetation(images_train[i],masks_train[i])
    for k in x:
      augImages.append(k)
    for j in y:
      augMasks.append(j)
  return augImages, augMasks


augImages, augMasks = dataAugment()
augImages = np.array(augImages)
augMasks = np.array(augMasks)


del images_train
del masks_train

gc.collect()

In [None]:

from tensorflow.keras.preprocessing.image import ImageDataGenerator             

train_data_gen = ImageDataGenerator(rescale = 1./255)   
valid_data_gen = ImageDataGenerator(rescale = 1./255)


In [None]:

# Batch size
bs = 32

# img shape
img_h = 256
img_w = 256

train_gen = train_data_gen.flow(tf.convert_to_tensor(augImages),
                                augMasks,
                                batch_size=bs,
                                shuffle=True,
                                seed=SEED
                                ) 
del augImages
del augMasks

valid_gen = valid_data_gen.flow(tf.convert_to_tensor(images_valid),
                                masks_valid,
                                batch_size=bs,
                                shuffle=True,
                                seed=SEED
                                ) 

del images_valid
del masks_valid


In [None]:
train_dataset = tf.data.Dataset.from_generator(lambda: train_gen,
                                               output_types=(tf.float32, tf.float32),
                                               output_shapes=([None, img_h, img_w, 3], [None, img_h, img_w, 1]))
train_dataset = train_dataset.repeat()

valid_dataset = tf.data.Dataset.from_generator(lambda: valid_gen,
                                               output_types=(tf.float32, tf.float32),
                                               output_shapes=([None, img_h, img_w, 3], [None, img_h, img_w, 1]))
valid_dataset = valid_dataset.repeat()


In [None]:
gc.collect()

In [None]:
from tensorflow.keras import layers

inp_shape = (img_h, img_w, 3)
num_classes = 3
initializer = tf.keras.initializers.HeNormal()

def conv2D_block(numFilt_, previouslayer_) :

    convblock = layers.Conv2D(filters=numFilt_, kernel_size = (3,3), strides=1, padding="same", kernel_initializer=initializer)(previouslayer_)
    convblock = layers.BatchNormalization()(convblock)
    convblock = layers.ReLU()(convblock)
    convblock = layers.Conv2D(filters=numFilt_, kernel_size = (3,3), strides=1, padding="same", kernel_initializer=initializer)(convblock)
    convblock = layers.BatchNormalization()(convblock)
    convblock = layers.ReLU()(convblock)

    return convblock


def create_UNet(num_classes):    

    
#-----------------ENCODER PART--------------------------------
#-------------------------------------------------------------
    inputs = tf.keras.Input(shape=inp_shape)

    conv1 = conv2D_block(64, inputs)
    pool1 = layers.MaxPool2D(pool_size=(2,2), strides = 2)(conv1)
    
    conv2 = conv2D_block(128, pool1)
    pool2 = layers.MaxPool2D(pool_size=(2,2), strides = 2)(conv2)

    conv3 = conv2D_block(256 , pool2)
    pool3 = layers.MaxPool2D(pool_size=(2,2), strides = 2)(conv3)


#-----------------MIDDLE PART -- BOTTOM OF THE U-NET-------------------
#----------------------------------------------------------------------

    btn = conv2D_block(512 , pool3)
    
#----------------------DECODER PART----------------------------
#--------------------------------------------------------------
 
    up1 = layers.Conv2DTranspose(filters=256, kernel_size = (2,2), strides = 2)(btn)
    conv5 = layers.Add()([up1, conv3])
    conv5 = conv2D_block(256 ,conv5)

    up2 = layers.Conv2DTranspose(filters=128, kernel_size = (2,2), strides = 2)(conv5)
    conv6 = layers.Add()([up2, conv2])
    conv6 = conv2D_block(128 , conv6)

    up3 = layers.Conv2DTranspose(filters=64, kernel_size = (2,2), strides = 2)(conv6)
    conv7 = layers.Add()([up3, conv1])
    conv7 = conv2D_block(64 , conv7)

    outputs = layers.Conv2D(filters=num_classes,
                            kernel_size=(1, 1),
                            strides=(1, 1),
                            padding='same',
                            activation='softmax',
                            kernel_initializer = initializer)(conv7)

    # Define the model
    model = tf.keras.Model(inputs, outputs)
    return model 
  

In [None]:
model = create_UNet(num_classes)
model.summary()
gc.collect()

In [None]:
# Sparse Categorical Crossentropy to use integers (mask) instead of one-hot encoded labels
loss = tf.keras.losses.SparseCategoricalCrossentropy() 

lr = 1e-3   # learning rate
optimizer = tf.keras.optimizers.Adam(learning_rate=lr, amsgrad=True)

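# Note: `iou` is a single stateful metric object and, as written, nothing in this
# notebook resets it, so IoU() reports a running value over every batch seen so far.
iou = tf.keras.metrics.MeanIoU(num_classes=3)
def IoU(y_true, y_pred): #Weighted mean IoU built on Keras' MeanIoU
    y_pred = tf.expand_dims(tf.cast(tf.argmax(y_pred, -1), tf.float32), -1)
    y_true = tf.cast(y_true, tf.float32)
    #Crop (1) and weed (2) pixels weigh 1.0, background pixels weigh 0.3
    weights = tf.where(tf.math.logical_or(y_true == 1.0 , y_true == 2.0), 1.0, 0.3)
    iou.update_state(y_true, y_pred, sample_weight = weights)
    return iou.result()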

# Validation metrics
# ------------------
metrics = ['accuracy', IoU]


In [None]:
gc.collect()

In [None]:
num_ep = 75 #Training Epochs

# Compile Model
model.compile(optimizer=optimizer, loss=loss, metrics=metrics)

model.fit(x=train_dataset,
          epochs=num_ep,
          steps_per_epoch=(len(train_gen)),
          validation_data=valid_dataset,
          validation_steps=(len(valid_gen))
         )


In [None]:
#Model saving.
from datetime import datetime
md_dir = './'
if not os.path.exists(md_dir):
    os.makedirs(md_dir)
now = datetime.now().strftime('%b%d_%H-%M-%S')
model_path = os.path.join(md_dir,'model_' + str(now) + '.h5')
model.save(model_path)


In [None]:
#Output prediction of the model.
import matplotlib.pyplot as plt
import PIL.Image

def reconstruct(patches, isMask = False): #Function used to reconstruct the predicted mask from the 48 predictions given.
  rows = tf.split(patches,1536//256,axis=0)
  rows = [tf.concat(tf.unstack(x),axis=1) for x in rows] 
  reconstructed = tf.concat(rows,axis=0)
  if not isMask:
    return reconstructed
  else:
    return np.squeeze(reconstructed)

# cropImg(img), cropMask(img) and maskCleaner(img) are defined above.


image = PIL.Image.open('../input/development-datasetzip/Development_Dataset/Training/Bipbip/Haricot/Images/Bipbip_haricot_im_05231.jpg')
mask = PIL.Image.open('../input/development-datasetzip/Development_Dataset/Training/Bipbip/Haricot/Masks/Bipbip_haricot_im_05231.png').convert('I')
image = np.array(image)
mask = np.array(mask)

images = cropImg(image)
predict = model.predict(x= images)
predict = np.array(tf.argmax(predict, axis = -1))
predict = reconstruct(predict, True)

prediction_img = np.zeros_like([predict])
prediction_img = prediction_img[0]

for i in range(1,3):
  prediction_img[np.where(predict == i)] = [i]

fig, ax = plt.subplots(1,3, figsize=(20, 20))
print(np.unique(predict))
ax[0].imshow(image)
ax[1].imshow(predict)
ax[2].imshow(maskCleaner(mask))