# UNet Model Encoder Decoder 

**Goal:** this notebook explains the Encoder-Decoder part of the UNet model.

We make the UNet model more generic in three steps:

- 1) ResUNet
- 2) Generic UNet
- 3) Generic UNet with separate Encoder and Decoder

### Root Variables 

In [1]:
import os 

In [2]:
root = '/home/ign.fr/ttea/Code_IGN/AerialImageDataset'
train_dir = os.path.join(root,'train/images')
gt_dir = os.path.join(root,'train/gt')
test_dir = os.path.join(root,'test/images')

In [3]:
import sys 

In [4]:
sys.path.insert(0, '/home/ign.fr/ttea/stage_segmentation_2021/Code')

In [5]:
from dataloader.dataloader import InriaDataset
from model.model import UNet

### Import Libraries 

In [6]:
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd 

import torch
import torch.nn.functional as F
import torch.nn as nn

In [7]:
var= pd.read_json('variables.json')

## Dataset 

In [8]:
tile_size = (512,512)
train_dataset = InriaDataset(var['variables']['root'],tile_size,'train',None,False,1)
val_dataset = InriaDataset(var['variables']['root'],tile_size,'validation',None,False,1)

## U-Net Model

![title](../img/Unet.png)

![title](../img/archi_unet.jpeg)

Source of the UNet architecture figure:
https://towardsdatascience.com/understanding-semantic-segmentation-with-unet-6be4f42d4b47

## UNet - Functions

In [26]:
def conv_block(in_channel, out_channel):
    """
    in_channel : number of input channels, int
    out_channel : number of output channels, int

    Returns : block of two 3x3 Conv2d layers, each followed by BatchNorm2d and ReLU
    """
    
    conv = nn.Sequential(
        nn.Conv2d(in_channel, out_channel, kernel_size=3,padding=1),
        nn.BatchNorm2d(out_channel),
        nn.ReLU(inplace= True),
        nn.Conv2d(out_channel, out_channel, kernel_size=3,padding=1),
        nn.BatchNorm2d(out_channel),
        nn.ReLU(inplace= True),
    )
    return conv

### Encoder

The encoder block follows the typical architecture of a convolutional neural network.

It applies two 3x3 convolutions, each followed by a ReLU (Rectified Linear Unit), then a 2x2 max-pooling operation with a stride of 2 for downsampling. At each downsampling step, the number of feature channels is doubled. In this notebook, each convolution also uses `padding=1` and is followed by batch normalization (see `conv_block`).
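To make the doubling and halving concrete, here is a minimal sketch in plain Python (no PyTorch needed) that traces the shape of each skip output through the encoder. The helper name `encoder_shapes` is hypothetical. It assumes padded convolutions (spatial size preserved) and no pooling after the last block, as in the `EncoderBlock` defined below.

```python
def encoder_shapes(tile_size, conv_width, n_block):
    """Return (channels, height, width) of each block's skip output.

    Assumes padding=1 convolutions (spatial size preserved) and a 2x2
    max-pool with stride 2 after every block except the last one.
    """
    h, w = tile_size
    shapes = []
    for i in range(n_block):
        out_ch = conv_width[i + 1]     # channels after the two 3x3 convs
        shapes.append((out_ch, h, w))  # shape of the skip connection
        if i + 1 != n_block:           # no pooling on the last block
            h, w = h // 2, w // 2
    return shapes


shapes = encoder_shapes((512, 512), [3, 16, 32, 64, 128, 256], n_block=5)
for depth, shape in enumerate(shapes, start=1):
    print(f"block {depth}: {shape}")
```

With a 512x512 tile and five blocks, the channels go from 16 to 256 while the spatial size goes from 512 down to 32.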

In [15]:
class EncoderBlock(nn.Module):
    def __init__(self,input_channel, output_channel,depth,n_block):
        super(EncoderBlock,self).__init__()
        self.input_channel = input_channel
        self.output_channel = output_channel 
        self.depth = depth 
        self.n_block = n_block 
        
        self.conv  = conv_block(self.input_channel, self.output_channel)
        self.pool = nn.MaxPool2d(kernel_size = 2, stride = 2)

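        # weight initialization (applied to the first convolution only)
        self.conv[0].apply(self.init_weights)
            
    def init_weights(self,layer): # Kaiming (He) normal init for the conv layers
        nn.init.kaiming_normal_(layer.weight, mode='fan_out', nonlinearity='relu')
    
    def forward(self,x):
        
        c = self.conv(x)  # features kept for the skip connection
        if self.depth != self.n_block : 
            y = self.pool(c)  # downsample for the next encoder block
        else : 
            y = c  # last block (bottleneck): no pooling, reuse c instead of recomputing it
        
        return y,c 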

### Decoder

In the decoder block, each step upsamples the feature map with a 2x2 transposed convolution (stride 2).

This halves the number of channels. The result is then concatenated with the corresponding feature map from the encoder (cropped, in the original architecture), and passed through two 3x3 convolutions, each followed by a ReLU.
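The channel bookkeeping of one decoder step can be checked by hand. The sketch below is plain Python with a hypothetical helper name, `decoder_channels`. It mirrors the `DecoderBlock` defined further down, assuming the skip connection carries as many channels as the upsampled map.

```python
def decoder_channels(in_ch, out_ch, skip_ch):
    """Channel counts at each step of one decoder block.

    Mirrors DecoderBlock: a transposed conv (in_ch -> out_ch), a
    concatenation with the skip, then a conv block (in_ch -> out_ch).
    """
    up_ch = out_ch            # the transposed conv reduces the channels
    cat_ch = up_ch + skip_ch  # concatenation along the channel axis
    if cat_ch != in_ch:
        raise ValueError(
            f"concatenation gives {cat_ch} channels, conv block expects {in_ch}"
        )
    return up_ch, cat_ch, out_ch


# First decoder step of the 5-block model: 256 -> 128 channels.
print(decoder_channels(in_ch=256, out_ch=128, skip_ch=128))
```

The concatenation restores the channel count to `in_ch`, which is why the conv block of the decoder takes `input_channel` as its input width.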

In the original architecture, cropping is necessary because every unpadded convolution loses border pixels. Here, the convolutions use `padding=1` and we assume that the tile size is a power of two ($2^N$), so no cropping is needed.
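The tile-size assumption can be checked before building the model. This is a minimal sketch with a hypothetical helper, `fits_without_cropping`. It assumes padded convolutions and one 2x2 pooling step after every block except the last, so each side of the tile must be divisible by $2^{n\_block - 1}$.

```python
def fits_without_cropping(tile_size, n_block):
    """True if every pooling step halves the tile exactly.

    With n_block encoder blocks there are n_block - 1 pooling steps, so
    each side must be divisible by 2 ** (n_block - 1). Otherwise the
    upsampled maps and the skip connections differ in size.
    """
    factor = 2 ** (n_block - 1)
    return all(side % factor == 0 for side in tile_size)


print(fits_without_cropping((512, 512), n_block=5))  # True
print(fits_without_cropping((500, 500), n_block=5))  # False
```

A 500x500 tile fails because 500 → 250 → 125 → 62 rounds down at the third pooling step, so doubling 62 back gives 124 instead of 125.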

In the final layer, a 1x1 convolution maps each feature vector (64 components in the original architecture, 16 in the models of this notebook) to the desired number of classes.
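A 1x1 convolution is a linear map applied independently to the feature vector of every pixel. The sketch below shows this for a single pixel in plain Python. The helper `conv1x1_pixel` and the toy weights are hypothetical and only illustrate the arithmetic.

```python
def conv1x1_pixel(features, weights, biases):
    """Apply a 1x1 convolution to the feature vector of a single pixel.

    features: list of C_in floats
    weights:  n_class rows of C_in floats (one row per output class)
    biases:   n_class floats
    """
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


# Toy values (hypothetical): 3 input features, 2 classes.
features = [1.0, 2.0, 3.0]
weights = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 1.0]]
biases = [0.5, -1.0]
scores = conv1x1_pixel(features, weights, biases)
print(scores)  # [1.5, 4.0]
```

In the models below, the same operation is performed by `nn.Conv2d(16, n_class, kernel_size=1)` on all pixels at once.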

In [16]:
class DecoderBlock(nn.Module):
    def __init__(self,input_channel, output_channel):
        super(DecoderBlock,self).__init__()
        self.input_channel = input_channel
        self.output_channel = output_channel 
        
        self.conv_t  = nn.ConvTranspose2d(self.input_channel,self.output_channel, kernel_size= 2, stride=2)
        self.conv = conv_block(self.input_channel,self.output_channel)
        
        self.conv[0].apply(self.init_weights)
            
    def init_weights(self,layer): #gaussian init for the conv layers
        nn.init.kaiming_normal_(layer.weight, mode='fan_out', nonlinearity='relu')
    
    def forward(self,x,skip):
        u = self.conv_t(x)
        concat =torch.cat([u,skip],1)
        x = self.conv(concat)
        
        return x

## Unet Base 

First, we make the UNet model more modular by using blocks for the encoder and the decoder.

### 1) ResUNet

![title](../img/resunet_archi.png)

![title](../img/resunet_archi1.png)

In [35]:
# Generic with encoder decoder block 
class ResUNet(nn.Module):
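  """
  UNet network for semantic segmentation, built from EncoderBlock and DecoderBlock
  (despite its name, this class contains no residual connections)
  """
  
  def __init__(self, n_channels, conv_width,  n_class, n_block, cuda = 1):
    """
    initialization function
    n_channels, int, number of input channels (unused here: the first block is hard-coded to 3)
    conv_width, int list, channel widths of the convs (unused here: the widths are hard-coded below)
    n_class, int, the number of classes
    n_block, int, the number of encoder blocks (must be 5 to match the hard-coded blocks)
    """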
    super(ResUNet, self).__init__() #necessary for all classes extending the module class
    self.is_cuda = cuda
    
    self.n_class = n_class
    self.n_block = n_block 
    
    #-------------------------------------------------------------
    
    ## Encoder 
    
    # Conv2D (input channel, outputchannel, kernel size)
    
    self.enc_1 = EncoderBlock(3,16,1,self.n_block)
    self.enc_2 = EncoderBlock(16,32,2,self.n_block)
    self.enc_3 = EncoderBlock(32,64,3,self.n_block)
    self.enc_4 = EncoderBlock(64,128,4,self.n_block)
    self.enc_5 = EncoderBlock(128,256,5,self.n_block)
    
    #--------------------------------------------------------------

    ## Decoder     
    
    # Transpose & UpSampling Convblock   
    
    self.dec_6 = DecoderBlock(256,128)
    self.dec_7 = DecoderBlock(128,64)
    self.dec_8 = DecoderBlock(64,32)
    self.dec_9 = DecoderBlock(32,16)
    
    # Final classifier layer 
    self.outputs = nn.Conv2d(16, self.n_class, kernel_size= 1)
    
    
    if cuda: #put the model on the GPU memory
      self.cuda()
    
  def init_weights(self,layer): #gaussian init for the conv layers
    nn.init.kaiming_normal_(layer.weight, mode='fan_out', nonlinearity='relu')
    
  def forward(self, input):
    """
    the function called to run inference
    """  
    if self.is_cuda: #put data on GPU
        input = input.cuda()

    # Encoder (Left Side)
    enc_1, c1 = self.enc_1(input)
    enc_2, c2 = self.enc_2(enc_1)
    enc_3, c3 = self.enc_3(enc_2)
    enc_4, c4 = self.enc_4(enc_3)
    enc_5, c5 = self.enc_5(enc_4) 

    # Decoder (Right Side)
    dec_6 = self.dec_6(c5,c4)
    dec_7 = self.dec_7(dec_6,c3)
    dec_8 = self.dec_8(dec_7,c2)
    dec_9 = self.dec_9(dec_8,c1)
    
    # Final Output Layer 
    out = self.outputs(dec_9)
    
    return out

### Test ResUNet

We test a prediction of our UNet model on one sample of the dataset.

In [36]:
#==================TEST ORIGINAL UNET===============================
img, mask = train_dataset[42]
resunet = ResUNet(4,[3,16,32,64,128,256,128,64,32,16],2,5)
pred = resunet(img[None,:,:,:]) # None adds a batch dimension: the input becomes a 4D tensor (N, C, H, W)
print('pred', pred)
print('output:',pred.shape)

pred tensor([[[[-0.2899, -2.2368, -3.0741,  ..., -2.5874, -1.0563, -1.0805],
          [ 0.0183, -0.9891, -2.5617,  ..., -0.9402, -0.5651, -0.2431],
          [ 0.0907, -2.2460, -2.7572,  ..., -1.0026, -1.1300, -0.3771],
          ...,
          [-1.1415, -1.5136, -1.8253,  ..., -1.1908, -0.8910, -0.2571],
          [-0.4873, -1.1744, -1.5253,  ..., -1.5095, -1.5102, -0.2014],
          [-1.1275,  0.2517, -0.2992,  ..., -1.2013, -1.2916,  0.4038]],

         [[ 0.3558,  0.7494, -0.6236,  ..., -0.4915,  0.2225,  0.1731],
          [ 1.7017,  1.7456,  0.3000,  ...,  1.3829,  0.1211,  0.3110],
          [ 1.0822,  1.3640, -0.8058,  ...,  0.9789,  1.0568,  0.4268],
          ...,
          [-0.0248,  0.7234, -0.0905,  ...,  1.5565,  0.0343,  1.3305],
          [ 0.3774,  0.6847,  0.1276,  ...,  1.1108,  0.1945,  1.2375],
          [ 0.4131,  0.7804,  0.2715,  ...,  0.6444, -0.0973,  0.7394]]]],
       device='cuda:0', grad_fn=<AddBackward0>)
output: torch.Size([1, 2, 256, 256])


### 2) Generic UNet

In this part, we make the UNet model more generic: the number of blocks becomes a parameter.
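The generic model reads its channel widths from a `conv_width` list, which has to stay consistent with `n_block`. As a minimal sketch, the hypothetical helper `make_conv_width` below builds such a list, assuming the widths double at each encoder block and are mirrored in the decoder. It reproduces the lists used in the tests of this notebook.

```python
def make_conv_width(in_channels, base_width, n_block):
    """Build the conv_width list expected by the generic UNet.

    Layout: [input channels] + encoder widths + decoder widths,
    for a total of 2 * n_block entries.
    """
    if n_block < 2:
        raise ValueError("n_block must be at least 2")
    enc = [base_width * 2 ** i for i in range(n_block)]  # 16, 32, ..., peak
    dec = enc[-2::-1]                                    # mirror, without the peak
    return [in_channels] + enc + dec


print(make_conv_width(3, 16, n_block=5))
print(make_conv_width(3, 16, n_block=6))
```

For `n_block=5` this gives `[3, 16, 32, 64, 128, 256, 128, 64, 32, 16]`, and for `n_block=6` the peak becomes 512.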

In [27]:
class GenericUNet(nn.Module):
  """
  UNet network for semantic segmentation
  """
  
  def __init__(self, n_channels, conv_width,  n_class, n_block, cuda = 1):
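    """
    initialization function
    n_channels, int, number of input channels (unused: conv_width[0] sets the input channels)
    conv_width, int list of length 2*n_block: input channels, encoder widths, then decoder widths
    n_class, int, the number of classes
    n_block, int, the number of encoder blocks
    """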
    super(GenericUNet, self).__init__() #necessary for all classes extending the module class
    self.is_cuda = cuda
    
    self.n_class = n_class
    self.n_block = n_block 
    self.conv_width = conv_width 
    
    self.enc = []
    self.dec = []
    
    #-------------------------------------------------------------
    
    ## Encoder 
    
    # Conv2D (input channel, outputchannel, kernel size)
    
    for i in range(self.n_block):
        self.enc.append(EncoderBlock(self.conv_width[i],self.conv_width[i+1],i+1,self.n_block))
    
    #--------------------------------------------------------------

    self.enc = nn.ModuleList(self.enc)
    
    ## Decoder     
    
    # Transpose & UpSampling Convblock   
    
    for i in range(self.n_block-1):
        self.dec.append(DecoderBlock(self.conv_width[self.n_block+i],self.conv_width[self.n_block+i+1]))
    
    self.dec = nn.ModuleList(self.dec)
    
    # Final classifier layer 
    
    self.outputs = nn.Conv2d(self.conv_width[-1], self.n_class, kernel_size= 1)
    
    if cuda: #put the model on the GPU memory
      self.cuda()
    
  def init_weights(self,layer): #gaussian init for the conv layers
    nn.init.kaiming_normal_(layer.weight, mode='fan_out', nonlinearity='relu')
    
  def forward(self, input):
    """
    the function called to run inference
    """  
    if self.is_cuda: #put data on GPU
        input = input.cuda()

    #-------------------------------------------------
    # Encoder (Left Side)
    
    enc = []
    skip = [] 
    
    x = input
    for i in range(self.n_block):
        # run each encoder block only once and keep both of its outputs
        x, c = self.enc[i](x)
        enc.append(x)
        skip.append(c)
        
    #--------------------------------------------------
    # Decoder (Right Side)
    
    dec = []
    
    for i in range(self.n_block-1):
        if i==0:
            dec.append(self.dec[i](skip[self.n_block -1 -i],skip[self.n_block -2 -i]))
            
        else :
            dec.append(self.dec[i](dec[i-1],skip[self.n_block -2 -i]))
            
    # Final Output Layer 
    out = self.outputs(dec[-1])
    
    return out

### Test Generic UNet

We test the more generic UNet model with 6 blocks.

In [32]:
img, mask = train_dataset[42]
unet = GenericUNet(4,[3,16,32,64,128,256,512,256,128,64,32,16],2,6)
pred = unet(img[None,:,:,:]) # None adds a batch dimension: the input becomes a 4D tensor (N, C, H, W)
print('pred', pred)
print('output:',pred.shape)

pred tensor([[[[-0.2131,  0.5633,  0.2117,  ..., -1.6208, -0.9975, -1.4223],
          [-1.3358, -2.3274, -3.7664,  ..., -3.3160, -2.6979, -1.4786],
          [-0.2512, -0.5876, -0.8398,  ..., -0.3336, -0.8518, -0.9039],
          ...,
          [-0.7367, -0.3769,  0.1643,  ..., -0.3110, -0.5660, -1.0184],
          [-0.7237, -0.2038, -0.7381,  ..., -0.5771, -0.8188, -0.5405],
          [-0.1703,  0.3448,  0.1999,  ..., -0.0998, -0.6215, -0.2605]],

         [[-0.3502, -1.1615, -2.4874,  ..., -2.0258,  0.0356, -0.5310],
          [-1.3203, -2.0100, -3.2108,  ..., -2.7610, -2.0505, -0.9934],
          [ 0.3092, -0.4771,  0.8538,  ...,  0.2275,  0.4106, -0.4113],
          ...,
          [ 0.0856,  0.6214, -0.6505,  ..., -1.2896,  1.2099, -0.3725],
          [-0.1630,  0.3134, -0.3557,  ..., -0.3791,  0.3965, -0.1680],
          [ 1.4410,  0.9095,  0.5545,  ...,  0.2636,  0.4518, -0.0874]]]],
       device='cuda:0', grad_fn=<AddBackward0>)
output: torch.Size([1, 2, 256, 256])


### 3) Generic UNet with separate Encoder and Decoder

In this last part, the encoder and the decoder of the generic UNet model are split into two classes.
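Once the two halves are separate classes, the list of skip tensors is what links them. The sketch below is plain Python with a hypothetical helper, `decoder_schedule`. It lists which tensor each decoder block upsamples and which skip it concatenates, using the same indexing as the decoder loop in this notebook.

```python
def decoder_schedule(n_block):
    """List (decoder index, upsampled input, skip index) for each decoder block.

    The first decoder block upsamples the last skip (the bottleneck);
    every later block upsamples the output of the previous decoder block.
    """
    steps = []
    for i in range(n_block - 1):
        source = f"skip[{n_block - 1}]" if i == 0 else f"dec[{i - 1}]"
        steps.append((i, source, n_block - 2 - i))
    return steps


for i, source, skip_index in decoder_schedule(5):
    print(f"dec[{i}] = block(upsample {source}, concat skip[{skip_index}])")
```

With five blocks, the skips are consumed from the deepest (`skip[3]`) to the shallowest (`skip[0]`).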

### Encoder

In [21]:
class GenericUNetEncoder(nn.Module):
  """
  Encoder part of the generic UNet network for semantic segmentation
  """
  
  def __init__(self, n_channels, conv_width,  n_class, n_block, cuda = 1):
    """
    initialization function
    n_channels, int, number of input channel
    conv_width, int list, depth of the convs
    n_class = int,  the number of classes
    n_block = int, the number of blocks 
    """
    super(GenericUNetEncoder, self).__init__() #necessary for all classes extending the module class
    self.is_cuda = cuda
    
    self.n_class = n_class
    self.n_block = n_block 
    self.conv_width = conv_width 
    
    self.enc = []
    
    # Conv2D (input channel, outputchannel, kernel size)
    
    for i in range(self.n_block):
        self.enc.append(EncoderBlock(self.conv_width[i],self.conv_width[i+1],i+1,self.n_block))
    
    self.enc = nn.ModuleList(self.enc)
    
    if cuda: #put the model on the GPU memory
      self.cuda()
    
  def init_weights(self,layer): #gaussian init for the conv layers
    nn.init.kaiming_normal_(layer.weight, mode='fan_out', nonlinearity='relu')
    
  def forward(self, input):
    """
    the function called to run inference
    """  
    if self.is_cuda: #put data on GPU
        input = input.cuda()
    
    enc = []
    skip = [] 
    
    x = input
    for i in range(self.n_block):
        # run each encoder block only once and keep both of its outputs
        x, c = self.enc[i](x)
        enc.append(x)
        skip.append(c)
    
    return enc, skip 

### Decoder

In [22]:
class GenericUNetDecoder(nn.Module):
  """
  Decoder part of the generic UNet network for semantic segmentation
  """
  
  def __init__(self, n_channels, conv_width,  n_class, n_block,encoder, cuda = 1):
    """
    initialization function
    n_channels, int, number of input channel
    conv_width, int list, depth of the convs
    n_class = int,  the number of classes
    n_block = int, the number of blocks 
    encoder, unused: kept so that existing calls still work; the skip
             connections are read from the forward input instead
    """
    super(GenericUNetDecoder, self).__init__() #necessary for all classes extending the module class
    self.is_cuda = cuda
    
    self.n_class = n_class
    self.n_block = n_block 
    self.conv_width = conv_width 
    
    self.dec= []
    
    ## Decoder     
    
    # Transpose & UpSampling Convblock   
    for i in range(self.n_block-1):
        self.dec.append(DecoderBlock(self.conv_width[self.n_block+i],self.conv_width[self.n_block+i+1]))
    
    self.dec = nn.ModuleList(self.dec)
    
    # Final classifier layer 
    
    self.outputs = nn.Conv2d(self.conv_width[-1], self.n_class, kernel_size= 1)
    
    if cuda: #put the model on the GPU memory
      self.cuda()
    
  def init_weights(self,layer): #gaussian init for the conv layers
    nn.init.kaiming_normal_(layer.weight, mode='fan_out', nonlinearity='relu')
    
  def forward(self, input):
    """
    the function called to run inference
    input : (enc, skip) tuple returned by GenericUNetEncoder.forward
    """  
    
    # use the skip connections of the current input, not tensors cached at construction
    _, skip = input
    dec = []
    
    for i in range(self.n_block-1):
        if i==0:
            dec.append(self.dec[i](skip[self.n_block -1 -i],skip[self.n_block -2 -i]))
        else :
            dec.append(self.dec[i](dec[i-1],skip[self.n_block -2 -i]))
            
    # Final Output Layer 
    out = self.outputs(dec[-1])
    
    return out

### Generic UNet Class

In [24]:
class GenericUNetClass(nn.Module):
    """
    Generic UNet network for semantic segmentation, assembled from a separate encoder and decoder
    """

    def __init__(self, n_channels, conv_width,  n_class, n_block,encoder, decoder,cuda = 1):
        """
        initialization function
        n_channels, int, number of input channel
        conv_width, int list, depth of the convs
        n_class = int,  the number of classes
        n_block = int, the number of blocks 
        """
        super(GenericUNetClass, self).__init__() #necessary for all classes extending the module class
        self.is_cuda = cuda

        self.n_class = n_class
        self.n_block = n_block 
        self.conv_width = conv_width 

        self.encoder = encoder
        self.decoder = decoder

    def init_weights(self,layer): #gaussian init for the conv layers
        nn.init.kaiming_normal_(layer.weight, mode='fan_out', nonlinearity='relu')

    def forward(self, input):
        """
        the function called to run inference
        """  

        pred_encoder = self.encoder(input)
        decoder = self.decoder 
        pred = decoder(pred_encoder)

        return pred

### Test Generic UNet with separate Encoder and Decoder

We test the generic UNet model with the encoder and the decoder split into two classes.

In [25]:
img, mask = train_dataset[42]
encoder =  GenericUNetEncoder(4,[3,16,32,64,128,256,128,64,32,16],2,5)
pred_encoder = encoder(img[None,:,:,:]) 

decoder =  GenericUNetDecoder(4,[3,16,32,64,128,256,128,64,32,16],2,5,pred_encoder)

generic_unet = GenericUNetClass(4,[3,16,32,64,128,256,128,64,32,16],2,5,encoder,decoder)
pred = generic_unet(img[None,:,:,:])

print('pred', pred)
print('output:',pred.shape)

pred tensor([[[[-0.0860,  0.1551, -1.2594,  ..., -0.1608, -0.0376,  0.0151],
          [ 0.8313,  0.9004, -1.3940,  ..., -1.3096, -0.5543, -0.8411],
          [-0.7892, -1.1880, -2.2894,  ..., -0.0720, -1.0506, -0.1327],
          ...,
          [-0.2811, -0.1471,  0.4162,  ..., -0.3011, -0.1364, -0.5183],
          [-1.1043, -1.5766, -1.1670,  ..., -2.0284, -2.0353, -0.4065],
          [ 0.3011, -0.5562,  0.2887,  ...,  0.1611,  0.8592, -0.1770]],

         [[-0.2717,  0.5485, -0.0673,  ...,  0.6280,  0.5707,  0.3428],
          [ 1.2669,  1.0678,  1.3669,  ...,  1.8978, -0.9913, -0.1743],
          [ 0.7372, -0.0643, -0.4601,  ...,  0.7972, -0.1540, -0.6832],
          ...,
          [ 0.0423,  1.5815, -0.4739,  ...,  2.5910,  1.3197, -0.0674],
          [ 0.5232,  0.8467, -0.4205,  ...,  0.0447,  0.9199,  0.0938],
          [ 1.5002, -0.0296,  0.6641,  ...,  1.2951,  0.9067,  1.1532]]]],
       device='cuda:0', grad_fn=<AddBackward0>)
output: torch.Size([1, 2, 256, 256])
