# Welcome to the HydraNet Workshop 🐸🐸🐸
In this workshop, you're going to learn how to build a Neural Network that has:
* Input: **a monocular RGB Image**
* Output: **a Depth Map**, and **a Segmentation Map**

A single model, two different outputs. For that, our model will need to use a principle called Multi-Task Learning.<p>

# 1 - Imports

In [137]:
!pip install -U tensorflow


In [140]:
!wget https://hydranets-data.s3.eu-west-3.amazonaws.com/hydranets-data.zip && unzip -q hydranets-data.zip && mv hydranets-data/* . && rm hydranets-data.zip && rm -rf hydranets-data

--2022-12-19 20:36:02--  https://hydranets-data.s3.eu-west-3.amazonaws.com/hydranets-data.zip
Resolving hydranets-data.s3.eu-west-3.amazonaws.com (hydranets-data.s3.eu-west-3.amazonaws.com)... 3.5.226.127
Connecting to hydranets-data.s3.eu-west-3.amazonaws.com (hydranets-data.s3.eu-west-3.amazonaws.com)|3.5.226.127|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 110752264 (106M) [application/zip]
Saving to: ‘hydranets-data.zip’


2022-12-19 20:36:09 (15.0 MB/s) - ‘hydranets-data.zip’ saved [110752264/110752264]



In [141]:
# import os
# import shutil
# shutil.rmtree('/kaggle/working/')

In [143]:
%matplotlib inline
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
import cv2
import tensorflow as tf
import math

# 2 — Creating the HydraNet
We now have 2 DataLoaders: one for training, and one for validation/test. <p>

In the next step, we're going to define our model, following the paper [Real-Time Joint Semantic Segmentation and Depth Estimation Using Asymmetric Annotations](https://arxiv.org/pdf/1809.04766.pdf). If you haven't read it yet, now is the time.<p>

A note: this notebook has been adapted from the work of DrSleep, a researcher named Vladimir, who authorized me to adapt it for educational purposes. [Here's the notebook I'm referring to](https://github.com/DrSleep/multi-task-refinenet/blob/master/src/notebooks/ExpNYUDKITTI_joint.ipynb/).

<p>

> ![](https://d3i71xaburhd42.cloudfront.net/435d4b5c30f10753d277848a17baddebd98d3c31/2-Figure1-1.png)

Our model takes an input RGB image, passes it through an encoder and a Light-Weight RefineNet decoder, and then has 2 heads, one for each task.<p>
Things to note:
* The only **convolutions** we'll need will be 3x3 and 1x1
* We also need a **MaxPooling 5x5**
* **CRP-Blocks** are implemented as Skip-Connection Operations
* **Each Head is made of a 1x1 convolution followed by a 3x3 convolution**, only the data and the loss change there


## 2.1 — Create a HydraNet class

In [144]:
class HydraNet(tf.keras.Model):
    def __init__(self):        
        super(HydraNet, self).__init__() # Python 3
        self.num_tasks = 2
        self.num_classes = 6
        
#     def Encoder(self):
#         mobilenet_config =[[1, 16, 1, 1],
#                     [6, 24, 2, 2],
#                     [6, 32, 3, 2],
#                     [6, 64, 4, 2],
#                     [6, 96, 3, 1],
#                     [6, 160, 3, 2],
#                     [6, 320, 1, 1],
#                     ]
#         c_layer = 2
#         in_channels = 32
#         layer1 = tf.keras.Sequential(convbnrelu(filters=32, kernel_size=3, stride=2))
#         layer1._name = 'layer1'
#         encoder = tf.keras.Sequential()
#         encoder.add(layer1)
#         for t,c,n,s in (mobilenet_config):
#             for i in range(n) :
#                 encoder.add(InvertedResidualBlock(in_channels, c, t, stride=s if i==0 else 1, name=f'layer{c_layer}'))
#                 in_channels = c
#                 c_layer += 1

#         return encoder


In [145]:
net = HydraNet()

```
Layer(1) S1
    conv2d(32, k=3, s=2, padding=1, bias=False)
    batchnorm(eps=1e-05, momentum=0.1)
    relu(6) 
    
Layer(2) IRB
    conv2d(32, k=1, s=1, bias=False)
    batchnorm(eps=1e-05, momentum=0.1)
    relu(6) 
    
    
```    

## 2.2 — Defining the Encoder: A MobileNetv2
![](https://iq.opengenus.org/content/images/2020/11/conv_mobilenet_v2.jpg)

In [146]:
def conv3x3(filters, stride=1, bias=False, dilation=1, groups=1):
    # 3x3 convolution
    return tf.keras.layers.Conv2D(filters, kernel_size=3, strides=stride,
                     padding='same', dilation_rate=dilation, use_bias=bias, groups=groups)

In [147]:
# Test conv3x3
conv3x3(filters=32)

<keras.layers.convolutional.conv2d.Conv2D at 0x7f8e9c74a510>

In [148]:
def conv1x1(filters, stride=1, bias=False, groups=1):
    # 1x1 convolution
    return tf.keras.layers.Conv2D(filters, kernel_size=1, strides=stride,
                     padding='valid', use_bias=bias, groups=groups)

In [149]:
# Test conv1x1
conv1x1(filters=32)

<keras.layers.convolutional.conv2d.Conv2D at 0x7f8e9c74fb50>

In [150]:
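def batchnorm():
    # batch norm 2d
    # Keras momentum is the weight on the *old* moving average, so PyTorch's
    # momentum=0.1 corresponds to momentum=0.9 here
    batch_norm = tf.keras.layers.BatchNormalization(epsilon=1e-5, momentum=0.9)
    batch_norm.trainable = True
    return batch_norm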

In [151]:
# Test batchnorm
batchnorm()

<keras.layers.normalization.batch_normalization.BatchNormalization at 0x7f8e9c753bd0>
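
The layer summary earlier lists `momentum=0.1`, which is the PyTorch convention. PyTorch and Keras define batch-norm momentum in opposite directions, so the same number does not mean the same thing in both. The sketch below uses two hypothetical helper functions (they are not part of either library) to show that a PyTorch momentum of 0.1 matches a Keras momentum of 0.9.

```python
def pytorch_style_update(running, batch_stat, momentum):
    # PyTorch convention: momentum weights the NEW batch statistic
    return (1 - momentum) * running + momentum * batch_stat


def keras_style_update(moving, batch_stat, momentum):
    # Keras convention: momentum weights the OLD moving average
    return momentum * moving + (1 - momentum) * batch_stat


running_mean, batch_mean = 0.0, 10.0
a = pytorch_style_update(running_mean, batch_mean, momentum=0.1)
b = keras_style_update(running_mean, batch_mean, momentum=0.9)
print(a, b)  # the two values agree up to floating-point rounding
```

In other words, when porting a PyTorch layer to Keras, use `1 - momentum`.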

In [152]:
def convbnrelu(filters, kernel_size, stride=1, groups=1, act=True):
    # conv-batchnorm-relu
    # 'same' padding for 3x3 kernels, 'valid' for 1x1 kernels (no padding needed)
    padding = 'same' if kernel_size > 1 else 'valid'
    layers = [tf.keras.layers.Conv2D(filters, kernel_size, strides=stride, padding=padding,
                                     groups=groups, use_bias=False),
              batchnorm()]
    if act:
        # ReLU6 activation
        layers.append(tf.keras.layers.ReLU(max_value=6))
    return tf.keras.Sequential(layers)

In [153]:
# Test convbnrelu
display(convbnrelu(32,3,1,1,True).layers)
print()
display(convbnrelu(32,3,1,1,False).layers)

[<keras.layers.convolutional.conv2d.Conv2D at 0x7f8e9c753950>,
 <keras.layers.normalization.batch_normalization.BatchNormalization at 0x7f8e9c74f890>,
 <keras.layers.activation.relu.ReLU at 0x7f8f769b0d50>]




[<keras.layers.convolutional.conv2d.Conv2D at 0x7f8f769b0d50>,
 <keras.layers.normalization.batch_normalization.BatchNormalization at 0x7f8e9c74a4d0>]

In [154]:
class InvertedResidualBlock(tf.keras.Model):
    def __init__(self, in_planes, filters, expansion_factor, stride):
        super(InvertedResidualBlock, self).__init__()
        intermed_planes = in_planes * expansion_factor
        # Skip connection only when input and output shapes match
        self.residual = (in_planes == filters) and (stride == 1)
        self.IBR = tf.keras.Sequential([
            # 1x1 expansion to intermed_planes channels
            convbnrelu(intermed_planes, kernel_size=1, act=True),
            # 3x3 depthwise convolution; the only layer that applies the stride
            convbnrelu(intermed_planes, kernel_size=3, stride=stride,
                       groups=intermed_planes, act=True),
            # 1x1 linear projection to the output channels (no activation)
            convbnrelu(filters, kernel_size=1, act=False)])

    def call(self, inputs):
        x = self.IBR(inputs)
        if self.residual:
            return x + inputs
        return x

In [155]:
def define_mobilenet(self):
    # First draft of the encoder builder
    layers = []
    mobilenet_config = [[1, 16, 1, 1],  # expansion rate, output channels, number of repeats, stride
                        [6, 24, 2, 2],
                        [6, 32, 3, 2],
                        [6, 64, 4, 2],
                        [6, 96, 3, 1],
                        [6, 160, 3, 2],
                        [6, 320, 1, 1],
                        ]
    in_channels = 32
    layer1 = tf.keras.Sequential([convbnrelu(filters=32, kernel_size=3, stride=2)])
    layer1._name = 'layer1'
    layers.append(layer1)
    encoder = tf.keras.Sequential()
    encoder.add(layer1)
    layer_num = 2
    for t, c, n, s in mobilenet_config:
        ibr = tf.keras.Sequential()
        for i in range(n):
            # Add every block of the stage; only the first one uses stride s
            ibr.add(InvertedResidualBlock(in_channels, c, t, stride=s if i == 0 else 1))
            in_channels = c
        ibr._name = f'layer{layer_num}'
        layer_num += 1
        encoder.add(ibr)
        layers.append(ibr)

    return encoder, layers

In [156]:
# def define_mobilenet(self):
#     mobilenet_config = [[1, 16, 1, 1], # expansion rate, output channels, number of repeats, stride
#                     [6, 24, 2, 2],
#                     [6, 32, 3, 2],
#                     [6, 64, 4, 2],
#                     [6, 96, 3, 1],
#                     [6, 160, 3, 2],
#                     [6, 320, 1, 1],
#                     ]
#     self.in_channels = 32 # number of input channels
#     self.num_layers = len(mobilenet_config)
#     self.layer1 = convbnrelu(3, self.in_channels, kernel_size=3, stride=2) # This is the first layer of the first 
#     c_layer = 2
#     for t,c,n,s in (mobilenet_config):
#         layers = []
#         for idx in range(n):
#             layers.append(InvertedResidualBlock(self.in_channels, c, expansion_factor=t, stride=s if idx == 0 else 1))
#             self.in_channels = c
#         setattr(self, 'layer{}'.format(c_layer), nn.Sequential(*layers)) # setattr(object, name, value)
#         c_layer += 1

# HydraNet.define_mobilenet = define_mobilenet

In [157]:
# net.define_mobilenet()
# for model in net.define_mobilenet()[1] :
#     print(model._name)   

In [205]:
def define_mobilenet(self):
    LAYERS = []
    mobilenet_config = [[1, 16, 1, 1], # expansion rate, output channels, number of repeats, stride
                    [6, 24, 2, 2],
                    [6, 32, 3, 2],
                    [6, 64, 4, 2],
                    [6, 96, 3, 1],
                    [6, 160, 3, 2],
                    [6, 320, 1, 1],
                    ]
    self.in_channels = 32 # number of input channels
    self.num_layers = len(mobilenet_config)
    # Stem: 3x3 conv-bn-relu with stride 2
    layer1_model = tf.keras.Sequential([convbnrelu(filters=32, kernel_size=3, stride=2)])
    layer1_model._name = 'layer1'
    self.layer1 = layer1_model
    LAYERS.append(layer1_model)
    encoder = tf.keras.Sequential()
    encoder.add(layer1_model)
    c_layer = 2
    for t, c, n, s in mobilenet_config:
        layers = []
        for idx in range(n):
            layers.append(InvertedResidualBlock(self.in_channels, c, expansion_factor=t, stride=s if idx == 0 else 1))
            self.in_channels = c
        model = tf.keras.Sequential(layers)
        model._name = f'layer{c_layer}'
        # Expose each stage as self.layer2 ... self.layer8 so the forward pass can use it
        setattr(self, f'layer{c_layer}', model)
        encoder.add(model)
        # Append the named stage itself rather than a second wrapper around the same blocks
        LAYERS.append(model)
        c_layer += 1

    return encoder, LAYERS

# HydraNet.define_mobilenet = define_mobilenet

In [206]:
HydraNet.define_mobilenet = define_mobilenet

```
layer1 = 1,1
layer2 = 1,1
layer3 = 2,2
layer4 = 3,3
layer5 = 4,4
layer6 = 3,3
layer7 = 3,3
layer8 = 1,1
```

In [215]:
for model in net.define_mobilenet()[1] :
    print(model._name)

layer1
layer2
layer3
layer4
layer5
layer6
layer7
layer8
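
To see what each of these eight layers produces, the configuration table can be unrolled by hand. The sketch below is plain Python and only mirrors the table from `define_mobilenet`; the helper name `describe_stages` is made up for this illustration. It assumes that only the first block of each stage applies the stride, as in the loop above.

```python
# Same table as in define_mobilenet:
# expansion rate, output channels, number of repeats, stride
mobilenet_config = [[1, 16, 1, 1],
                    [6, 24, 2, 2],
                    [6, 32, 3, 2],
                    [6, 64, 4, 2],
                    [6, 96, 3, 1],
                    [6, 160, 3, 2],
                    [6, 320, 1, 1]]


def describe_stages(config, first_stride=2):
    """Return (layer name, output channels, number of blocks, downsampling factor)."""
    downsample = first_stride  # layer1 is a stride-2 convolution with 32 filters
    stages = [("layer1", 32, 1, downsample)]
    for i, (t, c, n, s) in enumerate(config, start=2):
        downsample *= s  # only the first block of a stage uses stride s
        stages.append((f"layer{i}", c, n, downsample))
    return stages


stages = describe_stages(mobilenet_config)
for name, channels, blocks, factor in stages:
    print(f"{name}: {channels} channels, {blocks} block(s), input size / {factor}")
```

The stages after `layer1` add up to 17 inverted residual blocks, and the deepest feature map is 32 times smaller than the input image.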


In [27]:
# layers = [tf.keras.layers.Dense(10, input_shape=(5,)), tf.keras.layers.Dense(5), tf.keras.layers.Dense(1)]

# tf.keras.Sequential(layers).summary()

<br>idx : 0
t : 1 	c : 16 	n : 1 	s : 1 	in_channels : 16</br>

<br>idx : 0
t : 6 	c : 24 	n : 2 	s : 2 	in_channels : 24</br>
idx : 1
t : 6 	c : 24 	n : 2 	s : 2 	in_channels : 24</br>


<br>idx : 0
t : 6 	c : 32 	n : 3 	s : 2 	in_channels : 32</br>
idx : 1
t : 6 	c : 32 	n : 3 	s : 2 	in_channels : 32</br>
idx : 2
t : 6 	c : 32 	n : 3 	s : 2 	in_channels : 32</br>


<br>idx : 0
t : 6 	c : 64 	n : 4 	s : 2 	in_channels : 64</br>
idx : 1
t : 6 	c : 64 	n : 4 	s : 2 	in_channels : 64</br>
idx : 2
t : 6 	c : 64 	n : 4 	s : 2 	in_channels : 64</br>
idx : 3
t : 6 	c : 64 	n : 4 	s : 2 	in_channels : 64</br>


<br>idx : 0
t : 6 	c : 96 	n : 3 	s : 1 	in_channels : 96</br>
idx : 1
t : 6 	c : 96 	n : 3 	s : 1 	in_channels : 96</br>
idx : 2
t : 6 	c : 96 	n : 3 	s : 1 	in_channels : 96</br>


<br>idx : 0
t : 6 	c : 160 	n : 3 	s : 2 	in_channels : 160</br>
idx : 1
t : 6 	c : 160 	n : 3 	s : 2 	in_channels : 160</br>
idx : 2
t : 6 	c : 160 	n : 3 	s : 2 	in_channels : 160</br>


<br>idx : 0
t : 6 	c : 320 	n : 1 	s : 1 	in_channels : 320</br>

In [1]:
# mobilenet_config =[[1, 16, 1, 1],
#                 [6, 24, 2, 2],
#                 [6, 32, 3, 2],
#                 [6, 64, 4, 2],
#                 [6, 96, 3, 1],
#                 [6, 160, 3, 2],
#                 [6, 320, 1, 1],
#                 ]
# in_channels = 32 # number of input channels
# num_layers = len(mobilenet_config)
# layer1 = convbnrelu(filters=3, kernel_size=3, stride=2, groups=1, act=True)
# c_layer = 2
# for t,c,n,s in (mobilenet_config):
# #         layers = []
#     for idx in range(n):
#         print('idx :', idx)
#         #layers.append(InvertedResidualBlock(in_channels, c, expansion_factor=t, stride=s if idx == 0 else 1))
#         in_channels = c
#         print('t :', t, '\tc :', c, '\tn :', n, '\ts :', s, '\tin_channels :', in_channels)
#     print('************************************************')
#     c_layer += 1

In [104]:
# def MobileNetV2(input_image = (None,None,3), n_classes=6):
#     inputs = Input (input_shape)
#     x = Conv2D(32,3,strides=(2,2),padding='same', use_bias=False)(input)
#     x = BatchNormalization(name='conv1_bn')(x)
#     x = ReLU(6, name='conv1_relu')(x)
#     # 17 Bottlenecks
#     x = depthwise_block(x,stride=1,block_id=1)
#     x = projection_block(x, out_channels=16,block_id=1)
#     x = Bottleneck(x, t = 6, filters = x.shape[-1], out_channels = 24, stride = 2,block_id = 2)
#     x = Bottleneck(x, t = 6, filters = x.shape[-1], out_channels = 24, stride = 1,block_id = 3)
#     x = Bottleneck(x, t = 6, filters = x.shape[-1], out_channels = 32, stride = 2,block_id = 4)
#     x = Bottleneck(x, t = 6, filters = x.shape[-1], out_channels = 32, stride = 1,block_id = 5)
#     x = Bottleneck(x, t = 6, filters = x.shape[-1], out_channels = 32, stride = 1,block_id = 6)
#     x = Bottleneck(x, t = 6, filters = x.shape[-1], out_channels = 64, stride = 2,block_id = 7)
#     x = Bottleneck(x, t = 6, filters = x.shape[-1], out_channels = 64, stride = 1,block_id = 8)
#     x = Bottleneck(x, t = 6, filters = x.shape[-1], out_channels = 64, stride = 1,block_id = 9)
#     x = Bottleneck(x, t = 6, filters = x.shape[-1], out_channels = 64, stride = 1,block_id = 10)
#     x = Bottleneck(x, t = 6, filters = x.shape[-1], out_channels = 96, stride = 1,block_id = 11)
#     x = Bottleneck(x, t = 6, filters = x.shape[-1], out_channels = 96, stride = 1,block_id = 12)
#     x = Bottleneck(x, t = 6, filters = x.shape[-1], out_channels = 96, stride = 1,block_id = 13)
#     x = Bottleneck(x, t = 6, filters = x.shape[-1], out_channels = 160, stride = 2,block_id = 14)
#     x = Bottleneck(x, t = 6, filters = x.shape[-1], out_channels = 160, stride = 1,block_id = 15)
#     x = Bottleneck(x, t = 6, filters = x.shape[-1], out_channels = 160, stride = 1,block_id = 16)
#     x = Bottleneck(x, t = 6, filters = x.shape[-1], out_channels = 320, stride = 1,block_id = 17)
#     x = Conv2D(filters = 1280,kernel_size = 1,padding='same',use_bias=False, name = 'last_conv')(x)
#     x = BatchNormalization(name='last_bn')(x)
#     x = ReLU(6,name='last_relu')(x)
#     x = GlobalAveragePooling2D(name='global_average_pool')(x)
#     output = Dense(n_classes,activation='softmax')(x)
#     model = Model(inputs, output)
#     return model

In [None]:
MobileNetV2

## 2.3 — Defining the Decoder - A Multi-Task Light-Weight RefineNet
Paper: https://arxiv.org/pdf/1810.03272.pdf

![](https://d3i71xaburhd42.cloudfront.net/4d653b19ce1c7cba79fc2f11271fb90f7744c95c/4-Figure1-1.png)

In [None]:
class CRPBlock(tf.keras.layers.Layer):
    """CRP definition"""
    def __init__(self, in_planes, out_planes, n_stages, groups=False):
        super().__init__() #Python 3
        # One 1x1 convolution per stage. Keras infers the input channels,
        # so in_planes is only needed to set the number of groups.
        self.convs = [conv1x1(out_planes, stride=1, bias=False,
                              groups=in_planes if groups else 1)
                      for _ in range(n_stages)]

        self.stride = 1
        self.n_stages = n_stages
        # 5x5 max-pooling with stride 1; 'same' padding keeps the spatial size
        self.maxpool = tf.keras.layers.MaxPool2D(pool_size=5, strides=1, padding='same')

    def call(self, x):
        top = x
        for conv in self.convs:
            top = self.maxpool(top)
            top = conv(top)
            x = top + x  # skip connection
        return x

In [None]:
def _make_crp(self, in_planes, out_planes, stages, groups=False):
    # A single CRP block that holds all `stages` pooling stages
    layers = [CRPBlock(in_planes, out_planes, stages, groups=groups)]
    return tf.keras.Sequential(layers)

HydraNet._make_crp = _make_crp
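
The CRP block chains three operations per stage: a 5x5 max-pooling, a 1x1 convolution, and a sum with the running result. Here is a small NumPy sketch of that data flow, independent of any deep learning framework. The function names and the random weights are made up for the illustration, and the 1x1 convolution is written as a matrix product over the channel axis.

```python
import numpy as np


def maxpool5x5_same(x):
    """5x5 max-pooling with stride 1 on an (H, W, C) array; the output keeps H and W."""
    padded = np.pad(x, ((2, 2), (2, 2), (0, 0)), mode="constant", constant_values=-np.inf)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (5, 5), axis=(0, 1))
    return windows.max(axis=(-2, -1))


def crp(x, weights):
    """Chained residual pooling: pool -> 1x1 conv -> add, once per weight matrix."""
    top = x
    for w in weights:
        top = maxpool5x5_same(top)
        top = top @ w  # a 1x1 convolution is a matrix product over the channel axis
        x = top + x    # skip connection
    return x


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 10, 4))  # (H, W, C) feature map
weights = [rng.normal(size=(4, 4)) * 0.1 for _ in range(4)]  # 4 stages
out = crp(x, weights)
print(out.shape)
```

The output has the same shape as the input, which is why the block can be dropped anywhere in the decoder without changing the tensor sizes.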

In [None]:
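def define_lightweight_refinenet(self):
    ## Light-Weight RefineNet ##
    # 1x1 convolutions that map every encoder output to 256 channels.
    # Keras infers the input channels, so only the output size is passed.
    self.conv8 = conv1x1(256, bias=False)  # from layer8 (320 channels)
    self.conv7 = conv1x1(256, bias=False)  # from layer7 (160 channels)
    self.conv6 = conv1x1(256, bias=False)  # from layer6 (96 channels)
    self.conv5 = conv1x1(256, bias=False)  # from layer5 (64 channels)
    self.conv4 = conv1x1(256, bias=False)  # from layer4 (32 channels)
    self.conv3 = conv1x1(256, bias=False)  # from layer3 (24 channels)
    self.crp4 = self._make_crp(256, 256, 4, groups=False)
    self.crp3 = self._make_crp(256, 256, 4, groups=False)
    self.crp2 = self._make_crp(256, 256, 4, groups=False)
    self.crp1 = self._make_crp(256, 256, 4, groups=True)

    self.conv_adapt4 = conv1x1(256, bias=False)
    self.conv_adapt3 = conv1x1(256, bias=False)
    self.conv_adapt2 = conv1x1(256, bias=False)

    # Heads: a 1x1 convolution followed by a 3x3 convolution, one pair per task.
    # groups=256 (depthwise pre-heads) is an assumption.
    self.pre_depth = conv1x1(256, groups=256, bias=False)  # Purple Pre-Head for Depth
    self.depth = conv3x3(1, bias=True)  # Final layer of Depth: 1 channel
    self.pre_segm = conv1x1(256, groups=256, bias=False)  # Purple Pre-Head for Segm
    self.segm = conv3x3(self.num_classes, bias=True)  # Final layer of Segmentation: 1 channel per class
    self.relu = tf.keras.layers.ReLU(max_value=6)  # RELU 6 Operation

    if self.num_tasks == 3:
        # Normal Head (3 output channels); the attribute names are assumptions
        self.pre_normal = conv1x1(256, groups=256, bias=False)
        self.normal = conv3x3(3, bias=True)

HydraNet.define_lightweight_refinenet = define_lightweight_refinenet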

In [None]:
hydranet.define_lightweight_refinenet()

## 2.4 — Define the HydraNet Forward Function

> ![](https://d3i71xaburhd42.cloudfront.net/435d4b5c30f10753d277848a17baddebd98d3c31/2-Figure1-1.png)

In [None]:
def forward(self, x):
    # MOBILENET V2
    x = self.layer1(x)
    x = self.layer2(x) # x / 2
    l3 = self.layer3(x) # 24, x / 4
    l4 = self.layer4(l3) # 32, x / 8
    l5 = self.layer5(l4) # 64, x / 16
    l6 = self.layer6(l5) # 96, x / 16
    l7 = self.layer7(l6) # 160, x / 32
    l8 = self.layer8(l7) # 320, x / 32

    # LIGHT-WEIGHT REFINENET
    # Tensors are NHWC in TensorFlow, so the spatial size is shape[1:3]
    l8 = self.conv8(l8)
    l7 = self.conv7(l7)
    l7 = self.relu(l8 + l7)
    l7 = self.crp4(l7)
    l7 = self.conv_adapt4(l7)
    l7 = tf.image.resize(l7, size=tf.shape(l6)[1:3], method='bilinear')

    l6 = self.conv6(l6)
    l5 = self.conv5(l5)
    l5 = self.relu(l5 + l6 + l7)
    l5 = self.crp3(l5)
    l5 = self.conv_adapt3(l5)
    l5 = tf.image.resize(l5, size=tf.shape(l4)[1:3], method='bilinear')

    l4 = self.conv4(l4)
    l4 = self.relu(l5 + l4)
    l4 = self.crp2(l4)
    l4 = self.conv_adapt2(l4)
    l4 = tf.image.resize(l4, size=tf.shape(l3)[1:3], method='bilinear')

    l3 = self.conv3(l3)
    l3 = self.relu(l3 + l4)
    l3 = self.crp1(l3)

    # HEADS
    # Each head: 1x1 pre-head, ReLU6, then the final 3x3 convolution
    out_segm = self.pre_segm(l3)
    out_segm = self.relu(out_segm)
    out_segm = self.segm(out_segm)

    out_d = self.pre_depth(l3)
    out_d = self.relu(out_d)
    out_d = self.depth(out_d)

    if self.num_tasks == 3:
        out_n = self.pre_normal(l3)
        out_n = self.relu(out_n)
        out_n = self.normal(out_n)
        return out_segm, out_d, out_n
    else:
        return out_segm, out_d

# Keras runs call() when the model is applied to an input
HydraNet.call = forward

# 3 — Run the Model

## 3.1 — Load the Model Weights

In [217]:
# if torch.cuda.is_available():
#     _ = hydranet.cuda()
# _ = hydranet.eval()

In [218]:
ckpt = tf.train.load_checkpoint('ExpKITTI_joint.ckpt') #torch.load('ExpKITTI_joint.ckpt')
hydranet.load_state_dict(ckpt['state_dict'])

2022-12-19 22:20:47.516330: W tensorflow/core/util/tensor_slice_reader.cc:96] Could not open ExpKITTI_joint.ckpt: DATA_LOSS: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?


DataLossError: Unable to open table file ExpKITTI_joint.ckpt: DATA_LOSS: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
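
The error says the file is not a TensorFlow checkpoint, and the commented-out `torch.load` call suggests it was saved by PyTorch. If the weights are ported by hand, one thing to watch is the kernel layout: PyTorch stores convolution kernels as (out, in, kH, kW), while a Keras `Conv2D` kernel is (kH, kW, in, out). The sketch below shows only that reordering with NumPy. The helper name is hypothetical, and reading the checkpoint and matching layer names are not covered here.

```python
import numpy as np


def torch_conv_to_keras(weight):
    """Reorder a conv kernel from PyTorch (out, in, kH, kW) to Keras (kH, kW, in, out)."""
    return np.transpose(weight, (2, 3, 1, 0))


# Dummy kernel with the shape of a 1x1 convolution from 320 to 256 channels
torch_style = np.arange(256 * 320, dtype=np.float32).reshape(256, 320, 1, 1)
keras_style = torch_conv_to_keras(torch_style)
print(keras_style.shape)  # (1, 1, 320, 256)
```

For grouped convolutions the same permutation applies, with `in` replaced by the number of input channels per group.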

## 3.2 — Preprocess Images

In [None]:
IMG_SCALE  = 1./255
IMG_MEAN = np.array([0.485, 0.456, 0.406]).reshape((1, 1, 3))
IMG_STD = np.array([0.229, 0.224, 0.225]).reshape((1, 1, 3))

def prepare_img(img):
    return (img * IMG_SCALE - IMG_MEAN) / IMG_STD
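
A quick way to check the normalization is to feed it an image whose pixels all equal the mean color: the result should be zero everywhere. The mean and standard deviation are the usual ImageNet statistics. The constants and the function are repeated below so the snippet runs on its own; the test images are made up.

```python
import numpy as np

IMG_SCALE = 1. / 255
IMG_MEAN = np.array([0.485, 0.456, 0.406]).reshape((1, 1, 3))
IMG_STD = np.array([0.229, 0.224, 0.225]).reshape((1, 1, 3))


def prepare_img(img):
    return (img * IMG_SCALE - IMG_MEAN) / IMG_STD


# A 2x2 image whose every pixel equals the mean color, expressed in 0-255 units
mean_img = np.tile(IMG_MEAN * 255, (2, 2, 1))
normalized = prepare_img(mean_img)
print(normalized.shape)             # (2, 2, 3)
print(np.abs(normalized).max())     # close to 0

# A white image ends up at (1 - mean) / std in every channel
white = prepare_img(np.full((2, 2, 3), 255.0))
print(white[0, 0])
```

The output keeps the (H, W, 3) layout of the input, so a batch dimension still has to be added before calling the network.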

## 3.3 — Load and Run an Image

In [None]:
# Pre-processing and post-processing constants #
CMAP = np.load('cmap_kitti.npy')
NUM_CLASSES = 6

In [None]:
print(CMAP)

In [None]:
import glob
images_files = glob.glob('data/*.png')
idx = np.random.randint(0, len(images_files))

img_path = images_files[idx]
img = np.array(Image.open(img_path))
plt.imshow(img)
plt.show()

In [None]:
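# Pipeline: normalize the image, run the HydraNet, then post-process both outputs
def pipeline(img):
    # Put the Image in a float32 tensor with a batch dimension: (H, W, 3) -> (1, H, W, 3)
    img_var = tf.convert_to_tensor(prepare_img(img)[None], dtype=tf.float32)
    # Call the HydraNet (TensorFlow uses a GPU automatically if one is visible)
    segm, depth = hydranet(img_var)
    # PostProcess / Resize back to the input resolution (cv2 expects (width, height))
    segm = cv2.resize(segm[0, :, :, :NUM_CLASSES].numpy(), img.shape[:2][::-1], interpolation=cv2.INTER_CUBIC)
    depth = cv2.resize(depth[0, :, :, 0].numpy(), img.shape[:2][::-1], interpolation=cv2.INTER_CUBIC)
    # Use the CMAP: color each pixel by its most likely class
    segm = CMAP[segm.argmax(axis=2)].astype(np.uint8)
    # Take the Absolute Value
    depth = np.abs(depth)
    return depth, segm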
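
The last two steps of the pipeline ("Use the CMAP" and "Take the Absolute Value") are plain NumPy operations. Here is a toy version on random data: the arrays and the palette below are hypothetical stand-ins for the network outputs and for `cmap_kitti.npy`, not real values.

```python
import numpy as np

NUM_CLASSES = 6
rng = np.random.default_rng(0)

# Hypothetical stand-ins for the resized network outputs and the color map
segm_scores = rng.normal(size=(4, 5, NUM_CLASSES))      # (H, W, classes)
raw_depth = rng.normal(size=(4, 5))                     # (H, W)
palette = rng.integers(0, 256, size=(NUM_CLASSES, 3))   # one RGB color per class

class_ids = segm_scores.argmax(axis=2)           # most likely class for each pixel
segm_rgb = palette[class_ids].astype(np.uint8)   # look up the color of each class
depth = np.abs(raw_depth)                        # keep depth non-negative

print(segm_rgb.shape, depth.shape)  # (4, 5, 3) (4, 5)
```

Indexing the palette with an (H, W) array of class ids returns an (H, W, 3) image, which is what `plt.imshow` expects.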

In [None]:
depth, segm = pipeline(img)

In [None]:
f, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(30,20))
ax1.imshow(img)
ax1.set_title('Original', fontsize=30)
ax2.imshow(segm)
ax2.set_title('Predicted Segmentation', fontsize=30)
ax3.imshow(depth, cmap="plasma", vmin=0, vmax=80)
ax3.set_title("Predicted Depth", fontsize=30)
plt.show()

## 3.4 — Run on a Video

In [None]:
print(img.shape)
print(depth.shape)
print(segm.shape)

In [None]:
import matplotlib.cm as cm
import matplotlib.colors as co

def depth_to_rgb(depth):
    normalizer = co.Normalize(vmin=0, vmax=80)
    mapper = cm.ScalarMappable(norm=normalizer, cmap='plasma')
    colormapped_im = (mapper.to_rgba(depth)[:, :, :3] * 255).astype(np.uint8)
    return colormapped_im

depth_rgb = depth_to_rgb(depth)
print(depth_rgb.shape)
plt.imshow(depth_rgb)
plt.show()

In [None]:
print(img.shape)
print(depth_rgb.shape)
print(segm.shape)
new_img = np.vstack((img, segm, depth_rgb))
plt.imshow(new_img)
plt.show()

In [None]:
video_files = sorted(glob.glob("data/*.png"))

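# Build a HydraNet
hydranet = HydraNet()
hydranet.define_mobilenet()
hydranet.define_lightweight_refinenet()
# Keras layers create and initialize their weights on the first forward pass,
# so there is no separate _initialize_weights(), .cuda() or .eval() step here.

# Load the Weights
# ExpKITTI_joint.ckpt is a PyTorch checkpoint (see the DataLossError in section 3.1),
# so its tensors have to be converted before a Keras model can use them.
# ckpt = torch.load('ExpKITTI_joint.ckpt')
# hydranet.load_state_dict(ckpt['state_dict'])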

# Run the pipeline
result_video = []
for idx, img_path in enumerate(video_files):
    image = np.array(Image.open(img_path))
    h, w, _ = image.shape 
    depth, seg = pipeline(image)
    # PIL loads images as RGB, while cv2.VideoWriter expects BGR frames
    result_video.append(cv2.cvtColor(cv2.vconcat([image, seg, depth_to_rgb(depth)]), cv2.COLOR_RGB2BGR))

out = cv2.VideoWriter('output/out.mp4',cv2.VideoWriter_fourcc(*'MP4V'), 15, (w,3*h))

for i in range(len(result_video)):
    out.write(result_video[i])
out.release()

In [None]:
from IPython.display import HTML
from base64 import b64encode
mp4 = open('output/out.mp4','rb').read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML("""
<video width=800 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)

## 3D Segmentation

Did you ever wonder... How is segmentation used in self-driving cars? Like, **once you have the map, what do you do with it**?
<p>
Let's see something called 3D Segmentation — Fusing a Depth Map with a Segmentation Map!
<p>

In my course [MASTER STEREO VISION](https://courses.thinkautonomous.ai/stereo-vision), I teach how to do something called **3D Reconstruction** from a Depth Map and Calibration Parameters. <p>
In this course, we're going to see how to do it with Open3D, my go-to library for Point Clouds, and we'll see how to build 3D Segmentation Algorithms by fusing the Depth Map (3D) with the Segmentation Map.

In [None]:
!pip install open3d==0.14.1

In [None]:
import open3d as o3d

In [None]:
o3d.__version__

### RGBD - Fuse the RGB Image and the Depth Map

The first thing we'll implement is an RGBD Image, created by fusing the RGB Image with the Depth Map. For that, we'll use [Open3D's Class RGBD Image](http://www.open3d.org/docs/release/python_api/open3d.geometry.RGBDImage.html) and the function create_from_color_and_depth(color, depth).<p>
It looks pretty straightforward; we just need to make sure that the images are loaded as [Open3D Images](http://www.open3d.org/docs/release/python_api/open3d.geometry.Geometry.html?highlight=image#open3d.geometry.Geometry.Image).

In [None]:
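# depth_scale=1.0 and depth_trunc=80.0 are assumptions: depth stays in the network's units and is cut at the plots' vmax
rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(o3d.geometry.Image(img), o3d.geometry.Image(depth.astype(np.float32)), depth_scale=1.0, depth_trunc=80.0, convert_rgb_to_intensity=False)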

Next, we'll use the function create_from_rgbd_image to build a Point Cloud based on this. For that, we'll need the camera's intrinsic parameters. <p>
If you'd like to learn more about this, I invite you to take my course on [Stereo Vision](https://courses.thinkautonomous.ai/stereo-vision). In this course, I'm just going to give them to you.

In [None]:
o3d.camera.PinholeCameraIntrinsic??

In [None]:
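# NOTE: cy = 609 is larger than the image height (375); a principal point normally lies inside the image, so check this value against the calibration file
intrinsics = o3d.camera.PinholeCameraIntrinsic(width = 1242, height = 375, fx = 721., fy = 721., cx = 609., cy = 609.)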
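
The intrinsic parameters are what turn a pixel and its depth into a 3D point. Under the pinhole camera model, a pixel (u, v) with depth z maps to x = (u - cx) * z / fx and y = (v - cy) * z / fy. Here is a minimal NumPy sketch of that back-projection, using a tiny made-up depth map and made-up intrinsics rather than the KITTI values; the function name is hypothetical too.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    """Turn an (H, W) depth map into an (H*W, 3) array of 3D points (pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row indices
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


# Toy example: a flat wall 10 units away, seen by a 6x4 pixel camera
depth = np.full((4, 6), 10.0)
points = backproject(depth, fx=2.0, fy=2.0, cx=3.0, cy=2.0)
print(points.shape)  # (24, 3)
print(points[15])    # pixel (u=3, v=2) is the principal point, so x = y = 0
```

This is only meant to show the geometry; building the point cloud itself is left to Open3D.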

In [None]:
point_cloud = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsics)  # Create A Point Cloud
o3d.io.write_point_cloud("test.pcd", point_cloud)

### 3D Segmentation — Fuse the Segmentation Map with the Depth Map
From now on, the process is exactly the same. But instead of creating a Point Cloud from an RGBD Image built with the normal RGB Image, we'll build it with the Segmentation Map as the color image.

In [None]:
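# Same call as before, with the Segmentation Map as the color image (depth_scale and depth_trunc are assumptions)
rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(o3d.geometry.Image(segm), o3d.geometry.Image(depth.astype(np.float32)), depth_scale=1.0, depth_trunc=80.0, convert_rgb_to_intensity=False)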

In [None]:
point_cloud = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsics)  # Create A Point Cloud

In [None]:
o3d.io.write_point_cloud("test_segm.pcd", point_cloud)