<h1 align="center">☁️ - Cloudy regions segmentation 👨‍💻🔬</h1>

<h2 align="center">Modelling - 2nd iteration - Semantic Segmentation</h2>
<p style="text-align:center">
   Thomas Bury, Afonso Alves, Daniel Staudegger<br>
   Allianz<br>
</p>

In [2]:
import numpy as np 
import pandas as pd
import seaborn as sns
import scicomap as sc
import matplotlib as mpl
import yaml
from pprint import pprint
import cv2
import matplotlib.pyplot as plt
import os

# To get a progress bar for long loops:
from tqdm.notebook import trange, tqdm
from time import sleep

from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Activation, Input
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D

# Custom package for the project, save all the functions into appropriate sub-packages
from pyreidolia.plot import set_my_plt_style, plot_cloud, plot_rnd_cloud, draw_label_only
from pyreidolia.mask import bounding_box, rle_to_mask, get_binary_mask_sum, mask_to_rle
from pyreidolia.img import get_resolution_sharpness

In [3]:
%load_ext autoreload
%autoreload 1

In [4]:
%aimport pyreidolia

In [5]:
# A nicer style for mpl
set_my_plt_style(height=6, width=8, linewidth=1.5)

# Better colormaps: uniformise each scicomap colormap and register it with matplotlib
# (mpl.cm.register_cmap is deprecated in recent matplotlib releases in favour of mpl.colormaps.register)
for cmap_name in ["tropical", "neutral", "neutral_r"]:
    sc_map = sc.ScicoSequential(cmap=cmap_name)
    sc_map.unif_sym_cmap(lift=None,
                         bitonic=False,
                         diffuse=True)
    sc_cmap = sc_map.get_mpl_color_map()
    mpl.cm.register_cmap(cmap_name, sc_cmap)

#### Load the config file for the paths

In [6]:
def string_print(df):
    # Print the whole DataFrame (no truncation), indenting every row after the header by a tab
    print(df.to_string().replace('\n', '\n\t'))

In [7]:
# Where is my yaml ? "C:/Users/xtbury/Documents/Projects/Pyreidolia/paths.yml"

paths_yml = input("where is the paths.yml config file?")
with open(paths_yml, "r") as ymlfile:
    path_dic = yaml.load(ymlfile, Loader=yaml.FullLoader)

pprint(path_dic)

where is the paths.yml config file?C:\Users\e400086\Desktop\Docs\DS_Certificate_Sorbonne_and_CV\02_Project_Cloud_Regions\paths.yml
{'data': {'docs': 'C:/Users/e400086/Desktop/Docs/DS_Certificate_Sorbonne_and_CV/02_Project_Cloud_Regions/00_Docs_and_Links/',
          'test': 'C:/Users/e400086/Desktop/Docs/DS_Certificate_Sorbonne_and_CV/02_Project_Cloud_Regions/01_Data/test_images/',
          'train': 'C:/Users/e400086/Desktop/Docs/DS_Certificate_Sorbonne_and_CV/02_Project_Cloud_Regions/01_Data/train_images/'},
 'notebooks': 'C:/Users/e400086/Desktop/Docs/DS_Certificate_Sorbonne_and_CV/02_Project_Cloud_Regions/03_Notebooks/',
 'output': 'C:/Users/e400086/Desktop/Docs/DS_Certificate_Sorbonne_and_CV/02_Project_Cloud_Regions/02_Outputs/',
 'reports': 'C:/Users/e400086/Desktop/Docs/DS_Certificate_Sorbonne_and_CV/02_Project_Cloud_Regions/07_Reports/',
 'scripts': 'C:/Users/e400086/Desktop/Docs/DS_Certificate_Sorbonne_and_CV/02_Project_Cloud_Regions/04_Scripts/',
 'studies': 'C:/Users/e400086

In [8]:
train_csv_path = path_dic['data']['docs'] + 'train.csv'
train_pq_path = path_dic['data']['docs'] + "train_info_clean.parquet"
train_data = path_dic['data']['train'] 
test_data = path_dic['data']['test'] 
report_path = path_dic['reports']
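The paths above are built by string concatenation, which silently depends on every entry in `paths.yml` ending with a `/`. A minimal sketch of the same lookups with `pathlib`, assuming a dict shaped like the `path_dic` printed above; `example_dic` and its values are hypothetical placeholders:

```python
from pathlib import Path

# Hypothetical stand-in for path_dic; note the missing trailing slash on 'docs'
example_dic = {"data": {"docs": "project/00_Docs_and_Links",
                        "train": "project/01_Data/train_images/"}}

docs_dir = Path(example_dic["data"]["docs"])
# The "/" operator inserts the separator whether or not the string ends with one
csv_path = docs_dir / "train.csv"
pq_path = docs_dir / "train_info_clean.parquet"
train_dir = Path(example_dic["data"]["train"])  # trailing slash is normalised away

print(csv_path.as_posix())  # project/00_Docs_and_Links/train.csv
```

`np.load` and `open` accept `Path` objects directly, so the rest of the notebook would not need to change.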

## Semantic Segmentation modelling
### 1) Load the `X_train2` from file
**Note:** you need to have run at least the section of _01a_preprocessing.ipynb_ that generates **X_train2.npy** and **X_valid2.npy**.  

In [9]:
X_train2 = np.load(path_dic['data']['docs'] +'X_train2.npy')
print(">> X_train2 memory size: %.2f GB."
      % (X_train2.nbytes * 1e-9))

>> X_train2 memory size: 5.44 GB.


* We don't need `X_valid2` at the moment, so we skip loading it to save memory (about 1.8 GB).

### 2.a) Modelling - Fish cloud
The plan is to train four separate neural networks, each predicting the mask of a different cloud label. To limit memory use, we only load the data we need and delete arrays as soon as they are no longer used.  
**Load the Fish target arrays (42 sections in total):**

In [10]:
row_px_data, col_px_data = 700, 467
print("(row_px_data, col_px_data):", row_px_data,",", col_px_data)
row_px_target, col_px_target = 420, 280
print("(row_px_target, col_px_target):", row_px_target,",", col_px_target)

(row_px_data, col_px_data): 700 , 467
(row_px_target, col_px_target): 420 , 280


In [11]:
y_train2_SS_Fish = np.empty((0, row_px_target*col_px_target))

for section_number in range(0,41+1):
    #Load from local memory:
    target_section = np.load(path_dic['data']['docs'] +'y_train2_SS_Fish_'+str(section_number)+'.npy')
    #Add to main array:
    y_train2_SS_Fish = np.append(y_train2_SS_Fish, target_section, axis=0)
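`np.append` copies the whole accumulated array on every iteration, so the loop above performs 42 ever-larger copies and briefly holds two near-full copies in memory. A common alternative is to collect the sections in a list and concatenate once. A minimal sketch, where `stack_sections` and `load_section` are hypothetical names and the small synthetic sections stand in for the `y_train2_SS_Fish_<n>.npy` files:

```python
import numpy as np

def stack_sections(load_section, n_sections):
    """Load n_sections 2-D arrays and stack them along axis 0 with a single copy."""
    # Keep each section in a list; np.concatenate allocates the final array once
    sections = [load_section(i) for i in range(n_sections)]
    return np.concatenate(sections, axis=0)

# Synthetic stand-in for np.load(docs + 'y_train2_SS_Fish_' + str(i) + '.npy')
def load_section(i):
    return np.full((3, 4), i, dtype=np.uint8)

y = stack_sections(load_section, 42)
print(y.shape, y.dtype)  # (126, 4) uint8
```

Unlike the `np.empty((0, ...))` seed, which is float64 and upcasts everything appended to it, `np.concatenate` keeps the sections' own dtype.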

In [12]:
print(">> y_train2_SS_Fish memory size: %.2f GB."
      % (y_train2_SS_Fish.nbytes * 1e-9))

>> y_train2_SS_Fish memory size: 3.91 GB.
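Both reported sizes follow from `nbytes = number of elements × bytes per element`: they match float32 for `X_train2` and float64 for the targets. A back-of-the-envelope sketch, assuming (not confirmed by the preprocessing notebook) that a binary 0/1 mask could be stored as `uint8` without loss:

```python
import numpy as np

n_images = 4160
x_elems = n_images * 700 * 467   # X_train2 pixels
y_elems = n_images * 420 * 280   # y_train2_SS_Fish mask pixels

for name, elems, dtype in [("X_train2 (float32)", x_elems, np.float32),
                           ("y_train2_SS_Fish (float64)", y_elems, np.float64),
                           ("y_train2_SS_Fish (uint8)", y_elems, np.uint8)]:
    gb = elems * np.dtype(dtype).itemsize * 1e-9
    print(f"{name}: {gb:.2f} GB")

# X_train2 (float32): 5.44 GB
# y_train2_SS_Fish (float64): 3.91 GB
# y_train2_SS_Fish (uint8): 0.49 GB
```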


In [13]:
print(">> y_train2_SS_Fish shape: "+ str(y_train2_SS_Fish.shape)+".")

>> y_train2_SS_Fish shape: (4160, 117600).


**Reshape the data into a 4-dimensional array (nb_images, rows, columns, channels)**. Each flattened image becomes an array of shape (row_px_data, col_px_data, 1), i.e. 700 × 467 pixels with a single channel:

In [14]:
X_train2 = X_train2.reshape((-1, row_px_data, col_px_data, 1))
print('New shape of X_train2:', X_train2.shape)

New shape of X_train2: (4160, 700, 467, 1)
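`reshape` only reinterprets the flat pixel vectors: it does not resize or resample the images, and on a contiguous array it returns a view instead of copying the 5.44 GB of data. This assumes the preprocessing flattened each image row by row (NumPy's default C order); otherwise the pixels would be scrambled. A small sketch on a synthetic array with shrunken shapes:

```python
import numpy as np

n_images, rows, cols = 2, 4, 3
flat = np.arange(n_images * rows * cols, dtype=np.float32).reshape(n_images, rows * cols)

# -1 lets NumPy infer the number of images; the trailing 1 is the channel axis
images = flat.reshape((-1, rows, cols, 1))

print(images.shape)                    # (2, 4, 3, 1)
print(np.shares_memory(flat, images))  # True: no copy was made
print(images[1, 0, :, 0])              # [12. 13. 14.] -> first row of image 1
```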


**LeNet neural network - create layer structure:**

In [26]:
inputs = Input(shape=(row_px_data, col_px_data, 1), name="Input")

# No input_shape needed here: the Input layer above already fixes the shape
conv_1 = Conv2D(filters=30,
                kernel_size=(5, 5),
                padding='valid',
                activation='relu')

max_pool_1 = MaxPooling2D(pool_size=(2, 2))

conv_2 = Conv2D(filters=30,
                kernel_size=(3, 3),
                padding='valid',
                activation='relu')

max_pool_2 = MaxPooling2D(pool_size=(2, 2))

flatten = Flatten()

dropout = Dropout(rate=0.2)

dense_1 = Dense(units=2000, activation='relu')
dense_2 = Dense(units=1000, activation='relu')
dense_3 = Dense(units=500, activation='relu')

# One output per target pixel; sigmoid gives each pixel an independent probability,
# as a binary mask requires (softmax would force all 117,600 outputs to sum to 1)
dense_4 = Dense(units=row_px_target * col_px_target, activation='sigmoid')
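The output layer has one unit per target pixel (420 × 280 = 117,600), and the target is a binary mask in which many pixels can be Fish at once. A `softmax` over those units forces them to sum to 1, so no more than a sliver of probability is left for each positive pixel; a `sigmoid` scores each pixel independently, which is what a per-pixel binary cross-entropy expects. A NumPy sketch of the difference on a toy 6-pixel output (the logits are made up for illustration):

```python
import numpy as np

# Toy scores: 3 "Fish" pixels (high) and 3 background pixels (low)
logits = np.array([4.0, 4.0, 4.0, -4.0, -4.0, -4.0])

softmax = np.exp(logits - logits.max())  # subtract the max for numerical stability
softmax /= softmax.sum()
sigmoid = 1.0 / (1.0 + np.exp(-logits))

print(softmax.sum())         # sums to 1 (up to floating-point rounding)
print(softmax[:3].round(3))  # each "Fish" pixel is capped near 1/3
print(sigmoid[:3].round(3))  # each "Fish" pixel is independently close to 1
```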

**Connect the layers:**

In [27]:
x = conv_1(inputs)
x = max_pool_1(x)
x = conv_2(x)
x = max_pool_2(x)

x = dropout(x)
x = flatten(x)
x = dense_1(x)
x = dense_2(x)
x = dense_3(x)
outputs = dense_4(x)

lenet = Model(inputs=inputs, outputs=outputs)

**Compile:**

In [28]:
# Compilation: per-pixel binary cross-entropy on the flattened mask
lenet.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
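For a target mask $y$ and predicted probabilities $p$ over $N$ pixels, binary cross-entropy is $-\frac{1}{N}\sum_i \left[y_i \log p_i + (1-y_i)\log(1-p_i)\right]$. A NumPy sketch of that formula for reference; Keras also clips $p$ away from 0 and 1, and the `eps=1e-7` used here is an assumption about that clipping value:

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean per-pixel binary cross-entropy over the last axis."""
    # Clip to avoid log(0); eps is an assumed value
    p = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p), axis=-1)

y_true = np.array([[1.0, 0.0, 1.0, 0.0]])
y_pred = np.array([[0.9, 0.1, 0.9, 0.1]])
print(binary_crossentropy(y_true, y_pred))  # [0.10536052]
```

Every pixel here is predicted with 90 % confidence in the right class, so the loss is simply $-\log 0.9$.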

**Fit:**

In [29]:
training_history_lenet = lenet.fit(X_train2, y_train2_SS_Fish,
                                   validation_split = 0.2,
                                   epochs = 2,
                                   batch_size = 25)
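With `validation_split = 0.2`, Keras holds out the *last* 20 % of the samples, taken before any shuffling, rather than a random subset, so any ordering in `X_train2` carries over to the validation set. A sketch of the resulting sizes, assuming the split point is rounded down as below:

```python
import math

n_samples, validation_split, batch_size = 4160, 0.2, 25

# Training uses the first samples, validation the remaining tail
split_at = math.floor(n_samples * (1.0 - validation_split))
n_train, n_valid = split_at, n_samples - split_at
steps_per_epoch = math.ceil(n_train / batch_size)  # last batch may be partial

print(n_train, n_valid, steps_per_epoch)  # 3328 832 134
```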

Epoch 1/2


ResourceExhaustedError: in user code:

    File "C:\Users\e400086\.conda\envs\pyreidolia\lib\site-packages\keras\engine\training.py", line 878, in train_function  *
        return step_function(self, iterator)
    File "C:\Users\e400086\.conda\envs\pyreidolia\lib\site-packages\keras\engine\training.py", line 867, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "C:\Users\e400086\.conda\envs\pyreidolia\lib\site-packages\keras\engine\training.py", line 860, in run_step  **
        outputs = model.train_step(data)
    File "C:\Users\e400086\.conda\envs\pyreidolia\lib\site-packages\keras\engine\training.py", line 816, in train_step
        self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "C:\Users\e400086\.conda\envs\pyreidolia\lib\site-packages\keras\optimizer_v2\optimizer_v2.py", line 532, in minimize
        return self.apply_gradients(grads_and_vars, name=name)
    File "C:\Users\e400086\.conda\envs\pyreidolia\lib\site-packages\keras\optimizer_v2\optimizer_v2.py", line 639, in apply_gradients
        self._create_all_weights(var_list)
    File "C:\Users\e400086\.conda\envs\pyreidolia\lib\site-packages\keras\optimizer_v2\optimizer_v2.py", line 830, in _create_all_weights
        self._create_slots(var_list)
    File "C:\Users\e400086\.conda\envs\pyreidolia\lib\site-packages\keras\optimizer_v2\adam.py", line 119, in _create_slots
        self.add_slot(var, 'v')
    File "C:\Users\e400086\.conda\envs\pyreidolia\lib\site-packages\keras\optimizer_v2\optimizer_v2.py", line 916, in add_slot
        weight = tf.Variable(
    File "C:\Users\e400086\.conda\envs\pyreidolia\lib\site-packages\keras\initializers\initializers_v2.py", line 144, in __call__
        return tf.zeros(shape, dtype)

    ResourceExhaustedError: OOM when allocating tensor with shape[591660,2000] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu [Op:Fill]
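The shape in the error, `[591660, 2000]`, is the kernel of `dense_1`: after two conv/pool stages the flattened feature map is still 591,660 values long, and Adam needs two extra slot tensors of the same shape for it (the traceback fails while creating slot `'v'`). The arithmetic can be reproduced by hand, assuming float32 weights:

```python
def conv_valid(size, kernel):
    # Conv2D with padding='valid' and stride 1 shrinks each side by kernel - 1
    return size - kernel + 1

def max_pool(size, pool=2):
    # MaxPooling2D defaults to stride = pool_size and 'valid' padding
    return size // pool

rows, cols, filters = 700, 467, 30
rows, cols = max_pool(conv_valid(rows, 5)), max_pool(conv_valid(cols, 5))  # conv_1 + max_pool_1
rows, cols = max_pool(conv_valid(rows, 3)), max_pool(conv_valid(cols, 3))  # conv_2 + max_pool_2
flat = rows * cols * filters                                              # Flatten output

dense_1_weights = flat * 2000           # kernel of dense_1 (bias excluded)
kernel_gb = dense_1_weights * 4 * 1e-9  # float32 = 4 bytes per weight

print(rows, cols, flat)            # 173 114 591660
print(f"{kernel_gb:.2f} GB")       # 4.73 GB
print(f"{3 * kernel_gb:.2f} GB")   # 14.20 GB including Adam's 'm' and 'v' slots
```

Any reduction of the flattened size (extra pooling, fewer filters, smaller inputs) shrinks this kernel proportionally.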


In [22]:
300*300

90000

In [24]:
X_valid2 = np.load(path_dic['data']['docs'] +'X_valid2.npy')
print(">> X_valid2 memory size: %.2f GB."
      % (X_valid2.nbytes * 1e-9))

>> X_valid2 memory size: 1.81 GB.


In [25]:
del X_valid2

In [1]:
del X_train2, y_train2_SS_Fish

NameError: name 'X_train2' is not defined
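The `NameError` comes from running this cell in a fresh kernel (execution count `[1]`), where `X_train2` was never loaded. A minimal sketch of a clean-up that tolerates missing names; `free_arrays` is a hypothetical helper, and IPython's output history (`Out`, `_`) can still hold references that keep an array alive after `del`:

```python
import gc

def free_arrays(namespace, *names):
    """Delete the given names from namespace if present, then run the garbage collector."""
    removed = []
    for name in names:
        if name in namespace:
            del namespace[name]
            removed.append(name)
    gc.collect()
    return removed

# In the notebook: free_arrays(globals(), "X_train2", "y_train2_SS_Fish")
namespace = {"X_train2": [0.0] * 10}
print(free_arrays(namespace, "X_train2", "y_train2_SS_Fish"))  # ['X_train2']
```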