# Modeling
In this notebook I will create some deep learning models to address the colorization task.

In [8]:
import os
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

import tensorflow as tf

from tensorflow import keras

from keras.layers import Conv2D, UpSampling2D, Dense
from keras.models import Sequential
from keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img, array_to_img

from sklearn.model_selection import train_test_split

from skimage.color import rgb2lab, lab2rgb
from skimage.transform import resize
from skimage.io import imsave, imshow

In [9]:
physical_devices = tf.config.list_physical_devices('GPU')
print("Num GPUs:", len(physical_devices))

Num GPUs: 0


In [3]:
dataset_path = "../preprocessed/"
path = '../../dataset/'
models_path = "../../models/"
results_path = "../../results/"

In [5]:
N = 25000
W = 128
H = 128
SIZE = 128

### Image Size Problem
The literature addresses the colorization problem with rescaled images typically of size $256\times256$. However, due to limitated resources and time, I had to lower the number of input pixels to $128\times128$.

The difference in training time is quite large and it would be impossibile in a reasonable amount of time to test different model architecture an correctly validate some hyper-parameters.

Literature image input size: $$256\times256 = 2^8\times2^8 = 2^{16}$$

My lower image input size: $$128\times128 = 2^7\times2^7 = 2^{14}$$

There is a difference of $4$ times between the two image resolutions and this enables me to run $4$ times the number of test in order to validate my models. Eventually, once I found the best model, I will train it with the full $256\times256$ resolution, in order to compare to previous methods.

I know it is not an optimal procedure but I had to face time, memory and computational constraints.

## Load datasets

In [4]:
# resized = []
# grayscaled = []

In [8]:
# for f in tqdm(os.listdir(dataset_path+"resized/")):
#     resized = np.append(resized,np.load(dataset_path+"resized/"+f)["arr_0"])

In [None]:
# assert resized.shape == (N,W,H,3)

## Pre-processing

In [6]:
#Normalize images - divide by 255
train_datagen = ImageDataGenerator(rescale=1. / 255)

#Resize images, if needed
train = train_datagen.flow_from_directory(path+"download/", 
                                          target_size=(SIZE, SIZE), 
                                          batch_size=1000, 
                                          class_mode=None)

Found 1000 images belonging to 1 classes.


In [7]:
#Convert from RGB to Lab
"""
by iterating on each image, we convert the RGB to Lab. 
Think of LAB image as a grey image in L channel and all color info stored in A and B channels. 
The input to the network will be the L channel, so we assign L channel to X vector. 
And assign A and B to Y.
"""

X =[]
Y =[]
for img in tqdm(train[0]):
    try:
        lab = rgb2lab(img)
        X.append(lab[:,:,0]) 
        Y.append(lab[:,:,1:] / 128) #A and B values range from -127 to 128, 
        #so we divide the values by 128 to restrict values to between -1 and 1.
    except:
        print('error')
        
X = np.array(X)
Y = np.array(Y)
X = X.reshape(X.shape+(1,)) # dimensions to be the same for X and Y
print(X.shape)
print(Y.shape)

100%|█████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:07<00:00, 125.00it/s]


(1000, 128, 128, 1)
(1000, 128, 128, 2)


In [10]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
print("Train",X_train.shape, Y_train.shape)
print("Test",X_test.shape, Y_test.shape)

Train (800, 128, 128, 1) (800, 128, 128, 2)
Test (200, 128, 128, 1) (200, 128, 128, 2)


## Training

### Simple Deep AutoEncoder with 3+2 dense layers

This model is not powerful enough to reproduce the task of colorization.
With less than 85k trainable parameters, a simple deep autoencoder with only dense layers is not a valid option.

More important, I can clearly see that this model is not suitable for this task because even with a small dataset (100 images) and lots of epochs (50) it is not able to overfit the training data.

In [46]:
# Encoder
model = Sequential(name=("AE_Simple_Dense"))
model.add(Dense(128,activation="relu", input_shape=(SIZE, SIZE, 1)))
model.add(Dense(64,activation="relu"))
model.add(Dense(32,activation="relu"))

# Decoder
model.add(Dense(64,activation="relu"))
model.add(Dense(128,activation="relu"))

# output layer
model.add(Dense(2, activation='tanh'))
model.compile(optimizer='adam', loss='mse' , metrics=['accuracy'])

### Simple Deep AutoEncoder with 3+2 convolutional layers

In order to increase the power of the hidden state representation I used 2D convolutional layers instead of dense layers.

This autoencoder has the same number of filters compared to number of neurons of the previous simple dense autoencoder.

In [47]:
#Encoder
model = Sequential(name=("AE_Simple_Conv"))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(SIZE, SIZE, 1)))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))

#Decoder
model.add(Conv2D(128, (3,3), activation='relu', padding='same'))
model.add(Conv2D(64, (3,3), activation='relu', padding='same'))          

# output layer
model.add(Conv2D(2, (3, 3), activation='tanh', padding='same'))
model.compile(optimizer='adam', loss='mse' , metrics=['accuracy'])

### Complex Deep AutoEncoder with several layers and upsampling

In [54]:
#Encoder
model = Sequential(name=("AE_Complex"))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', strides=2, input_shape=(SIZE, SIZE, 1)))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(128, (3,3), activation='relu', padding='same', strides=2))
model.add(Conv2D(256, (3,3), activation='relu', padding='same'))
model.add(Conv2D(256, (3,3), activation='relu', padding='same', strides=2))
model.add(Conv2D(512, (3,3), activation='relu', padding='same'))
model.add(Conv2D(512, (3,3), activation='relu', padding='same'))
model.add(Conv2D(256, (3,3), activation='relu', padding='same'))

#Decoder
model.add(Conv2D(128, (3,3), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(64, (3,3), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(32, (3,3), activation='relu', padding='same'))
model.add(Conv2D(16, (3,3), activation='relu', padding='same'))
model.add(Conv2D(2, (3, 3), activation='tanh', padding='same'))
model.add(UpSampling2D((2, 2)))
model.compile(optimizer="adam", loss='mse' , metrics=['accuracy'])

### Hyper-parameter selection

Related to model architecture
- number of layers
- conv vs no-conv
- upsampling y/n
- maxpool y/n
- batch norm y/n

Related to training
- epochs
- validation split
- batch size

In [48]:
model.summary()

Model: "AE_Simple_Conv"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_6 (Conv2D)            (None, 128, 128, 64)      640       
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 128, 128, 128)     73856     
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 128, 128, 256)     295168    
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 128, 128, 128)     295040    
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 128, 128, 64)      73792     
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 128, 128, 2)       1154      
Total params: 739,650
Trainable params: 739,650
Non-trainable params: 0
______________________________________________

In [49]:
history = model.fit(X,Y,validation_split=0.2, epochs=50, batch_size=16)

Epoch 1/50
Epoch 2/50
 5/50 [==>...........................] - ETA: 6:40 - loss: 1.0212 - accuracy: 0.4650

KeyboardInterrupt: 

### History plots

In [59]:
fig, axes = plt.subplots(1, 2, figsize=(15, 4))
ax = axes.ravel()

# accuracy
ax[0].plot(history.history['accuracy'])
ax[0].plot(history.history['val_accuracy'])
ax[0].set_title('model accuracy')
ax[0].set_ylabel('accuracy')
ax[0].set_xlabel('epochs')
ax[0].legend(['train', 'validation'], loc='upper left')

# summarize history for loss
ax[1].plot(history.history['loss'])
ax[1].plot(history.history['val_loss'])
ax[1].set_title('model loss')
ax[1].set_ylabel('loss')
ax[1].set_xlabel('epochs')
ax[1].legend(['train', 'validation'], loc='upper right')

fig.tight_layout()
plt.show()      

### Save the model to disk

In [None]:
model.save(models_path+model.name)

---

### Load the model from disk

In [72]:
# choose model to load
model_name = model.name
# model_name = "colorize_autoencoder_VGG16_10000.model"

In [73]:
# loads a model from disk into memory
model = tf.keras.models.load_model(
    models_path+model_name,
    custom_objects=None,
    compile=True)

## Testing

Final model testing.

> I do **not** validate on this set, the following metrics are just meant to be _reported_ and not used as indicators.

In this phase I will also compare some other algorithms.

In [69]:
test_loss, test_acc = model.evaluate(X_test, Y_test, batch_size=128, verbose=0)

print(f"Accuracy loss: {test_loss} test: {test_acc}")

Accuracy loss: 0.02247125655412674 test: 0.665999174118042


## Visualization

In [38]:
def plot_comparison(img_resized,img_recolored, figsize=(10,5)):     
    
    fig, axes = plt.subplots(1, 2, figsize=figsize)
    ax = axes.ravel()

    ax[0].imshow(img_resized.astype('uint8'))
    ax[0].set_title("Resized")
    
    ax[1].imshow(img_recolored)
    ax[1].set_title("Recolored")

    fig.tight_layout()
    plt.show()      

### Visualization from test set

In [65]:
for i, x in enumerate(X_test[:10]):    

    output = model.predict(x)
    output = output*128
    result = np.zeros((SIZE, SIZE, 3))
    result[:,:,0] = img1_color[0][:,:,0]
    result[:,:,1:] = output[0]

    imsave(results_path, (lab2rgb(result)*255).astype(np.uint8))
    recolored = lab2rgb(result)
    
    plot_comparison(img1, recolored,(5,2.5))

ValueError: in user code:

    C:\Users\ghiot\anaconda3\lib\site-packages\keras\engine\training.py:1544 predict_function  *
        return step_function(self, iterator)
    C:\Users\ghiot\anaconda3\lib\site-packages\keras\engine\training.py:1527 run_step  *
        outputs = model.predict_step(data)
    C:\Users\ghiot\anaconda3\lib\site-packages\keras\engine\training.py:1500 predict_step  *
        return self(x, training=False)
    C:\Users\ghiot\anaconda3\lib\site-packages\keras\engine\base_layer.py:989 __call__  *
        input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
    C:\Users\ghiot\anaconda3\lib\site-packages\keras\engine\input_spec.py:227 assert_input_compatibility  *
        raise ValueError('Input ' + str(input_index) + ' of layer ' +

    ValueError: Input 0 of layer AE_Simple is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: (32, 256, 1)


### Visualization with raw images

In [66]:
visualization_path = path+"download/landscapes1k/" # modify here to load images from a different folder

for img_path in os.listdir(visualization_path)[:10]:    
    
    img = img_to_array(load_img(visualization_path+img_path))
    img = resize(img, (SIZE,SIZE))
    
    img_color = np.array(img, dtype=float)
    img_color = rgb2lab(1.0/255*img_color)[:,:,:,0]
    img_color = img_color.reshape(img_color.shape+(1,))

    output = model.predict(img_color)
    output = output*128
    
    result = np.zeros((SIZE, SIZE, 3))
    result[:,:,0] = img_color[0][:,:,0]
    result[:,:,1:] = output[0]

    recolored = lab2rgb(result)
    imsave(results_path+img_path, (recolored*255).astype(np.uint8))
    
    plot_comparison(img, recolored)

IndexError: too many indices for array: array is 3-dimensional, but 4 were indexed