# Data Preparation (important notes at the end)

The number of L and AB images must match. Our dataset is split into l (25,000 grayscale images), ab1 (10,000), ab2 (10,000) and ab3 (10,000). Because this project runs in Google Colab, we avoid memory crashes by using only the first 10,000 grayscale images together with ab1. Using fewer images may lower the final accuracy.

```python
import os
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model
import matplotlib.pyplot as plt
import numpy as np
import cv2
import time
```

```python
# Take the first 10,000 grayscale images so they pair one-to-one with the 10,000 AB images in ab1.
l_channel = np.load("image_colorization_data/l/gray_scale.npy")[:10000]
ab = np.load("image_colorization_data/ab/ab/ab1.npy")
print("Gray image shape:", l_channel.shape)
print("AB image shape:", ab.shape)
```

```
Gray image shape: (10000, 224, 224)
AB image shape: (10000, 224, 224, 2)
```
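Calling `np.load` on a `.npy` file and then slicing `[:10000]` still reads all 25,000 L images into RAM before discarding most of them. A minimal sketch, assuming the same `.npy` layout, that memory-maps both files with `np.load(..., mmap_mode="r")`, copies only the first `n` samples, and checks that the L and AB arrays can be paired. The helper name `load_l_ab_pair` is hypothetical, not part of the notebook:

```python
import numpy as np


def load_l_ab_pair(l_path, ab_path, n):
    """Load the first n L images and the first n AB images as in-memory arrays.

    Hypothetical helper. With mmap_mode="r", np.load maps the .npy file instead of
    reading it fully; np.array(...) then copies only the requested slice into RAM.
    """
    l_all = np.load(l_path, mmap_mode="r")
    ab_all = np.load(ab_path, mmap_mode="r")
    # Both files must hold at least n samples so every L image gets an AB target.
    if len(l_all) < n or len(ab_all) < n:
        raise ValueError(f"need {n} samples, got {len(l_all)} L and {len(ab_all)} AB")
    # Paired images must share the same spatial size.
    if l_all.shape[1:3] != ab_all.shape[1:3]:
        raise ValueError("L and AB images must have the same height and width")
    return np.array(l_all[:n]), np.array(ab_all[:n])


# Usage with the paths from this notebook (not executed here):
# l_channel, ab = load_l_ab_pair("image_colorization_data/l/gray_scale.npy",
#                                "image_colorization_data/ab/ab/ab1.npy", 10000)
```

Memory-mapping only helps at load time; once the slice is copied, the arrays behave like the ones produced by the cell above.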


```python
def resize_l_ab(l_array, ab_array, target_shape=(128, 128)):
    """Resize every L image and its matching AB image to target_shape.

    cv2.resize expects dsize as (width, height); for a square target the order does not matter.
    """
    resized_l = []
    resized_ab = []

    for l_img, ab_img in zip(l_array, ab_array):
        # Resize the L channel.
        # cv2.INTER_AREA resamples using the pixel area relation, which suits shrinking (downsampling).
        l_resized = cv2.resize(l_img, target_shape, interpolation=cv2.INTER_AREA)

        # Resize the A and B channels separately, then stack them back into (H, W, 2).
        a_resized = cv2.resize(ab_img[:, :, 0], target_shape, interpolation=cv2.INTER_AREA)
        b_resized = cv2.resize(ab_img[:, :, 1], target_shape, interpolation=cv2.INTER_AREA)
        ab_resized = np.stack((a_resized, b_resized), axis=-1)

        resized_l.append(l_resized)
        resized_ab.append(ab_resized)

    return np.array(resized_l), np.array(resized_ab)
```
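To make the `INTER_AREA` comment concrete: when an image is shrunk by an integer factor `k`, area interpolation amounts to averaging each `k×k` block of pixels. The NumPy sketch below shows only that integer case; `box_downsample` is an illustrative name, not an OpenCV function. For 224 → 128 the factor is 1.75, and OpenCV instead weights each source pixel roughly by how much of it overlaps the destination pixel.

```python
import numpy as np


def box_downsample(img, k):
    """Average non-overlapping k x k blocks of a (H, W) or (H, W, C) image.

    Assumes H and W are divisible by k; returns float64 block averages.
    """
    h, w = img.shape[:2]
    if h % k or w % k:
        raise ValueError("height and width must be divisible by k")
    # Split each spatial axis into (number_of_blocks, k), then average over the two k axes.
    blocks = img.reshape(h // k, k, w // k, k, *img.shape[2:])
    return blocks.mean(axis=(1, 3))


x = np.arange(16, dtype=np.float64).reshape(4, 4)
print(box_downsample(x, 2))  # [[ 2.5  4.5] [10.5 12.5]]
```

Trailing channel axes are carried through unchanged, so the same function works on a 2-channel AB image.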


```python
l_channel, ab = resize_l_ab(l_channel, ab)
# Print the new shapes
print("Gray image shape:", l_channel.shape)
print("AB image shape:", ab.shape)
```

```
Gray image shape: (10000, 128, 128)
AB image shape: (10000, 128, 128, 2)
```


The images were resized from 224×224 to 128×128 to reduce RAM usage and avoid crashes. Since 128²/224² ≈ 0.33, each array now holds about a third as many pixels as before.
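The memory savings can be estimated with simple arithmetic. The sketch below assumes one L channel plus two AB channels per sample, uint8 storage (consistent with the 0 to 255 L range) and float32 after normalization; it ignores intermediate copies, which can temporarily double peak usage.

```python
def dataset_bytes(n_images, side, channels=3, bytes_per_value=1):
    """Approximate size in bytes of n_images square images with side x side pixels."""
    return n_images * side * side * channels * bytes_per_value


for side in (224, 128):
    as_uint8 = dataset_bytes(10_000, side) / 1e9
    as_float32 = dataset_bytes(10_000, side, bytes_per_value=4) / 1e9
    print(f"{side}x{side}: {as_uint8:.2f} GB as uint8, {as_float32:.2f} GB as float32")

# 224x224: 1.51 GB as uint8, 6.02 GB as float32
# 128x128: 0.49 GB as uint8, 1.97 GB as float32
```

The float32 figures matter most here, because normalization casts every array to float32.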

### Filter Outliers

```python
# Remove over- or under-exposed images, judged by the mean of the L channel.
mean_brightness = np.mean(l_channel, axis=(1, 2))
# Brightness range tightened based on the observed distribution
brightness_ok = (mean_brightness >= 50) & (mean_brightness <= 170)

# Remove low-colorfulness images, judged by the spread of the AB values.
colorfulness = np.std(ab, axis=(1, 2, 3))
# Threshold raised to drop bland, nearly grayscale images
color_ok = colorfulness > 10

# Keep only the images that pass both checks.
valid_indices = np.where(brightness_ok & color_ok)[0]
l_filtered = l_channel[valid_indices]
ab_filtered = ab[valid_indices]
```
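The comment about tightening the brightness range "based on the distribution" suggests the thresholds were tuned by inspection. A small helper that reports how many images each criterion keeps can make that tuning explicit. `filter_report` is hypothetical, not part of the original notebook; its defaults match the thresholds used above, and it assumes L has shape (N, H, W) as it does at this point.

```python
import numpy as np


def filter_report(l_images, ab_images, min_brightness=50, max_brightness=170, min_colorfulness=10):
    """Count how many images survive each outlier criterion and both combined.

    Assumes l_images has shape (N, H, W) and ab_images has shape (N, H, W, 2).
    """
    brightness = l_images.mean(axis=(1, 2))
    colorfulness = ab_images.std(axis=(1, 2, 3))
    keep_brightness = (brightness >= min_brightness) & (brightness <= max_brightness)
    keep_color = colorfulness > min_colorfulness
    return {
        "total": len(l_images),
        "brightness_ok": int(keep_brightness.sum()),
        "colorfulness_ok": int(keep_color.sum()),
        "both_ok": int((keep_brightness & keep_color).sum()),
    }


# Usage (not executed here): print(filter_report(l_channel, ab))
```

Running it with a few candidate thresholds shows how much data each choice costs before committing to it.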

```python
from sklearn.model_selection import train_test_split

# Add a trailing channel axis, (N, 128, 128) -> (N, 128, 128, 1), the channels-last layout Conv2D layers expect.
if l_filtered.ndim == 3:
    l_filtered = l_filtered[..., np.newaxis]

# Hold out 10% for testing, then 10% of the remainder for validation.
# No stratification is needed because this is not a classification task;
# random_state=42 simply makes the split reproducible.
l_train, l_test, ab_train, ab_test = train_test_split(l_filtered, ab_filtered, test_size=0.1, random_state=42)
l_train, l_val, ab_train, ab_val = train_test_split(l_train, ab_train, test_size=0.1, random_state=42)

print(f"Shape of l_train after fix: {l_train.shape}")  # Should be (..., 128, 128, 1)
```

```
Shape of l_train after fix: (4302, 128, 128, 1)
```
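Because the validation set is carved out of the training portion after the test split, the final proportions are products of the two `test_size` values. A short check with exact fractions (`split_fractions` is an illustrative name):

```python
from fractions import Fraction


def split_fractions(test_size, val_size):
    """Return the (train, val, test) shares of the data for a two-stage split."""
    # Strings such as "0.1" give exact fractions; Fraction(0.1) would carry float error.
    test = Fraction(test_size)
    rest = 1 - test
    val = rest * Fraction(val_size)
    return rest - val, val, test


print(split_fractions("0.1", "0.1"))
# (Fraction(81, 100), Fraction(9, 100), Fraction(1, 10))
```

So roughly 81% of the filtered images are used for training, 9% for validation and 10% for testing. scikit-learn rounds the held-out count up when `test_size` is a float, so actual sample counts can differ slightly from these shares.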


```python
# Source value ranges of the L, A and B channels, mapped to [-1, 1] during normalization
L_IN_MIN, L_IN_MAX = 0.0, 255.0
A_IN_MIN, A_IN_MAX = 43.0, 206.0
B_IN_MIN, B_IN_MAX = 22.0, 222.0
```
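The notebook does not say how these ranges were chosen. If they are meant to be the observed minimum and maximum of each channel, they could be measured as below; this is an assumption, and `channel_ranges` is a hypothetical helper. Any value outside the chosen range will map outside [-1, 1] after normalization.

```python
import numpy as np


def channel_ranges(l_images, ab_images):
    """Return the (min, max) of the L channel and of the A and B channels separately."""
    return {
        "L": (float(l_images.min()), float(l_images.max())),
        "A": (float(ab_images[..., 0].min()), float(ab_images[..., 0].max())),
        "B": (float(ab_images[..., 1].min()), float(ab_images[..., 1].max())),
    }


# Usage (not executed here): print(channel_ranges(l_train, ab_train))
```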

```python
def normalize_data(l_channel, ab_channels):
    """
    Casts data to float32 and normalizes from the CUSTOM source ranges to [-1, 1].
    """
    # Cast to float32 first: neural networks compute with floating-point numbers, so this step is essential.
    l_channel = tf.cast(l_channel, tf.float32)
    ab_channels = tf.cast(ab_channels, tf.float32)

    # Separate the A and B channels from the (..., H, W, 2) tensor.
    # Slicing with 0:1 and 1:2 keeps the last dimension, which makes re-concatenation easy.
    a_channel = ab_channels[..., 0:1]
    b_channel = ab_channels[..., 1:2]

    # Generic formula for mapping [min, max] to [-1, 1]: 2 * (x - min) / (max - min) - 1
    l_norm = 2 * (l_channel - L_IN_MIN) / (L_IN_MAX - L_IN_MIN) - 1
    a_norm = 2 * (a_channel - A_IN_MIN) / (A_IN_MAX - A_IN_MIN) - 1
    b_norm = 2 * (b_channel - B_IN_MIN) / (B_IN_MAX - B_IN_MIN) - 1

    # Re-combine the normalized A and B channels
    ab_norm = tf.concat([a_norm, b_norm], axis=-1)

    return l_norm, ab_norm

l_train, ab_train = normalize_data(l_train, ab_train)
l_test, ab_test = normalize_data(l_test, ab_test)
l_val, ab_val = normalize_data(l_val, ab_val)
```
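To display or evaluate predictions, the network's [-1, 1] outputs have to be mapped back to the source ranges. Inverting 2 * (x - min) / (max - min) - 1 gives x = (x_norm + 1) * (max - min) / 2 + min. A NumPy sketch reusing the same constants; `to_unit_range`, `from_unit_range` and `denormalize_ab` are hypothetical names:

```python
import numpy as np

# Same source ranges as in the normalization cell.
A_IN_MIN, A_IN_MAX = 43.0, 206.0
B_IN_MIN, B_IN_MAX = 22.0, 222.0


def to_unit_range(x, lo, hi):
    """Map [lo, hi] to [-1, 1], the same formula as normalize_data."""
    return 2 * (x - lo) / (hi - lo) - 1


def from_unit_range(x_norm, lo, hi):
    """Inverse mapping: [-1, 1] back to [lo, hi]."""
    return (x_norm + 1) * (hi - lo) / 2 + lo


def denormalize_ab(ab_norm):
    """Convert a (..., 2) array of normalized A/B values back to the source ranges."""
    ab_norm = np.asarray(ab_norm, dtype=np.float32)
    a = from_unit_range(ab_norm[..., 0], A_IN_MIN, A_IN_MAX)
    b = from_unit_range(ab_norm[..., 1], B_IN_MIN, B_IN_MAX)
    return np.stack([a, b], axis=-1)


print(denormalize_ab([[-1.0, -1.0], [1.0, 1.0], [0.0, 0.0]]))
# [[ 43.   22. ]
#  [206.  222. ]
#  [124.5 122. ]]
```

The L channel inverts the same way with `from_unit_range(l_norm, 0.0, 255.0)`. Model outputs slightly outside [-1, 1] land outside the source range, so clipping with `np.clip` before casting back to uint8 is a common safeguard.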

```python
def augment(l_channel, ab_channels):
    """Applies an identical random horizontal flip to both L and AB channels."""
    # A single random draw decides the flip for the whole pair, so L and AB stay pixel-aligned.
    if tf.random.uniform(()) > 0.5:
        l_channel = tf.image.flip_left_right(l_channel)
        ab_channels = tf.image.flip_left_right(ab_channels)
    return l_channel, ab_channels
```
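The key property of `augment` is that L and AB receive the same flip, so every color target stays aligned with its grayscale pixel. A NumPy sketch of the same idea at batch level, handy for checking alignment outside TensorFlow; the name `flip_pairs` and the default probability of 0.5 are illustrative assumptions:

```python
import numpy as np


def flip_pairs(l_batch, ab_batch, rng, p=0.5):
    """Horizontally flip a random subset of (N, H, W, C) pairs, identically for L and AB."""
    flip = rng.random(len(l_batch)) < p  # one decision per image pair
    l_out, ab_out = l_batch.copy(), ab_batch.copy()
    # Axis 2 is the width axis for (N, H, W, C) arrays; reverse it only for the chosen pairs.
    l_out[flip] = l_out[flip][:, :, ::-1]
    ab_out[flip] = ab_out[flip][:, :, ::-1]
    return l_out, ab_out, flip


rng = np.random.default_rng(0)
l_batch = np.arange(2 * 3 * 4).reshape(2, 3, 4, 1)
ab_batch = np.arange(2 * 3 * 4 * 2).reshape(2, 3, 4, 2)
l_aug, ab_aug, flipped = flip_pairs(l_batch, ab_batch, rng)
```

In TensorFlow, `augment` would typically be attached to the training set only, for example with `tf.data.Dataset.from_tensor_slices((l_train, ab_train)).map(augment)`, leaving the validation and test sets unflipped.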


# Notes about data preparation

In addition to the resizing and outlier filtering described above, we took the following steps:

* The filtered data was split into training (about 81%), validation (about 9%), and test (10%) sets.
* All L and AB channel data was then normalized from its source range to [-1, 1].
* Data augmentation: a simple but effective strategy was implemented by applying random horizontal flipping to the training data. The same flip is applied to the L input and its AB target, so every color value stays aligned with its grayscale pixel. Because each image can appear in two mirrored versions, this effectively doubles the variety of the training data without collecting new images. It teaches the model that an object's color does not depend on its left-right orientation, which makes the model more robust and less prone to overfitting. Other augmentations such as rotations, zooms, or color jitter could be used, but they add complexity and can introduce artifacts (for example, black padding from rotations). Horizontal flipping is a safe, computationally inexpensive augmentation that suits this task well.