<br>
<br>

![](https://upload.wikimedia.org/wikipedia/en/5/5f/Western_Institute_of_Technology_and_Higher_Education_logo.png)

**Instituto Tecnológico y de Estudios Superiores de Occidente**

**Master's in Data Science**

**Deep Learning**

# Project: object classification and localization #

<br>
<br>

* * *

Student: Daniel Nuño <br>
Professor: Dr. Francisco Cervantes <br>
Due date: March 26, 2023 <br>

* * *

<br>
<br>

## Libraries

In [1]:
import os
import random
import cv2 as cv
import numpy as np

import tensorflow as tf
from tensorflow import keras
from tensorflow.data import AUTOTUNE

#from tensorflow_addons.losses import GIoULoss

from tensorflow.keras.layers import Conv2D, Flatten, Dense, Input, GlobalAveragePooling2D, Dropout
from tensorflow.keras.applications import InceptionResNetV2
from tensorflow.keras.models import Model
from tensorflow.keras.utils import plot_model

In [21]:
import pickle

In [None]:
with open('training_list.data', 'rb') as filehandle:
    # Read the list back from the binary data stream
    training_list = pickle.load(filehandle)

with open('validation_list.data', 'rb') as filehandle:
    # Read the list back from the binary data stream
    validation_list = pickle.load(filehandle)

In [23]:
with open('training_list.data', 'wb') as filehandle:
    # Store the data as a binary data stream
    pickle.dump(training_list, filehandle)

with open('validation_list.data', 'wb') as filehandle:
    # Store the data as a binary data stream
    pickle.dump(validation_list, filehandle)

## Pre processing data and functions

Because the folder structure of the training set differs from that of the validation and test sets, each is pre-processed differently.

### Pre processing training data

Set the paths for the training images and the id-to-category files.

In [2]:
img_path= "C:/Users/nuno/Desktop/deep-learning-data/proyecto1/tiny-imagenet-200/train"

project_id_path = "C:/Users/nuno/Desktop/deep-learning-data/proyecto1/tiny-imagenet-200/wnids.txt"
all_id_cat_path = "C:/Users/nuno/Desktop/deep-learning-data/proyecto1/tiny-imagenet-200/words.txt"

Read the project ids into a list.

In [3]:
project_id_list = []
with open(project_id_path) as f:
    for line in f:
        project_id_list.append(line.strip())

Make sure it has unique values.

In [4]:
len(set(project_id_list)) == len(project_id_list)

True

Make an index of integers from it.

In [5]:
project_index_dict = {value: index for index, value in enumerate(project_id_list)}

In [6]:
len(project_index_dict.values())

200

Read file of categories as dictionary.

In [7]:
id_cat_dict = dict()
with open(all_id_cat_path, 'r') as f:
    for line in f:
        resulting_line = line.strip().split('\t')
        id_cat_dict[resulting_line[0]] = resulting_line[1]
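
Since the network predicts an integer index, the two dictionaries above can be chained in reverse to recover a readable label. The following is a minimal sketch, not part of the original notebook: the ids and category names are hypothetical placeholders standing in for the contents of wnids.txt and words.txt, and `index_project_dict` and `decode_category` are names introduced here for illustration.

```python
# Hypothetical stand-ins for the contents of wnids.txt and words.txt.
project_id_list = ["id_a", "id_b", "id_c"]
id_cat_dict = {
    "id_a": "category A",
    "id_b": "category B",
    "id_c": "category C",
    "id_z": "category outside the project",
}

# Same construction as in the notebook: id -> integer index.
project_index_dict = {value: index for index, value in enumerate(project_id_list)}

# Reverse lookup (assumed helper): integer index -> id.
index_project_dict = {index: value for value, index in project_index_dict.items()}


def decode_category(index):
    """Return the readable category name for a predicted class index."""
    return id_cat_dict[index_project_dict[index]]


print(decode_category(1))
```

A lookup like this would be needed when saving readable results for test images.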

List all files, separating the bounding box files from the images.

In [8]:
training_files_img = []
training_files_bb = []
for dirpath, dirnames, filenames in os.walk(img_path):
    for filename in filenames:
        path = os.path.join(dirpath, filename)
        if path.endswith('txt'):
            training_files_bb.append(path)
        else:
            training_files_img.append(path)

In [9]:
len(training_files_bb), len(training_files_img)

(200, 100000)

Process bounding boxes

In [10]:
bb_dict = dict()
for file in training_files_bb:
    with open(file, 'r') as f:
        for line in f:
            img_name, xmin, ymin, xmax, ymax = line.strip().split('\t')
            bb_dict[img_name] = [xmin, ymin, xmax, ymax]

Check the number of elements in the bounding box dictionary.

In [11]:
len(set(bb_dict.keys()))

100000

Create the data set list holding, for each image, its full path, category and bounding box. The double backslash **'\\'** used as the path separator might need to be updated depending on the user's system.

In TensorFlow, a tensor cannot hold more than one data type, which matters for the tf.data.Dataset.from_tensor_slices function. A workaround is to create a tensor of data type tf.string and, when each element is loaded (load_element function below), cast each field to the desired data type.

> It is not possible to have a tf.Tensor with more than one data type. It is possible, however, to serialize arbitrary data structures as strings and store those in tf.Tensors.

In [12]:
training_list = []
for count, file in enumerate(training_files_img):
    #get category and file name
    _, category_key, _, image_name = training_files_img[count].split('\\')
    #convert category to index from dictionary
    category = project_index_dict[category_key]
    #open image
    img = cv.imread(file)
    #get dimensions
    h, w, _ = img.shape
    #get correct size bounding box
    original_bb = bb_dict[image_name]
    rs_bb = [float(original_bb[0])/w,
            float(original_bb[1])/h,
            float(original_bb[2])/w,
            float(original_bb[3])/h,
            ]
    # treat it as list
    #example = (file, category, rs_bb)
    #treat it as one string delimited by commas
    example = "".join([file, ',',
                        str(category), ',',
                        str(rs_bb[0]), ',',
                        str(rs_bb[1]), ',',
                        str(rs_bb[2]), ',',
                        str(rs_bb[3])])
    #appended to final list
    training_list.append(example)
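
The loop above depends on `'\\'` being the path separator. As a hedged alternative, the sketch below splits the path with `pathlib.PureWindowsPath`, which treats both `/` and `\` as separators on any operating system (this assumes no file or folder name contains a literal backslash). The helper names `split_training_path` and `make_example`, the path, and the pixel values are all illustrative assumptions, not taken from the notebook.

```python
from pathlib import PureWindowsPath


def split_training_path(path):
    """Return (category_key, image_name) from .../train/<category>/images/<file>."""
    # PureWindowsPath accepts both '/' and '\\' as separators on every OS.
    parts = PureWindowsPath(path).parts
    return parts[-3], parts[-1]


def make_example(path, category, bb, width, height):
    """Scale the box to [0, 1] and pack all fields into one comma-delimited string."""
    xmin, ymin, xmax, ymax = (float(v) for v in bb)
    scaled = [xmin / width, ymin / height, xmax / width, ymax / height]
    return ",".join([path, str(category)] + [str(v) for v in scaled])


# Hypothetical path and pixel coordinates for a 64x64 image.
path = "C:/data/train\\n01917289\\images\\n01917289_323.JPEG"
category_key, image_name = split_training_path(path)
example = make_example(path, 164, ["1", "5", "60", "52"], width=64, height=64)
print(category_key, image_name)
print(example)
```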

Shuffle the list so that each batch draws examples at random.

In [13]:
random.shuffle(training_list)

Check the length of the list.

In [14]:
len(training_list)

100000

### Pre processing validation data

Path of images and annotations.

In [15]:
img_path_val = "C:/Users/nuno/Desktop/deep-learning-data/proyecto1/tiny-imagenet-200/val/images"
val_annotations_txt = "C:/Users/nuno/Desktop/deep-learning-data/proyecto1/tiny-imagenet-200/val/val_annotations.txt"

Process the annotations, which contain the category and bounding box of each image.

In [19]:
validation_list = list()
with open(val_annotations_txt, 'r') as f:
    for line in f:
        img_name, category_key, xmin, ymin, xmax, ymax = line.strip().split('\t')
        full_path = img_path_val + '/' + img_name
        category = project_index_dict[category_key]
        img = cv.imread(full_path)
        h, w, _ = img.shape
        rs_xmin = float(xmin)/w
        rs_ymin = float(ymin)/h
        rs_xmax = float(xmax)/w
        rs_ymax = float(ymax)/h
        rs_bb = [rs_xmin, rs_ymin, rs_xmax, rs_ymax]
        #treat it as list
        #example = (full_path, category, rs_bb)
        #treat it as one string delimited by commas
        example = "".join([full_path, ',',
                        str(category), ',',
                        str(rs_bb[0]), ',',
                        str(rs_bb[1]), ',',
                        str(rs_bb[2]), ',',
                        str(rs_bb[3])])
        validation_list.append(example)

Check training and validation have the same format.

In [101]:
training_list[0]

'C:/Users/nuno/Desktop/deep-learning-data/proyecto1/tiny-imagenet-200/train\\n01917289\\images\\n01917289_323.JPEG,164,0.015625,0.078125,0.9375,0.8125'

In [20]:
validation_list[:3]

['C:/Users/nuno/Desktop/deep-learning-data/proyecto1/tiny-imagenet-200/val/images/val_0.JPEG,163,0.0,0.5,0.6875,0.96875',
 'C:/Users/nuno/Desktop/deep-learning-data/proyecto1/tiny-imagenet-200/val/images/val_1.JPEG,1,0.8125,0.859375,0.890625,0.921875',
 'C:/Users/nuno/Desktop/deep-learning-data/proyecto1/tiny-imagenet-200/val/images/val_2.JPEG,132,0.0625,0.0,0.9375,0.859375']

### Load function and dataset

Because we want TensorFlow to train in batches, it is important to use TensorFlow's own classes. A TensorFlow Dataset loads the data in small groups instead of loading everything into memory at once.

> Supports writing descriptive and efficient input pipelines. Dataset usage follows a common pattern:

> - Create a source dataset from your input data.
> - Apply dataset transformations to preprocess the data.
> - Iterate over the dataset and process the elements.

> Iteration happens in a streaming fashion, so the full dataset does not need to fit into memory.
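
As a plain-Python analogy for this streaming behaviour (not how tf.data is implemented), a generator can hand out fixed-size batches while only ever holding one batch in memory. The helper name `batched` and the example items are assumptions made for illustration.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield lists of up to batch_size items without materializing the input."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# A generator stands in for a source too large to hold in memory.
source = (f"example_{i}" for i in range(5))
batches = list(batched(source, 2))
print(batches)
```

The last batch is shorter than the others, which is also what a Dataset's batch step does by default when the data does not divide evenly.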

The following function loads and processes the images element by element. It is meant to be used in the prefetched dataset pipeline below. Each element is a string delimited by commas:
- full image path, category, x min, y min, x max, y max.

- category is an integer obtained from the index. This allows using Sparse Categorical Cross Entropy as the loss, which is expected to save memory and computation because each class is a single integer rather than a whole vector. The output layer uses a softmax activation with num_classes neurons. y_pred is [batch_size, num_classes].
- bounding box is a list of 4 float numbers: x_min, y_min, x_max, y_max. The output layer uses a linear activation with 4 neurons. y_pred is [batch_size, 4].
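
To illustrate why an integer target is enough, the sketch below computes the cross entropy of one example twice: once from the class index and once from the equivalent one-hot vector. It is a hand-written illustration with made-up probabilities, not the Keras implementation, and the function names are assumptions.

```python
import math


def sparse_cross_entropy(class_index, probabilities):
    """Loss for one example when the target is a single integer."""
    return -math.log(probabilities[class_index])


def one_hot_cross_entropy(one_hot, probabilities):
    """Same loss when the target is a full one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probabilities))


# Softmax output for 3 hypothetical classes; the true class is index 1.
probabilities = [0.1, 0.7, 0.2]
sparse_loss = sparse_cross_entropy(1, probabilities)
dense_loss = one_hot_cross_entropy([0, 1, 0], probabilities)
print(sparse_loss, dense_loss)
```

Both calls give the same value. The sparse form only reads one probability instead of multiplying through a vector that is mostly zeros.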

The function returns a pair of input and targets (category and bounding box). This format is required by the *model fit method*:

> Args
> x: Input data. It could be: ...A tf.data dataset. Should return a tuple of either (inputs, targets) or (inputs, targets, sample_weights)...

In [7]:
def load_element(element):
    #make tensors list delimited by ,
    element = tf.strings.split(element, sep=",")
    #load image
    img = tf.io.read_file(element[0])
    #make sure it has 3 channels
    img = tf.image.decode_jpeg(img, channels=3)
    #convert to float [0,1) (disabled: the model rescales the [0, 255] input itself)
    #img = tf.image.convert_image_dtype(img, dtype=tf.float16)
    #resize
    img = tf.image.resize(img, (128, 128))
    #category
    #category = tf.constant(element[1])
    category =tf.strings.to_number(element[1], tf.int32)
    #bounding box
    x_min = tf.strings.to_number(element[2])
    y_min = tf.strings.to_number(element[3])
    x_max = tf.strings.to_number(element[4])
    y_max = tf.strings.to_number(element[5])
    #bb = [y_min, x_min, y_max, x_max]
    bb = [x_min, y_min, x_max, y_max]

    labels = {'class_output': category, 'box_output':bb}

    return (img, labels)
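
The field parsing in load_element can be mirrored in plain Python to check records without TensorFlow. This is a minimal sketch: `parse_record` is a hypothetical helper, the directory in the record is made up, and the image loading and resizing steps are left out. It splits from the right, so a comma inside the path would not shift the fields. Splitting on every comma, as the TensorFlow version does, assumes that paths contain none.

```python
def parse_record(record):
    """Split a record into (path, category, [x_min, y_min, x_max, y_max])."""
    # Split from the right so only the last five commas act as separators.
    path, category, *box = record.rsplit(",", 5)
    return path, int(category), [float(v) for v in box]


# Hypothetical record in the same format as the training and validation lists.
record = "some/dir/val_0.JPEG,163,0.0,0.5,0.6875,0.96875"
path, category, box = parse_record(record)
print(path, category, box)
```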

In [8]:
image, labels = load_element(training_list[0])

In [9]:
image

<tf.Tensor: shape=(128, 128, 3), dtype=float32, numpy=
array([[[ 36.    , 104.    ,  89.    ],
        [ 32.5   , 101.25  ,  86.    ],
        [ 25.5   ,  95.75  ,  80.    ],
        ...,
        [ 77.25  , 174.5   , 153.75  ],
        [ 69.75  , 167.5   , 145.25  ],
        [ 66.    , 164.    , 141.    ]],

       [[ 28.75  ,  96.75  ,  81.75  ],
        [ 25.875 ,  94.4375,  79.25  ],
        [ 20.125 ,  89.8125,  74.25  ],
        ...,
        [ 77.3125, 173.9375, 152.375 ],
        [ 70.4375, 167.3125, 144.625 ],
        [ 67.    , 164.    , 140.75  ]],

       [[ 14.25  ,  82.25  ,  67.25  ],
        [ 12.625 ,  80.8125,  65.75  ],
        [  9.375 ,  77.9375,  62.75  ],
        ...,
        [ 77.4375, 172.8125, 149.625 ],
        [ 71.8125, 166.9375, 143.375 ],
        [ 69.    , 164.    , 140.25  ]],

       ...,

       [[ 26.25  ,  85.25  ,  63.25  ],
        [ 27.3125,  86.9375,  64.5625],
        [ 29.4375,  90.3125,  67.1875],
        ...,
        [ 47.25  , 131.625 , 107.4

In [10]:
labels

{'class_output': <tf.Tensor: shape=(), dtype=int32, numpy=164>,
 'box_output': [<tf.Tensor: shape=(), dtype=float32, numpy=0.015625>,
  <tf.Tensor: shape=(), dtype=float32, numpy=0.078125>,
  <tf.Tensor: shape=(), dtype=float32, numpy=0.9375>,
  <tf.Tensor: shape=(), dtype=float32, numpy=0.8125>]}

**tf.data.Dataset** represents a potentially large set of elements. During training, iteration happens in a streaming fashion.

In [11]:
bath_size = 16
train_dataset = tf.data.Dataset.from_tensor_slices(training_list)
train_dataset = (train_dataset
                 .shuffle(len(training_list))
                 .map(load_element, num_parallel_calls=AUTOTUNE)
                 .cache()
                 .batch(bath_size)
                 .prefetch(AUTOTUNE)
                 )
train_dataset.element_spec

(TensorSpec(shape=(None, 128, 128, 3), dtype=tf.float32, name=None),
 {'class_output': TensorSpec(shape=(None,), dtype=tf.int32, name=None),
  'box_output': TensorSpec(shape=(None, 4), dtype=tf.float32, name=None)})

In [12]:
val_dataset = tf.data.Dataset.from_tensor_slices(validation_list)
val_dataset = (val_dataset
                 .shuffle(len(validation_list))
                 .map(load_element, num_parallel_calls = AUTOTUNE)
                 .cache()
                 .batch(bath_size)
                 .prefetch(AUTOTUNE)
                 )

val_dataset.element_spec

(TensorSpec(shape=(None, 128, 128, 3), dtype=tf.float32, name=None),
 {'class_output': TensorSpec(shape=(None,), dtype=tf.int32, name=None),
  'box_output': TensorSpec(shape=(None, 4), dtype=tf.float32, name=None)})

## Define error (loss/metrics) function

### Generalized intersection over union

In [185]:
def my_GIoU(bb_true, bb_pred):
    #make zero as tensor
    zero = tf.convert_to_tensor(0.0, bb_true.dtype)
    #split each box tensor into its 4 coordinates
    Ax1, Ay1, Ax2, Ay2 = tf.unstack(bb_true, 4, axis=-1)
    Bx1, By1, Bx2, By2 = tf.unstack(bb_pred, 4, axis=-1)

    #for the predicted bounding box make sure bx2 >= bx1 and by2 >= by1
    bx1 = tf.math.minimum(Bx1, Bx2)
    by1 = tf.math.minimum(By1, By2)
    bx2 = tf.math.maximum(Bx1, Bx2)
    by2 = tf.math.maximum(By1, By2)

    #calculate area of true bounding box
    A_area = (Ax2 - Ax1)*(Ay2 - Ay1)
    #calculate area of predicted bounding box
    B_area = (bx2 - bx1)*(by2 - by1)
    
    #calculate the intersection of true and pred
    #the overlap box starts at the largest min corner
    #and ends at the smallest max corner
    x_inter_1 = tf.math.maximum(bx1, Ax1)
    y_inter_1 = tf.math.maximum(by1, Ay1)
    x_inter_2 = tf.math.minimum(bx2, Ax2)
    y_inter_2 = tf.math.minimum(by2, Ay2)
    #get width (zero when the boxes do not overlap)
    w_inter = tf.maximum(zero, x_inter_2 - x_inter_1)
    #get height (zero when the boxes do not overlap)
    h_inter = tf.maximum(zero, y_inter_2 - y_inter_1)
    #intersection
    I = w_inter * h_inter
    #area over union
    area_union = (B_area + A_area) - I
    iou = tf.math.divide_no_nan(I, area_union)

    #find the smallest bounding box C that encloses both A and B
    Cx1 = tf.math.minimum(bx1, Ax1)
    Cy1 = tf.math.minimum(by1, Ay1)
    Cx2 = tf.math.maximum(bx2, Ax2)
    Cy2 = tf.math.maximum(by2, Ay2)

    #calculate the C area
    C_area = (Cx2 - Cx1) * (Cy2 - Cy1)
    #calculate giou
    giou = iou - tf.math.divide_no_nan(C_area - area_union, C_area)
    #calculate mean of all observations
    m_giou = tf.reduce_mean(giou, axis=0)

    return m_giou

def my_GIoULoss(bb_true, bb_pred):
    return 1.0 - my_GIoU(bb_true, bb_pred)
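
A plain-Python sketch of the same quantity for a single pair of boxes can serve as a cross-check on small hand-computed cases. The function name `giou` and the boxes are assumptions made for illustration. Boxes are given as (x_min, y_min, x_max, y_max) and are assumed to be well ordered.

```python
def giou(box_a, box_b):
    """GIoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Intersection: clamp to zero when the boxes do not overlap.
    w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = w * h
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0
    # Area of the smallest box enclosing both inputs.
    enclosing = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    if enclosing == 0:
        return iou
    return iou - (enclosing - union) / enclosing


overlapping = giou((0, 0, 2, 2), (1, 1, 3, 3))  # IoU 1/7, penalty 2/9
identical = giou((0, 0, 1, 1), (0, 0, 1, 1))
disjoint = giou((0, 0, 1, 1), (2, 0, 3, 1))
print(overlapping, identical, disjoint)
```

Identical boxes give 1, and boxes that do not touch give a negative value. This is what lets the loss keep providing a gradient when the IoU itself is zero.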

### Custom classification accuracy error

In [None]:
def my_sparse_category_accurary(y_true, y_pred):
    #y_true is expected to be an array of integer class indices
    #y_pred is a numpy.ndarray with the normalized probabilities
    #returns 1 for each misclassified example and 0 for each correct one
    acc = np.not_equal(y_true, np.argmax(y_pred, axis=1)).astype(int)
    return acc

### Custom bounding box IoU

In [None]:
def my_IoU_np(y_true, y_pred):
    #model.predict returns a numpy.ndarray, so convert to tensors because the IoU logic below already works on tensors
    bb_true = tf.convert_to_tensor(y_true, tf.float32)
    bb_pred = tf.convert_to_tensor(y_pred, tf.float32)

    #make zero as tensor
    zero = tf.convert_to_tensor(0.0, bb_true.dtype)
    #split each box tensor into its 4 coordinates
    Ax1, Ay1, Ax2, Ay2 = tf.unstack(bb_true, 4, axis=-1)
    Bx1, By1, Bx2, By2 = tf.unstack(bb_pred, 4, axis=-1)

    #for the predicted bounding box make sure bx2 >= bx1 and by2 >= by1
    bx1 = tf.math.minimum(Bx1, Bx2)
    by1 = tf.math.minimum(By1, By2)
    bx2 = tf.math.maximum(Bx1, Bx2)
    by2 = tf.math.maximum(By1, By2)

    #calculate area of true bounding box
    A_area = (Ax2 - Ax1)*(Ay2 - Ay1)
    #calculate area of predicted bounding box
    B_area = (bx2 - bx1)*(by2 - by1)
    
    #calculate the intersection of true and pred
    #the overlap box starts at the largest min corner
    #and ends at the smallest max corner
    x_inter_1 = tf.math.maximum(bx1, Ax1)
    y_inter_1 = tf.math.maximum(by1, Ay1)
    x_inter_2 = tf.math.minimum(bx2, Ax2)
    y_inter_2 = tf.math.minimum(by2, Ay2)
    #get width (zero when the boxes do not overlap)
    w_inter = tf.maximum(zero, x_inter_2 - x_inter_1)
    #get height (zero when the boxes do not overlap)
    h_inter = tf.maximum(zero, y_inter_2 - y_inter_1)
    #intersection
    I = w_inter * h_inter
    #area over union
    area_union = (B_area + A_area) - I
    iou = tf.math.divide_no_nan(I, area_union)

    #convert back to numpy
    iou = iou.numpy()
    #error indicator: 0 if the IoU is over 50%, otherwise 1
    over_50 = np.where(iou>0.5, 0, 1)

    return over_50

### Combine both to return a correct match

In [None]:
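def custom_error(class_true, class_pred, bb_true, bb_pred):
    #1 where the predicted class is wrong, 0 otherwise
    d = my_sparse_category_accurary(class_true, class_pred)
    #1 where the IoU is 0.5 or less, 0 otherwise
    f = my_IoU_np(bb_true, bb_pred)
    #an example counts as an error if either the class or the box is wrong
    e = np.maximum(d, f)
    return e.mean()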
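
The combined criterion can be checked by hand on a few cases. The sketch below is a plain-Python illustration, assuming that an example is correct only when the class matches and the IoU exceeds 0.5. The helper names, classes and boxes are made up for the example.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = w * h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def example_error(class_true, class_pred, box_true, box_pred, threshold=0.5):
    """Return 0 when the class matches and the IoU exceeds the threshold, else 1."""
    correct = class_true == class_pred and iou(box_true, box_pred) > threshold
    return 0 if correct else 1


cases = [
    (3, 3, (0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 0.8)),  # right class, IoU 0.8
    (3, 5, (0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0)),  # wrong class, perfect box
    (3, 3, (0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 0.4)),  # right class, IoU 0.4
]
errors = [example_error(*case) for case in cases]
mean_error = sum(errors) / len(errors)
print(errors, mean_error)
```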

## Model

Pre-trained models have already done the heavy lifting of finding a tested architecture and fine-tuned parameters. They are ready-made architectures with pre-trained weights.

For EfficientNetV2, by default input preprocessing is included as a part of the model (as a Rescaling layer), and thus tf.keras.applications.efficientnet_v2.preprocess_input is actually a pass-through function. In this use case, EfficientNetV2 models expect their inputs to be float tensors of pixels with values in the [0-255] range. At the same time, preprocessing as a part of the model (i.e. Rescaling layer) can be disabled by setting include_preprocessing argument to False. With preprocessing disabled EfficientNetV2 models expect their inputs to be float tensors of pixels with values in the [-1, 1] range.

### Load architecture

We first download the model without its top, since we add our own custom top and resize the input.

In [6]:
base_model = tf.keras.applications.EfficientNetV2M(
    include_top=False,
    weights="imagenet",
    input_tensor=Input(shape=(128,128,3)),
    input_shape=(128,128,3),
    pooling=None,
    include_preprocessing=True)

### GIoU loss from tensorflow addons

GIoU loss from TensorFlow Addons is a function that takes the following arguments.

- y_true: true targets tensor. The coordinates of each bounding box are encoded as [y_min, x_min, y_max, x_max].
- y_pred: predictions tensor. The coordinates of each bounding box are encoded as [y_min, x_min, y_max, x_max].
- mode: one of ['giou', 'iou'], decides whether to calculate the GIoU or the IoU loss.
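
The boxes in this notebook are stored as [x_min, y_min, x_max, y_max], so a reordering step would be needed to match the encoding listed above. A minimal sketch follows. The helper name `xyxy_to_yxyx` is hypothetical, and the box values are the normalized coordinates of the first training example.

```python
def xyxy_to_yxyx(box):
    """Reorder [x_min, y_min, x_max, y_max] into [y_min, x_min, y_max, x_max]."""
    x_min, y_min, x_max, y_max = box
    return [y_min, x_min, y_max, x_max]


box = [0.015625, 0.078125, 0.9375, 0.8125]
reordered = xyxy_to_yxyx(box)
print(reordered)
```

Applying the function twice returns the original order, so the same helper converts in both directions.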

### Define the rest of the multi-output model for the two tasks

We won’t be training from scratch, so we use transfer learning and split the output in two.

- class_output
- box_output

In [13]:
base_model.trainable=False
base_model_output = base_model.output

no_of_classes = 200

# GlobalAveragePooling2D could be used instead of Flatten: it reduces
# parameters and helps control overfitting.
#flattened_output = GlobalAveragePooling2D()(base_model_output)
flattened_output = Flatten()(base_model_output)

# Create our classification head; the final layer has
# output units = no. of classes
class_prediction = Dense(256, activation="relu")(flattened_output)
#class_prediction = Dense(256, activation="relu")(class_prediction)
#class_prediction = Dropout(0.2)(class_prediction)
#class_prediction = Dense(256, activation="relu")(class_prediction)
class_prediction = Dropout(0.2)(class_prediction )
#class_prediction = Dense(256, activation="relu")(class_prediction)
class_prediction = Dense(no_of_classes, activation='softmax',name="class_output")(class_prediction)

# Create Our Localization Head, final layer contains 4 nodes for x1,y1,x2,y2
# Respectively.
box_output = Dense(256, activation="relu")(flattened_output)
box_output = Dense(128, activation="relu")(box_output)
box_output = Dropout(0.2)(box_output)
box_output = Dense(64, activation="relu")(box_output)
box_output = Dropout(0.2)(box_output)
box_output = Dense(32, activation="relu")(box_output)
box_predictions = Dense(4, activation='linear', name= "box_output")(box_output)

# Now combine the two heads
model = Model(inputs=base_model.input, outputs=[class_prediction, box_predictions])

In [14]:
model.outputs

[<KerasTensor: shape=(None, 200) dtype=float32 (created by layer 'class_output')>,
 <KerasTensor: shape=(None, 4) dtype=float32 (created by layer 'box_output')>]

### Compile and train

In [15]:
# For classification we use sparse categorical crossentropy
# For the bounding boxes we use the GIoU loss defined above
losses = { 
    "class_output": tf.keras.losses.SparseCategoricalCrossentropy(),
    "box_output": my_GIoULoss
    }

# For the class labels we want to know the accuracy
# And for the bounding boxes we want to know the mean squared error
metrics = {
    'class_output': tf.keras.metrics.SparseCategoricalAccuracy(), 
    'box_output': 'mse'
    }

model.compile(optimizer='adam', loss=losses, metrics=metrics)

In [16]:
model.fit(x=train_dataset, validation_data=val_dataset , epochs=1)



<keras.callbacks.History at 0x1e8df7b9670>

## Test data

### Pipeline for test and new data

1. Define the folder path for the images.
2. Get the image names.
3. For each image:
    - load the image
    - resize it
    - get the category and bounding box from the neural network
    - save the result to a list containing the image name, category and bounding box
4. Save the list of results to csv.
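
The steps above can be sketched with the standard library alone. This is only an outline under stated assumptions: `run_pipeline`, `dummy_predict` and the CSV column names are hypothetical. The prediction function is a placeholder that returns a fixed answer, where the real pipeline would load the image, resize it and call the trained network.

```python
import csv
import os
import tempfile


def run_pipeline(image_dir, predict_fn, output_csv):
    """Predict every file in image_dir and write one CSV row per image."""
    results = []
    for name in sorted(os.listdir(image_dir)):
        # predict_fn is expected to load, resize and run the network.
        category, box = predict_fn(os.path.join(image_dir, name))
        results.append([name, category, *box])
    with open(output_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["image", "category", "x_min", "y_min", "x_max", "y_max"])
        writer.writerows(results)
    return results


def dummy_predict(path):
    """Placeholder for the trained model: returns a fixed class and box."""
    return 0, [0.0, 0.0, 1.0, 1.0]


# Demonstration on empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    image_dir = os.path.join(tmp, "images")
    os.mkdir(image_dir)
    for name in ("b.JPEG", "a.JPEG"):
        open(os.path.join(image_dir, name), "wb").close()
    output_csv = os.path.join(tmp, "results.csv")
    results = run_pipeline(image_dir, dummy_predict, output_csv)
    with open(output_csv, newline="") as f:
        rows = list(csv.reader(f))
print(rows)
```

Passing the prediction step in as a function keeps the file handling testable without a trained model.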
