###### Computer vision: Classifying car make, model and year

Computer vision could potentially be used to automate traffic censuses and other tasks that require identification of vehicles.
The <a href="https://www.tensorflow.org/datasets/catalog/cars196">cars196</a> dataset contains 16,185 images of 196 different types of cars, which
can be used to train a supervised learning system to determine the make and model of a vehicle in a photograph.

# Prelude
Stanford Cars is the collection of car images that makes up the "Cars196" dataset. The copy of "Cars196" bundled with TensorFlow Datasets can no longer be downloaded, because the original author has removed the original link to the data. The copy used here is therefore the only one available, apart from another "Stanford Cars" dataset on Kaggle.

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds
from matplotlib import pyplot as plt

First, let's get the data. The dataset is read from a Kaggle input directory, because setting `download=True` only triggers errors in TensorFlow Datasets.

In [None]:
DATA_DIR = '/kaggle/input/cars196'

[train_ds, test_ds], ds_info = tfds.load(
    "cars196",
    split=["train", "test"],
    as_supervised=True,  # Include labels
    with_info=True,
    download=False,
    data_dir=DATA_DIR,
)

Now, let's use the built-in visualization function to show some example images:

In [None]:
tfds.visualization.show_examples(train_ds, ds_info)

## Standardizing the data
Our raw images have a variety of sizes. In addition, each pixel consists of 3 integer values between 0 and 255 (RGB level values). This isn't a great fit for feeding a neural network. We need to do 2 things:

* Standardize to a fixed image size.
* Normalize pixel values between -1 and 1. We'll do this using a Normalization layer as part of the model itself.

In general, it's a good practice to develop models that take raw data as input, as opposed to models that take already-preprocessed data. The reason being that, if your model expects preprocessed data, any time you export your model to use it elsewhere (in a web browser, in a mobile app), you'll need to reimplement the exact same preprocessing pipeline. This gets very tricky very quickly. So we should do the least possible amount of preprocessing before hitting the model.

Here, we'll do image resizing in the data pipeline (because a deep neural network can only process contiguous batches of data), and we'll do the input value scaling as part of the model, when we create it.
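To see what this input scaling amounts to, here is a minimal NumPy sketch of the arithmetic, assuming a mean of 127.5 and a variance of 127.5² for every channel. It applies the usual normalization formula `(x - mean) / sqrt(variance)` by hand; it is not the Keras layer itself, and `scale_pixels` is a name invented for this illustration.

```python
import numpy as np

def scale_pixels(pixels, mean=127.5, variance=127.5 ** 2):
    """Apply (x - mean) / sqrt(variance) element-wise (hypothetical helper)."""
    pixels = np.asarray(pixels, dtype=np.float64)
    return (pixels - mean) / np.sqrt(variance)

# The darkest, middle and brightest possible pixel values
raw = np.array([0.0, 127.5, 255.0])
scaled = scale_pixels(raw)
print(scaled)
```

The smallest pixel value maps to -1, the midpoint to 0 and the largest to +1, which is the range the text asks for.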

In [None]:
height, width = 224, 224

train_ds = train_ds.map(lambda x, y: (tf.image.resize(x,(height,width)), y))
# Remember that y is the dependent variable (the label);
# in this case, y is the category of the car.

test_ds = test_ds.map(lambda x, y: (tf.image.resize(x,(height,width)), y)) 

## Preprocessing: Resizing and random data augmentation

When you don't have a large image dataset, it's a good practice to artificially introduce sample diversity by applying random yet realistic transformations to the training images, such as random horizontal flipping or small random rotations. This helps expose the model to different aspects of the training data while slowing down overfitting.
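As a rough illustration of such transformations, the NumPy sketch below shifts an image by a few pixels (pad, then crop back at a random offset) and flips it horizontally with probability 0.5. The helper name `random_shift_and_flip`, the toy 8×8 image and the padding of 3 pixels per side are assumptions made for this example; the notebook itself uses `tf.image` functions for this.

```python
import numpy as np

def random_shift_and_flip(image, pad=3, rng=None):
    """Pad by `pad` pixels per side, crop back at a random offset, maybe flip."""
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = image.shape
    # Zero-pad height and width only; the channel axis is left alone
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)))
    # Pick the top-left corner of a crop with the original size
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    cropped = padded[top:top + h, left:left + w, :]
    # Flip left-right half of the time
    if rng.random() < 0.5:
        cropped = cropped[:, ::-1, :]
    return cropped

image = np.arange(8 * 8 * 3, dtype=np.float32).reshape(8, 8, 3)
augmented = random_shift_and_flip(image, rng=np.random.default_rng(0))
print(augmented.shape)
```

The output keeps the input shape, so augmented images can be batched exactly like the originals.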

Additionally, let's batch the data and use caching and prefetching to optimize load speed:

In [None]:
batch_size = 32  # 32 is a commonly used batch size

def augment_func(image,label):
  # Pad the image by 6 pixels in each dimension, then randomly crop it back to
  # the original size, so the position of the car shifts slightly
  image = tf.image.resize_with_crop_or_pad(image, height + 6, width + 6)
  image = tf.image.random_crop(image, size=[height, width, 3])
  image = tf.image.random_flip_left_right(image)
  # Randomize hue, contrast and saturation so the model can identify the car
  # even if it is a different color (parameter values taken from tutorials)
  image = tf.image.random_hue(image, 0.2)
  image = tf.image.random_contrast(image, 0.5, 2)
  image = tf.image.random_saturation(image, 0, 2)
  return image, label

# We randomize the characteristics that are not important: a human can identify
# a car no matter its color or position in the image, but machine learning
# algorithms have struggled with this.

# cache() and prefetch() make loading faster; shuffle() ensures the images are
# not in the same order every time. Augmentation is mapped after cache(), so
# the random transformations are redrawn on every pass over the data.
train_ds = train_ds.cache().map(augment_func).shuffle(100).batch(batch_size).prefetch(buffer_size=10)
test_ds = test_ds.cache().batch(batch_size).prefetch(buffer_size=10)

Let's visualize the first image of each of the first 18 batches after the random transformations.

Note that because the augmentations in the previous cell are applied randomly, these images will look different every time they are run through the model during training.

In [None]:
for (image_batch, label_batch) in train_ds.take(18):
    print(label_batch)

In [None]:
for i,(image_batch, label_batch) in enumerate(train_ds.take(18)):
    print(i,label_batch)

In [None]:
plt.figure(figsize=(10,20))

for i,(image_batch,label) in enumerate(train_ds.take(18)):
    # enumerate() yields each item together with its index; we take the first 18 batches
    ax=plt.subplot(6, 3, i + 1) # subplot indices start at 1
    # Take the first image of the batch, convert the tensor to a NumPy array and cast it to integers
    plt.imshow(image_batch[0].numpy().astype("int32"))
    plt.title(ds_info.features["label"].names[int(label[0])])
    plt.axis("off")

# This plots the first image of each of the first 18 batches

## Build a model

Now let's build a model.

1. We add a Normalization layer to scale input values (initially in the [0, 255] range) to the [-1, 1] range, because this is the format that is expected by the pre-trained model that comes next.
2. We start with a pre-trained model that's trained on the [ImageNet](http://image-net.org/about-overview) dataset, which includes a large number of images with a large number of different labels, but doesn't include as much specificity regarding vehicle types as the cars196 dataset does. Training these models from scratch is tricky; it is much easier to start with a pre-trained model and fine-tune it for a different task.
3. We add our own classification layer at the end of the model, with 196 outputs representing our 196 vehicle classes, and "softmax" activation, which forces the output values to all be between 0 and 1 and to sum to 1.
4. We add a Dropout layer before the above classification layer, for regularization.


We need the number of outputs in the final layer to equal the number of variables or classes we want to predict: in this case, 196 vehicle types.
We use a softmax activation on the final layer for classification problems, but if we wanted to use this model for regression we would only have to change the number of desired outputs and set `activation=None`.
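As a small illustration of what the softmax activation does, the NumPy sketch below turns a vector of arbitrary scores into values that lie between 0 and 1 and sum to 1. The four scores are made up for the example; the real model produces 196 of them.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1, -1.0])  # made-up scores for 4 classes
probs = softmax(logits)
print(probs, probs.sum())
```

The largest score still gets the largest probability, so taking the argmax of the softmax output gives the predicted class.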

Here, we import the libraries needed for the first model and for the other CNN architectures.

In [None]:
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, GlobalAveragePooling2D, Dropout
from tensorflow.keras.models import load_model
from tensorflow.keras.callbacks import ModelCheckpoint

ModelCheckpoint saves the model as training progresses. It records the result of each epoch and, with `save_best_only=True`, keeps only the best one. The model is saved to the Kaggle working directory.

In [None]:
checkpoint_filepath = '/kaggle/working/'
model_checkpoint_callback = ModelCheckpoint(
        filepath=checkpoint_filepath,
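        # The models below track SparseCategoricalAccuracy, so the validation
        # metric is logged under this name rather than 'val_accuracy'
        monitor='val_sparse_categorical_accuracy',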
        mode='max',
        save_best_only=True)
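The idea behind `save_best_only=True` with `mode='max'` can be sketched in plain Python: only an epoch whose monitored value beats every earlier epoch would be written to disk. This is a simplified illustration with made-up accuracy values, not Keras's actual implementation, and `best_epoch` is a hypothetical helper.

```python
def best_epoch(metric_per_epoch):
    """Return (epoch_index, value) of the highest metric, as 'max' mode would keep."""
    best_index, best_value = None, float("-inf")
    for epoch, value in enumerate(metric_per_epoch):
        if value > best_value:  # strictly better, so this epoch would be saved
            best_index, best_value = epoch, value
    return best_index, best_value

# Made-up validation accuracies for five epochs
val_acc = [0.10, 0.25, 0.22, 0.31, 0.30]
kept = best_epoch(val_acc)
print(kept)
```

Here the fourth epoch (index 3) is the one that would remain saved, even though training continued for one more epoch.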

## Fine-tune the model

We use a relatively low learning rate to prevent the model from unlearning what it learned when being trained on the larger ImageNet dataset.

Finally, we can save the model for later use. If you are doing this on Kaggle, there is an option to download the saved file in the panel on the right side of the screen. It is advisable to use a GPU as the accelerator, as training the model on a CPU is exceptionally slow.

# First Model: ResNet50

In [None]:
base_model = ResNet50(
    weights="imagenet",
    input_shape=(height, width, 3),
    include_top=False, 
)  

#Keep the base model trainable so that it is fine-tuned (set to False to freeze it)
base_model.trainable = True

#Declare input layer
inputs = tf.keras.Input(shape=(height, width, 3))

#Normalization Layer
norm_layer = keras.layers.experimental.preprocessing.Normalization()
mean = np.array([127.5] * 3)
var = mean ** 2
# Scale inputs to [-1, +1]
x = norm_layer(inputs)
norm_layer.set_weights([mean, var])

#ResNet50 Architecture
x = base_model(x, training=False) 
x = keras.layers.GlobalAveragePooling2D()(x) 

#Dropout to improve result and reduce overfitting
x = keras.layers.Dropout(0.5)(x)  
x = keras.layers.Dense(512, activation='relu')(x)
num_outputs = ds_info.features['label'].num_classes 
outputs = keras.layers.Dense(num_outputs, activation="softmax")(x) 

#Here we don't use Sequential as it provides a worse result.
model = keras.Model(inputs, outputs)

#Summary of the model layers
model.summary()

> Fine Tuning
* Training for 50 epochs
* A low learning rate, so the model does not forget what it has learnt
* Sparse categorical cross-entropy, as the labels are integer class indices rather than one-hot vectors
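The sparse variant of categorical cross-entropy takes integer class indices as labels instead of one-hot vectors, but computes the same quantity. The NumPy sketch below checks this on two made-up predictions over three classes; both helper functions are written for this illustration and are not the Keras implementations.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs):
    """Mean of -log(probability assigned to the true class), labels as integers."""
    labels = np.asarray(labels)
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(picked)))

def categorical_crossentropy(one_hot, probs):
    """Same loss, but with labels given as one-hot rows."""
    return float(np.mean(-np.sum(one_hot * np.log(probs), axis=1)))

# Made-up predicted probabilities for two images and three classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = [0, 1]                 # integer class indices
one_hot = np.eye(3)[labels]     # the same labels, one-hot encoded

sparse_loss = sparse_categorical_crossentropy(labels, probs)
dense_loss = categorical_crossentropy(one_hot, probs)
print(sparse_loss, dense_loss)
```

With 196 classes, integer labels avoid building a 196-element vector per image, which is the practical reason to prefer the sparse form when the dataset already provides class indices.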

In [None]:
learning_rate = 1.0e-5 #low learning rate so the model does not forget 

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(), #the labels are integer class indices
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],

)

epochs = 50
history = model.fit(train_ds, epochs=epochs, validation_data=test_ds)

In [None]:
#Plotting the validation accuracy graph
plt.plot(history.history['val_sparse_categorical_accuracy'])
plt.title('Model validation accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['validation'], loc='upper left')
plt.show()

In [None]:
#Callback for EarlyStopping when the model has stopped improving
callback = keras.callbacks.EarlyStopping(monitor='loss', patience =5)
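The `patience` setting can be pictured with a short plain-Python sketch: training stops once the monitored loss has failed to improve for `patience` consecutive epochs. This is a simplified model of the behaviour (it ignores options such as a minimum improvement threshold), the loss values are made up, and `stopping_epoch` is a hypothetical helper rather than part of Keras.

```python
def stopping_epoch(losses, patience=5):
    """Return the epoch index at which training would stop, or None if it never does."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            # Improvement: remember it and reset the counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up training losses: the best value (1.2) is reached at epoch 2
losses = [2.0, 1.5, 1.2, 1.3, 1.25, 1.21, 1.4, 1.3, 1.1]
print(stopping_epoch(losses))
```

In this example training would stop at epoch 7, five epochs after the last improvement, so the final value in the list is never reached.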

In [None]:
#This run is just to test the saved model
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1.0e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(), #the labels are integer class indices
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],

)

history = model.fit(train_ds, epochs=5, callbacks=[callback], validation_data=test_ds)

In [None]:
#Saving the model into a .h5 file
model.save("/kaggle/working/model_resnet_final.h5", save_format="h5")

In [None]:
plt.plot(history.history['val_sparse_categorical_accuracy'])
plt.title('Model validation accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['validation'], loc='upper left')
plt.show()

In [None]:
plt.plot(history.history['sparse_categorical_accuracy'])
plt.title('model training accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['train'], loc='upper left')
plt.show()

In [None]:
#This helps us to load the saved model
model = load_model('/kaggle/input/resnetv1/tensorflow2/v4.1/3/model_resnet_final.h5')

In [None]:
#Import scikit-learn metrics to calculate and evaluate the scores of the model
from sklearn.metrics import f1_score, precision_score, recall_score

In [None]:
#Create two empty lists for labels and images
labels = []
images = []

#Set the size of the plotted figure
plt.figure(figsize=(10,20))

#From the first ten batches of the test dataset, plot the first image with its integer label
#(test_ds is batched, so each element holds a batch of images and a batch of labels)
for i,(image_batch,label) in enumerate(test_ds.take(10)):
    ax=plt.subplot(6, 3, i + 1)
    plt.imshow(image_batch[0].numpy().astype("int32"))
    plt.title(str(int(label[0])))
    plt.axis("off")

In [None]:
#Using the same approach as for plotting the test images, collect one image for the model to predict
images = []
labels = []

#take(z) takes the first z batches; however, prediction is done on one image at a time here
#skip(x).take(1) skips x batches and takes the next one
for i,(image_batch,label) in enumerate(test_ds.take(1)):
    #test_ds is batched, so keep only the first image and label of the batch
    image = image_batch[0].numpy()
    label = label[0].numpy()
    #Add a leading batch dimension: (height, width, 3) -> (1, height, width, 3)
    image = np.expand_dims(image, axis=0)
    print(image.shape)
    images.append(image)
    labels.append(label)

In [None]:
#Use the model to predict the image
classes = model.predict(images)

In [None]:
#Print out the predicted class
classes = np.argmax(classes)
print(classes)
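`model.predict` returns one probability per class for each image, and `np.argmax` picks the index of the largest one. The sketch below shows the same step on a made-up probability vector over five placeholder classes (the real model outputs 196 values), and also how the three most likely candidates could be listed.

```python
import numpy as np

# Made-up output for a single image over five placeholder classes
probs = np.array([[0.05, 0.10, 0.60, 0.20, 0.05]])
class_names = ["class_a", "class_b", "class_c", "class_d", "class_e"]

top1 = int(np.argmax(probs[0]))         # index of the most likely class
top3 = np.argsort(probs[0])[::-1][:3]   # indices of the three most likely classes

print(top1, class_names[top1])
for index in top3:
    print(class_names[index], probs[0][index])
```

In the notebook, the class index can be turned into a readable name with `ds_info.features["label"].names`, as in the visualization cell earlier.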

In [None]:
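#True labels of the ten test images, and the labels predicted by the model.
#predicted_label must be filled in with the ten predictions before the scores can be computed.
true_label = [141, 2, 87, 35, 189, 111, 154, 78, 98, 94]
predicted_label = []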

In [None]:
#Calculating precision, F1 and recall
print(precision_score(true_label, predicted_label, average = 'weighted'))
print(f1_score(true_label, predicted_label, average = 'weighted'))
print(recall_score(true_label, predicted_label, average = 'weighted'))

# VGG16

In [None]:
#Pretrained VGG16 base model
base_model = VGG16(
    weights="imagenet",
    input_shape=(224, 224, 3),
    include_top=False, 
)  

#Freeze
base_model.trainable = False

#The VGG16 architecture
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(Dense(4096, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(4096, activation = 'relu'))
model.add(Dropout(0.5))
#The output layer needs one unit per class: 196 vehicle types
model.add(Dense(196, activation='softmax'))

model.summary()

> Fine Tuning

In [None]:
learning_rate = 1.0e-5 #low learning rate so the model does not forget 

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(), #the labels are integer class indices
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

epochs = 50
history = model.fit(train_ds, epochs=epochs, validation_data=test_ds)

In [None]:
plt.plot(history.history['val_sparse_categorical_accuracy'])
plt.title('Model validation accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['validation'], loc='upper left')
plt.show()

In [None]:
model.save("/kaggle/working/model_vgg_v1.h5", save_format="h5")

In [None]:
model = load_model('/kaggle/working/model_vgg_v1.h5')

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1.0e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(), #the labels are integer class indices
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],

)

history = model.fit(train_ds, epochs=50, validation_data=test_ds)

In [None]:
plt.plot(history.history['sparse_categorical_accuracy'])
plt.title('model training accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['train'], loc='upper left')
plt.show()

In [None]:
model.save("/kaggle/working/model_vgg_v2.h5", save_format="h5")

In [None]:
model = load_model('/kaggle/input/vggnet/tensorflow2/vggv2/1/model_vgg_v2.h5')

In [None]:
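images = []
labels = []

for i,(image_batch,label) in enumerate(test_ds.skip(9).take(1)):
    #test_ds is batched, so keep only the first image and label of the batch
    image = image_batch[0].numpy()
    label = label[0].numpy()
    #Add a leading batch dimension so the model receives a batch of one image
    image = np.expand_dims(image, axis=0)
    images.append(image)
    labels.append(label)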

In [None]:
print(images)

In [None]:
classes = model.predict(images)

In [None]:
classes = np.argmax(classes)
print(classes)

In [None]:
true_label = [141, 2, 87, 35, 189, 111, 154, 78, 98, 94]
predicted_label = [190, 50, 165, 44, 171, 5, 79, 108, 98, 83]
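The weighted precision, recall and F1 scores used for the other models can be worked out by hand for these VGG16 predictions. The sketch below is a plain-Python approximation of support-weighted averaging, in which a class that is never predicted counts as precision 0; `weighted_scores` is written for this illustration and is not scikit-learn's implementation. Only one of the ten predictions (class 98) matches its true label, so all three scores come out at 0.1.

```python
from collections import Counter

true_label = [141, 2, 87, 35, 189, 111, 154, 78, 98, 94]
predicted_label = [190, 50, 165, 44, 171, 5, 79, 108, 98, 83]

def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall and F1; undefined ratios count as 0."""
    support = Counter(y_true)      # how often each class really occurs
    predicted = Counter(y_pred)    # how often each class was predicted
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    precision = recall = f1 = 0.0
    for label, n in support.items():
        p = hits[label] / predicted[label] if predicted[label] else 0.0
        r = hits[label] / n
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = n / len(y_true)
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return precision, recall, f1

scores = weighted_scores(true_label, predicted_label)
print(scores)
```

Ten images are a very small sample for 196 classes, so these numbers only give a rough impression of the model's quality.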

# LeNet5

In [None]:
from tensorflow.keras.layers import BatchNormalization

In [None]:
#Design the model based on the LeNet5 architecture
model = Sequential() 

#The BatchNormalization layer rescales the input so that its mean is close to 0
#and its standard deviation is close to 1.
#The input shape is not the usual 28,28,1 of the LeNet architecture: the images
#in this dataset have a lot of detail, and shrinking them that far would not
#help the model capture the features in the image.
#So, we choose 120,120,3, as this seems to be where the model starts to capture
#features from the images. The images fed to this model must be resized to 120x120.
model.add(BatchNormalization(input_shape=(120,120,3)))

#32 filters are used: as said, the images have a lot of detail, and a low number
#of filters would not help in capturing it.
#The 'tanh' activation is used because 'relu' worked poorly together with
#'softmax' here, while 'tanh' gave more promising results.
model.add(Conv2D(32, kernel_size=(5, 5), activation='tanh')) 
model.add(MaxPooling2D(pool_size=(2, 2))) 
model.add(Conv2D(48, kernel_size=(5, 5), activation='tanh')) 
model.add(MaxPooling2D(pool_size=(2, 2))) 
model.add(Flatten()) 
model.add(Dense(512, activation='tanh')) 
model.add(Dropout(0.25))
model.add(Dense(496, activation='tanh')) 
model.add(Dropout(0.25))
model.add(Dense(196, activation='softmax')) 

model.summary()
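The feature-map sizes reported by `model.summary()` can be checked by hand. The sketch below assumes the Keras defaults of unpadded ("valid") convolutions with stride 1 and pooling with a stride equal to the pool size, and follows a 120×120 input through the two convolution and pooling stages; `conv_out` is a helper written for this example.

```python
def conv_out(size, kernel, stride=1):
    """Output length of an unpadded convolution or pooling along one axis."""
    return (size - kernel) // stride + 1

size = 120
size = conv_out(size, 5)       # Conv2D 5x5
size = conv_out(size, 2, 2)    # MaxPooling2D 2x2
size = conv_out(size, 5)       # Conv2D 5x5
size = conv_out(size, 2, 2)    # MaxPooling2D 2x2
flattened = size * size * 48   # 48 filters in the last convolution layer
print(size, flattened)
```

The flattened vector is what feeds the first Dense layer, which is why most of this model's parameters sit in that layer.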

In [None]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1.0e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(), #the labels are integer class indices
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],

)

history = model.fit(train_ds, epochs=50, validation_data=test_ds)

In [None]:
plt.plot(history.history['val_sparse_categorical_accuracy'])
plt.title('Model validation accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['validation'], loc='upper left')
plt.show()

In [None]:
images = []
labels = []

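for i,(image_batch,label) in enumerate(test_ds.take(1)):
    #test_ds is batched, so keep only the first image and label of the batch
    image = image_batch[0].numpy()
    label = label[0].numpy()
    #Add a leading batch dimension so the model receives a batch of one image
    image = np.expand_dims(image, axis=0)
    print(image.shape)
    images.append(image)
    labels.append(label)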

In [None]:
classes = model.predict(images)
classes = np.argmax(classes)
print(classes)

In [None]:
#predicted_label must be filled in with the ten predictions before the scores can be computed
true_label = [141, 2, 87, 35, 189, 111, 154, 78, 98, 94]
predicted_label = []

In [None]:
print(precision_score(true_label, predicted_label, average = 'weighted'))
print(f1_score(true_label, predicted_label, average = 'weighted'))
print(recall_score(true_label, predicted_label, average = 'weighted'))

# GoogLeNet (InceptionV3)

In [None]:
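from tensorflow.keras.applications import InceptionV3

base_model = InceptionV3(weights='imagenet', include_top=False, input_shape=(224,224,3))

#Keep the base model trainable so that it is fine-tuned (set to False to freeze it)
base_model.trainable = True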

#Declare input layer
inputs = tf.keras.Input(shape=(height, width, 3))

#Normalization Layer
norm_layer = keras.layers.experimental.preprocessing.Normalization()
mean = np.array([127.5] * 3)
var = mean ** 2
# Scale inputs to [-1, +1]
x = norm_layer(inputs)
norm_layer.set_weights([mean, var])

#InceptionV3 Architecture
x = base_model(x, training=False) 
x = keras.layers.GlobalAveragePooling2D()(x) 

#Dropout to improve result and reduce overfitting
x = keras.layers.Dropout(0.25)(x)  
x = keras.layers.Dense(1024, activation='relu')(x)
outputs = keras.layers.Dense(num_outputs, activation="softmax")(x) 

#Here we don't use Sequential as it provides a minimal improvement
model = keras.Model(inputs, outputs)

#Summary of the model layers
model.summary()

# Additional Note
**From now on, I stop repeating the code for training the model.**
**Training uses the same model.compile() call as before,**
**as well as the other functions that have been repeated.**

# AlexNet

In [None]:
model = Sequential() 
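#The input shape matches the 224x224 RGB images produced by the data pipeline;
#a 28x28 input is too small for the 11x11 stride-4 convolution and the pooling layers that follow
model.add(Conv2D(96, kernel_size=(11, 11), strides=(4, 4), activation='relu', input_shape=(224, 224, 3))) 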
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2))) 
model.add(Conv2D(256, kernel_size=(5, 5), padding='same', activation='relu')) 
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2))) 
model.add(Conv2D(384, kernel_size=(3, 3), padding='same', activation='relu')) 
model.add(Conv2D(384, kernel_size=(3, 3), padding='same', activation='relu')) 
model.add(Conv2D(256, kernel_size=(3, 3), padding='same', activation='relu')) 
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2))) 
model.add(Flatten()) 
model.add(Dense(4096, activation='relu')) 
model.add(Dropout(0.5)) 
model.add(Dense(4096, activation='relu')) 
model.add(Dropout(0.5)) 
model.add(Dense(196, activation='softmax')) 

model.summary()
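Because the first convolution uses an 11×11 kernel with stride 4 and is followed by three 3×3 pooling layers with stride 2, the input has to be reasonably large for the network to build at all. The sketch below, which assumes unpadded ("valid") arithmetic for these four layers and ignores the size-preserving `padding='same'` convolutions, compares a 224-pixel input with a 28-pixel one; `alexnet_feature_size` is a hypothetical helper.

```python
def valid_out(size, kernel, stride):
    """Output length of an unpadded convolution or pooling along one axis."""
    return (size - kernel) // stride + 1

def alexnet_feature_size(size):
    """Spatial size after each shrinking layer; None once a layer no longer fits."""
    sizes = []
    # (kernel, stride) of the first convolution and the three pooling layers
    for kernel, stride in [(11, 4), (3, 2), (3, 2), (3, 2)]:
        if size < kernel:
            sizes.append(None)
            break
        size = valid_out(size, kernel, stride)
        sizes.append(size)
    return sizes

print(alexnet_feature_size(224))
print(alexnet_feature_size(28))
```

With a 224-pixel input the feature map shrinks to 5×5 before the Flatten layer, while a 28-pixel input is already down to 2×2 after the first pooling layer and cannot pass through the next one.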