<a href="https://colab.research.google.com/github/AntoniaCarrizo/Machine-learning-projects-artificial-intelligence/blob/main/Classifying_Images_of_toys.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Classifying Images of toys

## Install and import dependencies

We will use a dataset from TensorFlow Datasets. 

In [None]:
!pip install -U tensorflow_datasets

We proceed to import dependencies.

In [None]:
# Import Tensorflow
import tensorflow as tf
# Import TensorFlow Datasets
import tensorflow_datasets as tfds
tfds.disable_progress_bar()

import math
import numpy as np
import matplotlib.pyplot as plt

We set the TensorFlow logger so that it only shows error messages.

In [None]:
import logging
logger = tf.get_logger()
logger.setLevel(logging.ERROR)

## Import the dataset

In [None]:
dataset, metadata = tfds.load(name='smallnorb', as_supervised=True, with_info=True)

The dataset already comes divided into a training set of 24,300 images and a test set of 24,300 images, so we don't need to split it ourselves. We simply select the two sub-datasets.

In [None]:
train_dataset, test_dataset = dataset['train'], dataset['test']

The images are 96 $\times$ 96 grayscale arrays, with pixel values in the range `[0, 255]`. The labels are an array of integers, in the range `[0, 4]`. These correspond to the class of toy the image represents:

<table>
  <tr>
    <th>Label</th>
    <th>Class</th>
  </tr>
  <tr>
    <td>0</td>
    <td>Animals</td>
  </tr>
  <tr>
    <td>1</td>
    <td>Human</td>
  </tr>
  <tr>
    <td>2</td>
    <td>Planes</td>
  </tr>
  <tr>
    <td>3</td>
    <td>Trucks</td>
  </tr>
  <tr>
    <td>4</td>
    <td>Cars</td>
  </tr>
</table>


In [None]:
class_names = ['Animal', 'Human', 'Plane', 'Truck', 'Car']

### Explore the data

Let's explore the format of the dataset before training the model. The following shows there are 24,300 images in the training set, and 24,300 images in the test set:

In [None]:
num_train_examples = metadata.splits['train'].num_examples
num_test_examples = metadata.splits['test'].num_examples
print("Number of training examples: {}".format(num_train_examples))
print("Number of test examples:     {}".format(num_test_examples))

## Preprocess the data

The value of each pixel in the image data is an integer in the range `[0,255]`. For the model to work properly, these values need to be normalized to the range `[0,1]`. So here we create a normalization function, and then apply it to each image in the test and train datasets. This ensures that every input pixel has a similar data distribution, which makes convergence faster while training the network.

Without normalization the accuracy drops drastically: the results do not exceed 20% and the predictions are mostly wrong.

We divide each pixel of the training and test images by 255, the maximum pixel value.

In [None]:
def normalize(images, labels):
  images = tf.cast(images, tf.float32)
  images /= 255
  return images, labels

train_dataset =  train_dataset.map(normalize)
test_dataset  =  test_dataset.map(normalize)
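To make the effect of the normalization concrete, here is a minimal NumPy sketch of the same cast-then-divide step. The 2 $\times$ 2 array is a made-up example, not data from smallnorb.

```python
import numpy as np

# Hypothetical 2x2 "image" with raw 8-bit pixel values (not taken from smallnorb).
raw = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Cast to float first, then divide by the maximum pixel value, 255,
# mirroring what normalize() does with tf.cast and the division.
scaled = raw.astype(np.float32) / 255.0

print(scaled.min(), scaled.max())
```

The smallest and largest possible pixel values map to the two ends of the `[0,1]` range.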



We keep the datasets cached in RAM. This makes training faster, since the data does not have to be read from the hard disk again on every epoch.

In [None]:
train_dataset =  train_dataset.cache()
test_dataset  =  test_dataset.cache()

### Explore the processed data

We will inspect the dataset to understand it a little better.

Because each image is grayscale, it has a single channel, so its shape is `(96, 96, 1)`. With reshape we drop that last dimension and get a plain 96 $\times$ 96 array that can be plotted. This would not work for a color image, because it has more than one channel:

In [None]:

image, label = tf.data.experimental.get_single_element(test_dataset.take(1))
image = image.numpy().reshape((96,96))
# Plot the image
plt.figure()
plt.imshow(image, cmap=plt.cm.binary)
plt.colorbar()
plt.grid(False)
plt.show()
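The reshape step can be checked in isolation. The sketch below uses zero-filled NumPy arrays as hypothetical stand-ins for a grayscale image and a color image; it only illustrates the shapes involved.

```python
import numpy as np

# Hypothetical stand-in for one normalized grayscale image of shape (96, 96, 1).
gray = np.zeros((96, 96, 1), dtype=np.float32)

# Dropping the single channel keeps all 9216 values and yields a 2-D array.
flat_gray = gray.reshape((96, 96))

# A color image has three channels, so it holds three times as many values
# and cannot be reshaped to (96, 96).
color = np.zeros((96, 96, 3), dtype=np.float32)
try:
    color.reshape((96, 96))
    color_reshape_ok = True
except ValueError:
    color_reshape_ok = False

print(flat_gray.shape, color_reshape_ok)
```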

We show some images to verify that the labels match the images and that the data is in the correct format to build and train the network.

In [None]:
plt.figure(figsize=(15,15))
i = 0
for (image, label) in test_dataset.take(15):
    image = image.numpy().reshape((96,96))
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(image, cmap=plt.cm.binary)
    plt.xlabel(class_names[label])
    i += 1
plt.show()

## Build the model


### Setup the layers


This network has four layers, in three groups:

* **input** `tf.keras.layers.Flatten`: this layer transforms the images from a 2d-array of 96 $\times$ 96 pixels to a 1d-array of 9216 pixels.

* **"hidden"** `tf.keras.layers.Dense`: two dense layers with 100 neurons each.

* **output** `tf.keras.layers.Dense`: a 5-node *softmax* layer, where each node represents a class of toy.
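As a rough sanity check on the size of this network, the number of trainable parameters can be worked out by hand from the layer sizes listed above. This is plain arithmetic, not a call into Keras; `dense_params` is a helper name introduced only for this sketch.

```python
# Layer sizes taken from the description above: 96*96 flattened inputs,
# two hidden layers of 100 neurons, and 5 output classes.
layer_sizes = [96 * 96, 100, 100, 5]

def dense_params(n_in, n_out):
    # One weight per input-output pair plus one bias per output neuron.
    return n_in * n_out + n_out

params_per_layer = [dense_params(a, b)
                    for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(params_per_layer)

print(params_per_layer, total_params)
```

Almost all of the weights sit in the first dense layer, because every one of its 100 neurons is connected to all 9216 input pixels. The Flatten layer itself has no trainable parameters.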

Differences between the number of hidden layers:
- with 1 hidden layer: the loss function remains constant and does not improve. Accuracy only reaches 19% and very few images are classified correctly.
- with 2 hidden layers: the loss function decreases (improves). Accuracy reaches up to 78% and the number of correct classifications is adequate.
- with 3 hidden layers: the loss function still decreases (improves), but accuracy drops, reaching at most 72%.

We conclude that the model works best with 2 hidden layers.

Differences in the number of neurons:
- 10 neurons: the loss function stopped improving and remained constant after the first epoch. Accuracy only reached 19% and very few images were classified correctly.
- 50 neurons: the loss function decreased and accuracy reached 74%. The number of correct classifications increased considerably.
- 100 neurons: the loss function decreased and accuracy reached 78%. The number of correct classifications is good.
- between 200 and 500 neurons: accuracy stayed in the range between 70% and 75%.

The difference between the results with very few neurons and with many neurons is due to the following:
- With few neurons the model underfits, that is, there are too few neurons to capture the patterns in the input data.
- With many neurons overfitting can occur, that is, the training data does not contain enough information to train all the neurons.

In conclusion, we believe that the number of neurons that obtained the best results was 100.

Differences in the activation function:
Different activation functions were tried in the hidden layers so that we could choose the one with the best results.
- ReLU: learns quickly and usually offers good performance and generalization in deep learning; all values less than zero are set to zero. It reached 77% accuracy, the categorical_hinge was increasing, and the number of hits on the test images is good.
- Leaky ReLU: has a small slope for negative values. It reached a lower accuracy, 73%, but it works well in the number of hits on the test images.
- Tanh: has a range between -1 and 1, and it can saturate, which may leave some neurons effectively inactive during training. The loss function does not improve and remains constant, accuracy is very low (20%), and the number of hits on the test images is bad.
- ELU: has a limitation, it is not zero-centered, but it is a good alternative to ReLU. It reached 77% accuracy, the categorical_hinge was increasing, and the number of hits on the test images is good.

We decided to use ReLU for its good accuracy and number of hits.
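For reference, the four activation functions compared above can be sketched in NumPy as follows. This is an illustrative sketch, not the Keras implementation; the `alpha` defaults are assumed values chosen for the example.

```python
import numpy as np

def relu(x):
    # Negative inputs are set to zero, positive inputs pass through.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.2):
    # alpha is the small slope applied to negative inputs (assumed value).
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Smooth exponential curve for negative inputs, identity for positive ones.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-2.0, 0.0, 2.0])
for name, fn in [("relu", relu), ("leaky_relu", leaky_relu),
                 ("tanh", np.tanh), ("elu", elu)]:
    print(name, fn(x))
```

Only ReLU maps every negative input to exactly zero; the other three keep some signal for negative inputs, and tanh additionally bounds its output between -1 and 1.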

In [None]:
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(96, 96, 1)),
    tf.keras.layers.Dense(100, activation=tf.nn.relu),
    tf.keras.layers.Dense(100, activation=tf.nn.relu),
    tf.keras.layers.Dense(5, activation=tf.nn.softmax)
])

### Compile the model

To compile the model we pass the optimizer, the loss function and the metrics.
The optimizer is Adam, a widely used default that worked well for this model. As the loss function we use SparseCategoricalCrossentropy, because there are more than two classes and the labels are integers rather than one-hot vectors.

Metrics:
- accuracy: the proportion of correct predictions among the total number of cases examined, which makes it well suited to classification problems. The higher this number, the more correct predictions there are.
- mean_squared_logarithmic_error: measures the relative difference between the true and predicted values; in other words, it only cares about the percentage difference between them. Lower values indicate better predictions.
- categorical_hinge: computes the categorical hinge metric between y_true and y_pred. Lower values indicate better predictions. It is designed for one-hot labels, so with the integer labels used here its value should be interpreted with caution.

In [None]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy','mean_squared_logarithmic_error', 'categorical_hinge'])
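To show what the loss and the accuracy metric measure, here is a small NumPy sketch of sparse categorical cross-entropy for integer labels. It is a simplified illustration rather than the Keras implementation, and the probabilities are hypothetical values, not model output.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, y_prob):
    # y_true: integer class ids, shape (n,)
    # y_prob: softmax outputs, shape (n, num_classes)
    picked = y_prob[np.arange(len(y_true)), y_true]
    # The loss is the negative log of the probability given to the true class.
    return -np.log(picked)

# Hypothetical predictions for two images over the five toy classes.
y_prob = np.array([[0.10, 0.10, 0.10, 0.60, 0.10],
                   [0.70, 0.10, 0.10, 0.05, 0.05]])
y_true = np.array([3, 1])

losses = sparse_categorical_crossentropy(y_true, y_prob)
accuracy = np.mean(np.argmax(y_prob, axis=1) == y_true)

print(losses, accuracy)
```

The first image is classified correctly with fairly high confidence, so its loss is small. The second gives the true class a low probability, so its loss is larger and it counts as a miss for accuracy.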

## Train the model

Train the model:
- Repeat: repeats the dataset indefinitely, so the number of epochs is controlled by `fit`.
- Shuffle: randomizes the order of the images so that the model does not memorize it.
- Batch: groups the images into batches; `BATCH_SIZE` is the number of images the model processes in each training step.
- Fit: training is done by calling the `model.fit` method.

With many epochs the loss function begins to rise and fall without settling on a good result, and the more epochs we use the more it fluctuates. We consider 10 epochs a good number for this project, since the loss is still decreasing over that range.

In [None]:
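BATCH_SIZE = 64
# The datasets were already cached above, so here we only shuffle, repeat and batch.
# Shuffling before repeating keeps each epoch close to one full pass over the data.
train_dataset = train_dataset.shuffle(num_train_examples).repeat().batch(BATCH_SIZE)
test_dataset = test_dataset.batch(BATCH_SIZE)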

In [None]:
history=model.fit(train_dataset, epochs=10, steps_per_epoch=math.ceil(num_train_examples/BATCH_SIZE))
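The `steps_per_epoch` value can be checked by hand. The sketch below repeats the arithmetic with the training-set size stated earlier in this notebook (24,300 images).

```python
import math

num_train_examples = 24300  # training-set size stated earlier in this notebook
BATCH_SIZE = 64

# Number of batches needed to see every training image at least once.
steps_per_epoch = math.ceil(num_train_examples / BATCH_SIZE)

# Because the dataset repeats, every batch is full, so one "epoch" of
# steps_per_epoch batches covers slightly more than one pass over the data.
images_per_epoch = steps_per_epoch * BATCH_SIZE
extra_images = images_per_epoch - num_train_examples

print(steps_per_epoch, images_per_epoch, extra_images)
```

Rounding up is what guarantees that no training image is left out of an epoch; rounding down would drop the final partial batch.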

As the model trains, the loss and accuracy metrics are displayed. This model reaches an accuracy of about 0.80 (or 80%) on the training data.

We plot the loss function for each epoch. The loss should generally go down as training progresses.

In [None]:
plt.xlabel('Epoch Number')
plt.ylabel("Loss Magnitude")
plt.plot(history.history['loss'])

## Make predictions and explore

With the model trained, we can use it to make predictions about some images.

In [None]:
for test_images, test_labels in test_dataset.take(1):
  test_images = test_images.numpy()
  test_labels = test_labels.numpy()
  predictions = model.predict(test_images)

In [None]:
predictions.shape

The first prediction:

In [None]:
predictions[0]

We can see which label has the highest confidence value:

In [None]:
np.argmax(predictions[0])

So the model is most confident that this image is a Truck. We can check the test label to see whether this is correct:

In [None]:
class_names[test_labels[0]]
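The step from a prediction vector to a class name can be sketched on its own. The probabilities below are hypothetical and chosen only for illustration; real values depend on the trained model.

```python
import numpy as np

class_names = ['Animal', 'Human', 'Plane', 'Truck', 'Car']

# Hypothetical softmax output for one image (one probability per class).
probs = np.array([0.05, 0.05, 0.10, 0.70, 0.10])

# The predicted label is the index of the largest probability.
predicted_label = int(np.argmax(probs))
predicted_name = class_names[predicted_label]
confidence = float(probs[predicted_label])

print(predicted_name, confidence)
```

Each row of `predictions` has this form: five values that sum to 1, one per class.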

We can graph this to see the complete set.

In [None]:
def plot_image(i, predictions_array, true_labels, images):
  predictions_array, true_label, img = predictions_array[i], true_labels[i], images[i]
  plt.grid(False)
  plt.xticks([])
  plt.yticks([])
  
  plt.imshow(img[...,0], cmap=plt.cm.binary)

  predicted_label = np.argmax(predictions_array)
  if predicted_label == true_label:
    color = 'blue'
  else:
    color = 'red'
  
  plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                100*np.max(predictions_array),
                                class_names[true_label]),
                                color=color)

def plot_value_array(i, predictions_array, true_label):
  predictions_array, true_label = predictions_array[i], true_label[i]
  plt.grid(False)
  plt.xticks(np.arange(5), class_names, rotation=45)
  plt.yticks([])
  thisplot = plt.bar(range(5), predictions_array, color="#777777")
  plt.ylim([0, 1]) 
  predicted_label = np.argmax(predictions_array)
  
  thisplot[predicted_label].set_color('red')
  thisplot[true_label].set_color('blue')

We plot the first prediction to see how confident the model is and whether it is correct.

In [None]:
i = 0
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(i, predictions, test_labels, test_images)
plt.subplot(1,2,2)
plot_value_array(i, predictions, test_labels)


We observe some predicted images:

In [None]:
num_rows = 6
num_cols = 4
num_images = num_rows*num_cols
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for i in range(num_images):
  plt.subplot(num_rows, 2*num_cols, 2*i+1)
  plot_image(i, predictions, test_labels, test_images)
  plt.subplot(num_rows, 2*num_cols, 2*i+2)
  plot_value_array(i, predictions, test_labels)
