<a href="https://colab.research.google.com/github/WDSEatBNL/Intro-to-Machine-Learning-and-AI-Files/blob/master/Machine_Learning_Larger_Set.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Import the libraries needed to train and test a neural network.

In [None]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from skimage import io
import os
from IPython.display import Image, display

Download the image files from GitHub.

In [None]:
!git clone https://github.com/WDSEatBNL/Intro-to-Machine-Learning-and-AI-Files

Read the images from the "bigdata" (training) and "test" (validation) folders, found in "/content/Intro-to-Machine-Learning-and-AI-Files", and assign each image a category (bird, cat, or dog) based on the name of the folder it is in.

In [None]:
IMG_HEIGHT = 288
IMG_WIDTH = 288
BATCH_SIZE = 32

# Must match the alphabetical order of the class folders, which is how labels are inferred
class_names = ['bird', 'cat', 'dog']

train_data_dir = r'/content/Intro-to-Machine-Learning-and-AI-Files/bigdata'
validation_data_dir = r'/content/Intro-to-Machine-Learning-and-AI-Files/test'

train_ds = tf.keras.utils.image_dataset_from_directory(
    train_data_dir,
    image_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    labels='inferred',
    seed=42
)

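# shuffle=False keeps a fixed order, so predictions and labels read in
# separate passes over the validation set stay aligned
val_ds = tf.keras.utils.image_dataset_from_directory(
    validation_data_dir,
    image_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    labels='inferred',
    shuffle=False
)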
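
With labels='inferred', image_dataset_from_directory numbers the classes by sorting the subfolder names alphanumerically, which is why class_names is listed as bird, cat, dog. The sketch below imitates that ordering using only the standard library. The helper name infer_class_names and the temporary folder layout are made up for illustration and are not part of Keras.

```python
import os
import tempfile


def infer_class_names(root_dir):
    """Return the subfolder names of root_dir in sorted order.

    Hypothetical helper: it imitates how labels='inferred' orders classes;
    it does not call Keras.
    """
    return sorted(
        name for name in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, name))
    )


# Build a throwaway folder tree shaped like the "bigdata" folder (assumed layout).
with tempfile.TemporaryDirectory() as root:
    for folder in ["dog", "cat", "bird"]:
        os.makedirs(os.path.join(root, folder))
    demo_class_names = infer_class_names(root)

# Integer label i corresponds to demo_class_names[i].
label_of = {name: i for i, name in enumerate(demo_class_names)}
print(demo_class_names)
print(label_of)
```

On the real datasets, train_ds.class_names reports the order that Keras actually used, so it can be compared against the hand-written list.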

Set up the model for training: rescale pixel values to the range 0 to 1, define the network layers, and compile the model.

In [None]:
normalization_layer = tf.keras.layers.Rescaling(1./255)
norm_train_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
norm_val_ds = val_ds.map(lambda x, y: (normalization_layer(x), y))

num_classes = len(class_names)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu',
                           input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(num_classes, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy']
)
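
To get a feel for how large this network is, the arithmetic below traces the image size through the layers. It is a rough sketch that assumes the Keras defaults used above: 'valid' padding and stride 1 for Conv2D, and a 2x2 window with stride 2 for MaxPooling2D. The helper names conv_out and pool_out are made up for this example.

```python
def conv_out(size, kernel=3):
    """Output width/height of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output width/height of max pooling with a 2x2 window and stride 2."""
    return size // pool


size = 288                # IMG_HEIGHT and IMG_WIDTH in the notebook
size = conv_out(size)     # after Conv2D(32, 3)
size = pool_out(size)     # after MaxPooling2D()
size = conv_out(size)     # after Conv2D(64, 3)
size = pool_out(size)     # after MaxPooling2D()

flat_features = size * size * 64          # length of the Flatten() output
dense_params = flat_features * 128 + 128  # weights plus biases of Dense(128)

print(size, flat_features, dense_params)
```

Almost all of the trainable parameters sit in the first Dense layer. Calling model.summary() on the compiled model prints the same shapes and counts for comparison.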

Train the model, then assign a category name to each image in the validation set. The epochs value sets how many complete passes the model makes over the training data; accuracy on the validation set is checked at the end of each pass.

In [None]:
epochs = 10
history = model.fit(norm_train_ds, validation_data=norm_val_ds, epochs=epochs)

prediction = model.predict(norm_val_ds)

predicted_classes = np.argmax(prediction, axis=1)
predicted_class_names = [class_names[i] for i in predicted_classes]

actual_classes = []
for images, labels in norm_val_ds:
    actual_classes.append(labels.numpy())
concatenated_labels = np.concatenate(actual_classes, axis=0)
actual_class_names = [class_names[i] for i in concatenated_labels]
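
The step from model output to category name can be seen on a small example. The probabilities below are made up for illustration (they are not output from this model); each row plays the role of one image's softmax output.

```python
import numpy as np

class_names = ['bird', 'cat', 'dog']

# Made-up softmax outputs for three images (each row sums to 1).
prediction = np.array([
    [0.10, 0.70, 0.20],
    [0.80, 0.15, 0.05],
    [0.25, 0.30, 0.45],
])

# argmax along axis 1 picks the column with the highest probability in each row.
predicted_classes = np.argmax(prediction, axis=1)
predicted_class_names = [class_names[i] for i in predicted_classes]

print(predicted_classes)
print(predicted_class_names)
```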

Show the validation images from the first batch with their predicted categories

In [None]:
fig = plt.figure(figsize=(10, 10))
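for images, labels in val_ds.take(1):
    # The grid holds 6 x 5 = 30 images, and the first batch may hold fewer
    n_show = min(images.shape[0], len(predicted_class_names), 30)
    for i in range(n_show):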
        ax = plt.subplot(6, 5, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(predicted_class_names[i])
        plt.axis("off")
plt.tight_layout()
plt.show()

Print the accuracy of the model as the percentage of validation images classified correctly

In [None]:
print('Percentage correct: ', 100*np.sum(concatenated_labels == predicted_classes)/len(predicted_classes))
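
The same calculation on a handful of made-up labels shows what is being counted. The arrays true_labels and predicted are illustrative stand-ins for concatenated_labels and predicted_classes.

```python
import numpy as np

# Made-up labels for eight images (0 = bird, 1 = cat, 2 = dog).
true_labels = np.array([0, 1, 2, 2, 1, 0, 2, 1])
predicted = np.array([0, 1, 2, 1, 1, 0, 0, 1])

# Element-wise comparison gives True wherever the prediction is right.
matches = true_labels == predicted

# The mean of a boolean array is the fraction of True entries.
percent_correct = 100 * np.mean(matches)
print('Percentage correct: ', percent_correct)
```

This comparison is only meaningful when both arrays list the images in the same order.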

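Create a confusion matrix to see which categories the model confuses with one another. With normalize='all', each cell shows the fraction of all validation images that have a given true category (row) and predicted category (column).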

In [None]:
cm = confusion_matrix(concatenated_labels, predicted_classes, normalize='all')
cm_rounded = np.around(cm, decimals=2)
fig, ax_cm = plt.subplots(figsize=(8, 6))
sns.heatmap(cm_rounded, annot=True, annot_kws={"size": 20}, cbar=False, cmap='Blues', xticklabels=class_names, yticklabels=class_names)
ax_cm.set_ylabel('True Values', fontsize=20)
ax_cm.set_xlabel('Predicted Values', fontsize=20)
ax_cm.set_title('Confusion Matrix', fontsize=20)
ax_cm.tick_params(axis='x', labelsize=20)
ax_cm.tick_params(axis='y', labelsize=20)
plt.tight_layout()
plt.show()
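
To make the numbers in the heatmap easier to interpret, the sketch below builds a confusion matrix by hand from made-up labels and normalizes it two ways. The helper confusion_counts is a hypothetical stand-in written for this example; it is not the scikit-learn implementation.

```python
import numpy as np


def confusion_counts(true_labels, predicted, num_classes):
    """Count images for each (true class, predicted class) pair."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at adds 1 for every pair, including repeated pairs.
    np.add.at(cm, (true_labels, predicted), 1)
    return cm


# Made-up labels for eight images (0 = bird, 1 = cat, 2 = dog).
true_labels = np.array([0, 1, 2, 2, 1, 0, 2, 1])
predicted = np.array([0, 1, 2, 1, 1, 0, 0, 1])

cm = confusion_counts(true_labels, predicted, 3)

# Same idea as normalize='all': every cell is a fraction of all images.
cm_all = cm / cm.sum()

# Same idea as normalize='true': every row sums to 1, so the diagonal
# shows the fraction of each true category that was identified correctly.
cm_rows = cm / cm.sum(axis=1, keepdims=True)

print(cm)
print(np.around(cm_all, 2))
print(np.around(cm_rows, 2))
```

Row normalization can be easier to read when the categories have different numbers of images, because each row is then scaled by its own total.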