<a href="https://colab.research.google.com/github/Nordic-OG-Raven/ML/blob/main/Universal_Workflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# The Universal Workflow consists of:

1) Defining the task

2) Developing a model (main focus of this template)

3) Deploying a model







# 1) Define the task
- What is the prediction target?


#### CV1
In this exercise, you will expand on last week's exercise by training a convnet from scratch on the CIFAR-10 dataset.

**About the CIFAR-10 dataset**

The CIFAR-10 dataset is a widely-used dataset for benchmarking machine learning algorithms, especially in the field of image recognition. It consists of 60,000 32x32 color images in 10 different classes, with 6,000 images per class. The dataset is divided into 50,000 training images and 10,000 test images. The classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. Each image is labeled with one of these 10 classes, making it a standard dataset for evaluating algorithms for image classification tasks.

The CIFAR-10 dataset was created by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton and is a subset of the 80 million tiny images dataset. Due to its moderate size and complexity, CIFAR-10 serves as an excellent benchmark for algorithms and techniques in computer vision, particularly for methodologies that are aimed at performing well on small to medium-sized datasets in image recognition tasks.

#### CV2 - Using a pre-trained ResNet50 model
In this exercise, you'll expand on last week's exercise by adding pre-trained models to the mix. In particular, you'll work with ResNet50, a powerful pre-trained model. Here is the task:

Try a ResNet50 with data augmentation on the CIFAR-10 dataset.

Two tips:
1.   Remember to turn on the GPU. Don't let it stand idle. Whenever you need to take a break, remember to switch it off.
2.   When adding ResNet50's preprocessing step to the model, wrap `preprocess_input` in a `Lambda` layer to avoid errors:

```
# Apply the preprocess_input function
x = Lambda(preprocess_input)(x)
```

A `Lambda` layer wraps an arbitrary expression or function (here `preprocess_input`) so that it can be used as a layer inside a Keras model.
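For intuition, the NumPy sketch below mimics what `keras.applications.resnet50.preprocess_input` does by default ("caffe" mode): it flips the channel order from RGB to BGR and subtracts the per-channel ImageNet mean, without any scaling. The name `caffe_style_preprocess` is hypothetical and the sketch is for understanding only; inside the model, use the real `preprocess_input` wrapped in `Lambda`.

```python
import numpy as np

# Per-channel ImageNet means in BGR order, as used by the "caffe" preprocessing mode
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_style_preprocess(images):
    """Hypothetical re-implementation: RGB -> BGR, then zero-center each channel."""
    x = np.asarray(images, dtype=np.float32)
    x = x[..., ::-1]  # reverse the last (channel) axis: RGB -> BGR
    return x - IMAGENET_BGR_MEAN  # subtract the mean, no division by 255
```

Because this step expects raw 0-255 pixel values, the ResNet50 model below does not use a `Rescaling(1./255)` layer.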


#### CV3 - Building a Mini Xception Model
In this exercise, we will explore how far we can push our model performance on the CIFAR-10 dataset. Your task now is to implement modern convnet architecture patterns, such as batch normalization, depthwise separable convolutions and residual connections, and examine how much they boost performance.

## Choose a measure of success
- Single-label classification --> Accuracy
- Scalar regression --> MAE
- Balanced classification --> ROC AUC
- Imbalanced classification --> Precision and recall
- Multi-label classification --> Mean average precision
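A minimal NumPy sketch of three of these measures, to make the definitions concrete. The function names are hypothetical; in practice you would use Keras metrics (e.g. `metrics=["accuracy"]`) or a library such as scikit-learn. Precision and recall are computed for one chosen positive class.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def mae(y_true, y_pred):
    """Mean absolute error for scalar regression."""
    return float(np.mean(np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))))

def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN) for the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(precision), float(recall)
```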


## Collect the dataset and understand it, checking
- The reliability of the labels
- The quality of the features
- The number of data-points

For instance, check:
- the balance between classes, features, noise, etc.
  - If noise is unevenly distributed across the target classes, deep learning models tend to latch onto it.
- data representativeness, e.g. with regard to minorities
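A quick way to check class balance is to compute the fraction of samples in each class. A minimal NumPy sketch; `class_distribution` is a hypothetical helper that accepts integer labels (of shape `(N,)` or `(N, 1)`) or one-hot rows.

```python
import numpy as np

def class_distribution(labels):
    """Return {class: fraction of samples} for integer or one-hot labels."""
    labels = np.asarray(labels)
    if labels.ndim == 2 and labels.shape[1] > 1:  # one-hot rows -> integer labels
        labels = labels.argmax(axis=1)
    labels = labels.ravel()
    classes, counts = np.unique(labels, return_counts=True)
    return {int(c): float(n) / labels.size for c, n in zip(classes, counts)}
```

For the balanced CIFAR-10 training labels, every class should come out at 0.1.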

### Loading data

In [None]:
from tensorflow.keras.datasets import imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(
    num_words=10000)

# num_words=10000 - we’ll keep the top 10,000 most frequent words

EDA

In [None]:
# The first review, encoded as a list of word indices
print(train_data[0])

# The label of the first review (0 = negative, 1 = positive)
print(train_labels[0])
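Word indices are hard to read. A minimal sketch for decoding a review back into words, assuming the default `imdb.load_data` settings, where indices are offset by 3 because 0, 1 and 2 are reserved for padding, start-of-sequence and unknown words. `decode_review` is a hypothetical helper; `word_index` is the dict returned by `imdb.get_word_index()`.

```python
def decode_review(encoded_review, word_index):
    """Turn a list of IMDB word indices back into text."""
    # Invert the word -> index mapping
    reverse_word_index = {index: word for word, index in word_index.items()}
    # Shift indices back by 3; reserved indices and unknown words become "?"
    return " ".join(reverse_word_index.get(i - 3, "?") for i in encoded_review)
```

Usage: `decode_review(train_data[0], imdb.get_word_index())`.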

### Computer Vision

#### Loading libraries

In [None]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras.utils import to_categorical
from tensorflow.keras import datasets, layers, models
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Input, Lambda, Layer
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications.resnet50 import preprocess_input

#### Loading the Data (CV1)

In [None]:
# Load the CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Class names in the CIFAR-10 dataset
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

#### Loading the Data (CV2)

In [None]:
# Load CIFAR-10 data
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

# Convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

#### Loading the Data (CV3)

In [None]:
# Load the CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Convert labels to one-hot encoding
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

**Plot images**

Plot some images from the dataset (CV1)

In [None]:
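import numpy as np

# Initialize a plot with 2 rows and 5 columns
fig, axes = plt.subplots(2, 5, figsize=(14, 6))
axes = axes.ravel()

# Recover integer labels, in case train_labels has already been one-hot encoded
if train_labels.ndim > 1 and train_labels.shape[1] > 1:
    int_labels = train_labels.argmax(axis=1)
else:
    int_labels = train_labels.flatten()

# Plot one image from each class
for i in range(10):
    # Find the index of the first image of each class
    index = np.where(int_labels == i)[0][0]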
    img = train_images[index]

    # Plot the image
    axes[i].imshow(img, cmap=plt.cm.binary)
    axes[i].set_title(class_names[i])
    axes[i].axis('off')

plt.tight_layout()
plt.show()



# 2) Develop a model



## 1) Prepare the data
- Vectorize inputs and targets
- Turn data into tensors
- Normalize values to have uniform ranges
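For images, normalizing can be as simple as dividing pixel values by 255 (which the `Rescaling(1./255)` layers below do inside the model). For other numeric features, a common choice is to standardize each feature using the mean and standard deviation of the training data only, so that no test-set information leaks in. A minimal NumPy sketch; `standardize` is a hypothetical name.

```python
import numpy as np

def standardize(train, test):
    """Scale each feature to zero mean and unit variance, using training-set statistics."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero for constant features
    return (train - mean) / std, (test - mean) / std
```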




Two roads we can take when preparing the data:

1) Pad the lists to the same length, turn them into an integer tensor, and feed it into an Embedding layer

2) Multi-hot encode the lists into vectors of 0s and 1s and feed them into a Dense layer.
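Road 1 requires all lists to have the same length. A minimal NumPy sketch of the padding step, assuming pre-padding with zeros and keeping the last `maxlen` indices of long lists (the same defaults as the Keras `pad_sequences` utility, available as `keras.utils.pad_sequences` in recent versions); `pad_sequences_np` is a hypothetical name.

```python
import numpy as np

def pad_sequences_np(sequences, maxlen, value=0):
    """Return an int32 array of shape (len(sequences), maxlen); assumes maxlen >= 1."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for i, seq in enumerate(sequences):
        kept = list(seq)[-maxlen:]          # truncate long lists, keeping the end
        out[i, maxlen - len(kept):] = kept  # pad at the front
    return out
```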


We define a function that, for each sequence, creates a 10,000-dimensional vector of zeros and sets a 1 at the index of every word that occurs in the sequence.

### **Encoding the integer sequences via multi-hot encoding**

i.e., converting the training and test data to a binary matrix

In [None]:
import numpy as np
def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        for j in sequence:
            results[i, j] = 1.
    return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)

1. Defines a function that vectorizes sequences, with a default dimension of 10,000.
2. Creates a 2D NumPy array of zeros, with one row per sequence and one column per word index.
3. Iterates over the sequences, with `i` tracking the index of the current sequence.
4. Iterates over each word index `j` in the current sequence.
5. Sets `results[i, j]` to 1, indicating that word `j` occurs in sequence `i`.
6. Returns the results.

*To see what the sample looks like now*

In [None]:
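# The first vectorized review: a 10,000-dimensional vector of 0s and 1s
print(x_train[0])

# Or (from example 2) look at the first 100 values of the review at index 10
print(x_train[10, :100])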

### One-hot dummy encoding

In [None]:
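import numpy as np

def to_one_hot(labels, dimension=46):
    # dimension=46 matches the 46 topics of the Reuters newswire dataset;
    # set it to the number of classes in your own dataset
    results = np.zeros((len(labels), dimension))
    for i, label in enumerate(labels):
        # Set a 1 in the column of sample i's class
        results[i, label] = 1.
    return results

y_train = to_one_hot(train_labels)
y_test = to_one_hot(test_labels)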

Caution: Turning our lists into vectors of 0s and 1s discards a lot of information, for instance the positions (order) of the words.


### Vectorizing the labels
- i.e., converting the training and test labels into NumPy arrays
- with data type float32; for IMDB, these are the sentiment labels (0 or 1)

In [None]:
y_train = np.asarray(train_labels).astype("float32")
y_test = np.asarray(test_labels).astype("float32")

This conversion ensures that the labels are NumPy arrays of the floating-point type Keras uses by default (float32). Compared with NumPy's default float64, float32 halves memory consumption and can improve computational efficiency on GPUs.

In [None]:
# Alternative for multi-class labels: Keras's built-in one-hot encoding utility
from tensorflow.keras.utils import to_categorical
y_train = to_categorical(train_labels)
y_test = to_categorical(test_labels)

### CV1 - One-hot encoding

In [None]:
# Convert labels to one-hot encoding
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

## 2) Choose an evaluation protocol
- Large dataset --> Hold-out validation
  - BUT: if the data is ordered, the hold-out sets may not be representative, so shuffle first.
- Small dataset, or performance that depends on the train-test split --> K-fold cross-validation
  - BUT: more expensive, as it trains K models
- Small dataset and a need for a precise estimate --> Iterated K-fold cross-validation with shuffling
  - BUT: can be very expensive, as it trains P*K models (P iterations of K folds)
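A minimal NumPy sketch of how K-fold cross-validation splits the sample indices, and how iterated K-fold repeats this with a different shuffle each time. `k_fold_indices` and `iterated_k_fold_indices` are hypothetical helpers (scikit-learn's `KFold` offers similar functionality); each yielded pair would be used to train and evaluate one model, and the K (or P*K) scores are averaged.

```python
import numpy as np

def k_fold_indices(n_samples, k, shuffle=True, seed=None):
    """Yield (train_idx, val_idx) for each of the k folds (assumes k >= 2)."""
    indices = np.arange(n_samples)
    if shuffle:
        np.random.default_rng(seed).shuffle(indices)
    folds = np.array_split(indices, k)  # k nearly equal-sized folds
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

def iterated_k_fold_indices(n_samples, k, p):
    """Iterated K-fold: repeat K-fold p times, each with a different shuffle (P*K splits)."""
    for iteration in range(p):
        yield from k_fold_indices(n_samples, k, shuffle=True, seed=iteration)
```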



## 3) Beat a baseline
- Filter out uninformative features
- Select the correct architecture priors:
    - Vectors --> Dense network
    - Images --> Convnet
    - Time series --> Recurrent network
    - Text --> Transformer
- Select a good-enough training configuration:
    - Loss function
    - Batch size
    - Learning rate

Example: The input data is vectors and the labels are scalars (1s and 0s)
--> use a simple stack of Dense layers.

Model specifications

- Loss function: binary_crossentropy
- Optimizer: rmsprop
- Epochs: 4
- Batch size: 512
- Validation: 1000



Two key architecture decisions:

1) How many layers to use

2) How many units to choose for each layer

### Architecture


In [None]:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])

# Two representation (hidden) layers and one output layer.
# This is our architecture, where we encode our assumptions about the problem.

Caution: An information bottleneck can be created if a layer has far fewer units than the layer before it; information that a narrow layer drops cannot be recovered by the layers after it.

- ReLU: outputs 0 for negative inputs and leaves positive inputs unchanged
- Softmax: outputs a probability score for each target class; the scores sum to 1
- Sigmoid: squashes values into the range between 0 and 1,
  - i.e. a probability, as in logistic regression
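The three activations can be written in a few lines of NumPy. A sketch for intuition (the function names are ours, not Keras's); the softmax subtracts the row maximum first, a standard trick that avoids overflow without changing the result.

```python
import numpy as np

def relu(x):
    """0 for negative inputs, the input itself otherwise."""
    return np.maximum(0, x)

def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1 / (1 + np.exp(-x))

def softmax(x):
    """Turn a vector of scores into probabilities that sum to 1 (along the last axis)."""
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```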

### Computer Vision

#### CV1 model building
Building a convnet without data augmentation

In [None]:
inputs = tf.keras.Input(shape=(32, 32, 3))  # Input layer

# Normalization layer
x = layers.Rescaling(1./255)(inputs)

# First Convolutional Block
x = layers.Conv2D(32, (3, 3), activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)

# Second Convolutional Block
x = layers.Conv2D(64, (3, 3), activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)

# Third Convolutional Block
x = layers.Conv2D(128, (3, 3), activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)

# Flatten and Dense Layers
x = layers.Flatten()(x)
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)  # 10 classes in CIFAR-10

# Model creation
model = models.Model(inputs=inputs, outputs=outputs)

model.summary()
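The shapes reported by `model.summary()` can be checked by hand. A small sketch, assuming 'valid' (unpadded) convolutions and 2x2 pooling with stride 2, as in the model above; `conv_output_size` is a hypothetical helper implementing the standard formula floor((size - kernel) / stride) + 1.

```python
def conv_output_size(size, kernel, stride=1):
    """Output size of a 'valid' (unpadded) convolution or pooling window."""
    return (size - kernel) // stride + 1

size = 32
trace = []
for _ in range(3):  # three Conv2D(3x3) + MaxPooling2D(2x2) blocks
    size = conv_output_size(size, 3)            # Conv2D, 3x3 kernel, stride 1
    trace.append(size)
    size = conv_output_size(size, 2, stride=2)  # MaxPooling2D, 2x2 window, stride 2
    trace.append(size)

flatten_units = size * size * 128       # the last Conv2D has 128 filters
dense_params = flatten_units * 64 + 64  # weights + biases of Dense(64)
print(trace)          # [30, 15, 13, 6, 4, 2]
print(flatten_units)  # 512
print(dense_params)   # 32832
```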


##### CV1 model building with data augmentation

In [None]:
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.2),
    ]
)

inputs = keras.Input(shape=(32, 32, 3))

# Applying augmentation on the inputs
x = data_augmentation(inputs)

# Normalization layer
x = layers.Rescaling(1./255)(x)

# Convolutional blocks and rest of the model
x = layers.Conv2D(32, (3, 3), activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, (3, 3), activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(128, (3, 3), activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Flatten()(x)
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)

# Create the model
model = keras.Model(inputs=inputs, outputs=outputs)

model.summary()

##### Experiment with more data augmentation techniques

In [None]:
# Enhanced data augmentation
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal", input_shape=(32, 32, 3)),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.2),
        layers.RandomContrast(0.1),  # Adjusts the contrast
        layers.RandomTranslation(height_factor=0.1, width_factor=0.1),  # Translates the image
        # Potentially add more augmentation techniques here
    ]
)

inputs = keras.Input(shape=(32, 32, 3))

# Applying enhanced augmentation on the inputs
x = data_augmentation(inputs)

# Continue with normalization and the rest of your model
x = layers.Rescaling(1./255)(x)
x = layers.Conv2D(32, (3, 3), activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, (3, 3), activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(128, (3, 3), activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Flatten()(x)
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)

# Create the model
model = keras.Model(inputs=inputs, outputs=outputs)

In [None]:
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="more_aug_convnet_from_scratch.keras",
        save_best_only=True,
        monitor="val_loss")
]

history = model.fit(train_images, train_labels, epochs=100,
                    validation_split=0.2, batch_size=256,
                    callbacks = callbacks)

In [None]:
# Assuming 'history' is the return value from model.fit()
history_dict = history.history

# Extracting loss and accuracy history
train_loss = history_dict['loss']
val_loss = history_dict['val_loss']
train_accuracy = history_dict['accuracy']
val_accuracy = history_dict['val_accuracy']

epochs = range(1, len(train_loss) + 1)

# Plotting training and validation loss
plt.figure(figsize=(14, 5))

# Training and validation loss plot
plt.subplot(1, 2, 1)
plt.plot(epochs, train_loss, 'bo-', label='Training Loss')
plt.plot(epochs, val_loss, 'ro-', label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

# Training and validation accuracy plot
plt.subplot(1, 2, 2)
plt.plot(epochs, train_accuracy, 'bo-', label='Training Accuracy')
plt.plot(epochs, val_accuracy, 'ro-', label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.tight_layout()  # Adjusts the plots to ensure they don't overlap
plt.show()

#### CV2 Model building
Building a model on top of a frozen, pre-trained ResNet50 base, with data augmentation


In [None]:
data_augmentation = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.2),
])

In [None]:
# Load ResNet50 without its classification head (include_top=False), with ImageNet weights.
# We pass 'input_shape' rather than 'input_tensor', because the base model is called
# below on 'x', the output of the augmentation and preprocessing layers.
base_model = keras.applications.ResNet50(include_top=False,
                                         weights='imagenet',
                                         input_shape=(32, 32, 3))
# Freeze the pre-trained weights so that only the new layers on top are trained
base_model.trainable = False

# Define the input tensor for your model
input_tensor = Input(shape=(32, 32, 3))

# Apply data augmentation to the inputs
x = data_augmentation(input_tensor)

# Apply the preprocess_input function
x = Lambda(preprocess_input)(x)

# Add custom layers on top of the base model
x = base_model(x)
x = GlobalAveragePooling2D()(x)
x = layers.Dense(256, activation='relu')(x)
x = layers.Dropout(0.5)(x)
predictions = Dense(10, activation='softmax')(x)

# Create the final model
model = Model(inputs=input_tensor, outputs=predictions)

#### CV3 Model building (with data augmentation)
Building a convnet WITH data augmentation (CV3)

#### Building a mini Xception model

In [None]:
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.2),
    ]
)

In [None]:
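inputs = keras.Input(shape=(32, 32, 3))
x = data_augmentation(inputs)

# Rescale pixel values to [0, 1], then apply a regular 5x5 convolution
x = layers.Rescaling(1./255)(x)
x = layers.Conv2D(filters=32, kernel_size=5, use_bias=False)(x)

# Four residual blocks with an increasing number of filters
for size in [32, 64, 128, 256]:
    # Keep the block input for the residual connection
    residual = x

    # BatchNorm -> ReLU -> depthwise separable convolution, applied twice
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.SeparableConv2D(size, 3, padding="same", use_bias=False)(x)

    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.SeparableConv2D(size, 3, padding="same", use_bias=False)(x)

    # Halve the spatial dimensions
    x = layers.MaxPooling2D(3, strides=2, padding="same")(x)

    # Project the residual with a strided 1x1 convolution so that its shape
    # matches x, then add the two (the residual connection)
    residual = layers.Conv2D(
        size, 1, strides=2, padding="same", use_bias=False)(residual)
    x = layers.add([x, residual])

# Classification head
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.summary()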
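Depthwise separable convolutions are a large part of why this model stays small. A sketch of the parameter arithmetic, assuming `depth_multiplier=1` (the Keras default) and no bias, as in the model above; the helper names are hypothetical.

```python
def conv2d_params(kernel, c_in, c_out, use_bias=False):
    """Weights of a regular kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out + (c_out if use_bias else 0)

def separable_conv2d_params(kernel, c_in, c_out, use_bias=False):
    """Depthwise step (one kernel x kernel filter per input channel) + pointwise 1x1 step."""
    return kernel * kernel * c_in + c_in * c_out + (c_out if use_bias else 0)

# Example: the first 3x3 convolution in the last block (128 -> 256 channels)
print(conv2d_params(3, 128, 256))            # 294912
print(separable_conv2d_params(3, 128, 256))  # 33920
```

Here the separable version needs roughly 9 times fewer weights for the same input and output shapes.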

### Model Compilation and training

In [None]:
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])

Justification:

- Optimizer: "rmsprop", because it's an excellent default. If it does not work well, "adam" is the other go-to choice.

- Loss function: "binary_crossentropy", because the outcome is binary (0 or 1) and the model ends in a single sigmoid unit.

- Metrics: we monitor accuracy.
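To make the loss choice concrete, a NumPy sketch of the binary cross-entropy formula. The clipping constant `eps` is an assumption for numerical safety, and Keras's own implementation may differ in details. Confident, correct predictions give a loss close to 0, while confident wrong predictions are penalized heavily.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y * log(p) + (1 - y) * log(1 - p)] over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from exactly 0 and 1 to avoid log(0)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))
```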


#### CV1 Compiling and training


In [None]:
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="convnet_from_scratch.keras",
        save_best_only=True,
        monitor="val_loss")
]

history = model.fit(train_images, train_labels, epochs=50,
                    validation_split=0.2, batch_size=256,
                    callbacks = callbacks)

#### CV2 Compiling and training

Compile

In [None]:
model.compile(optimizer="Adam", loss='categorical_crossentropy', metrics=['accuracy'])

Train

In [None]:
callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="resnet50.keras",
        save_best_only=True,
        monitor="val_loss")
]

history = model.fit(x_train, y_train,
                    epochs=40,
                    batch_size=256,
                    validation_split=0.2,  # Use 20% of the data for validation
                    callbacks=callbacks,  # Include the callbacks in training
                    verbose=1)

#### CV3 compile and train

In [None]:
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="mini_xception.keras",
        save_best_only=True,
        monitor="val_loss")
]

history = model.fit(train_images, train_labels, epochs=100,
                    validation_split=0.2, batch_size=256,
                    callbacks = callbacks)

## Splitting the data,
i.e. setting aside a validation set

In [None]:
x_val = x_train[:10000]
partial_x_train = x_train[10000:]
y_val = y_train[:10000]
partial_y_train = y_train[10000:]

# validation data is data that was not used for training.

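Caution: If the data is ordered (e.g. sorted by label), simply taking the first 10,000 samples as the validation set is problematic.
- To find out whether the data is ordered:
    - use trial and error (try both a plain and a shuffled split), or
    - check the label distribution in each split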
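If the data might be ordered, shuffling before the split avoids the problem. A minimal NumPy sketch that keeps inputs and labels aligned; `shuffled_split` is a hypothetical helper, and the fixed `seed` is an assumption made for reproducibility.

```python
import numpy as np

def shuffled_split(x, y, n_val, seed=0):
    """Shuffle samples (keeping x and y aligned), then hold out n_val for validation."""
    idx = np.random.default_rng(seed).permutation(len(x))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return x[train_idx], y[train_idx], x[val_idx], y[val_idx]
```

Usage: `partial_x_train, partial_y_train, x_val, y_val = shuffled_split(x_train, y_train, n_val=10000)`.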

In [None]:
history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=20,
                    batch_size=512,
                    validation_data=(x_val, y_val),
                    verbose=0) # Set verbose=0 to suppress the output

# For each batch, the model runs a forward pass, computes the loss, and updates its
# weights via backpropagation; one epoch is a full pass over the training data, and we run 20.

# Per-epoch validation accuracy, e.g. for plotting
val_accuracy = history.history['val_accuracy']


### Model Validation

In [None]:
import matplotlib.pyplot as plt
history_dict = history.history
loss_values = history_dict["loss"]
val_loss_values = history_dict["val_loss"]
epochs = range(1, len(loss_values) + 1)
plt.plot(epochs, loss_values, "bo", label="Training loss")
plt.plot(epochs, val_loss_values, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()

In [None]:
# Access the history for the validation loss
val_loss = history.history['val_loss']

# Find the index of the minimum validation loss
min_val_loss_epoch = val_loss.index(min(val_loss))

print(f"Epoch with minimum validation loss: {min_val_loss_epoch + 1}")

### Computer Vision

#### CV1 - Plot training and validation curves

In [None]:
import matplotlib.pyplot as plt
# Assuming 'history' is the return value from model.fit()
history_dict = history.history

# Extracting loss and accuracy history
train_loss = history_dict['loss']
val_loss = history_dict['val_loss']
train_accuracy = history_dict['accuracy']
val_accuracy = history_dict['val_accuracy']

epochs = range(1, len(train_loss) + 1)

# Plotting training and validation loss
plt.figure(figsize=(14, 5))

# Training and validation loss plot
plt.subplot(1, 2, 1)
plt.plot(epochs, train_loss, 'bo-', label='Training Loss')
plt.plot(epochs, val_loss, 'ro-', label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

# Training and validation accuracy plot
plt.subplot(1, 2, 2)
plt.plot(epochs, train_accuracy, 'bo-', label='Training Accuracy')
plt.plot(epochs, val_accuracy, 'ro-', label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.tight_layout()  # Adjusts the plots to ensure they don't overlap
plt.show()

In [None]:
def print_best_val_loss_and_accuracy(history):
    history_dict = history.history

    # Find the index of the best validation loss
    best_val_loss_index = np.argmin(history_dict['val_loss'])

    # Retrieve the best validation loss
    best_val_loss = history_dict['val_loss'][best_val_loss_index]

    # Retrieve the validation accuracy corresponding to the best validation loss
    best_val_accuracy = history_dict['val_accuracy'][best_val_loss_index]

    print(f"Best Validation Loss: {best_val_loss}")
    print(f"Validation Accuracy at Best Loss: {best_val_accuracy}")

In [None]:
print_best_val_loss_and_accuracy(history)

In [None]:
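# CV2: safe_mode=False is needed because the model contains a Lambda layer,
# which Keras refuses to deserialize in safe mode
best_model = keras.models.load_model("resnet50.keras", safe_mode=False)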

test_loss, test_acc = best_model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc}")

#### Test data evaluation (CV1)

In [None]:
best_model = keras.models.load_model("convnet_from_scratch.keras")

test_loss, test_acc = best_model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc}")

#### CV3 - Plot training and validation curves

In [None]:
# Assuming 'history' is the return value from model.fit()
history_dict = history.history

# Extracting loss and accuracy history
train_loss = history_dict['loss']
val_loss = history_dict['val_loss']
train_accuracy = history_dict['accuracy']
val_accuracy = history_dict['val_accuracy']

epochs = range(1, len(train_loss) + 1)

# Plotting training and validation loss
plt.figure(figsize=(14, 5))

# Training and validation loss plot
plt.subplot(1, 2, 1)
plt.plot(epochs, train_loss, 'bo-', label='Training Loss')
plt.plot(epochs, val_loss, 'ro-', label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

# Training and validation accuracy plot
plt.subplot(1, 2, 2)
plt.plot(epochs, train_accuracy, 'bo-', label='Training Accuracy')
plt.plot(epochs, val_accuracy, 'ro-', label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.tight_layout()  # Adjusts the plots to ensure they don't overlap
plt.show()

In [None]:
import numpy as np
def print_best_val_loss_and_accuracy(history):
    history_dict = history.history

    # Find the index of the best validation loss
    best_val_loss_index = np.argmin(history_dict['val_loss'])

    # Retrieve the best validation loss
    best_val_loss = history_dict['val_loss'][best_val_loss_index]

    # Retrieve the validation accuracy corresponding to the best validation loss
    best_val_accuracy = history_dict['val_accuracy'][best_val_loss_index]

    print(f"Best Validation Loss: {best_val_loss}")
    print(f"Validation Accuracy at Best Loss: {best_val_accuracy}")

In [None]:
print_best_val_loss_and_accuracy(history)

#### CV3 evaluation of best model on test set

In [None]:
best_model = keras.models.load_model("mini_xception.keras")

In [None]:
test_loss, test_acc = best_model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc}")

## 4) Develop an overfitting model
- Add layers
- Enlarge layers
- Train on more epochs
- Still not overfitting? Complicate it



## 5) Regularize and tune
- Try different architectures (add or remove layers)
- Add dropout
- (Add L1/L2 weight regularization if the model is small; it is rarely worth considering for large models)
- Tune hyperparameters:
  - Number of units per layer
  - Optimizer learning rate
- Iterate on data curation or feature engineering, by
  - collecting and annotating more data,
  - developing better features, or
  - removing uninformative features

Careful: Don't start by building a grid-search loop over all of these options. Every configuration you evaluate on the validation set leaks information about that set into your choices, so the model ends up overfitting the validation data (besides costing a house and a horse).
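To see what "Add dropout" does, here is a NumPy sketch of inverted dropout, the scheme behind Keras's `Dropout` layer: during training, each value is set to 0 with probability `rate` and the surviving values are scaled by 1 / (1 - rate), so the expected value stays the same; at inference time the input passes through unchanged. `inverted_dropout` is a hypothetical name; in a model you would use `layers.Dropout(0.5)`, as the CV models above do.

```python
import numpy as np

def inverted_dropout(x, rate, rng, training=True):
    """Zero a fraction `rate` of the values during training and rescale the rest."""
    if not training or rate == 0:
        return x  # inference: no dropout
    keep = rng.random(x.shape) >= rate  # each value is kept with probability 1 - rate
    return x * keep / (1.0 - rate)
```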

In [None]:
# Repeat the cells above with the corresponding change, e.g. a removed layer.

In [None]:
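import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical helper (its definition is not shown in this template): builds a
# two-hidden-layer classifier for the 46-topic Reuters newswire task, trains it
# on the split above, and returns the per-epoch validation accuracy.
def news_nn(nl1, nl2, num_classes=46, epochs=20):
    model = keras.Sequential([
        layers.Dense(nl1, activation="relu"),
        layers.Dense(nl2, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="rmsprop",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(partial_x_train, partial_y_train,
                        epochs=epochs, batch_size=512,
                        validation_data=(x_val, y_val), verbose=0)
    return history.history["val_accuracy"]

# Configurations of neurons to try
neuron_configs = [(4, 4), (64, 4), (64, 64)]
val_accuracies = []

# Run training for each neuron configuration and collect validation accuracies
for nl1, nl2 in neuron_configs:
    val_acc = news_nn(nl1, nl2)
    val_accuracies.append(val_acc)

# Plotting all the validation accuracies
epochs = range(1, 21)
for i, val_acc in enumerate(val_accuracies):
    plt.plot(epochs, val_acc, label=f'Config {neuron_configs[i][0]}-{neuron_configs[i][1]} neurons')

plt.title('Validation accuracy with varying neuron counts')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()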

#### CV2 - Fine-tuning a pre-trained ResNet50
We were not able to beat the from-scratch approach using the frozen ResNet50. Let's unfreeze the last layer to see if it helps.

In [None]:
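base_model = keras.applications.ResNet50(include_top=False,
                                         weights='imagenet',
                                         input_shape=(32, 32, 3))
# Note: base_model.layers[-1] is the final ReLU activation ("conv5_block3_out"),
# which has no weights, and while base_model.trainable is False none of its
# layers are trained. So we make the base trainable and then freeze every layer
# except the last convolution of the final residual block.
base_model.trainable = True
for layer in base_model.layers:
    layer.trainable = (layer.name == "conv5_block3_3_conv")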

# Define the input tensor for your model
input_tensor = Input(shape=(32, 32, 3))

# Apply data augmentation to the inputs
x = data_augmentation(input_tensor)

# Apply the preprocess_input function
x = Lambda(preprocess_input)(x)

# Add custom layers on top of the base model
x = base_model(x)
x = GlobalAveragePooling2D()(x)
x = layers.Dense(256, activation='relu')(x)
x = layers.Dropout(0.5)(x)
predictions = Dense(10, activation='softmax')(x)

# Create the final model
model = Model(inputs=input_tensor, outputs=predictions)

In [None]:
model.compile(optimizer="adam", loss='categorical_crossentropy', metrics=['accuracy'])

In [None]:
callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="resnet50_finetune.keras",
        save_best_only=True,
        monitor="val_loss")
]

In [None]:
history = model.fit(x_train, y_train,
                    epochs=100,
                    batch_size=256,
                    validation_split=0.2,  # Use 20% of the data for validation
                    callbacks=callbacks,  # Include the callbacks in training
                    verbose=1)

In [None]:
print_best_val_loss_and_accuracy(history)

In [None]:
# Assuming 'history' is the return value from model.fit()
history_dict = history.history

# Extracting loss and accuracy history
train_loss = history_dict['loss']
val_loss = history_dict['val_loss']
train_accuracy = history_dict['accuracy']
val_accuracy = history_dict['val_accuracy']

epochs = range(1, len(train_loss) + 1)

# Plotting training and validation loss
plt.figure(figsize=(14, 5))

# Training and validation loss plot
plt.subplot(1, 2, 1)
plt.plot(epochs, train_loss, 'bo-', label='Training Loss')
plt.plot(epochs, val_loss, 'ro-', label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

# Training and validation accuracy plot
plt.subplot(1, 2, 2)
plt.plot(epochs, train_accuracy, 'bo-', label='Training Accuracy')
plt.plot(epochs, val_accuracy, 'ro-', label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.tight_layout()  # Adjusts the plots to ensure they don't overlap
plt.show()

In [None]:
best_model = keras.models.load_model("resnet50_finetune.keras", safe_mode=False)

test_loss, test_acc = best_model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc}")

# 3) Deploy a model

Stakeholder communication example: "With these settings, the fraud detection model would have a 5% false negative rate and a 2.5% false positive rate. Every day, an average of 200 valid transactions would be flagged as fraudulent and sent for manual review, and an average of 14 fraudulent transactions would be missed. An average of 266 fraudulent transactions would be correctly caught."
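A quick check that the example's numbers are consistent. The daily number of valid transactions is not stated in the example; the 8,000 below is only implied by 200 false alarms at a 2.5% false positive rate.

```python
caught, missed, false_alarms = 266, 14, 200  # daily averages from the example
false_positive_rate = 0.025

fraud_per_day = caught + missed                   # all fraudulent transactions: 280
false_negative_rate = missed / fraud_per_day      # 14 / 280 = 0.05, i.e. 5%
valid_per_day = false_alarms / false_positive_rate  # implied, not stated: about 8,000

print(fraud_per_day, false_negative_rate, round(valid_per_day))  # 280 0.05 8000
```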

Choose a deployment method (e.g. an API or on-device), specify the requirements, and, last but not least, monitor and maintain the model.

### Discussion

#### CV1

What's next? To further increase the performance of our model, we could make some tweaks here and there. For example, increasing the number of filters in the convolutional layers can help the model learn more complex features, and adjusting the number of neurons in the dense layers might improve its learning capacity.

However, at this stage, gains from such changes are likely to be marginal.

Instead, there are two things that would likely boost performance:

1) More and better data! The images are heavily pixelated (only 32x32 pixels). Higher-resolution data would allow us to build deeper models that generalize better.

2) More and better tricks! Next in the course, we'll learn about advanced computer vision techniques that can push performance even further.

#### CV2
We were not able to beat the from-scratch approach, despite having adjusted the weights in the last layer and running for more than twice the number of epochs (100 instead of 40). What are the ways forward? Here are a couple of suggestions:

*   **Unfreeze more layers**. Fine-tuning more layers may allow the pretrained network to better adapt to the specifics of the dataset.

*   **Advanced Data Augmentation**: Explore more sophisticated data augmentation techniques. Sometimes, introducing variations in the augmentation can help the model generalize better.

*   **Try a different pre-trained model**. The problem might be ResNet50 itself. Keras includes many different pre-trained models. See here for a [list](https://keras.io/api/applications/).


#### CV3
Test set accuracy of 78.1% with only 189,642 parameters - our best model so far!

How good is this compared to the current state of the art on CIFAR-10? On the benchmark overview, this would place us at position 218. This might not seem impressive, but if you look at the parameter counts of the models in the vicinity, you'll notice that our mini Xception model is much, much smaller.

For instance, at position 206, we'll find a CCN Vision Transformer model with an accuracy of 83.36%, but a parameter count of 906,075! That's about 5 times more parameters than our 189,642 parameters.

Further, we are beating other, more complicated models like the Hybrid Vision Nystromformer at place 221, with an accuracy of 75.26% and 623,706 parameters. That's about 3 percentage points lower accuracy with around 3 times as many parameters. Not bad!

Finally, we haven't even started tweaking our model yet. Based on the plots, we could lower the learning rate a bit at later epochs to reduce some of the noisy behavior. Adjusting the dropout rate and the last dense layer might also help a bit. Of course, we could also build the model even deeper, but that would increase the parameter count, hopefully while also increasing accuracy.
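One way to lower the learning rate at later epochs is a step-decay schedule. A sketch, assuming we halve the rate every 30 epochs (both numbers are arbitrary, hypothetical choices to tune); `step_decay` could be passed to `keras.callbacks.LearningRateScheduler`, which calls it with the 0-based epoch index and the current learning rate. `keras.callbacks.ReduceLROnPlateau`, which lowers the rate when `val_loss` stops improving, is an alternative.

```python
def step_decay(epoch, lr, drop_every=30, factor=0.5):
    """Multiply the learning rate by `factor` every `drop_every` epochs (epoch is 0-based)."""
    if epoch > 0 and epoch % drop_every == 0:
        return lr * factor
    return lr

# Simulate the schedule over 100 epochs, starting from a learning rate of 1e-3
lr = 1e-3
schedule = []
for epoch in range(100):
    lr = step_decay(epoch, lr)
    schedule.append(lr)
```

In the CV3 training cell, this would amount to adding `keras.callbacks.LearningRateScheduler(step_decay)` to the `callbacks` list before calling `model.fit`.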