# <center> Deep Learning Classification of Video Game Art Images: Doom vs Animal Crossing Classifier
    
## <center> National Centre of Scientific Research "Demokritos"
    
### <center> $${\color{gray}{Author: Alexandros Filios}}$$
### <center> $${\color{gray}{Supervisor: Theodoros Giannakopoulos}}$$ 

### <center> https://www.kaggle.com/code/alexandrosfilios/the-ultimate-doom-vs-animal-crossing-classifier

# 1. Introduction

## 1.1 Background and Motivation
Image classification using deep learning has become a powerful tool in many domains, including the recognition of art styles in both digital and traditional art. While many introductory projects classify digits or simple objects, this project tackles a more engaging task: classifying images from the popular video games "Doom" and "Animal Crossing". Building a classifier for this task lets us explore how well deep learning can discern different art styles, and what that implies for the broader field of image recognition.

**Doom:** Doom is a legendary first-person shooter video game that revolutionized the gaming industry when it was released in 1993. In Doom, players assume the role of a space marine who battles demonic creatures unleashed from hell. Known for its fast-paced gameplay, intense action, and atmospheric levels, Doom offers a thrilling and adrenaline-pumping experience. With its iconic weapons, such as the shotgun and chainsaw, and memorable enemies like the Cyberdemon, Doom has become a cornerstone of the first-person shooter genre, inspiring countless games and leaving a lasting impact on gaming culture.

**Animal Crossing:** Animal Crossing is a charming and relaxing life simulation game that invites players to escape to a peaceful virtual village inhabited by anthropomorphic animals. In Animal Crossing, players can engage in various activities such as fishing, bug catching, gardening, and interacting with villagers. The game follows a real-time clock, with seasons, holidays, and special events mirroring the real world. With its delightful visuals, cheerful music, and emphasis on creativity and social interaction, Animal Crossing provides a soothing and immersive experience, allowing players to unwind and create their own virtual paradise.

## 1.2 Overview of the Project
The objective of this project is to build a deep learning image classification system that can accurately classify images as either "Doom" or "Animal Crossing". The dataset consists of 1597 image posts extracted from the `r/doom` and `r/animalcrossing` Reddit communities, showcasing the distinctive art styles of the two games. The classes are roughly balanced, with 840 "Doom" images and 757 "Animal Crossing" images (about 53% and 47%). Class balance matters for training a reliable classifier: when one class dominates, a model can score well simply by favoring it, whereas a balanced dataset pushes the model to learn the features that actually distinguish the classes. For art-style recognition, this means the model has to capture the visual patterns specific to each game rather than the label frequencies.
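As a quick sanity check on the class balance described above, the snippet below computes each class's share of the dataset from the folder counts quoted in this section. It also computes the accuracy of a trivial classifier that always predicts the majority class ("Doom"). That baseline is a useful reference point: accuracies reported later should be compared against roughly 52.6%, not 50%.

```python
# Class counts quoted for the dataset folders
counts = {"doom": 840, "animal_crossing": 757}
total = sum(counts.values())

# Share of each class in the whole dataset
proportions = {name: n / total for name, n in counts.items()}

# Accuracy of a trivial model that always predicts the majority class
majority_baseline = max(counts.values()) / total

print(total)                                              # 1597
print({k: round(v, 3) for k, v in proportions.items()})   # {'doom': 0.526, 'animal_crossing': 0.474}
print(round(majority_baseline, 3))                        # 0.526
```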

## 1.3 Scope and Significance
The scope of this project extends beyond the realm of video games and has implications for recognizing different art styles not only within the gaming industry but also in the broader art world. By developing an image classification model that can accurately classify images from "Doom" and "Animal Crossing", we can shed light on the distinct visual characteristics of these two video games and their relevance to the broader field of art and design. This understanding can be valuable in fields such as game development, art curation, and digital media analysis.

## 1.4 Project Structure
The project follows a systematic approach to build the image classification system for distinguishing between "Doom" and "Animal Crossing" images. It involves steps such as data collection, preprocessing, model development, training, evaluation, and testing. The classification model is implemented using convolutional neural network (CNN) architectures, which have demonstrated excellent performance in image classification tasks. Furthermore, transfer learning techniques are employed to leverage pre-trained models and improve the classification performance.

## 1.5 Objectives
- Understanding and preprocessing the dataset, which contains roughly balanced sets of "Doom" and "Animal Crossing" images together with post metadata.
- Developing a CNN-based image classification model to accurately classify images into their respective classes.
- Evaluating the performance of the model using appropriate evaluation metrics, such as accuracy, precision, and recall.
- Incorporating transfer learning techniques to enhance the classification model's performance by leveraging pre-trained models.
- Comparing the performance of the transfer learning-based model with the custom CNN built from scratch.

By achieving these objectives, this project aims to contribute to the understanding of image classification techniques, highlight the importance of class balance in training robust models, and demonstrate the potential of deep learning, including transfer learning, in recognizing and interpreting different art styles in the context of video games and beyond.


# 2. Dataset

https://www.kaggle.com/datasets/andrewmvd/doom-crossing

## 2.1 Description
The Doom or Animal Crossing dataset is a collection of 1597 image posts extracted from both r/doom and r/animalcrossing subreddits. The dataset showcases the distinct art styles of these two popular video games. Each image post is accompanied by metadata, providing valuable insights into the images.

## 2.2 Contents - Dataset Structure
The dataset is organized into two main folders: "doom" and "animal_crossing," representing the respective classes for classification. The "doom" folder contains 840 image files, while the "animal_crossing" folder contains 757 image files. These images were selected as the best of the month in July 2020, ensuring their quality and relevance.

The two class folders are:

1. `doom`: image files representing the "Doom" class.
2. `animal_crossing`: image files representing the "Animal Crossing" class.

The dataset also includes two CSV files, namely `animal_crossing_dataset.csv` and `doom_crossing_dataset.csv`, which contain additional information about each image post. Although these CSV files are not used in this project, they can be valuable resources for future work. Their columns record factors such as upvotes, downvotes, number of comments, and creation time, which could be explored to analyze the popularity and engagement of different art styles within the gaming community.

By leveraging the Doom or Animal Crossing dataset, we can delve into the world of image classification, explore the nuances of different art styles, and develop models that accurately classify images from these video games.


## 2.3 Convolutional Neural Networks (CNNs)

Convolutional Neural Networks (CNNs) have emerged as a powerful deep learning technique for image classification tasks. CNNs are specifically designed to capture spatial hierarchies and local patterns within images, making them well-suited for extracting meaningful features from visual data. The utilization of CNNs in this project enables us to leverage their capabilities in learning and recognizing the unique art styles of the "Doom" and "Animal Crossing" video games.
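To make "local patterns" concrete, here is a minimal pure-Python sketch of what a single convolutional filter computes. A 3x3 kernel slides over the image with stride 1 and no padding (Keras's default `padding='valid'`), producing one output value per position. The hand-written vertical-edge kernel and the `conv2d_valid` helper are illustrative assumptions; in a CNN the kernel weights are learned during training.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding, as a Conv2D layer does.

    Like deep learning libraries, this computes cross-correlation (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the kh x kw window whose top-left corner is (i, j)
            out[i][j] = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
    return out

# 4x5 grayscale image: dark left part (0), bright right part (1)
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# Hand-made vertical-edge detector (a CNN would learn such weights instead)
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
for row in conv2d_valid(image, kernel):
    print(row)  # [3, 3, 0] on each of the two output rows
```

The response is large where the window straddles the dark-to-bright boundary and zero over the uniformly bright region, which is the kind of local feature an early convolutional layer picks up.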

## 2.4 Data Preprocessing
Before training the CNN models, the data are preprocessed into a format suitable for effective model training. This involves resizing the images to a consistent size, normalizing the pixel values, and dividing the dataset into training, validation, and testing sets. Data augmentation techniques, such as random rotations and horizontal flips, could also be used to improve the robustness and generalization of the models, although the pipeline in this notebook does not apply them.
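The NumPy sketch below shows two of these steps on a single image array: scaling pixel values from [0, 255] to [0, 1], and a random horizontal flip. It is illustrative only. In this notebook the scaling is done with `data.map(lambda x, y: (x/255, y))` on the `tf.data` pipeline, and the `preprocess` function and its `flip_prob` parameter are hypothetical.

```python
import numpy as np

def preprocess(image, rng, flip_prob=0.5):
    """Scale a uint8 image to [0, 1] and flip it left-right with probability flip_prob.

    `image` has shape (height, width, channels), the layout Keras uses.
    """
    scaled = image.astype(np.float32) / 255.0
    if rng.random() < flip_prob:
        scaled = scaled[:, ::-1, :]  # reversing the width axis is a horizontal flip
    return scaled

rng = np.random.default_rng(seed=0)
img = np.array([[[0], [255]], [[128], [64]]], dtype=np.uint8)  # tiny 2x2 single-channel image
out = preprocess(img, rng, flip_prob=1.0)  # force the flip for the demo
print(out[..., 0])
```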

## 2.5 Model Development
The initial step in model development involves designing and configuring the CNN architecture. This includes defining the number and type of layers, selecting appropriate activation functions, and determining the model's overall structure. Many variations are possible, combining Convolutional, Pooling, and Fully Connected layers with Dropout layers to reduce overfitting and Batch Normalization layers to stabilize and speed up training.

## 2.6 Transfer Learning
To further enhance the performance of the classification models, transfer learning techniques are incorporated. Transfer learning involves utilizing pre-trained models that have been trained on large-scale datasets, such as ImageNet, to extract high-level features from images. By leveraging the knowledge gained from these pre-trained models, we can expedite the training process and potentially improve the accuracy of our models.

## 2.7 Training and Evaluation
The training phase feeds the preprocessed data through the CNN, uses backpropagation to compute the gradient of the loss with respect to every parameter, and lets an optimization algorithm, such as Stochastic Gradient Descent (SGD) or Adam, update the parameters. During training, the loss and accuracy on the training set and on the validation set are monitored to confirm that the model is learning effectively and not overfitting. Because each model in this project ends in a single sigmoid output, the loss function is binary cross-entropy (categorical cross-entropy is its multi-class counterpart).
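The division of labor between backpropagation and the optimizer can be seen on the smallest possible "network": a single sigmoid neuron trained with binary cross-entropy. The plain-Python sketch below, with made-up inputs and weights, runs one forward pass, computes the gradients with the chain rule (for sigmoid plus binary cross-entropy, the derivative of the loss with respect to the pre-activation simplifies to `p - y`), and applies one gradient-descent update. Keras performs these steps automatically, for every layer, inside `model.fit()`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y):
    """Binary cross-entropy for a predicted probability p = P(y = 1) and a label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Made-up example: two input features, true label 1, initial weights and bias
x = [0.5, -1.0]
y = 1
w = [0.1, 0.2]
b = 0.0
lr = 0.5  # learning rate

# Forward pass
z = sum(wi * xi for wi, xi in zip(w, x)) + b
p = sigmoid(z)
loss_before = bce(p, y)

# Backward pass (chain rule): for sigmoid + BCE, dL/dz simplifies to p - y
dz = p - y
grad_w = [dz * xi for xi in x]  # dL/dw_i = dL/dz * x_i
grad_b = dz                     # dL/db = dL/dz

# Optimizer step (plain gradient descent)
w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
b -= lr * grad_b

loss_after = bce(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b), y)
print(f"loss before: {loss_before:.4f}, after one step: {loss_after:.4f}")
```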

# 3. Implementation

**Import Dependencies and Setup:**


```python
# Warnings
import warnings
# Silence UserWarnings emitted by tensorflow_io
warnings.filterwarnings("ignore", category=UserWarning, module="tensorflow_io")
# Silence DeprecationWarnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

# Tools
import os
import shutil
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# TensorFlow / Keras (custom CNN)
import tensorflow as tf
from tensorflow.keras.models import Sequential, Model, load_model
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Dense, Flatten, Dropout,
                                     Input, GlobalAveragePooling2D)
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.metrics import Precision, Recall, BinaryAccuracy
from tensorflow.keras.optimizers import Adam

# Transfer learning
from tensorflow.keras.applications.vgg16 import VGG16
```

```python
# Constants shared by the custom CNN and the transfer-learning model
BATCH_SIZE = 32
IMG_WIDTH = 256
IMG_HEIGHT = 256
IMG_CHANNELS = 3
# Keras expects image shapes as (height, width, channels)
IMG_SHAPE = (IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS)

print(f"Batch Size: {BATCH_SIZE}")
print(f"Image Shape: {IMG_SHAPE}")

# Custom color palette for the plots
doom_color = '#B9121B'  # Red for the Doom theme
ac_color = '#3CAEA3'    # Teal-green for the Animal Crossing theme

# Number of training epochs
initial_epochs = 20
```

## 3.1 Setup

**Configure the GPU**

The cell below, which is commented out in this notebook, would retrieve the list of available physical GPUs using `tf.config.experimental.list_physical_devices('GPU')` and enable memory growth on each one with `tf.config.experimental.set_memory_growth(gpu, True)`.
With memory growth enabled, TensorFlow allocates GPU memory on demand as operations need it, rather than reserving the entire GPU memory up front.
This can help avoid OOM (Out of Memory) errors, especially when the GPU is shared with other processes.


```python
# Avoid OOM errors by enabling GPU memory growth (disabled in this notebook)
# gpus = tf.config.experimental.list_physical_devices('GPU')
# for gpu in gpus:
#     tf.config.experimental.set_memory_growth(gpu, True)
```

**Create the data pipeline**

To create the pipeline, the following steps were followed:

1. Loading the Dataset: The `image_dataset_from_directory` function from the TensorFlow Keras library was used to load the dataset from `/kaggle/input/doom-crossing`. It creates a batched dataset of images (resized to 256x256 and grouped in batches of 32 by default) together with their labels, which it infers from the sub-folder names in alphanumeric order: `animal_crossing` becomes label 0 and `doom` becomes label 1.

2. Converting to Numpy Iterator: The `as_numpy_iterator` method was applied to the dataset to obtain an iterator that can be used to iterate over the data. This allows us to access the individual batches of images and labels.

3. Extracting a Batch and Visualizing: The next batch of images and labels, here the first one, was retrieved from the iterator and stored in `batch`. The code then displayed the first four images of the batch in a figure with four subplots, each titled with the image's label.

4. Preprocessing the Data: The data variable was reassigned by applying the map function to the dataset. Within the map function, a lambda function was used to divide the pixel values of the images by 255, effectively normalizing the image data. This preprocessing step is commonly performed to ensure that the pixel values are in the range of 0 to 1, which is suitable for neural network training.
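Because the labels come from folder names, it helps to know which game is label 1. `image_dataset_from_directory` sorts the class sub-folders alphanumerically and numbers them from 0; the resulting order is also available as `data.class_names` on the returned dataset (before `map` is applied). The pure-Python sketch below mimics that ordering and adds a hypothetical `decode` helper that turns the model's sigmoid output into a class name with a 0.5 threshold.

```python
# Class sub-folders of the dataset directory (file-system order is arbitrary)
folders = ["doom", "animal_crossing"]

# image_dataset_from_directory sorts class folders alphanumerically and numbers them from 0
class_names = sorted(folders)
label_of = {name: idx for idx, name in enumerate(class_names)}

def decode(probability, threshold=0.5):
    """Map a sigmoid output, interpreted as P(label == 1), to a class name."""
    return class_names[1] if probability > threshold else class_names[0]

print(class_names)                 # ['animal_crossing', 'doom']
print(label_of)                    # {'animal_crossing': 0, 'doom': 1}
print(decode(0.93), decode(0.12))  # doom animal_crossing
```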

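```python
# Load the dataset; labels are inferred from the sub-folder names
data = tf.keras.utils.image_dataset_from_directory(
    '/kaggle/input/doom-crossing',
    batch_size=BATCH_SIZE,
    image_size=(IMG_HEIGHT, IMG_WIDTH),
)
# Convert to a NumPy iterator
data_iterator = data.as_numpy_iterator()
# Extract one batch: (images, labels)
batch = next(data_iterator)
```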

```python
# Visualize the first four images of the batch with their labels
fig, ax = plt.subplots(ncols=4, figsize=(20, 20))
for idx, img in enumerate(batch[0][:4]):
    ax[idx].imshow(img.astype(int))
    ax[idx].title.set_text(batch[1][idx])
```

```python
# Scale pixel values from [0, 255] to [0, 1]
data = data.map(lambda x, y: (x / 255, y))
```

**Split the Data**

We split the data into 3 sets. 
* Training set: 70%
* Validation set: 20%
* Test set: 10%

If the classes were imbalanced, a stratified split would be used so that each subset keeps the same class proportions. Because the classes are roughly balanced, and because stratified-splitting helpers proved awkward to use with the `tf.data.Dataset` returned by `image_dataset_from_directory`, the data were split by assigning whole batches to each subset with `take` and `skip`.
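A minimal sketch of such a stratified split follows. It works in plain Python over sample indices rather than over `tf.data` batches, so applying it to this dataset would require splitting the list of image file paths first; the `stratified_split` function is hypothetical. It also uses a fixed seed. That matters because, with `shuffle=True`, `image_dataset_from_directory` may reshuffle its data each time it is iterated, in which case `take`/`skip` on the batched dataset does not guarantee that the same images stay in the same subset across epochs.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.7, 0.2, 0.1), seed=42):
    """Return (train, val, test) index lists that preserve each class's share.

    The test subset receives whatever is left after rounding down the other two.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed: the same split on every run
    splits = ([], [], [])
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * fractions[0])
        n_val = int(len(indices) * fractions[1])
        splits[0].extend(indices[:n_train])
        splits[1].extend(indices[n_train:n_train + n_val])
        splits[2].extend(indices[n_train + n_val:])
    return splits

# Labels laid out like the dataset: 757 animal_crossing (0) and 840 doom (1)
labels = [0] * 757 + [1] * 840
train_idx, val_idx, test_idx = stratified_split(labels)
for name, subset in [("train", train_idx), ("val", val_idx), ("test", test_idx)]:
    doom_share = sum(labels[i] for i in subset) / len(subset)
    print(name, len(subset), round(doom_share, 3))
```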

```python
# Split fractions, measured in batches
train_size_percentage = 0.7  # 70% of our data
val_size_percentage = 0.2    # 20% of our data
# The test set takes all remaining batches (about 10%)

train_size = int(len(data) * train_size_percentage)
val_size = int(len(data) * val_size_percentage)
test_size = len(data) - train_size - val_size  # remainder, so no batch is left out

# Informative print
print(f"Total dataset size (batches): {len(data)}")
print(f"Train set size: {train_size}")
print(f"Validation set size: {val_size}")
print(f"Test set size: {test_size}")

# Training
train = data.take(train_size)
# Validation
val = data.skip(train_size).take(val_size)
# Testing
test = data.skip(train_size + val_size).take(test_size)
```

## 3.2 Model Architecture

**Sequential Model**

The Sequential model is a linear stack of layers, where each layer has exactly one input tensor and one output tensor. It is a straightforward way to build deep learning models by stacking layers one after the other.


The chosen architecture for the deep learning model is a Convolutional Neural Network (CNN) which is well-suited for image classification tasks. The model consists of multiple layers that progressively extract and learn features from the input images. Each layer plays a specific role in the network's ability to understand and classify the images accurately.

1. **Convolutional Layer:**
   - Number of filters: 16
   - Filter Size: 3x3 pixels
   - Stride: 1 pixel
   - Activation function: ReLU

   Explanation: Convolutional layers scan the input image and extract relevant features by convolving filters of a specific size over it. With 16 filters, this first layer can learn a variety of low-level features such as edges, color transitions, and simple textures. The ReLU activation function introduces non-linearity, enabling the model to capture complex patterns and improve its ability to generalize.

2. **MaxPooling Layer:**

   Explanation: MaxPooling layers help reduce the spatial dimensions of the feature maps generated by the convolutional layers. By using a 2x2 pixel window and taking the maximum value within each window, the layer condenses the information, preserving the most important features while reducing computational complexity and memory requirements.

3. **Convolutional Layer:**
   - Number of filters: 32
   - Filter Size: 3x3 pixels
   - Stride: 1 pixel
   - Activation function: ReLU

   Explanation: Adding another convolutional layer with increased filters allows the model to capture more complex and abstract features from the previous layer's output. The additional filters enable the network to learn a wider range of image representations, aiding in better discrimination and classification.

4. **MaxPooling Layer:**

   Explanation: Another max pooling layer follows the second convolutional layer to further downsample the feature maps, reducing spatial dimensions and retaining important features.

5. **Convolutional Layer:**
   - Number of filters: 16
   - Filter Size: 3x3 pixels
   - Stride: 1 pixel
   - Activation function: ReLU

   Explanation: The third convolutional layer continues to extract higher-level features from the previous layer's output, and this added depth allows the network to learn more abstract representations and patterns. It uses 16 filters, fewer than the second layer, which keeps the flattened feature vector, and therefore the first dense layer, smaller.

6. **MaxPooling Layer:**

   Explanation: The final max pooling layer further reduces the spatial dimensions of the feature maps, providing a more compact representation while preserving relevant features.

7. **Flatten Layer:**

   Explanation: The flatten layer converts the 3-dimensional feature maps from the previous layer (height x width x channels) into a single long vector. This step is necessary to connect the convolutional layers' outputs to the fully connected layers that follow, which make the prediction from the extracted features.

8. **Dense Layer:**
   - Size: 256
   - Activation function: ReLU

   Explanation: The dense layer, also known as a fully connected layer, consists of 256 neurons, each receiving input from every value of the flattened vector. It combines the features extracted by the convolutional layers into a higher-level representation that the output layer uses for classification. The ReLU activation function introduces non-linearity, enabling the network to learn complex relationships between features.

9. **Dense Layer:**
   - Size: 1
   - Activation function: Sigmoid

   Explanation: The final dense layer consists of a single neuron, representing the output of the network. Its sigmoid activation produces a value between 0 and 1 that is interpreted as the probability that the image belongs to class 1, "Doom" (labels follow the alphabetical order of the class folders, so "Animal Crossing" is class 0). Thresholding this probability at 0.5 gives the binary classification.
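The layer list above determines every tensor shape and parameter count in the network. The short calculation below works them out for a 256x256x3 input, 3x3 kernels with stride 1 and no padding (the Keras defaults used by the model code), and 2x2 max pooling, with the filter counts 16, 32, 16 from the model code. The totals should agree with the output of `model.summary()`; the helper names are illustrative.

```python
def conv_out(size, kernel=3, stride=1):
    """Spatial size after a 'valid' (unpadded) convolution."""
    return (size - kernel) // stride + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (rounds down)."""
    return size // pool

h, c = 256, 3  # square 256x256 RGB input
total_params = 0
for filters in (16, 32, 16):  # filter counts used in the model code
    total_params += (3 * 3 * c + 1) * filters  # 3x3xC kernel weights + one bias per filter
    h, c = pool_out(conv_out(h)), filters
    print(f"after conv+pool: {h}x{h}x{c}")

flat = h * h * c
total_params += flat * 256 + 256  # Dense(256): weights + biases
total_params += 256 * 1 + 1       # Dense(1): weights + bias
print("flattened features:", flat)            # 14400
print("trainable parameters:", total_params)  # 3696625
```

Almost all of the roughly 3.7 million parameters sit in the first dense layer (3,686,656 of 3,696,625), because it connects each of the 14,400 flattened features to all 256 neurons.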

**Adam Optimizer**

The Adam optimizer is a stochastic gradient-based optimization algorithm that, as its authors describe it, combines the advantages of two other popular optimizers, AdaGrad and RMSProp. It adapts the step size of each parameter during training, which often speeds up convergence and lets it handle sparse gradients effectively.

The Adam optimizer keeps exponential moving averages of the gradients (the first moment) and of the squared gradients (the second moment) and uses them to update the model weights. Dividing the first by the square root of the second gives each weight its own effective learning rate, scaled according to the magnitude of its recent gradients. This adaptive step size helps the model converge efficiently across different types of data and architectures.

By using the Adam optimizer with the Binary Crossentropy loss function, the model aims to minimize the difference between the predicted probabilities and the true labels, ensuring the model learns to classify images accurately.
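The update rule behind this description fits in a few lines. The sketch below applies Adam to a single scalar parameter, following the formulation with bias correction from the original Adam paper and the Keras default hyperparameters (`learning_rate=0.001`, `beta_1=0.9`, `beta_2=0.999`, `epsilon=1e-7`). Keras's internal implementation arranges the same computation slightly differently around `epsilon`, and the `adam_minimize` helper and quadratic objective are illustrative, not part of the notebook.

```python
import math

def adam_minimize(grad_fn, x, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7, steps=1):
    """Run `steps` Adam updates on a single scalar parameter x and return it."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g      # first moment: moving average of gradients
        v = beta2 * v + (1 - beta2) * g * g  # second moment: moving average of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Minimize f(x) = x^2 (gradient 2x) and f(x) = 100 x^2 (gradient 200x), starting from x = 5
print(adam_minimize(lambda x: 2 * x, 5.0, lr=0.1))
print(adam_minimize(lambda x: 200 * x, 5.0, lr=0.1))
```

Both calls move `x` from 5.0 to about 4.9: thanks to the bias correction, the first step has a size of about `lr` regardless of the gradient's scale, which is the per-parameter adaptive step size described above.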

```python
# Create the model
model = Sequential()

# Feature extractor: three Conv2D (3x3 kernels, stride 1) + MaxPooling2D (2x2) blocks
model.add(Conv2D(16, (3, 3), 1, activation='relu', input_shape=IMG_SHAPE))
model.add(MaxPooling2D())

model.add(Conv2D(32, (3, 3), 1, activation='relu'))
model.add(MaxPooling2D())

model.add(Conv2D(16, (3, 3), 1, activation='relu'))
model.add(MaxPooling2D())

# Classifier head
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(1, activation='sigmoid'))  # probability of class 1 ("doom")

# Compile the model
model.compile('adam', loss=tf.losses.BinaryCrossentropy(), metrics=['accuracy'])

# Print a summary of the layers and parameter counts
model.summary()
```

```python
# Requires the pydot and graphviz packages
tf.keras.utils.plot_model(model)
```

![custom_cnn_architecture.png](attachment:7d4841cf-8583-4b5f-9c3c-f4d6a981f92e.png)

## 3.3 Model Training

The code below trains the model with TensorFlow Keras and uses a callback to log the training process. Here is what each part does:

#### Creating a Log Directory

A directory path, `'logs'`, is assigned to the variable `logdir`. The training logs, which contain information such as the loss, accuracy, and other metrics at each epoch, are written to this directory.

#### Creating a Callback

In TensorFlow, callbacks are objects that enable the execution of specific actions during training, such as logging, saving model checkpoints, or modifying the learning rate.
In this code, a `TensorBoard` callback is created using `tf.keras.callbacks.TensorBoard`. This callback allows the logging of various metrics and visualizations during training, which can be later examined using TensorBoard, a visualization tool provided by TensorFlow.
The `log_dir` parameter is set to the path of the log directory created in the previous step.

#### Model Training

The `model.fit()` function is called to train the model.
The `train` dataset is passed as the training data, and the `val` dataset is provided as the validation data for assessing the model's performance during training.
The `epochs` parameter is set to 20, indicating the number of complete passes over the training dataset.
The `callbacks` parameter is set to `[tensorboard_callback]`, which includes the TensorBoard callback created earlier. This ensures that the training process logs relevant information to the specified log directory.

During the training process, the model iteratively learns from the training data, optimizing its parameters based on the specified loss function and optimizer. The progress of the training, including metrics and visualizations, is recorded in the log directory using the TensorBoard callback.

The `hist` variable holds a `History` object that contains information about the training process, such as the loss and accuracy values at each epoch. This object can be used for further analysis or visualization of the training results.
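As an example of such further analysis, the helper below finds the epoch with the best validation score in a `History`-style dictionary. The numbers in `example_history` are invented for illustration and are not this notebook's results; with the trained model, the call would be `best_epoch(hist.history)`. Keras can also track this during training through `tf.keras.callbacks.EarlyStopping(monitor='val_loss', restore_best_weights=True)`.

```python
def best_epoch(history, monitor="val_loss", mode="min"):
    """Return (1-based epoch, value) of the best entry of `monitor` in a Keras-style history dict."""
    values = history[monitor]
    pick = min if mode == "min" else max
    best_value = pick(values)
    return values.index(best_value) + 1, best_value

# Invented numbers with the same structure as hist.history
example_history = {
    "loss": [0.69, 0.45, 0.30, 0.18, 0.09],
    "val_loss": [0.66, 0.50, 0.41, 0.44, 0.52],
    "accuracy": [0.56, 0.78, 0.87, 0.93, 0.97],
    "val_accuracy": [0.62, 0.75, 0.82, 0.81, 0.80],
}
print(best_epoch(example_history))                              # (3, 0.41)
print(best_epoch(example_history, "val_accuracy", mode="max"))  # (3, 0.82)
```

In the invented history, the training loss keeps falling after epoch 3 while the validation loss rises again, which is the typical signature of overfitting.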

```python
# Directory where TensorBoard writes its log files
logdir = 'logs'
# Callback that logs the training metrics for TensorBoard
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir)
```

```python
hist = model.fit(train,
                 epochs=initial_epochs,
                 validation_data=val,
                 callbacks=[tensorboard_callback])
```

```python
fig = plt.figure()
plt.plot(hist.history['loss'], color=doom_color, label='loss')
plt.plot(hist.history['val_loss'], color=ac_color, label='val_loss')
fig.suptitle('Loss', fontsize=20)
plt.legend(loc="upper left")
plt.show()
```

![custom_cnn_training_loss.png](attachment:bc8663ff-e0fc-4de2-acc3-1ebe420f001c.png)

```python
fig = plt.figure()
plt.plot(hist.history['accuracy'], color=doom_color, label='accuracy')
plt.plot(hist.history['val_accuracy'], color=ac_color, label='val_accuracy')
fig.suptitle('Accuracy', fontsize=20)
plt.legend(loc="upper left")
plt.show()
```

![custom_cnn_training_accuracy.png](attachment:db135b81-2401-4314-bdac-c7dded886012.png)

**Discussion:**

The training process showed substantial improvement. Training accuracy rose from 55.71% in the first epoch to 99.64% in the final epoch, and validation accuracy rose from 62.19% to 89.38%, so the model learned to separate "Doom" and "Animal Crossing" images well. The training loss decreased steadily as the model minimized its errors. However, the gap of roughly 10 percentage points between the final training and validation accuracy suggests that the model has started to overfit the training data.

## 3.4 Model Evaluation and Results Explanation

The trained model undergoes evaluation using a separate test dataset to assess its performance on unseen data. The evaluation is carried out using three important metrics: precision, recall, and binary accuracy. These metrics provide valuable insights into the model's predictive capabilities and its overall effectiveness.

The evaluation proceeds as follows:

First, the necessary metrics, namely precision, recall, and binary accuracy, are initialized using their respective TensorFlow objects. These metrics serve as performance indicators and help evaluate different aspects of the model's classification performance.

Next, the code iterates over the test dataset, extracting the input images `X` and the corresponding true labels `y` for each batch. For each batch, `predict()` returns the model's predicted probabilities `yhat`; each metric thresholds them at 0.5 by default and updates its internal state through `update_state()`. This is repeated for every batch in the test dataset.

Once the predictions and metric updates are completed, the evaluation metrics are computed using the result().numpy() method of each metric. This method retrieves the current value of the metric as a NumPy array, representing the metric's performance based on the provided test data.

Finally, the printed values summarize the model's overall classification accuracy (binary accuracy), its ability to find the positive instances (recall), and its ability to avoid false positives (precision). Here the positive class is label 1, "Doom".

The evaluation process with the test data is a crucial step in assessing the model's generalization capabilities and its performance on unseen instances. 

```python
precision = Precision()
recall = Recall()
binary_accuracy = BinaryAccuracy()

# Accumulate the metrics over every test batch
for X, y in test.as_numpy_iterator():
    yhat = model.predict(X, verbose=0)  # sigmoid probabilities; the metrics threshold at 0.5
    precision.update_state(y, yhat)
    recall.update_state(y, yhat)
    binary_accuracy.update_state(y, yhat)

print(f'Precision: {precision.result().numpy():.2f}, '
      f'Recall: {recall.result().numpy():.2f}, '
      f'Binary Accuracy: {binary_accuracy.result().numpy():.2f}')
```

**CNN Model:**

|   Model   | Precision | Recall | Binary Accuracy |
|:---------:|:---------:|:------:|:--------------:|
|   CNN     |   0.79    |  0.82  |      0.79      |

**Discussion:**

During the testing phase, the model achieved a precision of 0.79, a recall of 0.82, and a binary accuracy of 0.79. With "Doom" as the positive class (label 1), precision is the fraction of images predicted as "Doom" that really are "Doom" images (true positives divided by all positive predictions). Recall, also known as sensitivity, is the fraction of actual "Doom" images that the model correctly identifies. Binary accuracy is the fraction of all test images, from both classes, that are classified correctly. These results show that the model distinguishes the two art styles well above chance, although the test accuracy (0.79) is noticeably lower than the final validation accuracy (89.38%), which is consistent with some overfitting.
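The three metrics can be reproduced by hand from confusion-matrix counts. The plain-Python sketch below uses a made-up toy batch, not the notebook's test data. It thresholds the predicted probabilities at 0.5 (predicting "Doom" when the probability is strictly greater, as the Keras metrics do by default), counts true positives, false positives, and false negatives with "Doom" (label 1) as the positive class, and derives precision, recall, and accuracy. The `binary_metrics` helper is hypothetical.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Precision, recall and accuracy for labels in {0, 1} and predicted P(label == 1)."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, correct / len(y_true)

# Toy batch: 1 = doom, 0 = animal_crossing
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1]
print(binary_metrics(y_true, y_prob))
```

In the toy batch, two "Doom" images are found (true positives), one "Doom" image is missed (false negative), and one "Animal Crossing" image is labeled "Doom" (false positive), so precision, recall, and accuracy all equal 2/3.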

## 3.5 Save the model

Saving a model refers to storing the learned weights, architecture, and associated parameters of a trained model in a file format that can be later retrieved and used for inference or further training.

The code performs two steps:

**Model Saving:**

The model.save() function is used to save the trained model. The model is saved in the Hierarchical Data Format (HDF5) file format, which is a commonly used format for storing deep learning models. Saving the model ensures that the model's architecture, learned weights, and optimizer state are preserved. The saved model can be considered as a snapshot of the trained model at a specific point in time.

**Model Loading:**

After saving the model, the load_model() function is used to load the saved model into memory. This function reads the saved model file and reconstructs the model object with its original architecture, weights, and parameters. The loaded model can be used for making predictions on new data or further training.

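```python
# Save in HDF5 format (selected by the .h5 extension)
os.makedirs('models', exist_ok=True)
model.save(os.path.join('models', 'imageclassifier.h5'))

# Reload the saved file to check that the model can be restored
loaded_model = load_model(os.path.join('models', 'imageclassifier.h5'))
```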

# 4. Transfer Learning

## 4.1 Model Architecture

We start by loading the VGG16 base model, pre-trained on the ImageNet dataset, using the VGG16 class from Keras. We set the `input_shape` to match our desired image size and exclude the top classification layers using `include_top=False`.

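Next, instead of freezing the whole base model with `base_model.trainable = False`, the code sets `base_model.trainable = True` and then freezes only its first 10 layers (`fine_tune_at = 10`). The pre-trained weights of these early layers stay unchanged, while the remaining VGG16 layers are fine-tuned together with the new classification layers added on top.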

We then add our own classification layers to the base model. This includes a global average pooling layer to reduce the spatial dimensions of the feature maps, followed by a dropout layer for regularization, and a dense layer with a sigmoid activation function to produce the final binary classification output.

The model is compiled with the Adam optimizer, a learning rate of 0.0001, and the binary cross-entropy loss function, with accuracy as the evaluation metric.
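Global average pooling is what turns VGG16's stack of feature maps into a short vector for the classifier. With a 256x256 input, VGG16's five 2x2 pooling stages reduce the spatial size to 8x8 (256 / 2^5), with 512 channels. The NumPy sketch below uses random numbers as a stand-in for those activations to show the pooling operation and the resulting size of the head.

```python
import numpy as np

# Stand-in for VGG16's output on one 256x256 image: 8x8 spatial grid, 512 channels
# (random values; the real model produces learned activations here)
rng = np.random.default_rng(0)
feature_maps = rng.random((1, 8, 8, 512))  # (batch, height, width, channels)

# Global average pooling: average every channel over its 8x8 spatial positions
pooled = feature_maps.mean(axis=(1, 2))
print(pooled.shape)  # (1, 512)

# Parameters of the head: pooling and Dropout have none; Dense(1) has 512 weights + 1 bias
dense_params = pooled.shape[1] * 1 + 1
print(dense_params)  # 513

# With Flatten instead, the Dense(1) layer would need 8*8*512 weights + 1 bias
print(np.prod(feature_maps.shape[1:]) + 1)  # 32769
```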

**Fine-Tuning**

Fine-tuning the base model: By unfreezing and fine-tuning some of the top layers of the VGG16 base model, we allow the model to adapt and learn task-specific features from our dataset. The earlier layers of the pre-trained VGG16 model capture more generic features, while the later layers capture more specific ones. Keeping the earlier layers frozen preserves these generic pre-trained features, while the later layers and the classification head adapt to the new task.
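To see which part of VGG16 `fine_tune_at = 10` actually unfreezes, the sketch below rebuilds the list of VGG16 layer names (without the top classifier) and applies the same slicing as the code. It assumes, as in TensorFlow 2.x Keras, that `base_model.layers` starts with the input layer; that layer's name varies across versions, so `"input"` is a placeholder. The real list can be checked with `[layer.name for layer in base_model.layers]`.

```python
# VGG16 layers with include_top=False, in model.layers order (input name is a placeholder)
vgg16_layers = ["input"]
for block, n_convs in enumerate([2, 2, 3, 3, 3], start=1):
    vgg16_layers += [f"block{block}_conv{i}" for i in range(1, n_convs + 1)]
    vgg16_layers.append(f"block{block}_pool")

fine_tune_at = 10
frozen = vgg16_layers[:fine_tune_at]     # trainable = False
trainable = vgg16_layers[fine_tune_at:]  # fine-tuned
print(len(vgg16_layers))  # 19
print(frozen[-1])         # block3_conv3
print([name for name in trainable if "conv" in name])
```

Under that assumption, everything up to `block3_conv3` stays frozen, and the six convolutional layers of blocks 4 and 5 are fine-tuned together with the new head.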

Adjusting the learning rate: The learning rate determines the step size at which the model's parameters are updated during training. A smaller learning rate allows for more precise updates, while a larger learning rate allows for faster updates but risks overshooting the optimal values. By decreasing the learning rate to 0.0001, we aim to make smaller updates to fine-tune the model more carefully and potentially find better optima.
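The trade-off described above can be seen on a toy problem. The sketch below runs plain gradient descent (not Adam) on f(x) = x², whose gradient is 2x, so each step multiplies x by (1 - 2·lr). A tiny learning rate converges slowly, a moderate one converges quickly, a large one overshoots back and forth across the minimum, and a too-large one diverges. The values are illustrative and unrelated to the 0.0001 used for VGG16.

```python
def gradient_descent(lr, x=5.0, steps=20):
    """Plain gradient descent on f(x) = x^2, whose gradient is 2x."""
    for _ in range(steps):
        x -= lr * 2 * x
    return x

for lr in (0.01, 0.1, 0.9, 1.1):
    print(lr, gradient_descent(lr))
```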

```python
# Load the VGG16 convolutional base with ImageNet weights (no top classifier)
base_model = VGG16(input_shape=IMG_SHAPE, include_top=False, weights='imagenet')

# Unfreeze the base model, then re-freeze its first layers:
# only layers from index fine_tune_at onwards are fine-tuned
base_model.trainable = True
fine_tune_at = 10
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False

# Classification head on top of the base model
inputs = Input(shape=IMG_SHAPE)
x = base_model(inputs, training=False)
x = GlobalAveragePooling2D()(x)  # (8, 8, 512) feature maps -> 512-d vector
x = Dropout(0.5)(x)
outputs = Dense(1, activation='sigmoid')(x)

# Create the model
vgg_model = Model(inputs=inputs, outputs=outputs)

# Compile with a small learning rate suited to fine-tuning
optimizer = Adam(learning_rate=0.0001)
vgg_model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
```

This is how the model looks. A Global Average Pooling layer after the VGG16 base averages each feature map into a single value, turning the base model's output into a flat vector; a Dropout layer and a Dense layer then predict the class.

```python
tf.keras.utils.plot_model(vgg_model)
```

![vgg_cnn_architecture.png](attachment:94ebb08d-c90d-45da-9f9e-3cfe11ab9a51.png)

## 4.2 Model Training

We train the model using the fit method. We specify the training dataset (`train`) and the number of epochs to train for (`initial_epochs`). We also provide the validation dataset (`val`) to monitor the model's performance during training.

The `fit` method runs the training loop and updates the model's weights to minimize the specified loss function. The accuracy metric is only reported, not optimized. The training progress is stored in the returned `History` object, whose `history` attribute records the loss and accuracy on both the training and validation sets at every epoch.

During training, we monitor the training and validation metrics to assess the model's performance. Comparing them helps identify potential issues such as overfitting, where training accuracy keeps rising while validation accuracy stalls or drops, or underfitting, where both remain low.
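`History.history` is a plain dictionary that maps each metric name (`'loss'`, `'accuracy'`, `'val_loss'`, `'val_accuracy'`) to a list with one value per epoch, so simple overfitting checks need no extra tools. The hypothetical helper below reports the epoch with the best validation accuracy and the train-validation accuracy gap at each epoch. A gap that keeps widening while validation accuracy stalls is a typical sign of overfitting. The five-epoch history is made up and only stands in for `hist_vgg.history`.

```python
def summarize_history(history):
    """Return (best 1-based epoch, best val_accuracy, per-epoch accuracy gaps)."""
    val_acc = history['val_accuracy']
    best_idx = max(range(len(val_acc)), key=val_acc.__getitem__)
    gaps = [round(tr - va, 4) for tr, va in zip(history['accuracy'], val_acc)]
    return best_idx + 1, val_acc[best_idx], gaps

# Made-up values, NOT the results of this notebook
example_history = {
    'accuracy':     [0.70, 0.80, 0.86, 0.90, 0.93],
    'val_accuracy': [0.72, 0.81, 0.85, 0.86, 0.85],
}
print(summarize_history(example_history))  # (4, 0.86, [-0.02, -0.01, 0.01, 0.04, 0.08])
```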

```python
hist_vgg = vgg_model.fit(
    train,
    epochs=initial_epochs,
    validation_data=val
)
```

```python
fig = plt.figure()
plt.plot(hist_vgg.history['loss'], color=doom_color, label='loss')
plt.plot(hist_vgg.history['val_loss'], color=ac_color, label='val_loss')
fig.suptitle('Loss', fontsize=20)
plt.legend(loc="upper left")
plt.show()
```

![vgg_cnn_training_loss.png](attachment:1393871f-4c3c-429d-85ab-285023e1900b.png)

```python
fig = plt.figure()
plt.plot(hist_vgg.history['accuracy'], color=doom_color, label='accuracy')
plt.plot(hist_vgg.history['val_accuracy'], color=ac_color, label='val_accuracy')
fig.suptitle('Accuracy', fontsize=20)
plt.legend(loc="upper left")
plt.show()
```

![vgg_cnn_training_accuracy.png](attachment:ead2ccc9-a8fa-42df-8e0e-f341856af628.png)

**Discussion:**

The training process of the VGG model involved 20 epochs with a batch size of 35. Throughout the training, the model's loss and accuracy metrics were monitored on both the training and validation sets.

In the first epochs, training accuracy rose quickly, from around 69% to approximately 88% by the fifth epoch. This suggests that the model rapidly picked up the visual features that separate the two classes. Validation accuracy followed a similar trend, indicating that what the model learned carried over to unseen data.

From the sixth epoch onward, training accuracy kept improving steadily. It passed 90% and peaked at approximately 97.9% in the 20th and final epoch. This indicates that the model continued to fit finer patterns and details in the training images. However, as the validation results below show, these later gains carried over only partly to the validation set.

Training loss decreased steadily throughout, indicating that optimization was converging. Validation loss followed a similar downward trend, so there is no sign of severe overfitting. The gap between the final training accuracy (about 97.9%) and the peak validation accuracy (about 93.1%) does, however, point to mild overfitting.

Validation accuracy peaked at approximately 93.1% in the 17th epoch, so the model correctly classified the large majority of the validation images.

Overall, the training run shows that the fine-tuned VGG model can learn to distinguish the art styles of "Doom" and "Animal Crossing" with high accuracy.

## 4.3 Model Evaluation and Results Explanation

Finally, we evaluate the trained model on the held-out test dataset (`test`). Instead of calling `evaluate`, the cell below iterates over the test batches and predicts class probabilities with `predict`. It accumulates three Keras metrics: `Precision`, `Recall`, and `BinaryAccuracy`. None of these images were used for training, so the results indicate how well the model generalizes to new, unseen data.

Precision and recall complement accuracy. They describe the model's behavior on the positive class (label 1) and help reveal whether its errors lean towards false positives or false negatives.
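The Keras metrics used below can be reproduced from confusion-matrix counts. The plain-Python sketch below follows the Keras default threshold and counts a probability strictly above 0.5 as a positive (label 1) prediction. The function name and toy data are hypothetical.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Precision, recall and accuracy for binary labels (sketch)."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p > threshold else 0
        if pred == 1 and y == 1:
            tp += 1  # true positive
        elif pred == 1 and y == 0:
            fp += 1  # false positive
        elif pred == 0 and y == 1:
            fn += 1  # false negative
        else:
            tn += 1  # true negative
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of positive predictions that are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual positives that are found
    accuracy = (tp + tn) / (tp + fp + fn + tn)      # share of all predictions that are right
    return precision, recall, accuracy

# Toy example: 4 positives, 4 negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.3, 0.2, 0.1, 0.7, 0.55]
print(binary_metrics(y_true, y_prob))  # (0.6, 0.75, 0.625)
```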

```python
from tensorflow.keras.metrics import Precision, Recall, BinaryAccuracy

# Streaming metrics: each update_state call adds one batch's counts
precision_vgg = Precision()
recall_vgg = Recall()
binary_accuracy_vgg = BinaryAccuracy()

for X, y in test.as_numpy_iterator():
    yhat = vgg_model.predict(X, verbose=0)  # sigmoid probabilities, shape (batch_size, 1)
    precision_vgg.update_state(y, yhat)
    recall_vgg.update_state(y, yhat)
    binary_accuracy_vgg.update_state(y, yhat)

precision_result_vgg = precision_vgg.result()
recall_result_vgg = recall_vgg.result()
accuracy_result_vgg = binary_accuracy_vgg.result()

print(f'Precision: {precision_result_vgg:.2f}, Recall: {recall_result_vgg:.2f}, Binary Accuracy: {accuracy_result_vgg:.2f}')
```


**VGG Model:**

|   Model   | Precision | Recall | Binary Accuracy |
|:---------:|:---------:|:------:|:--------------:|
|   VGG     |   0.93    |  0.86  |      0.90      |

**Discussion:**


The test evaluation of the VGG model yielded strong results. The precision of 0.93 means that 93% of the test images the model assigned to the positive class (label 1) actually belonged to that class. In other words, the model produced few false positives.

With a recall of 0.86, the model found 86% of the positive images in the test set. The remaining 14% were misclassified as the other class (false negatives).

The binary accuracy of 0.90 is the fraction of all test images, from both classes, whose art style the model predicted correctly.

Taken together, these metrics show that the VGG model classifies the test images well. Because precision is higher than recall, its errors lean towards false negatives. When it predicts the positive class it is usually right, but it misses some positive images.
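The F1 score, the harmonic mean of precision and recall, combines the two into a single number. Plugging in the VGG test values from the table gives the result below. Because the inputs are rounded to two decimals, the result is approximate.

$$F_1 = \frac{2PR}{P + R} = \frac{2 \cdot 0.93 \cdot 0.86}{0.93 + 0.86} = \frac{1.5996}{1.79} \approx 0.89$$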

# 5. Comparison & Conclusion 

## 5.1 Model Comparison

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.metrics import Precision, Recall, BinaryAccuracy

# Define custom color palette
doom_color = '#B9121B'  # Red color for Doom theme
ac_color = '#3CAEA3'  # Teal-green color for Animal Crossing theme


def compute_test_metrics(keras_model, dataset):
    """Return [precision, recall, binary accuracy] of `keras_model` on `dataset`."""
    precision = Precision()
    recall = Recall()
    binary_accuracy = BinaryAccuracy()
    for X, y in dataset.as_numpy_iterator():
        yhat = keras_model.predict(X, verbose=0)
        precision.update_state(y, yhat)
        recall.update_state(y, yhat)
        binary_accuracy.update_state(y, yhat)
    return [float(m.result()) for m in (precision, recall, binary_accuracy)]


# Metrics for the custom CNN (`model`) and for the fine-tuned VGG model
model_metrics = compute_test_metrics(model, test)
vgg_metrics = compute_test_metrics(vgg_model, test)

# Grouped bar chart: one group per metric, one bar per model
labels = ['Precision', 'Recall', 'Binary Accuracy']
x = np.arange(len(labels))
width = 0.35

fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, model_metrics, width, label="Alec's CNN", color=ac_color)
rects2 = ax.bar(x + width/2, vgg_metrics, width, label='VGG Model', color=doom_color)

# Add axis label, title, and custom x-axis tick labels
ax.set_ylabel('Score')
ax.set_title('Model Comparison Metrics')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()

# Function to add the metric values on top of the bars
def autolabel(rects):
    for rect in rects:
        height = rect.get_height()
        ax.annotate(f'{height:.2f}', xy=(rect.get_x() + rect.get_width() / 2, height), xytext=(0, 3),
                    textcoords="offset points", ha='center', va='bottom')

autolabel(rects1)
autolabel(rects2)

plt.show()
```

![comparison.png](attachment:fdba9d7f-d607-4394-932d-5696562fb90f.png)

**Discussion:**

Comparing our custom CNN model with the VGG model brings out several key points. The VGG model outperforms our CNN model in precision and binary accuracy. Its precision of 0.93, against 0.79 for our CNN, means that a much larger share of its positive predictions are correct. About 7% of the VGG model's positive predictions are wrong, compared with about 21% for our CNN.

In binary accuracy, the VGG model also performs clearly better, scoring 0.90 against the CNN's 0.79. This means it classifies a larger share of all test images correctly.
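The complementary error rates (1 - accuracy) make the gap easier to read:

$$\text{error}_{\text{CNN}} = 1 - 0.79 = 0.21, \qquad \text{error}_{\text{VGG}} = 1 - 0.90 = 0.10, \qquad \frac{0.21}{0.10} = 2.1$$

The VGG model therefore makes roughly half as many test errors as the custom CNN. Whether this difference is statistically significant would depend on the size of the test set.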

In summary, the VGG model outperforms our custom CNN model, with higher precision, comparable recall, and better binary accuracy. Its architecture, and in particular its ImageNet pre-trained weights, likely explain much of this advantage. This makes it the more reliable choice for this art style classification task.

## 5.2 Conclusion 

In conclusion, our journey through this project has been a captivating exploration of art style classification using deep learning techniques. We started by curating a diverse dataset of artwork from the "Doom" and "Animal Crossing" games, which let us delve into the distinctive visual aesthetics of these two contrasting worlds. We then implemented two models: a custom CNN trained from scratch and a fine-tuned version of the renowned VGG16 model.

Building the custom CNN model showed that we could design and train a model from scratch, and it gave us valuable insights into the intricacies of convolutional neural networks. We carefully adjusted hyperparameters, conducted experiments, and watched the model's gradual improvement.

However, the true marvel emerged with the VGG model. Leveraging the power of pre-trained weights, the VGG model extracted richer features and made more accurate predictions. Its higher precision and binary accuracy, with recall comparable to our CNN's, highlighted the value of leveraging existing knowledge and architectural advancements.

Beyond the technical achievements, this project has been a testament to the captivating nature of art and the vast potential of artificial intelligence. It has provided us with a deeper appreciation for the visual storytelling in video games and the ability of machine learning models to decipher and categorize artistic styles.