# Coursework 2 – Machine Learning  
## Image Classification
**Student Name:** Bakhtiyor Sohibnazarov   
**Student ID:** Z22590018   
**Module:** CMP-X303-0 – Machine Learning   
**Updated:** 15th November 2025  

## Project Overview
This coursework focuses on building and evaluating a neural network for image classification using the Intel Image Classification dataset. The objective is to develop a supervised learning model that learns from labelled images and accurately predicts their categories. The notebook covers dataset preparation, model design, training, evaluation, and potential enhancements to improve performance and generalisation.


### Importing Libraries

For this image classification task, we need a few essential libraries. If they are not already installed in your environment, you can run the cell below to install them first.

**Required Packages:**  
- TensorFlow – for building and training the neural network
- NumPy – for numerical operations and array manipulation
- Matplotlib – for plotting graphs and visualising results
- KaggleHub – for downloading the dataset if it is not already available

In [2]:
# Install the packages listed above so the notebook can run.
# Note: TensorFlow takes around 650 MB of storage space,
# so make sure enough space is available before running this notebook.

!pip -q install tensorflow numpy matplotlib kagglehub

In [3]:
# Import Required Libraries
import os
import shutil

# To download dataset if it does not exist
import kagglehub

# NumPy for numerical and array manipulation
import numpy as np

# Matplotlib for creating graphs
import matplotlib.pyplot as plt

# TensorFlow and Keras for building and training the neural network
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, Dropout
# Loads images from class sub-folders into a tf.data.Dataset
from tensorflow.keras.utils import image_dataset_from_directory




### Load Dataset, Training, and Testing

For this project, we are working with the Intel Image Classification dataset from Kaggle. KaggleHub, which is already imported above, downloads it for us.

KaggleHub saves downloads to its own predefined cache directory, which is why we then move the downloaded data manually into an `intel-data` folder inside the current working directory.

One advantage of this dataset is that it comes pre-split into training and testing folders, which removes the need to divide the data manually. TensorFlow’s built-in `validation_split` feature can then take a subset of a folder directly: the loading cell below trains on 80% of the training folder and validates on a 20% subset of the test folder.

Each class has its own folder containing images, which makes it ideal for loading through TensorFlow’s `image_dataset_from_directory` utility.

In [5]:
# Download the Intel Image Classification dataset and capture the cache path
path = kagglehub.dataset_download("puneet6060/intel-image-classification")

# Get the current working directory
current_dir = os.getcwd()

# Define the target directory for the dataset
target_dir = os.path.join(current_dir, "intel-data")

# Remove any copy left over from a previous run so the move starts clean
if os.path.exists(target_dir):
    shutil.rmtree(target_dir)

# Move the dataset out of the KaggleHub cache into the target directory
shutil.move(path, target_dir)

'/home/bakhtiyor/Documents/Coursework2-ML/intel-data/2'
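
Because each class lives in its own folder, counting the files per folder is a cheap way to confirm that the move worked and to spot class imbalance before training. The sketch below is illustrative only: `count_images_per_class` is a hypothetical helper, and the list of accepted file extensions is an assumption.

```python
import os

# Assumption: images use one of these extensions; adjust to match the dataset.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def count_images_per_class(root_dir):
    """Return {class_name: image_count} for each sub-folder of root_dir."""
    counts = {}
    for entry in sorted(os.listdir(root_dir)):
        class_dir = os.path.join(root_dir, entry)
        if not os.path.isdir(class_dir):
            continue  # skip stray files at the top level
        counts[entry] = sum(
            1
            for name in os.listdir(class_dir)
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts
```

It could be called on the nested training folder, for example `count_images_per_class(os.path.join(target_dir, "seg_train/seg_train"))`.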

In [14]:
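# Each split folder contains a nested folder of the same name
train_dir = os.path.join(target_dir, "seg_train/seg_train")
val_dir = os.path.join(target_dir, "seg_test/seg_test")

# Training set: 80% of the images in the training folder
train_data = image_dataset_from_directory(
    train_dir,
    validation_split=0.2,
    subset='training',
    seed=123,
    image_size=(150, 150),
    batch_size=32,
    label_mode='categorical'  # one-hot labels for categorical_crossentropy
)

# Validation set: a 20% subset of the images in the test folder
val_data = image_dataset_from_directory(
    val_dir,
    validation_split=0.2,
    subset='validation',
    seed=123,
    image_size=(150, 150),
    batch_size=32,
    label_mode='categorical'
)

# Number of classes, inferred from the class sub-folder names
num_classes = len(train_data.class_names)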

Found 14034 files belonging to 6 classes.
Using 11228 files for training.
Found 3000 files belonging to 6 classes.
Using 600 files for validation.
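
The counts reported above follow from simple arithmetic on the split fraction and the batch size. The sketch below reproduces them; `split_counts` is a hypothetical helper, and taking the integer part of the validation share is an assumption that happens to match the reported numbers.

```python
import math


def split_counts(total_files, validation_split):
    """Return (training_count, validation_count) for a fractional split."""
    # Assumption: the validation share is truncated to a whole number of files.
    n_val = int(total_files * validation_split)
    return total_files - n_val, n_val


train_count, _ = split_counts(14034, 0.2)      # training subset of seg_train
_, val_count = split_counts(3000, 0.2)         # validation subset of seg_test
steps_per_epoch = math.ceil(train_count / 32)  # batches of 32, the last one partial

print(train_count, val_count, steps_per_epoch)  # 11228 600 351
```

The final value is the number of batches Keras processes per epoch when training on `train_data`.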




In [11]:
model = Sequential([
    # First convolutional block
    Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    MaxPooling2D(2, 2),

    # Second convolutional block
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),

    # Classifier head
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(num_classes, activation='softmax')
])

model.summary()
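
To make the architecture easier to reason about, the layer sizes and parameter counts can be worked out by hand. The sketch below assumes the Keras defaults for `Conv2D` (stride 1, no padding), a 2x2 pooling window with stride 2, and the 6 classes reported when the dataset was loaded. The helper names are hypothetical.

```python
def conv_output_size(size, kernel):
    """Spatial size after an unpadded convolution with stride 1."""
    return size - kernel + 1


def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units


size = 150
size = conv_output_size(size, 3)  # 148 after Conv2D(32)
size //= 2                        # 74 after the first max pooling
size = conv_output_size(size, 3)  # 72 after Conv2D(64)
size //= 2                        # 36 after the second max pooling
flat = size * size * 64           # 82944 values enter the first Dense layer

total_params = (
    conv_params(3, 3, 32)         # 896
    + conv_params(3, 32, 64)      # 18496
    + dense_params(flat, 128)     # 10616960
    + dense_params(128, 6)        # 774
)
print(flat, total_params)  # 82944 10637126
```

Almost all of the parameters sit in the first `Dense` layer, which is why reducing the flattened size (for example with another pooling stage) is a common way to shrink a model like this.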



In [12]:
# ---------------------------
# 3. Compile Model
# ---------------------------
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)


# ---------------------------
# 4. Train Model
# ---------------------------
history = model.fit(
    train_data,
    validation_data=val_data,
    epochs=10
)


# ---------------------------
# 5. Plot Accuracy & Loss
# ---------------------------
plt.figure(figsize=(12,5))
plt.subplot(1,2,1)
plt.plot(history.history['accuracy'], label='Train')
plt.plot(history.history['val_accuracy'], label='Validation')
plt.title('Accuracy')
plt.legend()


plt.subplot(1,2,2)
plt.plot(history.history['loss'], label='Train')
plt.plot(history.history['val_loss'], label='Validation')
plt.title('Loss')
plt.legend()
plt.show()


# ---------------------------
# 6. Confusion Matrix & Predictions
# ---------------------------
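# Collect labels and predictions in the same pass: val_data is shuffled,
# so iterating it twice can pair labels with the wrong predictions.
y_true_classes, y_pred = [], []
for images, labels in val_data:
    y_true_classes.append(np.argmax(labels.numpy(), axis=1))
    y_pred.append(np.argmax(model.predict(images, verbose=0), axis=1))
y_true_classes = np.concatenate(y_true_classes)
y_pred = np.concatenate(y_pred)

conf_matrix = tf.math.confusion_matrix(y_true_classes, y_pred)
print('Confusion Matrix:\n', conf_matrix)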


# ---------------------------
# 7. Optional: Simple NN for LO3 Comparison
# ---------------------------
simple_model = Sequential([
    Flatten(input_shape=(150, 150, 3)),  # flattens each image inside the model
    Dense(64, activation='relu'),
    Dense(num_classes, activation='softmax')
])


simple_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# The Flatten layer already reshapes the images, so the datasets are passed
# unchanged; validation data must be supplied via the validation_data keyword.
simple_model.fit(train_data, validation_data=val_data, epochs=5)

Epoch 1/10
  1/351 ━━━━━━━━━━━━━━━━━━━━ 21:58 4s/step - accuracy: 0.1562 - loss: 50.1692
  2/351 ━━━━━━━━━━━━━━━━━━━━ 5:49 1s/step - accuracy: 0.1328 - loss: 291.5613
 25/351 ━━━━━━━━━━━━━━━━━━━━ 4:43 868ms/step - accuracy: 0.1858 - loss: 300.0310

KeyboardInterrupt: 
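
Once a training run completes, the confusion matrix from step 6 can be condensed into a per-class score. `tf.math.confusion_matrix` puts true labels on the rows and predictions on the columns, so dividing the diagonal by the row totals gives the recall of each class. The sketch below uses NumPy only; `per_class_recall` is a hypothetical helper, and the 3x3 matrix is made-up illustration data, not a result from this notebook.

```python
import numpy as np


def per_class_recall(conf_matrix):
    """Correct predictions divided by the number of true samples, per class."""
    conf_matrix = np.asarray(conf_matrix, dtype=float)
    row_totals = conf_matrix.sum(axis=1)  # true samples per class
    correct = np.diag(conf_matrix)        # correctly predicted samples
    # Classes with no samples get a recall of 0 instead of a division error
    return np.divide(
        correct, row_totals, out=np.zeros_like(correct), where=row_totals > 0
    )


# Made-up example: rows are true classes, columns are predicted classes
toy_matrix = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 0],
]
print(per_class_recall(toy_matrix))
```

With the real data, `per_class_recall(conf_matrix.numpy())` would return one value per class in the order given by `train_data.class_names`.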