This project trains an image classification model using TensorFlow and Keras. The dataset is preprocessed with data augmentation techniques, and the model is trained using a convolutional neural network (CNN).
## Table of Contents

- Installation
- Dataset Preparation
- Data Augmentation
- Model Architecture
- Mathematical Explanation
- Training the Model
- Evaluation
- Results
- Conclusion
## Installation

To run this project, install the required dependencies:

```bash
pip install tensorflow pandas matplotlib
```

## Dataset Preparation

The dataset is stored in class-labelled directories:

- `../data/train/`: training images, one subfolder per class
- `../data/test/`: testing images, one subfolder per class
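As a quick sanity check before training, the class subfolders can be inspected with a small helper. The function below is an illustrative sketch, not part of the project code; the function name and the set of accepted file extensions are assumptions:

```python
from pathlib import Path

def count_images_per_class(root):
    """Count image files in each class subfolder of `root`.

    Assumes the layout described above: one subfolder per class,
    e.g. ../data/train/cats/, ../data/train/dogs/.
    """
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.suffix.lower() in {'.jpg', '.jpeg', '.png'}
            )
    return counts
```

Heavily imbalanced counts between classes would suggest adding class weights or more aggressive augmentation.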
## Data Augmentation

Data augmentation is applied using `ImageDataGenerator` to improve model generalization:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Data augmentation helps prevent overfitting by artificially increasing the
# size and diversity of the training dataset. Rotations, zooming, and flipping
# force the model to learn generalized patterns rather than memorizing
# specific details of the training images.
datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.2
)
```

The dataset is loaded using:

```python
train_data = datagen.flow_from_directory(
    train_dir, target_size=(128, 128), batch_size=32,
    class_mode='categorical', subset='training')
val_data = datagen.flow_from_directory(
    train_dir, target_size=(128, 128), batch_size=32,
    class_mode='categorical', subset='validation')
```

## Model Architecture

A CNN model is built with convolutional and pooling layers:
```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)),
    layers.BatchNormalization(),
    layers.MaxPooling2D(2, 2),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.BatchNormalization(),
    layers.MaxPooling2D(2, 2),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.BatchNormalization(),
    layers.Dense(num_classes, activation='softmax')
])
```
## Mathematical Explanation

A convolutional layer applies a filter (kernel) of size $k \times k$ to the input image. Each filter performs element-wise multiplication with the region it covers and sums the results, producing a feature map:
$$
Y(i, j) = \sum_{m=0}^{k-1} \sum_{n=0}^{k-1} X(i+m, j+n)\, W(m, n) + b
$$

where:

- $X$ is the input image,
- $W$ is the kernel,
- $b$ is the bias term,
- $Y(i, j)$ is the output feature map.
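The formula can be verified directly with NumPy. The helper below is an illustrative sketch (stride 1, "valid" padding, single channel), not part of the project code:

```python
import numpy as np

def conv2d_valid(X, W, b=0.0):
    """Apply the convolution formula above: slide the k x k kernel W over X,
    multiply element-wise, sum, and add the bias b."""
    k = W.shape[0]
    rows, cols = X.shape
    Y = np.empty((rows - k + 1, cols - k + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = np.sum(X[i:i+k, j:j+k] * W) + b
    return Y

X = np.arange(16, dtype=float).reshape(4, 4)
W = np.array([[1.0, 0.0],
              [0.0, -1.0]])   # diagonal-difference kernel
print(conv2d_valid(X, W))      # 3x3 output; every entry is -5.0 here
```

A 2x2 kernel over a 4x4 input yields a 3x3 feature map, matching the general output size of $(H - k + 1) \times (W - k + 1)$ for "valid" padding.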
ReLU (Rectified Linear Unit) introduces non-linearity:

$$
f(x) = \max(0, x)
$$

This helps the model learn complex patterns.
Max-pooling reduces spatial dimensions while retaining the strongest activations:

$$
Y(i, j) = \max_{0 \le m, n < 2} X(2i + m,\; 2j + n)
$$

where a 2x2 window is applied with a stride of 2.
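Both operations are easy to check with NumPy. The helpers below are illustrative sketches, not part of the project code:

```python
import numpy as np

def relu(x):
    """f(x) = max(0, x), applied element-wise."""
    return np.maximum(0, x)

def max_pool_2x2(X):
    """2x2 max-pooling with stride 2, as in the formula above.
    Rows/columns beyond the last full 2x2 window are dropped."""
    H, W = X.shape
    X = X[:H - H % 2, :W - W % 2]
    return X.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

X = np.array([[ 1., -2.,  3.,  0.],
              [ 4.,  5., -6.,  7.],
              [-1.,  0.,  2.,  2.],
              [ 3.,  1.,  0., -4.]])
print(max_pool_2x2(relu(X)))   # [[5. 7.]
                               #  [3. 2.]]
```

Negative activations are first zeroed by ReLU, then each 2x2 block collapses to its maximum, halving both spatial dimensions.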
The flattened output is passed to a dense layer:

$$
Y = WX + b
$$

where $W$ is the weight matrix and $b$ is the bias.

For multi-class classification, the final layer uses softmax:

$$
P(y_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}
$$

This converts logits into probabilities.
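The softmax formula can be demonstrated numerically. This sketch is for illustration only; Keras applies softmax internally via the final `Dense` layer's activation:

```python
import numpy as np

def softmax(z):
    """Convert a vector of logits z into probabilities. Subtracting max(z)
    before exponentiating improves numerical stability without changing
    the result."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p)   # probabilities sum to 1; the largest logit gets the largest share
```

Note that softmax preserves the ordering of the logits, so the predicted class (the argmax) is the same before and after the transformation.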
The model is compiled and trained with early stopping:
from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss', patience=3)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# The Adam optimizer is used because it combines the advantages of both the Momentum and RMSprop optimizers, leading to faster convergence and adaptive learning rates.
# Categorical cross-entropy is chosen as the loss function because this is a multi-class classification problem, where the model predicts probabilities for multiple categories.
model.fit(train_data, validation_data=val_data, epochs=20, callbacks=[early_stop])After training, the model is evaluated on the test dataset:
```python
# Use a separate generator for the test set: only rescaling, with no
# augmentation and no validation split, so evaluation sees unmodified images.
test_datagen = ImageDataGenerator(rescale=1./255)
test_data = test_datagen.flow_from_directory(
    test_dir, target_size=(128, 128), batch_size=32, class_mode='categorical')

# evaluate() returns the test loss (categorical cross-entropy, measuring the
# dissimilarity between true labels and predicted probabilities) and the test
# accuracy (the fraction of correctly classified samples).
test_loss, test_accuracy = model.evaluate(test_data)
print(f'Test Loss: {test_loss}')
print(f'Test Accuracy: {test_accuracy}')
```

## Results

- The training and validation accuracy and loss are plotted.
- The final accuracy on the test dataset is reported.
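The training curves can be plotted with matplotlib. The helper below is a sketch, assuming the return value of `model.fit(...)` above was stored in a variable (here called `history`); the function name is an assumption, not part of the project code:

```python
import matplotlib.pyplot as plt

def plot_history(history):
    """Plot accuracy and loss curves from the History object that
    model.fit(...) returns."""
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
    ax_acc.plot(history.history['accuracy'], label='train')
    ax_acc.plot(history.history['val_accuracy'], label='validation')
    ax_acc.set_title('Accuracy')
    ax_acc.set_xlabel('epoch')
    ax_acc.legend()
    ax_loss.plot(history.history['loss'], label='train')
    ax_loss.plot(history.history['val_loss'], label='validation')
    ax_loss.set_title('Loss')
    ax_loss.set_xlabel('epoch')
    ax_loss.legend()
    fig.tight_layout()
    return fig
```

Typical usage: `history = model.fit(...)`, then `plot_history(history)` followed by `plt.show()`. A widening gap between the training and validation curves is the usual sign of overfitting.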
## Conclusion

This project demonstrates how to preprocess an image dataset, apply data augmentation, and train a CNN for image classification using TensorFlow.