# Deep and Convolutional Neural Networks for Image Classification

At face value, there is not much difference between deep neural networks (DNNs) and convolutional neural networks (CNNs). Both are types of artificial neural networks used in machine learning and deep learning. However, they have different architectures and are suited to different types of tasks.

DNNs, in the sense used here, are composed of multiple fully connected layers: each neuron in one layer is connected to every neuron in the next layer. This architecture allows DNNs to learn complex patterns and relationships in data, making them suitable for a wide range of tasks, including image classification, natural language processing, and speech recognition.
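
To make "every neuron connected to every neuron" concrete, here is a minimal NumPy sketch of one fully connected layer's forward pass, `y = relu(W x + b)`. The function names and the tiny 4-input, 3-neuron layer are illustrative assumptions, not part of this notebook. It also shows how parameters are counted: a layer with `n_in` inputs and `n_out` neurons has `n_in * n_out` weights plus `n_out` biases.

```python
import numpy as np

def dense_forward(x, W, b):
    """Forward pass of one fully connected layer with ReLU activation.

    x: input vector, shape (n_in,)
    W: weights, shape (n_out, n_in); row j holds neuron j's weight for every input
    b: biases, shape (n_out,)
    """
    return np.maximum(0.0, W @ x + b)

def dense_param_count(n_in, n_out):
    # One weight per (input, neuron) pair plus one bias per neuron
    return n_in * n_out + n_out

# Hypothetical tiny layer: 4 inputs feeding 3 neurons
rng = np.random.default_rng(0)
x = rng.normal(size=4)
W = rng.normal(size=(3, 4))
b = np.zeros(3)

y = dense_forward(x, W, b)
print(y.shape)                  # (3,)
print(dense_param_count(4, 3))  # 15
```

The same formula shows why fully connected layers scale poorly on raw images: a flattened 432×288×3 image has 373,248 values, so a single 128-neuron dense layer on it would need `dense_param_count(373248, 128)`, or 47,775,872 parameters.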

CNNs, on the other hand, are specifically designed for processing grid-like data, such as images. They use convolutional layers that apply filters to the input data to extract features, followed by pooling layers that reduce the spatial dimensions of the data. This architecture allows CNNs to effectively capture spatial hierarchies and patterns in images, making them particularly well-suited for image classification tasks.
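
The two operations just described can be sketched in NumPy on a single-channel image: a "valid" 2D convolution (computed, as in most deep learning libraries, as cross-correlation, without flipping the kernel) followed by non-overlapping 2×2 max pooling. The helper names, the 5×5 toy image, and the edge-detecting kernel are illustrative assumptions; Keras's `Conv2D` additionally handles multiple input channels, many filters, strides, and padding.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding (cross-correlation)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output is a weighted sum over one local patch; the same
            # kernel weights are reused at every position
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that don't fill a window are dropped."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

# 5x5 toy image: dark on the left, bright on the right (a vertical edge)
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# Kernel that responds to brightness increasing from left to right
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])

feature_map = conv2d_valid(image, kernel)
pooled = max_pool2d(feature_map)
print(feature_map.shape)  # (4, 4)
print(pooled.tolist())    # [[2.0, 0.0], [2.0, 0.0]]
```

The feature map lights up only where the edge is, and pooling keeps that response while halving each spatial dimension. With this notebook's first layer, a 5×5 kernel over a 432×288 input gives a 428×284 map per filter, which 2×2 pooling reduces to 214×142.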

## Network Architecture Design

Starting with the architecture, several types of layers are commonly used in DNNs and CNNs. For DNNs, these include:
- Input Layer: The first layer, which receives the input data.
- Hidden Layers: Multiple layers of neurons that process the input data and learn patterns.
- Output Layer: The final layer, which produces the output predictions.

For CNNs, the common layers include:
- Convolutional Layers: Apply learned filters to the input data to extract features.
- Pooling Layers: Reduce the spatial dimensions of the data.
- Fully Connected Layers: Similar to the hidden layers in DNNs, where each neuron is connected to every neuron in the next layer. The feature maps are usually flattened into a vector before these layers.
- Output Layer: The final layer, which produces the output predictions.

We will use TensorFlow and Keras to build both types of networks for image classification.

```python
from tensorflow.keras import Model, Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, Input, AveragePooling2D
from tensorflow.keras.optimizers import Adam
```

```python
# The size of the input images: (height, width, channels)
input_shape = (432, 288, 3)

# The number of classes
classes_count = 10
```

```python
model = Sequential([
    Input(shape=input_shape),

    # First convolutional block: 32 filters of size 5x5, then 2x2 max pooling
    Conv2D(32, kernel_size=(5, 5), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),

    # Second convolutional block: 64 filters of size 3x3, then 2x2 max pooling
    Conv2D(64, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),

    # Flatten the feature maps into a single vector
    Flatten(),

    # Fully connected layers; dropout randomly zeroes 50% of activations during training
    Dense(128, activation='relu'),
    Dropout(0.5),

    Dense(64, activation='relu'),

    # Output layer: one softmax probability per class
    Dense(classes_count, activation='softmax'),
])
```

We can print a summary of the model architecture to see each layer's output shape and number of parameters.

```python
model.summary()
```
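
The summary output is not reproduced here, so the sketch below redoes the arithmetic behind it in plain Python. It assumes the Keras defaults these layers use (`padding='valid'` and stride 1 for `Conv2D`, pooling stride equal to `pool_size` for `MaxPooling2D`); the helper functions are illustrative, and `model.summary()` itself remains the source of truth.

```python
def conv2d_valid_shape(h, w, k):
    """Height/width after a k x k Conv2D with padding='valid' and stride 1."""
    return h - k + 1, w - k + 1

def pool_shape(h, w, p):
    """Height/width after non-overlapping p x p pooling; leftovers are dropped."""
    return h // p, w // p

def conv2d_params(k, c_in, filters):
    """k*k*c_in weights per filter plus one bias per filter."""
    return k * k * c_in * filters + filters

def dense_params(n_in, units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return n_in * units + units

def summarize(input_shape=(432, 288, 3), classes=10):
    """(layer, output shape, parameter count) rows for the Sequential model above."""
    h, w, c = input_shape
    rows = []
    for filters, k in [(32, 5), (64, 3)]:
        h, w = conv2d_valid_shape(h, w, k)
        rows.append((f"Conv2D({filters}, {k}x{k})", (h, w, filters), conv2d_params(k, c, filters)))
        c = filters
        h, w = pool_shape(h, w, 2)
        rows.append(("MaxPooling2D(2x2)", (h, w, c), 0))
    n = h * w * c
    rows.append(("Flatten", (n,), 0))
    rows.append(("Dense(128)", (128,), dense_params(n, 128)))
    rows.append(("Dropout(0.5)", (128,), 0))  # dropout has no trainable weights
    rows.append(("Dense(64)", (64,), dense_params(128, 64)))
    rows.append((f"Dense({classes})", (classes,), dense_params(64, classes)))
    return rows

rows = summarize()
for name, shape, params in rows:
    print(f"{name:<20} {str(shape):<18} {params:>12,}")
total = sum(params for _, _, params in rows)
print(f"Total params: {total:,}")  # Total params: 60,814,602
```

Under these assumptions, the single `Dense(128)` layer after `Flatten` holds 60,784,768 of the 60,814,602 parameters (over 99.9%), because it connects to all 474,880 values of the final 106×70×64 feature map.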

## Network Training

```python
import os
from datetime import datetime

from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint
from tensorflow.keras.preprocessing.image import ImageDataGenerator
```

We need to configure a few things before training the neural network: where the training and testing data are located, the number of classes we are trying to predict, the image size, and the batch size. These parameters are used to configure the data generators as well as the neural network architecture.

```python
BASE_PATH = r"C:\Users\JTWit\Documents\ECE 579\Datasets\Split GTZAN Dataset"
TEST_PATH = os.path.join(BASE_PATH, 'test')
TRAIN_PATH = os.path.join(BASE_PATH, 'train')

SAVE_PATH = os.path.join(r"C:\Users\JTWit\Documents\ECE 579", "Custom DNN Models")

# Create the model save directory if it does not already exist
os.makedirs(SAVE_PATH, exist_ok=True)

checkpoint_dir = os.path.join(r"C:\Users\JTWit\Documents\ECE 579", 'Training Checkpoints')
# ModelCheckpoint fills in {epoch} when it writes each file
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}.weights.h5")

# Create the checkpoint directory if it does not already exist
os.makedirs(checkpoint_dir, exist_ok=True)
```

```python
LEARNING_RATE = 1e-5
EPOCHS = 30

# (height, width) that flow_from_directory resizes images to; matches input_shape
TARGET_SIZE = (432, 288)

NETWORK_NAME = "GTZAN Custom DNN"
```

Let's specify some callbacks to help with training. Callbacks are objects whose methods Keras calls at specific points during training, such as the end of each epoch or batch. They can be used to monitor the training process, save the model, adjust the learning rate, and stop training early.

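```python
# Multiply the learning rate by 0.2 when val_loss has not improved for 2 epochs
reduce_lr = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.2,
    patience=2)

# Stop when validation accuracy has not improved for 3 epochs. Keras logs this
# metric as 'val_accuracy'; an unknown name such as 'val_acc' makes EarlyStopping
# warn and never stop training.
earlystop = EarlyStopping(
    monitor='val_accuracy',
    mode='max',
    patience=3)

# Save the weights after every epoch (save_best_only=False, so monitor is not used)
checkpoint = ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True,
    monitor='val_loss',
    save_best_only=False,
    verbose=1
)

# checkpoint is not included; append it to this list to write weights every epoch
callbacks = [reduce_lr, earlystop]
```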
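
To see how `patience` and `factor` interact, the sketch below replays the `val_loss` and `val_accuracy` values from the first six epochs of the training log through simplified versions of the two rules configured above. The function names are hypothetical, and the rules only approximate Keras's callbacks (they ignore `cooldown`, `min_lr`, and baseline options); the starting rate of 1e-3 is the one shown in the log.

```python
def plateau_schedule(val_losses, initial_lr=1e-3, factor=0.2, patience=2, min_delta=1e-4):
    """Learning rate used in each epoch under a simplified ReduceLROnPlateau rule."""
    lr, best, wait = initial_lr, float("inf"), 0
    used = []
    for loss in val_losses:
        used.append(lr)                  # rate in effect while this epoch trains
        if loss < best - min_delta:      # improvement: record it and reset the counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:         # plateau: shrink the rate for later epochs
                lr *= factor
                wait = 0
    return used

def early_stop_epoch(val_accs, patience=3):
    """1-based epoch after which a simplified EarlyStopping(mode='max') stops, or None."""
    best, wait = float("-inf"), 0
    for epoch, acc in enumerate(val_accs, start=1):
        if acc > best:
            best, wait = acc, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# val_loss and val_accuracy for epochs 1-6, copied from the training log
val_loss = [2.2162, 2.3171, 2.5203, 2.4472, 2.5027, 2.5540]
val_acc = [0.4969, 0.4591, 0.4340, 0.4717, 0.4780, 0.4906]

print(plateau_schedule(val_loss))
print(early_stop_epoch(val_acc))  # 4
```

The schedule comes out as 1e-3, 1e-3, 1e-3, 2e-4, 2e-4, 4e-5, the same sequence as the `learning_rate` column in the log. Under the simplified early-stopping rule, monitoring `val_accuracy` with `patience=3` would end training after epoch 4, since no later epoch beats the epoch 1 value of 0.4969.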

With everything in place, we can now compile the model and begin training. We will use the Adam optimizer and the categorical cross-entropy loss function, which pairs with the softmax output layer and the one-hot labels produced by `class_mode='categorical'`.

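```python
# Passing the string 'adam' uses Adam with Keras's default learning rate (0.001),
# which is the rate shown in the training log. To use LEARNING_RATE instead, pass
# optimizer=Adam(learning_rate=LEARNING_RATE).
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```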
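
For reference, here is what the output layer and loss compute for a single example, sketched in NumPy with made-up scores for three classes: softmax turns the final layer's raw scores into probabilities, and categorical cross-entropy is the negative log of the probability assigned to the true class (the one-hot label zeroes out every other term). The function names and values are illustrative; Keras averages this loss over each batch.

```python
import numpy as np

def softmax(logits):
    # Subtracting the max before exponentiating avoids overflow and leaves the result unchanged
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

def categorical_crossentropy(y_true_onehot, y_pred_probs, eps=1e-7):
    # -sum_k y_k * log(p_k); clipping guards against log(0)
    p = np.clip(y_pred_probs, eps, 1.0)
    return -np.sum(y_true_onehot * np.log(p))

logits = np.array([2.0, 1.0, 0.1])  # hypothetical raw scores for 3 classes
probs = softmax(logits)
y_true = np.array([1.0, 0.0, 0.0])  # one-hot label: the true class is class 0

loss = categorical_crossentropy(y_true, probs)
print(probs.round(3))
print(round(float(loss), 3))  # 0.417
```

A confident, correct prediction (true-class probability near 1) drives the loss toward 0, while assigning low probability to the true class makes it grow without bound.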

We also have to configure the training and validation data generators, which load the images from the specified directories, resize them to the target size, and apply any necessary preprocessing.

```python
def get_train_and_validation_data(training_path, training_options, validation_split=0.2):
    # One ImageDataGenerator serves both subsets, so any augmentation configured
    # here is also applied to the validation images
    datagen = ImageDataGenerator(
        rotation_range=training_options["rotation_range"],
        width_shift_range=training_options["width_shift_range"],
        height_shift_range=training_options["height_shift_range"],
        brightness_range=training_options["brightness_range"],
        rescale=1./255,  # Scale pixel values from [0, 255] to [0, 1]
        validation_split=validation_split
    )

    # Training generator: the (1 - validation_split) share of each class folder
    train_generator = datagen.flow_from_directory(
        training_path,
        target_size=training_options["target_size"],
        batch_size=training_options["batch_size"],
        class_mode='categorical',
        subset='training'
    )

    # Validation generator: the validation_split share of each class folder
    validation_generator = datagen.flow_from_directory(
        training_path,
        target_size=training_options["target_size"],
        batch_size=training_options["batch_size"],
        class_mode='categorical',
        subset='validation'
    )

    return train_generator, validation_generator
```

```python
# Data generator options. With these values augmentation is effectively disabled:
# no rotation, no shifts, and a brightness factor fixed at 1.
train_options = {
    "rotation_range": 0,          # no random rotation
    "width_shift_range": 0.0,     # no horizontal shift
    "height_shift_range": 0.0,    # no vertical shift
    "brightness_range": (1, 1),   # brightness factor always 1 (unchanged)
    "target_size": TARGET_SIZE,   # (height, width) images are resized to
    "batch_size": 16
}

train_gen, valid_gen = get_train_and_validation_data(TRAIN_PATH, train_options)
```

```
Found 640 images belonging to 10 classes.
Found 159 images belonging to 10 classes.
```
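
The 640/159 split comes from `validation_split=0.2`. As we read Keras's directory iterator, the split is applied separately inside each class folder: the first `int(0.2 * n)` files of a class with `n` files go to the validation subset and the rest to training. The sketch below models that rule in plain Python. The per-class counts are an assumption, since the notebook doesn't print them; nine classes of 80 images plus one of 79 is one breakdown consistent with the totals above.

```python
def split_counts(files_per_class, validation_split=0.2):
    """Return (n_train, n_val) when each class folder is split on its own.

    For a class with n files, int(validation_split * n) go to validation
    and the remaining files go to training.
    """
    n_train = n_val = 0
    for n in files_per_class:
        val = int(validation_split * n)
        n_val += val
        n_train += n - val
    return n_train, n_val

# Hypothetical per-class counts (not shown in the notebook)
counts = [80] * 9 + [79]
print(split_counts(counts))  # (640, 159)
```

Because the count is rounded down per class, a class whose size isn't a multiple of 5 contributes slightly less than 20% to validation, which is how an odd total such as 159 can arise.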


```python
history = model.fit(train_gen, validation_data=valid_gen, epochs=EPOCHS, callbacks=callbacks)

# Final-epoch training accuracy (not validation accuracy) goes into the file name
accuracy = history.history['accuracy'][-1]
date_str = datetime.today().strftime('%Y-%m-%d')
name_string = f"{NETWORK_NAME}(accuracy = {accuracy:.4f})(date = {date_str}).keras"

save_file = os.path.join(SAVE_PATH, name_string)

model.save(save_file)
```

```
Epoch 1/30
40/40 ━━━━━━━━━━━━━━━━━━━━ 45s 1s/step - accuracy: 0.8906 - loss: 0.3229 - val_accuracy: 0.4969 - val_loss: 2.2162 - learning_rate: 0.0010
Epoch 2/30
40/40 ━━━━━━━━━━━━━━━━━━━━ 46s 1s/step - accuracy: 0.9141 - loss: 0.2424 - val_accuracy: 0.4591 - val_loss: 2.3171 - learning_rate: 0.0010
Epoch 3/30
40/40 ━━━━━━━━━━━━━━━━━━━━ 49s 1s/step - accuracy: 0.9312 - loss: 0.2280 - val_accuracy: 0.4340 - val_loss: 2.5203 - learning_rate: 0.0010
Epoch 4/30
40/40 ━━━━━━━━━━━━━━━━━━━━ 53s 1s/step - accuracy: 0.9359 - loss: 0.1889 - val_accuracy: 0.4717 - val_loss: 2.4472 - learning_rate: 2.0000e-04
Epoch 5/30
40/40 ━━━━━━━━━━━━━━━━━━━━ 63s 2s/step - accuracy: 0.9609 - loss: 0.1326 - val_accuracy: 0.4780 - val_loss: 2.5027 - learning_rate: 2.0000e-04
Epoch 6/30
40/40 ━━━━━━━━━━━━━━━━━━━━ 58s 1s/step - accuracy: 0.9672 - loss: 0.1076 - val_accuracy: 0.4906 - val_loss: 2.5540 - learning_rate: 4.0000e-05
Epoch 7/30
40/40 ━━━━━━━━━━━━━━━━━━━━ 50s 1s/
```

## Network Evaluation

With a trained neural network in hand, we can evaluate its performance on the test dataset. This involves using the model to make predictions on the test data and comparing those predictions to the true labels to calculate metrics such as accuracy, precision, recall, and F1-score.
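
A minimal plain-Python sketch of those metrics, given lists of true and predicted class indices (for example, the `argmax` of each softmax output row). It reports accuracy plus macro-averaged precision, recall, and F1, which weight every class equally; the function name and toy labels are hypothetical, not results from this notebook. If predictions come from a `flow_from_directory` generator, creating the test generator with `shuffle=False` keeps the prediction order aligned with the generator's `classes` labels.

```python
def classification_metrics(y_true, y_pred, num_classes):
    """Accuracy plus macro-averaged precision, recall, and F1 from class indices."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # class c predicted as something else
        # Treat 0/0 as 0 for classes that are never predicted or never present
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
        "f1": sum(f1s) / num_classes,
    }

# Toy example with 3 classes (hypothetical labels)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(classification_metrics(y_true, y_pred, num_classes=3))
```

Here accuracy is 4/6, while the per-class precision values (0.5, 2/3, 1.0) show which classes attract false positives, detail that a single accuracy number hides.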