# Part 2 - Dense vs Convolutional

## Data preprocessing
The Fashion MNIST dataset is preprocessed for training neural networks. For the dense models, the images are reshaped from 2D (28x28 pixels) to 1D (784 pixels) and normalized to have pixel values between 0 and 1. For the convolutional models, the images are reshaped to (28, 28, 1) so that each one carries an explicit channel axis. The labels are converted from integer labels to one-hot vectors for multi-class classification.
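As a minimal sketch of the flatten, normalize, and one-hot steps on a toy input, the following uses plain Python lists instead of the NumPy arrays the notebook works with. The 2x2 "image" and the helper names `flatten`, `normalize`, and `one_hot` are illustrative assumptions and are not part of the notebook.

```python
def flatten(image):
    """Turn a 2D list of pixel rows into a single 1D list."""
    return [pixel for row in image for pixel in row]

def normalize(pixels, max_value=255.0):
    """Scale integer pixel intensities into the range [0, 1]."""
    return [p / max_value for p in pixels]

def one_hot(label, num_classes=10):
    """Encode an integer class label as a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

# Hypothetical 2x2 image standing in for a 28x28 Fashion MNIST image
toy_image = [[0, 255], [51, 102]]
toy_flat = normalize(flatten(toy_image))   # 4 values between 0 and 1
toy_label = one_hot(3)                     # 1.0 at index 3, 0.0 elsewhere

print(toy_flat)
print(toy_label)
```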

In [2]:
!pip install keras



In [2]:
from keras.datasets import fashion_mnist

(train_X, train_y), (test_X, test_y) = fashion_mnist.load_data()

In [4]:
from keras.utils import to_categorical
from sklearn.model_selection import train_test_split

# Convert the training and test labels to one-hot vectors
train_y_cat = to_categorical(train_y, num_classes=10)
test_y_cat = to_categorical(test_y, num_classes=10)

# Preprocess the data for DENSE layers
# Reshape the training and test images to 1D (flatten them) and normalize the pixel values (divide by 255)
train_X_flat = train_X.reshape(train_X.shape[0], -1) / 255.0
test_X_flat = test_X.reshape(test_X.shape[0], -1) / 255.0

# Split the flattened training data into training (80%) and validation (20%) sets
train_flat, valid_flat, train_label_flat, valid_label_flat = train_test_split(train_X_flat, train_y_cat, test_size=0.2, random_state=13)

print('training set shape:', train_flat.shape)
print('validation set shape:', valid_flat.shape)
print('training label set shape:', train_label_flat.shape)
print('validation label set shape:', valid_label_flat.shape)

training set shape: (48000, 784)
validation set shape: (12000, 784)
training label set shape: (48000, 10)
validation label set shape: (12000, 10)


In [5]:
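# Preprocess the data for convolutional layers: add a channel axis -> (n, 28, 28, 1)
# Note: unlike the dense pipeline above, the pixel values are not divided by 255 here
train_X = train_X.reshape(-1, 28, 28, 1)
test_X = test_X.reshape(-1, 28, 28, 1)

# Split the training data into training (80%) and validation (20%) sets
train_X, valid_X, train_label, valid_label = train_test_split(train_X, train_y_cat, test_size=0.2, random_state=13)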

print('training set shape:', train_X.shape)
print('validation set shape:', valid_X.shape)
print('training label set shape:', train_label.shape)
print('validation label set shape:', valid_label.shape)

training set shape: (48000, 28, 28, 1)
validation set shape: (12000, 28, 28, 1)
training label set shape: (48000, 10)
validation label set shape: (12000, 10)


## Dense Model nr.1.

The neural network model for multi-class classification is defined as follows:
- simple feed-forward neural network with two hidden layers and one output layer
- `Sequential` class: initializes the model, and it is used to add layers to the model
- `Dense` class: creates fully connected layers, uses sigmoid as activation function
- `SGD` class: specifies the stochastic gradient descent optimizer

### Layer structure:
- Input layer: it is made up of 784 nodes - one for each pixel in a 28x28 image
- 1st hidden layer: 128 nodes, 'sigmoid' activation function and input shape of 784
- 2nd hidden layer: 64 nodes, 'sigmoid' activation function
- Output layer: 10 nodes (for the 10 classes) and 'softmax' activation function
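To make the size of this network concrete, the sketch below counts its trainable parameters from the layer widths listed above. The helper `dense_params` is a hypothetical name introduced only for this illustration; it relies on the fact that a fully connected layer has one weight per input-unit pair plus one bias per unit.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit in a fully connected layer."""
    return n_inputs * n_units + n_units

layer_sizes = [784, 128, 64, 10]  # input, two hidden layers, output
params_per_layer = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer)  # [100480, 8256, 650]
print(total_params)      # 109386
```

The same totals should appear in the output of `model1.summary()` once the model has been built.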

### Compile the model
- `compile()` configures the model for training by setting the SGD optimizer, the loss function, and the metrics
- `learning_rate = 0.01`
- `loss='categorical_crossentropy'`
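The following is a rough sketch of what categorical cross-entropy measures for a single sample, written in plain Python rather than with the Keras implementation. The function names and the small `eps` term that guards against `log(0)` are assumptions made for this illustration.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(y_true, y_pred))

# A model that has learned nothing: equal scores for all 10 classes
uniform_probs = softmax([0.0] * 10)
target = [0.0] * 10
target[3] = 1.0  # hypothetical true class

chance_loss = categorical_crossentropy(target, uniform_probs)
print(round(chance_loss, 4))  # 2.3026, i.e. ln(10)
```

A loss close to 2.30 therefore indicates predictions that are still nearly uniform over the 10 classes, which is a useful reference point when reading the loss values printed after training.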

### Train the model
- fit the model to the training data for a fixed number of iterations
- `epochs=10`
- `batch_size=1000`
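Because `fit()` performs one weight update per batch, these two settings determine how many SGD steps the model receives in total. The short calculation below assumes the 48,000 training samples that remain after the 80/20 split shown in the preprocessing output.

```python
import math

n_train = 48000      # training samples after the 80/20 split
batch_size = 1000
epochs = 10

steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 48
print(total_updates)    # 480
```

With only 480 updates at a learning rate of 0.01, the weights move relatively little from their initial values, which is worth keeping in mind when interpreting the results.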

### Model evaluation
- compute the loss and the accuracy on the training and test data with `evaluate()`
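For one-hot labels, the accuracy metric amounts to comparing the index of the largest predicted probability with the index of the true class. The sketch below illustrates this in plain Python; the helper names and the 3-class toy data are hypothetical and only serve the example.

```python
def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(y_true_one_hot, y_pred_probs):
    """Fraction of samples whose predicted class matches the true class."""
    correct = sum(
        argmax(t) == argmax(p)
        for t, p in zip(y_true_one_hot, y_pred_probs)
    )
    return correct / len(y_true_one_hot)

# Hypothetical 3-class example: the second prediction is wrong
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y_pred = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.1, 0.1, 0.8]]

toy_accuracy = accuracy(y_true, y_pred)
print(toy_accuracy)  # 2 of 3 samples are classified correctly
```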

In [5]:
# Import necessary libraries and modules
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

# Initialize a linear stack of layers
model1 = Sequential()

# 1st hidden layer
model1.add(Dense(128, activation='sigmoid', input_shape=(784,)))

# 2nd hidden layer
model1.add(Dense(64, activation='sigmoid'))

# Output layer
model1.add(Dense(10, activation='softmax'))

# Compile the model
model1.compile(optimizer=SGD(learning_rate=0.01), loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
history_model1 = model1.fit(train_flat, train_label_flat, epochs=10, batch_size=1000, validation_data=(valid_flat, valid_label_flat))

# Evaluate the model on the training data
train_loss, train_accuracy = model1.evaluate(train_flat, train_label_flat, verbose=0)

# Print the training loss and accuracy
print(f"Train Loss: {train_loss:.4f}, Train Accuracy: {train_accuracy:.4f}")

# Evaluate the model on the test data
test_loss, test_accuracy = model1.evaluate(test_X_flat, test_y_cat, verbose=0)

# Print the test loss and accuracy
print(f"Test Loss: {test_loss:.4f}, Test Accuracy: {test_accuracy:.4f}")

2024-06-11 15:29:19.384754: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Train Loss: 2.2170, Train Accuracy: 0.4674
Test Loss: 2.2167, Test Accuracy: 0.4627


## Dense Model nr.2.

For this training, the sigmoid activation function is replaced with the ReLU activation, which generally leads to faster convergence and better performance than sigmoid.
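One commonly cited reason for this difference is the size of the gradients that each activation passes backwards. The sketch below compares the derivatives of the two functions at a few arbitrarily chosen inputs; it is an illustration only and not a full explanation of the training behaviour.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x)), at most 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

# Print the input, the sigmoid gradient, and the ReLU gradient
for x in (0.5, 2.0, 5.0):
    print(x, round(sigmoid_grad(x), 4), relu_grad(x))
```

The sigmoid derivative never exceeds 0.25 and shrinks quickly for larger inputs, so gradients are scaled down at every sigmoid layer, whereas ReLU passes the gradient through unchanged for positive inputs.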

The neural network model for multi-class classification is defined as follows:
- simple feed-forward neural network with two hidden layers and one output layer
- `Sequential` class: initializes the model, and it is used to add layers to the model
- `Dense` class: creates fully connected layers, uses ReLU as activation function
- `SGD` class: specifies the stochastic gradient descent optimizer

### Layer structure:
- Input layer: it is made up of 784 nodes - one for each pixel in a 28x28 image
- 1st hidden layer: 128 nodes, 'relu' activation function and input shape of 784
- 2nd hidden layer: 64 nodes, 'relu' activation function
- Output layer: 10 nodes (for the 10 classes) and 'softmax' activation function

### Compile the model
- `compile()` configures the model for training by setting the SGD optimizer, the loss function, and the metrics
- `learning_rate = 0.01`
- `loss='categorical_crossentropy'`

### Train the model
- fit the model to the training data for a fixed number of iterations
- `epochs=10`
- `batch_size=1000`

### Model evaluation
- compute the loss and the accuracy on the training and test data with `evaluate()`

In [6]:
# Initialize a linear stack of layers
model2 = Sequential()

# 1st hidden layer
model2.add(Dense(128, activation='relu', input_shape=(784,)))

# 2nd hidden layer
model2.add(Dense(64, activation='relu'))

# Output layer
model2.add(Dense(10, activation='softmax'))

# Compile the model
model2.compile(optimizer=SGD(learning_rate=0.01), loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
history_model2 = model2.fit(train_flat, train_label_flat, epochs=10, batch_size=1000, validation_data=(valid_flat, valid_label_flat))

# Evaluate the model on the training data
train_loss, train_accuracy = model2.evaluate(train_flat, train_label_flat, verbose=0)

# Print the training loss and accuracy
print(f"Train Loss: {train_loss:.4f}, Train Accuracy: {train_accuracy:.4f}")

# Evaluate the model on the test data
test_loss, test_accuracy = model2.evaluate(test_X_flat, test_y_cat, verbose=0)

# Print the test loss and accuracy
print(f"Test Loss: {test_loss:.4f}, Test Accuracy: {test_accuracy:.4f}")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Train Loss: 0.7069, Train Accuracy: 0.7681
Test Loss: 0.7257, Test Accuracy: 0.7559


## Dense Model nr.3.

Dropout regularization reduces overfitting by randomly dropping neurons during training. The model is expected to generalize better with this modification, although in this run its test accuracy (0.7127) is lower than that of Dense Model nr.2 (0.7559).

The neural network model for multi-class classification is defined as follows:
- simple feed-forward neural network with two hidden layers and one output layer
- `Sequential` class: initializes the model, and it is used to add layers to the model
- `Dense` class: creates fully connected layers, uses ReLU as activation function
- `SGD` class: specifies the stochastic gradient descent optimizer
- `Dropout` class: dropout regularization, which helps prevent overfitting by randomly setting a fraction of the input units to 0 during training
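The mechanism can be sketched in a few lines of plain Python. This is a simplified illustration of inverted dropout, in which the surviving values are scaled by `1 / (1 - rate)` during training so that the expected activation stays the same; the function name, the seed, and the toy activations are assumptions made for this example.

```python
import random

def dropout(values, rate, training, rng):
    """Inverted dropout: zero each value with probability `rate` during
    training and scale the survivors by 1 / (1 - rate)."""
    if not training:
        return list(values)  # dropout is a no-op at inference time
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]

rng = random.Random(13)          # fixed seed, chosen arbitrarily
activations = [1.0] * 10000      # hypothetical layer output

dropped = dropout(activations, rate=0.2, training=True, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)   # close to the rate
mean_after = sum(dropped) / len(dropped)            # close to the original mean

print(round(zero_fraction, 2), round(mean_after, 2))
```

Because dropout is only active during training, the metrics logged by `fit()` are computed with units dropped, while `evaluate()` runs the full network.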

### Layer structure:
- Input layer: it is made up of 784 nodes - one for each pixel in a 28x28 image
- 1st hidden layer: 128 nodes, 'relu' activation function and input shape of 784
- Dropout layer: sets a fraction (0.2) of its input units to 0 at each update
- 2nd hidden layer: 64 nodes, 'relu' activation function
- Dropout layer: sets a fraction (0.2) of its input units to 0 at each update
- Output layer: 10 nodes (for the 10 classes) and 'softmax' activation function

### Compile the model
- `compile()` configures the model for training by setting the SGD optimizer, the loss function, and the metrics
- `learning_rate = 0.01`
- `loss='categorical_crossentropy'`

### Train the model
- fit the model to the training data for a fixed number of iterations
- `epochs=10`
- `batch_size=1000`

### Model evaluation
- compute the loss and the accuracy on the training and test data with `evaluate()`

In [7]:
from keras.layers import Dropout

# Initialize a linear stack of layers
model3 = Sequential()

# 1st hidden layer, followed by dropout
model3.add(Dense(128, activation='relu', input_shape=(784,)))
model3.add(Dropout(0.2))

# 2nd hidden layer, followed by dropout
model3.add(Dense(64, activation='relu'))
model3.add(Dropout(0.2))

# Output layer
model3.add(Dense(10, activation='softmax'))

# Compile the model
model3.compile(optimizer=SGD(learning_rate=0.01), loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
history_model3 = model3.fit(train_flat, train_label_flat, epochs=10, batch_size=1000, validation_data=(valid_flat, valid_label_flat))

# Evaluate the model on the training data
train_loss, train_accuracy = model3.evaluate(train_flat, train_label_flat, verbose=0)

# Print the training loss and accuracy
print(f"Train Loss: {train_loss:.4f}, Train Accuracy: {train_accuracy:.4f}")

# Evaluate the model on the test data
test_loss, test_accuracy = model3.evaluate(test_X_flat, test_y_cat, verbose=0)

# Print the test loss and accuracy
print(f"Test Loss: {test_loss:.4f}, Test Accuracy: {test_accuracy:.4f}")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Train Loss: 0.8175, Train Accuracy: 0.7281
Test Loss: 0.8319, Test Accuracy: 0.7127


## Convolutional model nr.1.

This model uses convolutional layers to extract features from the images.

- `Sequential` class: creates a network, allows stacking layers sequentially
- `Conv2D` class: applies learnable 2D filters to extract spatial patterns from the greyscale images
- `Flatten` class: flattens the output of the convolutional layers into a vector for the fully connected layers
- `Dense` class: creates fully connected layers
- `SGD` class: specifies the stochastic gradient descent optimizer
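To illustrate what a single convolutional filter computes, the sketch below slides one hand-picked kernel over a tiny image in plain Python. The function name, the 4x4 image, and the kernel values are hypothetical; in the model the kernel weights are learned during training rather than chosen by hand.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding (stride 1) and return
    the map of weighted sums. As in deep learning libraries, the kernel
    is not flipped, so this is technically cross-correlation."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_h)
                for j in range(k_w)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Hypothetical 4x4 image with a vertical edge between columns 1 and 2
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked 3x3 kernel that responds to left-to-right intensity increases
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[3, 3], [3, 3]]
```

A 3x3 kernel applied without padding shrinks each spatial dimension by 2, which is why the 4x4 input yields a 2x2 feature map here.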

### Layer structure:
- Input layer: takes input with shape (28, 28, 1)
- 1st convolutional layer: 32 filters with a 3x3 kernel, 'sigmoid' activation function and input shape of (28, 28, 1)
- 2nd convolutional layer: 64 filters with a 3x3 kernel, 'sigmoid' activation function
- Flatten layer: converts the feature maps into a 1D vector
- Dense layers: the first has 256 neurons and the second has 128 neurons; both use the 'sigmoid' activation function
- Output layer: 10 nodes (for the 10 classes) and 'softmax' activation function
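The layer structure above fixes the tensor shapes and the number of trainable parameters. The arithmetic below is a sketch that assumes the Keras defaults for `Conv2D` (no padding, stride 1) and 3x3 kernels as used in the code cell; the helper names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial size after an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1

def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit."""
    return n_inputs * n_units + n_units

size_after_conv1 = conv_output_size(28, 3)                 # 26
size_after_conv2 = conv_output_size(size_after_conv1, 3)   # 24
flattened = size_after_conv2 * size_after_conv2 * 64       # 36864

total_params = (
    conv_params(3, 1, 32)            # first Conv2D layer
    + conv_params(3, 32, 64)         # second Conv2D layer
    + dense_params(flattened, 256)   # first dense layer
    + dense_params(256, 128)         # second dense layer
    + dense_params(128, 10)          # output layer
)
print(flattened, total_params)  # 36864 9490442
```

Almost all of these parameters sit in the first dense layer, because no pooling layer reduces the 24x24x64 feature maps before they are flattened.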

### Compile the model
- `compile()` configures the model for training by setting the SGD optimizer, the loss function, and the metrics
- `learning_rate = 0.01`
- `loss='categorical_crossentropy'`

### Train the model
- fit the model to the training data for a fixed number of iterations
- `epochs=10`
- `batch_size=1000`

### Model evaluation
- compute the loss and the accuracy on the training and test data with `evaluate()`

In [6]:
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense
from keras.optimizers import SGD

# Initialize the model
model_cnn1 = Sequential()

# Add convolutional layers
model_cnn1.add(Conv2D(32, kernel_size=(3, 3), activation='sigmoid', input_shape=(28, 28, 1)))
model_cnn1.add(Conv2D(64, kernel_size=(3, 3), activation='sigmoid'))

# Flatten the output of the conv layers to feed into the dense layers
model_cnn1.add(Flatten())

# Add dense layers
model_cnn1.add(Dense(256, activation='sigmoid'))
model_cnn1.add(Dense(128, activation='sigmoid'))

# Output layer with softmax activation for classification
model_cnn1.add(Dense(10, activation='softmax'))

# Compile the model
model_cnn1.compile(optimizer=SGD(learning_rate=0.01), loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
history_cnn1 = model_cnn1.fit(train_X, train_label, epochs=10, batch_size=1000, validation_data=(valid_X, valid_label))

# Evaluate the model on the training data
train_loss_sigmoid, train_accuracy_sigmoid = model_cnn1.evaluate(train_X, train_label, verbose=0)

# Print the training loss and accuracy
print(f"Sigmoid Train Loss: {train_loss_sigmoid:.4f}, Train Accuracy: {train_accuracy_sigmoid:.4f}")

# Evaluate the model on the test data
test_loss_sigmoid, test_accuracy_sigmoid = model_cnn1.evaluate(test_X, test_y_cat, verbose=0)

# Print the test loss and accuracy
print(f"Sigmoid Test Loss: {test_loss_sigmoid:.4f}, Test Accuracy: {test_accuracy_sigmoid:.4f}")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Sigmoid Train Loss: 2.0068, Train Accuracy: 0.5723
Sigmoid Test Loss: 2.0074, Test Accuracy: 0.5694


## Convolutional model nr.2.

This model uses convolutional layers to extract features from the images. The sigmoid activation function is replaced with the ReLU activation, which generally leads to faster convergence and better performance than sigmoid.

- `Sequential` class: creates a network, allows stacking layers sequentially
- `Conv2D` class: applies learnable 2D filters to extract spatial patterns from the greyscale images
- `Flatten` class: flattens the output of the convolutional layers into a vector for the fully connected layers
- `Dense` class: creates fully connected layers
- `SGD` class: specifies the stochastic gradient descent optimizer

### Layer structure:
- Input layer: takes input with shape (28, 28, 1)
- 1st convolutional layer: 32 filters with a 3x3 kernel, 'relu' activation function and input shape of (28, 28, 1)
- 2nd convolutional layer: 64 filters with a 3x3 kernel, 'relu' activation function
- Flatten layer: converts the feature maps into a 1D vector
- Dense layers: the first has 256 neurons and the second has 128 neurons; both use the 'relu' activation function
- Output layer: 10 nodes (for the 10 classes) and 'softmax' activation function

### Compile the model
- `compile()` configures the model for training by setting the SGD optimizer, the loss function, and the metrics
- `learning_rate = 0.001`
- `loss='categorical_crossentropy'`

### Train the model
- fit the model to the training data for a fixed number of iterations
- `epochs=10`
- `batch_size=1000`

### Model evaluation
- compute the loss and the accuracy on the training and test data with `evaluate()`


In [7]:
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense
from keras.optimizers import SGD

# Initialize the model
model_cnn2 = Sequential()

# Add convolutional layers
model_cnn2.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
model_cnn2.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))

# Flatten the output of the conv layers to feed into the dense layers
model_cnn2.add(Flatten())

# Add dense layers
model_cnn2.add(Dense(256, activation='relu'))
model_cnn2.add(Dense(128, activation='relu'))

# Output layer with softmax activation for classification
model_cnn2.add(Dense(10, activation='softmax'))

# Compile the model
model_cnn2.compile(optimizer=SGD(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
history_cnn2 = model_cnn2.fit(train_X, train_label, epochs=10, batch_size=1000, validation_data=(valid_X, valid_label))

# Evaluate the model on the training data
train_loss_relu, train_accuracy_relu = model_cnn2.evaluate(train_X, train_label, verbose=0)

# Print the training loss and accuracy
print(f"ReLU Train Loss: {train_loss_relu:.4f}, Train Accuracy: {train_accuracy_relu:.4f}")

# Evaluate the model on the test data
test_loss_relu, test_accuracy_relu = model_cnn2.evaluate(test_X, test_y_cat, verbose=0)

# Print the test loss and accuracy
print(f"ReLU Test Loss: {test_loss_relu:.4f}, Test Accuracy: {test_accuracy_relu:.4f}")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
ReLU Train Loss: 0.3617, Train Accuracy: 0.8738
ReLU Test Loss: 0.4343, Test Accuracy: 0.8518


## Convolutional model nr.3.

This model uses convolutional layers to extract features from the images. The Dropout layers help prevent overfitting by randomly setting a fraction of input units to 0 at each update during training.

- `Sequential` class: creates a network, allows stacking layers sequentially
- `Conv2D` class: applies learnable 2D filters to extract spatial patterns from the greyscale images
- `Flatten` class: flattens the output of the convolutional layers into a vector for the fully connected layers
- `Dense` class: creates fully connected layers
- `SGD` class: specifies the stochastic gradient descent optimizer
- `Dropout` class: dropout regularization, which helps prevent overfitting by randomly setting a fraction of the input units to 0 during training

### Layer structure:
- Input layer: takes input with shape (28, 28, 1)
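- 1st convolutional layer: 32 filters with a 3x3 kernel, 'relu' activation function and input shape of (28, 28, 1)
- 2nd convolutional layer: 64 filters with a 3x3 kernel, 'relu' activation function
- Flatten layer: converts the feature maps into a 1D vector
- 1st dense layer: 256 neurons, 'relu' activation function
- Dropout layer: sets a fraction (0.5) of its input units to 0 at each update
- 2nd dense layer: 128 neurons, 'relu' activation function
- Dropout layer: sets a fraction (0.5) of its input units to 0 at each update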
- Output layer: 10 nodes (for the 10 classes) and 'softmax' activation function

### Compile the model
- `compile()` configures the model for training by setting the SGD optimizer, the loss function, and the metrics
- `learning_rate = 0.001`
- `loss='categorical_crossentropy'`

### Train the model
- fit the model to the training data for a fixed number of iterations
- `epochs=10`
- `batch_size=1000`

### Model evaluation
- compute the loss and the accuracy on the training and test data with `evaluate()`


In [8]:
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Flatten, Dropout
from keras.optimizers import SGD

# Initialize the model
model_cnn3 = Sequential()

# Add convolutional layers
model_cnn3.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
model_cnn3.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))

# Flatten the output of the conv layers to feed into the dense layers
model_cnn3.add(Flatten())

# Add dense layers
model_cnn3.add(Dense(256, activation='relu'))
model_cnn3.add(Dropout(0.5))
model_cnn3.add(Dense(128, activation='relu'))
model_cnn3.add(Dropout(0.5))

# Output layer with softmax activation for classification
model_cnn3.add(Dense(10, activation='softmax'))

# Compile the model
model_cnn3.compile(optimizer=SGD(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
history_cnn3 = model_cnn3.fit(train_X, train_label, epochs=10, batch_size=1000, validation_data=(valid_X, valid_label))

# Evaluate the model on the training data
train_loss_relu, train_accuracy_relu = model_cnn3.evaluate(train_X, train_label, verbose=0)

# Print the training loss and accuracy
print(f"ReLU Train Loss: {train_loss_relu:.4f}, Train Accuracy: {train_accuracy_relu:.4f}")

# Evaluate the model on the test data
test_loss_relu, test_accuracy_relu = model_cnn3.evaluate(test_X, test_y_cat, verbose=0)

# Print the test loss and accuracy
print(f"ReLU Test Loss: {test_loss_relu:.4f}, Test Accuracy: {test_accuracy_relu:.4f}")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
ReLU Train Loss: 0.5543, Train Accuracy: 0.8324
ReLU Test Loss: 0.5823, Test Accuracy: 0.8216
