#1. Data Preprocessing:#

*Load the CIFAR-10 dataset.*

*Perform necessary data preprocessing steps:*

  ▪ Normalize pixel values to range between 0 and 1.

  ▪ Convert class labels into one-hot encoded format.

  ▪ Split the dataset into training and test sets (e.g., 50,000 images for training and 10,000 for testing).

  ▪ Optionally, apply data augmentation techniques (such as random flips, rotations, or shifts) to improve the generalization of the model.

In [2]:
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# Load the CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Normalize pixel values
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Convert class labels to one-hot encoded format
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Flatten the input data for ANN (from 32x32x3 to 3072)
x_train = x_train.reshape(-1, 32*32*3)
x_test = x_test.reshape(-1, 32*32*3)

print("Training data shape", x_train.shape)
print("Training labels shape", y_train.shape)
print("Testing data shape", x_test.shape)
print("Testing labels shape", y_test.shape)


Training data shape (50000, 3072)
Training labels shape (50000, 10)
Testing data shape (10000, 3072)
Testing labels shape (10000, 10)


**Normalization**: Scale pixel values to a range between 0 and 1.

**One-Hot Encoding**: Convert the class labels into one-hot encoded format for multi-class classification.

**Data Splitting**: `cifar10.load_data()` already returns the standard split of 50,000 training and 10,000 test images, so no further splitting is needed.
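To make the preprocessing steps concrete, here is a minimal NumPy sketch on a tiny made-up batch. The array names and the random data are illustrative assumptions, not part of the notebook. It mirrors what `to_categorical` and the `reshape` call do, assuming labels arrive as an integer column vector like the one `cifar10.load_data()` returns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mini-batch: 4 RGB images of 32x32 pixels with uint8 values,
# and integer labels shaped (4, 1) like the arrays from cifar10.load_data().
images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
labels = np.array([[3], [0], [9], [3]])

# Normalization: map [0, 255] to [0.0, 1.0].
images_norm = images.astype("float32") / 255.0

# One-hot encoding: row i of the identity matrix is the code for class i.
num_classes = 10
one_hot = np.eye(num_classes, dtype="float32")[labels.ravel()]

# Flattening: each 32x32x3 image becomes a vector of 3072 features.
flat = images_norm.reshape(len(images_norm), -1)

print(one_hot.shape, flat.shape)
```

Each row of `one_hot` contains a single 1 at the index of its class, which is the format `categorical_crossentropy` expects.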

#2. Network Architecture Design:#

Design a feedforward neural network to classify the images.

*▪ Input Layer*: The input shape should match the 32x32x3 dimensions of the CIFAR-10 images.

*▪ Hidden Layers*: Use appropriate layers.

*▪ Output Layer*: The final layer should have 10 output neurons (one for each class) with a softmax activation function for multi-class classification.

In [3]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input

# Define the ANN model
model = Sequential()

# Input: 3072 features from the flattened 32x32x3 image.
# An explicit Input object avoids the Keras warning about passing
# input_shape to a layer.
model.add(Input(shape=(32*32*3,)))

# First hidden layer
model.add(Dense(512, activation='relu'))

# Further hidden layers (ReLU, then tanh) with Dropout for regularization
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))  # Dropout to avoid overfitting

model.add(Dense(128, activation='tanh'))
model.add(Dropout(0.5))

# Output layer (10 classes with softmax for multi-class classification)
model.add(Dense(10, activation='softmax'))

# Display model summary
model.summary()




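*Input Layer*: 3072 features (the 32x32x3 RGB image, flattened).

*Hidden Layers*: Three fully connected (Dense) layers with 512, 256, and 128 units. The first two use ReLU and the third uses tanh.

*Dropout Layers*: A dropout rate of 0.5 after the second and third hidden layers to reduce overfitting.

*Output Layer*: 10 neurons with softmax activation for multi-class classification.

**Justification**

*Fully connected layers* combine all 3072 input features into progressively smaller representations (512, 256, 128) before producing the final class scores.

*Dropout* randomly disables half of the units during training, which discourages co-adaptation between neurons and reduces overfitting.

*Convolutional and pooling layers* are not used here because the model is a plain feedforward network on flattened pixels. They would preserve spatial structure and are the usual way to extract features such as edges, colors, or textures from images.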
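The size of the dense stack defined in the code cell above can be checked by hand. The short sketch below counts the trainable parameters per Dense layer (weights plus biases). The helper name `dense_params` is introduced here for illustration only. Dropout layers add no parameters.

```python
# Layer widths of the Sequential model defined above:
# 3072 inputs -> 512 -> 256 -> 128 -> 10 outputs.
layer_sizes = [32 * 32 * 3, 512, 256, 128, 10]

def dense_params(fan_in, fan_out):
    """Weights (fan_in * fan_out) plus one bias per output unit."""
    return fan_in * fan_out + fan_out

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)      # [1573376, 131328, 32896, 1290]
print(total_params)   # 1738890
```

Roughly 90% of the parameters sit in the first layer, a direct consequence of feeding all 3072 pixel values into 512 units.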

#3. Activation Functions

*ReLU* (Rectified Linear Unit) helps mitigate the vanishing gradient problem during backpropagation because its gradient is exactly 1 for every positive input, which tends to speed up learning.

*tanh* ensures that the values are centered around zero, which can improve convergence in some cases.



In [4]:
# No change needed in the previous code as ReLU is already used.

**Role in Backpropagation:**

*ReLU*: ReLU mitigates the vanishing gradient problem (which is common with sigmoid and tanh) because its gradient does not saturate for positive inputs. For negative inputs both the output and the gradient are 0, so those neurons are deactivated. This makes the activations sparse and cheap to compute.

*tanh*: can be useful in cases where the input data is centered around zero, but it may suffer from the vanishing gradient problem in deeper layers.
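The difference in saturation can be illustrated numerically. The sketch below is a simplified illustration, not the notebook's training code. It evaluates the derivatives of ReLU and tanh and multiplies one such factor per layer, ignoring the weight matrices that a real backward pass would also include.

```python
import numpy as np

def relu_grad(x):
    """Derivative of max(0, x): 1 for positive inputs, 0 otherwise."""
    return (x > 0).astype(float)

def tanh_grad(x):
    """Derivative of tanh(x): 1 - tanh(x)^2, which shrinks as |x| grows."""
    return 1.0 - np.tanh(x) ** 2

x = np.array([-5.0, -1.0, 0.5, 5.0])
print("relu'(x):", relu_grad(x))
print("tanh'(x):", tanh_grad(x))

# Backpropagation multiplies one such factor per layer. Ten layers that all
# sit at pre-activation 2.0 scale the gradient by these amounts:
depth = 10
relu_chain = relu_grad(np.array([2.0]))[0] ** depth
tanh_chain = tanh_grad(np.array([2.0]))[0] ** depth
print(relu_chain, tanh_chain)
```

The ReLU chain stays at 1, while the tanh chain collapses toward zero. This is the saturation effect described above.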

#4. Loss Function and Optimizer

The most suitable loss function for multi-class classification is categorical crossentropy. You could compare this with:

  *Mean Squared Error (MSE)*: Not ideal for classification but used to compare performance.
  
  *Sparse Categorical Crossentropy*: The same cross-entropy loss, used when the labels are integer class indices rather than one-hot vectors.

Use the Adam optimizer for its per-parameter adaptive learning rates and its ability to handle sparse gradients.
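A small NumPy sketch can make the comparison between these losses concrete. The function names and the example probabilities below are made up for illustration. They follow the textbook definitions and are not the Keras implementations.

```python
import numpy as np

def categorical_crossentropy(y_onehot, probs, eps=1e-7):
    """Mean of -sum(y * log(p)) over the batch; labels are one-hot."""
    probs = np.clip(probs, eps, 1.0)
    return float(np.mean(-np.sum(y_onehot * np.log(probs), axis=1)))

def sparse_categorical_crossentropy(y_int, probs, eps=1e-7):
    """Same loss, but labels are integer class indices."""
    probs = np.clip(probs, eps, 1.0)
    picked = probs[np.arange(len(y_int)), y_int]
    return float(np.mean(-np.log(picked)))

def mean_squared_error(y_onehot, probs):
    """Mean squared difference between one-hot labels and probabilities."""
    return float(np.mean((y_onehot - probs) ** 2))

# Made-up softmax outputs for two samples and three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.01, 0.01, 0.98]])
y_int = np.array([0, 1])          # the second sample is confidently wrong
y_onehot = np.eye(3)[y_int]

cce = categorical_crossentropy(y_onehot, probs)
scce = sparse_categorical_crossentropy(y_int, probs)
mse = mean_squared_error(y_onehot, probs)
print(cce, scce, mse)
```

The two cross-entropy variants return the same value and differ only in label format. The confidently wrong second sample dominates the cross-entropy, whereas its contribution to MSE is bounded, which is one reason cross-entropy is usually preferred for classification.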

In [5]:
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])


*Effect of Optimizer & Learning Rate:*

Adam adapts the step size of each parameter using running estimates of the gradient's first and second moments, which often leads to faster convergence than plain SGD.
  
If the model isn't converging, reduce the learning rate to allow for finer updates.
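To show what "adaptive" means in practice, here is a minimal sketch of the Adam update rule on a one-dimensional toy problem. The function `adam_minimize` and the objective f(w) = w^2 are illustrative assumptions. The learning rate of 0.1 is a toy value chosen for this example, not a recommendation for the CIFAR-10 model.

```python
import numpy as np

def adam_minimize(grad_fn, w0, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=300):
    """Run `steps` Adam updates on a scalar parameter and return its path."""
    w, m, v = float(w0), 0.0, 0.0
    path = [w]
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g            # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g        # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)               # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)   # step scaled per parameter
        path.append(w)
    return path

# Toy objective f(w) = w^2 with gradient 2w, starting far from the minimum.
path = adam_minimize(lambda w: 2.0 * w, w0=5.0)
first_step = abs(path[1] - path[0])
print(first_step, path[-1])
```

The first step has size close to `lr` regardless of how large the raw gradient is, because the update is normalized by the gradient magnitude. In Keras, a different learning rate can be set by passing an optimizer object such as `tf.keras.optimizers.Adam(learning_rate=1e-4)` to `model.compile` instead of the string `'adam'`.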

#5. Training the Model:

Implement backpropagation to update the weights and biases of the
network during training.

Train the model for a fixed number of epochs (e.g., 50 epochs) and
monitor the training and validation accuracy.

In [6]:
# Train the model (batch size of 64 and 50 epochs)
history = model.fit(x_train, y_train, batch_size=64, epochs=50, validation_data=(x_test, y_test))


Epoch 1/50
782/782 ━━━━━━━━━━━━━━━━━━━━ 11s 8ms/step - accuracy: 0.1552 - loss: 2.6320 - val_accuracy: 0.3099 - val_loss: 1.8767
Epoch 2/50
782/782 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.3168 - loss: 1.8655 - val_accuracy: 0.3550 - val_loss: 1.7682
Epoch 3/50
782/782 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.3337 - loss: 1.8257 - val_accuracy: 0.3673 - val_loss: 1.7387
Epoch 4/50
782/782 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.3409 - loss: 1.7992 - val_accuracy: 0.3733 - val_loss: 1.7248
Epoch 5/50
782/782 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.3614 - loss: 1.7763 - val_accuracy: 0.3783 - val_loss: 1.7143
Epoch 6/50
782/782 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.3620 - loss: 1.7554 - val_accuracy: 0.3723 - val_loss: 1.7151
Epoch 7/50
782/782

*Backpropagation & Learning Rate:*

Backpropagation updates the weights in each layer by calculating the gradient of the loss with respect to the weights and adjusting them using the learning rate.

The learning rate determines how large these weight updates are. If it's too high, the model may overshoot optimal points; if too low, it might converge slowly.
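The effect of the learning rate can be reproduced on a toy problem. The sketch below assumes the objective f(w) = w^2, which is not the network's loss. It applies plain gradient descent with three different rates.

```python
def gradient_descent(lr, w0=1.0, steps=20):
    """Minimize f(w) = w^2 (gradient 2w) with a fixed learning rate."""
    w = w0
    for _ in range(steps):
        w = w - lr * 2.0 * w
    return w

w_small = gradient_descent(lr=0.001)  # too low: barely moves in 20 steps
w_good = gradient_descent(lr=0.1)     # reasonable: close to the minimum at 0
w_large = gradient_descent(lr=1.1)    # too high: overshoots and diverges

print(w_small, w_good, w_large)
```

Each update multiplies `w` by `1 - 2 * lr`, so the iterate shrinks slowly for a tiny rate, shrinks quickly for a moderate one, and grows in magnitude with alternating sign once the factor exceeds 1 in absolute value.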

#6. Model Evaluation:
After training, evaluate the performance of your model on the test set.

Calculate accuracy, precision, recall, F1-score, and the confusion matrix to understand the model’s classification performance.

In [7]:
from sklearn.metrics import classification_report, confusion_matrix

# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)

# Get predictions
y_pred = model.predict(x_test)
y_pred_classes = y_pred.argmax(axis=1)
y_true = y_test.argmax(axis=1)

# Classification report
print(classification_report(y_true, y_pred_classes))

# Confusion matrix
print(confusion_matrix(y_true, y_pred_classes))


313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.4525 - loss: 1.5387
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step
              precision    recall  f1-score   support

           0       0.45      0.55      0.50      1000
           1       0.58      0.56      0.57      1000
           2       0.36      0.17      0.23      1000
           3       0.30      0.35      0.32      1000
           4       0.42      0.32      0.36      1000
           5       0.41      0.32      0.36      1000
           6       0.48      0.52      0.50      1000
           7       0.43      0.60      0.50      1000
           8       0.50      0.67      0.57      1000
           9       0.52      0.46      0.49      1000

    accuracy                           0.45     10000
   macro avg       0.45      0.45      0.44     10000
weighted avg       0.45      0.45      0.44     10000

[[549  32  28  34  19  11  14  64 204  45]
 [ 68 559  11 
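The per-class figures in the report follow directly from the confusion matrix. For example, the first row printed above sums to 1,000 and has 549 on the diagonal, which gives a recall of 0.549 for class 0, consistent with the 0.55 in the report. The sketch below shows the general computation on a small hypothetical 3-class matrix (not taken from this run), assuming the scikit-learn convention that rows are true classes and columns are predicted classes.

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall and F1 per class from a confusion matrix whose
    rows are true classes and columns are predicted classes."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)   # column sums: all samples predicted as the class
    recall = tp / cm.sum(axis=1)      # row sums: all samples that truly are the class
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return precision, recall, f1, accuracy

# Hypothetical 3-class confusion matrix (not taken from the run above).
cm = [[8, 1, 1],
      [2, 6, 2],
      [0, 3, 7]]
precision, recall, f1, accuracy = per_class_metrics(cm)
print(precision, recall, f1, accuracy)
```

This sketch does not guard against division by zero, which would occur if a class were never predicted or had no correct predictions.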

*How to Improve Performance:*

 Data Augmentation: Introduce variations in the data to reduce overfitting.

 More Complex Architectures: Add more layers or filters to improve feature extraction.
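As one possible form of data augmentation, the sketch below applies a random horizontal flip and a random shift using NumPy only. The function `augment_batch` and its `max_shift` parameter are hypothetical names introduced here. The augmentation has to run on the 32x32x3 images, that is, before they are flattened to 3072 features.

```python
import numpy as np

def augment_batch(images, rng, max_shift=4):
    """Randomly flip each image left-right and shift it by up to
    `max_shift` pixels, padding the uncovered border with zeros."""
    n, h, w, _ = images.shape
    out = np.zeros_like(images)
    for i in range(n):
        img = images[i]
        if rng.random() < 0.5:
            img = img[:, ::-1, :]                      # horizontal flip
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        # Copy the region where the shifted image overlaps the frame.
        src_y = slice(max(0, -dy), min(h, h - dy))
        dst_y = slice(max(0, dy), min(h, h + dy))
        src_x = slice(max(0, -dx), min(w, w - dx))
        dst_x = slice(max(0, dx), min(w, w + dx))
        out[i, dst_y, dst_x, :] = img[src_y, src_x, :]
    return out

rng = np.random.default_rng(0)
batch = rng.random((8, 32, 32, 3)).astype("float32")  # stand-in for normalized images
augmented = augment_batch(batch, rng)
flat = augmented.reshape(len(augmented), -1)           # flatten after augmenting
print(augmented.shape, flat.shape)
```

Keras also provides preprocessing layers such as `RandomFlip` and `RandomTranslation` that serve the same purpose. Whether augmentation helps a dense network on flattened pixels as much as it helps a convolutional one is something to verify experimentally.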

#7. Optimization Strategies
**Early Stopping**: Stop training when validation accuracy no longer improves.
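The stopping rule itself is simple enough to sketch in plain Python. The helper `early_stopping_epoch` below is an illustrative assumption, not a Keras function. It is applied to the `val_accuracy` values of the first six epochs from the training log above.

```python
def early_stopping_epoch(val_scores, patience=1):
    """Return the 1-based epoch at which training would stop, or None if
    the rule never triggers. Higher scores are better (e.g. accuracy)."""
    best = float("-inf")
    wait = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best, wait = score, 0     # improvement: reset the counter
        else:
            wait += 1                 # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# val_accuracy of the first six epochs from the training log above.
val_accuracy = [0.3099, 0.3550, 0.3673, 0.3733, 0.3783, 0.3723]
print(early_stopping_epoch(val_accuracy, patience=1))  # 6
print(early_stopping_epoch(val_accuracy, patience=3))  # None
```

With a patience of 1, the small dip in epoch 6 would already end training, which is why a larger patience is usually chosen. In Keras, the same behaviour is available through the `EarlyStopping` callback (with arguments such as `monitor`, `patience`, and `restore_best_weights`) passed to `model.fit` via `callbacks`.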

**Learning Rate Scheduling**: Gradually decrease the learning rate to allow finer convergence.
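One common way to decrease the rate gradually is exponential decay. The sketch below uses illustrative values for the initial rate, decay factor, and decay interval. None of them come from the notebook.

```python
def exponential_decay(epoch, initial_lr=1e-3, decay_rate=0.5, decay_epochs=10):
    """Learning rate that is multiplied by `decay_rate` every `decay_epochs`."""
    return initial_lr * decay_rate ** (epoch / decay_epochs)

schedule = [exponential_decay(e) for e in (0, 10, 20, 50)]
print(schedule)
```

With these assumed settings the rate halves every 10 epochs, so by epoch 50 it has dropped to about 3% of its starting value. Keras offers callbacks such as `LearningRateScheduler` and `ReduceLROnPlateau` for applying a schedule during `model.fit`. The exact function signature the scheduler callback expects should be checked against the installed version.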

**Weight Initialization**: Start with small random weights rather than zeros or identical values, so that the symmetry between neurons is broken and each one can learn a different feature.

**Weight Initialization Importance**:

  Poor initialization can cause vanishing/exploding gradients.

  Techniques like He initialization for ReLU layers can help achieve faster convergence.
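The effect of the initialization scale can be observed without training. The following sketch is an illustration under simplifying assumptions (random Gaussian inputs, five equally wide ReLU layers, no biases). It compares He initialization, which draws weights with standard deviation sqrt(2 / fan_in), against a fixed small standard deviation of 0.01.

```python
import numpy as np

def forward_mean_square(init_std_fn, width=512, depth=5, seed=0):
    """Push random inputs through `depth` ReLU layers and return the
    mean squared activation of the last layer."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((1000, width))
    for _ in range(depth):
        w = rng.standard_normal((width, width)) * init_std_fn(width)
        h = np.maximum(0.0, h @ w)
    return float(np.mean(h ** 2))

he_signal = forward_mean_square(lambda fan_in: np.sqrt(2.0 / fan_in))
small_signal = forward_mean_square(lambda fan_in: 0.01)
print(he_signal, small_signal)
```

With He initialization the signal keeps roughly the same magnitude from layer to layer, whereas the fixed small scale shrinks it by orders of magnitude, and the gradients shrink along with it. In Keras, He initialization can be requested per layer with `kernel_initializer='he_normal'`.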