# MNIST Digit Recognition with CNN
## Theoretical Foundations in Machine Learning
## 1. Importing Required Libraries

In [4]:
import tensorflow as tf
from tensorflow.keras import layers, models
import matplotlib.pyplot as plt
import numpy as np

## 2. Data Preprocessing
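> **Normalization**: Scaling pixel values from [0, 255] to the [0, 1] range keeps the inputs on a small, consistent scale, which helps stabilize gradient descent during training.  
> **One-hot Encoding**: Transforms integer class labels into a binary matrix with one column per class, the label format expected by the categorical cross-entropy loss in multi-class classification.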

For a quick understanding of one-hot encoding, refer to this video: [YouTube - One Hot Encoding](https://www.youtube.com/watch?v=i2JSH5tn2qc)
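
As a rough illustration of the two preprocessing steps above, the following NumPy-only sketch scales a few made-up pixel values and one-hot encodes three made-up labels. The arrays `pixels` and `labels` are hypothetical stand-ins for MNIST data, not values from the dataset.

```python
import numpy as np

# Hypothetical 8-bit pixel values (0-255) standing in for part of an MNIST image
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)
scaled = pixels.astype("float32") / 255  # values now lie in [0, 1]

# Hypothetical integer class labels for three samples
labels = np.array([3, 0, 9])
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(scaled)
print(one_hot)
```

Each row of `one_hot` contains a single 1 in the column of its class, which is the same kind of matrix that `tf.keras.utils.to_categorical` returns.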

In [5]:
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

# Add a channel dimension and scale pixel values to [0, 1]
X_train = X_train.reshape(-1,28,28,1).astype('float32') / 255
X_test = X_test.reshape(-1,28,28,1).astype('float32') / 255

# Convert labels to one-hot encoded format
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 4s 0us/step


## 3. Conceptual Questions
- **Why use CNNs instead of traditional models like Random Forests or SVMs for MNIST image classification?**
  
- **Why are non-linear activation functions essential in neural networks? Which activation function is most appropriate here, and why?**
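
One way to start exploring the second question is to check numerically what happens when layers are stacked without any activation. The sketch below uses small random matrices (`W1`, `b1`, `W2`, `b2` are hypothetical weights, not taken from any trained model) and is meant as a starting point for the discussion rather than a full answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical dense layers with no activation function
W1, b1 = rng.normal(size=(4, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 3)), rng.normal(size=3)
x = rng.normal(size=(2, 4))  # batch of two 4-feature inputs

# Stacking the two linear layers...
stacked = (x @ W1 + b1) @ W2 + b2

# ...gives the same result as one linear layer with merged weights and bias
W, b = W1 @ W2, b1 @ W2 + b2
single = x @ W + b
print(np.allclose(stacked, single))  # True

# With a ReLU in between, the output is generally no longer a linear map of x
with_relu = np.maximum(x @ W1 + b1, 0) @ W2 + b2
print(np.allclose(with_relu, single))
```

The first check holds for any choice of weights, which suggests that depth alone adds no expressive power unless a non-linearity sits between the layers.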


## 4. Model Architecture
> Experiment with different combinations of:
> - Convolutional layers
> - Filter sizes
> - Activation functions
> - Pooling strategies

In [None]:
model = models.Sequential([
    # Uncomment and modify based on your experimentation
    # layers.Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)),
    # layers.MaxPooling2D((2,2)),
    # layers.Conv2D(64, (3,3), activation='relu'),
    # layers.MaxPooling2D((2,2)),
    # layers.Dropout(0.5),
    # layers.Flatten(),
    # layers.Dense(128, activation='relu'),
    # layers.Dense(10, activation='softmax')
])
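
Before experimenting, it can help to work out the output shapes and parameter counts by hand. The sketch below does this arithmetic for the commented-out layers above, assuming the Keras defaults of `'valid'` padding, stride 1 for convolutions, and a pooling stride equal to the pool size. The helper names are hypothetical and exist only for this illustration.

```python
def conv_out(size, kernel):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units

size = 28
size = conv_out(size, 3)      # 26 after Conv2D(32, (3,3))
p1 = conv_params(3, 1, 32)    # 320
size //= 2                    # 13 after 2x2 max pooling
size = conv_out(size, 3)      # 11 after Conv2D(64, (3,3))
p2 = conv_params(3, 32, 64)   # 18496
size //= 2                    # 5 after 2x2 max pooling
flat = size * size * 64       # 1600 features after Flatten
p3 = dense_params(flat, 128)  # 204928
p4 = dense_params(128, 10)    # 1290
total = p1 + p2 + p3 + p4
print(flat, total)  # 1600 225034
```

Most of the parameters sit in the first dense layer, so changing the number of filters or pooling steps before `Flatten` has a large effect on model size. Once the layers are uncommented, `model.summary()` can be used to compare against this arithmetic.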

## 5. Model Training
> Experiment with various:
> - **Optimizers** (e.g., Adam, SGD with momentum, RMSprop)
> - **Loss Functions** (e.g., categorical crossentropy, KL divergence, hinge loss)
> - **Hyperparameters** (e.g., learning rate, batch size, number of epochs)

Evaluate:
- Training speed and convergence
- Validation accuracy
- Computational efficiency

Justify the best combination based on both empirical performance and theoretical understanding.
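
To support the theoretical side of that justification, it may help to see what one of the listed losses actually computes. The following is a minimal NumPy sketch of categorical cross-entropy, assuming one-hot targets and predictions that already sum to 1 per row. The function and the example arrays are hypothetical and are not the Keras implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Hypothetical 3-class example: one-hot targets and softmax-like predictions
y_true_demo = np.array([[1, 0, 0], [0, 1, 0]], dtype="float32")
confident = np.array([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]])
uncertain = np.array([[0.4, 0.3, 0.3], [0.3, 0.4, 0.3]])

loss_confident = categorical_crossentropy(y_true_demo, confident)
loss_uncertain = categorical_crossentropy(y_true_demo, uncertain)
print(loss_confident, loss_uncertain)
```

Both prediction sets pick the correct class, yet the less confident one receives a higher loss, because only the probability assigned to the true class enters the sum.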

In [None]:
model.compile(optimizer=tf.keras.optimizers.---,
              loss='---',
              metrics=['accuracy'])

history = model.fit(X_train, y_train,
                    epochs=---,
                    batch_size=---,
                    validation_split=0.2)

## 6. Model Evaluation
Evaluate the model using the following metrics:
- **Accuracy**: Overall classification performance
- **Confusion Matrix**: Class-wise prediction performance
- **Classification Report**: Precision, recall, F1-score
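
The relationship between these metrics can be sketched directly from a confusion matrix. The example below uses a small, made-up 3-class matrix (`cm_demo` is hypothetical, not MNIST output) and assumes the scikit-learn convention that rows are true classes and columns are predicted classes.

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = true class, columns = predicted class
cm_demo = np.array([[8, 1, 1],
                    [2, 7, 1],
                    [0, 2, 8]])

true_positives = np.diag(cm_demo)
precision = true_positives / cm_demo.sum(axis=0)  # column sums: all predictions of each class
recall = true_positives / cm_demo.sum(axis=1)     # row sums: all true samples of each class
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_positives.sum() / cm_demo.sum()

print(accuracy)
print(precision, recall, f1)
```

Accuracy summarizes the whole diagonal in one number, while the per-class values show which digits are being confused with each other.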

In [None]:
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_acc:.4f}")

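from sklearn.metrics import confusion_matrix, classification_report

# Convert softmax outputs and one-hot labels back to class indices
y_pred = np.argmax(model.predict(X_test), axis=1)
y_true = np.argmax(y_test, axis=1)
cm = confusion_matrix(y_true, y_pred)
print(cm)
print(classification_report(y_true, y_pred, digits=4))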

# Visualization
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Training vs Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

## 7. Overfitting Mitigation Strategies
> Techniques to reduce overfitting in CNNs:
> - Data augmentation
> - Dropout layers
> - L2 regularization
> - Early stopping
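
Of these techniques, early stopping is the simplest to sketch without a framework. The pure-Python function below illustrates the idea under one assumption: training stops once the validation loss has failed to improve for `patience` consecutive epochs. The function name and the loss values are hypothetical. In Keras, similar behavior is available through the `tf.keras.callbacks.EarlyStopping` callback.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 0-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: all epochs were run

# Hypothetical validation losses: improving, then rising as the model overfits
val_losses = [0.30, 0.12, 0.08, 0.07, 0.09, 0.10, 0.11]
stop = early_stopping_epoch(val_losses, patience=2)
print(stop)  # 5
```

Here the best loss occurs at epoch 3 and training halts at epoch 5, so in practice the weights from the best epoch would need to be kept or restored.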

## 8. Hyperparameter Tuning Guide
| Parameter         | Suggested Range |
|-------------------|-----------------|
| Learning Rate     | 1e-5 to 1e-2    |
| Batch Size        | 32 to 256       |
| Number of Filters | 32 to 128       |
| Dense Units       | 64 to 512       |
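
One possible way to explore these ranges is random search. The sketch below draws a few candidate configurations, assuming the learning rate is sampled on a log scale and the other parameters are restricted to powers of two within the table's bounds. The filter row is interpreted here as the number of convolutional filters. The function and dictionary keys are hypothetical, and each configuration would still need to be trained and compared on validation accuracy.

```python
import random

def sample_config(rng):
    """Draw one hypothetical configuration from the ranges in the table."""
    # Learning rates span several orders of magnitude, so sample the exponent uniformly
    log_lr = rng.uniform(-5, -2)
    return {
        "learning_rate": 10 ** log_lr,
        "batch_size": rng.choice([32, 64, 128, 256]),
        "filters": rng.choice([32, 64, 128]),
        "dense_units": rng.choice([64, 128, 256, 512]),
    }

rng = random.Random(42)  # fixed seed so the draws are repeatable
configs = [sample_config(rng) for _ in range(5)]
for cfg in configs:
    print(cfg)
```

Sampling the exponent rather than the learning rate itself gives each order of magnitude an equal chance of being tried.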