# Regularization with Alpha Dropout and MC Dropout

Using the MNIST dataset, extend the previously trained deep neural network by applying Alpha Dropout. Then, without retraining, apply Monte Carlo (MC) Dropout at inference and check whether it improves accuracy. Set the random seeds to 42 and use the following configuration:
- Flatten input images to 28 × 28 = 784 features
- 3 hidden layers, 64 neurons each
- SELU activation function (required for Alpha Dropout)
- LeCun normal initialization
- Alpha Dropout rate: 0.1 in all hidden layers
- Output layer: 10 neurons with softmax
- Optimizer: Nadam
- Learning rate: 0.001; loss: sparse categorical cross-entropy
- Epochs: 50; batch size: 32
- Use only the first 1000 training samples and the first 200 test samples (the code below splits the 1000 into 800 training and 200 validation samples)
- For MC Dropout, enable dropout during inference and average predictions over 20 stochastic forward passes
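
The pairing of SELU with Alpha Dropout in the list above can be illustrated with a small NumPy sketch. It is not the Keras implementation: the function name `alpha_dropout` and the standard-normal stand-in activations are assumptions made for illustration. Dropped units are set to SELU's negative saturation value instead of zero, and an affine correction then restores zero mean and unit variance.

```python
import numpy as np

# SELU constants from the self-normalizing networks formulation.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def alpha_dropout(x, rate, rng):
    """Illustrative NumPy alpha dropout (hypothetical helper, not the Keras layer)."""
    keep_prob = 1.0 - rate
    alpha_p = -SELU_ALPHA * SELU_SCALE  # SELU's negative saturation value
    # Affine correction (a, b) chosen so the output keeps mean 0 and variance 1
    # when the input has mean 0 and variance 1.
    a = (keep_prob + alpha_p**2 * keep_prob * rate) ** -0.5
    b = -a * alpha_p * rate
    keep = rng.random(x.shape) < keep_prob
    # Dropped units take the saturation value rather than zero.
    return a * np.where(keep, x, alpha_p) + b

rng = np.random.default_rng(42)
x = rng.standard_normal(200_000)  # stand-in for standardized activations
y = alpha_dropout(x, rate=0.1, rng=rng)
print(f"mean={y.mean():.3f}, var={y.var():.3f}")
```

With standardized inputs, the printed mean and variance should stay close to 0 and 1. This is the property that lets self-normalization survive dropout, whereas regular dropout would shift both statistics.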

Q3.1 Report the test accuracy of the network with Alpha Dropout applied during training.

Q3.2 Report the MC Dropout-enhanced accuracy (averaging 20 stochastic predictions).

In [None]:
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(42)
tf.random.set_seed(42)
print("TensorFlow version:", tf.__version__)

TensorFlow version: 2.19.0


In [None]:
# Load MNIST
mnist = tf.keras.datasets.mnist.load_data()
(X_full, y_full), (X_test_full, y_test_full) = mnist

# Desired split sizes
train_samples = 800
valid_samples = 200
test_samples  = 200

# Select the first 1000 for train+valid
X_small = X_full[:train_samples + valid_samples]
y_small = y_full[:train_samples + valid_samples]

# Split into 800 train / 200 valid
X_train, X_valid = X_small[:train_samples], X_small[train_samples:]
y_train, y_valid = y_small[:train_samples], y_small[train_samples:]

# Select first 200 test instances
X_test = X_test_full[:test_samples]
y_test = y_test_full[:test_samples]

In [None]:
# Preprocess the data: reshape and normalize pixel values to [0, 1]
X_train = X_train.reshape(-1, 28*28).astype('float32') / 255.0
X_valid = X_valid.reshape(-1, 28*28).astype('float32') / 255.0
X_test = X_test.reshape(-1, 28*28).astype('float32') / 255.0

In [None]:
model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(shape=(28*28,)),
    tf.keras.layers.Dense(64, activation='selu', kernel_initializer='lecun_normal'),
    tf.keras.layers.AlphaDropout(0.1),
    tf.keras.layers.Dense(64, activation='selu', kernel_initializer='lecun_normal'),
    tf.keras.layers.AlphaDropout(0.1),
    tf.keras.layers.Dense(64, activation='selu', kernel_initializer='lecun_normal'),
    tf.keras.layers.AlphaDropout(0.1),
    tf.keras.layers.Dense(10, activation='softmax')
])

In [None]:
nadam_optimizer = tf.keras.optimizers.Nadam(learning_rate=0.001)

model.compile(
    optimizer=nadam_optimizer,
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
model.summary()
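
The output of `model.summary()` is not shown here, but the parameter count it would report can be worked out from the layer widths in the configuration. The sketch below is plain arithmetic and assumes only the architecture defined above; the variable names are illustrative.

```python
# Layer widths from the configuration: 784 inputs, three hidden layers of 64, 10 outputs.
layer_sizes = [784, 64, 64, 64, 10]

# A Dense layer has (inputs * units) weights plus one bias per unit.
# AlphaDropout layers add no trainable parameters.
per_layer = [n_in * n_out + n_out
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)
print(per_layer, total_params)  # [50240, 4160, 4160, 650] 59210
```

Roughly 59,000 parameters fitted on 800 training images is a heavily over-parameterized setting, which is one reason to regularize with dropout here.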

In [None]:
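# Early-stopping callback. It is defined here but not passed to model.fit below,
# so training runs for the full 50 epochs.
early_stopping_cb = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True
)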

In [None]:
model.fit(
    X_train, y_train,
    epochs=50,
    batch_size=32,
    validation_data=(X_valid, y_valid),
)

Epoch 1/50
25/25 ━━━━━━━━━━━━━━━━━━━━ 2s 14ms/step - accuracy: 0.1563 - loss: 2.4793 - val_accuracy: 0.7150 - val_loss: 1.0553
Epoch 2/50
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.4780 - loss: 1.5407 - val_accuracy: 0.7800 - val_loss: 0.7000
Epoch 3/50
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6590 - loss: 1.0299 - val_accuracy: 0.8250 - val_loss: 0.6178
Epoch 4/50
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7561 - loss: 0.7906 - val_accuracy: 0.8250 - val_loss: 0.6360
Epoch 5/50
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7714 - loss: 0.6542 - val_accuracy: 0.8600 - val_loss: 0.5841
Epoch 6/50
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.8230 - loss: 0.5815 - val_accuracy: 0.8450 - val_loss: 0.6008
Epoch 7/50
25/25 ━━━━━━━━━

<keras.src.callbacks.history.History at 0x7bcf823b8fb0>

In [None]:
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"\nQ3.1 Test accuracy (standard inference): {test_acc:.4f}  (loss {test_loss:.4f})")

7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9158 - loss: 0.4871

Q3.1 Test accuracy (standard inference): 0.9100  (loss 0.5321)


### Monte Carlo (MC) Dropout

In [None]:
# ---------- Q3.2: MC Dropout inference (dropout stays active at prediction time) ----------
# Collect T stochastic softmax outputs and average them
T = 20
preds = np.zeros((T, X_test.shape[0], 10), dtype=np.float32)

for t in range(T):
    # Pass training=True to enable AlphaDropout during inference
    preds[t] = model(X_test, training=True).numpy()

mean_preds = preds.mean(axis=0)  # shape (n_samples, num_classes)
mc_labels = mean_preds.argmax(axis=1)
mc_acc = (mc_labels == y_test).mean()
print(f"Q3.2 MC Dropout accuracy (T={T}): {mc_acc:.4f}")

# Print a short summary of both results
print("\nSUMMARY:")
print(f" - Q3.1 standard accuracy: {test_acc:.4f}")
print(f" - Q3.2 MC accuracy (T={T}): {mc_acc:.4f}")

Q3.2 MC Dropout accuracy (T=20): 0.8900

SUMMARY:
 - Q3.1 standard accuracy: 0.9100
 - Q3.2 MC accuracy (T=20): 0.8900
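
To put the gap between the two accuracies in context, a rough binomial standard error can be computed for each. This is only a back-of-the-envelope sketch: it assumes the 200 test predictions are independent and ignores that both accuracies come from the same images and that MC Dropout is itself stochastic. The helper name `binomial_se` is illustrative.

```python
import math

n_test = 200
acc_standard, acc_mc = 0.91, 0.89  # accuracies reported above

def binomial_se(acc, n):
    """Standard error of an accuracy estimated from n independent samples."""
    return math.sqrt(acc * (1.0 - acc) / n)

# Convert the accuracy gap into a count of test images.
gap_in_samples = round((acc_standard - acc_mc) * n_test)
print(f"gap = {gap_in_samples} test samples")
print(f"SE standard = {binomial_se(acc_standard, n_test):.4f}")
print(f"SE MC       = {binomial_se(acc_mc, n_test):.4f}")
```

The gap corresponds to 4 of the 200 test images, and each standard error is about 0.02, so under these assumptions the difference is within sampling noise rather than clear evidence that MC Dropout hurts or helps.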
