
In [1]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

In [3]:
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images / 255.0
test_images = test_images / 255.0

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [4]:
# Task 1: Experiment with Different Learning Rates
# Learning rates to compare
learning_rates = [0.001, 0.01, 0.1]

for lr in learning_rates:
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

    model.compile(optimizer=optimizer,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    print(f"Training model with learning rate: {lr}")
    model.fit(train_images, train_labels, epochs=5, batch_size=32)
    test_loss, test_acc = model.evaluate(test_images, test_labels)
    print(f"Test accuracy with learning rate {lr}: {test_acc}")

Training model with learning rate: 0.001
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test accuracy with learning rate 0.001: 0.9765999913215637
Training model with learning rate: 0.01
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test accuracy with learning rate 0.01: 0.961899995803833
Training model with learning rate: 0.1
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test accuracy with learning rate 0.1: 0.4235000014305115


In [6]:
# Task 2: Experiment with Different Batch Sizes
# Batch sizes to compare
batch_sizes = [32, 64, 128]

for batch_size in batch_sizes:
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)  # Using a fixed learning rate for this experiment
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

    model.compile(optimizer=optimizer,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    print(f"Training model with batch size: {batch_size}")
    model.fit(train_images, train_labels, epochs=5, batch_size=batch_size)
    test_loss, test_acc = model.evaluate(test_images, test_labels)
    print(f"Test accuracy with batch size {batch_size}: {test_acc}")


Training model with batch size: 32
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test accuracy with batch size 32: 0.9663000106811523
Training model with batch size: 64
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test accuracy with batch size 64: 0.9699000120162964
Training model with batch size: 128
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test accuracy with batch size 128: 0.9708999991416931


# Task 3: Discuss the Trade-offs
**Learning Rate:**

*Small Learning Rate (e.g., 0.001):* Each update is small, so training is stable but may need more epochs to reach a low loss. In this experiment, 0.001 still reached the highest test accuracy of the three (about 0.977) within 5 epochs.

*Medium Learning Rate (e.g., 0.01):* Larger steps can speed up early progress, at the cost of less precise convergence near the minimum. In this experiment, 0.01 reached about 0.962 test accuracy, slightly below the result for 0.001.

*Large Learning Rate (e.g., 0.1):* Each step covers more ground, but the updates can overshoot the minimum, so training may fail to converge or may settle on a poor solution. In this experiment, 0.1 reached only about 0.424 test accuracy.
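
The overshoot effect can be illustrated on a toy problem. The sketch below is not the notebook's Adam setup: it runs plain gradient descent on f(w) = w², and the function name `gradient_descent` and the three toy learning rates are assumptions chosen for illustration only.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimize f(w) = w**2 with plain gradient descent and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w       # derivative of w**2
        w = w - lr * grad    # each step multiplies w by (1 - 2 * lr)
    return w

# Hypothetical learning rates for this toy problem, not the ones used above
toy_rates = [0.01, 0.1, 1.1]
final_w = {lr: gradient_descent(lr) for lr in toy_rates}

for lr, w in final_w.items():
    print(f"lr={lr}: |w| after 20 steps = {abs(w):.4f}")
```

With the smallest rate, w moves toward the minimum at 0 only slowly; with the middle rate it gets close within 20 steps; with the largest rate every step overshoots by more than the previous one, so |w| grows instead of shrinking.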

**Batch Size:**

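*Small Batch Size (e.g., 32):* The weights are updated more often per epoch, so the model can make progress in fewer epochs. However, each gradient is estimated from fewer samples, so the updates are noisier: the loss fluctuates more, although the noise can also help the optimizer escape poor local minima. Each epoch also takes longer to run. In this experiment, 32 reached about 0.966 test accuracy.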

*Medium Batch Size (e.g., 64):* A compromise between update frequency and gradient noise that often leads to stable convergence. In this experiment, 64 reached about 0.970 test accuracy.

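*Large Batch Size (e.g., 128):* Each epoch runs faster because there are fewer updates, and the gradient estimates are less noisy. However, very large batch sizes can converge to sharp minimizers that generalize poorly. That effect did not appear at the sizes tested here: 128 reached the highest test accuracy of the three (about 0.971). The spread across batch sizes is small, comparable to the run-to-run variation for batch size 32 at learning rate 0.01 (about 0.962 in the learning rate experiment, 0.966 here).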
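
To make "updates per epoch" concrete, the following sketch counts the weight updates for each batch size tested above. It assumes the standard MNIST training split of 60,000 images and that the final partial batch is kept; the helper name `updates_per_epoch` is hypothetical.

```python
import math

NUM_TRAIN = 60_000  # size of the standard MNIST training split
EPOCHS = 5          # same number of epochs as the experiments above

def updates_per_epoch(num_samples, batch_size):
    """Number of weight updates in one epoch, counting a final partial batch."""
    return math.ceil(num_samples / batch_size)

batch_sizes = [32, 64, 128]
steps = {b: updates_per_epoch(NUM_TRAIN, b) for b in batch_sizes}

for b, n in steps.items():
    print(f"batch_size={b}: {n} updates per epoch, {n * EPOCHS} over {EPOCHS} epochs")
```

Doubling the batch size roughly halves the number of updates, so with a fixed number of epochs the larger batch sizes train on the same data but adjust the weights fewer times.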