
**1. Sharing and executing the official tutorial model**

I will share the "TensorFlow 2 quickstart for beginners" tutorial, which demonstrates a simple image classification model using Keras and the MNIST dataset.

In [1]:
import tensorflow as tf

# Loading the MNIST dataset
mnist = tf.keras.datasets.mnist

# Splitting the data into training and testing sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalizing the data: Scale pixel values from [0, 255] to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# Building the Keras Sequential model
model = tf.keras.models.Sequential([
    # Reshaping the 28x28 images into a flat 1D array of 784 pixels
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    # Adding a fully connected (Dense) hidden layer with 128 neurons and ReLU activation
    tf.keras.layers.Dense(128, activation='relu'),
    # Adding a Dropout layer to prevent overfitting by randomly dropping 20% of neurons
    tf.keras.layers.Dropout(0.2),
    # Adding the final output layer with 10 neurons (for 10 classes) and Softmax activation
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compiling the model
# Configuring the training process using an optimizer, loss function, and metrics
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Training the model for 5 epochs
model.fit(x_train, y_train, epochs=5)

# Evaluating the model on the test data
model.evaluate(x_test, y_test)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step

Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 13s 6ms/step - accuracy: 0.8607 - loss: 0.4815
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 9s 5ms/step - accuracy: 0.9548 - loss: 0.1522
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 9s 4ms/step - accuracy: 0.9668 - loss: 0.1114
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 9s 5ms/step - accuracy: 0.9754 - loss: 0.0830
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.9760 - loss: 0.0752
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9748 - loss: 0.0886


[0.07403755187988281, 0.9782000184059143]
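
To make the Dropout(0.2) layer in the model above more concrete, here is a minimal pure-Python sketch of inverted dropout, the variant Keras applies during training: each unit is zeroed with probability `rate`, and the surviving units are scaled by 1 / (1 - rate) so the expected activation stays the same. The helper name `dropout` and the fixed seed are my own illustrative choices and are not part of the tutorial.

```python
import random

def dropout(activations, rate, training=True, seed=None):
    """Inverted dropout on a list of floats (illustrative helper, not a Keras API)."""
    if not training or rate == 0.0:
        return list(activations)              # inference: values pass through unchanged
    rng = random.Random(seed)
    keep_scale = 1.0 / (1.0 - rate)           # rescale survivors to preserve the mean
    return [a * keep_scale if rng.random() >= rate else 0.0 for a in activations]

hidden = [1.0] * 10000                        # made-up activations, all equal to 1
dropped = dropout(hidden, rate=0.2, seed=0)

zero_fraction = dropped.count(0.0) / len(dropped)
mean_activation = sum(dropped) / len(dropped)
print(f"fraction zeroed: {zero_fraction:.3f}")    # roughly 0.2
print(f"mean activation: {mean_activation:.3f}")  # roughly 1.0
```

At evaluation time (`training=False`) nothing is dropped, which is why `model.evaluate` needs no special handling.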

**2. Executing Various Methods**

**Part 1: Executing a Model**

I've selected the retinanet model from the tensorflow/models/research repository as an example of an advanced, real-world application. RetinaNet is a single-stage object detection model known for its high accuracy. Executing it directly is complex because of its dependencies and the large dataset it requires (e.g., COCO or Pascal VOC). I'll therefore do a code reading instead and give a high-level explanation of its components and workflow, which is the approach suggested for cases where direct execution is difficult.

High-Level Code Reading of RetinaNet

The retinanet code is structured to handle various stages of the object detection pipeline.

1. Model Definition: The core of the code defines the RetinaNet architecture. It typically involves:

* **Backbone Network:** A deep convolutional neural network (e.g., ResNet) that extracts features from the input image. This is a foundational component.

* **Feature Pyramid Network (FPN):** This component takes the features from the backbone and builds a top-down pathway with lateral connections, creating a feature pyramid with high-level semantic information at all scales.

* **Subnets:** Two parallel subnets are attached to the FPN. One predicts the class of the object at each location, and the other predicts the bounding box regression (the coordinates of the box).

2. Loss Function: RetinaNet uses a unique loss function called the Focal Loss. This is a key innovation designed to address the problem of class imbalance in object detection (where the number of background locations is far greater than the number of foreground objects). The code implements this loss, which down-weights the loss assigned to well-classified examples, thus focusing training on hard, misclassified examples.

3. Data Input Pipeline: The model uses a robust tf.data pipeline to load and preprocess the massive datasets required for object detection. This includes parsing data formats like TFRecords, augmenting images (e.g., random flips, scaling), and batching.

4. Training and Evaluation Loops: The code includes functions for training the model using a custom training loop, which involves a single pass through the network for both classification and regression tasks, followed by loss calculation and backpropagation.
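
The focal loss from item 2 can be written as FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the probability the model assigns to the true class. The sketch below is my own pure-Python illustration for a single binary prediction, not the repository's implementation. The function names are hypothetical, and gamma = 2 and alpha = 0.25 are the commonly cited defaults.

```python
import math

def cross_entropy(p, y):
    """Plain cross-entropy for one binary prediction (p = predicted P(y == 1))."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction (illustrative sketch)."""
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    modulating = (1.0 - p_t) ** gamma           # near 0 for easy examples, near 1 for hard ones
    return -alpha_t * modulating * math.log(p_t)

easy = binary_focal_loss(0.95, 1)   # well-classified positive
hard = binary_focal_loss(0.10, 1)   # badly misclassified positive

print(f"easy example: CE={cross_entropy(0.95, 1):.4f}  focal={easy:.6f}")
print(f"hard example: CE={cross_entropy(0.10, 1):.4f}  focal={hard:.6f}")
```

The easy example keeps only a tiny fraction of its cross-entropy, while the hard example keeps a much larger share. This is how the loss shifts training toward hard examples.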

**Part 2: Rewriting Models with Keras**

I will rewrite the four specified models using the tf.keras module. I'll use the Sequential API for simplicity, as it's a straightforward way to stack layers for most of these tasks.

**Dataset 1: Iris (Binary Classification)**

Objective: Classify Iris-versicolor vs. Iris-virginica.

Methodology: I'll use a binary cross-entropy loss function.
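
As a small worked example of the binary cross-entropy loss mentioned above, the following pure-Python sketch computes it by hand for a batch of four samples. The logits are made-up numbers and the helper names are my own. Keras computes the same quantity internally when `loss='binary_crossentropy'` is used with a sigmoid output.

```python
import math

def sigmoid(z):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over a batch of labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)          # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

labels = [0, 1, 1, 0]                 # 0 = versicolor, 1 = virginica (after relabeling)
logits = [-2.0, 1.5, -0.5, 0.3]       # made-up pre-sigmoid outputs
probs = [sigmoid(z) for z in logits]

loss = binary_cross_entropy(labels, probs)
uninformed = binary_cross_entropy(labels, [0.5] * len(labels))  # always predicting 0.5

print(f"loss with predictions: {loss:.4f}")
print(f"loss when always predicting 0.5: {uninformed:.4f}")     # equals ln(2)
```

A loss close to ln(2) ≈ 0.693 therefore means the model is doing little better than guessing.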



In [2]:
# Dataset 1 (Iris Binary Classification)

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from tensorflow import keras

# 1. Data Preparation
iris = load_iris()
X = iris.data[50:]  # Versicolor and Virginica are at indices 50-149
y = iris.target[50:] - 1 # Relabel to 0 and 1 for binary classification

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 2. Model Definition (Sequential API)
model_iris_binary = keras.Sequential([
    keras.layers.Dense(8, activation='relu', input_shape=(X_train.shape[1],)),
    keras.layers.Dense(1, activation='sigmoid')  # Sigmoid for binary classification
])

# 3. Model Compilation
model_iris_binary.compile(optimizer='adam',
                          loss='binary_crossentropy',
                          metrics=['accuracy'])

# 4. Training and Evaluation
model_iris_binary.fit(X_train, y_train, epochs=20, verbose=0)
loss, accuracy = model_iris_binary.evaluate(X_test, y_test, verbose=0)
print(f"Iris (Binary) Test Accuracy: {accuracy:.4f}")



Iris (Binary) Test Accuracy: 0.7667


**Dataset 2: Iris (Multi-class Classification)**

In [3]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from tensorflow import keras

# 1. Data Preparation (all three classes)
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 2. Model Definition
model_iris_multi = keras.Sequential([
    keras.layers.Dense(8, activation='relu', input_shape=(X_train.shape[1],)),
    keras.layers.Dense(3, activation='softmax')  # 3 neurons for 3 classes
])

# 3. Model Compilation
model_iris_multi.compile(optimizer='adam',
                         loss='sparse_categorical_crossentropy',
                         metrics=['accuracy'])

# 4. Training and Evaluation
model_iris_multi.fit(X_train, y_train, epochs=20, verbose=0)
loss, accuracy = model_iris_multi.evaluate(X_test, y_test, verbose=0)
print(f"Iris (Multi-value) Test Accuracy: {accuracy:.4f}")

Iris (Multi-value) Test Accuracy: 0.2889


**Dataset 3: House Prices**

In [18]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import fetch_california_housing
from tensorflow import keras

# 1. Data Preparation
# Load the California Housing dataset
housing = fetch_california_housing(as_frame=True)
X, y = housing.data, housing.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Data Scaling (Crucial for regression)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 2. Model Definition
model_house_prices = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(1) # Single neuron for regression output
])

# 3. Model Compilation
model_house_prices.compile(optimizer='rmsprop',
                           loss='mse',
                           metrics=['mae']) # Mean Absolute Error as a metric

# 4. Training and Evaluation
model_house_prices.fit(X_train_scaled, y_train, epochs=20, verbose=0)
loss, mae = model_house_prices.evaluate(X_test_scaled, y_test, verbose=0)
print(f"House Prices Test MAE: {mae:.4f}")



House Prices Test MAE: 0.3733


**Dataset 4: MNIST (Image Classification)**

In [5]:
import tensorflow as tf
from tensorflow import keras

# 1. Data Preparation
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# 2. Model Definition
model_mnist = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10, activation='softmax')
])

# 3. Model Compilation
model_mnist.compile(optimizer='adam',
                    loss='sparse_categorical_crossentropy',
                    metrics=['accuracy'])

# 4. Training and Evaluation
model_mnist.fit(x_train, y_train, epochs=5, verbose=0)
loss, accuracy = model_mnist.evaluate(x_test, y_test, verbose=0)
print(f"MNIST Test Accuracy: {accuracy:.4f}")



MNIST Test Accuracy: 0.9792


**3. Learning Iris (binary classification) with Keras**

In [6]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from tensorflow import keras

# Loading the Iris dataset
iris = load_iris()

# Only extracting the data for Iris-versicolor and Iris-virginica, which are the last 100 samples from the dataset.
X = iris.data[50:]
y = iris.target[50:]

# Relabeling the targets from 1 and 2 to 0 and 1 for binary classification compatibility
y = y - 1

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

print("Training data shape:", X_train.shape)
print("Training labels shape:", y_train.shape)

Training data shape: (70, 4)
Training labels shape: (70,)


**Model Definition (Using Keras Sequential API to build a simple neural network)**

In [7]:
# Defining the model architecture
model = keras.Sequential([
    # Hidden layer with 8 neurons and ReLU activation.
    # input_shape declares the 4 input features and is specified only for the first layer.
    keras.layers.Dense(8, activation='relu', input_shape=(X_train.shape[1],)),

    # Output layer with a single neuron and a sigmoid activation function.
    # Sigmoid squashes the output to a value between 0 and 1, read as the probability of class 1.
    keras.layers.Dense(1, activation='sigmoid')
])

# Displaying the model summary to see the architecture
model.summary()

**Model Compilation (configuring the model for training)**

In [8]:
# Compile the model
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

**Training and Evaluation (training the model on the training data and evaluating it on the test data)**

In [9]:
# Training the model
model.fit(X_train, y_train, epochs=20, verbose=0)

# Evaluating the model on the test data
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)

print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy:.4f}")

Test Loss: 0.7710
Test Accuracy: 0.5667


**4. Learning Iris (multi-class classification) with Keras**

In [10]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from tensorflow import keras

# Loading the full Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

print("Training data shape:", X_train.shape)
print("Training labels shape:", y_train.shape)

Training data shape: (105, 4)
Training labels shape: (105,)


**Model Definition**

In [11]:
# Defining the model architecture
model = keras.Sequential([
    # Hidden layer with 8 neurons and ReLU activation (input_shape declares the 4 input features)
    keras.layers.Dense(8, activation='relu', input_shape=(X_train.shape[1],)),

    # Output layer with 3 neurons (one for each class) and a softmax activation function.
    # Softmax converts the output into a probability distribution over the classes.
    keras.layers.Dense(3, activation='softmax')
])

# Displaying the model summary
model.summary()

**Model Compilation**

In [12]:
# Compiling the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
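
To show how the softmax output layer and the sparse categorical cross-entropy loss fit together, here is a small pure-Python sketch for a single sample. The logits are made up and the helper names are my own. "Sparse" means the label is an integer class index (0, 1, or 2) rather than a one-hot vector.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution over classes."""
    shift = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(label, probs):
    """Loss for one sample: negative log-probability of the integer class label."""
    return -math.log(probs[label])

logits = [2.0, 1.0, 0.1]                          # made-up scores for the 3 Iris classes
probs = softmax(logits)

loss_if_class_0 = sparse_categorical_crossentropy(0, probs)  # model favors class 0
loss_if_class_2 = sparse_categorical_crossentropy(2, probs)  # model gives class 2 little mass

print("probabilities:", [round(p, 3) for p in probs])
print(f"loss if true class is 0: {loss_if_class_0:.4f}")
print(f"loss if true class is 2: {loss_if_class_2:.4f}")
```

For three classes, a model that outputs a uniform distribution has a loss of ln(3) ≈ 1.099. That value is a useful baseline when reading the test loss.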

**Training and Evaluation**

In [13]:
# Training the model
model.fit(X_train, y_train, epochs=20, verbose=0)

# Evaluating the model on the test data
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)

print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy:.4f}")

Test Loss: 0.7467
Test Accuracy: 0.6444


**5. Learning House Prices with Keras**

In [19]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import fetch_california_housing
from tensorflow import keras

# Load the California Housing dataset
housing = fetch_california_housing(as_frame=True)
X, y = housing.data, housing.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Data Scaling (Crucial for regression)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("Scaled training data shape:", X_train_scaled.shape)
print("Scaled training labels shape:", y_train.shape)

Scaled training data shape: (16512, 8)
Scaled training labels shape: (16512,)
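
The scaling step above follows a fit-on-train, transform-on-both pattern. The sketch below reproduces it for a single made-up feature column in pure Python. The helper names and values are illustrative only. Like StandardScaler, it uses the population standard deviation (dividing by n).

```python
def fit_standard_scaler(values):
    """Return the mean and population standard deviation of one feature column."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance ** 0.5

def apply_standard_scaler(values, mean, std):
    """Standardize values with statistics that were computed elsewhere."""
    return [(v - mean) / std for v in values]

train_feature = [2.0, 4.0, 6.0, 8.0]   # made-up training values
test_feature = [5.0, 10.0]             # made-up test values

# Statistics come from the training split only, so no test information leaks in.
mean, std = fit_standard_scaler(train_feature)
train_scaled = apply_standard_scaler(train_feature, mean, std)
test_scaled = apply_standard_scaler(test_feature, mean, std)

print("train scaled:", [round(v, 3) for v in train_scaled])
print("test scaled: ", [round(v, 3) for v in test_scaled])
```

The scaled test values need not have zero mean or unit variance, since they are expressed relative to the training distribution.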


**Model Definition**

In [15]:
# Defining the model architecture using the Sequential API
model = keras.Sequential([
    # Hidden layer with 64 neurons and ReLU activation
    keras.layers.Dense(64, activation='relu', input_shape=(X_train_scaled.shape[1],)),

    # Another hidden layer to learn more complex patterns
    keras.layers.Dense(64, activation='relu'),

    # The final output layer has a single neuron. No activation function is used,
    # so the model outputs an unbounded real value suitable for regression.
    keras.layers.Dense(1)
])

# Displaying the model summary
model.summary()



**Model Compilation**

In [16]:
# Compiling the model
model.compile(optimizer='rmsprop',
              loss='mse',  # Mean Squared Error for regression
              metrics=['mae']) # Mean Absolute Error for easier interpretation
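
To illustrate why MSE is used as the loss while MAE is reported for interpretation, here is a small hand computation on made-up numbers, with helper names of my own. MSE squares each error, so one large miss dominates it. MAE stays in the units of the target, which for the California Housing dataset is hundreds of thousands of dollars.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences; penalizes large errors heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences; same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1.0, 2.0, 3.0, 4.0]   # made-up targets
y_pred = [1.5, 2.0, 2.0, 6.0]   # made-up predictions; the last one misses by 2.0

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)

print(f"MSE: {mse:.4f}")   # 1.3125, of which the single 2.0 miss contributes 1.0
print(f"MAE: {mae:.4f}")   # 0.8750
```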

**Training and Evaluation**

In [17]:
# Training the model
model.fit(X_train_scaled, y_train, epochs=20, verbose=0)

# Evaluating the model on the test data
loss, mae = model.evaluate(X_test_scaled, y_test, verbose=0)

print(f"Test Loss (MSE): {loss:.4f}")
print(f"Test MAE: {mae:.4f}")

Test Loss (MSE): 21.9519
Test MAE: 3.3538


**6. Learning MNIST with Keras**

In [20]:
import tensorflow as tf
from tensorflow import keras
import numpy as np

# Loading the MNIST dataset
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalizing the data: Scale pixel values from [0, 255] to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

print("Training data shape:", x_train.shape)
print("Training labels shape:", y_train.shape)

Training data shape: (60000, 28, 28)
Training labels shape: (60000,)


**Model Definition**

In [21]:
# Defining the model architecture
model = keras.Sequential([
    # Flatten the 28x28 image into a 1D array of 784 pixels
    keras.layers.Flatten(input_shape=(28, 28)),

    # A fully connected hidden layer with 128 neurons and ReLU activation
    keras.layers.Dense(128, activation='relu'),

    # Dropout layer to reduce overfitting
    keras.layers.Dropout(0.2),

    # The output layer with 10 neurons (for 10 classes) and softmax activation
    keras.layers.Dense(10, activation='softmax')
])

# Displaying the model summary
model.summary()



**Model Compilation**

In [22]:
# Compiling the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

**Training and Evaluation**

In [23]:
# Training the model for 5 epochs
model.fit(x_train, y_train, epochs=5, verbose=0)

# Evaluating the model on the test data
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)

print(f"\nTest Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy:.4f}")


Test Loss: 0.0714
Test Accuracy: 0.9761


**7. Rewriting to PyTorch**

In [24]:
# Iris (Binary Classification)
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1. Data Preparation
iris = load_iris()
X = iris.data[50:]
y = iris.target[50:] - 1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scaling data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Converting to PyTorch tensors and creating DataLoader
X_train_tensor = torch.tensor(X_train_scaled, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)
X_test_tensor = torch.tensor(X_test_scaled, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# 2. Model Definition
class BinaryClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

model = BinaryClassifier()

# 3. Loss, Optimizer, and Training
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

num_epochs = 20
for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

# 4. Evaluation
with torch.no_grad():
    model.eval()
    test_outputs = model(X_test_tensor)
    predicted_classes = (torch.sigmoid(test_outputs) > 0.5).float()
    accuracy = (predicted_classes.squeeze() == y_test_tensor.squeeze()).float().mean()
    print(f"Iris (Binary) Test Accuracy: {accuracy.item():.4f}")

Iris (Binary) Test Accuracy: 0.8667
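
Unlike the Keras version, the PyTorch model above returns raw logits and leaves the sigmoid to `BCEWithLogitsLoss`. The pure-Python sketch below uses a commonly used numerically stable formulation (my own illustration, not PyTorch's source code). It shows that computing the loss from the logit matches applying sigmoid first and then binary cross-entropy, and that thresholding the probability at 0.5 is the same as checking whether the logit is positive.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_from_probability(p, y):
    """Binary cross-entropy computed from a probability."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def bce_with_logits(z, y):
    """Binary cross-entropy computed directly from a logit, without forming sigmoid(z)."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

for z in [-3.0, -0.5, 0.0, 2.0]:
    for y in [0, 1]:
        two_step = bce_from_probability(sigmoid(z), y)
        one_step = bce_with_logits(z, y)
        print(f"z={z:+.1f} y={y}  two-step={two_step:.6f}  one-step={one_step:.6f}")

# The one-step form stays finite even for extreme logits.
print(bce_with_logits(1000.0, 0))
```

Working with logits avoids taking the log of a probability that has been rounded to exactly 0 or 1.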


**Iris (Multi-class Classification)**

In [25]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1. Data Preparation
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scaling data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Converting to PyTorch tensors and creating DataLoader
X_train_tensor = torch.tensor(X_train_scaled, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
X_test_tensor = torch.tensor(X_test_scaled, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)

train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# 2. Model Definition
class MultiClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 3) # 3 neurons for 3 classes

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

model = MultiClassifier()

# 3. Loss, Optimizer, and Training
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

num_epochs = 20
for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

# 4. Evaluation
with torch.no_grad():
    model.eval()
    test_outputs = model(X_test_tensor)
    _, predicted_classes = torch.max(test_outputs, 1)
    accuracy = (predicted_classes == y_test_tensor).float().mean()
    print(f"Iris (Multi-class) Test Accuracy: {accuracy.item():.4f}")

Iris (Multi-class) Test Accuracy: 1.0000


**House Prices (Regression)**

In [27]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1. Data Preparation
housing = fetch_california_housing(as_frame=True)
X, y = housing.data, housing.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Converting to PyTorch tensors and creating DataLoader
X_train_tensor = torch.tensor(X_train_scaled, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train.values, dtype=torch.float32).unsqueeze(1)
X_test_tensor = torch.tensor(X_test_scaled, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test.values, dtype=torch.float32).unsqueeze(1)

train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

# 2. Model Definition
class RegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(X_train.shape[1], 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 1) # Single output for regression

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

model = RegressionModel()

# 3. Loss, Optimizer, and Training
criterion = nn.MSELoss()
optimizer = optim.RMSprop(model.parameters(), lr=0.001)

num_epochs = 20
for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

# 4. Evaluation
with torch.no_grad():
    model.eval()
    test_outputs = model(X_test_tensor)
    mae = torch.mean(torch.abs(test_outputs - y_test_tensor))
    print(f"House Prices Test MAE: {mae.item():.4f}")

House Prices Test MAE: 0.3732


**MNIST (Multi-class Classification)**

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# 1. Data Preparation
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)

# 2. Model Definition (using a CNN)
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = nn.functional.relu(x)
        x = self.conv2(x)
        x = nn.functional.relu(x)
        x = nn.functional.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = nn.functional.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        return x

model = CNN()

# 3. Loss, Optimizer, and Training
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

epochs = 5
for epoch in range(epochs):
    for data, target in train_loader:
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

# 4. Evaluation
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
    for data, target in test_loader:
        output = model(data)
        test_loss += criterion(output, target).item()
        pred = output.argmax(dim=1, keepdim=True)
        correct += pred.eq(target.view_as(pred)).sum().item()

test_loss /= len(test_loader)  # criterion returns a per-batch mean, so average over the number of batches
print(f"\nMNIST Test Accuracy: {100. * correct / len(test_loader.dataset):.2f}%")

100%|██████████| 9.91M/9.91M [00:00<00:00, 58.1MB/s]
100%|██████████| 28.9k/28.9k [00:00<00:00, 1.76MB/s]
100%|██████████| 1.65M/1.65M [00:00<00:00, 14.6MB/s]
100%|██████████| 4.54k/4.54k [00:00<00:00, 6.90MB/s]
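
The value 9216 in `nn.Linear(9216, 128)` follows from the layer shapes of the CNN above. As a quick check, this sketch traces the spatial size of a 28x28 MNIST image through the two 3x3 convolutions (stride 1, no padding) and the 2x2 max pooling. The helper `conv_output_size` is my own and uses the standard output-size formula.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                   # MNIST images are 28x28
size = conv_output_size(size, kernel=3)     # after conv1: 26
size = conv_output_size(size, kernel=3)     # after conv2: 24
size = size // 2                            # after max_pool2d(x, 2): 12

channels = 64                               # output channels of conv2
flattened = channels * size * size
print(flattened)                            # 9216
```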


**8. Comparison of frameworks**

1. Calculation Speed

Keras: When running on top of TensorFlow 2, Keras compiles the training step into a computational graph by default (via tf.function) when model.fit() runs, even though TensorFlow code outside of that executes eagerly. The graph can be optimized as a whole and exported for deployment on a variety of hardware, including mobile devices and production servers. The trade-off is that graph-compiled code can be harder to step through and less flexible for research and rapid prototyping, where you need to change the model's structure on the fly.

PyTorch: PyTorch uses a dynamic computational graph, which is built on the fly as the code is executed. This "eager execution" approach makes it behave more like standard Python, simplifying debugging and allowing for more flexibility. For basic operations, there's little difference in speed, but the dynamic nature of PyTorch makes it a favorite for research. PyTorch also provides TorchScript and, since version 2.0, torch.compile to capture models as graphs for performance optimization and deployment, narrowing the gap with TensorFlow's approach.
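
The idea of a graph that is built while the code runs is easier to see in a small example. The sketch below is a framework-free toy of my own, not how PyTorch is implemented: a hypothetical `Value` class records each operation as ordinary Python executes, so a plain `for` loop decides the shape of the graph, and `backward()` walks the recorded graph to compute gradients.

```python
class Value:
    """Scalar that remembers how it was computed (a tiny define-by-run graph)."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents      # Values this one was computed from
        self._grad_fns = grad_fns    # one local-derivative function per parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self, grad=1.0):
        """Accumulate the gradient and push it to the parents (chain rule)."""
        self.grad += grad
        for parent, grad_fn in zip(self._parents, self._grad_fns):
            parent.backward(grad_fn(grad))

def power_by_loop(x, extra_multiplications):
    """Compute x ** (extra_multiplications + 1) with a normal Python loop."""
    y = x
    for _ in range(extra_multiplications):   # the loop count sets the graph depth at run time
        y = y * x
    return y

x = Value(3.0)
out = power_by_loop(x, 2)    # x ** 3
out.backward()
print(out.data)              # 27.0
print(x.grad)                # 27.0, i.e. 3 * x ** 2 at x = 3
```

Because the graph is just the trace of whatever Python ran, changing the loop count or adding an `if` changes the model without any separate graph-definition step.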

2. Number of Lines of Code and Readability

Keras: As a high-level API, Keras is renowned for its simplicity and minimalism. You can define and train a complex model in just a few lines of code. The automated training loop (model.fit()) abstracts away much of the boilerplate, making it exceptionally easy to read and understand for beginners. This abstraction, however, can sometimes obscure what's happening under the hood.

PyTorch: PyTorch is a lower-level, more verbose framework. You typically define the model as a class (nn.Sequential is available for simple layer stacks), wrap the data in a Dataset and DataLoader, and write the entire training loop yourself. While this results in more lines of code, it provides explicit control over every part of the process, which is highly valued by researchers and advanced users. The code is very readable for those familiar with object-oriented programming in Python.
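
To show what an all-in-one training call hides and what a hand-written loop exposes, here is a framework-free sketch of mini-batch gradient descent for a one-feature linear model. The helper `fit`, its parameters, and the data are hypothetical choices of mine. The steps inside it (shuffle, batch, forward pass, gradients, update) are the ones the PyTorch cells above spell out and that model.fit() runs internally.

```python
import random

def fit(xs, ys, epochs=500, batch_size=4, lr=0.05, seed=0):
    """Hypothetical helper: fit y = w * x + b by mini-batch gradient descent on MSE."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)                              # new sample order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = grad_b = 0.0
            for i in batch:
                error = (w * xs[i] + b) - ys[i]           # forward pass and residual
                grad_w += 2.0 * error * xs[i] / len(batch)  # d(MSE)/dw
                grad_b += 2.0 * error / len(batch)          # d(MSE)/db
            w -= lr * grad_w                              # parameter update
            b -= lr * grad_b
    return w, b

xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [2.0 * x + 1.0 for x in xs]                          # noise-free data from y = 2x + 1

w, b = fit(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")                        # close to 2 and 1
```

Wrapping the loop in one function gives the Keras-style convenience, while keeping it inline, as in the PyTorch cells, leaves every step open to modification.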

3. Functions Provided

Keras: Keras provides a simplified, user-friendly set of tools focused on rapid prototyping and deployment. Its core functions are designed to streamline the common deep learning workflow:

>Sequential and Functional APIs: For building models in a simple, intuitive way.

>Pre-built layers: A wide range of pre-configured layers (Dense, Conv2D, LSTM, etc.).

>Built-in functions: Standardized loss functions, optimizers, and metrics that can be specified by a simple string (e.g., 'adam', 'mse').

>model.fit(): An all-in-one function for training with automatic handling of batches, epochs, and progress tracking.

PyTorch: PyTorch offers a more comprehensive and flexible set of tools for research and fine-grained control. Its key features include:

>torch.nn.Module: The foundational class for building models, providing a flexible, object-oriented approach.

>Dynamic computational graph: Enables complex, dynamic models and makes debugging much easier.

>torch.utils.data: Powerful tools for creating custom datasets and iterators (DataLoader), which are ideal for large or non-standard datasets.

>Extensive library: A vast ecosystem of libraries like torchvision (for computer vision), torchaudio (for audio), and torchtext (for NLP).

>Manual control: You have direct access to every component of the training loop, including gradient calculations, which is essential for advanced research.