## Trasnlation from `keras` to `pytorch` by ChatGPT, **without improvements**.

- FFNN plain
- FFNN improved
- CNN improved
- RNN improved

### FFNN plain

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

In [None]:
# Hyperparameters
batch_size = 128
num_classes = 10
epochs = 20
learning_rate = 0.001

In [None]:
# Data Loading
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])

train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('./data', train=False, transform=transform)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

In [None]:
# Neural Network Definition
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 512)
        self.fc2 = nn.Linear(512, 512)
        self.fc3 = nn.Linear(512, num_classes)
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        x = x.view(-1, 784)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = F.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.fc3(x)
        return F.log_softmax(x, dim=1)

model = Net()
optimizer = optim.RMSprop(model.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()

In [None]:
# Training
for epoch in range(epochs):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
    # Validation
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)
            test_loss += criterion(output, target).item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    print('Epoch: {}, Test set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)'.format(
        epoch, test_loss, correct, len(test_loader.dataset), 100. * correct / len(test_loader)))

In [1]:
# Final Evaluation
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
    for data, target in test_loader:
        output = model(data)
        test_loss += criterion(output, target).item()  # sum up batch loss
        pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
        correct += pred.eq(target.view_as(pred)).sum().item()

test_loss /= len(test_loader.dataset)

print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{}')

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:00<00:00, 121145232.64it/s]


Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 89863274.35it/s]


Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 1648877/1648877 [00:00<00:00, 29753191.75it/s]


Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 16408724.18it/s]


Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw

Epoch: 0, Test set: Average loss: 0.0013, Accuracy: 9491/10000 (12014%)
Epoch: 1, Test set: Average loss: 0.0011, Accuracy: 9569/10000 (12113%)
Epoch: 2, Test set: Average loss: 0.0009, Accuracy: 9644/10000 (12208%)
Epoch: 3, Test set: Average loss: 0.0008, Accuracy: 9670/10000 (12241%)
Epoch: 4, Test set: Average loss: 0.0008, Accuracy: 9696/10000 (12273%)
Epoch: 5, Test set: Average loss: 0.0007, Accuracy: 9739/10000 (12328%)
Epoch: 6, Test set: Average loss: 0.0007, Accuracy: 9757/10000 (12351%)
Epoch: 7, Test set: Average loss: 0.0006, Accuracy: 9770/10000 (12367%)
Epoch: 8, Test set: Average loss: 0.0006, Accuracy: 9774/10000 (12372%)
Epoch: 9, Test set: Average loss: 0.0006, Accuracy: 9792/10000 (12395%)
Epoch: 10, Test set: Average loss: 0.0006, Accuracy: 9787/10000 (12389%)
Epoch: 11, Test set: Average loss: 0.0006, Accuracy: 9772/10000 (12370%)
Epoch: 12, Test set: Average loss: 0.0007, Accuracy: 9767/1

### FFNN improved

Recommendations by chatgpt that has not been reflected in the code:

- Batch normalization
- Learning rate scheduling
- Advanced optimizers (AdamW, for example, which combines weight decay and the Adam optimizer)
- GPU training (Note: If you have a CUDA-enabled GPU, make sure to run this on GPU for faster execution)
- Dataset normalization (already included in the previous code)

----
Firstly:
- Batch Normalization: One potential improvement could be to introduce batch normalization layers. Batch normalization can improve the training process's speed and stability.

- Regularization: In the Keras code, dropout is used as a regularization technique. While this is often effective, one could also consider L1 or L2 weight regularization or a combination thereof.

- Learning Rate Scheduling: PyTorch provides easy-to-use learning rate schedulers that can adjust the learning rate during training. This could be useful if the loss plateaus.

- Advanced Optimizers: The initial code used the RMSprop optimizer. While this is a solid choice, one could experiment with other optimizers like Adam or AdamW, which might provide better results for some problems.

- Dataset Normalization: In the PyTorch code, I've added normalization to the MNIST dataset using transforms.Normalize((0.5,), (0.5,)). This makes the input values range from -1 to 1 which can aid in training.

- GPU Training: The PyTorch code I provided runs on the CPU by default. For faster training, it's recommended to move computations to a GPU using model.to('cuda') and similarly for the data batches.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from tqdm import tqdm # To view a progress bar during training on each epoch

# Check for GPU availability
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hyperparameters
batch_size = 128
num_classes = 10
epochs = 20
learning_rate = 0.001
weight_decay = 1e-5  # for L2 regularization

# Data Loading with normalization
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])

train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('./data', train=False, transform=transform)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# Neural Network Definition with BatchNorm
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 512)
        self.fc2 = nn.Linear(512, 512)
        self.fc3 = nn.Linear(512, num_classes)
        self.dropout = nn.Dropout(0.2)
        self.bn1 = nn.BatchNorm1d(512)
        self.bn2 = nn.BatchNorm1d(512)

    def forward(self, x):
        x = x.view(-1, 784)
        x = F.relu(self.bn1(self.fc1(x)))
        x = self.dropout(x)
        x = F.relu(self.bn2(self.fc2(x)))
        x = self.dropout(x)
        x = self.fc3(x)
        return F.log_softmax(x, dim=1)

model = Net().to(device)  # Move model to GPU if available
optimizer = optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
criterion = nn.CrossEntropyLoss()

# Learning rate scheduler
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

# Training
for epoch in range(epochs):
    model.train()
    total_loss = 0
    progress_bar = tqdm(enumerate(train_loader), total=len(train_loader), desc="Epoch {}".format(epoch))
    for batch_idx, (data, target) in progress_bar:
        data, target = data.to(device), target.to(device)  # Move data to GPU if available
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        progress_bar.set_postfix({'training_loss': '{:.3f}'.format(total_loss/(batch_idx+1))})

    scheduler.step()  # Step the learning rate scheduler

    # Validation
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)  # Move data to GPU if available
            output = model(data)
            test_loss += criterion(output, target).item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    print('Epoch: {}, Test set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)'.format(
        epoch, test_loss, correct, len(test_loader.dataset), 100. * correct / len(test_loader.dataset)))
