# PyTorch vs TensorFlow - ANN

PyTorch or TensorFlow: which should you use? Which is better, and what are the differences between these two deep learning frameworks? In this series I compare them side by side, in code with crisp examples, so you can see the differences and similarities clearly.

Also, when I switched from TensorFlow to PyTorch, I wished I had a "**Rosetta Stone**" to translate between the frameworks, which I hope this series will be for you: a concise overview that helps practitioners leverage the strengths of both libraries.

## In this notebook
In the first notebook of this series, I will start with the basics by creating a simple Artificial Neural Network (ANN) in both PyTorch and TensorFlow. We will train the ANN on the MNIST dataset, which consists of handwritten digit images, to illustrate fundamental concepts and workflows in each framework. This foundational exercise will set the stage for more complex models and comparisons in subsequent notebooks.

For both PyTorch and TensorFlow, the steps for training a model are essentially the same:

1. Importing the necessary libraries.
2. Defining the model parameters and hyperparameters.
3. Loading the dataset.
4. Preprocessing the data to make it suitable for training.
5. Initializing the model architecture.
6. Training the model using the defined parameters and dataset.

These steps form the core workflow in building and training deep learning models, regardless of the framework used.

In [1]:
import numpy as np
import pandas as pd

In [2]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Plotting the training metrics with Plotly
def plot_tensorflow_training_results(history):
    fig = make_subplots(rows=1, cols=2, subplot_titles=("Loss Over Epochs", "Accuracy Over Epochs"))

    # Loss plot
    fig.add_trace(go.Scatter(x=list(range(len(history.history['loss']))),
                             y=history.history['loss'],
                             mode='lines',
                             name='Training Loss'),
                  row=1, col=1)
    fig.add_trace(go.Scatter(x=list(range(len(history.history['val_loss']))),
                             y=history.history['val_loss'],
                             mode='lines',
                             name='Validation Loss'),
                  row=1, col=1)

    # Accuracy plot
    fig.add_trace(go.Scatter(x=list(range(len(history.history['accuracy']))),
                             y=history.history['accuracy'],
                             mode='lines',
                             name='Training Accuracy'),
                  row=1, col=2)
    fig.add_trace(go.Scatter(x=list(range(len(history.history['val_accuracy']))),
                             y=history.history['val_accuracy'],
                             mode='lines',
                             name='Validation Accuracy'),
                  row=1, col=2)

    # Updating layout
    fig.update_layout(title='Training Metrics',
                      xaxis_title='Epoch',
                      yaxis_title='Value',
                      showlegend=True)

    # Update xaxis labels
    fig.update_xaxes(title_text='Epoch', row=1, col=1)
    fig.update_xaxes(title_text='Epoch', row=1, col=2)

    # Update yaxis labels
    fig.update_yaxes(title_text='Loss', row=1, col=1)
    fig.update_yaxes(title_text='Accuracy', row=1, col=2)

    fig.show()
    

In [3]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

def plot_pytorch_training_results(n_epochs, train_losses, test_losses, train_accuracies, test_accuracies):
    # Plotting the training metrics with Plotly
    fig = make_subplots(rows=1, cols=2, subplot_titles=("Loss Over Epochs", "Accuracy Over Epochs"))

    # Loss plot
    fig.add_trace(go.Scatter(x=list(range(n_epochs)), y=train_losses, mode='lines', name='Training Loss'), row=1, col=1)
    fig.add_trace(go.Scatter(x=list(range(n_epochs)), y=test_losses, mode='lines', name='Validation Loss'), row=1, col=1)

    # Accuracy plot
    fig.add_trace(go.Scatter(x=list(range(n_epochs)), y=train_accuracies, mode='lines', name='Training Accuracy'), row=1, col=2)
    fig.add_trace(go.Scatter(x=list(range(n_epochs)), y=test_accuracies, mode='lines', name='Validation Accuracy'), row=1, col=2)

    # Updating layout
    fig.update_layout(title='Training Metrics', xaxis_title='Epoch', yaxis_title='Value', showlegend=True)

    # Update xaxis labels
    fig.update_xaxes(title_text='Epoch', row=1, col=1)
    fig.update_xaxes(title_text='Epoch', row=1, col=2)

    # Update yaxis labels
    fig.update_yaxes(title_text='Loss', row=1, col=1)
    fig.update_yaxes(title_text='Accuracy', row=1, col=2)

    fig.show()

In [4]:
pixel_columns = [f"pixel{i}" for i in range(784)]
all_df = pd.read_csv("/kaggle/input/digit-recognizer/train.csv")

indices = np.arange(len(all_df))
np.random.shuffle(indices)

# Split the indices into 80% train and 20% validation
trn_len = int(0.8 * len(all_df))
trn_ind = indices[:trn_len]
val_ind = indices[trn_len:]

trn_df = all_df.iloc[trn_ind]
val_df = all_df.iloc[val_ind]
x_trn = trn_df[pixel_columns].values
y_trn = trn_df["label"].values

x_val = val_df[pixel_columns].values
y_val = val_df["label"].values

tst_df = pd.read_csv("/kaggle/input/digit-recognizer/test.csv")
x_tst = tst_df[pixel_columns].values
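
The split above shuffles with `np.random.shuffle` and no seed, so the train/validation partition changes on every run. If you want the two frameworks to see exactly the same split across reruns, one option is a seeded generator. The sketch below is only an illustration: `split_indices` and its parameters are hypothetical names, not part of the original notebook.

```python
import numpy as np

def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Hypothetical helper: return (train_idx, val_idx) from a seeded permutation."""
    rng = np.random.default_rng(seed)        # seeded generator makes the split repeatable
    indices = rng.permutation(n_samples)     # shuffled copy of 0..n_samples-1
    n_train = int(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]

# Small demonstration on 10 samples; the same seed always yields the same split.
trn_idx, val_idx = split_indices(n_samples=10, train_fraction=0.8, seed=42)
print(len(trn_idx), len(val_idx))
```

The returned index arrays can be passed to `all_df.iloc[...]` in the same way as `trn_ind` and `val_ind` above.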

In [5]:
#1. Importing the necessary libraries.
import tensorflow as tf
import numpy as np

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import CategoricalCrossentropy

from tensorflow.keras.datasets import mnist

# 2. Defining the model parameters and hyperparameters.
hidden_1 = 64
hidden_2 = 64
n_classes = 10
n_epochs = 10
batch_size = 32
img_size = 28
img_size_flattened = img_size * img_size
learning_rate = 0.001
# 4. Preprocessing the data to make it suitable for training.
# Scale pixel values from [0, 255] to [0, 1].
x_trn = x_trn/255.0
x_val = x_val/255.0
x_tst = x_tst/255.0

# One-hot encode the integer labels, as CategoricalCrossentropy expects.
y_trn_encoded = np.zeros(shape=(y_trn.shape[0], n_classes), dtype=int)
y_val_encoded = np.zeros(shape=(y_val.shape[0], n_classes), dtype=int)

y_trn_encoded[np.arange(len(y_trn)), y_trn] = 1
y_val_encoded[np.arange(len(y_val)), y_val] = 1

# 5. Initializing the model architecture.
model = Sequential([
    Dense(units=hidden_1, activation="relu"),
    Dense(units=hidden_2, activation="relu"),
    Dense(units=n_classes, activation="softmax")
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate), loss=CategoricalCrossentropy(), 
    metrics=["accuracy"])

# 6. Training the model using the defined parameters and dataset.
callback = model.fit(
    x=x_trn, 
    y=y_trn_encoded, 
    validation_data=(x_val, y_val_encoded), 
    batch_size=batch_size, 
    epochs=n_epochs)

Epoch 1/10
1050/1050 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.8260 - loss: 0.6047 - val_accuracy: 0.9407 - val_loss: 0.2014
Epoch 2/10
1050/1050 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.9492 - loss: 0.1701 - val_accuracy: 0.9537 - val_loss: 0.1520
Epoch 3/10
1050/1050 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.9669 - loss: 0.1136 - val_accuracy: 0.9626 - val_loss: 0.1176
Epoch 4/10
1050/1050 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.9732 - loss: 0.0876 - val_accuracy: 0.9568 - val_loss: 0.1434
Epoch 5/10
1050/1050 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.9767 - loss: 0.0748 - val_accuracy: 0.9680 - val_loss: 0.1047
Epoch 6/10
1050/1050 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.9815 - loss: 0.0586 - val_accuracy: 0.9689 - val_loss: 0.0979
Epoch 7/10
[1m1

In [6]:
plot_tensorflow_training_results(history=callback)
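
The TensorFlow cell above one-hot encodes the labels by hand because `CategoricalCrossentropy` compares a probability vector against a one-hot target. As a small NumPy-only sketch (the three example labels are made up), the same encoding can be written by indexing an identity matrix, and `argmax` recovers the original integers:

```python
import numpy as np

n_classes = 10
labels = np.array([3, 0, 9])   # made-up integer class labels, like the "label" column

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the label array encodes every label at once.
one_hot = np.eye(n_classes, dtype=int)[labels]

# argmax along the class axis inverts the encoding.
recovered = one_hot.argmax(axis=1)

print(one_hot.shape)
print(recovered)
```

Keras also provides `SparseCategoricalCrossentropy`, which accepts integer labels directly and makes this step unnecessary. That is closer to PyTorch, where `nn.CrossEntropyLoss` is given integer class indices.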

In [7]:
y_prd = model.predict(x_tst)
y_prd_labels = np.argmax(y_prd, axis=1)

875/875 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step


In [8]:
tst_df["Label"] = y_prd_labels
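# Kaggle numbers the test images from 1, so build ImageId from 1 rather than from the 0-based index.
tst_df["ImageId"] = np.arange(1, len(tst_df) + 1)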
tst_df[["ImageId", "Label"]].to_csv("tensorflow-prediction.csv", index=False, header=True)
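
One difference between the two versions is easy to miss. The Keras model ends in a `softmax` layer and is paired with `CategoricalCrossentropy` on one-hot labels, while the PyTorch model in the next cell returns raw logits and relies on `nn.CrossEntropyLoss`, which applies log-softmax internally and takes integer labels. The NumPy sketch below is not either library's implementation, only an illustration with made-up numbers that the two formulations compute the same value:

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy_from_probs(probs, one_hot):
    # Keras-style inputs: softmax probabilities and one-hot targets.
    return float(np.mean(-np.sum(one_hot * np.log(probs), axis=1)))

def cross_entropy_from_logits(logits, labels):
    # PyTorch-style inputs: raw logits and integer class indices.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(np.mean(-log_probs[np.arange(len(labels)), labels]))

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])   # made-up model outputs for 2 samples, 3 classes
labels = np.array([0, 2])
one_hot = np.eye(3)[labels]

loss_a = cross_entropy_from_probs(softmax(logits), one_hot)
loss_b = cross_entropy_from_logits(logits, labels)
print(np.isclose(loss_a, loss_b))
```

The practical consequence is that a PyTorch model trained with `nn.CrossEntropyLoss` should not apply softmax in `forward`, otherwise the normalization is applied twice.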

In [9]:
# 1. Importing the necessary libraries.
import os
import torch
import torchvision
from tqdm import tqdm
import torch.nn as nn
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from plotly.subplots import make_subplots


class NumpyDataset(torch.utils.data.Dataset):
    def __init__(self, data, labels=None, train=True):
        self.data = torch.from_numpy(data).float()
        if train:
            self.labels = torch.from_numpy(labels).long()

        self.train = train

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample = self.data[idx]
        if self.train:
            return sample, self.labels[idx]
        return sample

# 3. Loading the dataset.
mnist_dataset_trn = NumpyDataset(data=x_trn, labels=y_trn, train=True)
mnist_dataset_val = NumpyDataset(data=x_val, labels=y_val, train=True)
mnist_dataset_tst = NumpyDataset(data=x_tst, labels=None,  train=False)

# 4. Preprocessing the data to make it suitable for training.
trn_loader = torch.utils.data.DataLoader(
    dataset=mnist_dataset_trn, 
    batch_size=batch_size, 
    shuffle=True)
val_loader = torch.utils.data.DataLoader(
    dataset=mnist_dataset_val, 
    batch_size=batch_size, 
    shuffle=False)  # evaluation does not need shuffling
tst_loader = torch.utils.data.DataLoader(
    dataset=mnist_dataset_tst, 
    batch_size=batch_size, 
    shuffle=False)

# 5. Initializing the model architecture.
class ANN(nn.Module):
    def __init__(self):
        super(ANN, self).__init__()
        self.fc1 = nn.Linear(img_size_flattened, hidden_1)
        self.fc2 = nn.Linear(hidden_1, hidden_2)
        self.out = nn.Linear(hidden_2, n_classes)
    
    def forward(self, x):
        x = x.reshape(-1, img_size_flattened)
        x = nn.functional.relu(self.fc1(x))
        x = nn.functional.relu(self.fc2(x))
        x = self.out(x)
        return x

model = ANN()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
loss_function = nn.CrossEntropyLoss()

# Tracking metrics
train_losses = []
test_losses = []
train_accuracies = []
test_accuracies = []

# 6. Training the model using the defined parameters and dataset.
for epoch in range(n_epochs):
    model.train()
    running_loss = 0
    correct_train = 0
    total_train = 0

    for batch_idx, (x_trn_batch, y_trn_batch) in tqdm(enumerate(trn_loader)):
        optimizer.zero_grad()
        y_pred_batch = model(x_trn_batch)
        loss = loss_function(y_pred_batch, y_trn_batch)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        _, predicted = torch.max(y_pred_batch, 1)
        correct_train += (predicted == y_trn_batch).sum().item()
        total_train += y_trn_batch.size(0)

    train_losses.append(running_loss / len(trn_loader))
    train_accuracies.append(correct_train / total_train)

    model.eval()
    test_loss = 0
    correct_test = 0
    total_test = 0

    with torch.no_grad():
        for x_val_batch, y_val_batch in val_loader:
            y_pred_batch = model(x_val_batch)
            loss = loss_function(y_pred_batch, y_val_batch)
            test_loss += loss.item()
            _, predicted = torch.max(y_pred_batch, 1)
            correct_test += (predicted == y_val_batch).sum().item()
            total_test += y_val_batch.size(0)

    test_losses.append(test_loss / len(val_loader))
    test_accuracies.append(correct_test / total_test)

    print(f'Epoch {epoch+1}/{n_epochs}, Train Loss: {train_losses[-1]:.4f}, Train Accuracy: {train_accuracies[-1]:.4f}, Test Loss: {test_losses[-1]:.4f}, Test Accuracy: {test_accuracies[-1]:.4f}')


plot_pytorch_training_results(n_epochs, train_losses, test_losses, train_accuracies, test_accuracies)

1050it [00:03, 294.94it/s]


Epoch 1/10, Train Loss: 0.4150, Train Accuracy: 0.8829, Test Loss: 0.2133, Test Accuracy: 0.9358


1050it [00:03, 320.37it/s]


Epoch 2/10, Train Loss: 0.1812, Train Accuracy: 0.9464, Test Loss: 0.1648, Test Accuracy: 0.9477


1050it [00:03, 304.90it/s]


Epoch 3/10, Train Loss: 0.1307, Train Accuracy: 0.9607, Test Loss: 0.1293, Test Accuracy: 0.9580


1050it [00:03, 304.77it/s]


Epoch 4/10, Train Loss: 0.1015, Train Accuracy: 0.9689, Test Loss: 0.1202, Test Accuracy: 0.9617


1050it [00:03, 302.35it/s]


Epoch 5/10, Train Loss: 0.0798, Train Accuracy: 0.9757, Test Loss: 0.1295, Test Accuracy: 0.9612


1050it [00:03, 316.42it/s]


Epoch 6/10, Train Loss: 0.0655, Train Accuracy: 0.9790, Test Loss: 0.1285, Test Accuracy: 0.9618


1050it [00:03, 304.34it/s]


Epoch 7/10, Train Loss: 0.0556, Train Accuracy: 0.9820, Test Loss: 0.1127, Test Accuracy: 0.9661


1050it [00:03, 301.63it/s]


Epoch 8/10, Train Loss: 0.0484, Train Accuracy: 0.9848, Test Loss: 0.1290, Test Accuracy: 0.9646


1050it [00:03, 312.56it/s]


Epoch 9/10, Train Loss: 0.0381, Train Accuracy: 0.9876, Test Loss: 0.1240, Test Accuracy: 0.9662


1050it [00:03, 298.71it/s]


Epoch 10/10, Train Loss: 0.0332, Train Accuracy: 0.9898, Test Loss: 0.1240, Test Accuracy: 0.9686
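
A note on how the loop above reports loss: it averages the per-batch mean losses (`running_loss / len(trn_loader)`). When the dataset size is not a multiple of the batch size, the final, smaller batch gets the same weight as a full one, so the result can differ slightly from the true per-sample mean. The NumPy sketch below uses made-up loss values to show the difference and a sample-weighted alternative:

```python
import numpy as np

per_sample_loss = np.array([0.2, 0.4, 0.6, 0.8, 3.0])   # made-up per-sample losses
batch_size = 2

# Split into batches; the last batch holds a single sample.
batches = [per_sample_loss[i:i + batch_size]
           for i in range(0, len(per_sample_loss), batch_size)]

# Mean of per-batch means, analogous to `running_loss / len(trn_loader)`.
mean_of_batch_means = np.mean([b.mean() for b in batches])

# Sample-weighted mean: weight each batch mean by its batch size.
weighted_mean = sum(b.mean() * len(b) for b in batches) / len(per_sample_loss)

print(mean_of_batch_means, weighted_mean, per_sample_loss.mean())
```

In the PyTorch loop, the weighted version would correspond to accumulating `loss.item() * y_trn_batch.size(0)` and dividing by `total_train`. For this notebook the effect is small, since at most one batch per epoch is short.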


In [10]:
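y_pred_tot = []
model.eval()
with torch.no_grad():
    for x_tst_batch in tst_loader:
        # Predict on the current test batch (not the last validation batch).
        y_pred_batch = model(x_tst_batch)
        _, pred = torch.max(y_pred_batch, 1)
        y_pred_tot += pred.tolist()

In [11]:
# Use the PyTorch predictions here, not the TensorFlow ones (y_prd_labels).
tst_df["Label"] = y_pred_tot
tst_df[["ImageId", "Label"]].to_csv("pytorch-prediction.csv", index=False, header=True)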
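
Both submission cells write the same two-column layout, so the step can be factored into one function shared by the two frameworks. The sketch below is an illustration only: `build_submission` is a hypothetical helper, and it assumes the competition expects `ImageId` to count from 1, which is worth checking against the competition's sample submission file.

```python
import numpy as np
import pandas as pd

def build_submission(labels):
    """Hypothetical helper: pair predicted labels with 1-based image ids."""
    labels = np.asarray(labels, dtype=int)
    return pd.DataFrame({
        "ImageId": np.arange(1, len(labels) + 1),   # assumption: ids start at 1
        "Label": labels,
    })

# Works the same for a NumPy array (TensorFlow path) or a Python list (PyTorch path).
submission = build_submission([2, 0, 9])
print(submission)
```

Calling `build_submission(y_prd_labels).to_csv(...)` or `build_submission(y_pred_tot).to_csv(...)` with `index=False` would then produce files in the same format for either model.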