##### Getting Started with Torch & TorchVision

This notebook is designed to help you get started with PyTorch and TorchVision, two powerful libraries for deep learning and computer vision tasks.
In this notebook, we will cover the following topics:
1. Installation: How to install PyTorch and TorchVision.
2. Building Model: How to build a simple neural network model using PyTorch.
3. Loading Data: How to load and preprocess datasets using TorchVision.
4. Training: How to train the model on a dataset.
5. Techniques: How to apply various techniques to improve model performance.
6. Evaluation: How to evaluate the model's performance.

Note: At the end of the notebook, you will build a MobileNetV2 model and train it on the CIFAR10 dataset.

##### Install required dependencies

```bash
pip install torch torchvision
pip install torchinfo # for summarizing models
```

You can also install them from a notebook cell by uncommenting and running the following commands:

```python
# !pip install torch torchvision
# !pip install torchinfo
```
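
After installing, it can help to confirm which versions ended up in the environment. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name, not part of PyTorch.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Packages used in this notebook
for name in ("torch", "torchvision", "torchinfo"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```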

##### Building Simple Model

In this section, we will build a simple neural network model using PyTorch and train it on the MNIST dataset.

Let's start by importing the necessary libraries and modules.
```python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
import torchinfo
```

In [32]:
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
from torchinfo import summary
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader

##### Define the Simple Model

For the MNIST dataset, we will create a simple model with the following architecture:
1. Input Layer: 28x28 pixels (MNIST images)
2. Flatten Layer: Converts the 2D image into a 1D vector -> 784 features
3. We will use two fully connected layers:
    - First Layer: 512 neurons with ReLU activation
    - Second Layer: 128 neurons with ReLU activation
4. Output Layer: 10 neurons (one for each digit 0-9). The layer returns raw scores (logits); Softmax is applied inside the loss function rather than in the model.

Now let's define the model class with PyTorch.

In [33]:
class SimpleNN(nn.Module):
    def __init__(self, num_classes) -> None:
        super(SimpleNN, self).__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(784, 512)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(512, 128)
        self.relu2 = nn.ReLU()

        self.fc3 = nn.Linear(128, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.flatten(x)
        out = self.fc1(out)
        out = self.relu1(out)
        out = self.fc2(out)
        out = self.relu2(out)
        out = self.fc3(out)
        return out
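
A quick sanity check is to push a dummy batch through the network and look at the output shape. The sketch below assumes an `nn.Sequential` with the same layer sizes as `SimpleNN` (written that way only to keep the example self-contained) and shows that the raw outputs become probabilities once Softmax is applied.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same layer sizes as SimpleNN, written as nn.Sequential for brevity
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

dummy_batch = torch.randn(4, 1, 28, 28)  # 4 fake grayscale images
logits = model(dummy_batch)              # raw scores, shape [4, 10]
probs = torch.softmax(logits, dim=1)     # one probability per class

print(logits.shape)      # torch.Size([4, 10])
print(probs.sum(dim=1))  # each row sums to 1 (up to rounding)
```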

##### Model Summary

In [34]:
simple_model = SimpleNN(num_classes=10)
summary(simple_model, input_size=(1, 28, 28))

Layer (type:depth-idx)                   Output Shape              Param #
SimpleNN                                 [1, 10]                   --
├─Flatten: 1-1                           [1, 784]                  --
├─Linear: 1-2                            [1, 512]                  401,920
├─ReLU: 1-3                              [1, 512]                  --
├─Linear: 1-4                            [1, 128]                  65,664
├─ReLU: 1-5                              [1, 128]                  --
├─Linear: 1-6                            [1, 10]                   1,290
Total params: 468,874
Trainable params: 468,874
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 0.47
Input size (MB): 0.00
Forward/backward pass size (MB): 0.01
Params size (MB): 1.88
Estimated Total Size (MB): 1.88
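
The parameter counts in the summary can be reproduced by hand: a fully connected layer has `in_features * out_features` weights plus `out_features` biases. A small sketch (the helper name `linear_params` is only illustrative):

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


layer_sizes = [(784, 512), (512, 128), (128, 10)]
counts = [linear_params(i, o) for i, o in layer_sizes]
total = sum(counts)

print(counts)                     # [401920, 65664, 1290]
print(total)                      # 468874
print(round(total * 4 / 1e6, 2))  # 1.88 (MB, at 4 bytes per float32 parameter)
```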

##### Loading MNIST Dataset

In [35]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))  # Mean and std deviation for MNIST
])

dataset = "/Users/hinsun/Workspace/ComputerScience/DeepLearning/data"

train_batch_size = 64
test_batch_size = 128

train_dataset = MNIST(root=dataset, train=True, download=True, transform=transform)
test_dataset = MNIST(root=dataset, train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=train_batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=test_batch_size, shuffle=False)

print(f"Train X shape: {train_dataset.data.shape} y shape: {train_dataset.targets.shape}")
print(f"Test X shape: {test_dataset.data.shape} y shape: {test_dataset.targets.shape}")

Train X shape: torch.Size([60000, 28, 28]) y shape: torch.Size([60000])
Test X shape: torch.Size([10000, 28, 28]) y shape: torch.Size([10000])
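
With these batch sizes, the number of steps per epoch follows directly from the dataset sizes. The sketch below assumes the `DataLoader` default of keeping the last, smaller batch (`drop_last=False`); the helper names are illustrative.

```python
import math


def batches_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of mini-batches when the last, smaller batch is kept."""
    return math.ceil(num_samples / batch_size)


def last_batch_size(num_samples: int, batch_size: int) -> int:
    """Size of the final batch in an epoch."""
    remainder = num_samples % batch_size
    return remainder if remainder else batch_size


print(batches_per_epoch(60000, 64))   # 938 training batches
print(last_batch_size(60000, 64))     # 32 samples in the final training batch
print(batches_per_epoch(10000, 128))  # 79 test batches
```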


##### Training the Model

Before training the model, let's look at devices in PyTorch.
In PyTorch, you can specify the device (CPU or GPU) on which you want to run your computations. This is done with a `torch.device` object. If a GPU is available, you can use it to speed up training and inference.

Note: On macOS, you can use MPS (Metal Performance Shaders) for GPU acceleration.

In [36]:
from sys import platform

# You can check if a GPU is available and set the device accordingly
# Because we are using macOS, we will use MPS (Metal Performance Shaders) for GPU acceleration

if platform == "darwin":
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
else:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print(f"Device: {device}")

Device: mps
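
As an alternative to branching on the operating system, the backends can be probed directly. This is a minimal sketch, assuming the order CUDA, then MPS, then CPU is the preference you want; `pick_device` is a hypothetical helper name.

```python
import torch


def pick_device() -> torch.device:
    """Prefer CUDA, then Apple MPS, then fall back to the CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps is missing in older PyTorch builds, so guard the lookup
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()
x = torch.ones(2, 3).to(device)  # tensors must live on the same device as the model
print(device, x.device.type)
```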


Before training the model, we need to define the loss function and optimizer.

In [37]:
# Loss function for multi-class classification
# Note: the last layer of the model has no Softmax activation because
# CrossEntropyLoss already applies (log-)Softmax internally. When Softmax and
# cross-entropy are combined, the gradient of the loss with respect to the logits
# of a single sample simplifies to A - Y, where A is the Softmax output of the
# model and Y is the one-hot target label.
criterion = nn.CrossEntropyLoss()

# Learning rate for the optimizer
lr = 0.001

# Momentum technique to accelerate SGD in the relevant direction and dampen oscillations
momentum = 0.9

# Nesterov momentum, which is a variant of momentum that looks ahead to the next position
nesterov = True

# Optimizer for updating model weights
optimizer = optim.SGD(simple_model.parameters(), lr=lr, momentum=momentum, nesterov=nesterov)
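
The `A - Y` gradient mentioned in the comment above can be checked numerically. The sketch below is plain Python for a single sample with made-up logits; it compares the analytic gradient with a central-difference estimate.

```python
import math


def softmax(z):
    """Softmax of a list of logits."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(z, target):
    """Cross-entropy of logits z against an integer class label."""
    return -math.log(softmax(z)[target])


logits = [2.0, -1.0, 0.5]  # illustrative values
target = 0
one_hot = [1.0 if i == target else 0.0 for i in range(len(logits))]

# Analytic gradient: A - Y
analytic = [a - y for a, y in zip(softmax(logits), one_hot)]

# Numerical gradient by central differences
eps = 1e-6
numeric = []
for i in range(len(logits)):
    plus = list(logits)
    minus = list(logits)
    plus[i] += eps
    minus[i] -= eps
    numeric.append((cross_entropy(plus, target) - cross_entropy(minus, target)) / (2 * eps))

print(analytic)
print(numeric)
```

The two printed lists should agree to several decimal places.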

Also, I will define a function to calculate the accuracy when training and testing the model.

In [38]:
def calculate_accuracy(model, loader, device):
    model.eval()  # Set the model to evaluation mode
    correct = 0
    total = 0

    with torch.no_grad():
        for X, y in loader:
            X, y = X.to(device), y.to(device)  # Move data to the device
            outputs = model(X)

            _, predicted = torch.max(outputs, 1)  # Index of the largest logit is the predicted class
            total += y.size(0)  # Total number of samples
            correct += (predicted == y).sum().item()  # Count correct predictions

    accuracy = 100 * correct / total  # Calculate accuracy
    model.train()  # Set the model back to training mode
    return accuracy
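
To see what the function computes for one batch, here is a small worked example with hypothetical logits for four samples and three classes.

```python
import torch

# Hypothetical logits for 4 samples and 3 classes
logits = torch.tensor([
    [2.0, 0.1, 0.3],  # predicts class 0
    [0.2, 1.5, 0.1],  # predicts class 1
    [0.3, 0.2, 0.9],  # predicts class 2
    [1.2, 0.4, 0.8],  # predicts class 0
])
labels = torch.tensor([0, 1, 2, 2])

predicted = logits.argmax(dim=1)              # class with the largest score
correct = (predicted == labels).sum().item()  # 3 of 4 predictions match
accuracy = 100 * correct / labels.size(0)
print(f"{accuracy:.2f}%")                     # 75.00%
```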

Let's train the model for 10 epochs. In each epoch, we will iterate over the training dataset, compute the loss, and update the model weights using the optimizer.

In [39]:
simple_model = simple_model.to(device)

epochs = 10
for epoch in range(epochs):
    running_loss = 0.0
    for i, (inputs, labels) in enumerate(train_loader):
        # Remember to move inputs and labels to the device (GPU or CPU)
        inputs, labels = inputs.to(device), labels.to(device)

        # Forward pass
        outputs = simple_model(inputs)
        loss = criterion(outputs, labels)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if (i + 1) % 100 == 0:  # Print every 100 mini-batches
            print(f"Epoch [{epoch + 1}/{epochs}], Step [{i + 1}/{len(train_loader)}], Loss: {running_loss / 100:.4f}")
            running_loss = 0.0

    # Calculate accuracy after each epoch
    train_accuracy = calculate_accuracy(simple_model, train_loader, device)
    print(f"Epoch [{epoch + 1}/{epochs}], Train Accuracy: {train_accuracy:.2f}%")

Epoch [1/10], Step [100/938], Loss: 2.1597
Epoch [1/10], Step [200/938], Loss: 1.5584
Epoch [1/10], Step [300/938], Loss: 0.9138
Epoch [1/10], Step [400/938], Loss: 0.6326
Epoch [1/10], Step [500/938], Loss: 0.5182
Epoch [1/10], Step [600/938], Loss: 0.4623
Epoch [1/10], Step [700/938], Loss: 0.4132
Epoch [1/10], Step [800/938], Loss: 0.3770
Epoch [1/10], Step [900/938], Loss: 0.3716
Epoch [1/10], Train Accuracy: 90.17%
Epoch [2/10], Step [100/938], Loss: 0.3558
Epoch [2/10], Step [200/938], Loss: 0.3242
Epoch [2/10], Step [300/938], Loss: 0.3319
Epoch [2/10], Step [400/938], Loss: 0.3091
Epoch [2/10], Step [500/938], Loss: 0.3065
Epoch [2/10], Step [600/938], Loss: 0.2945
Epoch [2/10], Step [700/938], Loss: 0.2825
Epoch [2/10], Step [800/938], Loss: 0.2857
Epoch [2/10], Step [900/938], Loss: 0.2552
Epoch [2/10], Train Accuracy: 92.16%
Epoch [3/10], Step [100/938], Loss: 0.2491
Epoch [3/10], Step [200/938], Loss: 0.2599
Epoch [3/10], Step [300/938], Loss: 0.2598
Epoch [3/10], Step [400
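
The loop calls `optimizer.zero_grad()` before `loss.backward()` because PyTorch accumulates gradients in `.grad` instead of overwriting them. A minimal sketch with a single scalar parameter:

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# Two backward passes without clearing: the gradients add up
(3 * w).backward()
(3 * w).backward()
accumulated = w.grad.item()  # 3 + 3 = 6

# Clearing between passes leaves only the gradient of the latest loss
w.grad.zero_()
(3 * w).backward()
fresh = w.grad.item()        # 3

print(accumulated, fresh)    # 6.0 3.0
```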

##### Evaluating the Model

In [40]:
# After training the model, we can evaluate its performance on the test dataset.
test_accuracy = calculate_accuracy(simple_model, test_loader, device)
print(f"Accuracy: {test_accuracy:.2f}%")

Accuracy: 96.79%


Good job! You have now built and trained a simple neural network with PyTorch on the MNIST dataset, reaching more than 95% accuracy on the test set.

##### MobileNetV2 with CIFAR10 Dataset
Paper: [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381)

Now, let's build a more complex model with the MobileNetV2 architecture and train it on the CIFAR10 dataset.
First, let's look at the MobileNetV2 architecture, a lightweight deep learning model designed for mobile and embedded vision applications. It is based on depthwise separable convolutions, which significantly reduce the number of parameters and the computational cost compared to traditional convolutional neural networks (CNNs).

In this architecture, we will use the following components:
1. Inverted Residual Block: A residual block where the number of channels is expanded first, then reduced (inverted from traditional ResNet).
2. Linear Bottleneck: The final layer of the block has no activation (it is linear), because applying ReLU in the low-dimensional bottleneck would destroy information.
3. Depthwise Separable Convolution: Similar to MobileNetV1, it uses depthwise separable convolutions to reduce the number of parameters and computations.
4. Skip Connections: Similar to ResNet, it uses skip connections to allow gradients to flow through the network more easily.
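
To make the savings from depthwise separable convolutions concrete, the sketch below counts weights for a standard 3x3 convolution and for its depthwise separable counterpart. The channel counts are illustrative, and biases are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights of a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Depthwise k x k convolution followed by a 1x1 pointwise convolution (no bias)."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # the 1x1 convolution mixes channels
    return depthwise + pointwise


c_in, c_out = 32, 64  # illustrative channel counts
std = standard_conv_params(c_in, c_out)
sep = depthwise_separable_params(c_in, c_out)
print(std, sep, round(std / sep, 1))  # 18432 2336 7.9
```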

##### Inverted Residual Block
The Inverted Residual Block is a key component of the MobileNetV2 architecture. It consists of three main parts, plus an optional skip connection:
1. Expansion: The input is first expanded to a higher number of channels using a 1x1 convolution.
2. Depthwise Convolution: A depthwise convolution is applied to the expanded channels, which applies a single filter to each input channel.
3. Projection: The output of the depthwise convolution is then projected back to a lower number of channels using another 1x1 convolution.
4. Skip Connection: If the stride is 1 and the input and output channels are the same, a skip connection is added to allow gradients to flow through the network more easily.

The Inverted Residual Block can be implemented in PyTorch as follows:

In [75]:
import torch
import torch.nn as nn


class InvertedResidualBlock(nn.Module):
    """
    Inverted Residual Block
    This block is a key component of MobileNetV2 architecture.
    It consists of three main parts, plus an optional skip connection:
    1. Expansion: The input is first expanded to a higher number of channels using a 1x1 convolution.
    2. Depthwise Convolution: A depthwise convolution is applied to the expanded channels.
    3. Projection: The output of the depthwise convolution is then projected back to a lower
    number of channels using another 1x1 convolution.
    4. Skip Connection: If the stride is 1 and the input and output channels are the same,
    a skip connection is added.
    Args:
        in_channels (int): Number of input channels.
        out_channels (int): Number of output channels.
        stride (int): Stride for the depthwise convolution.
        expand_ratio (float): Expansion ratio for the number of channels.
    Returns:
        torch.Tensor: Output tensor after applying the Inverted Residual Block.
    """

    def __init__(self, in_channels, out_channels, stride, expand_ratio):
        super(InvertedResidualBlock, self).__init__()
        self.stride = stride

        # Unlike the classic Residual Block, which reduces the number of channels first,
        # the Inverted Residual Block expands the number of channels first and then
        # reduces it.
        # The hidden dimension is therefore the input channels times the expand ratio.
        hidden_dim = int(in_channels * expand_ratio)

        # Residual connection as in ResNet, used only when input and output shapes match
        self.use_residual_connection = self.stride == 1 and in_channels == out_channels

        # init layers
        layers = []
        if expand_ratio != 1:
            # 1. Expansion (Conv 1x1) -> only change channels to hidden dimension
            layers.append(nn.Conv2d(in_channels, hidden_dim, kernel_size=1, bias=False))
            layers.append(nn.BatchNorm2d(hidden_dim))
            layers.append(nn.ReLU6(inplace=True))

        # 2. Depthwise Convolution
        layers.append(nn.Conv2d(
            in_channels=hidden_dim,
            out_channels=hidden_dim,
            kernel_size=3,
            stride=stride,
            groups=hidden_dim,  # Depthwise convolution
            padding=1,
        ))
        layers.append(nn.BatchNorm2d(hidden_dim))
        layers.append(nn.ReLU6(inplace=True))

        # 3. Projection (Conv 1x1, no activation)
        layers.append(nn.Conv2d(hidden_dim, out_channels, kernel_size=1, bias=False))
        layers.append(nn.BatchNorm2d(out_channels))

        # Combine all layers into a sequential block
        self.block = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Forward pass of the Inverted Residual Block.
        Args:
            x (torch.Tensor): Input tensor.
        Returns:
            torch.Tensor: Output tensor after applying the block.
        """
        if self.use_residual_connection:
            return x + self.block(x)  # Add skip connection
        else:
            return self.block(x)  # No skip connection
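
The `groups=hidden_dim` argument is what turns the 3x3 convolution into a depthwise one. A short sketch with an illustrative channel count shows the difference in weight shapes:

```python
import torch
import torch.nn as nn

channels = 8  # illustrative channel count

depthwise = nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels, bias=False)
regular = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)

print(depthwise.weight.shape)  # torch.Size([8, 1, 3, 3]): each filter sees one channel
print(regular.weight.shape)    # torch.Size([8, 8, 3, 3]): each filter sees all channels

x = torch.randn(1, channels, 16, 16)
out = depthwise(x)
print(out.shape)               # torch.Size([1, 8, 16, 16]): same spatial size at stride 1
```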

##### MobileNetV2 Model

In [76]:
import torch
import torch.nn as nn




class MobileNetV2(nn.Module):
    """
    MobileNetV2
    This class implements the MobileNetV2 architecture.
    It consists of an initial convolution layer followed by a series of Inverted
    Residual Blocks. The architecture is as follows:
    1. First layer: A 3x3 convolution with stride 2 and 32 output channels.
    2. Inverted Residual Blocks: A series of blocks with different configurations
    following the pattern:
        - t, c, n, s: (expand_ratio, out_channels, num_blocks, stride)
        - expand_ratio: The ratio by which the number of channels is expanded in the block.
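    3. Last feature layer: A 1x1 convolution with 1280 output channels.
    4. Classifier: Global average pooling followed by a 1x1 convolution with
    num_classes output channels.
    Every convolution except the classifier is followed by BatchNorm, and ReLU6 is the
    activation everywhere except after the linear projection in each block.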
    Args:
        num_classes (int): Number of output classes for classification.
    Returns:
        torch.Tensor: Output tensor after applying the MobileNetV2 model.
    """

    def __init__(self, num_classes=10):  # use for CIFAR10
        super(MobileNetV2, self).__init__()
        self.configuration = [
            (1, 16, 1, 1),
            (6, 24, 2, 2),
            (6, 32, 3, 2),
            (6, 64, 4, 2),
            (6, 96, 3, 1),
            (6, 160, 3, 2),
            (6, 320, 1, 1),
        ]

        # Initial First Layer
        input_channels = 32
        layers = [
            nn.Conv2d(3, input_channels, kernel_size=3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(input_channels),
            nn.ReLU6(inplace=True)
        ]

        # Inverted Residual Blocks
        for t, c, n, s in self.configuration:
            outputs_channels = c
            for i in range(n):  # n is number of blocks
                # First block uses stride, others use 1
                # Example:
                #     (1, 16, 1, 1) -> 1 block with stride 1
                #     (6, 24, 2, 2) -> 2 blocks with stride 2 for first block and 1 for second block
                stride = s if i == 0 else 1
                layers.append(InvertedResidualBlock(input_channels, outputs_channels, stride, t))
                input_channels = outputs_channels  # Update input channels for next block

        # Final Conv 1x1
        final_channel = 1280
        # A 1x1 convolution expands the channels from 320 to final_channel (1280)
        layers.append(nn.Conv2d(input_channels, final_channel, kernel_size=1, bias=False))
        layers.append(nn.BatchNorm2d(final_channel))
        layers.append(nn.ReLU6(inplace=True))
        self.features = nn.Sequential(*layers)

        # Classifier
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Conv2d(final_channel, num_classes, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Forward pass of the MobileNetV2 model.
        Args:
            x (torch.Tensor): Input tensor.
        Returns:
            torch.Tensor: Output tensor after applying the MobileNetV2 model.
        """
        x = self.features(x)  # Apply feature extraction layers
        x = self.avgpool(x)  # Adaptive average pooling to 1x1
        x = self.classifier(x)  # Final classification layer
        x = torch.flatten(x, 1)  # Flatten the output to [B, num_classes]
        return x
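
It can be useful to trace how the strides shrink a 32x32 CIFAR10 image and how many blocks end up with a skip connection. The sketch below is plain Python that mirrors the loop in `__init__`; it assumes the standard convolution output-size formula with dilation 1, and the helper name is illustrative.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output height/width of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# (t, c, n, s) rows, copied from the configuration above
configuration = [
    (1, 16, 1, 1), (6, 24, 2, 2), (6, 32, 3, 2), (6, 64, 4, 2),
    (6, 96, 3, 1), (6, 160, 3, 2), (6, 320, 1, 1),
]

size = conv_out_size(32, kernel=3, stride=2, padding=1)  # first 3x3 conv on a 32x32 image
sizes = [size]
in_channels, total_blocks, skip_blocks = 32, 0, 0

for t, c, n, s in configuration:
    for i in range(n):
        stride = s if i == 0 else 1
        size = conv_out_size(size, kernel=3, stride=stride, padding=1)  # depthwise 3x3
        skip_blocks += int(stride == 1 and in_channels == c)
        total_blocks += 1
        in_channels = c
    sizes.append(size)

print(sizes)                      # [16, 16, 8, 4, 2, 2, 1, 1]
print(total_blocks, skip_blocks)  # 17 10
```

For a 32x32 input, the feature map is already 1x1 in the last two stages, so the adaptive average pooling has nothing left to reduce.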

##### Model Summary

In [77]:
from torchinfo import summary

mobilenet_v2 = MobileNetV2(num_classes=10)  # CIFAR10 has 10 classes
summary(mobilenet_v2, (1, 3, 32, 32))

Layer (type:depth-idx)                   Output Shape              Param #
MobileNetV2                              [1, 10]                   --
├─Sequential: 1-1                        [1, 1280, 1, 1]           --
│    └─Conv2d: 2-1                       [1, 32, 16, 16]           864
│    └─BatchNorm2d: 2-2                  [1, 32, 16, 16]           64
│    └─ReLU6: 2-3                        [1, 32, 16, 16]           --
│    └─InvertedResidualBlock: 2-4        [1, 16, 16, 16]           --
│    │    └─Sequential: 3-1              [1, 16, 16, 16]           928
│    └─InvertedResidualBlock: 2-5        [1, 24, 8, 8]             --
│    │    └─Sequential: 3-2              [1, 24, 8, 8]             5,232
│    └─InvertedResidualBlock: 2-6        [1, 24, 8, 8]             --
│    │    └─Sequential: 3-3              [1, 24, 8, 8]             8,976
│    └─InvertedResidualBlock: 2-7        [1, 32, 4, 4]             --
│    │    └─Sequential: 3-4              [1, 32, 4, 4]             10,144
│  
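
The per-block parameter counts in the summary can be reproduced by hand. The sketch below follows the block as defined in this notebook, where the depthwise convolution keeps its default bias and the 1x1 convolutions use `bias=False`; `block_params` is a hypothetical helper name.

```python
def block_params(in_channels: int, out_channels: int, expand_ratio: int) -> int:
    """Parameter count of one InvertedResidualBlock as defined in this notebook."""
    hidden = in_channels * expand_ratio
    total = 0
    if expand_ratio != 1:
        total += in_channels * hidden  # 1x1 expansion conv, bias=False
        total += 2 * hidden            # BatchNorm weight and bias
    total += 9 * hidden + hidden       # 3x3 depthwise conv, default bias=True
    total += 2 * hidden                # BatchNorm weight and bias
    total += hidden * out_channels     # 1x1 projection conv, bias=False
    total += 2 * out_channels          # BatchNorm weight and bias
    return total


print(block_params(32, 16, 1))  # 928
print(block_params(16, 24, 6))  # 5232
print(block_params(24, 24, 6))  # 8976
print(block_params(24, 32, 6))  # 10144
```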

##### Loading CIFAR10 Dataset

In [78]:
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader
from torchvision import transforms

# CIFAR10 dataset
dataset = "/Users/hinsun/Workspace/ComputerScience/DeepLearning/data"
train_batch_size = 64
test_batch_size = 64

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.4914, 0.4822, 0.4465], std=[0.2023, 0.1994, 0.2010])
])

train_dataset = CIFAR10(root=dataset, train=True, download=True, transform=transform)
test_dataset = CIFAR10(root=dataset, train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=train_batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=test_batch_size, shuffle=False)
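
`Normalize` computes `(x - mean) / std` separately for each channel. The sketch below reproduces that arithmetic with plain tensor broadcasting on a random stand-in image, so it runs without downloading the dataset.

```python
import torch

# Per-channel statistics used in the transform above, shaped for broadcasting
mean = torch.tensor([0.4914, 0.4822, 0.4465]).view(3, 1, 1)
std = torch.tensor([0.2023, 0.1994, 0.2010]).view(3, 1, 1)

image = torch.rand(3, 32, 32)      # stand-in for a ToTensor() output in [0, 1)
normalized = (image - mean) / std  # broadcasts over height and width

# A pixel exactly at the channel mean maps to 0
at_mean = (mean.expand(3, 32, 32) - mean) / std

print(normalized.shape)            # torch.Size([3, 32, 32])
print(at_mean.abs().max().item())  # 0.0
```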

##### Training the Model

In [80]:
def training(model_train, epochs_train, current_train_loader, train_criterion, train_optimizer):
    for epoch in range(epochs_train):
        running_loss = 0.0
        for i, (inputs, labels) in enumerate(current_train_loader):
            # Remember to move inputs and labels to the device (GPU or CPU)
            inputs, labels = inputs.to(device), labels.to(device)

            # Forward pass
            outputs = model_train(inputs)
            loss = train_criterion(outputs, labels)

            # Backward pass and optimization (use the optimizer passed in, not a global)
            train_optimizer.zero_grad()
            loss.backward()
            train_optimizer.step()

            running_loss += loss.item()
            if (i + 1) % 100 == 0:  # Print every 100 mini-batches
                print(
                    f"Epoch [{epoch + 1}/{epochs_train}], Step [{i + 1}/{len(current_train_loader)}], Loss: {running_loss / 100:.4f}")
                running_loss = 0.0

        # Calculate accuracy after each epoch
        accuracy = calculate_accuracy(model_train, current_train_loader, device)
        print(f"Epoch [{epoch + 1}/{epochs_train}], Train Accuracy: {accuracy:.2f}%")


lr = 0.01
momentum = 0.9
nesterov = True
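# torch.accelerator requires a recent PyTorch release; fall back to the CPU if no accelerator is found
device = torch.accelerator.current_accelerator() if torch.accelerator.is_available() else torch.device("cpu")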
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(mobilenet_v2.parameters(), lr=lr, momentum=momentum, nesterov=nesterov)
mobilenet_v2 = mobilenet_v2.to(device)
epochs = 5

training(mobilenet_v2, epochs, train_loader, criterion, optimizer)

Epoch [1/5], Step [100/782], Loss: 0.6705
Epoch [1/5], Step [200/782], Loss: 0.6662
Epoch [1/5], Step [300/782], Loss: 0.7205
Epoch [1/5], Step [400/782], Loss: 0.6996
Epoch [1/5], Step [500/782], Loss: 0.7426
Epoch [1/5], Step [600/782], Loss: 0.7244
Epoch [1/5], Step [700/782], Loss: 0.7259
Epoch [1/5], Train Accuracy: 80.41%
Epoch [2/5], Step [100/782], Loss: 0.6233
Epoch [2/5], Step [200/782], Loss: 0.6277
Epoch [2/5], Step [300/782], Loss: 0.6823
Epoch [2/5], Step [400/782], Loss: 0.7120
Epoch [2/5], Step [500/782], Loss: 0.6873
Epoch [2/5], Step [600/782], Loss: 0.6544
Epoch [2/5], Step [700/782], Loss: 0.6903
Epoch [2/5], Train Accuracy: 78.66%
Epoch [3/5], Step [100/782], Loss: 0.5827
Epoch [3/5], Step [200/782], Loss: 0.5830
Epoch [3/5], Step [300/782], Loss: 0.6160
Epoch [3/5], Step [400/782], Loss: 0.6279
Epoch [3/5], Step [500/782], Loss: 0.6384
Epoch [3/5], Step [600/782], Loss: 0.6559
Epoch [3/5], Step [700/782], Loss: 0.6594
Epoch [3/5], Train Accuracy: 83.00%
Epoch [4/5

##### Applying Techniques to Improve Model Performance

We have now trained the MobileNetV2 model on the CIFAR10 dataset for 5 epochs. Training takes a long time, but a few techniques can help:
1. Save and Load Model: Save the trained model to disk and load it later for inference or further training.
2. Try the Adam Optimizer: Use the Adam optimizer instead of SGD, which often converges faster (the training cell below still uses SGD).
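
Both techniques can be tried on a tiny stand-in model first. This is a minimal sketch, assuming a temporary directory and a hypothetical file name `model.pth`; the learning rate `1e-3` is Adam's default and not a tuned value for MobileNetV2.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)  # tiny stand-in for MobileNetV2

# 1. Save only the learned parameters (state_dict), then load them into a fresh model
with tempfile.TemporaryDirectory() as tmp_dir:
    checkpoint_path = os.path.join(tmp_dir, "model.pth")  # hypothetical file name
    torch.save(model.state_dict(), checkpoint_path)

    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(checkpoint_path, map_location="cpu"))

same_weights = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same_weights)  # True

# 2. Swap SGD for Adam by changing only the optimizer construction
optimizer = optim.Adam(restored.parameters(), lr=1e-3)
```

Saving the `state_dict` rather than the whole model object keeps the checkpoint independent of how the class is pickled, which is why the model is rebuilt before loading.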

In [85]:
is_new_training = False
mobilenet_v2 = None
path = "/Users/hinsun/Workspace/ComputerScience/DeepLearning/saved/mobilenet_v2_cifar10.pth"
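# torch.accelerator requires a recent PyTorch release; fall back to the CPU if no accelerator is found
device = torch.accelerator.current_accelerator() if torch.accelerator.is_available() else torch.device("cpu")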

if is_new_training:
    print("Training new model...")
    mobilenet_v2 = MobileNetV2(num_classes=10).to(device)
else:
    print("Loading pre-trained model...")
    mobilenet_v2 = MobileNetV2(num_classes=10).to(device)
    mobilenet_v2.load_state_dict(torch.load(path, map_location=device))

lr = 0.1
momentum = 0.9
nesterov = True
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(mobilenet_v2.parameters(), lr=lr, momentum=momentum, nesterov=nesterov)

epochs = 5
training(mobilenet_v2, epochs, train_loader, criterion, optimizer)

# Save the trained model
torch.save(mobilenet_v2.state_dict(), path)

Loading pre-trained model...
Epoch [1/5], Step [100/782], Loss: 0.9204
Epoch [1/5], Step [200/782], Loss: 0.9401
Epoch [1/5], Step [300/782], Loss: 0.9090
Epoch [1/5], Step [400/782], Loss: 0.9043
Epoch [1/5], Step [500/782], Loss: 0.8932
Epoch [1/5], Step [600/782], Loss: 0.9091
Epoch [1/5], Step [700/782], Loss: 0.8704
Epoch [1/5], Train Accuracy: 67.74%
Epoch [2/5], Step [100/782], Loss: 0.8474
Epoch [2/5], Step [200/782], Loss: 0.8352
Epoch [2/5], Step [300/782], Loss: 0.8395
Epoch [2/5], Step [400/782], Loss: 0.8287
Epoch [2/5], Step [500/782], Loss: 0.8298
Epoch [2/5], Step [600/782], Loss: 0.8603
Epoch [2/5], Step [700/782], Loss: 0.7941
Epoch [2/5], Train Accuracy: 68.28%
Epoch [3/5], Step [100/782], Loss: 0.7773
Epoch [3/5], Step [200/782], Loss: 0.8027
Epoch [3/5], Step [300/782], Loss: 0.7815
Epoch [3/5], Step [400/782], Loss: 0.7825
Epoch [3/5], Step [500/782], Loss: 0.7805
Epoch [3/5], Step [600/782], Loss: 0.7660
Epoch [3/5], Step [700/782], Loss: 0.7930
Epoch [3/5], Trai