@@ -0,0 +1,45 @@
---
title: Create and train a PyTorch model for digit classification

minutes_to_complete: 80

who_is_this_for: This is an introductory topic for software developers interested in learning how to use PyTorch to create and train a feedforward neural network for digit classification.

learning_objectives:
- Prepare a PyTorch development environment.
- Download and prepare the MNIST dataset.
- Create a neural network architecture using PyTorch.
- Train a neural network using PyTorch.

prerequisites:
- A computer that can run Python 3 and Visual Studio Code. The OS can be Windows, Linux, or macOS.


author_primary: Dawid Borycki

### Tags
skilllevels: Introductory
subjects: ML
armips:
- Cortex-A
- Cortex-X
- Neoverse
operatingsystems:
- Windows
- Linux
- macOS
tools_software_languages:
- Android Studio
- Coding
shared_path: true
shared_between:
- servers-and-cloud-computing
- laptops-and-desktops
- smartphones-and-mobile

### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
@@ -0,0 +1,42 @@
---
# ================================================================================
# Edit
# ================================================================================

next_step_guidance: >
Proceed to Use Keras Core with TensorFlow, PyTorch, and JAX backends to continue exploring Machine Learning.

# 1-3 sentence recommendation outlining how the reader can generally keep learning about these topics, and a specific explanation of why the next step is being recommended.

recommended_path: "/learning-paths/servers-and-cloud-computing/keras-core/"

# Link to the next learning path being recommended(For example this could be /learning-paths/servers-and-cloud-computing/mongodb).


# further_reading links to references related to this path. Can be:
# Manuals for a tool / software mentioned (type: documentation)
# Blog about related topics (type: blog)
# General online references (type: website)

further_reading:
- resource:
title: PyTorch
link: https://pytorch.org
type: documentation
- resource:
title: MNIST
link: https://en.wikipedia.org/wiki/MNIST_database
type: website
- resource:
title: Visual Studio Code
link: https://code.visualstudio.com
type: website


# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
weight: 21 # set to always be larger than the content in this path, and one more than 'review'
title: "Next Steps" # Always the same
layout: "learningpathall" # All files under learning paths have this same wrapper
---
@@ -0,0 +1,50 @@
---
# ================================================================================
# Edit
# ================================================================================

# Always 3 questions. Should try to test the reader's knowledge, and reinforce the key points you want them to remember.
# question: A one sentence question
# answers: The correct answers (from 2-4 answer options only). Should be surrounded by quotes.
# correct_answer: An integer indicating what answer is correct (index starts from 1)
# explanation: A short (1-3 sentence) explanation of why the correct answer is correct. Can add additional context if desired


review:
- questions:
question: >
Does the input layer of the model flatten the 28x28 pixel image into a 1D array of 784 elements?
answers:
- "Yes"
- "No"
correct_answer: 1
explanation: >
Yes, the model uses nn.Flatten() to reshape the 28x28 pixel image into a 1D array of 784 elements for processing by the fully connected layers.
- questions:
question: >
Will the model make random predictions if it’s run before training?
answers:
- "Yes"
- "No"
correct_answer: 1
explanation: >
Yes. Before training, the model produces random outputs because the network has not yet learned to recognize any patterns in the data.
- questions:
question: >
Which loss function was used to train the PyTorch model on the MNIST dataset?
answers:
- Mean Squared Error Loss
- CrossEntropyLoss
- Hinge Loss
- Binary Cross-Entropy Loss
correct_answer: 2
explanation: >
The CrossEntropyLoss function was used to train the model because it is suitable for multi-class classification tasks like digit classification. It measures the difference between the predicted probabilities and the true class labels, helping the model learn to make accurate predictions.

# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
title: "Review" # Always the same title
weight: 20 # Set to always be larger than the content in this path
layout: "learningpathall" # All files under learning paths have this same wrapper
---
@@ -0,0 +1,177 @@
---
# User change
title: "Datasets and training"

weight: 5

layout: "learningpathall"
---

Start by downloading the MNIST dataset. Proceed as follows:

1. Open the `pytorch-digits.ipynb` notebook you created earlier.

2. Add the following statements:

```python
from torchvision import transforms, datasets
from torch.utils.data import DataLoader

# Training data
training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=transforms.ToTensor()
)

# Test data
test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=transforms.ToTensor()
)

# Dataloaders
batch_size = 32

train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)
```

The above code snippet downloads the MNIST dataset, transforms the images into tensors, and sets up data loaders for training and testing. Specifically, the `datasets.MNIST` class downloads the MNIST dataset, with `train=True` selecting the training data and `train=False` selecting the test data. The `transform=transforms.ToTensor()` argument converts each image in the dataset into a PyTorch tensor, which is necessary for model training and evaluation.

The `DataLoader` wraps each dataset and loads the data efficiently in batches. It can also shuffle the data and load it in parallel worker processes when you set the `shuffle` and `num_workers` arguments. This example leaves both at their defaults, so the data is read in order in the main process. Here, `train_dataloader` and `test_dataloader` are created with a `batch_size` of 32, meaning they load 32 images per batch during training and testing.

This setup prepares the training and test datasets for use in a machine learning model, enabling efficient data handling and model training in PyTorch.

To download the dataset, you might need to install the `certifi` package:

```console
pip install certifi
```

The certifi Python package provides the Mozilla root certificates, which are essential for ensuring the SSL connections are secure. If you’re using macOS, you may also need to install the certificates by running:

```console
/Applications/Python\ 3.x/Install\ Certificates.command
```

Replace `x` with the minor version number of the Python release you have installed.

After running the code, you will see output similar to the following:

![image](Figures/01.png)
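As a quick check on the batching described earlier, the sketch below works out how many batches each data loader yields per pass. It assumes the published MNIST split sizes (60,000 training and 10,000 test images) and the `DataLoader` default of keeping the final partial batch. The variable names are illustrative only.

```python
import math

# Published MNIST split sizes: 60,000 training and 10,000 test images.
train_size = 60_000
test_size = 10_000
batch_size = 32  # same value as in the data loader setup

# DataLoader keeps the final partial batch by default (drop_last=False),
# so the batch count is the ceiling of size / batch_size.
train_batches = math.ceil(train_size / batch_size)
test_batches = math.ceil(test_size / batch_size)

# Size of the last (partial) test batch.
last_test_batch = test_size - (test_batches - 1) * batch_size

print(train_batches, test_batches, last_test_batch)  # 1875 313 16
```

With the real data loaders, `len(train_dataloader)` and `len(test_dataloader)` should report the same batch counts.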

# Train the model

To train the model, specify the loss function and the optimizer:

```Python
learning_rate = 1e-3

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
```

This code uses `CrossEntropyLoss` as the loss function and the Adam optimizer for training, with the learning rate set to 1e-3.
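To make the loss function more concrete, here is a minimal sketch of what `nn.CrossEntropyLoss` computes. The scores and labels are made-up values for two images and three classes, not output from the model in this Learning Path. The sketch compares the built-in loss with the same quantity computed by hand as the mean negative log-softmax of the true class.

```python
import torch
from torch import nn

# Hypothetical raw scores for 2 images over 3 classes (illustrative values).
scores = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 3.0]])
# True class indices for the two images.
targets = torch.tensor([0, 2])

loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(scores, targets)

# The same value computed by hand: mean negative log-softmax of the true class.
log_probs = torch.log_softmax(scores, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()

print(loss.item(), manual.item())
```

The two printed numbers should agree up to floating-point rounding. The loss is small when the true class has the highest score and grows as the score of the true class falls behind the others.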

Next, define the functions for training and evaluating the feedforward neural network:

```Python
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (x, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(x)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    with torch.no_grad():
        for x, y in dataloader:
            pred = model(x)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size

    print(f"Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
```

The first function, `train_loop`, uses the backpropagation algorithm to optimize the trainable parameters and minimize the prediction error of the neural network. The second function, `test_loop`, calculates the neural network error using the test images and displays the accuracy and loss values.

You can now call `train_loop` and `test_loop` to train and evaluate the model for 10 epochs.

```Python
epochs = 10

for t in range(epochs):
    print(f"Epoch {t+1}:")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
```

After running this code, you will see output that shows the training progress:

![image](Figures/02.png)

Once the training is complete, you will see something like the following:

```output
Epoch 10:
Accuracy: 95.4%, Avg loss: 1.507491
```

This output shows that the model achieved around 95% accuracy on the test images.
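The accuracy figure comes from the `pred.argmax(1) == y` comparison in `test_loop`. The sketch below repeats that calculation on a tiny, made-up batch of four predictions over three classes so that you can follow it by hand. The tensor values are illustrative and are not taken from the trained model.

```python
import torch

# Hypothetical model outputs for 4 images over 3 classes (illustrative values).
pred = torch.tensor([[0.1, 0.8, 0.1],
                     [0.7, 0.2, 0.1],
                     [0.2, 0.2, 0.6],
                     [0.5, 0.4, 0.1]])
# True labels for the four images.
y = torch.tensor([1, 0, 2, 1])

# argmax(1) picks the highest-scoring class in each row.
predicted_classes = pred.argmax(1)  # tensor([1, 0, 2, 0])

# Count matches the same way test_loop does, then divide by the sample count.
correct = (predicted_classes == y).type(torch.float).sum().item()
accuracy = correct / len(y)

print(f"Accuracy: {100 * accuracy:.1f}%")  # Accuracy: 75.0%
```

Three of the four predicted classes match the labels, so the accuracy is 75%.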

# Save the model

Once the model is trained, you can save it. There are various approaches for this. In PyTorch, you can save both the model’s structure and its weights to the same file using the `torch.save()` function. Alternatively, you can save only the weights (parameters) of the model, not the model architecture itself. This requires you to have the model’s architecture defined separately when loading. To save the model weights, you can use the following command:

```Python
torch.save(model.state_dict(), "model_weights.pth")
```

However, in neither case does PyTorch save the definition of the class itself. When you load the model using `torch.load()`, PyTorch needs access to the class definition to recreate the model object.

Therefore, when you later want to use the saved model for inference, you will need to provide the definition of the model class.
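The following sketch illustrates the weights-only round trip. `TinyNet` is a hypothetical stand-in for the model class defined earlier in this Learning Path, and the file is written to a temporary directory so that the example leaves nothing behind. The point is that loading needs the class definition: you create a fresh instance and then fill in the saved weights.

```python
import os
import tempfile

import torch
from torch import nn


# Hypothetical stand-in for the model class defined earlier in the notebook.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear = nn.Linear(28 * 28, 10)

    def forward(self, x):
        return self.linear(self.flatten(x))


model = TinyNet()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_weights.pth")
    torch.save(model.state_dict(), path)

    # Loading requires the class: create the model, then fill in the weights.
    restored = TinyNet()
    restored.load_state_dict(torch.load(path))

restored.eval()
x = torch.rand(1, 1, 28, 28)
with torch.no_grad():
    same = torch.allclose(model(x), restored(x))

print(same)  # True when the restored weights match the originals
```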

Alternatively, you can use TorchScript, which serializes both the architecture and weights into a single file that can be loaded without needing the original class definition. This is particularly useful for deploying models to production or sharing models without code dependencies.

Use TorchScript to save the model using the following commands:

```Python
# Set model to evaluation mode
model.eval()

# Trace the model with an example input
traced_model = torch.jit.trace(model, torch.rand(1, 1, 28, 28))

# Save the traced model
traced_model.save("model.pth")
```

The above commands set the model to evaluation mode, trace the model, and save it. Tracing is useful for converting models with static computation graphs to TorchScript, making them portable and independent of the original class definition.
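As a rough illustration of that independence from the class definition, the sketch below traces a small stand-in model, saves it to a temporary directory, and reloads it with `torch.jit.load`. The `nn.Sequential` model is hypothetical and much simpler than the one trained in this Learning Path.

```python
import os
import tempfile

import torch
from torch import nn

# Hypothetical stand-in for the trained model.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
model.eval()

# Trace the model with an example input of the MNIST image shape.
example = torch.rand(1, 1, 28, 28)
traced_model = torch.jit.trace(model, example)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pth")
    traced_model.save(path)
    # torch.jit.load rebuilds the module from the file alone.
    loaded = torch.jit.load(path)

with torch.no_grad():
    matches = torch.allclose(model(example), loaded(example))

print(matches)  # True when the reloaded module reproduces the original output
```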

Setting the model to evaluation mode before tracing is important for several reasons:

1. Behavior of layers such as Dropout and BatchNorm:
    * Dropout. During training, dropout randomly zeroes out some of the activations to prevent overfitting. During evaluation, dropout is turned off and all activations are used.
    * BatchNorm. During training, batch normalization layers normalize the input using the statistics of the current batch. During evaluation, they use the running averages calculated during training.

2. Consistent inference behavior. By setting the model to evaluation mode, you ensure that the traced model behaves consistently during inference, because it does not use dropout or batch statistics that are inappropriate for inference.

3. Correct tracing. Tracing captures the operations performed by the model for a given input. If the model is in training mode, the traced graph may include operations related to dropout and batch normalization updates. These operations can affect the correctness and performance of the model during inference.
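The difference between the two modes is easy to see on a single dropout layer. The sketch below is a standalone illustration, not part of the model from this Learning Path, and uses a dropout probability of 0.5 chosen only for the example.

```python
import torch
from torch import nn

dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

# Training mode: each element is zeroed with probability p,
# and the survivors are scaled by 1 / (1 - p) = 2.
dropout.train()
train_out = dropout(x)

# Evaluation mode: dropout is disabled and the input passes through unchanged.
dropout.eval()
eval_out = dropout(x)

print(train_out)  # a random mix of 0.0 and 2.0
print(eval_out)   # all ones
```

Because the training-mode output changes from call to call, tracing a model in training mode would not give the stable behavior you want for inference.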

In the next step, you will use the saved model for inference.