Original file line number Diff line number Diff line change
@@ -1,24 +1,23 @@
---
title: MLOps with Arm-hosted GitHub Runners
draft: true
title: Optimize MLOps with Arm-hosted GitHub Runners

cascade:
draft: true

minutes_to_complete: 30
minutes_to_complete: 60

who_is_this_for: This is an introductory topic for software developers interested in automation for machine learning (ML) tasks.
who_is_this_for: This is an introductory topic for software developers interested in automation for Machine Learning (ML) tasks.

learning_objectives:
- Set up an Arm-hosted GitHub runner.
- Train and test a PyTorch ML model with the German Traffic Sign Recognition Benchmark (GTSRB) dataset.
- Use PyTorch compiled with OpenBLAS and oneDNN with Arm Compute Library to compare the performance of a trained model.
- Containerize the model and push the container to DockerHub.
- Automate all the steps in the ML workflow using GitHub Actions.
- Compare the inference performance of a trained PyTorch model using two PyTorch builds: one compiled with OpenBLAS (Open Basic Linear Algebra Subprograms), and the other compiled with oneDNN (oneAPI Deep Neural Network Library) and Arm Compute Library (ACL).
- Containerize an ML model and push the container to Docker Hub.
- Automate steps in an ML workflow using GitHub Actions.

prerequisites:
- A GitHub account with access to Arm-hosted GitHub runners.
- A Docker Hub account for storing container images.
- Some familiarity with ML and continuous integration and deployment (CI/CD) concepts.
- Familiarity with the concepts of ML and continuous integration and deployment (CI/CD).

author_primary: Pareena Verma, Annie Tallund

@@ -8,29 +8,29 @@ review:
- "No"
correct_answer: 1
explanation: >
Arm-hosted runners for use with GitHub Actions are available for Linux and Windows.
You can use Arm-hosted runners with GitHub Actions, and they are available for both Linux and Windows.

- questions:
question: >
What is the GTSRB dataset made up of?
What does the GTSRB dataset consist of?
answers:
- Sound files of spoken German words
- Sound files of animal sounds
- Images of flower petals
- Images of German traffic signs
- Sound files of spoken German words.
- Sound files of animal sounds.
- Images of flower petals.
- Images of German traffic signs.
correct_answer: 4
explanation: >
GTSRB stands for German Traffic Signs Recognition Benchmark
GTSRB stands for German Traffic Sign Recognition Benchmark, and the dataset consists of images of German traffic signs.

- questions:
question: >
ACL is included in PyTorch.
True or false: ACL is included in PyTorch.
answers:
- "True"
- "False"
correct_answer: 1
explanation: >
While it is possible to use ACL stand-alone, the optimized kernels are built into PyTorch through the oneDNN backend.
While it is possible to use Arm Compute Library independently, the optimized kernels are built into PyTorch through the oneDNN backend.



@@ -10,33 +10,43 @@ layout: learningpathall

In this Learning Path, you will learn how to automate an MLOps workflow using Arm-hosted GitHub runners and GitHub Actions.

You will learn how to do the following tasks:
You will perform the following tasks:
- Train and test a neural network model with PyTorch.
- Compare the model inference time using two different PyTorch backends.
- Containerize the model and save it to Docker Hub.
- Deploy the container image and use API calls to access the model.

## GitHub Actions

GitHub Actions is a platform that automates software development workflows, including continuous integration and continuous delivery. Every repository on GitHub has an `Actions` tab as shown below:
GitHub Actions is a platform that automates software development workflows, including continuous integration and continuous delivery (CI/CD).

Every repository on GitHub has an **Actions** tab as shown below:

![#actions-gui](images/actions-gui.png)

GitHub Actions runs workflow files to automate processes. Workflows run when specific events occur in a GitHub repository.

Workflows are defined in [YAML](https://yaml.org/) files.

Workflows specify how a job is triggered, the running environment, and the commands to run.
Workflows specify:

* How a job is triggered.
* The running environment.
* The commands to run.

The machine running workflows is called a _runner_.
The machine running the workflows is called a _runner_.

## Arm-hosted GitHub runners

Hosted GitHub runners are provided by GitHub so you don't need to setup and manage cloud infrastructure. Arm-hosted GitHub runners use the Arm architecture so you can build and test software without cross-compiling or instruction emulation.
Hosted GitHub runners are provided by GitHub, so you do not need to set up and manage cloud infrastructure. Arm-hosted GitHub runners use the Arm architecture, so you can build and test software without cross-compiling or instruction emulation.

Arm-hosted GitHub runners enable you to:

Arm-hosted GitHub runners enable you to optimize your workflows, reduce cost, and improve energy consumption.
* Optimize your workflows.
* Reduce cost.
* Improve energy efficiency.

Additionally, the Arm-hosted runners are preloaded with essential tools, making it easier for you to develop and test your applications.
Additionally, the Arm-hosted runners are preloaded with essential tools, which makes it easier for you to develop and test your applications.

Arm-hosted runners are available for Linux and Windows. This Learning Path uses Linux.

Expand Down Expand Up @@ -66,22 +76,22 @@ jobs:

## Machine Learning Operations (MLOps)

Machine learning use-cases have a need for reliable workflows to maintain performance and quality.
Machine learning use cases require reliable workflows to maintain both performance and quality of output.

There are many tasks that can be automated in the ML lifecycle.
- Model training and re-training
- Model performance analysis
- Data storage and processing
- Model deployment
There are tasks that can be automated in the ML lifecycle, such as:
- Model training and retraining.
- Model performance analysis.
- Data storage and processing.
- Model deployment.

Developer Operations (DevOps) refers to good practices for collaboration and automation, including CI/CD. The domain-specific needs for ML, combined with DevOps knowledge, creates the new term MLOps.
DevOps (development and operations) refers to good practices for collaboration and automation, including CI/CD. MLOps describes the area of practice where ML application development intersects with ML system deployment and operations.

## German Traffic Sign Recognition Benchmark (GTSRB)

This Learning Path explains how to train and test a PyTorch model to perform traffic sign recognition.

You will learn how to use the GTSRB dataset to train the model. The dataset is free to use under the [Creative Commons](https://creativecommons.org/publicdomain/zero/1.0/) license. It contains thousands of images of traffic signs found in Germany. It has become a well-known resource to showcase ML applications.

The GTSRB dataset is also good for comparing performance and accuracy of different models and to compare and contrast different PyTorch backends.
The GTSRB dataset is also effective for comparing the performance and accuracy of different models and of different PyTorch backends.

Continue to the next section to learn how to setup an end-to-end MLOps workflow using Arm-hosted GitHub runners.
Continue to the next section to learn how to set up an end-to-end MLOps workflow using Arm-hosted GitHub runners.
@@ -14,7 +14,7 @@ In this section, you will change the PyTorch backend being used to test the trai

In the previous section, you used the PyTorch 2.3.0 Docker image compiled with OpenBLAS from Docker Hub to run your testing workflow. PyTorch can be run with other backends. You will now modify the testing workflow to use the PyTorch 2.3.0 Docker image compiled with oneDNN and the Arm Compute Library.

The [Arm Compute Library](https://github.com/ARM-software/ComputeLibrary) is a collection of low-level machine learning functions optimized for Arm's Cortex-A and Neoverse processors and Mali GPUs. Arm-hosted GitHub runners use Arm Neoverse CPUs, which make it possible to optimize your neural networks to take advantage of processor features. ACL implements kernels (also known as operators or layers), using specific instructions that run faster on AArch64.
The [Arm Compute Library](https://github.com/ARM-software/ComputeLibrary) is a collection of low-level machine learning functions optimized for Arm's Cortex-A and Neoverse processors and Mali GPUs. Arm-hosted GitHub runners use Arm Neoverse CPUs, which make it possible to optimize your neural networks to take advantage of processor features. ACL implements kernels, which are also known as operators or layers, using specific instructions that run faster on AArch64.

ACL is integrated into PyTorch through [oneDNN](https://github.com/oneapi-src/oneDNN), an open-source deep neural network library.
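
If you want to check whether a given PyTorch build includes the oneDNN backend, a quick check along the following lines can help. This is a minimal sketch and not part of the example repository: `describe_backends` is a hypothetical helper name. It relies on `torch.backends.mkldnn.is_available()` (oneDNN was formerly named MKL-DNN, which is why the PyTorch module keeps the older name). The check reports whether oneDNN support is built in; it does not by itself confirm that ACL kernels are being used.

```python
import importlib.util


def describe_backends():
    """Report whether PyTorch is installed and whether oneDNN is built in.

    Hypothetical helper for illustration; not part of the example repository.
    """
    # Avoid an ImportError on machines where PyTorch is not installed.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # PyTorch exposes oneDNN under its former name, MKL-DNN.
        "onednn_available": torch.backends.mkldnn.is_available(),
    }


print(describe_backends())
```

For more detail about how a build was configured, `torch.__config__.show()` returns the build configuration as a string.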

Expand Down Expand Up @@ -43,11 +43,11 @@ jobs:

### Run the test workflow

Trigger the **Test Model** job again by clicking the `Run workflow` button on the `Actions` tab.
Trigger the **Test Model** job again by clicking the **Run workflow** button on the **Actions** tab.

The test workflow starts running.

Navigate to the workflow run on the `Actions` tab, click into the job, and expand the **Run testing script** step.
Navigate to the workflow run on the **Actions** tab, click into the job, and expand the **Run testing script** step.

You should see a change in the performance results now that oneDNN and ACL kernels are being used.
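
To put a number on the difference between two backends, you can compare the latency measured in each run. The sketch below is a generic timing helper that uses only the Python standard library. It is not the profiling code used by the repository, which relies on the PyTorch profiler. The names `mean_latency_ms`, `speedup`, and `dummy_inference` are hypothetical, and `dummy_inference` stands in for a real model call.

```python
import statistics
import time


def mean_latency_ms(fn, warmup=3, runs=10):
    """Return the mean wall-clock time of fn() in milliseconds."""
    # Warm-up calls are excluded from the measurement.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


def speedup(baseline_ms, candidate_ms):
    """Return how many times faster the candidate is than the baseline."""
    if candidate_ms <= 0:
        raise ValueError("candidate_ms must be positive")
    return baseline_ms / candidate_ms


def dummy_inference():
    # Placeholder workload; replace with a real model call.
    return sum(i * i for i in range(10_000))


latency = mean_latency_ms(dummy_inference)
print(f"Mean latency: {latency:.3f} ms")
```

A speedup greater than 1.0 means that the candidate backend is faster than the baseline. For example, with illustrative latencies of 20.0 ms and 10.0 ms, `speedup(20.0, 10.0)` returns `2.0`.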

@@ -6,30 +6,29 @@ weight: 3
layout: learningpathall
---

In this section, you will fork the example GitHub repository containing the project code and inspect the Python code for training and testing a neural network model.

## Fork the example repository

Get started by forking the example repository.
In this section, you will fork the example GitHub repository containing the project code.

In a web browser, navigate to the repository at:
Get started by forking the example repository. In a web browser, navigate to the repository at:

```bash
https://github.com/Arm-Labs/gh_armrunner_mlops_gtsrb
```

Fork the repository, using the `Fork` button:
Fork the repository, using the **Fork** button:

![#fork](/images/fork.png)

Create a fork within a GitHub Organization or Team where you have access to Arm-hosted GitHub runners.

{{% notice Note %}}
If a repository with the same name `gh_armrunner_mlops_gtsrb` already exists in your Organization or Team you modify the repository name to make it unique.
If a repository with the same name `gh_armrunner_mlops_gtsrb` already exists in your Organization or Team, you can modify the repository name to make it unique.
{{% /notice %}}

## Learn about model training and testing

In this section, you will inspect the Python code for training and testing a neural network model.

Explore the repository using a browser to get familiar with the code and the workflow files.

{{% notice Note %}}
@@ -42,13 +41,13 @@ The purpose is to provide an overview of the code used for training and testing

In the `scripts` directory, there is a Python script called `train_model.py`. This script loads the GTSRB dataset, defines a neural network, and trains the model on the dataset.

#### Data pre-processing
#### Data preprocessing

The first section loads the GTSRB dataset to prepare it for training. The GTSRB dataset is built into `torchvision`, which makes loading easier.

The transformations used when loading data are part of the pre-processing step, which makes the data uniform and ready to run through the extensive math operations of the ML model.
The transformations used when loading data are part of the preprocessing step, which makes the data uniform and ready to run through the extensive math operations of the ML model.
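
As an illustration of what a preprocessing transformation does to the data, the sketch below scales 8-bit pixel values to the range 0 to 1 and then normalizes them with a mean and standard deviation. It uses plain Python so that the arithmetic is visible. The mean and standard deviation of 0.5 are assumed values for illustration, not necessarily the ones used in `train_model.py`, and `normalize_pixels` is a hypothetical helper.

```python
def normalize_pixels(pixels, mean=0.5, std=0.5):
    """Scale 8-bit pixel values to [0, 1], then apply (x - mean) / std.

    The default mean and std are illustrative assumptions.
    """
    if std == 0:
        raise ValueError("std must be non-zero")
    return [((p / 255.0) - mean) / std for p in pixels]


# The darkest and brightest 8-bit values map to -1.0 and 1.0.
print(normalize_pixels([0, 255]))  # [-1.0, 1.0]
```

With these assumed parameters, every pixel ends up in the range -1 to 1, which keeps the inputs on a uniform scale.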

In accordance with common machine learning practices, data is separated into training and testing data to avoid over-fitting the neural network.
In accordance with common machine learning practices, data is separated into training and testing data to avoid overfitting the neural network.

Here is the code to load the dataset:

@@ -67,9 +66,9 @@ train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

The next step is to define a class for the model, listing the layers used.

The model defines the forward-pass function used at training time to update the weights. Additionally, the loss function and optimizer for the model are defined.
The model defines the forward pass function, which computes the predictions used at training time to calculate the loss and update the weights. Additionally, the loss function and optimizer for the model are defined.

Here is the code defining the model:
Here is the code that defines the model:

```python
class TrafficSignNet(nn.Module):
Expand Down Expand Up @@ -167,7 +166,7 @@ test_loader = DataLoader(test_set, batch_size=64, shuffle=False)

The testing loop passes each batch of test data through the model and compares predictions to the actual labels to calculate accuracy.

The accuracy is calculated as a percentage of correctly classified images. Both the accuracy and PyTorch profiler report is printed at the end of the script.
The accuracy is calculated as the percentage of correctly classified images. Both the accuracy and the PyTorch profiler report are printed at the end of the script.

Here is the testing loop with profiling:

@@ -135,23 +135,23 @@ The `test-model.yml` file needs to be edited to be able to use the saved model f

Complete the steps below to modify the testing workflow file:

1. Navigate to the `Actions` tab on your GitHub repository.
1. Navigate to the **Actions** tab on your GitHub repository.

2. Click on `Train Model` on the left side of the page.
2. Click on **Train Model** on the left side of the page.

3. Click on the completed `Train Model` workflow.
3. Click on the completed **Train Model** workflow.

4. Copy the The 11 digit ID number from the end of the URL in your browser address bar.
4. Copy the 11-digit ID number from the end of the URL in your browser address bar.

![#run-id](/images/run-id.png)

5. Navigate back to the `Code` tab and open the file `.github/workflows/test-model.yml`.
5. Navigate back to the **Code** tab and open the file `.github/workflows/test-model.yml`.

6. Click the Edit button, represented by a pencil on the top right of the file contents.

7. Update the `run-id` parameter with the 11-digit ID number you copied.

8. Save the file by clicking the `Commit changes` button.
8. Save the file by clicking the **Commit changes** button.


#### Run the workflow file
@@ -160,7 +160,7 @@ You are now ready to run the **Test Model** workflow.

1. Navigate to the **Actions** tab and select the **Test Model** workflow on the left side.

2. Click the `Run workflow` button to run the workflow on the main branch.
2. Click the **Run workflow** button to run the workflow on the main branch.

![#run-workflow](images/run-workflow.png)

@@ -170,7 +170,7 @@ Click on the workflow to view the output from each step.

![Actions_test](/images/actions_test.png)

Click on the "Run testing script" step to see the accuracy of the model and a table of the results from the PyTorch profiler.
Click on the **Run testing script** step to see the accuracy of the model and a table of the results from the PyTorch profiler.

The output is similar to:
