ML.NET + Azure DevOps = MLOps

How do you keep your model up to date as the data and the code used during training change? What about automatic builds and deployments?

We can apply the same DevOps principles and ideas to managing our machine learning model as we do to our code. For demonstration purposes we'll be using Azure DevOps.

1. Getting started

Create an Azure DevOps account

Feel free to skip this section if you already have an account.

Navigate to Azure DevOps
Click on Start free
Follow the provided instructions to create a free account

Fork repository

In the top-right corner of this repo, click Fork
Choose to fork the repository to your own GitHub account

Remove existing YML file

After the repository has been forked, remove the existing azure-pipelines.yml file

2. Set up Continuous Integration

1. Navigate to Azure DevOps
2. Click on New Project in the top-right corner
3. Give the new project a name, e.g. fraud-detection
4. In the menu to the left under Pipelines, click on Builds and then New pipeline
5. In the list, select GitHub
6. You may be asked to enter your GitHub account for authentication
7. In the list of repositories, select the repository you just forked
8. Click on Approve and Install to install Azure Pipelines in the forked repository
9. Select Starter pipeline
10. Let's make some changes to the default YML file.

10.1. Change the VM image to:

pool:
  vmImage: 'windows-latest'

10.2. Add a variables section (after the pool section):

variables:
  buildConfiguration: 'Release'

10.3. Replace the current steps with:

- script: dotnet build src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Build Trainer Console App (dotnet build) $(buildConfiguration)'

- script: dotnet run --project src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Train ML model (dotnet run)'

The steps above build and run the console application used to train our model.

Your YML file should now look like:
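Putting changes 10.1 to 10.3 together gives roughly the following. This is a sketch assembled from the snippets above, not a copy of the original file. It assumes that the `steps:` key from the starter pipeline is kept as the parent of the two script steps, and that any trigger section the starter pipeline generated is left untouched above the `pool` section.

```yaml
# Sketch of azure-pipelines.yml after changes 10.1 to 10.3.
# Assumption: the trigger section generated by the starter pipeline (if any) stays above this.

pool:
  vmImage: 'windows-latest'

variables:
  buildConfiguration: 'Release'

steps:
# Build the console application that trains the model
- script: dotnet build src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Build Trainer Console App (dotnet build) $(buildConfiguration)'

# Run the console application, which trains the model
- script: dotnet run --project src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Train ML model (dotnet run)'
```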

In the top-right corner, click Save and Run

If you look at the completed build, you'll see that it failed because the console application cannot find the data.csv file used during training. For smaller data sources, it may make sense to include them in the repository. For a file larger than 100 MB (as in our case), we can instead store the data in an Azure file share and mount the share as a separate step in the build. Let's have a look at how this can be done.

2.1. Create an Azure File Share

Navigate to the Azure portal
Navigate to the storage account you created in part 2
In the storage account, select File shares
In the top-middle, click on + File share
Give the file share a name, e.g. data
Click Create

As the current data source is larger than 500 MB, we'll only use a small portion of the data for demonstration purposes. This will speed up the build process.

Upload the following file to your newly created file share

2.2. Mount the Azure File Share as part of the build

Navigate back to Azure DevOps
If you're not already in your YML file, click the Edit button in the top-right corner to edit your build pipeline

In your YML file, add the snippet below as the first step (make sure to replace both placeholders with the name of your storage account)

- script: 'net use X: \\nameofyourstorageaccount.file.core.windows.net\data /u:nameofyourstorageaccount $(filestorage.key)'
  displayName: 'Map disk drive to Azure Files share folder'

Replace the variables section with:

variables:
- group: fraud-detection
- name: buildConfiguration
  value: 'Release'

Click Save

Your YML file should now look like:
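With the mount step and the new variables section in place, the file is roughly as follows. This is a sketch assembled from the snippets in this guide. It assumes the `steps:` key from the starter pipeline is kept, and `nameofyourstorageaccount` remains a placeholder to replace with your own storage account name.

```yaml
# Sketch of azure-pipelines.yml after adding the file share mount.
# Assumption: the trigger section generated by the starter pipeline (if any) stays above this.

pool:
  vmImage: 'windows-latest'

variables:
- group: fraud-detection        # variable group that holds the secret filestorage.key
- name: buildConfiguration
  value: 'Release'

steps:
# Map the Azure file share to drive X: so the trainer can read the data
- script: 'net use X: \\nameofyourstorageaccount.file.core.windows.net\data /u:nameofyourstorageaccount $(filestorage.key)'
  displayName: 'Map disk drive to Azure Files share folder'

- script: dotnet build src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Build Trainer Console App (dotnet build) $(buildConfiguration)'

- script: dotnet run --project src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Train ML model (dotnet run)'
```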

The final missing piece is a variable that holds the access key for the storage account hosting your file share.

In Azure DevOps, navigate to variable groups by clicking on the Library menu item to the left
Click on + Variable group
Name the variable group fraud-detection
Add a new variable called filestorage.key
Set the value of the variable to the access key of your storage account. To find the access key, navigate to your storage account and select Access Keys in the menu to the left
Click the lock symbol to the right so that the variable becomes a secret variable
Click Save

To queue a new build, navigate to Pipelines -> Builds and click on the Queue button in the top-right corner. The build should now complete successfully in about two minutes.

3. Set up Continuous Delivery

We now have a continuous integration pipeline set up, in which a new model is trained each time a change is checked in to the repository. We can take this one step further and deploy the MLModel.zip to our Azure storage account when the build completes.

In Azure DevOps, click on Project Settings in the bottom-left corner
In the menu that appears, click on Service Connections (under Pipelines)
Click on + New Service Connection and select Azure Resource Manager in the list
In the modal that appears, name the connection Azure and select your subscription
Click OK to close the modal
In Azure DevOps, navigate back to your build's YML file
Copy/paste the following as the last step. Replace the placeholder with the name of your storage account

- task: AzureFileCopy@3
  inputs:
    SourcePath: 'X://MLModel.zip'
    azureSubscription: 'Azure'
    Destination: 'AzureBlob'
    storage: '{name-of-your-storage-account}'
    ContainerName: 'model'

Click Save

Your YML file should now look like:
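With the copy task appended, the complete file is roughly as follows. This is a sketch assembled from the snippets in this guide. It assumes the `steps:` key from the starter pipeline is kept, and both `nameofyourstorageaccount` and `{name-of-your-storage-account}` remain placeholders for your own storage account name. Note that `inputs` is indented under the task, and the individual inputs are indented under `inputs`.

```yaml
# Sketch of azure-pipelines.yml after adding the deployment step.
# Assumption: the trigger section generated by the starter pipeline (if any) stays above this.

pool:
  vmImage: 'windows-latest'

variables:
- group: fraud-detection
- name: buildConfiguration
  value: 'Release'

steps:
- script: 'net use X: \\nameofyourstorageaccount.file.core.windows.net\data /u:nameofyourstorageaccount $(filestorage.key)'
  displayName: 'Map disk drive to Azure Files share folder'

- script: dotnet build src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Build Trainer Console App (dotnet build) $(buildConfiguration)'

- script: dotnet run --project src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Train ML model (dotnet run)'

# Copy the trained model from the mapped drive to the 'model' blob container
- task: AzureFileCopy@3
  inputs:
    SourcePath: 'X://MLModel.zip'
    azureSubscription: 'Azure'   # name of the service connection created above
    Destination: 'AzureBlob'
    storage: '{name-of-your-storage-account}'
    ContainerName: 'model'
```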

Great job! You've now successfully set up a CI/CD pipeline for your model. This pipeline can be further extended with triggers for changes in data, or with additional unit and integration tests to ensure the model performs as expected.
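One simple way to approximate a trigger for changes in data is a scheduled run that retrains the model against whatever data is currently in the file share. The snippet below is a minimal sketch using the `schedules` section of an Azure Pipelines YAML file. The cron expression, display name, and branch name are assumptions chosen for illustration and are not part of the original pipeline.

```yaml
# Hypothetical nightly retraining schedule, placed at the top level of azure-pipelines.yml.
schedules:
- cron: '0 3 * * *'              # every day at 03:00 UTC (assumed time)
  displayName: 'Nightly model retraining'
  branches:
    include:
    - master                     # assumption: replace with the default branch of your fork
  always: true                   # run even if the code has not changed since the last scheduled run
```

Setting `always: true` matters here because the data in the file share can change without any new commit to the repository.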

4. Add Automated Testing

Just as we add unit tests for our regular code base, we can add unit tests that check the performance of our model. Let's have a look at how we can do just that.

Open VS Code
In VS Code, select Terminal -> New Terminal to open a new terminal window
Navigate to {location of forked repo}\mldotnet-real-time-data-streaming-workshop\src\machine-learning
In the terminal, execute the following command to create a new test project: dotnet new nunit -o FraudPrediction.Tests
In the terminal, execute the following command to navigate to the location of the new test project: cd FraudPrediction.Tests
In the terminal, execute the following command to open the folder in VS Code: code . -r
Open a new terminal and execute the following command to add the required NuGet package:

dotnet add package Microsoft.ML

Open the project file called FraudPrediction.Tests.csproj
Within an ItemGroup element, add a project reference to FraudPredictionTrainer.csproj

    <ProjectReference Include="..\FraudPredictionTrainer\FraudPredictionTrainer.csproj" />

Delete the default UnitTest1.cs class
Add a new class called FraudPredictionTests.cs
Copy the following to the new class

using NUnit.Framework;
using FraudPredictionTrainer;
using Microsoft.ML;

namespace FraudPrediction.Tests
{
    [TestFixture]
    public class FraudDetectionTests
    {
        private PredictionEngine<Transaction, FraudPrediction> predictionEngine;

        [SetUp]
        public void SetUp()
        {
            // Load the trained model from the mapped X: drive and create a prediction engine
            var mlContext = new MLContext();
            var model = mlContext.Model.Load(@"X:\MLModel.zip", out _);
            predictionEngine = mlContext.Model.CreatePredictionEngine<Transaction, FraudPrediction>(model);
        }

        [Test]
        public void Predict_GivenNonFraudulentTransaction_ShouldReturnFalse()
        {
            // Arrange
            var transaction = new Transaction
            {
                Amount = 1500f,
                OldbalanceDest = 100,
                NewbalanceDest = 300,
                NameDest = "C123",
                NameOrig = "B123"
            };

            // Act
            var result = predictionEngine.Predict(transaction);

            // Assert (constraint syntax is available in both NUnit 3 and NUnit 4)
            Assert.That(result.IsFraud, Is.False);
        }
    }
}

Add a new class called FraudPrediction.cs
Copy the following to the new class

using Microsoft.ML.Data;

namespace FraudPrediction.Tests 
{
    public class FraudPrediction
    {
        [ColumnName("PredictedLabel")]
        public bool IsFraud { get; set; }

        [ColumnName("Score")]
        public float Score { get; set; }
    }
}

In the terminal, execute the following command to build the solution: dotnet build
To run the test, execute the following command in the terminal window: dotnet test
Commit and push the changes to your repository

Congratulations! You've just created your first unit test to test your machine learning model. Let's see if we can integrate this test in our CI/CD pipeline.

Navigate to Azure DevOps
Open the build's YML file
Copy/paste the following as the second-to-last step (before the copy to blob storage step)

- task: DotNetCoreCLI@2
  displayName: 'Run Unit Tests using trained ML model'
  inputs:
    command: test
    projects: 'src/machine-learning/FraudPrediction.Tests/FraudPrediction.Tests.csproj'
    arguments: '--configuration $(buildConfiguration)'

Click Save and queue a new build to see the test run as part of the build

Your YML file should now look like:
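With the test step inserted before the copy task, the complete file is roughly as follows. This is a sketch assembled from the snippets in this guide. It assumes the `steps:` key from the starter pipeline is kept, and the storage account names remain placeholders to replace with your own.

```yaml
# Sketch of the final azure-pipelines.yml.
# Assumption: the trigger section generated by the starter pipeline (if any) stays above this.

pool:
  vmImage: 'windows-latest'

variables:
- group: fraud-detection
- name: buildConfiguration
  value: 'Release'

steps:
- script: 'net use X: \\nameofyourstorageaccount.file.core.windows.net\data /u:nameofyourstorageaccount $(filestorage.key)'
  displayName: 'Map disk drive to Azure Files share folder'

- script: dotnet build src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Build Trainer Console App (dotnet build) $(buildConfiguration)'

- script: dotnet run --project src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Train ML model (dotnet run)'

# Test the freshly trained model before it is deployed
- task: DotNetCoreCLI@2
  displayName: 'Run Unit Tests using trained ML model'
  inputs:
    command: test
    projects: 'src/machine-learning/FraudPrediction.Tests/FraudPrediction.Tests.csproj'
    arguments: '--configuration $(buildConfiguration)'

- task: AzureFileCopy@3
  inputs:
    SourcePath: 'X://MLModel.zip'
    azureSubscription: 'Azure'
    Destination: 'AzureBlob'
    storage: '{name-of-your-storage-account}'
    ContainerName: 'model'
```

Because the test step runs before the copy step, a failing test stops the build and the model is not copied to blob storage.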
