
ML.NET + Azure DevOps = MLOps

How do you keep your model up to date as the data and the code used during training change? What about automatic builds and deployments?

We can apply the same DevOps principles and practices to managing our machine learning model as we do to our code. For demonstration purposes, we'll be using Azure DevOps.


1. Getting started


Create an Azure DevOps account

Feel free to skip this section if you already have an account.

  1. Navigate to Azure DevOps
  2. Click on Start free
  3. Follow the provided instructions to create a free account

Fork repository

  1. In the top-right corner of this repo, click Fork
  2. Choose your own GitHub account as the destination for the fork

Remove existing YML file

  1. After the repository has been forked, remove the existing azure-pipelines.yml file

2. Set up Continuous Integration


  1. Navigate to Azure DevOps
  2. Click on New Project in the top-right corner
  3. Give the new project a name, e.g. fraud-detection
  4. In the menu to the left under Pipelines, click on Builds and then New pipeline
  5. In the list of sources, select GitHub
  6. You may be asked to sign in to your GitHub account for authentication
  7. In the list of repositories, select the repository you just forked
  8. Click on Approve and Install to install Azure Pipelines in the forked repository
  9. Select Starter pipeline
  10. Let's make some changes to the default YML file.

10.1. Change the VM image to:

```yaml
pool:
  vmImage: 'windows-latest'
```

10.2. Add a variables section (after the vmImage):

```yaml
variables:
  buildConfiguration: 'Release'
```

10.3. Replace the current steps with:

```yaml
- script: dotnet build src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Build Trainer Console App (dotnet build) $(buildConfiguration)'

- script: dotnet run --project src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Train ML model (dotnet run)'
```

These steps build and run the console application used to train our model.

Your YML file should now reflect all three changes.

  11. In the top-right corner, click Save and Run

If you have a look at the completed build, you'll see that it failed. This is because the console application cannot find the data.csv file used during training. For smaller data sources, it may make sense to include them in the repository. For any file larger than 100 MB (as in our case), we can instead store the data in an Azure file share and mount the share as a separate step in the build. Let's have a look at how this can be done.

2.1. Create an Azure File Share
  1. Navigate to the Azure portal
  2. Navigate to a previously created storage account (in part 2)
  3. In the storage account, select File shares
  4. In the top-middle, click on + File share
  5. Give the file share a name, e.g. data
  6. Click Create
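Alternatively, the file share can be created from the command line with the Azure CLI. This is a sketch, not part of the workshop steps; the storage account name is a placeholder, and depending on your setup you may also need to supply an account key:

```
# Create a file share named 'data' (replace <storageaccount> with your storage account name)
az storage share create --name data --account-name <storageaccount>
```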

As the current data source is more than 500 MB, we'll only use a small portion of the total amount of data for demonstration purposes. This will speed up the build process.

  7. Upload the data file to your newly created file share

2.2. Mount the Azure File Share as part of the build
  1. Navigate back to Azure DevOps
  2. If you're not already in your YML file, click the Edit button in the top-right corner to edit your build pipeline
  3. In your YML file, add the snippet below as the first step (make sure to replace the two placeholders with the name of your storage account):

```yaml
- script: 'net use X: \\nameofyourstorageaccount.file.core.windows.net\data /u:nameofyourstorageaccount $(filestorage.key)'
  displayName: 'Map disk drive to Azure Files share folder'
```

  4. Replace the variables section with:

```yaml
variables:
- group: fraud-detection
- name: buildConfiguration
  value: 'Release'
```

  5. Click Save

Your YML file should now contain the mount step and the updated variables section.

The final missing piece is a variable holding the access key to your file share.

  1. In Azure DevOps, navigate to variable groups by clicking on the Library menu item to the left
  2. Click on + Variable group
  3. Name the variable group fraud-detection
  4. Add a new variable called filestorage.key
  5. Set the value of the variable to the access key of your storage account. The access key can be found by navigating to your storage account and selecting Access Keys in the menu to the left
  6. Make sure to check the lock symbol to the right, so that the variable becomes a secret variable
  7. Click Save
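If you prefer the command line, the access key can also be retrieved with the Azure CLI. This is a sketch; the storage account and resource group names are placeholders:

```
# Print the first access key of the storage account
az storage account keys list --account-name <storageaccount> --resource-group <resourcegroup> --query "[0].value" --output tsv
```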

To queue a new build, navigate to Pipelines -> Builds and click on the Queue button in the top-right corner. The build should now complete successfully in about two minutes.

3. Set up Continuous Delivery


We now have a continuous integration pipeline set up, in which a new model is trained each time a change is checked in to the repository. We can take this one step further and deploy the MLModel.zip to our Azure storage account when the build completes.

  1. In Azure DevOps, click on Project Settings in the bottom-left corner
  2. In the menu that appears, click on Service Connections (under Pipelines)
  3. Click on + New Service Connection and select Azure Resource Manager in the list
  4. In the modal that appears, give the connection the name Azure and select your subscription
  5. Click OK to close the modal
  6. In Azure DevOps, navigate back to your build's YML file
  7. Copy/paste the following as the last step. Replace the placeholder with the name of your storage account:

```yaml
- task: AzureFileCopy@3
  inputs:
    SourcePath: 'X:\MLModel.zip'
    azureSubscription: 'Azure'
    Destination: 'AzureBlob'
    storage: '{name-of-your-storage-account}'
    ContainerName: 'model'
```

  8. Click Save

Your YML file should now end with the AzureFileCopy task.

Great job! You've now successfully set up a CI/CD pipeline for your model. This pipeline can be further extended with triggers for changes in data, or with additional unit and integration tests to ensure the model performs as expected.
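As one sketch of such an extension, a scheduled trigger can retrain the model periodically even when no code has changed. The cron schedule and branch name below are assumptions, not part of the workshop; adjust them to your repository:

```yaml
# Retrain the model every Sunday at 03:00 UTC, even without new commits
schedules:
- cron: '0 3 * * 0'
  displayName: 'Weekly model retraining'
  branches:
    include:
    - master
  always: true
```

The `always: true` setting queues the run even when there are no code changes since the last successful run, which is what you want when the training data (rather than the code) is what changes.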

4. Add Automated Testing


In the same way as we can add unit tests to test our regular code base, we can add unit tests to test the performance of our model. Let's have a look at how we can do just that.

  1. Open VS Code
  2. In VS Code, select Terminal -> New Terminal to open a new terminal window
  3. Navigate to {location of forked repo}\mldotnet-real-time-data-streaming-workshop\src\machine-learning
  4. In the terminal, execute the following command to create a new test project: dotnet new nunit -o FraudPrediction.Tests
  5. In the terminal, execute the following command to navigate to the location of the new test project: cd FraudPrediction.Tests
  6. In the terminal, execute the following command to open the folder in VS Code: code . -r
  7. Open a new terminal and execute the following command to add the required NuGet package: dotnet add package Microsoft.ML
  8. Open the project file called FraudPrediction.Tests.csproj
  9. Within ItemGroup, add a project reference to FraudPredictionTrainer.csproj: <ProjectReference Include="..\FraudPredictionTrainer\FraudPredictionTrainer.csproj" />
  10. Delete the default UnitTest1.cs class
  11. Add a new class called FraudPredictionTests.cs
  12. Copy the following to the new class:
```csharp
using NUnit.Framework;
using FraudPredictionTrainer;
using Microsoft.ML;

namespace FraudPrediction.Tests
{
    [TestFixture]
    public class FraudPredictionTests
    {
        private PredictionEngine<Transaction, FraudPrediction> predictionEngine;

        [SetUp]
        public void SetUp()
        {
            var mlContext = new MLContext();
            var model = mlContext.Model.Load(@"X:\MLModel.zip", out _);
            this.predictionEngine = mlContext.Model.CreatePredictionEngine<Transaction, FraudPrediction>(model);
        }

        [Test]
        public void Predict_GivenNonFraudulentTransaction_ShouldReturnFalse()
        {
            // Arrange
            var transaction = new Transaction
            {
                Amount = 1500f,
                OldbalanceDest = 100,
                NewbalanceDest = 300,
                NameDest = "C123",
                NameOrig = "B123"
            };

            // Act
            var result = this.predictionEngine.Predict(transaction);

            // Assert
            Assert.IsFalse(result.IsFraud);
        }
    }
}
```
  13. Add a new class called FraudPrediction.cs
  14. Copy the following to the new class:
```csharp
using Microsoft.ML.Data;

namespace FraudPrediction.Tests
{
    public class FraudPrediction
    {
        [ColumnName("PredictedLabel")]
        public bool IsFraud { get; set; }

        [ColumnName("Score")]
        public float Score { get; set; }
    }
}
```
  15. In the terminal, execute the following command to build the solution: dotnet build
  16. To run the test, execute the following command in the terminal window: dotnet test
  17. Commit and push the changes to your repository

Congratulations! You've just created your first unit test for your machine learning model. Let's see if we can integrate this test into our CI/CD pipeline.

  1. Navigate to Azure DevOps
  2. Open the build's YML file
  3. Copy/paste the following as the second-to-last step (before the copy to blob storage step):

```yaml
- task: DotNetCoreCLI@2
  displayName: 'Run Unit Tests using trained ML model'
  inputs:
    command: test
    projects: 'src/machine-learning/FraudPrediction.Tests/FraudPrediction.Tests.csproj'
    arguments: '--configuration $(buildConfiguration)'
```

  4. Click Save and queue a new build to see the test run as part of the build

Your YML file should now contain all five steps: mount, build, train, test, and copy.
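Putting the pieces from all the sections together, the complete azure-pipelines.yml can be sketched roughly as follows. The trigger branch is an assumption (the starter pipeline's default branch may differ), and the storage account names are placeholders to replace with your own:

```yaml
# Assumed trigger branch; adjust to your repository's default branch
trigger:
- master

pool:
  vmImage: 'windows-latest'

variables:
- group: fraud-detection
- name: buildConfiguration
  value: 'Release'

steps:
# Mount the Azure file share holding the training data
- script: 'net use X: \\nameofyourstorageaccount.file.core.windows.net\data /u:nameofyourstorageaccount $(filestorage.key)'
  displayName: 'Map disk drive to Azure Files share folder'

# Build and run the trainer console application
- script: dotnet build src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Build Trainer Console App (dotnet build) $(buildConfiguration)'

- script: dotnet run --project src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Train ML model (dotnet run)'

# Verify the trained model before deploying it
- task: DotNetCoreCLI@2
  displayName: 'Run Unit Tests using trained ML model'
  inputs:
    command: test
    projects: 'src/machine-learning/FraudPrediction.Tests/FraudPrediction.Tests.csproj'
    arguments: '--configuration $(buildConfiguration)'

# Deploy the trained model to blob storage
- task: AzureFileCopy@3
  inputs:
    SourcePath: 'X:\MLModel.zip'
    azureSubscription: 'Azure'
    Destination: 'AzureBlob'
    storage: '{name-of-your-storage-account}'
    ContainerName: 'model'
```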