How do you keep your model up to date as the data and the code used during training change? What about automatic builds and deployments?
We can apply the same DevOps principles and ideas to managing our machine learning model as we do to our code. For demonstration purposes we'll be using Azure DevOps.
Getting started
Create an Azure DevOps account
Feel free to skip this section if you already have an account.
- Navigate to Azure DevOps
- Click on Start free
- Follow the provided instructions to create a free account
Fork repository
- In the top right corner of this repo, click Fork
- Choose to fork this repository to your own GitHub account
Remove existing YML file
- After the repository has been forked, remove the existing azure-pipelines.yml file
Set up Continuous Integration
- Navigate to Azure DevOps
- Click on New Project in the top-right corner
- Give the new project a name, e.g. fraud-detection
- In the menu to the left under Pipelines, click on Builds and then New pipeline
- In the list, select GitHub
- You may be asked to sign in to your GitHub account for authentication
- In the list of repositories, select the new repository you just forked
- Click on Approve and Install to install Azure Pipelines in the forked repository
- Select Starter pipeline
- Let's make some changes to the default YML file.
10.1. Change the VM image to
pool:
  vmImage: 'windows-latest'
10.2. Add a variables section (after the vmImage)
variables:
  buildConfiguration: 'Release'
10.3. Replace the current steps with
- script: dotnet build src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Build Trainer Console App (dotnet build) $(buildConfiguration)'
- script: dotnet run --project src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Train ML model (dotnet run)'
The steps above build and run the console application used to train our model.
Your YML file should now look like
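For reference, combining steps 10.1 to 10.3 gives a file along these lines. Treat it as a sketch: the trigger block is assumed to be the one generated by the Starter pipeline template, and the branch name in it is a guess that should match the default branch of your fork.

```yaml
# Assumption: trigger block left over from the Starter pipeline template.
# Replace the branch name with your fork's default branch if it differs.
trigger:
- master

pool:
  vmImage: 'windows-latest'

variables:
  buildConfiguration: 'Release'

steps:
# Build the console application that trains the model
- script: dotnet build src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Build Trainer Console App (dotnet build) $(buildConfiguration)'

# Run the console application, which trains the model
- script: dotnet run --project src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Train ML model (dotnet run)'
```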
- In the top-right corner, click Save and Run
If you have a look at the completed build, you'll see that it failed. This is because the console application cannot find the data.csv file used during training. For smaller data sources, it may make sense to include them in the repository. For any file larger than 100 MB (as in our case), we can instead store the data in an Azure file share and mount the share as a separate step in the build. Let's have a look at how this can be done.
- Navigate to the Azure portal
- Navigate to a previously created storage account (in part 2)
- In the storage account, select File shares
- In the top-middle, click on + File share
- Give the file share a name, e.g. data
- Click Create
As the current data source is larger than 500 MB, we'll use only a small portion of the data for demonstration purposes. This will speed up the build process.
- Upload the following file to your newly created file share
- Navigate back to Azure DevOps
- If you're not already in your YML file, click the Edit button in the top-right corner to edit your build pipeline
In your YML file, add the snippet below as the first step (make sure to replace both occurrences of the nameofyourstorageaccount placeholder with the name of your storage account)
- script: 'net use X: \\nameofyourstorageaccount.file.core.windows.net\data /u:nameofyourstorageaccount $(filestorage.key)'
  displayName: 'Map disk drive to Azure Files share folder'
- Replace the variables section with:
variables:
- group: fraud-detection
- name: buildConfiguration
  value: 'Release'
- Click Save
Your YML file should now look like:
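With the drive-mapping step and the variable group in place, the file would look roughly as follows. The trigger block and its branch name are assumptions carried over from the Starter pipeline template, and nameofyourstorageaccount is still a placeholder for your own storage account name.

```yaml
# Assumption: trigger block left over from the Starter pipeline template.
trigger:
- master

pool:
  vmImage: 'windows-latest'

variables:
# Variable group that holds the secret filestorage.key
- group: fraud-detection
- name: buildConfiguration
  value: 'Release'

steps:
# Mount the Azure file share that contains data.csv as drive X:
- script: 'net use X: \\nameofyourstorageaccount.file.core.windows.net\data /u:nameofyourstorageaccount $(filestorage.key)'
  displayName: 'Map disk drive to Azure Files share folder'

- script: dotnet build src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Build Trainer Console App (dotnet build) $(buildConfiguration)'

- script: dotnet run --project src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Train ML model (dotnet run)'
```

Note that once a variables section references a group, every other variable has to use the name/value list form, which is why buildConfiguration is written differently here than in the first version of the file.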
The final missing piece is a variable holding the access key to your file share.
- In Azure DevOps, navigate to variable groups, by clicking on the Library menu item to the left
- Click on + Variable group
- Name the variable group fraud-detection
- Add a new variable called filestorage.key
- Set the value of the variable to the access key of your storage account. The access key can be found by navigating to your storage account and selecting the Access Keys menu option in the menu to the left
- Make sure to check the lock symbol to the right, so that the variable becomes a secret variable
- Click Save
To queue a new build, navigate to Pipelines -> Builds and click on the Queue button in the top-right corner. The build should now complete successfully in about two minutes.
Set up Continuous Delivery
We now have a continuous integration pipeline that trains a new model on every check-in to the repository. We can take this one step further and deploy MLModel.zip to our Azure storage account when the build completes.
- In Azure DevOps, click on Project Settings in the bottom-left corner
- In the menu that appears, click on Service Connections (under pipelines)
- Click on + New Service Connection and select Azure Resource Manager in the list
- In the modal that appears, give the connection the name of Azure and select your subscription
- Click OK to close the modal
- In Azure DevOps, navigate back to your build's YML file
- Copy/paste the following as the last step. Replace the placeholder with the name of your storage account
- task: AzureFileCopy@3
  inputs:
    SourcePath: 'X://MLModel.zip'
    azureSubscription: 'Azure'
    Destination: 'AzureBlob'
    storage: '{name-of-your-storage-account}'
    ContainerName: 'model'
- Click Save
Your YML file should now look like:
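Assembled from the snippets in this walkthrough, the pipeline with the deployment step appended might look like this. As before, the trigger block and its branch name are assumptions taken from the Starter pipeline template, and both storage account names are placeholders to replace with your own.

```yaml
# Assumption: trigger block left over from the Starter pipeline template.
trigger:
- master

pool:
  vmImage: 'windows-latest'

variables:
- group: fraud-detection
- name: buildConfiguration
  value: 'Release'

steps:
# Mount the Azure file share that contains data.csv as drive X:
- script: 'net use X: \\nameofyourstorageaccount.file.core.windows.net\data /u:nameofyourstorageaccount $(filestorage.key)'
  displayName: 'Map disk drive to Azure Files share folder'

- script: dotnet build src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Build Trainer Console App (dotnet build) $(buildConfiguration)'

- script: dotnet run --project src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Train ML model (dotnet run)'

# Copy the trained model to the 'model' blob container,
# using the service connection named 'Azure'
- task: AzureFileCopy@3
  inputs:
    SourcePath: 'X://MLModel.zip'
    azureSubscription: 'Azure'
    Destination: 'AzureBlob'
    storage: '{name-of-your-storage-account}'
    ContainerName: 'model'
```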
Great job! You've now successfully set up a CI/CD pipeline for your model. This pipeline can be further extended with triggers for changes in data, or with additional unit and integration tests to ensure the model performs as expected.
Add Automated Testing
Just as we add unit tests for our regular code base, we can add unit tests that check the predictions of our model. Let's have a look at how we can do just that.
- Open VS Code
- In VS Code, select Terminal -> New Terminal to open a new terminal window
- Navigate to
{location of forked repo}\mldotnet-real-time-data-streaming-workshop\src\machine-learning
- In the terminal, execute the following command to create a new test project
dotnet new nunit -o FraudPrediction.Tests
- In the terminal, execute the following command to navigate to the location of the new test project
cd FraudPrediction.Tests
- In the terminal, execute the following command to open the folder in VS Code
code . -r
- Open a new terminal and execute the following command to add the required NuGet packages
dotnet add package Microsoft.ML
- Open the project file called FraudPrediction.Tests.csproj
- Within ItemGroup, add a project reference to FraudPredictionTrainer.csproj
<ProjectReference Include="..\FraudPredictionTrainer\FraudPredictionTrainer.csproj" />
- Delete the default UnitTest1.cs class
- Add a new class called FraudPredictionTests.cs
- Copy the following to the new class
using NUnit.Framework;
using FraudPredictionTrainer;
using Microsoft.ML;

namespace FraudPrediction.Tests
{
    [TestFixture]
    public class FraudDetectionTests
    {
        private PredictionEngine<Transaction, FraudPrediction> predictionEngine;

        [SetUp]
        public void SetUp()
        {
            // Load the trained model from the mapped Azure file share (drive X:)
            var mlContext = new MLContext();
            var model = mlContext.Model.Load(@"X:\MLModel.zip", out _);
            predictionEngine = mlContext.Model.CreatePredictionEngine<Transaction, FraudPrediction>(model);
        }

        [Test]
        public void Predict_GivenNonFraudulentTransaction_ShouldReturnFalse()
        {
            // Arrange
            var transaction = new Transaction
            {
                Amount = 1500f,
                OldbalanceDest = 100,
                NewbalanceDest = 300,
                NameDest = "C123",
                NameOrig = "B123"
            };

            // Act
            var result = this.predictionEngine.Predict(transaction);

            // Assert (constraint syntax, which works in both NUnit 3 and NUnit 4)
            Assert.That(result.IsFraud, Is.False);
        }
    }
}
- Add a new class called FraudPrediction.cs
- Copy the following to the new class
using Microsoft.ML.Data;

namespace FraudPrediction.Tests
{
    public class FraudPrediction
    {
        [ColumnName("PredictedLabel")]
        public bool IsFraud { get; set; }

        [ColumnName("Score")]
        public float Score { get; set; }
    }
}
- In the terminal, execute the following command to build the solution
dotnet build
- To run the test, execute the following command in the terminal window
dotnet test
- Commit and push the changes to your repository
Congratulations! You've just created your first unit test to test your machine learning model. Let's see if we can integrate this test in our CI/CD pipeline.
- Navigate to Azure DevOps
- Open the build's YML file
- Copy/paste the following as the second-to-last step (before the copy to blob storage step)
- task: DotNetCoreCLI@2
  displayName: 'Run Unit Tests using trained ML model'
  inputs:
    command: test
    projects: 'src/machine-learning/FraudPrediction.Tests/FraudPrediction.Tests.csproj'
    arguments: '--configuration $(buildConfiguration)'
- Click Save and queue a new build to see the test run as part of the build
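With the test step slotted in before the blob copy, the finished pipeline could look something like the sketch below. The trigger block and its branch name are assumptions taken from the Starter pipeline template, and the storage account names are placeholders.

```yaml
# Assumption: trigger block left over from the Starter pipeline template.
trigger:
- master

pool:
  vmImage: 'windows-latest'

variables:
- group: fraud-detection
- name: buildConfiguration
  value: 'Release'

steps:
# 1. Mount the Azure file share that contains data.csv as drive X:
- script: 'net use X: \\nameofyourstorageaccount.file.core.windows.net\data /u:nameofyourstorageaccount $(filestorage.key)'
  displayName: 'Map disk drive to Azure Files share folder'

# 2. Build and run the trainer
- script: dotnet build src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Build Trainer Console App (dotnet build) $(buildConfiguration)'

- script: dotnet run --project src/machine-learning/FraudPredictionTrainer/FraudPredictionTrainer.csproj --configuration $(buildConfiguration)
  displayName: 'Train ML model (dotnet run)'

# 3. Test the freshly trained model before it is deployed
- task: DotNetCoreCLI@2
  displayName: 'Run Unit Tests using trained ML model'
  inputs:
    command: test
    projects: 'src/machine-learning/FraudPrediction.Tests/FraudPrediction.Tests.csproj'
    arguments: '--configuration $(buildConfiguration)'

# 4. Deploy the model to blob storage
- task: AzureFileCopy@3
  inputs:
    SourcePath: 'X://MLModel.zip'
    azureSubscription: 'Azure'
    Destination: 'AzureBlob'
    storage: '{name-of-your-storage-account}'
    ContainerName: 'model'
```

Placing the test step ahead of the copy step means a failing test stops the build, so a model that does not pass the test is not copied to blob storage.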