![Screenshot 2023-10-11 at 12 16 46](https://private-user-images.githubusercontent.com/74664634/277135759-5a413f01-1cb0-4400-b0a2-8ad913f743b1.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjEyNDkxMjksIm5iZiI6MTcyMTI0ODgyOSwicGF0aCI6Ii83NDY2NDYzNC8yNzcxMzU3NTktNWE0MTNmMDEtMWNiMC00NDAwLWIwYTItOGFkOTEzZjc0M2IxLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzE3VDIwNDAyOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTIxZjRkMzVlYzg0ZjQ4OWFmNzM1MjhkMzcwMTI5YWFlMTE5YTQ2ZTM3NTdkZTcxYzBhZWU3MzNjNDg4OTllZDYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.N_8e3N2OeuGIygv095VSeZZMp37WJScLe2EhEKoPbkI)
Tasks:
- Convert a notebook to production code
- Use an Azure Machine Learning job for automation
- Create a compute cluster
- Register a data asset
- Create a job that takes the data asset as input and the training script as a command
- Trigger Azure Machine Learning jobs with GitHub Actions
- Create a service principal using the Azure CLI (to authenticate GitHub to manage the Azure Machine Learning workspace)
- Store the Azure credentials in a GitHub secret
- Define a GitHub Action in YAML
- Trigger GitHub Actions with feature-based development
- Protect the main branch to block direct pushes to main
- Create a new branch
- Make a change and push it
- Create a pull request and merge it into the main branch
- Work with linting and unit testing in GitHub Actions
  - Install the tools (Flake8 and Pytest)
  - Run the tests by specifying the folders within the repo that need to be checked.
- Work with environments in GitHub Actions
  - Create development, stage, and production environments in the GitHub repo and store secrets for each environment.
  - Add an approval check for the production environment.
  - Remove the global repo AZURE_CREDENTIALS secret, so that each environment can only use its own secret.
  - For each environment, add the AZURE_CREDENTIALS secret that contains the service principal output.
  - Create a new data asset in the production workspace.
  - Create one GitHub Actions workflow, triggered by changes being pushed to the main branch, with two jobs:
    - The experiment job that trains the model using the diabetes-dev-folder dataset in the development environment.
    - The production job that trains the model in the production environment, using the production data (the diabetes-prod-folder data asset as input).
  - Add a condition that the production job is only allowed to run when the experiment job ran successfully.
- Deploy a model with GitHub Actions
- Package and register the model as an MLflow model from the production job.
- Create an online (managed) endpoint.
- Test the deployed model automatically with the same GitHub Action workflow (ensure that the testing only happens when the model deployment is completed successfully).
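The two-job workflow described in the tasks above might be sketched as follows. This is a minimal outline, not a definitive implementation: the file names, job file paths, resource group, and workspace names are placeholders, and `needs: experiment` is what enforces that the production job only runs after the experiment job succeeds.

```yaml
name: train-dev-and-prod
on:
  push:
    branches: [ main ]
jobs:
  experiment:
    runs-on: ubuntu-latest
    environment: development          # uses the development environment's AZURE_CREDENTIALS
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: Train with dev data
        run: |
          az ml job create --file src/job-dev.yml \
            --resource-group <dev-resource-group> \
            --workspace-name <dev-workspace> --stream
  production:
    runs-on: ubuntu-latest
    environment: production           # approval check gates this environment
    needs: experiment                 # only runs when the experiment job succeeded
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: Train with production data
        run: |
          az ml job create --file src/job-prod.yml \
            --resource-group <prod-resource-group> \
            --workspace-name <prod-workspace> --stream
```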
Workflow:
- The production code is hosted in the main branch.
- A data scientist creates a feature branch for model development.
- The data scientist creates a pull request to propose to push changes to the main branch.
- When a pull request is created, a GitHub Actions workflow is triggered to verify the code.
- When the code passes linting and unit testing, the lead data scientist needs to approve the proposed changes.
![Screenshot 2023-10-04 at 14 38 28](https://private-user-images.githubusercontent.com/74664634/272705534-acd563d9-091c-4f6b-8294-5f40873f61af.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjEyNDkxMjksIm5iZiI6MTcyMTI0ODgyOSwicGF0aCI6Ii83NDY2NDYzNC8yNzI3MDU1MzQtYWNkNTYzZDktMDkxYy00ZjZiLTgyOTQtNWY0MDg3M2Y2MWFmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzE3VDIwNDAyOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWNkN2Q1ZTM3OTkzNjYyZGNkNzMzMDIxZjE1ZWMzMmQ5ZWJjYzA0NjA0ZmFkZTcwY2U0YmVkYjllMWViNTY1ZjMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.7bOKtWz9ZJwI7LCfJaMBvfUnP2pKcAOcDWp0NoIbfWI)
- After the lead data scientist approves the changes, the pull request is merged, and the main branch is updated accordingly.
![Screenshot 2023-10-04 at 14 38 44](https://private-user-images.githubusercontent.com/74664634/272705549-242e520a-8126-4841-995b-c2acc140f0f2.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjEyNDkxMjksIm5iZiI6MTcyMTI0ODgyOSwicGF0aCI6Ii83NDY2NDYzNC8yNzI3MDU1NDktMjQyZTUyMGEtODEyNi00ODQxLTk5NWItYzJhY2MxNDBmMGYyLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzE3VDIwNDAyOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTBmMDdjZTJkMzRiNGY5MDA4MmFhMjFkN2E3ZmNkYTkzMjQ2Mzg0NWMxODg3NDE3MDgxZGMyZDdkMjE2MjlmNjYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.uEaREPn8lVoIGVq-7KlWzpLG_D9dhHW5kPNvrEAYOfk)
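The verification step in this workflow can be implemented as a GitHub Actions workflow triggered by pull requests. A minimal sketch, assuming the source and test folders are named `src/model/` and `tests/`:

```yaml
name: verify-code
on:
  pull_request:
    branches: [ main ]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - name: Install the tools
        run: pip install flake8 pytest
      - name: Lint with Flake8
        run: flake8 src/model/
      - name: Run unit tests with Pytest
        run: pytest tests/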
Ideally, we don’t want to make the production data available in the experimentation (development) environment. Instead, data scientists will only have access to a small dataset which should behave similarly to the production dataset.
By reusing the training script, I can train the model in the production environment using the production data, simply by changing the data input.
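One way to swap the data input without touching the script is to override the job input at submission time. A sketch, assuming the job definition lives in `src/job.yml` and declares an input named `training_data` (both names are illustrative):

```bash
# Development run: small dev data asset
az ml job create --file src/job.yml \
  --set inputs.training_data.path="azureml:diabetes-dev-folder:1"

# Production run: same job definition, production data asset
az ml job create --file src/job.yml \
  --set inputs.training_data.path="azureml:diabetes-prod-folder:1"
```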
The development environment is used for the inner loop:
- Data scientists train the model.
- The model is packaged and registered.
The staging environment is used for part of the outer loop:
- Test the code and model with linting and unit testing.
- Deploy the model to test the endpoint.
The production environment is used for another part of the outer loop:
- Deploy the model to the production endpoint. The production endpoint is integrated with the web application.
- Monitor the model and endpoint performance to trigger retraining when necessary.
Using the Azure Machine Learning CLI (v2), I want to set up an automated workflow that will be triggered when a new model is registered. Once the workflow is triggered, the new registered model will be deployed to the production environment.
When logging a model with mlflow.autolog()
during model training, the model is stored in the job output. Alternatively, I can store the model in an Azure Machine Learning datastore.
Note
When registering the model as an MLflow model, there is no need to provide a scoring script or environment to deploy it.
To register the model, point to either a job's output, or to a location in an Azure Machine Learning datastore.
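For example, registering an MLflow model straight from a job's output could look like this. The model name is a placeholder, and the path pattern assumes the training script wrote the model to a folder called `model` in the job's artifacts:

```bash
az ml model create --name diabetes-model --type mlflow_model \
  --path "azureml://jobs/<job-name>/outputs/artifacts/paths/model/"
```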
Warning
Standard_DS1_v2 and Standard_F2s_v2 may be too small for bigger models and may lead to container termination due to insufficient memory, not enough space on the disk, or a probe failure because the container takes too long to initialize.
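A deployment definition can specify a larger SKU to avoid those failures. A sketch of a managed online deployment YAML, with illustrative endpoint and model names:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: diabetes-endpoint
model: azureml:diabetes-model@latest
instance_type: Standard_DS3_v2   # larger than Standard_DS1_v2 / Standard_F2s_v2
instance_count: 1
```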
Here is some testing data to use for the model:
Pregnancies,PlasmaGlucose,DiastolicBloodPressure,TricepsThickness,SerumInsulin,BMI,DiabetesPedigree,Age
9,104,51,7,24,27.36983156,1.350472047,43
6,73,61,35,24,18.74367404,1.074147566,75
4,115,50,29,243,34.69215364,0.741159926,59
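To invoke the endpoint with these rows, they need to be wrapped in the JSON shape that MLflow model deployments expect. A minimal sketch building a request file from the data above (the exact payload shape can vary by MLflow/Azure ML version, so treat it as an assumption to verify):

```python
import json

# Column names and test rows taken from the CSV above
columns = ["Pregnancies", "PlasmaGlucose", "DiastolicBloodPressure",
           "TricepsThickness", "SerumInsulin", "BMI", "DiabetesPedigree", "Age"]
rows = [
    [9, 104, 51, 7, 24, 27.36983156, 1.350472047, 43],
    [6, 73, 61, 35, 24, 18.74367404, 1.074147566, 75],
    [4, 115, 50, 29, 243, 34.69215364, 0.741159926, 59],
]

# Payload shape commonly used by MLflow models behind a managed online endpoint
payload = {"input_data": {"columns": columns, "data": rows}}

# Write a file usable with: az ml online-endpoint invoke --request-file sample-request.json
with open("sample-request.json", "w") as f:
    json.dump(payload, f)
```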
Note
Merge conflicts should be resolved first; that will allow the workflow to run successfully.
Create a service principal:

```bash
az ad sp create-for-rbac --name "<service-principal-name>" --role contributor \
    --scopes /subscriptions/<subscription-id>/resourceGroups/<your-resource-group-name> \
    --json-auth
```
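The JSON that this command prints is what goes into the AZURE_CREDENTIALS secret; a workflow then authenticates with it, for example:

```yaml
steps:
  - uses: azure/login@v1
    with:
      creds: ${{ secrets.AZURE_CREDENTIALS }}
```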
Get logs from building the image during deployment:

```bash
az ml online-deployment get-logs -e <endpoint-name> -n <deployment-name> -l 100
```