Operationalizing Machine Learning

In this project, we work with the Bank Marketing dataset. We use Azure to configure a cloud-based machine learning production model with AutoML, deploy it, and consume it. We also create, publish, and consume a pipeline.

This dataset is about a phone call marketing campaign. The original data can be found at the UC Irvine Machine Learning Repository. The dataset can be used (as we do here) to predict whether the client will subscribe to a term deposit. The target variable is y.

The lab environment provided by Udacity will not be used for this project. Instead, a local development environment along with a Microsoft Azure account will be used.

Architectural Diagram

In this project, we follow the steps below:

  1. Authentication
  2. Automated ML Experiment
  3. Deploy the Best Model
  4. Enable Logging
  5. Swagger Documentation
  6. Consume Model Endpoints
  7. Create and Publish a Pipeline
  8. Documentation

Main Steps

Image by Udacity

Key Steps

  1. Authentication

    Authentication is crucial for the continuous flow of operations. Continuous Integration and Delivery (CI/CD) systems rely on uninterrupted flows. When authentication is not set up properly, human interaction is required and the flow is interrupted. Ideally, the system should never stop to wait for a user to enter a password. So whenever possible, it's good to use authentication with automation.

    A “Service Principal” is a user role with controlled permissions to access specific resources. Using a service principal is a great way to allow authentication while reducing the scope of permissions, which enhances security.

    We will use the local environment for authentication.

    Main operations in the Authentication step are as follows:

    • Use Git Bash to sign in to the Microsoft account using the az login command.

    Git Bash screen showing the result 'az login' command

    Authentication_rm_2.png

    • Ensure the az command-line tool is installed along with the ml extension, using the az extension add -n azure-cli-ml command.

    Git Bash screen showing the result 'az extension add -n azure-cli-ml' command

    Authentication_rm_3.png

    • Create the Service Principal with az after logging in, using the az ad sp create-for-rbac --sdk-auth --name ml-auth command.

    Git Bash screen showing the result 'az ad sp create-for-rbac --sdk-auth --name ml-auth' command

    Authentication_rm_4.png

    • Capture the "objectId" using the clientID. Use the following command:

    az ad sp show --id xxxxxxxx-3af0-4065-8e14-xxxxxxxxxxxx

    Git Bash screen showing the result 'az ad sp show --id xxxxxxxx-3af0-4065-8e14-xxxxxxxxxxxx' command

    Authentication_rm_5.png

    • Assign the role to the new Service Principal for the given Workspace, Resource Group, and User objectId. You will need to match your workspace, subscription, and ID. There should be no error in the output. Use the following command:

    az ml workspace share -w xxx -g xxx --user xxxxxxxx-cbdb-4cfd-089f-xxxxxxxxxxxx --role owner

    Git Bash screen showing the result 'az ml workspace share -w xxx -g xxx --user xxxxxxxx-cbdb-4cfd-089f-xxxxxxxxxxxx --role owner' command

    Authentication_rm_6.png
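
    Once the Service Principal exists, the Azure ML SDK can authenticate with it non-interactively. Below is a minimal sketch; the tenant ID, client ID, client secret, and workspace details are placeholders taken from the JSON printed by az ad sp create-for-rbac --sdk-auth and from your subscription.

    from azureml.core import Workspace
    from azureml.core.authentication import ServicePrincipalAuthentication

    # Placeholders: copy tenantId, clientId, and clientSecret from the
    # 'az ad sp create-for-rbac --sdk-auth' output.
    sp_auth = ServicePrincipalAuthentication(
        tenant_id="<tenant-id>",
        service_principal_id="<client-id>",
        service_principal_password="<client-secret>")

    # Authenticate to the shared workspace without any interactive prompt.
    ws = Workspace.get(name="<workspace-name>",
                       subscription_id="<subscription-id>",
                       resource_group="<resource-group>",
                       auth=sp_auth)
    print(ws.name)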

  2. Automated ML Experiment

    In this step, we will create an experiment using Automated ML, configure a compute cluster, and use that cluster to run the experiment. We will use the Azure portal for this purpose.

    We will use the Bank Marketing dataset described above.

    In the following section we will deploy the best model of this AutoML experiment.

    We need to configure a compute cluster for this AutoML experiment. To do that, we can either use an existing cluster or create a new one. We will create a new cluster with the following configuration (an equivalent SDK sketch is shown after the list):

    • Region: eastus2
    • Virtual machine priority: Low priority
    • Virtual machine type: CPU
    • Virtual machine size: Standard_DS12_v2
    • Compute name: auto-ml
    • Minimum number of nodes: 1
    • Maximum number of nodes: 2
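
    For reference, the same cluster could be provisioned from the SDK. This is only a sketch of the settings listed above, assuming a config.json for the workspace is available locally.

    from azureml.core import Workspace
    from azureml.core.compute import AmlCompute, ComputeTarget

    ws = Workspace.from_config()  # reads the local config.json

    # Provisioning configuration matching the settings listed above.
    compute_config = AmlCompute.provisioning_configuration(
        vm_size="Standard_DS12_v2",
        vm_priority="lowpriority",
        min_nodes=1,
        max_nodes=2)

    # 'auto-ml' is the compute name used in this project.
    cluster = ComputeTarget.create(ws, "auto-ml", compute_config)
    cluster.wait_for_completion(show_output=True)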

    The run configuration for the AutoML experiment is as follows (an equivalent SDK sketch is shown after the list):

    • Experiment name (Create new): ml-experiment-1
    • Target column: y
    • Compute cluster name: auto-ml
    • Primary metric: Accuracy
    • Explain best model: Selected
    • Exit criterion:
      • Training job time (hours): 1
    • Concurrency:
      • Max concurrent iterations: 2
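
    The portal settings above roughly correspond to the following AutoMLConfig sketch. The registered dataset name is an assumption; the rest mirrors the listed configuration.

    from azureml.core import Workspace, Experiment, Dataset
    from azureml.core.compute import ComputeTarget
    from azureml.train.automl import AutoMLConfig

    ws = Workspace.from_config()
    dataset = Dataset.get_by_name(ws, name="BankMarketing Dataset")  # assumed registered name

    automl_config = AutoMLConfig(
        task="classification",
        primary_metric="accuracy",
        training_data=dataset,
        label_column_name="y",
        compute_target=ComputeTarget(ws, "auto-ml"),
        experiment_timeout_hours=1,
        max_concurrent_iterations=2,
        model_explainability=True)

    experiment = Experiment(ws, "ml-experiment-1")
    run = experiment.submit(automl_config, show_output=True)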

    Main operations in the Automated ML Experiment step are as follows:

    • Upload the bankmarketing_train.csv to Azure Machine Learning Studio so that it can be used when training the model.

    Bank Marketing dataset

    automl_rm_dataset_1.png

    • Create a new AutoML run and select the Bankmarketing dataset

    New AutoML Run

    automl_1.png

    • Create a new AutoML experiment

    A New Experiment

    AutoML_4.png

    • Configure a new compute cluster

    Create Compute Cluster

    AutoML_3.png

    • Run the experiment using classification and ensure 'Explain best model' is checked

    Additional Configurations

    AutoML_5.png

    AutoML Run with status 'Running'

    AutoML_8.png

    • Wait for the experiment to finish and explore the best model

    AutoML Run with Status 'Completed'

    AutoML_9.png

    Automated ML Tab Showing the Completed Experiment

    AutoML_10.png

    Trained Models for Each Run

    AutoML_11.png

    Best Model: VotingEnsemble

    AutoML_12.png

  3. Deploy the Best Model

    The best model in the previous step was a VotingEnsemble. Deploying the best model allows us to interact with it through an HTTP API service by sending data over POST requests.

    We will use the Azure portal for deployment.

    Main operations in Deploy the best model step are as follows:

    • Select the best model for deployment

    Best Model Selected (Deploy status: No deployment yet)

    Deploy_1.png

    • Deploy the model and enable Authentication

    Best Model Selected (Deploy status: Running)

    Deploy_2.png

    • Deploy the model using Azure Container Instance

    Deployed Model (Endpoint) With a Healthy Deployment State

    Deploy_5.png
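
    Although the deployment here is done through the portal, an equivalent SDK sketch is shown below. The model name, entry script, and environment are assumptions; in practice they come from the AutoML run's outputs. Authentication is enabled, matching the portal steps above.

    from azureml.core import Workspace, Environment
    from azureml.core.model import Model, InferenceConfig
    from azureml.core.webservice import AciWebservice

    ws = Workspace.from_config()

    # Assumed name of the registered best model from the AutoML run.
    model = Model(ws, name="best-automl-model")

    # score.py and the environment would normally be taken from the AutoML run's outputs.
    inference_config = InferenceConfig(entry_script="score.py",
                                       environment=Environment.get(ws, "AzureML-AutoML"))

    # ACI deployment with key-based authentication enabled.
    deployment_config = AciWebservice.deploy_configuration(cpu_cores=1,
                                                           memory_gb=1,
                                                           auth_enabled=True)

    service = Model.deploy(ws, "bankmarketing-model", [model],
                           inference_config, deployment_config)
    service.wait_for_deployment(show_output=True)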

  4. Enable Logging

    We can now enable Application Insights and retrieve logs. Application Insights is an Azure service that provides key facts about an application. It is a very useful tool for detecting anomalies and visualizing performance.

    We will work in the local environment in this step. But first, we need to download config.json from the Azure portal.

    Main operations in the Enable Logging step are as follows:

    • Download config.json from ML Studio

    How to download 'config.json'

    App-In_8.png

    • Write and run the code (logs.py) to enable Application Insights (a minimal sketch of such a script is shown at the end of this step)

    logs.py output in Git Bash

    App-In_5.png

    'Application Insights url' in Model Endpoint

    App-In_6.png

    • Explore Application insights

    Application Insights

    App-In_7.png
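
    For reference, a script like logs.py typically does no more than the following sketch. The service name is a placeholder for the deployed endpoint's name.

    from azureml.core import Workspace
    from azureml.core.webservice import Webservice

    ws = Workspace.from_config()  # requires config.json in the working directory

    # Placeholder: use the name of the deployed endpoint.
    service = Webservice(workspace=ws, name="bankmarketing-model")

    # Enable Application Insights for the running service and print its logs.
    service.update(enable_app_insights=True)
    print(service.get_logs())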

  5. Swagger Documentation

    In this step, we will consume the deployed model using Swagger. Swagger is a tool that helps build, document, and consume RESTful web services like the ones we are deploying in Azure ML Studio.

    We will work in the local environment in this step. But first, we need to download swagger.json from the Azure portal. We can download swagger.json from the Endpoints tab in Azure ML Studio.

    Main operations in the Swagger Documentation step are as follows:

    • Download the swagger.json file.

    Azure provides a swagger.json that is used to create a web page documenting the HTTP endpoint for a deployed model. We can find the swagger.json URI in the Endpoints section.

    Swagger URI

    Swagger_8.png

    We need to download swagger.json to the swagger folder (a minimal download sketch is shown below). There should be three files in the swagger folder.

    Local Directory

    Swagger_2.png
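
    Downloading can be done from the browser or with a few lines of Python; the URI below is a placeholder for the Swagger URI shown on the endpoint's details page.

    import requests

    # Placeholder: copy the Swagger URI from the endpoint's details page.
    swagger_uri = "http://<replace-with-swagger-uri>/swagger.json"

    # Save the file into the local swagger/ folder next to swagger.sh and serve.py.
    response = requests.get(swagger_uri)
    with open("swagger/swagger.json", "wb") as f:
        f.write(response.content)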

    • Run swagger.sh in Git Bash.

    'swagger.sh' Output

    Swagger_3.png

    • Run serve.py in another Git Bash window (a minimal sketch of such a script is shown below).

    'serve.py' Output

    Swagger_4.png
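
    A serve.py like the one used here typically just serves the working directory (including swagger.json) on port 8000 with CORS enabled, so that the Swagger UI can fetch it. A minimal sketch under that assumption:

    from http.server import HTTPServer, SimpleHTTPRequestHandler

    class CORSRequestHandler(SimpleHTTPRequestHandler):
        def end_headers(self):
            # Allow the Swagger UI (running on another port) to request swagger.json.
            self.send_header("Access-Control-Allow-Origin", "*")
            super().end_headers()

    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8000), CORSRequestHandler).serve_forever()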

    • Interact with the Swagger instance, which serves the documentation for the model's HTTP API.

    To do that, we first open http://localhost/ in the browser to interact with the Swagger instance that documents the HTTP API of the model, and then open http://localhost:8000/swagger.json to display the raw contents of the API definition.

    Swagger Page

    Swagger_5.png

    More on Swagger Page

    Swagger_6.png

    More on Swagger Page

    Swagger_7.png

  6. Consume Model Endpoints

    We can consume a deployed service via an HTTP API. An HTTP API is a URL that is exposed over the network so that interaction with a trained model can happen via HTTP requests.

    Users can initiate an input request, usually via an HTTP POST request. HTTP POST is a request method that is used to submit data. The HTTP GET is another commonly used request method. HTTP GET is used to retrieve information from a URL. The allowed request methods and the different URLs exposed by Azure create a bi-directional flow of information.

    The APIs exposed by Azure ML use JSON (JavaScript Object Notation) to accept data and submit responses. JSON serves as a bridge language among different environments.

    We will work in the local environment in this step. But first, we need to get the scoring_uri and the key from the Azure portal.

    Main operations in the Consume Model Endpoints step are as follows:

    • Modify both the scoring_uri and the key (in endpoint.py) to match the key for our service and the URI that was generated after deployment.

    The scoring_uri and the key can be found on the 'Consume' tab of the model endpoint (a minimal sketch of such a script is shown below).

    Model Endpoint (Consume Tab)

    CME_2.png
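
    A script such as endpoint.py usually boils down to the following sketch. The URI and key are placeholders, and the payload must contain every feature column the model expects; only a few illustrative fields are shown.

    import json
    import requests

    # Placeholders: copy the real values from the endpoint's Consume tab.
    scoring_uri = "http://<replace-with-scoring-uri>/score"
    key = "<replace-with-primary-key>"

    # One record per dict; the real payload needs all feature columns of the model.
    data = {"data": [{"age": 35, "job": "technician", "marital": "married", "default": "no"}]}

    headers = {"Content-Type": "application/json",
               "Authorization": f"Bearer {key}"}

    response = requests.post(scoring_uri, data=json.dumps(data), headers=headers)
    print(response.json())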

    • Run endpoint.py

    'endpoint.py' Output

    CME_3.png

    Benchmarking

    A benchmark is used to create a baseline or acceptable performance measure. Benchmarking HTTP APIs is used to find the average response time for a deployed model.

    One of the most significant metrics is the response time, since Azure will time out if a response takes longer than sixty seconds.

    Apache Benchmark is an easy and popular tool for benchmarking HTTP services.

    We will work on local environment in this step.

    Main operations in the Benchmarking step are as follows:

    • Make sure the Apache Benchmark command-line tool is installed and available in your path.

    • In endpoint.py, replace the key and URI again

    • Run endpoint.py. A data.json file should appear

    • Run the benchmark.sh file.

    'benchmark.sh' Output

    bm_0.png

    More on 'benchmark.sh' Output

    bm_1.png
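
    Apache Benchmark is a command-line tool; as an illustrative alternative, a few lines of Python can approximate the same average-response-time measurement. The URI and key are placeholders, and data.json is the payload produced by endpoint.py.

    import json
    import time
    import requests

    scoring_uri = "http://<replace-with-scoring-uri>/score"  # placeholder
    headers = {"Content-Type": "application/json",
               "Authorization": "Bearer <replace-with-primary-key>"}  # placeholder key

    with open("data.json") as f:
        payload = json.load(f)  # produced by endpoint.py

    # Send ten requests and report the average response time.
    times = []
    for _ in range(10):
        start = time.perf_counter()
        requests.post(scoring_uri, json=payload, headers=headers)
        times.append(time.perf_counter() - start)

    print(f"Average response time: {sum(times) / len(times):.3f} s")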

  7. Create and Publish a Pipeline

    For this part of the project, we will use the Jupyter Notebook provided in the starter files folder and the Azure portal.

    Main operations in the Create and Publish a Pipeline step are as follows:

    • Upload the Jupyter Notebook

    Jupyter Notebook on Azure Machine Learning Studio

    pipeline_1.png

    • Update all the variables that are noted to match the environment and make sure that a config.json has been downloaded and is available in the current working directory.

    Directory Structure

    pipeline_2.png

    • Run through the cells (a condensed sketch of the publish-and-trigger step is shown below)

    Jupyter Notebook Output - Dataset

    pipeline_3.png
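
    The key cells of the notebook publish the pipeline and then trigger it through its REST endpoint. The sketch below assumes the pipeline has already been published; the pipeline ID and experiment name are placeholders.

    from azureml.core import Workspace
    from azureml.core.authentication import InteractiveLoginAuthentication
    from azureml.pipeline.core import PublishedPipeline
    import requests

    ws = Workspace.from_config()  # reads the downloaded config.json

    # Placeholder: the ID shown in the 'Published pipeline overview'.
    published_pipeline = PublishedPipeline.get(ws, id="<published-pipeline-id>")

    # Trigger the published pipeline through its REST endpoint.
    auth_header = InteractiveLoginAuthentication().get_authentication_header()
    response = requests.post(published_pipeline.endpoint,
                             headers=auth_header,
                             json={"ExperimentName": "pipeline-rest-endpoint"})
    print("Submitted run id:", response.json().get("Id"))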

    Below you can find some important screenshots for this stage:

    Running Pipeline on Azure Machine Learning Studio

    pipeline_6.png

    Running and Completed Pipelines on Azure Machine Learning Studio

    pipeline_s_1.png

    Pipeline Endpoints on Azure Machine Learning Studio

    pipeline_s_2.png

    'Pipeline run overview' for Run 61 on Azure Machine Learning Studio

    pipeline_14.png

    'Published pipeline overview' for the Bankmarketing dataset on Azure Machine Learning Studio

    pipeline_s_3.png

    Run Details Output on Jupyter Notebook

    pipeline_8.png

    More Run Details Output on Jupyter Notebook

    pipeline_9.png

    More Run Details Output on Jupyter Notebook

    pipeline_12.png

    More Run Details Output on Jupyter Notebook

    pipeline_13.png

    Published Pipeline on Jupyter Notebook

    pipeline_19.png

    More Run Details Output on Jupyter Notebook

    pipeline_s_5.png

    'Experiment Run status' on Azure Machine Learning Studio

    pipeline_10.png

    Azure Machine Learning Studio Home Page with Runs and Computes

    pipeline_22.png

    'Experiment Run status' on Azure Machine Learning Studio

    pipeline_s_9.png

Screen Recording

Project 2 Operationalizing Machine Learning Screencast

Note: You can refer to Screencast_text.txt if you have difficulty understanding screencast audio.

Future Work

Improvements for future experiments

The dataset is imbalanced. Although AutoML seems to handle imbalanced data, we can try to handle it manually.

TensorFlow LinearClassifier and TensorFlow DNN seem to be blocked since they are not supported by AutoML. We can try some HyperDrive runs with these estimators and see how they perform.

Deep learning is disabled. We can enable deep learning and run a new AutoML pipeline.

We can select a smaller set of predictors using the feature importance visualizations generated by the model explainability feature of Azure AutoML and run a new AutoML pipeline. By doing this, we may get better performance for our model.

We can try new HyperDrive runs with the best performing estimator (VotingEnsemble) to get a better score.

We can monitor the endpoint using Application Insights to detect anomalies and visualize performance. Since we have already started benchmarking, we can easily evaluate performance and identify pitfalls.
