- 1.1 Overview of the Project
- 1.2 Architectural Diagram
- 1.3 Key Steps
- 1.4 Screen Recording
- 1.5 Standout Suggestions
- 1.6 Future Improvements
- 1.7 Screenshots
This project uses the Bank Marketing dataset. We configure a cloud-based machine learning model, deploy and consume it, and then create, publish, and consume a pipeline.
- In the first part of the project, we create and run an AutoML experiment through Azure Machine Learning Studio.
- The best model from the AutoML run in ML Studio is deployed and consumed.
- The second part uses the Python SDK in a Jupyter notebook to create an AutoML run through the Pipeline class.
- The pipeline run is published and consumed after completion.
- The best model for the AutoML run in ML Studio is a Voting Ensemble with an accuracy of 0.91927.
- The best model produced by the AutoML module for the pipeline run is a Voting Ensemble with an AUC weighted of 0.94709.
Create a new AutoML run by uploading and registering the Bank Marketing dataset, configure a new compute cluster (Standard_DS12_v2), and run the experiment as a Classification task with the "Explain best model" option enabled.
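A minimal Python SDK sketch of the equivalent setup (the dataset URL, cluster and experiment names, and the label column `y` are assumptions; adjust them to your workspace):

```python
from azureml.core import Workspace, Dataset, Experiment
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()

# Register the Bank Marketing dataset (data_url is a placeholder)
data_url = "https://<your-storage>/bankmarketing_train.csv"
dataset = Dataset.Tabular.from_delimited_files(path=data_url)
dataset = dataset.register(workspace=ws, name="bankmarketing")

# Provision the Standard_DS12_v2 compute cluster
compute_config = AmlCompute.provisioning_configuration(
    vm_size="Standard_DS12_v2", min_nodes=1, max_nodes=4)
compute_target = ComputeTarget.create(ws, "automl-cluster", compute_config)
compute_target.wait_for_completion(show_output=True)

# Classification task with "Explain best model" enabled
automl_config = AutoMLConfig(
    task="classification",
    training_data=dataset,
    label_column_name="y",
    compute_target=compute_target,
    model_explainability=True,
    experiment_timeout_minutes=30,
    primary_metric="accuracy")

run = Experiment(ws, "bankmarketing-automl").submit(automl_config)
run.wait_for_completion(show_output=True)
```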
After the experiment ends, the best model is selected from the Models tab for deployment. The best model is deployed to an Azure Container Instance with authentication enabled.
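A hedged sketch of the same deployment via the SDK (the model name, entry script, and curated environment are assumptions):

```python
from azureml.core import Workspace, Environment
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()

# "best-model" and score.py are placeholder names for the registered
# best model and its scoring script
model = Model(ws, name="best-model")
env = Environment.get(ws, "AzureML-AutoML")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# ACI deployment with key-based authentication enabled
deploy_config = AciWebservice.deploy_configuration(
    cpu_cores=1, memory_gb=1, auth_enabled=True)
service = Model.deploy(ws, "bankmarketing-service", [model],
                       inference_config, deploy_config)
service.wait_for_deployment(show_output=True)
```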
Enable Application Insights and retrieve logs by running the logs.py script. The service is updated, and "Application Insights enabled" switches to true in the Details tab of the endpoint.
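The core of logs.py can be sketched as follows (the service name is a placeholder):

```python
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(workspace=ws, name="bankmarketing-service")

# Turn on Application Insights for the deployed service
service.update(enable_app_insights=True)

# Retrieve and print the service logs
print(service.get_logs())
```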
Next, obtain the Swagger URI for the model endpoint, which points to its swagger.json file. The swagger.sh script runs Docker to pull the Swagger API and serve swagger-ui on a local port, so that everything runs locally and the Swagger documentation can be explored interactively. Run swagger.sh with bash to start the Docker container, execute the serve.py script to provide an HTTP server on a given port, and then open localhost in a browser to view the Swagger documentation for the model endpoint API from Azure ML Studio.
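serve.py can be as small as a CORS-enabled static file server; a minimal sketch of what such a server looks like (the default port is an arbitrary choice):

```python
import sys
from http.server import HTTPServer, SimpleHTTPRequestHandler

class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory (here: swagger.json) with
    a CORS header so the local swagger-ui container can fetch them."""
    def end_headers(self):
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()

port = int(sys.argv[1]) if len(sys.argv) > 1 else 8000
HTTPServer(("localhost", port), CORSRequestHandler).serve_forever()
```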
Consume the model endpoint by running the endpoint.py script after replacing the scoring URI and primary key with the REST endpoint URL and the key generated after deployment, both found under the Consume tab of the Endpoints section. The script returns a JSON response.
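A sketch of what endpoint.py does (the URI, key, and feature values are placeholders; a real payload must include every feature column of the dataset):

```python
import json
import requests

# Copy these from the Consume tab of the endpoint
scoring_uri = "http://<your-endpoint>.azurecontainer.io/score"
key = "<primary-key>"

# Illustrative payload; only a few Bank Marketing columns are shown
data = {"data": [{"age": 35, "job": "technician", "marital": "married",
                  "education": "university.degree", "default": "no"}]}

headers = {"Content-Type": "application/json",
           "Authorization": f"Bearer {key}"}

response = requests.post(scoring_uri, json.dumps(data), headers=headers)
print(response.json())
```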
Use Apache Benchmark, a tool for benchmarking HTTP services, by running the benchmark.sh script. It retrieves performance results such as the average response time of the deployed model and detects whether the server times out, i.e. fails to produce a response within the allotted time.
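benchmark.sh essentially wraps a single `ab` invocation; a sketch of the same call from Python, assuming Apache Benchmark is installed and data.json holds the same payload used by endpoint.py:

```python
import subprocess

scoring_uri = "http://<your-endpoint>.azurecontainer.io/score"  # placeholder
key = "<primary-key>"  # placeholder

# Send 10 authenticated POST requests; ab reports the mean response
# time, failed requests, and any request that times out
subprocess.run([
    "ab", "-n", "10", "-v", "4",
    "-p", "data.json", "-T", "application/json",
    "-H", f"Authorization: Bearer {key}",
    scoring_uri,
])
```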
This notebook demonstrates the use of the AutoML step in an Azure Machine Learning pipeline:
- Import the required Python SDK modules and load the Workspace.
- Create a compute target and load the dataset.
- Configure AutoML with AutoMLConfig, build a pipeline around the AutoML step, and submit the pipeline experiment.
- Retrieve and save the best model from the pipeline run.
- Test the best fitted model by loading the test data.
- Publish the pipeline, which exposes a REST endpoint so the pipeline can be run from any HTTP library on any platform.
- Consume the pipeline endpoint by making an authenticated request (a condensed sketch of these steps follows this list).
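A condensed sketch of the pipeline steps (it reuses the dataset and cluster registered in the first part; names such as "Bankmarketing Train" are placeholders):

```python
import requests
from azureml.core import Dataset, Experiment, Workspace
from azureml.core.authentication import InteractiveLoginAuthentication
from azureml.core.compute import ComputeTarget
from azureml.pipeline.core import Pipeline, PipelineData, TrainingOutput
from azureml.pipeline.steps import AutoMLStep
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
ds = ws.get_default_datastore()

# Reuse the registered dataset and cluster from the first part
dataset = Dataset.get_by_name(ws, "bankmarketing")
compute_target = ComputeTarget(ws, "automl-cluster")

automl_config = AutoMLConfig(task="classification",
                             training_data=dataset,
                             label_column_name="y",
                             compute_target=compute_target,
                             primary_metric="AUC_weighted")

# Outputs that capture the metrics and best model of the AutoML step
metrics_data = PipelineData(name="metrics_data", datastore=ds,
                            pipeline_output_name="metrics_output",
                            training_output=TrainingOutput(type="Metrics"))
model_data = PipelineData(name="model_data", datastore=ds,
                          pipeline_output_name="best_model_output",
                          training_output=TrainingOutput(type="Model"))

automl_step = AutoMLStep(name="automl_module",
                         automl_config=automl_config,
                         outputs=[metrics_data, model_data],
                         allow_reuse=True)

pipeline = Pipeline(ws, steps=[automl_step])
pipeline_run = Experiment(ws, "bankmarketing-pipeline").submit(pipeline)
pipeline_run.wait_for_completion()

# Publish the pipeline, which exposes a REST endpoint
published = pipeline_run.publish_pipeline(
    name="Bankmarketing Train", description="AutoML pipeline", version="1.0")

# Consume the pipeline endpoint with an authenticated POST request
auth_header = InteractiveLoginAuthentication().get_authentication_header()
response = requests.post(published.endpoint, headers=auth_header,
                         json={"ExperimentName": "pipeline-rest-endpoint"})
print(response.json().get("Id"))
```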
The screencast video is included in the repository as "project video.mp4".
Apache Benchmark was used to benchmark the endpoint, evaluating and showing its performance results; the benchmark ran successfully against the HTTP API.
- A GPU could be used instead of a CPU, as it greatly increases training speed.
- Modules available in the Designer, such as feature engineering and feature selection, could be used in the pipeline AutoML run to improve model accuracy.
- Deep learning could be enabled when specifying the Classification task type for AutoML; it applies default techniques depending on the number of rows in the training dataset and performs the train/validation split with the required number of cross validations without these being specified explicitly.
- Azure Kubernetes Service (AKS) could be used instead of Azure Container Instances (ACI): AKS minimizes infrastructure maintenance through automated upgrades, repairs, monitoring, and scaling, which leads to faster development and integration (see the sketch after this list).
- Dedicated virtual machines could be used instead of low-priority ones, since low-priority VMs do not guarantee compute nodes.
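A hedged sketch of the AKS switch (the cluster, model, and service names, the entry script, and the curated environment are placeholder assumptions):

```python
from azureml.core import Environment, Workspace
from azureml.core.compute import AksCompute, ComputeTarget
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AksWebservice

ws = Workspace.from_config()

# Provision an AKS inference cluster (defaults to three agent nodes)
aks_target = ComputeTarget.create(ws, "aks-cluster",
                                  AksCompute.provisioning_configuration())
aks_target.wait_for_completion(show_output=True)

# Same placeholder model and scoring setup as the ACI sketch
model = Model(ws, name="best-model")
env = Environment.get(ws, "AzureML-AutoML")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# AKS adds autoscaling on top of key-based authentication
deploy_config = AksWebservice.deploy_configuration(
    cpu_cores=1, memory_gb=1, auth_enabled=True, autoscale_enabled=True)
service = Model.deploy(ws, "bankmarketing-aks", [model],
                       inference_config, deploy_config,
                       deployment_target=aks_target)
service.wait_for_deployment(show_output=True)
```

Deep learning for the AutoML classification task can similarly be switched on from the SDK through AutoMLConfig's enable_dnn parameter.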
- ML Studio showing the registered Bank Marketing dataset in the Datasets section.
- AutoML experiment showing the status as Completed.
- Best model obtained through the AutoML experiment run.
- Details tab of the endpoint showing Application Insights as Enabled.
- Running the logs.py script to show the logs.
- Swagger running on localhost, presenting the HTTP API methods and responses for the model.
- The endpoint.py script run producing JSON output from the model.
- Apache Benchmark run against the HTTP API with authentication keys to retrieve performance results.
- Pipelines section of ML Studio showing the pipeline created.
- Pipelines section of ML Studio showing the pipeline endpoint.
- Designer section showing the Bank Marketing dataset with the AutoML module.
- Published pipeline overview showing the status as Active.
- Jupyter notebook with the RunDetails widget.
- ML Studio showing scheduled runs (the first two runs are the pipeline runs).