ML/IoT/DevOps Hands-on Workshop
- 09:30-10:00 Workshop overview, scope, expectations
- 10:00-10:50 Dev environment setup: Azure ML service Workspace and Azure Notebooks, authenticate, prepare compute (Azure ML Compute)
    - Install
        - Visual Studio Code
        - GitHub Desktop
        - (optional) Internet browser of your choice (Edge is fine, Chrome is also good)
    - Check Azure subscription
        - All attendees should be able to sign in
    - Create an AML service workspace
        - region: East US
        - resource group: new (one per person for practice)
        - after creation, check `Usage + quotas`, `Standard NC Family vCPUs`: there should be 100+ dedicated cores available for this workshop (e.g., 5 people * 6 cores * 4 nodes = 120 cores)
    - (optional) Add users in `Access Control (IAM)`
    - From the Azure ML service Workspace `Overview` tab, click `Download config.json` and save it locally.
    - Set up the Notebook environment. In this workshop, use Option 1 to practice.
        - Option 1: using Azure Notebooks
            - Import the AML sample from GitHub
                - GitHub repo to import: https://github.com/Azure/MachineLearningNotebooks
                - private (if needed)
            - Create a folder named `aml_config` at the root of the imported project, and upload the config.json file there.
        - Option 2: using Notebook VMs from the Azure ML service Workspace
            - Go to `Notebook VMs` and create a new VM (STANDARD_D3_V2)
            - Click `JupyterLab`, then click `Terminal`
            - From the current directory `/mnt/azmnt/code/Users/`, cd (or mkdir if needed), then clone with `git clone https://github.com/Azure/MachineLearningNotebooks`
            - Note that config.json is already added to `/mnt/azmnt/` automatically; you do not need to upload it manually.
            - From `Notebook VMs`, click `Jupyter`, and you can run notebooks there
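The Option 2 terminal steps above can be consolidated into a short shell sequence. The folder name `your-alias` is a placeholder for your own user folder, not a path from the sample repo:

```shell
# On the Notebook VM: open JupyterLab, then Terminal
cd /mnt/azmnt/code/Users/

# Make and enter your own working folder ("your-alias" is a placeholder)
mkdir -p your-alias
cd your-alias

# Clone the sample notebooks repository
git clone https://github.com/Azure/MachineLearningNotebooks
```

These commands assume the Notebook VM's standard mount layout; config.json is already present under `/mnt/azmnt/`, so nothing else needs to be uploaded.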
- Create Azure ML Compute: to do that, open `configuration.ipynb`
    - Skip creating config.json, because you already have it
    - Instead, add a cell and run the following script to load config.json and authenticate:

          from azureml.core import Workspace
          ws = Workspace.from_config()

    - Proceed to create Azure ML Compute:
        - `cpucluster`: STANDARD_D2_V3, 0 to 4 nodes
        - `gpucluster`: STANDARD_NC6, 0 to 4 nodes
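Creating the two clusters above can be sketched with the Azure ML SDK's `AmlCompute.provisioning_configuration` API. Cluster names and VM sizes come from this workshop's spec; this is a provisioning sketch that assumes the SDK is installed and config.json is in place, not a step from the notebook itself:

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()  # reads config.json and authenticates

# Cluster specs from the workshop: one CPU and one GPU cluster,
# both autoscaling between 0 and 4 nodes.
specs = [
    ("cpucluster", "STANDARD_D2_V3"),
    ("gpucluster", "STANDARD_NC6"),
]

for name, vm_size in specs:
    if name in ws.compute_targets:
        print(f"{name} already exists, reusing it")
        continue
    config = AmlCompute.provisioning_configuration(
        vm_size=vm_size, min_nodes=0, max_nodes=4
    )
    target = ComputeTarget.create(ws, name, config)
    target.wait_for_completion(show_output=True)
```

With `min_nodes=0`, both clusters scale down to zero when idle, so they do not consume quota between exercises.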
- 11:00-11:50 Train first DL model on Azure Notebooks using Azure ML Compute
    - Open sample notebook `train-hyperparameter-tune-deploy-with-keras.ipynb` under `how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras` (find this notebook from your notebook environment)
    - Run the notebook up to the `run.wait_for_completion()` cell
    - Monitor the Jupyter widget, and the Workspace (from the Azure Portal - check Experiment and Compute)
        - Additionally, note that files in `./outputs` and `./logs` are automatically uploaded to the Workspace. TensorBoard logs should also be saved in this `./logs`. Refer to how to train models and the TensorBoard integration sample.
        - Try to understand how the model files move: from AML Compute, to the Workspace, to the local environment.
    - Continue running the notebook and try hyperparameter tuning.
        - Set the `max_concurrent_runs` parameter to the maximum number of nodes in your Azure ML Compute cluster.
        - Run, monitor the Jupyter widget and the Azure Portal (AML service Workspace), and evaluate the results.
          Note: Generally, when you open the notebook you can see the last run results of the code cells, but Jupyter widget results are not shown. So in order to review the last widget run status without running the experiment again, you should find and load the run before using the widget. A sample notebook for this is here.
    - Stop here. You may continue and deploy to ACI from this notebook, but we will cover deployment in the afternoon.
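The hyperparameter-tuning step above follows the HyperDrive pattern; a minimal sketch is below. The parameter names and ranges are illustrative, not the Keras notebook's exact values, and `src` stands for the `ScriptRunConfig`/estimator defined earlier in the notebook:

```python
from azureml.train.hyperdrive import (
    HyperDriveConfig, RandomParameterSampling, BanditPolicy,
    PrimaryMetricGoal, choice, loguniform,
)

# Illustrative search space; the notebook tunes its own parameters.
param_sampling = RandomParameterSampling({
    "--batch-size": choice(32, 64, 128),
    "--learning-rate": loguniform(-6, -1),
})

hyperdrive_config = HyperDriveConfig(
    run_config=src,  # the training run configuration from earlier cells
    hyperparameter_sampling=param_sampling,
    policy=BanditPolicy(evaluation_interval=2, slack_factor=0.1),
    primary_metric_name="Accuracy",
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=20,
    max_concurrent_runs=4,  # match the max node count of your AML Compute cluster
)
```

Setting `max_concurrent_runs` above the cluster's maximum node count does not help: extra runs simply queue until a node frees up.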
- 13:00-14:50 Distributed training with Horovod on AML Compute, explore AML Workspace
    - Open sample notebook `distributed-pytorch-with-horovod.ipynb` under `how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod` (find this notebook from your notebook environment)
    - Run all: consider using 4 nodes as `node_count` when available, instead of 2.
    - Questions and answers, or proceed to the next step.
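The distributed run above is driven by the SDK's PyTorch estimator with an MPI configuration; a minimal sketch follows. The entry-script name and the `gpucluster` target are assumptions from this workshop's setup, not guaranteed to match the notebook verbatim:

```python
from azureml.core import Workspace, Experiment
from azureml.core.runconfig import MpiConfiguration
from azureml.train.dnn import PyTorch

ws = Workspace.from_config()

# Launch the training script across 4 nodes via MPI; Horovod handles
# the allreduce between the worker processes.
estimator = PyTorch(
    source_directory=".",
    entry_script="pytorch_horovod_mnist.py",  # assumed script name
    compute_target=ws.compute_targets["gpucluster"],
    node_count=4,
    distributed_training=MpiConfiguration(),
    use_gpu=True,
)

run = Experiment(ws, "distributed-pytorch-horovod").submit(estimator)
run.wait_for_completion(show_output=True)
```

Raising `node_count` from 2 to 4 roughly doubles the aggregate batch throughput, which is why 4 nodes are suggested when quota allows.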
- 15:00-16:50 Create container images, deploy to Azure Container Instances (and/or Azure Kubernetes Service)
    - We will continue from the morning's sample, `train-hyperparameter-tune-deploy-with-keras.ipynb`. Open the notebook and run the latter part, creating a container image and deploying to ACI.
    - Explore the Workspace from the Azure Portal.
    - Refresh the concepts of MLOps from concept-model-management-and-deployment
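The deployment part of that notebook registers the model, builds an image, and stands up an ACI endpoint; the pattern can be sketched as below. Model, environment, and script names here are placeholders, not the notebook's exact values:

```python
from azureml.core import Workspace
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()

# Register the trained model files retrieved from the run's outputs.
model = Model.register(ws, model_path="outputs/model", model_name="keras-mnist")

# score.py must define init() and run(data); the conda file lists dependencies.
env = Environment.from_conda_specification("keras-env", "conda_env.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# Small ACI instance: fine for dev/test, use AKS for production traffic.
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
service = Model.deploy(ws, "keras-mnist-svc", [model], inference_config, aci_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)
```

The same `InferenceConfig` can later be redeployed to AKS by swapping the deployment configuration, which is the MLOps hand-off this session leads into.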
    - If time permits, try the contents below in addition:
- Build 2019 updates: New Azure Machine Learning updates simplify and accelerate the ML lifecycle
- visual-interface (preview)
- automated ml with GUI (preview)
- interpretability-explainability
- onnx
- fpga
- pipelines
- Enterprise Readiness
- Distributed Training with SR-IOV
- DeepSpeed
- Model Inference Optimization (private)
- custom vision
    - Running AML SDK on Azure Databricks
        - Set up Azure Databricks using this guide
        - Create a cluster, and import the sample notebook.
        - Install `azureml-sdk[automl_databricks]` if needed.
        - Run samples.
            - For the Automated ML sample, set `max_concurrent_iterations` to the number of worker nodes.
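On Databricks, the `max_concurrent_iterations` setting above goes into `AutoMLConfig`; a minimal sketch, in which the task type, dataset (`train_dataset`), and label column are illustrative rather than taken from the sample:

```python
from azureml.train.automl import AutoMLConfig

# Illustrative classification config; train_dataset is a tabular dataset
# prepared earlier, and sc is the Databricks-provided SparkContext.
automl_config = AutoMLConfig(
    task="classification",
    training_data=train_dataset,
    label_column_name="label",
    primary_metric="accuracy",
    iterations=20,
    max_concurrent_iterations=4,  # set to the number of Databricks worker nodes
    spark_context=sc,
)
```

Each concurrent iteration occupies a worker, so values above the worker count only queue iterations rather than speeding anything up.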
    - And check out MLOps. This will be covered on Day 3, but it will be good to get familiar with the key concepts earlier.
- 17:00-17:50 Questions and answers