Welcome!
Azure Machine Learning is a machine learning platform service on Microsoft Azure. It is intended for all kinds of users:
- Developers who are not very familiar with machine learning can use AutoML or the Designer to develop models through the UI
- Data scientists can use Azure ML to enhance their workflow, both when training on local compute and on scalable compute clusters.
Starting to use Azure ML requires getting to know a few concepts. The simplest and least painful way to get started with Azure ML is through Visual Studio Code.
The simplest way to create an Azure ML Workspace is manually through the Azure Portal. Please go ahead and create one.
You can also create a workspace through an Azure Resource Manager template or the Azure CLI:
```bash
az extension add -n ml -y
az group create -n myazml -l westus2
az ml workspace create -n myworkspace -g myazml
```
Note: You need a recent version of the Azure CLI installed. The easiest way is to install it via pip with `pip install azure-cli`, or to upgrade an existing installation with `pip install --upgrade azure-cli`.
AutoML is a service that automatically searches for the algorithm that gives the best results on a given dataset. It tries different algorithms and parameter combinations from Scikit-Learn on the data in turn, using scalable cluster resources.
- Select Automated ML in the ML Portal menu.
- Choose New Automated ML Run
- In the Select dataset section, choose Create Dataset -> From Open Datasets
- Type MNIST in the search box, select the dataset and click Next.
- Name the dataset MNIST, leave all options intact and click Create.
- Choose the dataset in the Automated ML dialog and click Next
- Enter the experiment name
- For target column, select Column 785
- For the compute cluster, choose the cluster you created earlier. Click Next
- Select Classification task type. Observe settings that are available under View additional configuration settings and View featurization settings.
- When you are ready, click Finish.
- Go to the Experiments tab and explore the results. Automated ML takes a very long time to run through all the algorithms, but you can see intermediate results.
- In the experiment results, select Include child runs to see the results of all experiments.
- Unselect Include child runs, click on one of the runs that took the longest to execute, and select the Models tab to observe the accuracy of individual models.
- Define the Titanic dataset:
- Select "Tabular" as the dataset type
- Use "from the web", and specify the URL of the `titanic.csv` file in this repository
- Explore the dataset and see which fields are available.
- Create a new AutoML experiment with the Titanic dataset
- Select "Classification" as the experiment type
- Select "Featurization options" and leave only the relevant fields. You may also specify field types (although AutoML does a good job of figuring those out automatically)
- Look at the other options available
- You may select to use Deep Learning models as well
- Submit the experiment and wait for it to complete
- Look at the results and find the model with the highest accuracy
- You may also look at the interpretability of the best model and see which features were the most significant.
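If you prefer code to the portal UI, a similar AutoML experiment can be submitted from the Python SDK. Below is a minimal sketch, assuming the `azureml-train-automl` package is installed, that the Titanic dataset is registered under the name `Titanic`, and that the target column is `Survived`; adjust these names to your setup:

```python
from azureml.core import Workspace, Experiment
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()            # reads config.json with workspace credentials
titanic_ds = ws.datasets['Titanic']     # the tabular dataset registered earlier

# Configure an AutoML classification run on the compute cluster
automl_config = AutoMLConfig(
    task='classification',
    training_data=titanic_ds,
    label_column_name='Survived',       # assumed target column name
    primary_metric='accuracy',
    compute_target=ws.compute_targets['AzMLCompute'],
    experiment_timeout_hours=0.5)

run = Experiment(ws, 'titanic-automl').submit(automl_config)
run.wait_for_completion(show_output=True)
```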
The Azure ML Designer is intended to be a simple UI tool for creating and running complex Azure ML pipelines (multi-step experiments).
- Select the Designer tab on the left-hand-side menu on the ML Portal.
- Create a new experiment using Sample 12: Multi-Class Classification - Letter Recognition
- Examine the way the experiment is composed.
- Run the experiment.
- Monitor the results by choosing the last block, Evaluate, and clicking the View button on the Outputs tab.
- If you do not have Visual Studio Code installed, install it.
- Install the Azure ML extension and all other required extensions.
- Install a recent version of the Azure CLI with the Azure ML extension (see the documentation for details):
```bash
pip install --upgrade azure-cli
az extension add -n ml -y
az extension update -n ml
```
- Open the workshop directory (this directory) in VS Code by typing:
```bash
code .
```
- Examine the `train_local.py` script. It downloads the MNIST dataset from the internet, and then trains a simple Scikit-Learn model to classify handwritten digits.
- Run `train_local.py` (either completely, or line by line in the Python interactive console) and observe the accuracy.
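To get a feel for what the script does before running it, here is a condensed sketch of the same idea; the actual `train_local.py` in this repository may differ in details such as the model and data handling:

```python
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Download MNIST: 70,000 28x28 digit images flattened into 784 features
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
X_train, X_test, y_train, y_test = train_test_split(X / 255.0, y, test_size=0.2)

# Train a simple Scikit-Learn classifier and report its accuracy
model = LogisticRegression(max_iter=100)
model.fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f'Accuracy: {acc:.4f}')
```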
- Make sure the Azure ML extension is connected to your cloud account and that you can see your workspace in the MACHINE LEARNING section of the Azure sidebar:
- Observe the `train_universal.py` script. It is a training script that can be run both locally and submitted to Azure ML for training. Note that it is almost the same as `train_local.py`, except that the code for showing digits is removed and a few lines for logging training results are added:
```python
from azureml.core.run import Run

# Get a reference to the current Azure ML run;
# fall back to None when running locally
try:
    run = Run.get_submitted_run()
except:
    run = None
...
# Log the accuracy only when running inside Azure ML
if run is not None:
    run.log('accuracy', acc)
```
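Note that recent versions of `azureml-core` also offer `Run.get_context()`, which returns a harmless offline run object when the script is executed outside Azure ML, so the same logging code typically works in both environments without the try/except.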
- To run an experiment inside the Azure ML Workspace, we need to create a compute target. A compute target defines the computing resource used for training/inference: it can be your local machine or various cloud resources. In our case we will use an AmlCompute cluster. Please create a scalable cluster of STANDARD_DS3_v2 machines, with min=0 and max=4 nodes. There are several ways to create the cluster (a Python SDK sketch follows this list):
- Through the web interface in the Azure ML Portal. Go to the Compute section and add a new Compute Cluster. We suggest you follow this path if you are doing this for the first time.
- From the VS Code Azure ML extension blade: go to the workspace, look for the Compute section, and click `+` to the right of the Compute clusters section. This will open a YAML file in which you can define the configuration of your cluster and then submit it from the VS Code environment.
- Through an Azure CLI command:
```bash
az ml compute create -n AzMLCompute --type amlcompute --size STANDARD_DS3_v2 --min-instances 0 --max-instances 4
```
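The same cluster can also be created from the Python SDK; here is a minimal sketch, assuming `ws` is a `Workspace` object obtained as shown later in this workshop:

```python
from azureml.core.compute import AmlCompute, ComputeTarget

# Provision a 0-4 node autoscaling cluster of STANDARD_DS3_v2 machines
config = AmlCompute.provisioning_configuration(
    vm_size='STANDARD_DS3_v2', min_nodes=0, max_nodes=4)

cluster = ComputeTarget.create(ws, 'AzMLCompute', config)
cluster.wait_for_completion(show_output=True)
```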
- To submit `train_universal.py` to Azure ML once the cluster has been created, we need to create a YAML job description file. This can be done by right-clicking on the `train_universal.py` file in VS Code and selecting Azure ML: Create Job. This will open the editor with a pre-populated YAML file.
- In the YAML file, you can press Ctrl-Space in many places to initiate auto-complete. The file defines:
- Script that needs to be run
- Environment, which is essentially a container that is created to perform training on a remote resource. Pressing Ctrl-Space gives you a list of predefined environments. You can also define your own, based on a starting container and a conda/pip environment specification.
In the end, our YAML file should look like the following:
```yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code:
  local_path: c:\demo
command: python train_universal.py
environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu:4
compute:
  target: azureml:AzMLCompute
```
- To submit this job to Azure ML, the easiest way is to click the Azure ML icon in the top right corner of the YAML editing screen. Once you have the YAML file, you can also submit it from the command line:
```bash
az ml job create -f submit-universal.yml
```
- Once the job has been submitted, logs will be automatically streamed to the VS Code terminal window. You can also observe the results in the Azure ML Portal. Please note that the run may take several minutes to complete.
You now know that submitting runs to Azure ML is not complicated, and you get some goodies (like storing all statistics from your runs, models, etc.) for free.
In the current example, we have been fetching MNIST data from the Internet each time the script is run. This is not a good practice. The right solution is to define a dataset, or upload the data to datastore.
You can define a dataset in any of the following ways:
- Through the Azure ML Portal
- By uploading the data programmatically
- Implicitly, by specifying local data in the job YAML definition.
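If you choose the programmatic route, a minimal sketch looks like this (v1 Python SDK; the target path `mnist` and the dataset name are illustrative):

```python
from azureml.core import Dataset, Workspace

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Upload the local 'dataset' directory into the workspace's default datastore
datastore.upload(src_dir='dataset', target_path='mnist', show_progress=True)

# Register a file dataset that points at the uploaded data
dataset = Dataset.File.from_files(path=(datastore, 'mnist'))
dataset.register(ws, name='mnist-pickle')
```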
To prepare the data, run `create_dataset.py` (or `create_dataset.ipynb`). This will create a pickle file inside the `dataset` directory, which we will need later.
This time, let's use Keras to train a neural network for MNIST recognition. The `train_keras.py` script contains the code, and it takes several command-line parameters. One required parameter is `data_path`, which expects the path to the folder where the data is located.
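Inside the script, the command-line handling typically looks like the sketch below; this is an illustration rather than the exact contents of `train_keras.py`, and the assumption that `mnist.pkl` unpickles into feature/label arrays comes from this workshop's `create_dataset` step:

```python
import argparse
import os
import pickle

parser = argparse.ArgumentParser()
parser.add_argument('--data_path', type=str, required=True,
                    help='folder that contains mnist.pkl')
args = parser.parse_args()

# Load the pickled MNIST data prepared by create_dataset.py
with open(os.path.join(args.data_path, 'mnist.pkl'), 'rb') as f:
    X, y = pickle.load(f)
```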
We can create a YAML file for job submission in a similar manner. It should look like this:
```yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code:
  local_path: d:\WORK\AzureMLStarter
command: python train_keras.py --data_path {inputs.mnist}
environment: azureml:AzureML-TensorFlow-2.3-CPU:20
compute:
  target: AzMLCompute
inputs:
  mnist:
    mode: mount
    data:
      local_path: d:\WORK\AzureMLStarter\dataset\mnist.pkl
```
Note the clever logging strategy we use in this example. We define a Keras callback object that records loss and accuracy at the end of each epoch. This allows us to see the training progress right on the Azure ML Portal, without any need for extra monitoring code.
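A minimal version of such a callback might look like this; `run` is assumed to be obtained as in `train_universal.py` above, and the metric names are illustrative:

```python
from tensorflow import keras

class AzureMLLogger(keras.callbacks.Callback):
    """Log per-epoch loss and accuracy to the Azure ML run, if one is active."""
    def __init__(self, run):
        super().__init__()
        self.run = run

    def on_epoch_end(self, epoch, logs=None):
        if self.run is not None and logs:
            self.run.log('loss', logs['loss'])
            self.run.log('accuracy', logs['accuracy'])

# Usage: model.fit(X_train, y_train, epochs=5, callbacks=[AzureMLLogger(run)])
```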
Now it is easy for us to perform hyperparameter optimization by passing different arguments to the training script and submitting multiple experiments. This can also be done programmatically, as we will outline later.
However, Azure ML also allows you to define a special type of job, a sweep job, which automatically submits an experiment that schedules a number of sub-experiments with distinct parameters.
For a sweep job, we need to use a special YAML file. To create it, open the VS Code Command Palette by pressing F1, select Azure ML: Create Job, and then choose Sweep Job.
You can have a look at a sweep job YAML file here. In short, in addition to the familiar fields that we have seen previously, it also contains the definition of the parameter search space.
The parameter search space is defined by:
- A number of parameters, each drawn either from a given set of values (choice) or from a floating-point distribution
- A search algorithm: grid search, random search, or Bayesian search
- An objective: the metric that should be maximized or minimized
You can also define early termination criteria: for example, when the metric is not improving fast enough, we can terminate training early without wasting computational resources.
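The same concepts are also available from the Python SDK via HyperDrive. Here is a minimal sketch, assuming `script_config` is a `ScriptRunConfig` for the training script and that the script accepts the (illustrative) `--lr` and `--hidden` arguments:

```python
from azureml.train.hyperdrive import (BanditPolicy, HyperDriveConfig,
                                      PrimaryMetricGoal, RandomParameterSampling,
                                      choice, uniform)

# Search space: one discrete parameter (choice) and one continuous distribution
sampling = RandomParameterSampling({
    '--hidden': choice(64, 128, 256),
    '--lr': uniform(0.001, 0.1)})

# Early termination: stop runs that lag too far behind the best one so far
policy = BanditPolicy(slack_factor=0.1, evaluation_interval=1)

hd_config = HyperDriveConfig(
    run_config=script_config,
    hyperparameter_sampling=sampling,
    policy=policy,
    primary_metric_name='accuracy',      # the metric logged by the training script
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=16)
```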
Once you submit such an experiment, Azure ML automatically schedules a bunch of experiment runs, and offers you a convenient choice of visualizations to select the best model at the end:
Now let's learn how to submit scripts programmatically through Python code, and how to do hyperparameter optimization:
- Create a small MNIST dataset for our experiments by running `create_dataset.ipynb` locally. It will create the `dataset` subdirectory.
- Download the `config.json` file from the Azure Portal (it contains all the credentials for accessing the workspace) and place it in the current directory, or wherever your Jupyter notebook is.
- Open the `submit.ipynb` file in Jupyter Notebook. The easiest way is to open it in VS Code, but you can also create a notebook in your Azure ML Workspace (in this case you would also have to create a VM to run it on) and upload all the data there.
- Go through all the steps in the `submit.ipynb` notebook:
  - Create a reference to the ML workspace
  - Create a reference to the compute resource
  - Upload data to the ML workspace
  - Submit the simple experiment (please monitor the experiment on the ML Portal after submission)
  - Perform hyperparameter optimization (please see the results on the ML Portal after submission)
  - Select and register the best model
- After registering the best model, you should see it on the ML Portal under the Models tab. Play with the options and note that you can deploy the model from the UI either as an Azure Container Instance (ACI) or on a Kubernetes cluster (AKS). You will need to supply a scoring Python script for that, sketched below.
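A scoring script exposes two entry points, `init` (called once when the service starts) and `run` (called for every request). Here is a minimal sketch for a Scikit-Learn model; the model file name and the JSON input format are assumptions:

```python
import json
import os
import joblib

def init():
    # Load the registered model from the directory Azure ML mounts for us
    global model
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'model.pkl')
    model = joblib.load(model_path)

def run(raw_data):
    # Expects a JSON payload like {"data": [[...feature values...], ...]}
    data = json.loads(raw_data)['data']
    return model.predict(data).tolist()
```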
Recently, a feature has been added to the Azure ML Portal that allows you to submit experiments through the web interface. While this is typically less convenient (because you would need to fill in many forms again to re-submit), you may find it more intuitive when starting out with Azure ML.
You can read in more detail about GAN Training in my blog post.
You can train the model on a number of paintings. The image above was produced by a model trained on around 1000 images from WikiArt, which you would need to collect yourself, for example by using WikiArt Retriever, or by borrowing existing collections from the WikiArt Dataset or the GANGogh Project.
Place the images you want to train on somewhere in the `dataset` directory. After that, follow the instructions in the `submit_gan.ipynb` notebook.
Because using Azure ML is resource-intensive, if you are using your own Azure subscription it is recommended to:
- Delete the compute cluster (especially because auto-scale is turned off in our demo to save on cluster preparation time), or make sure the minimum number of nodes is set to 0
- You may also delete the Azure ML Workspace and the resource group:
```bash
az ml workspace delete -n myworkspace -g myazml
az group delete -n myazml
```
- Azure ML using CLI documentation
- Azure ML via VS Code documentation
- Azure ML Examples - a repository containing many Azure ML examples, using both the CLI and the Python SDK
- Series of blog posts on Azure ML - slightly outdated, but they cover some concepts.
- Getting FREE Azure - for Students, and for the rest of us
Have fun!