Copyright (c) Microsoft Corporation. All rights reserved.

# Learning Outcomes

**Learn how to train models in the cloud using Jupyter Notebooks.**   
**This involves creating a resource group, initializing resources (i.e., the Machine Learning service), and creating the VMs (called "computes") that run the training.**

### Resources used for this tutorial:

0. Signing up for Azure. [link](https://azure.microsoft.com/en-us/free/)
1. Creating Workspace, VM (Compute) and running Jupyter Notebooks in Azure. [link](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-sdk-setup)
2. Downloading a sample dataset and training a model. [link](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-sdk-train)

### Additional Resources (To develop on your local machine):
1. Setting up Python SDK on your local machine. [link](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-environment#local)
2. Sample Jupyter Notebook Tutorial on training a model on the cloud. [link](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-train-models-with-aml)

What is a resource group? [link](https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/manage-resource-groups-portal)

Azure's Machine Learning Service [link](https://azure.microsoft.com/en-us/services/machine-learning/)

# Steps:
1. Create an Azure account.
2. Create a resource group.
3. Create a new "Machine Learning" resource.
4. Create a compute in the service.
5. Go to [ml.azure.com](https://ml.azure.com), create a notebook, and start developing!
   
For this tutorial, we will go over a slightly modified version of the Microsoft tutorial found [here](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-sdk-train).

# Tutorial: Train your first model

This tutorial is **part two of a two-part tutorial series**. In the previous tutorial, you created a workspace and chose a development environment. In this tutorial, you learn the foundational design patterns in the Azure Machine Learning service and train a simple scikit-learn model based on the diabetes data set. After completing this tutorial, you will have the practical knowledge of the SDK needed to scale up to more complex experiments and workflows.

In this tutorial, you learn the following tasks:

> * Connect your workspace and create an experiment 
> * Load data and train a scikit-learn model
> * View training results in the portal

## Prerequisites

The only prerequisite is to run the previous tutorial, Setup environment and workspace.

## Connect workspace and create experiment

Import the `Workspace` class, and load your subscription information from the file `config.json` using the function `from_config()`. By default, this looks for the JSON file in the current directory, but you can also pass a path parameter that points to the file: `from_config(path="your/file/path")`. If you are running this notebook in a cloud notebook server in your workspace, the file is automatically placed in the root directory.
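
For reference, `config.json` is a small JSON file that identifies the workspace. The sketch below uses only the standard library to write and re-read such a file. The three key names follow the format the SDK reads, while the values are placeholders and the temporary directory is an assumption made so that the example does not overwrite a real configuration.

```python
import json
import tempfile
from pathlib import Path

# Placeholder values: replace them with the details of your own workspace.
config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}

# Write the file to a temporary directory (illustration only).
config_dir = Path(tempfile.mkdtemp())
config_path = config_dir / "config.json"
config_path.write_text(json.dumps(config, indent=4))

# Read it back and check that every expected key is present.
loaded = json.loads(config_path.read_text())
missing = [key for key in config if key not in loaded]
print("missing keys:", missing)
```

With a real file in place, `Workspace.from_config(path=...)` can be pointed at its location.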

If the following code asks for additional authentication, simply paste the link in a browser and enter the authentication token.

In [1]:
from azureml.core import Workspace
ws = Workspace.from_config()

Now create an experiment in your workspace. An experiment is another foundational cloud resource that represents a collection of trials (individual model runs). In this tutorial you use the experiment to create runs and track your model training in the Azure Portal. Parameters include your workspace reference, and a string name for the experiment.

In [3]:
from azureml.core import Experiment
experiment = Experiment(workspace=ws, name="diabetes-experiment")

## Load data and prepare for training

For this tutorial, you use the diabetes data set, which uses features like age, gender, and BMI to predict diabetes disease progression. Load the data from the Azure Open Datasets class, and split it into training and test sets using `train_test_split()`. This function segregates the data so the model has unseen data to use for testing following training.

In [4]:
from azureml.opendatasets import Diabetes
from sklearn.model_selection import train_test_split

x_df = Diabetes.get_tabular_dataset().to_pandas_dataframe().dropna()

print("Top 5 rows of the full dataset:")
print(x_df.head())
print("\n")
print('\n')
print("Shape (rows, cols) of full dataset:")
print(x_df.shape)

y_df = x_df.pop("Y")

X_train, X_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.2, random_state=66)

print("Top 5 rows of the training (features) dataset:")
print(X_train.head())
print("\n")
print("Top 5 rows of the training (target) dataset:")
print(y_train.head())
print("Shape (rows, cols) of training (features) dataset:")
print(X_train.shape)
print("Shape (rows, cols) of testing (features) dataset:")
print(X_test.shape)
print("Shape (rows, cols) of training (target) dataset:")
print(y_train.shape)
print("Shape (rows, cols) of testing (target) dataset:")
print(y_test.shape)

Top 5 rows of the full dataset:
   AGE  SEX   BMI     BP   S1     S2    S3   S4      S5  S6    Y
0   59    2  32.1  101.0  157   93.2  38.0  4.0  4.8598  87  151
1   48    1  21.6   87.0  183  103.2  70.0  3.0  3.8918  69   75
2   72    2  30.5   93.0  156   93.6  41.0  4.0  4.6728  85  141
3   24    1  25.3   84.0  198  131.4  40.0  5.0  4.8903  89  206
4   50    1  23.0  101.0  192  125.4  52.0  4.0  4.2905  80  135




Shape (rows, cols) of full dataset:
(442, 11)
Top 5 rows of the training (features) dataset:
     AGE  SEX   BMI     BP   S1     S2    S3    S4      S5   S6
440   36    1  30.0   95.0  201  125.2  42.0  4.79  5.1299   85
389   47    2  26.5   70.0  181  104.8  63.0  3.00  4.1897   70
5     23    1  22.6   89.0  139   64.8  61.0  2.00  4.1897   68
289   28    2  31.5   83.0  228  149.4  38.0  6.00  5.3132   83
101   53    2  22.2  113.0  197  115.2  67.0  3.00  4.3041  100


Top 5 rows of the training (target) dataset:
440    220
389     51
5       97
289     68
101   
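
To make the split step concrete, here is a minimal standard-library sketch of what a shuffled 80/20 hold-out split does to the row indices. The helper `split_indices` is a hypothetical name, not part of scikit-learn, and rounding the test count up is an assumption about how a fractional `test_size` is resolved. The shuffle also uses Python's own random generator, so the same seed will not select the same rows as `train_test_split()`.

```python
import math
import random


def split_indices(n_samples, test_size=0.2, seed=66):
    """Shuffle row indices and hold out a fraction of them for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    n_test = math.ceil(test_size * n_samples)  # assumption: test count rounds up
    return indices[n_test:], indices[:n_test]


# 442 is the number of rows in the full diabetes dataset shown above.
train_idx, test_idx = split_indices(442)
print(len(train_idx), len(test_idx))  # 353 89
```

Every row lands in exactly one of the two sets, so the model is never evaluated on rows it was trained on.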

## Train a model

Training a simple scikit-learn model can easily be done locally for small-scale training, but when training many iterations with dozens of different feature permutations and hyperparameter settings, it is easy to lose track of what models you've trained and how you trained them. The following design pattern shows how to leverage the SDK to easily keep track of your training in the cloud.

Build a script that trains a ridge model for a given value of the hyperparameter alpha (here `alpha = 1`) and logs the result to a run in the experiment.

In [5]:
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
import joblib  # sklearn.externals.joblib is no longer available in recent scikit-learn releases
import math
import os

alpha = 1
run = experiment.start_logging()
run.log("alpha_value", alpha)

model = Ridge(alpha=alpha)
model.fit(X=X_train, y=y_train)
y_pred = model.predict(X=X_test)
rmse = math.sqrt(mean_squared_error(y_true=y_test, y_pred=y_pred))
run.log("rmse", rmse)
print(f"RMSE for alpha {alpha}: {rmse}")
model_name = "model_alpha_" + str(alpha) + ".pkl"
filename = "outputs/" + model_name

os.makedirs("outputs", exist_ok=True)  # joblib.dump fails if the folder does not exist
joblib.dump(value=model, filename=filename)
run.upload_file(name=model_name, path_or_stream=filename)
run.complete()

RMSE for alpha 1: 56.661108984990534


The above code accomplishes the following:

1. A new run is created within the experiment by calling `experiment.start_logging()`, and the alpha value is logged so that runs with different settings can be told apart.
1. A Ridge model is instantiated, trained, and used to run predictions. The root-mean-squared error (RMSE) is calculated for the actual versus predicted values and then logged to the run. At this point the run has metadata attached for both the alpha value and the RMSE.
1. Next, the model is serialized and uploaded to the run. This allows you to download the model file from the run in the portal.
1. Finally, the run is completed by calling `run.complete()`.
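
The same pattern extends to a sweep over several alpha values, with one run per value. The sketch below is a local, standard-library-only stand-in for that idea, not the Azure ML API. The names `fit_ridge_1d` and `rmse`, the toy data, and the `runs` list are all hypothetical and invented for illustration. The model is a single-feature ridge regression without an intercept, whereas scikit-learn's `Ridge` fits an intercept by default. Each dictionary in `runs` plays the role of the metadata that `run.log()` attaches to a run.

```python
import math


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge slope for one feature and no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def rmse(y_true, y_pred):
    """Root-mean-squared error between actual and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data made up for illustration: y is roughly 2 * x.
x_train, y_train = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
x_test, y_test = [5.0, 6.0], [10.1, 11.9]

runs = []  # local stand-in for the per-run metadata kept by the experiment
for alpha in [0.0, 0.1, 1.0, 10.0]:
    slope = fit_ridge_1d(x_train, y_train, alpha)
    preds = [slope * x for x in x_test]
    runs.append({"alpha_value": alpha, "rmse": rmse(y_test, preds)})

# Compare runs by their logged metric, as the experiment page lets you do.
best = min(runs, key=lambda r: r["rmse"])
for r in runs:
    print(r)
print("best alpha:", best["alpha_value"])
```

Logging the hyperparameter next to the metric for every run is what makes the runs comparable afterwards, whether the record lives in a local list or in the experiment.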



After the training has completed, evaluate the `experiment` variable in a cell to display a link to the experiment in the portal.

In [6]:
experiment

| Name | Workspace | Report Page | Docs Page |
| --- | --- | --- | --- |
| diabetes-experiment | ml-ss2019 | Link to Azure Machine Learning studio | Link to Documentation |


## View training results in portal

Following the **Link to Azure Machine Learning studio** shown in the output above takes you to the main experiment page. Here you see all the individual runs in the experiment. Any custom-logged values (`alpha_value` and `rmse`, in this case) become fields for each run, and also become available for the charts and tiles at the top of the experiment page. To add a logged metric to a chart or tile, hover over it, click the edit button, and find your custom-logged metric.

When training models at scale over hundreds and thousands of runs, this page makes it easy to see every model you trained, specifically how they were trained, and how your unique metrics have changed over time.

![Main Experiment page in Portal](imgs/experiment_main.png)

Clicking a link in the `RUN NUMBER` column takes you to the page for that individual run. The default **Details** tab shows more detailed information on the run. Navigate to the **Outputs** tab to see the `.pkl` file for the model that was uploaded to the run during training. Here you can download the model file, rather than having to retrain it manually.

![Run details page in Portal](imgs/model_download.png)

## Clean up resources

Do not complete this section if you plan on running other Azure Machine Learning service tutorials.

### Stop the notebook VM

If you used a cloud notebook server, stop the VM when you are not using it to reduce cost.

1. In your workspace, select **Compute**.

1. Select the **Notebook VMs** tab in the compute page.

1. From the list, select the VM.

1. Select **Stop**.

1. When you're ready to use the server again, select **Start**.

### Delete everything

If you don't plan to use the resources you created, delete them, so you don't incur any charges:

1. In the Azure portal, select **Resource groups** on the far left.

1. From the list, select the resource group you created.

1. Select **Delete resource group**.

1. Enter the resource group name. Then select **Delete**.

You can also keep the resource group but delete a single workspace. Display the workspace properties and select **Delete**.

## Next steps

In this tutorial, you did the following tasks:

> * Connected your workspace and created an experiment
> * Loaded data and trained scikit-learn models
> * Viewed training results in the portal and retrieved models

[Deploy your model](https://docs.microsoft.com/azure/machine-learning/service/tutorial-deploy-models-with-aml) with Azure Machine Learning.

Learn how to develop [automated machine learning](https://docs.microsoft.com/azure/machine-learning/service/tutorial-auto-train-models) experiments.