# Use Automated Machine Learning

_**Note**: This exercise assumes you have completed the steps in the **00a -  Azure ML Workspace.ipynb** notebook to create an Azure Machine Learning workspace and provision compute targets._

Machine learning uses mathematics and statistics to predict unknown values. For example, suppose *Adventure Works Cycles* is a business that rents cycles in a city. The business could use historic data to train a model that predicts daily rental demand in order to make sure sufficient staff and cycles are available.

<p style='text-align:center'><img src='./images/adventureworks.png' alt='Adventure Works cycle rental location, on a cloudy day in January'/></p>

To do this, Adventure Works could create a machine learning model that takes information about a specific day (the day of week, the expected weather conditions, and so on) as an input, and produces the predicted number of rentals as an output.

Mathematically, you can think of machine learning as a way of defining a function (let's call it ***f***) that operates on one or more *features* of something (which we'll call ***x***) to calculate a predicted *label* (***y***) - like this:

$$y = f(x)$$

In this bicycle rental example, the details about a given day (day of the week, weather, and so on) are the *features* (***x***), the number of rentals for that day is the *label* (***y***), and the function (***f***) that calculates the number of rentals based on the information about the day is the machine learning model.
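As a rough illustration of this idea, the sketch below writes ***f*** as an ordinary Python function. The feature names and weights are made up for the example; a trained model learns its parameters from historic data rather than having them hard-coded like this.

```python
# A hand-written stand-in for a trained model: y = f(x).
# The features and weights below are hypothetical, chosen only to illustrate the idea.

def f(x):
    """Return a predicted number of rentals (y) for one day's features (x)."""
    base_demand = 200                               # rentals on an unremarkable day
    weekend_bonus = 150 if x["is_weekend"] else 0   # more riders at weekends
    temperature_effect = 10 * x["temperature_c"]    # warmer days, more rentals
    rain_penalty = 120 if x["is_raining"] else 0    # rain keeps riders away
    return max(0, round(base_demand + weekend_bonus + temperature_effect - rain_penalty))

# Features (x) for a single day
x = {"is_weekend": True, "temperature_c": 18, "is_raining": False}

# Predicted label (y)
y = f(x)
print("Predicted rentals:", y)
```

Training a model amounts to finding values like these weights automatically, so that the predictions match the known labels as closely as possible.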

The operation that the ***f*** function performs on *x* to calculate *y* depends on a number of factors, including the type of model you're trying to create and the specific algorithm used to train the model.

In this case, the label that the model will predict is a numeric value (the number of rentals), so the model must be trained using a *regression* algorithm, of which there are many. Selecting the right algorithm and data preparation steps is often one of the most time-consuming parts of machine learning, and usually involves a lot of trial and error. Azure Machine Learning can help reduce the complexity and time taken through *automated machine learning*, which uses scalable compute resources in the cloud to try multiple algorithms and preprocessing steps in parallel and find the best-performing model for your data.
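The "try several candidates and keep the best" idea can be sketched in a few lines of plain Python. This is not how automated machine learning is implemented; it is a minimal, sequential illustration that assumes two very simple candidate models, a single made-up feature (temperature), and invented data.

```python
import math

# Hypothetical toy data: (temperature, rentals) pairs; not taken from the real dataset.
train = [(5, 120), (10, 210), (15, 290), (20, 410), (25, 480)]
test = [(8, 170), (18, 360), (22, 430)]

def fit_mean(data):
    """Baseline: always predict the average label seen in training."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y

def fit_line(data):
    """Simple linear regression (ordinary least squares) with one feature."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def rmse(model, data):
    """Root mean squared error of a model on held-out data."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in data) / len(data))

# Train every candidate, score each on the held-out data, and keep the best.
candidates = {"mean baseline": fit_mean, "linear regression": fit_line}
scores = {name: rmse(fit(train), test) for name, fit in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: RMSE = {score:.1f}")
print("Best model:", best)
```

Automated machine learning applies the same pattern at a much larger scale, with many algorithms and preprocessing combinations evaluated in parallel on a compute cluster.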

## Create a dataset

In Azure Machine Learning, data for model training and other operations is usually encapsulated in an object called a *dataset*.

1. Sign in to [Azure Machine Learning studio](https://ml.azure.com) and on the **Datasets** page (under **Assets**), create a new dataset ***from web files*** with the following settings:
    - **Basic Info**:
        - **Web URL**: https://aka.ms/bike-rentals
        - **Name**: bike-rentals
        - **Dataset type**: Tabular
        - **Description**: Bicycle rental data
    - **Settings and preview**:
        - **File format**: Delimited
        - **Delimiter**: Comma
        - **Encoding**: UTF-8
        - **Column headers**: Use headers from first file
        - **Skip rows**: None
    - **Schema**:
        - Include all columns other than **Path**
        - Review the automatically detected types
    - **Confirm details**:
        - Do not profile the dataset after creation
2. After the dataset has been created, open it and view the **Explore** page to see a sample of the data. This data contains historical features and labels for bike rentals.

> **Citation**: *This data is derived from [Capital Bikeshare](https://www.capitalbikeshare.com/system-data) and is used in accordance with the published data [license agreement](https://www.capitalbikeshare.com/data-license-agreement).*
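To make the idea of "features and labels" in a comma-delimited file with a header row more concrete, here is a small sketch that uses only the Python standard library. The inline sample stands in for the real file, and its column names and values are hypothetical (apart from the **rentals** label column used later in this exercise).

```python
import csv
import io

# A tiny inline sample standing in for the downloaded file.
# Column names and values are hypothetical, apart from "rentals".
sample = """day,weekday,temp,rentals
1,6,0.344167,331
2,0,0.363478,131
3,1,0.196364,120
"""

label_column = "rentals"
features, labels = [], []

# DictReader treats the first row as column headers, which mirrors the
# "Use headers from first file" setting chosen above.
for row in csv.DictReader(io.StringIO(sample), delimiter=","):
    labels.append(int(row.pop(label_column)))
    features.append({name: float(value) for name, value in row.items()})

print("Features for first row:", features[0])
print("Labels:", labels)
```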

## Run an automated machine learning experiment

In Azure Machine Learning, operations that you run are called *experiments*. Follow the steps below to run an experiment that uses automated machine learning to train a regression model that predicts bicycle rentals.

1. In [Azure Machine Learning studio](https://ml.azure.com), view the **Automated ML** page (under **Author**).
2. Create a new Automated ML run with the following settings:
    - **Select dataset**:
        - **Dataset**: bike-rentals
    - **Configure run**:
        - **Experiment name**: auto-train-bike-rental
        - **Target column**: rentals
        - **Training compute target**: aml-cluster
    - **Task type and settings**:
        - **Task type**: Regression
        - **Additional configuration settings:**
            - **Primary metric**: Normalized root mean square error - *more about this metric later!*
            - **Explain best model**: Unselected - *this option causes automated machine learning to calculate feature importance for the best model, making it possible to determine the influence of each feature on the predicted label*
            - **Blocked algorithms**: *Block **all** other than **RandomForest** and **LightGBM** - normally you'd want to try as many as possible, but doing so can take a long time!*
        - **Featurization settings**:
            - **Enable Featurization**: Unselected
3. When you finish submitting the automated ML run details, the run starts automatically. Wait for the run status to change from *Preparing* to *Running* (this may take five minutes or so, as the cluster nodes need to be initialized before training can begin - now might be a good time for a coffee break!). You may need to select **&#8635; Refresh** periodically.
4. When the run status changes to *Running*, view the **Models** tab and observe as each possible combination of training algorithm and pre-processing steps is tried and the performance of the resulting model is evaluated. The page will automatically refresh periodically, but you can also select **&#8635; Refresh**.
5. After a few models have been trained and evaluated (with a status of **Completed**), select **&#10754; Cancel** to cancel the remaining iterations.

## Review the best model

Although you canceled the automated machine learning run, some models were trained, so you can review the best-performing one.

1. On the **Details** tab of the automated machine learning run, note the recommended model.

    This recommendation is based on the performance metric you specified (*Normalized root mean square error*). To calculate this metric, the training process used some of the data to train the model, and applied a technique called *cross-validation* to iteratively test the trained model with data it wasn't trained with and compare the predicted values with the actual known values. The differences between the predicted and actual values (known as the *residuals*) indicate the amount of *error* in the model. The root mean square error is calculated by squaring the errors across all of the test cases, finding the mean of these squares, and then taking the square root; *normalizing* it expresses the result relative to the range of the label values. What all of this means is that the smaller this value is, the more accurately the model is predicting.
2. Select the algorithm name for the best model to view its details, and note that you can see all of the run metrics that give statistical information about the performance of the model.
3. Select the **Visualizations** tab and review the charts that show the performance of the model by comparing the predicted values against the true values, and showing the *residuals* (differences between predicted and actual values) as a histogram.
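The metric described in step 1 can be sketched in plain Python. The example below is an illustration under stated assumptions rather than the service's actual implementation: it assumes normalization divides by the range of the actual values, uses a simple interleaved 5-fold split for cross-validation, and stands in a "predict the training mean" placeholder for a real model. The labels are invented.

```python
import math

def rmse(predicted, actual):
    """Square the residuals, average them, then take the square root."""
    residuals = [p - a for p, a in zip(predicted, actual)]
    return math.sqrt(sum(r ** 2 for r in residuals) / len(residuals))

def normalized_rmse(predicted, actual):
    """RMSE divided by the range of the actual values (assumed normalization)."""
    return rmse(predicted, actual) / (max(actual) - min(actual))

def k_fold_indices(n_rows, k):
    """Yield (train_indices, test_indices) so each row is tested exactly once."""
    for fold in range(k):
        test_idx = [i for i in range(n_rows) if i % k == fold]
        train_idx = [i for i in range(n_rows) if i % k != fold]
        yield train_idx, test_idx

# Hypothetical labels (daily rentals); a real run would use the dataset's rows.
labels = [120, 340, 560, 410, 230, 610, 150, 480, 390, 270]

predicted, actual = [], []
for train_idx, test_idx in k_fold_indices(len(labels), k=5):
    # Placeholder "model": predict the mean of the training labels.
    mean_label = sum(labels[i] for i in train_idx) / len(train_idx)
    for i in test_idx:
        predicted.append(mean_label)
        actual.append(labels[i])

print(f"RMSE: {rmse(predicted, actual):.1f}")
print(f"Normalized RMSE: {normalized_rmse(predicted, actual):.3f}")
```

Because every row is held out once, each prediction is compared with a value the placeholder model did not see when it was fitted, which is the point of cross-validation.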

The **Predicted vs. True** chart should show a diagonal trend in which the predicted value correlates closely to the true value. A dotted line shows how a perfect model should perform, and the closer the line for your model's average predicted value is to this, the better its performance. A histogram below the line chart shows the distribution of true values.

<p style='text-align:center'><img src='./images/predicted-vs-true.png' alt='Predicted vs True chart'/></p>

The **Residual Histogram** shows the frequency of residual value ranges. Residuals represent variance between predicted and true values that can't be explained by the model - in other words, errors; so what you should hope to see is that the most frequently occurring residual values are clustered around 0 (in other words, most of the errors are small), with fewer errors at the extreme ends of the scale.

<p style='text-align:center'><img src='./images/residual-histogram.png' alt='Residuals histogram'/></p>
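If it helps to see how such a histogram is built, the following sketch computes residuals and groups them into fixed-width bins. The true and predicted values are invented for illustration, and the bin width of 20 is an arbitrary choice.

```python
from collections import Counter

# Hypothetical true and predicted rental counts for a handful of test days.
actual = [331, 131, 120, 108, 82, 88, 148, 68]
predicted = [310, 150, 118, 95, 90, 80, 260, 75]

# A residual is the difference between a predicted and an actual value.
residuals = [p - a for p, a in zip(predicted, actual)]

def bin_start(value, width):
    """Return the lower edge of the fixed-width bin that contains value."""
    return (value // width) * width

bin_width = 20
histogram = Counter(bin_start(r, bin_width) for r in residuals)

# Print a text histogram, one row per occupied bin.
for start in sorted(histogram):
    print(f"[{start:>4}, {start + bin_width:>4}): {'#' * histogram[start]}")
```

In this made-up example most residuals fall in the two bins either side of 0, with a single large error far out on the right, which is the kind of shape the chart lets you spot at a glance.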

## Deploy a predictive service

After you've used automated machine learning to train some models, you can deploy the best performing model as a service for client applications to use.

In Azure Machine Learning, you can deploy a service to Azure Container Instances (ACI) or to an Azure Kubernetes Service (AKS) cluster. For production scenarios, an AKS deployment is recommended.

1. On the page for the best-performing model you trained, select **Deploy**. Then deploy the model with the following settings:
    - **Name**: predict-rentals
    - **Description**: Predict cycle rentals
    - **Compute type**: AKS
    - **Compute name**: aks-cluster
    - **Enable authentication**: Selected
    - **Type**: Key-based authentication
2. Wait for the deployment to start - this may take a few seconds.
3. In Azure Machine Learning studio, view the **Endpoints** page and find the **predict-rentals** real-time endpoint.
4. Select the **predict-rentals** endpoint and note the **Deployment state**. If it is *Transitioning*, wait a few minutes and refresh the page until it is *Healthy*.
5. When the deployment state is healthy, select the **Consume** tab and note the following information there. You need this to connect to your deployed service from a client application.
    - The REST endpoint for your service
    - The Primary Key for your service
6. Note that you can use the &#10697; link next to these values to copy them to the clipboard.

## Test the deployed service

Now that you've deployed a service, you can test it by using the code below to connect to your published service and get predictions for bicycle rentals for a five-day period.

> Don't worry too much about the details of the code. The point is just to verify that your published service works.

1. On the **Consume** page for the **predict-rentals** service, copy the REST endpoint for your service and paste it into the code below, replacing YOUR_ENDPOINT.
2. Copy the Primary Key for your service and paste it into the code below, replacing YOUR_KEY.
3. Use the **&#9655;** button next to the cell to run the code, and verify that the predicted number of rentals for each day in the five-day period is returned.

```python
import json
import requests

endpoint = 'YOUR_ENDPOINT'  # Replace with your endpoint
key = 'YOUR_KEY'  # Replace with your key

# An array of features based on a five-day weather forecast
x = [[1, 1, 2022, 1, 0, 6, 0, 2, 0.344167, 0.363625, 0.805833, 0.160446],
     [2, 1, 2022, 1, 0, 0, 0, 2, 0.363478, 0.353739, 0.696087, 0.248539],
     [3, 1, 2022, 1, 0, 1, 1, 1, 0.196364, 0.189405, 0.437273, 0.248309],
     [4, 1, 2022, 1, 0, 2, 1, 1, 0.2, 0.212122, 0.590435, 0.160296],
     [5, 1, 2022, 1, 0, 3, 1, 1, 0.226957, 0.22927, 0.436957, 0.1869]]

# Convert the array to JSON format
input_json = json.dumps({"data": x})

# Set the content type and authentication for the request
headers = {"Content-Type": "application/json",
           "Authorization": "Bearer " + key}

# Send the request
response = requests.post(endpoint, input_json, headers=headers)

# If we got a valid response, display the predictions
if response.status_code == 200:
    y = json.loads(response.json())
    print("Predictions:")
    for i in range(len(x)):
        print(" Day: {}. Predicted rentals: {}".format(i + 1, max(0, round(y["result"][i]))))
else:
    print(response)
```
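If you want to explore the request and response handling without calling the service, the sketch below separates those two steps into small functions and exercises them against a made-up response. The function names `build_request` and `parse_predictions` are hypothetical helpers, not part of any Azure library, and the response shape is an assumption inferred from the code above, which decodes the response body twice.

```python
import json

def build_request(rows, key):
    """Return the (body, headers) pair for a scoring request."""
    body = json.dumps({"data": rows})
    headers = {"Content-Type": "application/json",
               "Authorization": "Bearer " + key}
    return body, headers

def parse_predictions(response_body):
    """Extract non-negative, whole-number predictions from a response body.

    Assumes the service returns {"result": [...]}, either directly or
    wrapped in a JSON-encoded string (double-encoded JSON).
    """
    payload = json.loads(response_body)
    if isinstance(payload, str):  # the body was a JSON string containing JSON
        payload = json.loads(payload)
    return [max(0, round(value)) for value in payload["result"]]

# Offline check with invented values; no network call is made here.
body, headers = build_request([[1, 1, 2022]], key="NOT_A_REAL_KEY")
fake_response_body = json.dumps(json.dumps({"result": [312.6, -4.2, 128.0]}))

print("Request body:", body)
print("Predictions:", parse_predictions(fake_response_body))
```

Clamping at zero matches the `max(0, ...)` call in the test code: a regression model can produce a small negative number, but a negative rental count makes no sense.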

You've successfully used the automated machine learning capability in Azure Machine Learning to train and deploy a predictive model.

Now that you've finished the exercise, you should delete the endpoint you deployed.

If you decide not to complete the next lab, edit your compute cluster to reset the minimum number of nodes to 0, and delete the inference cluster in order to avoid leaving your compute running and incurring unnecessary charges to your Azure subscription. Alternatively, if you're finished exploring Azure Machine Learning, delete the entire resource group in your Azure subscription.