<center><h1> Microsoft Azure AI Fundamentals: Explore Visual Tools for Machine Learning </h1></center>

<center><h2> Introduction </h2></center>

Machine Learning is the foundation for most artificial intelligence solutions, and the creation of an intelligent solution often begins with the use of machine learning to train a predictive model using historic data that you have collected.

Azure Machine Learning is a cloud service that you can use to train and manage machine learning models.

In this module, you'll learn how to:

- Identify different kinds of machine learning models.
- Use the automated machine learning capability of Azure Machine Learning to train and deploy a predictive model.

To complete this module, you'll need a Microsoft Azure subscription. If you don't already have one, you can sign up for a free trial at https://azure.microsoft.com.

<center><h2>What is Machine Learning?</h2></center>

Machine learning is a technique that uses mathematics and statistics to create a model that can predict unknown values.

<img src = "images/image1.png" width = 500 height = 500 />

For example, suppose Adventure Works Cycles is a business that rents cycles in a city. The business could use historic data to train a model that predicts daily rental demand in order to make sure sufficient staff and cycles are available.

To do this, Adventure Works could create a machine learning model that takes information about a specific day (the day of week, the anticipated weather conditions, and so on) as an input, and predicts the expected number of rentals as an output.

Mathematically, you can think of machine learning as a way of defining a function (let's call it $f$) that operates on one or more features of something (which we'll call $x$) to calculate a predicted label ($y$) - like this:

\begin{equation}
f(x) = y
\end{equation}

In this bicycle rental example, the details about a given day (day of the week, weather, and so on) are the features $(x)$, the number of rentals for that day is the label $(y)$, and the function $(f)$ that calculates the number of rentals based on the information about the day is encapsulated in a machine learning model.

The specific operation that the $f$ function performs on $x$ to calculate $y$ depends on a number of factors, including the type of model you're trying to create and the specific algorithm used to train the model. Additionally, in most cases, the data used to train the machine learning model requires some pre-processing before model training can be performed.
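
To make the idea of a model as a function $f$ concrete, here is a minimal Python sketch that fits a straight line to a few made-up (temperature, rentals) pairs using ordinary least squares. The data values, the single temperature feature, and the `train` helper are illustrative assumptions. They are not taken from the Adventure Works data, and this is not how Azure Machine Learning trains models internally.

```python
# Illustrative only: the numbers below are invented, not real rental data.

def train(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares and return f."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x

    def f(x):
        # The trained model: maps a feature value x to a predicted label y.
        return slope * x + intercept

    return f


temperatures = [10.0, 15.0, 20.0, 25.0]  # feature x (hypothetical values)
rentals = [120.0, 170.0, 220.0, 270.0]   # label y (hypothetical values)

f = train(temperatures, rentals)
predicted = f(18.0)  # predicted rentals for a day that was not in the training data
```

Here `train` plays the role of the training algorithm, and the returned `f` is the model: once trained, it can be applied to feature values it has never seen.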





<h3> Azure Machine Learning </h3>

Training and deploying an effective machine learning model involves a lot of work, much of it time-consuming and resource-intensive. Azure Machine Learning is a cloud-based service that helps simplify some of the tasks and reduce the time it takes to prepare data, train a model, and deploy a predictive service. In the rest of this unit, you'll explore Azure Machine Learning, and in particular its _automated machine learning capability_.
<hr>

<center> <h2>Create an Azure Machine Learning workspace </h2></center>

Data scientists expend a lot of effort exploring and pre-processing data and trying various types of model-training algorithms to produce accurate models. This work is time-consuming and often makes inefficient use of expensive compute hardware.

Azure Machine Learning is a cloud-based platform for building and operating machine learning solutions in Azure. It includes a wide range of features and capabilities that help data scientists prepare data, train models, publish predictive services, and monitor their usage. Most importantly, it helps data scientists increase their efficiency by automating many of the time-consuming tasks associated with training models; and it enables them to use cloud-based compute resources that scale effectively to handle large volumes of data while incurring costs only when actually used.



<h3> Create an Azure Machine Learning workspace </h3>


To use Azure Machine Learning, you create a workspace in your Azure subscription. You can then use this workspace to manage data, compute resources, code, models, and other artifacts related to your machine learning workloads.


<mark style="background-color: #FFFF00">Note: This module is one of many that make use of an Azure Machine Learning workspace, including the other modules in the Create no-code predictive models with Azure Machine Learning learning path. If you are using your own Azure subscription, you may consider creating the workspace once and reusing it in other modules. Your Azure subscription will be charged a small amount for data storage as long as the Azure Machine Learning workspace exists in your subscription, so we recommend you delete the Azure Machine Learning workspace when it is no longer required.</mark>


**If you don't already have one, follow these steps to create a workspace:**

1. Sign into the [Azure portal](https://portal.azure.com/) using the Microsoft credentials associated with your Azure subscription.
2. Select **＋Create a resource**, search for Machine Learning, and create a new Machine Learning resource with the following settings:
    - **Subscription:** Your Azure subscription
    - **Resource group:** Create or select a resource group
    - **Workspace name:** Enter a unique name for your workspace
    - **Region:** Select the geographical region closest to you
    - **Storage account:** Note the default new storage account that will be created for your workspace
    - **Key vault:** Note the default new key vault that will be created for your workspace
    - **Application insights:** Note the default new application insights resource that will be created for your workspace
    - **Container registry:** None (one will be created automatically the first time you deploy a model to a container)


3. Wait for your workspace to be created (it can take a few minutes). Then go to it in the portal.
4. On the Overview page for your workspace, launch Azure Machine Learning studio (or open a new browser tab and navigate to https://ml.azure.com), and sign into Azure Machine Learning studio using your Microsoft account. If prompted, select your Azure directory and subscription, and your Azure Machine Learning workspace.
5. In Azure Machine Learning studio, toggle the ☰ icon at the top left to view the various pages in the interface. You can use these pages to manage the resources in your workspace.

You can manage your workspace using the Azure portal, but for data scientists and Machine Learning operations engineers, Azure Machine Learning studio provides a more focused user interface for managing workspace resources.

<hr>

<center> <h2>Create compute resources </h2></center>

After you have created an Azure Machine Learning workspace, you can use it to manage the various assets and resources you need to create machine learning solutions. At its core, Azure Machine Learning is a platform for training and managing machine learning models, for which you need compute on which to run the training process.

<h3> Create a compute cluster </h3>

Compute targets are cloud-based resources on which you can run model training and data exploration processes.

In [Azure Machine Learning studio](https://ml.azure.com/), view the **Compute** page (under **Manage**). This is where you manage the compute targets for your data science activities. There are four kinds of compute resource you can create:

- **Compute Instances:** Development workstations that data scientists can use to work with data and models.
- **Compute Clusters:** Scalable clusters of virtual machines for on-demand processing of experiment code.
- **Inference Clusters:** Deployment targets for predictive services that use your trained models.
- **Attached Compute:** Links to existing Azure compute resources, such as Virtual Machines or Azure Databricks clusters.


<mark style="background-color: #FFFF00">Note: Compute instances and clusters are based on standard Azure virtual machine images. For this module, the Standard_DS11_v2 image is recommended to achieve the optimal balance of cost and performance. If your subscription has a quota that does not include this image, choose an alternative image; but bear in mind that a larger image may incur higher cost and a smaller image may not be sufficient to complete the tasks. Alternatively, ask your Azure administrator to extend your quota.</mark>

1. Switch to the **Compute Clusters** tab, and add a new compute cluster with the following settings. You'll use this to train a machine learning model:
    - **Location:** Select the same as your workspace. If that location is not listed, choose the one closest to you
    - **Virtual Machine tier:** Dedicated
    - **Virtual Machine type:** CPU
    - **Virtual Machine size:**
        - Choose **Select from all options**
        - Search for and select **Standard_DS11_v2**
    - **Compute name:** enter a unique name
    - **Minimum number of nodes:** 0
    - **Maximum number of nodes:** 2
    - **Idle seconds before scale down:** 120
    - **Enable SSH access:** Unselected

The compute cluster will take some time to be created. You can move on to the next unit while you wait.

<hr>

<center> <h2>Explore data </h2></center>

Machine learning models must be trained with existing data. In this case, you'll use a dataset of historical bicycle rental details to train a model that predicts the number of bicycle rentals that should be expected on a given day, based on seasonal and meteorological features.



<h3> Create a dataset </h3>

In _Azure Machine Learning_, data for model training and other operations is usually encapsulated in an object called a dataset.

1. View the comma-separated data at https://aka.ms/bike-rentals in your web browser.

2. In [Azure Machine Learning studio](https://ml.azure.com/), view the **Datasets** page. Datasets represent specific data files or tables that you plan to work with in Azure ML.

3. Create a new dataset **from web files**, using the following settings:

- **Basic Info:**
    - **Web URL:** https://aka.ms/bike-rentals
    - **Name:** bike-rentals
    - **Dataset type:** Tabular
    - **Description:** Bicycle rental data
- **Settings and preview:**
    - **File format:** Delimited
    - **Delimiter:** Comma
    - **Encoding:** UTF-8
    - **Column headers:** Only first file has headers
    - **Skip rows:** None
    - **Dataset contains multi-line data:** Do not select
- **Schema:**
    - Include all columns other than **Path**
    - Review the automatically detected types
- **Confirm details:**
    - Do not profile the dataset after creation
4. After the dataset has been created, open it and view the **Explore** page to see a sample of the data. This data contains historical features and labels for bike rentals.
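
The dataset settings above (comma delimiter, UTF-8 encoding, headers in the first row) describe an ordinary CSV file. As a rough sketch of what those settings mean, the following Python parses a small in-memory sample with the standard `csv` module. Only the `rentals` column name appears in this module; the other column names and all of the values in `SAMPLE_CSV` are made up for illustration, and `load_rows` is a hypothetical helper.

```python
import csv
import io

# Hypothetical sample: only the "rentals" column name comes from this module.
# The other column names and every value here are invented for illustration.
SAMPLE_CSV = """day,temp,rentals
1,0.34,331
2,0.36,131
3,0.20,120
"""


def load_rows(text):
    """Parse comma-delimited text whose first row holds the column headers."""
    reader = csv.DictReader(io.StringIO(text), delimiter=",")
    return list(reader)


rows = load_rows(SAMPLE_CSV)
columns = list(rows[0].keys())                   # header names from the first row
labels = [int(row["rentals"]) for row in rows]   # the label column, as integers
```

Note that `csv` reads every value as a string. Converting `rentals` to `int` here loosely corresponds to the type detection you review in the **Schema** step.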

<hr>

<center> <h2> Train a machine learning model </h2></center>

Azure Machine Learning includes an _automated machine learning_ capability that automatically tries multiple pre-processing techniques and model-training algorithms in parallel. These automated capabilities use the power of cloud compute to find the best performing supervised machine learning model for your data.

Note

The automated machine learning capability in Azure Machine Learning supports supervised machine learning models - in other words, models for which the training data includes known label values. You can use automated machine learning to train models for:

- Classification (predicting categories or classes)
- Regression (predicting numeric values)
- Time series forecasting (predicting numeric values at a future point in time)

<h3> Run an automated machine learning experiment </h3>


In [Azure Machine Learning](https://ml.azure.com/home), operations that you run are called experiments. Follow these steps to run an experiment that uses automated machine learning to train a regression model that predicts bicycle rentals.

1. In Azure Machine Learning studio, view the **Automated ML** page (under **Author**).

2. Create an Automated ML run with the following settings:

- **Select dataset:**
    - **Dataset:** bike-rentals
- **Configure run:**
    - **New experiment name:** mslearn-bike-rental
    - **Target column:** rentals (this is the label that the model is trained to predict)
    - **Select compute cluster:** the compute cluster that you created previously
- **Select task and settings:**
    - **Task type:** Regression (the model predicts a numeric value)
    
    <img src = "images/image2.png" width = 1000 height = 1000 >
    
Notice that under the task type there are two links: **View additional configuration settings** and **View featurization settings**. Configure these settings as follows.

- **Additional configuration settings:**
    - **Primary metric:** Select Normalized root mean squared error (more about this metric later!)
    - **Explain best model:** Selected. This option causes automated machine learning to calculate feature importance for the best model, which makes it possible to determine the influence of each feature on the predicted label.
    - **Use all supported models:** Unselected. You'll restrict the experiment to try only a few specific algorithms.
    - **Allowed models:** Select only RandomForest and LightGBM — normally you'd want to try as many as possible, but each model added increases the time it takes to run the experiment.
    - **Exit criterion:**
        - **Training job time (hours):** 0.5 — ends the experiment after a maximum of 30 minutes.
        - **Metric score threshold:** 0.085 — if a model achieves a normalized root mean squared error metric score of 0.085 or less, the experiment ends.
    - **Concurrency:** do not change
- **Featurization settings:**
    - **Enable featurization:** Selected. Automatically preprocess the features before training.
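
The two exit criteria can be read as a simple "stop when either condition holds" rule. The sketch below is a conceptual illustration only: `should_stop` and its parameter names are hypothetical, and it is not a description of how Azure Machine Learning implements these checks internally. The default values mirror the settings used in this exercise (0.5 hours and a threshold of 0.085).

```python
def should_stop(elapsed_hours, best_nrmse, max_hours=0.5, score_threshold=0.085):
    """Return True when either exit criterion is met.

    elapsed_hours: time the experiment has been training so far.
    best_nrmse: lowest normalized RMSE seen so far, or None if no model
                has been evaluated yet. Lower values are better.
    """
    out_of_time = elapsed_hours >= max_hours
    good_enough = best_nrmse is not None and best_nrmse <= score_threshold
    return out_of_time or good_enough
```

For example, `should_stop(0.1, 0.08)` is true because the score is already at or below the threshold, while `should_stop(0.1, 0.2)` is false because neither limit has been reached.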

Click Next to go to the next selection pane.

- **[Optional] Select the validation and test type**
    - **Validation type:** Auto
    - **Test dataset (preview):** No test dataset required

3. When you finish submitting the automated ML run details, it starts automatically. Wait for the run status to change from Preparing to Running.

4. When the run status changes to Running, view the **Models** tab and observe as each possible combination of training algorithm and pre-processing steps is tried and the performance of the resulting model is evaluated. The page automatically refreshes periodically, but you can also select **↻ Refresh**. It might take 10 minutes or so before models start to appear, as the cluster nodes must be initialized before training can begin.

5. Wait for the experiment to finish. It might take a while — now might be a good time for a coffee break!

<h3> Review the best model </h3>


After the experiment has finished you can review the best performing model. In this case, you used exit criteria to stop the experiment. Thus the "best" model the experiment generated might not be the best possible model, just the best one found within the time allowed for this exercise.

1. On the Details tab of the automated machine learning run, note the best model summary.

<img src = "images/image3.png" width = 1000 height = 1000 />



2. Select the **Algorithm name** for the best model to view its details.

    The best model is identified based on the evaluation metric you specified, Normalized root mean squared error.

    A technique called cross-validation is used to calculate the evaluation metric. After the model is trained using a portion of the data, the remaining portion is used to iteratively test, or cross-validate, the trained model. The metric is calculated by comparing the predicted value from the test with the actual known value, or label.

    The difference between the predicted and actual value, known as the residual, indicates the amount of error in the model. The particular performance metric you used, normalized root mean squared error, is calculated by squaring the errors across all of the test cases, finding the mean of these squares, taking the square root, and then normalizing the result by dividing it by the range of the actual label values. What all of this means is that the smaller this value is, the more accurate the model's predictions.
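
The metric calculation can be sketched in a few lines of Python. This sketch assumes the normalization step divides the root mean squared error by the range (maximum minus minimum) of the actual label values. The `actual` and `predicted` lists are made-up numbers, and `normalized_rmse` is a hypothetical helper rather than an Azure Machine Learning function.

```python
import math


def normalized_rmse(actual, predicted):
    """Root mean squared error divided by the range of the actual values."""
    residuals = [p - a for a, p in zip(actual, predicted)]
    mean_squared_error = sum(r * r for r in residuals) / len(residuals)
    rmse = math.sqrt(mean_squared_error)
    return rmse / (max(actual) - min(actual))


# Made-up values for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

score = normalized_rmse(actual, predicted)  # lower is better
```

Because the result is scaled by the range of the labels, it can be compared against a fixed threshold such as the 0.085 used in this exercise, regardless of how large the rental counts are.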

3. Next to the Normalized root mean squared error value, select **View all other metrics** to see values of other possible evaluation metrics for a regression model.

<img src = "images/image4.png" width = 1000 height = 1000 />

4. Select the **Metrics** tab and select the **residuals** and **predicted_true** charts if they are not already selected.

<img src = "images/image5.png" width = 1000 height = 1000 />

    Review the charts, which show the performance of the model. The first chart compares the predicted values against the true values. The second shows the residuals (the differences between predicted and actual values) as a histogram.

    The Predicted vs. True chart should show a diagonal trend in which the predicted value correlates closely to the true value. The dotted line shows how a perfect model should perform. The closer the line of your model's average predicted value is to the dotted line, the better its performance. A histogram below the line chart shows the distribution of true values.
    
<img src = "images/image6.png" width = 1000 height = 1000 />
   
    The **Residual Histogram** shows the frequency of residual value ranges. Residuals represent variance between predicted and true values that can't be explained by the model; in other words, errors. You should hope to see the most frequently occurring residual values clustered around zero: many small errors, with fewer errors at the extreme ends of the scale.

<img src = "images/image7.png" width = 1000 height = 1000 />
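
The binning behind a residual histogram can be sketched as follows. The values are made up, and `residual_histogram` is a hypothetical helper; the chart in the studio is produced by the service, not by this code.

```python
from collections import Counter


def residual_histogram(actual, predicted, bin_width):
    """Count residuals (predicted - actual) per bin, keyed by bin lower edge."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        residual = p - a
        # Floor division places each residual in the bin that starts at lower_edge.
        lower_edge = (residual // bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


# Made-up values for illustration only.
actual = [100, 200, 300, 400, 500]
predicted = [104, 195, 302, 430, 498]

histogram = residual_histogram(actual, predicted, bin_width=10)
```

In this sample, four of the five residuals fall in the two bins on either side of zero, and one large error sits alone in a bin far from zero, which is the general shape you hope to see.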


5. Select the **Explanations** tab. Select an explanation ID and then select **Aggregate feature Importance**. This chart shows how much each feature in the dataset influences the label prediction, like this:

<img src = "images/image8.png" width = 1000 height = 1000 />

<hr/>

<center><h2>Deploy a model as a service</h2></center>

After you've used automated machine learning to train some models, you can deploy the best performing model as a service for client applications to use.

<h3> Deploy a predictive service </h3>

In Azure Machine Learning, you can deploy a service to Azure Container Instances (ACI) or to an Azure Kubernetes Service (AKS) cluster. For production scenarios, an AKS deployment is recommended, for which you must create an inference cluster compute target. In this exercise, you'll use an ACI service, which is a suitable deployment target for testing, and does not require you to create an inference cluster.

1. In [Azure Machine Learning studio](https://ml.azure.com/), on the **Automated ML** page, select the run for your automated machine learning experiment.

2. On the **Details** tab, select the algorithm name for the best model.

<img src = "images/image9.png" width = 1000 height = 1000 />

3. On the **Model** tab, select the **Deploy** button and use the **Deploy to web service** option to deploy the model with the following settings:

    - **Name:** predict-rentals
    - **Description:** Predict cycle rentals
    - **Compute type:** Azure Container Instance
    - **Enable authentication:** Selected

4. Wait for the deployment to start - this may take a few seconds. Then, in the **Model summary** section, observe the **Deploy status** for the **predict-rentals** service, which should be **Running**. Wait for this status to change to **Successful**, which may take some time. You may need to select **↻ Refresh** periodically.

5. In Azure Machine Learning studio, view the **Endpoints** page and select the **predict-rentals** real-time endpoint. Then select the **Consume** tab and note the following information there. If you do not see the **Consume** tab, the deployment is not completely finished - you will need to wait and refresh the page. You would need the information from the Consume tab to connect to your deployed service from a client application.

    - The REST endpoint for your service
    - The primary or secondary key for your service
    
    
    <img src = "images/image10.png" width = 1000 height = 1000 />
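
A client application would typically send an HTTP POST to the REST endpoint, passing the key in an `Authorization: Bearer` header. The following is a minimal sketch using only the Python standard library. The URL and key are placeholders, `build_request` is a hypothetical helper, and the payload shape and feature names are assumptions; use the input schema shown for your own endpoint instead.

```python
import json
import urllib.request


def build_request(endpoint_url, api_key, payload):
    """Build a JSON POST request for a key-authenticated scoring endpoint."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    return urllib.request.Request(
        endpoint_url, data=body, headers=headers, method="POST"
    )


# Placeholders: copy the real values from the Consume tab.
ENDPOINT_URL = "http://example.invalid/score"
API_KEY = "YOUR_PRIMARY_OR_SECONDARY_KEY"

# Hypothetical payload: the "data" wrapper and the feature names are assumptions.
payload = {"data": [{"day": 1, "temp": 0.3}]}

request = build_request(ENDPOINT_URL, API_KEY, payload)

# To actually call the service (requires a live endpoint and a valid key):
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

The network call is left commented out so the sketch can be run without a deployed service. Avoid hard-coding a real key in source files that you share.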

<h3> Test the deployed service </h3>

Now you can test your deployed service.

1. On the Endpoints page, open the predict-rentals real-time endpoint.

2. When the predict-rentals endpoint opens, view the Test tab.

3. In the input data pane, replace the template JSON with the following input data:


<img src = "images/image11.png" width = 1000 height = 1000 />

4. Click on the Test button.

5. Review the test results, which include a predicted number of rentals based on the input features. The test pane took the input data and used the model you trained to return the predicted number of rentals.

<img src = "images/image12.png" width = 1000 height = 1000 />

Let's review what you have done. You used a dataset of historical bicycle rental data to train a model. The model predicts the number of bicycle rentals expected on a given day, based on seasonal and meteorological features. In this case, the label is the number of bicycle rentals.

<h3> Knowledge check </h3>

1. An automobile dealership wants to use historic car sales data to train a machine learning model. The model should predict the price of a pre-owned car based on its make, model, engine size, and mileage. What kind of machine learning model should the dealership use automated machine learning to create?

    - Classification
    - Regression
    - Time series forecasting

2. A bank wants to use historic loan repayment records to categorize loan applications as low-risk or high-risk based on characteristics like the loan amount, the income of the borrower, and the loan period. What kind of machine learning model should the bank use automated machine learning to create?

    - Classification
    - Regression
    - Time series forecasting

3. You want to use automated machine learning to train a regression model with the best possible R2 score. How should you configure the automated machine learning experiment?

    - Set the Primary metric to R2 score
    - Block all algorithms other than GradientBoosting
    - Enable featurization

<h2> Summary </h2>


In this module, you explored machine learning and learned how to use the automated machine learning capability of Azure Machine Learning to train and deploy a predictive model.

<h2> Clean-up </h2>

The web service you created is hosted in an Azure Container Instance. If you don't intend to experiment with it further, you should delete the endpoint to avoid accruing unnecessary Azure usage.

In Azure Machine Learning studio, on the Endpoints tab, select the predict-rentals endpoint. Then select Delete (🗑) and confirm that you want to delete the endpoint.

<mark style="background-color: #FFFF00"> Note: Deleting the endpoint ensures your subscription won't be charged for the container instance in which it is hosted. You will, however, be charged a small amount for data storage as long as the Azure Machine Learning workspace exists in your subscription. If you have finished exploring Azure Machine Learning, you can delete the Azure Machine Learning workspace and associated resources. However, if you plan to complete any other labs in this series, you will need to recreate it.</mark >
    
<mark style="background-color: #FFFF00">  To delete your workspace:</mark >

<mark style="background-color: #FFFF00">1. In the Azure portal, in the Resource groups page, open the resource group you specified when creating your Azure Machine Learning workspace.</mark >


<mark style="background-color: #FFFF00"> 2. Click Delete resource group, type the resource group name to confirm you want to delete it, and select Delete.
    </mark >