# Exercise 1 - Getting Started with Azure ML

Azure Machine Learning (Azure ML) is a cloud-based service that enables data scientists, AI software engineers, and others to collaborate on machine learning projects and manage data science workloads at scale. This is the first in a series of hands-on exercises designed to introduce the core concepts and components of an Azure ML solution. These exercises assume an existing knowledge of Python and of general machine learning concepts and frameworks. Each exercise is provided in its own notebook, and it is assumed that you will complete the exercises in order.

In a separate browser tab, sign into your Azure subscription and view your portal at https://portal.azure.com. As you proceed through the tasks below, you'll toggle between this notebook and the portal to visually confirm that the code you've run here has had the intended results in your Azure subscription.

## Task 1: Install the Azure ML SDK for Python

The Azure ML SDK for Python provides classes you can use to work with Azure ML in your Azure subscription. Run the cell below to install the **azureml-sdk** Python package, including the optional *notebooks* component, which provides some functionality specific to the Jupyter Notebooks environment. The install command is commented out in the cell; uncomment it if the package is not already installed in your environment.

> **More Information**: For more details about installing the Azure ML SDK and its optional components, see the [Azure ML SDK Documentation](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/install?view=azure-ml-py).

In [None]:
#!pip install --upgrade azureml-sdk[notebooks]

import azureml.core
print("Ready to use Azure ML", azureml.core.VERSION)

## Task 2: Sign Into Your Azure Subscription

Now that you've installed the SDK, you can use it to create, manage, and use Azure ML related objects in your Azure subscription, which means you'll need an authenticated connection between the code in this notebook and your Azure subscription. To create this authenticated connection, you can use the **authentication** module in the Azure ML SDK. In this case, you'll use the **InteractiveLoginAuthentication** class to generate a session token.

Run the cell below, and when prompted, click the https://microsoft.com/devicelogin link and enter the automatically generated code. Then, sign into your Azure subscription in the browser tab that is opened. After you have successfully signed in, you can close the browser tab that was opened and return to this notebook.

In [None]:
from azureml.core import authentication

auth = authentication.InteractiveLoginAuthentication()
print('Signed in')

## Task 3: Create a Workspace

The first object you need to create is an Azure ML *workspace*. As its name suggests, a workspace is a centralized place to manage all of the Azure ML resources you need to work on a machine learning project.

> **More Information**: To learn more about workspaces, see the [Azure ML Documentation](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-workspace).

You can create a workspace using the visual interface in the Azure portal, but in this exercise you'll use the Azure ML SDK to create the workspace using code. This approach makes it easier to keep a record of the steps used to provision your Azure ML environment, and enables you to automate the steps should you need to recreate things later.

In the code below, enter appropriate values for the *SUBSCRIPTION_ID*, *RESOURCE_GROUP_NAME*, *WORKSPACE_NAME*, and *REGION* constants (you can find your Azure subscription ID in the Azure portal: click the &#128273; **Subscriptions** tab on the left and then select the subscription you want to use). Then run the cell to create your workspace.

> **Note**: If you hadn't previously created an authenticated session, you'd automatically be prompted to sign into your Azure subscription!

In [None]:
from azureml.core import Workspace

SUBSCRIPTION_ID = '[subscription-id]'  # Get this from the Azure portal
RESOURCE_GROUP_NAME = '[rg-name]'  # Get this from the Azure portal
WORKSPACE_NAME = '[ws-name]'  # Name of your choice - if it doesn't exist, it will be created
REGION = 'northeurope'  # Or a region of your choice

ws = None
try:
    # Find existing workspace
    ws = Workspace(workspace_name=WORKSPACE_NAME,
                   subscription_id=SUBSCRIPTION_ID,
                   resource_group=RESOURCE_GROUP_NAME)
    print(ws.name, "found.")
except Exception as ex:
    # If workspace not found, create it
    # (print the exception itself; not every exception type has a .message attribute)
    print(ex)
    print("Attempting to create new workspace...")
    ws = Workspace.create(name=WORKSPACE_NAME,
                          subscription_id=SUBSCRIPTION_ID,
                          resource_group=RESOURCE_GROUP_NAME,
                          create_resource_group=True,
                          location=REGION)
    print(ws.name, "created.")
finally:
    # Save the workspace configuration for later
    if ws is not None:
        ws.write_config()
        print(ws.name, "saved.")

Switch to the browser tab containing the [Azure portal](https://portal.azure.com), and find the resource group you specified. It should contain the workspace along with some other Azure resources, including a storage account (where the workspace will store data, code, and other saved items), an *AppInsights* instance (used to monitor the workspace), and a *KeyVault* instance (used to manage secure information).

Click the workspace to open it, and note that it provides a graphical environment in which you can manage various Azure ML assets, such as *experiments*, *pipelines*, *compute*, *models*, and others. You will explore these kinds of assets in subsequent exercises.

In the code above, note that you used the **write_config** method to save the workspace configuration. This saved a JSON configuration file in a hidden folder named **.azureml**, which you can verify with the following cell.

In [None]:
# Print the config.json file
with open("./.azureml/config.json","r") as f:
    print(f.read())

This saved configuration file enables you to easily obtain a reference to the workspace by simply loading it, as demonstrated in the following cell. Note that this method will prompt you to reauthenticate against your Azure subscription if your session has expired.

In [None]:
from azureml.core import Workspace

ws = Workspace.from_config()
print(ws.name, "loaded")
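Because the saved configuration is plain JSON, you can also inspect it without the SDK. The following is a minimal sketch, assuming the file holds `subscription_id`, `resource_group`, and `workspace_name` keys (check this against the output printed from your own **config.json**). The `read_workspace_config` helper is a hypothetical name, not part of the Azure ML SDK, and the demonstration writes a throwaway file with placeholder values so that it runs without a workspace.

```python
import json
import os
import tempfile

# Keys we assume write_config stores in config.json (verify against your own file).
EXPECTED_KEYS = ("subscription_id", "resource_group", "workspace_name")

def read_workspace_config(path=os.path.join(".azureml", "config.json")):
    """Load a workspace config file and return its identifying values."""
    with open(path, "r") as f:
        config = json.load(f)
    missing = [key for key in EXPECTED_KEYS if key not in config]
    if missing:
        raise KeyError("config file is missing: " + ", ".join(missing))
    return {key: config[key] for key in EXPECTED_KEYS}

# Demonstrate with a temporary file containing placeholder values only.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "config.json")
    with open(demo_path, "w") as f:
        json.dump({"subscription_id": "[subscription-id]",
                   "resource_group": "[rg-name]",
                   "workspace_name": "[ws-name]"}, f)
    demo_config = read_workspace_config(demo_path)

print(demo_config["workspace_name"])
```

Calling `read_workspace_config()` with no argument reads the file saved in the **.azureml** folder of the current working directory.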

## Task 4: Run an Experiment

Let's see how Azure ML can help track metrics from a simple experiment that uses Python code to examine some data.

In this case, you'll use a simple dataset that contains details of patients who have been tested for diabetes. You'll run an experiment to explore the data, extracting statistics, visualizations, and data samples. With the addition of a few lines, the code uses an Azure ML *experiment* to log details of the run.

In [None]:
from azureml.core import Experiment, Run
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline 

# Create an Azure ML experiment in your workspace
experiment = Experiment(workspace = ws, name = "diabetes-experiment")
print("Starting experiment:", experiment.name)

# Start logging data from the experiment
run = experiment.start_logging()

# Load the diabetes dataset
data = pd.read_csv('data/diabetes.csv')

# Count the rows and log the result
row_count = len(data)
run.log("observations", row_count)

# Create box plots for each feature variable by the "diabetic" label and log them
num_cols = data.columns[:-1]
for col in num_cols:
    fig = plt.figure(figsize=(9, 6))
    ax = fig.gca()
    data.boxplot(column = col, by = "Diabetic", ax = ax)
    ax.set_title(col + ' by Diabetic')
    ax.set_ylabel(col)
    run.log_image(name = col, plot = fig)
plt.show()

# Create a list of mean diabetes pedigree per age and log it
mean_by_age = data[["Age", "DiabetesPedigree"]].groupby(["Age"]).mean().reset_index()
ages = mean_by_age["Age"].tolist()
pedigrees = mean_by_age["DiabetesPedigree"].tolist()
for index in range(len(ages)):
    run.log_row("Mean Diabetes Pedigree by Age", Age = ages[index], Diabetes_Pedigree = pedigrees[index])

# Save a sample of the data and upload it to the experiment output
data.sample(100).to_csv("sample.csv", index=False, header=True)
run.upload_file(name = 'outputs/sample.csv', path_or_stream = './sample.csv')

# Complete tracking and get link to details
run.complete()


## Task 5: View Experiment Results

After the experiment has finished, you can view the results. Start by running the following cell:

In [None]:
run

*Don't worry if the status is still **Running**; it can take a while to update. Eventually it will be set to **Completed**.*

Note that the run has been assigned a unique ID, and the output includes a link to a details page in the Azure portal. Click this link to open a new browser tab and view the experiment run details, noting the following:

On the **Details** tab:

- The **Tracked Metrics** list includes the *observations* value (the number of records in the dataset), an image for each matplotlib plot that was generated, and a table for the mean diabetes pedigree by age.
- The *Mean Diabetes Pedigree by Age* table is plotted as a chart.
- Each matplotlib plot image is shown.

On the **Outputs** tab:

- The outputs generated by the experiment are listed, including each of the plot images and a CSV file containing a sample of the data used in the experiment.
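
The details shown in the portal can also be read back in code. The sketch below assumes the SDK's `Run.get_status`, `Run.get_metrics`, and `Run.get_file_names` methods. The `summarize_run` helper and the `_DemoRun` stand-in are hypothetical names, and the stand-in's values are invented purely so that the example runs without an Azure connection.

```python
def summarize_run(run):
    """Collect the status, metric names, and output files of a run-like object."""
    metrics = run.get_metrics()      # dict of metric name -> logged value(s)
    files = run.get_file_names()     # list of file paths stored with the run
    return {
        "status": run.get_status(),
        "metric_names": sorted(metrics),
        "output_files": sorted(name for name in files
                               if name.startswith("outputs/")),
    }

class _DemoRun:
    """Hypothetical stand-in mimicking the three Run methods used above."""

    def get_status(self):
        return "Completed"

    def get_metrics(self):
        # Invented values for illustration only.
        return {"observations": 100,
                "Mean Diabetes Pedigree by Age": {"Age": [21],
                                                  "Diabetes_Pedigree": [0.5]}}

    def get_file_names(self):
        # Invented file list for illustration only.
        return ["logs/example.txt", "outputs/sample.csv"]

summary = summarize_run(_DemoRun())
print(summary)
```

In this notebook, you would pass the `run` object from Task 4 instead of the stand-in: `summarize_run(run)`.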

Clicking **Back to Experiment** shows a page for this experiment with a list of all previous runs (in this case, there's only been one). This enables you to track multiple runs of the same experiment so you can observe variations in the metrics produced based on parameters or random data variation.
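
Comparing a metric across runs can be sketched in code as well. The example below assumes the SDK's `Experiment.get_runs` method and the `Run.id` and `Run.get_metrics` members. The `metric_by_run` helper and the two `_Demo` classes are hypothetical, with invented run IDs and values, so the sketch runs without a workspace.

```python
def metric_by_run(experiment, metric_name):
    """Return (run id, value) pairs for every run that logged the metric."""
    results = []
    for run in experiment.get_runs():
        metrics = run.get_metrics()
        if metric_name in metrics:
            results.append((run.id, metrics[metric_name]))
    return results

class _DemoRun:
    """Hypothetical stand-in for a run with an id and logged metrics."""

    def __init__(self, run_id, metrics):
        self.id = run_id
        self._metrics = metrics

    def get_metrics(self):
        return dict(self._metrics)

class _DemoExperiment:
    """Hypothetical stand-in for an experiment holding several runs."""

    def __init__(self, runs):
        self._runs = runs

    def get_runs(self):
        return iter(self._runs)

# Invented runs: two logged the metric, one did not.
demo_experiment = _DemoExperiment([
    _DemoRun("demo-run-2", {"observations": 200}),
    _DemoRun("demo-run-1", {"observations": 100}),
    _DemoRun("demo-run-0", {}),
])

observations = metric_by_run(demo_experiment, "observations")
for run_id, value in observations:
    print(run_id, value)
```

With the objects from this notebook, the equivalent call would be `metric_by_run(experiment, "observations")`.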

The **Experiments** tab in your Azure ML Workspace lists all of the experiments that have been run in the workspace.

> **More Information**: To find out more about running experiments, see [this topic](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-runs) in the Azure ML documentation. For details of how to log metrics in a run, see [this topic](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-track-experiments).