# Study Note - Building AI Solutions with Azure Machine Learning
This notebook collects the notes taken through the course of **[Build AI solutions with Azure Machine Learning](https://docs.microsoft.com/en-us/learn/paths/build-ai-solutions-with-azure-ml-service/)** offered by Microsoft, with supplements from the **[documentation of Azure Machine Learning SDK for Python](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/?view=azure-ml-py)**.

This notebook contains Labs 08 - 13 of the learning course, which correspond to the "Optimize and Manage Models" section in the exam guideline.

## 08 Tune hyperparameters with Azure Machine Learning

Hyperparameter tuning is accomplished by training multiple models, using the same algorithm and training data but different hyperparameter values. The resulting model from each training run is then evaluated to determine the performance metric for which you want to optimize (for example, accuracy), and the best-performing model is selected.

In Azure Machine Learning, you achieve this through an experiment that consists of a hyperdrive run, which initiates **a child run** for each hyperparameter combination to be tested. Each child run uses a training script with parameterized hyperparameter values to train a model, and logs the target performance metric achieved by the trained model.

### Defining a search space

For discrete distributions:
- qnormal
- quniform
- qlognormal
- qloguniform

For continuous distributions:
- normal
- uniform
- lognormal
- loguniform


```python
from azureml.train.hyperdrive import choice, normal

param_space = {
                 '--batch_size': choice(16, 32, 64),
                 '--learning_rate': normal(10, 3)
              }
```

For a discrete parameter, use a **choice** from a list of explicit values. Example: `'--batch_size': choice(16, 32, 64)`
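
The `q`-prefixed expressions can be read as quantized versions of their continuous counterparts. The following is a minimal sketch in plain Python (not the SDK), assuming that `quniform(low, high, q)` behaves like `round(uniform(low, high) / q) * q`. The helper name `quniform_sketch` is hypothetical and only used for illustration.

```python
import random

def quniform_sketch(low, high, q, rng):
    """Draw from uniform(low, high), then snap to the nearest multiple of q."""
    return round(rng.uniform(low, high) / q) * q

rng = random.Random(0)  # fixed seed so the draws are repeatable
samples = [quniform_sketch(16, 64, 8, rng) for _ in range(5)]
print(samples)  # every value is a multiple of 8 between 16 and 64
```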

### Configuring sampling
#### Grid sampling

Grid sampling can only be employed when all hyperparameters are discrete, and is used to try every possible combination of parameters in the search space.

```python
from azureml.train.hyperdrive import GridParameterSampling, choice

param_space = {
                 '--batch_size': choice(16, 32, 64),
                 '--learning_rate': choice(0.01, 0.1, 1.0)
              }

param_sampling = GridParameterSampling(param_space)
```
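
To see what "every possible combination" means for the search space above, the grid can be enumerated in plain Python. This is only an illustrative sketch using `itertools.product`, not how the SDK builds its child runs; the variable names are assumptions.

```python
from itertools import product

# Same discrete values as in the grid sampling example
search_space = {
    '--batch_size': [16, 32, 64],
    '--learning_rate': [0.01, 0.1, 1.0],
}

names = list(search_space)
# One dictionary per combination of hyperparameter values
grid = [dict(zip(names, values)) for values in product(*search_space.values())]

print(len(grid))  # 3 batch sizes x 3 learning rates
print(grid[0])
```

The number of combinations grows multiplicatively with each added hyperparameter, which is why grid sampling is usually reserved for small, fully discrete search spaces.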

#### Random sampling

Random sampling is used to randomly select a value for each hyperparameter, which can be a mix of discrete and continuous values.

```python
from azureml.train.hyperdrive import RandomParameterSampling, choice, normal

param_space = {
                 '--batch_size': choice(16, 32, 64),
                 '--learning_rate': normal(10, 3)
              }

param_sampling = RandomParameterSampling(param_space)
```
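
As a rough illustration of what random sampling does with this search space, the sketch below draws each hyperparameter independently using Python's standard `random` module. It is a stand-in for the idea, not the SDK's sampler; `sample_once` is a hypothetical helper.

```python
import random

def sample_once(rng):
    """Draw one hyperparameter combination from the search space."""
    return {
        '--batch_size': rng.choice([16, 32, 64]),      # discrete choice
        '--learning_rate': rng.normalvariate(10, 3),   # continuous, mean 10, sd 3
    }

rng = random.Random(42)  # fixed seed so the trials are repeatable
trials = [sample_once(rng) for _ in range(6)]
for trial in trials:
    print(trial)
```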

#### Bayesian sampling

Bayesian sampling chooses hyperparameter values based on the Bayesian optimization algorithm, which tries to select parameter combinations that will result in improved performance from the previous selection.

```python
from azureml.train.hyperdrive import BayesianParameterSampling, choice, uniform

param_space = {
                 '--batch_size': choice(16, 32, 64),
                 '--learning_rate': uniform(0.05, 0.1)  # uniform(min_value, max_value)
              }

param_sampling = BayesianParameterSampling(param_space)
```

You can only use Bayesian sampling with **choice, uniform, and quniform** parameter expressions, and you can't combine it with an early-termination policy.


### Configuring early termination

To help prevent wasting time, you can **set an early termination policy** that abandons runs that are unlikely to produce a better result than previously completed runs. The policy is evaluated at an ***evaluation_interval*** you specify, based on each time the target performance metric is logged. You can also set a ***delay_evaluation*** parameter to avoid evaluating the policy until a minimum number of iterations have been completed.


#### Bandit policy

You can use a bandit policy to stop a run if the target performance metric underperforms the best run so far by a specified margin.

```python
from azureml.train.hyperdrive import BanditPolicy

early_termination_policy = BanditPolicy(slack_amount = 0.2,
                                        evaluation_interval=1,
                                        delay_evaluation=5)
```
This example applies the policy for every iteration after the first five, and abandons runs where the reported target metric is 0.2 or more worse than the best performing run **after the same number of intervals**.

You can also apply a bandit policy using a slack factor, which compares the performance metric as a ratio rather than an absolute value.
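
The difference between the two settings comes down to how the cut-off is computed. The sketch below assumes a metric that is being maximized, with a cut-off of `best - slack_amount` for the absolute form and `best / (1 + slack_factor)` for the ratio form. The function name is hypothetical, and the handling of a run that sits exactly on the cut-off is an assumption.

```python
def bandit_should_cancel(metric, best_metric, slack_amount=None, slack_factor=None):
    """Decide whether a run falls outside the allowed slack (maximized metric)."""
    if slack_amount is not None:
        threshold = best_metric - slack_amount        # absolute margin
    elif slack_factor is not None:
        threshold = best_metric / (1 + slack_factor)  # ratio-based margin
    else:
        raise ValueError('Specify slack_amount or slack_factor')
    return metric < threshold

best = 0.8
print(bandit_should_cancel(0.55, best, slack_amount=0.2))  # cut-off is about 0.6
print(bandit_should_cancel(0.65, best, slack_factor=0.2))  # cut-off is about 0.667
```

With the same numeric value, the ratio form is stricter here because the best metric is below 1; the relationship changes with the scale of the metric.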

#### Median stopping policy

A median stopping policy abandons runs where the target performance metric is worse than the median of the running averages for all runs.

```python
from azureml.train.hyperdrive import MedianStoppingPolicy

early_termination_policy = MedianStoppingPolicy(evaluation_interval=1,
                                                delay_evaluation=5)
```
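
A plain-Python sketch of the comparison behind this policy, assuming a maximized metric: the candidate run's best value so far is compared with the median of the other runs' running averages over the same number of intervals. The function and variable names are hypothetical, and the SDK's exact bookkeeping may differ.

```python
from statistics import mean, median

def median_stopping(histories, candidate):
    """Return True if the candidate run should be stopped (maximized metric)."""
    n = len(candidate)
    # Running average over the first n intervals for every run that got that far
    running_averages = [mean(h[:n]) for h in histories if len(h) >= n]
    if not running_averages:
        return False
    return max(candidate) < median(running_averages)

histories = [[0.5, 0.6, 0.7], [0.6, 0.7, 0.8], [0.4, 0.5, 0.6]]
print(median_stopping(histories, [0.3, 0.4, 0.5]))    # below the median of averages
print(median_stopping(histories, [0.5, 0.6, 0.65]))   # above it
```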

#### Truncation selection policy

A truncation selection policy cancels the lowest performing X% of runs at each evaluation interval based on the truncation_percentage value you specify for X.

```python
from azureml.train.hyperdrive import TruncationSelectionPolicy

early_termination_policy = TruncationSelectionPolicy(truncation_percentage=10,
                                                     evaluation_interval=1,
                                                     delay_evaluation=5)
```
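
The selection step can be sketched as ranking the runs by their current metric and cancelling the bottom X%. The sketch below assumes a maximized metric by default and rounds the number of cancelled runs down; both the helper name and the rounding rule are assumptions rather than documented SDK behaviour.

```python
import math

def runs_to_cancel(metrics, truncation_percentage, maximize=True):
    """Return the ids of the lowest performing runs at one evaluation interval."""
    n_cancel = math.floor(len(metrics) * truncation_percentage / 100)
    # Sort so that the worst performing runs come first
    worst_first = sorted(metrics, key=metrics.get, reverse=not maximize)
    return set(worst_first[:n_cancel])

metrics = {f'run_{i}': i / 10 for i in range(10)}  # run_0 is worst, run_9 is best
cancelled = runs_to_cancel(metrics, truncation_percentage=20)
print(sorted(cancelled))
```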

### Running a hyperparameter tuning experiment

In Azure Machine Learning, you can tune hyperparameters by running a ***hyperdrive experiment***.

To run a hyperdrive experiment, you need to create a training script just the way you would do for any other training experiment, except that your script must:

- Include an argument for each hyperparameter you want to vary.
- Log the target performance metric. This enables the hyperdrive run to evaluate the performance of the child runs it initiates, and identify the one that produces the best performing model.
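
A minimal sketch of the argument-handling part of such a script, using the argument names from the search space examples above. The default values and the helper name `parse_hyperparameters` are assumptions; the metric logging is shown only as a comment because it needs an Azure ML run context.

```python
import argparse

def parse_hyperparameters(argv=None):
    """Parse one command-line argument per hyperparameter to be tuned."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--batch_size', type=int, default=32)
    parser.add_argument('--learning_rate', type=float, default=0.1)
    return parser.parse_args(argv)

# Hyperdrive passes the sampled values as command-line arguments to each child run
args = parse_hyperparameters(['--batch_size', '64', '--learning_rate', '0.01'])
print(args.batch_size, args.learning_rate)

# In the real training script, the target metric would then be logged, e.g.:
#   run = Run.get_context()
#   run.log('Accuracy', accuracy)
```

The name used when logging the metric has to match the `primary_metric_name` given in the hyperdrive configuration.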

#### Note: The above two requirements should already be met in the script based on previous lectures

To prepare the hyperdrive experiment, you must use a **HyperDriveConfig** object to configure the experiment run, as shown in the following example code:

```python
# Configuring and running a hyperdrive experiment
from azureml.core import Experiment
from azureml.train.hyperdrive import HyperDriveConfig, PrimaryMetricGoal

# Assumes ws, sklearn_estimator and param_sampling are already defined

hyperdrive = HyperDriveConfig(estimator=sklearn_estimator,
                              hyperparameter_sampling=param_sampling, # Sampling method
                              policy=None,
                              primary_metric_name='Accuracy',
                              primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                              max_total_runs=6,
                              max_concurrent_runs=4)

experiment = Experiment(workspace = ws, name = 'hyperdrive_training')
hyperdrive_run = experiment.submit(config=hyperdrive)
```

You can monitor hyperdrive experiments in Azure Machine Learning studio, or by using the Jupyter Notebooks **RunDetails** widget.

The experiment will initiate a child run for each hyperparameter combination to be tried, and you can retrieve the logged metrics for these runs using the following code:

```python
for child_run in hyperdrive_run.get_children():
    print(child_run.id, child_run.get_metrics())

# list all runs in descending order of performance like this
for child_run in hyperdrive_run.get_children_sorted_by_primary_metric():
    print(child_run)

# retrieve the best performing run
best_run = hyperdrive_run.get_best_run_by_primary_metric()
```


## 09 Automate machine learning model selection with Azure Machine Learning

Azure Machine Learning includes support for automated machine learning through **a visual interface in Azure Machine Learning studio for *Enterprise* edition workspaces only**. You can **use the Azure Machine Learning SDK to run automated machine learning experiments in either *Basic* or *Enterprise* edition workspaces**.

Automated Machine Learning is one of the two big features, Automated ML and Designer, in AML studio. You can use the visual interface in Azure Machine Learning studio or the SDK to leverage this capability. The SDK gives you greater control over the settings for the automated machine learning experiment, but the visual interface is easier to use.

### Models
You can use automated machine learning in Azure Machine Learning to train models for the following types of machine learning task:

- Classification
- Regression
- Time Series Forecasting

### Data Preprocessing
As well as trying a selection of algorithms, automated machine learning can apply preprocessing transformations to your data, which can improve the performance of the model.

Automated machine learning applies **scaling and normalization** to numeric data automatically, helping prevent any large-scale features from dominating training. *During an automated machine learning experiment, multiple scaling or normalization techniques will be applied.*

You can choose to have automated machine learning apply **optional** preprocessing transformations such as:

- Missing value imputation to eliminate nulls in the training dataset.
- Categorical encoding to convert categorical features to numeric indicators.
- Dropping high-cardinality features, such as record IDs.
- Feature engineering (for example, deriving individual date parts from DateTime features)
- Others...

### Data
When using the SDK to run an automated machine learning experiment, you can submit the data in the following ways:

- Specify a dataset or dataframe of training data that includes features and the label to be predicted.
- Optionally, specify a second validation dataset or dataframe that will be used to validate the trained model. If this is not provided, Azure Machine Learning will apply cross-validation using the training data.

Alternatively:

- Specify a dataset, dataframe, or numpy array of X values containing the training features, with a corresponding y array of label values.
- Optionally, specify X_valid and y_valid datasets, dataframes, or numpy arrays to be used for validation.

### Code

```python
# Configuring an Automated Machine Learning Experiment
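from azureml.core.runconfig import RunConfiguration
from azureml.train.automl import AutoMLConfig

# Assumes ws, aml_compute, train_dataset and test_dataset are already defined
automl_run_config = RunConfiguration(framework='python')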
automl_config = AutoMLConfig(name='Automated ML Experiment',
                             task='classification',
                             primary_metric = 'AUC_weighted',
                             compute_target=aml_compute,
                             training_data = train_dataset,
                             validation_data = test_dataset,
                             label_column_name='Label',
                             featurization='auto',
                             iterations=12,
                             max_concurrent_iterations=4)

# Retrieve the list of metrics available for a particular task type
# from azureml.train.automl.utilities import get_primary_metrics
# get_primary_metrics('classification')

# Submitting an Automated Machine Learning Experiment
from azureml.core.experiment import Experiment

automl_experiment = Experiment(ws, 'automl_experiment')
automl_run = automl_experiment.submit(automl_config)

# Retrieving the Best Run and its Model
best_run, fitted_model = automl_run.get_output()
best_run_metrics = best_run.get_metrics()
for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name, metric)
```

Automated machine learning uses scikit-learn pipelines to encapsulate preprocessing steps with the model. You can view the steps in the fitted model you obtained from the best run using the code above like this:

```python
for step in fitted_model.named_steps:
    print(step)
```

## 10 Explain machine learning models with Azure Machine Learning

As machine learning becomes increasingly integral to decisions that affect health, safety, economic wellbeing, and other aspects of people's lives, it's important to be able to understand **how models make predictions**; and to be able to **explain the rationale for machine learning based decisions**.

### Type of Feature Importance

Model explainers use statistical techniques to calculate feature importance. This enables you to quantify the relative influence each feature in the training dataset has on label prediction. Explainers work by evaluating a test data set of feature cases and the labels the model predicts for them.

- **Global feature importance** quantifies the relative importance of each feature in the test dataset as a whole. It provides a general comparison of the extent to which each feature in the dataset influences prediction.
- **Local feature importance** measures the influence of each feature value for a specific individual prediction.

There could be multiple reasons why local importance for an individual prediction varies from global importance for the overall dataset; for example, in a loan approval model, an applicant such as Sam might have a lower income than average, but the loan amount in this case might be unusually small.

For a multi-class classification model, a local importance value for each possible class is calculated for every feature, with the total across all classes always being 0.

For a regression model, there are no classes so the local importance values simply indicate the level of influence each feature has on the predicted scalar label.

### Explainers

You can use the Azure Machine Learning SDK to create explainers for models, even if they were not trained using an Azure Machine Learning experiment.

To interpret a local model, you must install the **azureml-interpret** package and use it to create an explainer. There are multiple types of explainer.

#### MimicExplainer 

An explainer that creates a *global surrogate model* that approximates your trained model and can be used to generate explanations. This explainable model must have the same kind of architecture as your trained model (for example, linear or tree-based).

```python
from interpret.ext.blackbox import MimicExplainer
from interpret.ext.glassbox import DecisionTreeExplainableModel

mim_explainer = MimicExplainer(model=loan_model,
                             initialization_examples=X_test,
                             explainable_model = DecisionTreeExplainableModel,
                             features=['loan_amount','income','age','marital_status'], 
                             classes=['reject', 'approve'])
```

#### TabularExplainer 

An explainer that acts as a wrapper around various SHAP explainer algorithms, automatically choosing the one that is most appropriate for your model architecture.

```python
from interpret.ext.blackbox import TabularExplainer

tab_explainer = TabularExplainer(model=loan_model,
                             initialization_examples=X_test,
                             features=['loan_amount','income','age','marital_status'],
                             classes=['reject', 'approve'])
```
#### PFIExplainer 

A *Permutation Feature Importance* explainer that analyzes feature importance by shuffling feature values and measuring the impact on prediction performance.

```python
from interpret.ext.blackbox import PFIExplainer

pfi_explainer = PFIExplainer(model = loan_model,
                             features=['loan_amount','income','age','marital_status'],
                             classes=['reject', 'approve'])
```
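
The shuffling idea can be illustrated without the SDK. The sketch below uses a toy model and accuracy as the performance measure; it is not the `PFIExplainer` implementation, and all names and data in it are hypothetical.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows for which the model predicts the given label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_features, seed=0):
    """Drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(n_features):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between feature j and the label
        X_shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(baseline - accuracy(model, X_shuffled, y))
    return importances

# Toy model: approve (1) when income (feature 0) exceeds 50; age (feature 1) is ignored
def toy_model(row):
    return int(row[0] > 50)

data_rng = random.Random(1)
X = [[data_rng.uniform(0, 100), data_rng.uniform(18, 80)] for _ in range(200)]
y = [toy_model(row) for row in X]

importances = permutation_importance(toy_model, X, y, n_features=2)
print(importances)  # the ignored feature gets an importance of 0.0
```

Because the measure depends on comparing predictions with true labels, this also shows why a permutation-based explainer needs the actual labels when producing an explanation.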

### Explaining global feature importance

To retrieve global importance values for the features in your model, you call the **explain_global()** method of your explainer to get a global explanation, and then use the **get_feature_importance_dict()** method to get a dictionary of the feature importance values.

```python
# MimicExplainer
global_mim_explanation = mim_explainer.explain_global(X_train)
global_mim_feature_importance = global_mim_explanation.get_feature_importance_dict()


# TabularExplainer
global_tab_explanation = tab_explainer.explain_global(X_train)
global_tab_feature_importance = global_tab_explanation.get_feature_importance_dict()


# PFIExplainer
global_pfi_explanation = pfi_explainer.explain_global(X_train, y_train) # requires the actual labels
global_pfi_feature_importance = global_pfi_explanation.get_feature_importance_dict()
```

### Explaining local feature importance

```python
# MimicExplainer
local_mim_explanation = mim_explainer.explain_local(X_test[0:5])
local_mim_features = local_mim_explanation.get_ranked_local_names()
local_mim_importance = local_mim_explanation.get_ranked_local_values()


# TabularExplainer
local_tab_explanation = tab_explainer.explain_local(X_test[0:5])
local_tab_features = local_tab_explanation.get_ranked_local_names()
local_tab_importance = local_tab_explanation.get_ranked_local_values()

# The PFIExplainer doesn't support local feature importance explanations.
```

### Creating explanations

When you use an estimator or a script to train a model in an Azure Machine Learning experiment, you can create an explainer and upload the explanation it generates to the run for later analysis.

To **create an explanation in the experiment script**, you'll need to ensure that the **azureml-interpret** and **azureml-contrib-interpret** packages are installed in the run environment. Then you can use these to create an explanation from your trained model and upload it to the run outputs.

```python
# Import Azure ML run library
from azureml.core.run import Run
from azureml.contrib.interpret.explanation.explanation_client import ExplanationClient
from interpret.ext.blackbox import TabularExplainer
# other imports as required

# Get the experiment run context
run = Run.get_context()

# code to train model goes here

# Get explanation
explainer = TabularExplainer(model, X_train, features=features, classes=labels)
explanation = explainer.explain_global(X_test)

# Get an Explanation Client and upload the explanation
explain_client = ExplanationClient.from_run(run)
explain_client.upload_model_explanation(explanation, comment='Tabular Explanation')

# Complete the run
run.complete()
```

You can view the explanation you created for your model in the **Explanations** tab for the run in Azure Machine Learning studio.

You can also use the **ExplanationClient** object to download the explanation in Python.

```python
from azureml.contrib.interpret.explanation.explanation_client import ExplanationClient

client = ExplanationClient.from_run_id(workspace=ws,
                                       experiment_name=experiment.experiment_name, 
                                       run_id=run.id)
explanation = client.download_model_explanation()
feature_importances = explanation.get_feature_importance_dict()
```

## 11 Detect and mitigate unfairness in models with Azure Machine Learning

### Disparity
[To add more notes]

To evaluate the fairness of a model, you can apply the same predictive performance metric to subsets of the data, based on the sensitive features on which your population is grouped, and measure the disparity in those metrics across the subgroups.

For example, suppose the loan approval model exhibits an overall recall metric of 0.67 - in other words, it correctly identifies 67% of cases where the applicant repaid the loan. The question is whether or not the model provides a similar rate of correct predictions for different age groups.

Let's say that we find that the recall for validation cases where the applicant is 25 or younger is 0.50, and recall for cases where the applicant is over 25 is 0.83. In other words, the model correctly identified 50% of the people in the 25 or younger age group who successfully repaid a loan (and therefore misclassified 50% of them as loan defaulters), but found 83% of loan repayers in the older age group (misclassifying only 17% of them). The disparity in prediction performance between the groups is 33%, with the model predicting significantly more false negatives for the younger age group.
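
The arithmetic in this example can be reproduced with a few lines of plain Python. The validation cases below are hypothetical, constructed only so that the group recalls match the figures in the text (3 of 6 repayers found in the younger group, 5 of 6 in the older group).

```python
def recall(y_true, y_pred):
    """Share of actual positives that the model predicted as positive."""
    positives = [(t, p) for t, p in zip(y_true, y_pred) if t == 1]
    return sum(p == 1 for _, p in positives) / len(positives)

# Hypothetical validation cases: (age_group, actually_repaid, predicted_repaid)
cases = ([('25 or younger', 1, 1)] * 3 + [('25 or younger', 1, 0)] * 3 +
         [('over 25', 1, 1)] * 5 + [('over 25', 1, 0)] * 1)

recall_by_group = {}
for group in ('25 or younger', 'over 25'):
    rows = [c for c in cases if c[0] == group]
    recall_by_group[group] = recall([r[1] for r in rows], [r[2] for r in rows])

overall = recall([c[1] for c in cases], [c[2] for c in cases])
disparity = max(recall_by_group.values()) - min(recall_by_group.values())

print(round(overall, 2), recall_by_group, round(disparity, 2))
```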

A model with lower disparity in predictive performance between sensitive feature groups might be preferable to a model with higher disparity, even if the latter has higher overall accuracy.

#### Side note: in what situations might we choose a model with lower accuracy/AUC over one with a higher score?
- Time required for training
- Interpretability
- Lower disparity between sensitive feature groups


## 12 Monitor models with Azure Machine Learning

After a machine learning model has been deployed into production, it's important to understand how it is being used by capturing and viewing **telemetry**.

**Application Insights** is an application performance management service in Microsoft Azure that enables the capture, storage, and analysis of telemetry data from applications.

You can use Application Insights to monitor telemetry from many kinds of application, including applications that are not running in Azure. All that's required is a low-overhead instrumentation package to capture and send the telemetry data to Application Insights. The necessary package is already included in Azure Machine Learning Web services, so you can use it to capture and review telemetry from models published with Azure Machine Learning.

### Enable Application Insights

To log telemetry in Application Insights from an Azure Machine Learning service, you must **have an Application Insights resource associated with your Azure Machine Learning workspace**, and you must **configure your service to use it for telemetry logging**.

When you create an Azure Machine Learning workspace, you can select an Azure Application Insights resource to associate with it. If you do not select an existing Application Insights resource, a new one is created in the same resource group as your workspace.

```python
# Find out the Application Insights associated with the workspace
from azureml.core import Workspace
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()
ws.get_details()['applicationInsights']

# Enable Application Insights for a service when deploying a new real-time service
dep_config = AciWebservice.deploy_configuration(cpu_cores = 1,
                                                memory_gb = 1,
                                                enable_app_insights=True) # enable Application Insights

# Enable Application Insights for a service that is already deployed by updating it through the SDK
# (for AKS-based services, the deployment configuration can alternatively be modified in the Azure portal)
service = ws.webservices['my-svc']
service.update(enable_app_insights=True)
```

### Capture and view telemetry

Application Insights automatically captures any information written to the standard output and error logs, and provides a query capability to view data in these logs.

To capture telemetry data for Application Insights, you can write any values to the standard output log in the scoring script for your service by using a **print statement (Note: not using log())**.

```python
import json
import joblib
from azureml.core.model import Model

def init():
    global model
    model = joblib.load(Model.get_model_path('my_model'))

def run(raw_data):
    data = json.loads(raw_data)['data']
    predictions = model.predict(data)
    log_txt = 'Data:' + str(data) + ' - Predictions:' + str(predictions)
    print(log_txt)
    return predictions.tolist()
```

To analyze captured log data, you can use the Log Analytics query interface for Application Insights in the Azure portal. This interface supports a SQL-like query syntax that you can use to extract fields from logged data, including custom dimensions created by your Azure Machine Learning service.

## 13 Monitor data drift with Azure Machine Learning

Changing trends in data over time can reduce the accuracy of the predictions made by a model. Monitoring for this data drift is an important way to ensure your model continues to predict accurately.

### Monitor data drift by comparing datasets
Azure Machine Learning supports data drift monitoring through the use of datasets. You can compare two registered datasets to detect data drift, or you can capture new feature data submitted to a deployed model service and compare it to the dataset with which the model was trained.

To monitor data drift using registered datasets, you need to register two datasets:

- A baseline dataset - usually the original training data.
- A target dataset that will be compared to the baseline based on time intervals. This dataset requires a column for each feature you want to compare, and a timestamp column so the rate of data drift can be measured.

```python
# create dataset monitors
from azureml.datadrift import DataDriftDetector

monitor = DataDriftDetector.create_from_datasets(workspace=ws,
                                                 name='dataset-drift-detector',
                                                 baseline_data_set=train_ds,
                                                 target_data_set=new_data_ds,
                                                 compute_target='aml-cluster',
                                                 frequency='Week',
                                                 feature_list=['age','height', 'bmi'],
                                                 latency=24)

# backfill to immediately compare the baseline dataset to existing data in the target dataset
import datetime as dt

backfill = monitor.backfill( dt.datetime.now() - dt.timedelta(weeks=6), dt.datetime.now())
```
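
To build some intuition for what comparing a baseline dataset with a target dataset involves, the sketch below measures how far each feature's mean has moved, in units of the baseline standard deviation. This is a deliberately simple stand-in and not the drift magnitude that Azure Machine Learning computes; the data, the helper name, and the threshold of 1.0 are all assumptions.

```python
from statistics import mean, stdev

def mean_shift(baseline, target):
    """Shift of the target mean, in units of the baseline standard deviation."""
    return abs(mean(target) - mean(baseline)) / stdev(baseline)

# Hypothetical feature values for the baseline and a later target window
baseline = {'age': [30, 35, 40, 45, 50], 'bmi': [22.0, 24.0, 26.0, 28.0, 30.0]}
target = {'age': [31, 36, 39, 46, 49], 'bmi': [27.0, 29.0, 31.0, 33.0, 35.0]}

shifts = {name: mean_shift(baseline[name], target[name]) for name in baseline}
drifted = [name for name, shift in shifts.items() if shift > 1.0]  # arbitrary threshold

print(shifts)
print(drifted)
```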

### Monitor data drift in service inference data

```python
# Register the baseline dataset with the model

from azureml.core import Model, Dataset

model = Model.register(workspace=ws,model_path='./model/model.pkl', model_name='my_model',    
                       datasets=[(Dataset.Scenario.TRAINING, train_ds)])
```
To collect inference data for comparison, you must **enable data collection** for services in which the model is used. To do this, you must use the **ModelDataCollector** class in each service's ***scoring script***, **writing code to capture data and predictions and write them to the data collector** (which will store the collected data in Azure blob storage).

```python
import json
import joblib
from azureml.core.model import Model
from azureml.monitoring import ModelDataCollector

def init():
    global model, data_collect, predict_collect
    model_name = 'my_model'
    model = joblib.load(Model.get_model_path(model_name))

    # Enable collection of data and predictions
    data_collect = ModelDataCollector(model_name,
                                      designation='inputs',
                                      feature_names=['age','height', 'bmi'])
    predict_collect = ModelDataCollector(model_name,
                                         designation='predictions',
                                         feature_names=['prediction'])

def run(raw_data):
    data = json.loads(raw_data)['data']
    predictions = model.predict(data)

    # collect data and predictions
    data_collect.collect(data)
    predict_collect.collect(predictions)

    return predictions.tolist()
```

With the data collection code in place in the scoring script, you can enable data collection in the deployment configuration:

```python
from azureml.core.webservice import AksWebservice

dep_config = AksWebservice.deploy_configuration(collect_model_data=True)
```

Now that the baseline dataset is registered with the model, and the target data is being collected by deployed services, you can configure data drift monitoring by using a **DataDriftDetector** class:

```python
from azureml.datadrift import DataDriftDetector, AlertConfiguration

# create a new DataDriftDetector object for the deployed model
model = ws.models['my_model']
datadrift = DataDriftDetector.create_from_model(ws, model.name, model.version,
                                     services=['my-svc'],
                                     frequency="Week")
```

The data drift detector will run at the specified frequency, but you can run it on-demand as an experiment:

```python
from azureml.core import Experiment, Run
from azureml.widgets import RunDetails
import datetime as dt

# Run the data drift analysis on demand, on an existing compute cluster
run = datadrift.run(target_date=dt.datetime.today(),
                    services=['my-svc'],
                    feature_list=['age','height', 'bmi'],
                    compute_target='aml-cluster')

# show details of the data drift run
exp = Experiment(ws, datadrift._id)
dd_run = Run(experiment=exp, run_id=run.id)
RunDetails(dd_run).show()
```

### Configure alerts

```python
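from azureml.datadrift import DataDriftDetector, AlertConfiguration

# Assumes ws, baseline_data_set, target_data_set and cpu_cluster are already defined
alert_email = AlertConfiguration('data_scientists@contoso.com')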
monitor = DataDriftDetector.create_from_datasets(ws, 'dataset-drift-detector', 
                                                 baseline_data_set, target_data_set,
                                                 compute_target=cpu_cluster,
                                                 frequency='Week', latency=2,
                                                 drift_threshold=.3,
                                                 alert_configuration=alert_email)
```

Summarize the whole workflow from building, deploying, and consuming a model to monitoring it.
- Refer to the Jupyter Notebook for the complete code
