Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Model analysis for regression scenarios
**This notebook demonstrates how to compute Responsible AI insights such as explanations, counterfactual examples, causal effects, and error analysis on remote compute for a regression model.**

## Contents
1. [Prerequisites](#Prerequisites)
1. [Dataset](#Dataset)
1. [Create or attach existing AmlCompute cluster](#AmlCompute)
1. [Train model on remote compute](#Train)
1. [Generate RAI insights](#Generate)
1. [Responsible AI dashboard](#Dashboard)

# Prerequisites
## Install azureml-responsibleai
Run `pip install azureml-responsibleai` before running this notebook.

In [None]:
#!pip install azureml-responsibleai

In [None]:
import azureml.core
from azureml.core import Workspace, Experiment, Run
from azureml.core import Model
from azureml.core.dataset import Dataset
from azureml.core import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies

from responsibleai import ModelAnalysis
from azureml.responsibleai.common.pickle_model_loader import PickleModelLoader
from azureml.responsibleai.tools.model_analysis.model_analysis_config import ModelAnalysisConfig
from azureml.responsibleai.tools.model_analysis.model_analysis_run import ModelAnalysisRun
from azureml.responsibleai.tools.model_analysis.explain_config import ExplainConfig
from azureml.responsibleai.tools.model_analysis.causal_config import CausalConfig
from azureml.responsibleai.tools.model_analysis.counterfactual_config import CounterfactualConfig
from azureml.responsibleai.tools.model_analysis.error_analysis_config import ErrorAnalysisConfig

In [None]:
# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

## Link an AzureML workspace

To use this notebook, an Azure Machine Learning workspace is required.
Please see the [configuration notebook](../../configuration.ipynb) for information about creating one, if required.

In [None]:
user_workspace = Workspace.from_config()
print('Workspace name: ' + user_workspace.name, 
      'Azure region: ' + user_workspace.location, 
      'Subscription id: ' + user_workspace.subscription_id, 
      'Resource group: ' + user_workspace.resource_group, sep = '\n')

# Dataset
This notebook uses the Boston housing dataset. Below we load the dataset and split it into train and test sets.

In [None]:
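import pandas as pd
# Import the loader explicitly; a bare `import sklearn` does not reliably expose `sklearn.datasets`.
# Note: `load_boston` was removed in scikit-learn 1.2, so this cell needs an earlier release.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

data = load_boston()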
target_feature = 'y'
continuous_features = data.feature_names
data_df = pd.DataFrame(data.data, columns=data.feature_names)

X_train, X_test, y_train, y_test = train_test_split(data_df, data.target, test_size=0.2, random_state=7)

train_data = X_train.copy()
test_data = X_test.copy()
train_data[target_feature] = y_train
test_data[target_feature] = y_test
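
To make the effect of `test_size=0.2` and `random_state=7` concrete, here is a minimal pure-Python sketch of a seeded hold-out split. It is not scikit-learn's implementation, and the helper name `holdout_split` and the toy rows are made up for illustration; it only shows that a fixed seed yields the same partition on every run.

```python
import random


def holdout_split(rows, test_size=0.2, seed=7):
    """Shuffle row indices with a seeded RNG and hold out a fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # same seed -> same shuffle order
    n_test = int(round(len(rows) * test_size))
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


# Toy rows standing in for the housing data (illustrative values only).
rows = [{"x": i, "y": 2 * i} for i in range(10)]
train_rows, test_rows = holdout_split(rows)
print(len(train_rows), len(test_rows))
```

Because the shuffle is seeded, calling `holdout_split(rows)` again returns the same two lists, which is what keeps the registered train and test datasets reproducible.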

## Upload the train and test dataset to datastore
In the cell below, we upload the train and test datasets to the default datastore and register the train data and test data as azureml datasets.

In [None]:
from azureml.core import Dataset
datastore = user_workspace.get_default_datastore()

# Upload train data to datastore
train_name = 'boston_train'
train_datastore_path = (datastore, train_name)
train_dataset = Dataset.Tabular.register_pandas_dataframe(
    train_data, train_datastore_path, train_name)


# Upload test data to datastore
test_name = 'boston_test'
test_datastore_path = (datastore, test_name)
test_dataset = Dataset.Tabular.register_pandas_dataframe(
    test_data, test_datastore_path, test_name)

label = 'y'

## Identify continuous and categorical features
Below we identify the continuous and categorical features in the above dataset. All features are treated as continuous, so the list of categorical features is empty.

In [None]:
continuous_features = X_train.columns
categorical_features = []
feature_names = X_train.columns
continuous_features
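
For datasets where the split is less obvious, one possible heuristic is to treat any column that holds non-numeric values as categorical. The sketch below assumes columns are given as a plain dict of lists; the helper `split_feature_types` and the `ZONE` column are hypothetical and not part of this notebook or the SDK.

```python
from numbers import Real


def split_feature_types(columns):
    """Return (continuous, categorical) column names based on value types."""
    continuous, categorical = [], []
    for name, values in columns.items():
        # bool is a subclass of int, so exclude it from the numeric check.
        numeric = all(isinstance(v, Real) and not isinstance(v, bool) for v in values)
        (continuous if numeric else categorical).append(name)
    return continuous, categorical


# "ZONE" is an invented column used only to show the categorical branch.
sample = {"RM": [6.5, 5.9], "NOX": [0.54, 0.47], "ZONE": ["a", "b"]}
continuous, categorical = split_feature_types(sample)
print(continuous, categorical)
```

A type-based rule like this will not catch numerically encoded categories, so the result should still be reviewed by hand.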

# Create or attach existing AmlCompute cluster

You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model and computing RAI insights for the trained model. In this tutorial, you create `AmlCompute` as your training compute resource.

> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.

**Creation of AmlCompute takes approximately 5 minutes.** If an AmlCompute cluster with that name already exists in your workspace, this code skips the creation process.

As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
cpu_cluster_name = "rai-cluster"

# Verify that cluster does not exist already
try:
    compute_target = ComputeTarget(workspace=user_workspace, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',
                                                           max_nodes=6)
    compute_target = ComputeTarget.create(user_workspace, cpu_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)
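
The cell above follows a "look up, and create only on failure" pattern. The sketch below isolates that control flow with an in-memory stand-in so it can be run without an Azure subscription. `FakeWorkspace`, `ResourceNotFound`, and `get_or_create` are hypothetical names invented for this illustration and are not part of the AzureML SDK.

```python
class ResourceNotFound(Exception):
    """Hypothetical stand-in for the SDK's ComputeTargetException."""


class FakeWorkspace:
    """In-memory stand-in for a workspace holding named compute targets."""

    def __init__(self):
        self._targets = {}

    def get(self, name):
        try:
            return self._targets[name]
        except KeyError:
            raise ResourceNotFound(name) from None

    def create(self, name, config):
        self._targets[name] = {"name": name, **config}
        return self._targets[name]


def get_or_create(workspace, name, config):
    """Return (target, created) and only create when the lookup fails."""
    try:
        return workspace.get(name), False
    except ResourceNotFound:
        return workspace.create(name, config), True


ws = FakeWorkspace()
config = {"vm_size": "STANDARD_DS12_V2", "max_nodes": 6}
first, created_first = get_or_create(ws, "rai-cluster", config)
second, created_second = get_or_create(ws, "rai-cluster", config)
print(created_first, created_second)
```

Catching only the specific "not found" exception keeps unrelated failures, such as authentication errors, from silently triggering a cluster creation.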

# Train model on remote compute
In this section, we train a simple regression model on the remote compute.

Add `azureml-responsibleai` as a pip dependency in the run configuration.

In [None]:
run_config = RunConfiguration(framework="python")
conda_dependencies = CondaDependencies.create()
run_config.environment.python.conda_dependencies = conda_dependencies
run_config.environment.python.conda_dependencies.add_pip_package("azureml-responsibleai=={}".format(azureml.core.VERSION))
run_config.target = compute_target

In [None]:
run_config

Copy the train script into the script directory. This train script will be used to train the model and register the model on remote compute.  

In [None]:
import shutil
import os

# create script folder
script_folder = './sample_projects/regression-boston'
if not os.path.exists(script_folder):
    os.makedirs(script_folder)

# Copy the sample script to script folder.
shutil.copy('train.py', script_folder)

# Path to the training script that will run on the remote compute.
script_file_name = script_folder + '/train.py'

Submit the train script via `ScriptRunConfig` to train the model on remote compute.

In [None]:
# Submit a run on AmlCompute to train the model
from azureml.core.script_run_config import ScriptRunConfig

exp_name = "RAI-Regression-Boston"
experiment = Experiment(user_workspace, exp_name)


script_run_config = ScriptRunConfig(source_directory=script_folder,
                                    script='train.py',
                                    run_config=run_config)

run = experiment.submit(script_run_config)

# Show run details
run

Wait for the above model training run to complete.

In [None]:
run.wait_for_completion(raise_on_error=True, wait_post_processing=True)

# Generate RAI insights

This section walks you through computing Responsible AI insights such as model explanations, counterfactual examples, causal effects, and error analysis, using the model analysis workflow on your remote compute for the model trained in the previous section.

## Configure model analysis and submit RAI insight computation runs
In this section, we demonstrate how to configure model analysis, submit the model analysis run, and submit the individual RAI computations for explanations, counterfactual examples, error analysis, and causal effects for your trained model.


### Create ModelAnalysis configuration

Create a `ModelAnalysisConfig` for computing the RAI insights for the trained model. The `ModelAnalysisConfig` requires the following:
1. The model that was registered during model training.
2. The train and test datasets.
3. `confidential_datastore_name`, which is the name of the datastore where the analyses will be uploaded.
4. The list of feature column names, obtained by dropping the label column from the list of all column names.
5. The list of categorical features.
6. The AzureML run configuration that was set up in the previous section.
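
Item 4 above can be sketched as a small helper that drops the label column and checks that every categorical name is actually a feature. The helper `derive_feature_columns` and the shortened column list are assumptions made for illustration; in this notebook the equivalent values come from `X_train.columns` and `label`.

```python
def derive_feature_columns(all_columns, label, categorical=()):
    """Drop the label column and validate the categorical subset."""
    if label not in all_columns:
        raise ValueError(f"Label column {label!r} not found in dataset columns")
    features = [c for c in all_columns if c != label]
    unknown = [c for c in categorical if c not in features]
    if unknown:
        raise ValueError(f"Categorical columns are not features: {unknown}")
    return features


# Shortened column list for illustration; the real dataset has more columns.
all_columns = ["CRIM", "ZN", "NOX", "RM", "y"]
feature_names = derive_feature_columns(all_columns, label="y", categorical=[])
print(feature_names)
```

Validating up front gives a clear local error instead of a failure inside the remote run.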

In [None]:
registered_model = Model.list(user_workspace, 'boston')[0]
model_loader = PickleModelLoader('boston.pkl')

train_dataset = Dataset.get_by_name(workspace=user_workspace, name='boston_train')
test_dataset = Dataset.get_by_name(workspace=user_workspace, name='boston_test')

ma = ModelAnalysisConfig(
    title="RAI Regression Boston",
    model=registered_model,
    model_type='regression',
    model_loader=model_loader,
    train_dataset=train_dataset,
    test_dataset=test_dataset,
    X_column_names=feature_names,
    target_column_name=label,
    confidential_datastore_name=user_workspace.get_default_datastore().name,
    run_configuration=run_config,
    categorical_column_names=categorical_features
)

### Submit Model Analysis run

The model analysis run takes a snapshot of the data in preparation for the model explanation, error analysis, causal effects, and counterfactual runs.
It is the parent run of all four of those runs.

In [None]:
experiment = Experiment(user_workspace, exp_name)

model_analysis_run = experiment.submit(ma)
model_analysis_run.wait_for_completion(raise_on_error=True,
                                       wait_post_processing=True)
model_analysis_run

### Submit run for explanations

Run model explanation based on the model analysis.
The explanation run is a child run of the model analysis run.
In the future, the `add_request` method will allow extra parameters to configure the explanation generated.

In [None]:
ec = ExplainConfig(model_analysis_run, run_config)
ec.add_request("Compute Explanations")
explain_run = model_analysis_run.submit_child(ec)

### Submit run for error analysis

Run error analysis based on the model analysis.
The error analysis run is a child run of the model analysis run.

In [None]:
ec = ErrorAnalysisConfig(model_analysis_run, run_config)
ec.add_request(max_depth=3, comment="Compute ErrorAnalysis")
error_analysis_run = model_analysis_run.submit_child(ec)

### Submit run for counterfactual examples

Generate counterfactuals for all the samples in the `test_dataset` based on the model analysis.
The counterfactual run is a child run of the model analysis run.
You may use the `add_request` method that allows you to specify extra parameters to configure the counterfactual examples to be generated.

In [None]:
cf_config = CounterfactualConfig(model_analysis_run, run_config)
cf_config.add_request(total_CFs=10, desired_range=[10, 300], feature_importance=True)
cf_run = model_analysis_run.submit_child(cf_config)
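
As a rough illustration of what `desired_range=[10, 300]` asks for, the sketch below keeps only candidate predictions that fall inside the range. It assumes the range constrains the model's predicted value for each counterfactual and that both bounds are inclusive; the helper `in_desired_range` and the candidate values are made up, and the actual search is performed by the counterfactual run.

```python
def in_desired_range(prediction, desired_range):
    """Return True when a predicted value lies within [low, high]."""
    low, high = desired_range
    if low > high:
        raise ValueError("desired_range must be [low, high] with low <= high")
    return low <= prediction <= high


desired_range = [10, 300]
# Made-up model outputs for four hypothetical counterfactual candidates.
candidate_predictions = [8.5, 10.0, 24.3, 310.0]
kept = [p for p in candidate_predictions if in_desired_range(p, desired_range)]
print(kept)
```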

### Submit run for causal effects

Compute causal effects based on the model analysis.
The causal run is a child run of the model analysis run.
You may use the `add_request` method that allows you to specify extra parameters to configure the causal effects to be generated.

In [None]:
causal_config = CausalConfig(model_analysis_run, run_config)
causal_config.add_request(
    treatment_features=['ZN', 'NOX'],
    nuisance_model='linear',
    skip_cat_limit_checks=True)
causal_run = model_analysis_run.submit_child(causal_config)

## Download and inspect RAI insights
In this section, we demonstrate how to download the RAI insights computed in the previous section and examine different aspects of your trained model.

### Download explanations and view global feature importance
Before downloading the explanations, make sure that the `explain_run` has completed.

The `explanation_manager.list` method below returns a list of metadata dictionaries for each explain run.  In this case, there is a single explain run.  So, the list contains a single dictionary. 

You can then download the computed explanations using the `download_by_id` method in the `explanation_manager` and look at the feature importance.

In [None]:
explain_run.wait_for_completion(raise_on_error=True, wait_post_processing=True)
explanations_meta = model_analysis_run.explanation_manager.list()
explanation = model_analysis_run.explanation_manager.download_by_id(explanations_meta[0]['id'])
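
Indexing the metadata list with `[0]` raises an `IndexError` if no results have been uploaded yet. A small guard like the one below can give a clearer message. The helper `first_result_id` is hypothetical, and the only metadata key assumed is `id`, which is the one used in the cell above; the id value is a placeholder.

```python
def first_result_id(metadata_list, kind="explanation"):
    """Return the id of the first metadata entry, or raise a descriptive error."""
    if not metadata_list:
        raise LookupError(f"No {kind} results found; has the child run completed?")
    return metadata_list[0]["id"]


# Placeholder metadata; real entries come from the manager's list() method.
example_meta = [{"id": "example-id-0"}]
selected_id = first_result_id(example_meta)
print(selected_id)
```

The same guard applies to the error analysis, counterfactual, and causal managers, since each is indexed the same way.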

In [None]:
explanation.get_feature_importance_dict()
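
To focus on the most influential features, the returned dictionary can be ranked by absolute importance. This sketch assumes the result is a plain mapping from feature name to a numeric importance value; the helper `top_features` is hypothetical and the numbers are placeholders, not results from this model.

```python
def top_features(importances, k=3):
    """Return the k (feature, importance) pairs with the largest absolute value."""
    ranked = sorted(importances.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:k]


# Placeholder importances for illustration only.
importances = {"NOX": 0.4, "LSTAT": 3.2, "ZN": 0.1, "RM": 2.9}
top_two = top_features(importances, k=2)
print(top_two)
```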

### Download error analysis report
Before downloading the error analysis report, make sure that the `error_analysis_run` has completed.

The `error_analysis_manager.list` method below returns a list of metadata dictionaries for each error analysis run.  In this case, there is a single error analysis run.  So, the list contains a single dictionary. 

You can then download the computed error analysis report using the `download_by_id` method in the `error_analysis_manager` and inspect the error analysis report.

In [None]:
error_analysis_run.wait_for_completion(raise_on_error=True, wait_post_processing=True)
erroranalysis_meta = model_analysis_run.error_analysis_manager.list()
erroranalysis_report = model_analysis_run.error_analysis_manager.download_by_id(erroranalysis_meta[0]['id'])

You can view the JSON tree and heatmap representations of the error analysis report directly, without the visualization widget or uploading it to AzureML.

In [None]:
erroranalysis_report.tree

### Download counterfactual examples
Before downloading the counterfactual examples, make sure that the `cf_run` has completed.

The `counterfactual_manager.list` method below returns a list of metadata dictionaries for each counterfactual run.  In this case, there is a single counterfactual run.  So, the list contains a single dictionary.

You can then download the computed counterfactual examples using the `download_by_id` method in the `counterfactual_manager`.


In [None]:
cf_run.wait_for_completion(raise_on_error=True, wait_post_processing=True)
cf_meta = model_analysis_run.counterfactual_manager.list()
cf_meta
counterfactual_object = model_analysis_run.counterfactual_manager.download_by_id(cf_meta[0]['id'])

You can use the `visualize_as_dataframe()` method to view the generated counterfactual examples for the samples in `test_dataset`.

In [None]:
counterfactual_object.visualize_as_dataframe()

You can use the `summary_importance` property to see the feature importance that is computed when generating counterfactual examples.

In [None]:
counterfactual_object.summary_importance

### Download causal effects
Before downloading the causal effects, make sure that the `causal_run` has completed.

The `causal_manager.list` method below returns a list of metadata dictionaries for each causal effects run.  In this case, there is a single causal effects run.  So, the list contains a single dictionary. 

You can then download the computed causal effects using the `download_by_id` method in the `causal_manager` and inspect the downloaded causal effects.

In [None]:
causal_run.wait_for_completion(raise_on_error=True, wait_post_processing=True)
causal_meta = model_analysis_run.causal_manager.list()
causal_object = model_analysis_run.causal_manager.download_by_id(causal_meta[0]['id'])

In [None]:
causal_object['global_effects']

# Responsible AI dashboard
The dashboard containing the responsible AI insights, which were computed in previous sections, can be found under the Models section in [AzureML studio](https://ml.azure.com/).