# Reviewing Automated Machine Learning Explanations

As machine learning becomes more prevalent, the predictions made by models have greater influence over many aspects of our society. For example, machine learning models are an increasingly significant factor in how banks decide to grant loans or how doctors prioritise treatments. The ability to interpret and explain models is therefore increasingly important, so that the rationale for a model's predictions can be explained and justified, and any inadvertent bias in the model can be identified.

When you use automated machine learning to train a model, you have the option to generate explanations of feature importance that quantify the extent to which each feature influences label prediction. In this lab, you'll explore the explanations generated by an automated machine learning experiment.
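
To build intuition for what "feature importance" measures, here is a minimal sketch of permutation importance on toy data. This is a stand-in technique chosen for illustration, not necessarily the method automated machine learning uses to produce its explanations. The toy model, the data, and the helper names (`accuracy`, `permutation_importance`) are all hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the correct label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_features, seed=0):
    """Importance of feature j = drop in accuracy after shuffling column j."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild the rows with only column j replaced by its shuffled values
        shuffled_rows = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(baseline - accuracy(model, shuffled_rows, labels))
    return importances


def toy_model(row):
    """Hypothetical classifier that only looks at the first feature."""
    return int(row[0] >= 10)


# Toy data: the label depends on feature 0 only; feature 1 is irrelevant
rows = [[float(i), float(i % 3)] for i in range(20)]
labels = [int(r[0] >= 10) for r in rows]

scores = permutation_importance(toy_model, rows, labels, n_features=2)
print(scores)
```

Shuffling the feature the model relies on lowers its accuracy, while shuffling the ignored feature leaves accuracy unchanged, so the ignored feature scores zero.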

## Connect to Your Workspace

The first thing you need to do is to connect to your workspace using the Azure ML SDK.

> **Note**: If the authenticated session with your Azure subscription has expired since you completed the previous exercise, you'll be prompted to reauthenticate.

In [1]:
import azureml.core
from azureml.core import Workspace

# Load the workspace from the saved config file
ws = Workspace.from_config()
print('Ready to use Azure ML {} to work with {}'.format(azureml.core.VERSION, ws.name))

Ready to use Azure ML 1.13.0 to work with BSM_MLWorkspace3
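
`Workspace.from_config()` reads the workspace details from a `config.json` file, which it looks for in the current directory, an `.azureml` subfolder, or a parent directory. If that file is missing, the sketch below shows one way to create it. The values are placeholders to replace with your own, and `write_workspace_config` is a hypothetical helper, not part of the SDK.

```python
import json
import os
import tempfile

# Placeholder values: replace with your own subscription, resource group, and workspace
config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}


def write_workspace_config(folder, settings):
    """Write settings to <folder>/config.json and return the file path."""
    path = os.path.join(folder, "config.json")
    with open(path, "w") as f:
        json.dump(settings, f, indent=4)
    return path


# Written to a temporary folder here; in practice, use the notebook's folder
demo_dir = tempfile.mkdtemp()
config_path = write_workspace_config(demo_dir, config)
print(config_path)
```

If the file lives somewhere else, passing its location as `Workspace.from_config(path=...)` should also work.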


## Run an Automated Machine Learning Experiment

To save time in this lab, you'll run an automated machine learning experiment with only three iterations.

Note that the **model_explainability** configuration option is set to **True**.

> **Important**: Before running the code below, set `cluster_name` to the name of your compute cluster. Cluster names must be globally unique and between 2 and 16 characters in length. Valid characters are letters, digits, and the - character.
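
The naming rule in the note above can be checked locally before any compute is requested. The following is a minimal sketch that encodes only the constraints stated in this lab (length and allowed characters). The service may enforce further rules, and uniqueness cannot be checked offline. `is_valid_cluster_name` is a hypothetical helper.

```python
import re

# 2 to 16 characters, each a letter, a digit, or a hyphen (as stated in the note)
CLUSTER_NAME_PATTERN = re.compile(r"[A-Za-z0-9-]{2,16}")


def is_valid_cluster_name(name):
    """Return True if name satisfies the length and character rules above."""
    return CLUSTER_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["B-ML3-CmpCluster", "x", "my_cluster", "a" * 17]:
    print(candidate, is_valid_cluster_name(candidate))
```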

In [3]:
import pandas as pd
from azureml.train.automl import AutoMLConfig
from azureml.core.experiment import Experiment
from azureml.widgets import RunDetails
from azureml.core import Dataset
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Set the training cluster name
cluster_name = "B-ML3-CmpCluster"

try:
    # Prepare data for training
    default_ds = ws.get_default_datastore()
    if 'diabetes dataset' not in ws.datasets:
        default_ds.upload_files(files=['./data/diabetes.csv', './data/diabetes2.csv'], # Upload the diabetes csv files in /data
                            target_path='diabetes-data/', # Put it in a folder path in the datastore
                            overwrite=True, # Replace existing files of the same name
                            show_progress=True)

        # Create a tabular dataset from the path on the datastore (this may take a short while)
        tab_data_set = Dataset.Tabular.from_delimited_files(path=(default_ds, 'diabetes-data/*.csv'))

        # Register the tabular dataset
        try:
            tab_data_set = tab_data_set.register(workspace=ws, 
                                    name='diabetes dataset',
                                    description='diabetes data',
                                    tags = {'format':'CSV'},
                                    create_new_version=True)
            print('Dataset registered.')
        except Exception as ex:
            print(ex)
    else:
        print('Dataset already registered.')
    train_data = ws.datasets.get("diabetes dataset")

    # Prepare compute
    try:
        # Check for existing compute target
        training_cluster = ComputeTarget(workspace=ws, name=cluster_name)
        print('Found existing cluster, use it.')
    except ComputeTargetException:
        # If it doesn't already exist, create it
        compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2', max_nodes=2)
        training_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
        training_cluster.wait_for_completion(show_output=True)

    # Configure Auto ML
    automl_config = AutoMLConfig(name='Automated ML Experiment',
                                task='classification',
                                compute_target=training_cluster,
                                training_data = train_data,
                                n_cross_validations = 2,
                                label_column_name='Diabetic',
                                iterations=3,
                                primary_metric = 'AUC_weighted',
                                max_concurrent_iterations=3,
                                featurization='off',
                                model_explainability=True # Generate feature importance!
                                )

    # Run the Auto ML experiment
    print('Submitting Auto ML experiment...')
    automl_experiment = Experiment(ws, 'diabetes_automl')
    automl_run = automl_experiment.submit(automl_config)
    automl_run.wait_for_completion(show_output=True)
    RunDetails(automl_run).show()

except Exception as ex:
    print(ex)

Dataset already registered.
Found existing cluster, use it.
Submitting Auto ML experiment...
Running on remote or ADB.

Current status: DatasetCrossValidationSplit. Generating CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

****************************************************************************************************
ITERATION: The iteration being evaluated.
PIPELINE: A summary description of the pipeline being evaluated.
DURATION: Time taken for the current iteration.
METRIC: The result of computing

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

## View Feature Importance

When the experiment has completed in the widget above, click the run that produced the best result to see its details. Then scroll to the bottom of the visualizations to see the relative feature importance.
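
The same feature importance values can also be retrieved in code rather than through the widget. The sketch below assumes the `azureml-interpret` package is installed and that `automl_run` is the completed run from the previous cell. Depending on the SDK version, `ExplanationClient` may live in a different module. `fetch_feature_importance` and `top_features` are hypothetical helper names, and the example scores are made up.

```python
def fetch_feature_importance(automl_run):
    """Return {feature: importance} for the best model of a completed AutoML run."""
    # Imported lazily so top_features() can be used without the SDK installed.
    # Assumption: older SDK builds expose this class under azureml.contrib.interpret.
    from azureml.interpret import ExplanationClient

    best_run, fitted_model = automl_run.get_output()
    client = ExplanationClient.from_run(best_run)
    # raw=False requests importance for the engineered features
    explanation = client.download_model_explanation(raw=False)
    return explanation.get_feature_importance_dict()


def top_features(importance, n=3):
    """Return the n (feature, score) pairs with the highest importance."""
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Made-up scores standing in for a real explanation
example = {"feature_a": 0.12, "feature_b": 0.40, "feature_c": 0.05, "feature_d": 0.25}
print(top_features(example, n=2))
```

With a real run, `top_features(fetch_feature_importance(automl_run))` would list the most influential features first.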


## View the Model Explanation in Azure Machine Learning studio

With the experiment run completed, click the link in the widget to see the run in Azure Machine Learning studio, and view the **Explanations** tab. Then:

1. Select the explainer that was created by the automated machine learning run.
2. View the **Global Importance** chart, which shows the overall importance of each feature across the dataset.
3. View the **Summary Importance** chart, which shows each data point from the test data in a *swarm*, *violin*, or *box* plot.
4. Select an individual point to see the **Local Feature Importance** for the individual prediction for the selected data point.
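
The relationship between the local and global views above can be sketched in a few lines. One common convention, assumed here and not confirmed by this lab as what the studio charts compute, is to take the global importance of a feature as the mean absolute value of its local importances across all data points. The feature names and values below are hypothetical.

```python
def global_importance(local_importances, feature_names):
    """Aggregate per-row local importances into a ranked global importance dict."""
    n_rows = len(local_importances)
    totals = [0.0] * len(feature_names)
    for row in local_importances:
        for j, value in enumerate(row):
            # Sign shows the direction of influence; magnitude shows its strength
            totals[j] += abs(value)
    means = {name: total / n_rows for name, total in zip(feature_names, totals)}
    # Most important feature first
    return dict(sorted(means.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical local importances for two predictions over three features
names = ["Glucose", "Age", "BMI"]
local = [
    [0.5, -0.25, 0.0],
    [-0.5, 0.25, 0.25],
]

ranked = global_importance(local, names)
print(ranked)
```

A feature can push individual predictions in opposite directions and still rank highly overall, which is why the absolute value is taken before averaging.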

> **More Information**: For more information about explanations in automated machine learning, see the [Azure ML documentation](https://docs.microsoft.com/azure/machine-learning/how-to-machine-learning-interpretability-automl).