# Install required libraries
After you install these libraries, it is recommended that you **restart the notebook kernel** from the Kernel menu above. After restarting the kernel, start from the `Understanding the automated ML generated model using model explainability` section.

You can ignore any incompatibility errors. **Install the libraries only once**.

In [None]:
!pip install -U azureml-sdk
!pip install -U azureml-train-automl
!pip install -U azureml-explain-model
!pip install -U azureml-contrib-interpret
!pip install -U "azureml-sdk[automl]"

# Understanding the automated ML generated model using model explainability 
In this notebook, you will retrieve the best model from the automated machine learning experiment you performed previously. Then you will use the model interpretability features of the Azure Machine Learning Python SDK to identify which features had the most impact on the prediction.

**Please make sure you have completed Exercise 1 before continuing**.

## Import required libraries

Remember to restart the kernel before proceeding.

Run the following cell to import all the modules used in this notebook.

In [None]:
import azureml
from azureml.core import Run
from azureml.core import Workspace
from azureml.core import Model
from azureml.core import Experiment

import azureml.automl

from azureml.train.automl.run import AutoMLRun

from azureml.train.automl.runtime.automl_explain_utilities import automl_setup_model_explanations
from azureml.interpret import ExplanationClient
from azureml.interpret import MimicWrapper
from interpret_community.mimic.models import LGBMExplainableModel

# Verify AML SDK Installed 
print("SDK Version:", azureml.core.VERSION)

import pandas as pd
pd.options.display.float_format = '{:.10g}'.format
print("pandas Version:", pd.__version__)

**Set up constants**

In [None]:
# Provide the name of the experiment you used with automated machine learning
experiment_name = 'automl-regression'

# URL of the CSV file containing the training data
train_data_url = ('https://quickstartsws9073123377.blob.core.windows.net/'
                  'azureml-blobstore-0d1c4218-a5f9-418b-bf55-902b65277b85/'
                  'training-formatted.csv')

# URL of the CSV file containing a small set of test data
test_data_url = ('https://quickstartsws9073123377.blob.core.windows.net/'
                 'azureml-blobstore-0d1c4218-a5f9-418b-bf55-902b65277b85/'
                 'fleet-formatted.csv')

## Connect to the Azure Machine Learning Workspace

Run the following cell to connect to the Azure Machine Learning **Workspace**.

In [None]:
ws = Workspace.from_config()
print(ws)
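
`Workspace.from_config()` reads the workspace details from a `config.json` file, which it typically looks for in the current directory or in a `.azureml` subdirectory. If the call cannot find one, a minimal sketch of creating it is shown below. The three values are placeholders (assumptions) that you would replace with your own subscription, resource group, and workspace names, and `write_workspace_config` is a hypothetical helper, not part of the SDK.

```python
import json
from pathlib import Path

# Placeholder values (assumptions): replace them with your own workspace details.
workspace_config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}


def write_workspace_config(directory="."):
    """Write config.json into `directory` and return the path of the file."""
    config_path = Path(directory) / "config.json"
    config_path.write_text(json.dumps(workspace_config, indent=4))
    return config_path
```

Calling `write_workspace_config()` in the notebook's folder and then rerunning the cell above should let `Workspace.from_config()` locate the workspace, assuming the placeholder values have been filled in.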

## Find the run ID of your automated ML experiment in the Azure Machine Learning studio

In the following cell, be sure to set the value for `run_id` as directed by the comments (*this value can be acquired from the Azure Machine Learning Portal*).
To get this value, do the following:
1. Navigate to your Azure Machine Learning workspace in the Azure Portal and log in with the credentials provided.
2. From the left navigation bar select `Overview` and then select `Launch the Azure Machine Learning studio`.
3. From the left navigation bar select `Experiments` and then identify the first run in the `automl-regression` experiment at the bottom of the run list. This run should have `Run 1` in the `Run` column and `automl` in the `Run type` column.
4. Click the `Run 1` link to open the run details screen, where you can capture the `Run ID` value. It should be an identifier starting with `AutoML_`.

In [None]:
# Provide the Run ID of the automl-type run in your experiment
run_id = 'AutoML_...'
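
Before moving on, it can help to confirm that the placeholder above was actually replaced. The following is a small sketch of such a check. `looks_like_automl_run_id` is a hypothetical helper written for this notebook, not an SDK function, and the second example ID is made up for illustration.

```python
def looks_like_automl_run_id(candidate):
    """Return True if `candidate` looks like a filled-in AutoML parent run ID."""
    prefix = "AutoML_"
    return (
        isinstance(candidate, str)
        and candidate.startswith(prefix)
        and len(candidate) > len(prefix)
        and "..." not in candidate  # the template placeholder was not replaced
    )


# The template value is rejected; a made-up, filled-in ID is accepted.
print(looks_like_automl_run_id("AutoML_..."))  # False
print(looks_like_automl_run_id("AutoML_0123abcd-4567-89ef-0123-456789abcdef"))  # True
```

This only checks the shape of the string. Whether the run actually exists is determined when the run is retrieved from the experiment.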

# Get the best model trained with automated machine learning

Retrieve the Run from the Experiment and then get the underlying AutoMLRun to get at the best model and child run objects:

In [None]:
existing_experiment = Experiment(ws, experiment_name)

automl_run = AutoMLRun(existing_experiment, run_id)
automl_run

Retrieve the best run and best model from the automated machine learning run by executing the following cell:

In [None]:
best_run, best_model = automl_run.get_output()

## Load the train and test data

Model interpretability works by passing training and test data through the trained model and evaluating how much each feature's values influence the resulting predictions.
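
To make the idea concrete, here is a minimal sketch of one common interpretability technique, permutation importance: shuffle one feature column at a time and measure how much the prediction error grows. This is not the method used later in this notebook (which relies on a mimic explainer), and `toy_model` and the random data are assumptions made purely for illustration.

```python
import random


def toy_model(row):
    """Stand-in for a trained regressor: feature 0 matters a lot,
    feature 1 matters a little, and feature 2 is ignored."""
    return 5.0 * row[0] + 0.5 * row[1] + 0.0 * row[2]


def mean_abs_error(model, X, y):
    """Average absolute difference between predictions and targets."""
    return sum(abs(model(row) - target) for row, target in zip(X, y)) / len(X)


def permutation_importance(model, X, y, seed=0):
    """Return, per feature, the increase in error after shuffling that column."""
    rng = random.Random(seed)
    baseline = mean_abs_error(model, X, y)
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        X_shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
        importances.append(mean_abs_error(model, X_shuffled, y) - baseline)
    return importances


data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_model(row) for row in X]

importances = permutation_importance(toy_model, X, y)
print(importances)
```

Shuffling the ignored feature leaves the error unchanged, while shuffling the heavily weighted feature increases it the most, which is the kind of signal a feature importance ranking summarizes.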

Load the training and test data by running the following cell.

In [None]:
# load the original training data
train_data = pd.read_csv(train_data_url)
X_train = train_data.iloc[:,1:74]
y_train = train_data.iloc[:,0].values.flatten()

# load some test vehicle data that the model has not seen
X_test = pd.read_csv(test_data_url)
X_test = X_test.drop(columns=["Car_ID", "Battery_Age"])
X_test.rename(columns={'Twelve_hourly_temperature_forecast_for_next_31_days_reversed': 
                       'Twelve_hourly_temperature_history_for_last_31_days_before_death_last_recording_first'}, 
              inplace=True)
X_test


# Get the explanations for the best model produced by the Automated ML experiment

For automated machine learning models, you can use `ExplanationClient` to examine the features that were most impactful to the model. The best run already has explanations computed, so we only need to download them. 

Run the following cell to download the explanations for the `best model` and collect the global feature importance values into a pandas DataFrame.

In [None]:
has_explanation = False
try:
    client = ExplanationClient.from_run(best_run)
    # Download only the top k (here, 20) most important features with their
    # importance values; omit top_k to download the full explanation instead
    explanation = client.download_model_explanation(top_k=20)
    global_importance_values = explanation.get_ranked_global_values()
    global_importance_names = explanation.get_ranked_global_names()
    df = pd.DataFrame(list(zip(global_importance_names, global_importance_values)),
                      columns=['FeatureName', 'FeatureImportance'])
    has_explanation = True
except Exception as e:
    # Show the underlying error rather than hiding it behind a bare except
    print('AutoML Run did not generate explanations!', e)

Run the following cell to render the feature importance of the `best model` using the pandas DataFrame created above. Which feature had the greatest importance globally on the model?

In [None]:
if has_explanation:
    print(df.head(10))
else:
    print('AutoML Run did not generate explanations!')
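
The ranked names and values used above are ordered from most to least important, and `top_k` keeps only the first entries of that ordering. The sketch below reproduces the same ordering on a plain dictionary. The feature names and importance values are made up for illustration and are not taken from any real run, and `rank_features` is a hypothetical helper.

```python
# Made-up importance values for illustration; not taken from any real run.
importances = {
    "feature_a": 0.12,
    "feature_b": 0.87,
    "feature_c": 0.05,
    "feature_d": 0.40,
}


def rank_features(importance_by_name, top_k=None):
    """Return (name, importance) pairs sorted from most to least important."""
    ranked = sorted(importance_by_name.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


top_two = rank_features(importances, top_k=2)
print(top_two)  # [('feature_b', 0.87), ('feature_d', 0.4)]
```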

# Use MimicExplainer to compute explanations for the best model

In [None]:
# Prepare the fitted estimator, the transformed data, and the feature names the explainer needs
automl_explainer_setup_obj = automl_setup_model_explanations(best_model, X=X_train, 
                                                             X_test=X_test, y=y_train, 
                                                             task='regression')

# Wrap the estimator in a MimicWrapper that uses a LightGBM surrogate model
explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator, LGBMExplainableModel, 
                         init_dataset=automl_explainer_setup_obj.X_transform, run=automl_run,
                         features=automl_explainer_setup_obj.engineered_feature_names, 
                         feature_maps=[automl_explainer_setup_obj.feature_map],
                         classes=automl_explainer_setup_obj.classes)

# Explanations expressed in terms of the raw (original) features
raw_explanations = explainer.explain(['local', 'global'], get_raw=True, 
                                     raw_feature_names=automl_explainer_setup_obj.raw_feature_names,
                                     eval_dataset=automl_explainer_setup_obj.X_test_transform)

# Explanations expressed in terms of the engineered features
engineered_explanations = explainer.explain(['local', 'global'], 
                                            eval_dataset=automl_explainer_setup_obj.X_test_transform)
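
A mimic explainer trains an interpretable surrogate model to approximate the predictions of the original model and then reads feature importance from the surrogate. The following is a minimal sketch of that idea only. It uses a two-feature linear surrogate instead of LightGBM, and `black_box`, the random data, and `fit_linear_surrogate` are all assumptions made for illustration rather than anything the SDK does internally.

```python
import random


def black_box(x1, x2):
    """Stand-in for an opaque model (an assumption for this sketch)."""
    return 4.0 * x1 + 0.5 * x2 + 0.1 * x1 * x2


rng = random.Random(0)
X = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(500)]
# The surrogate is trained on the black box's predictions, not on true labels.
targets = [black_box(x1, x2) for x1, x2 in X]


def fit_linear_surrogate(X, targets):
    """Least-squares fit of targets ~ w1*x1 + w2*x2 + b using the 2x2 normal equations."""
    n = len(X)
    m1 = sum(x1 for x1, _ in X) / n
    m2 = sum(x2 for _, x2 in X) / n
    mt = sum(targets) / n
    s11 = sum((x1 - m1) ** 2 for x1, _ in X)
    s22 = sum((x2 - m2) ** 2 for _, x2 in X)
    s12 = sum((x1 - m1) * (x2 - m2) for x1, x2 in X)
    s1t = sum((x1 - m1) * (t - mt) for (x1, _), t in zip(X, targets))
    s2t = sum((x2 - m2) * (t - mt) for (_, x2), t in zip(X, targets))
    det = s11 * s22 - s12 ** 2
    w1 = (s1t * s22 - s2t * s12) / det
    w2 = (s2t * s11 - s1t * s12) / det
    b = mt - w1 * m1 - w2 * m2
    return w1, w2, b


w1, w2, b = fit_linear_surrogate(X, targets)
# With similarly scaled features, the absolute weight is a simple global importance.
global_importance = {"x1": abs(w1), "x2": abs(w2)}
print(global_importance)
```

The surrogate recovers weights close to the dominant linear terms of the black box, so `x1` ranks well above `x2`. A tree-based surrogate such as the one used above plays the same role for models whose behavior is far from linear.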

## Raw feature importance

In [None]:
# Fetch the importance dictionary once and turn its (name, value) pairs into rows
raw_importance = raw_explanations.get_feature_importance_dict()
df = pd.DataFrame(list(raw_importance.items()), 
                  columns=['FeatureName', 'FeatureImportance'])

df.head(10)

## Engineered feature importance

In [None]:
# Fetch the importance dictionary once and turn its (name, value) pairs into rows
engineered_importance = engineered_explanations.get_feature_importance_dict()
df = pd.DataFrame(list(engineered_importance.items()), 
                  columns=['FeatureName', 'FeatureImportance'])

df.head(10)
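
Engineered features are the columns produced by automated ML featurization, and each one derives from a raw input column. A feature map records that relationship so that engineered importances can be expressed in terms of raw features. The sketch below illustrates the idea by summing engineered importances per raw feature. The feature names, values, and the `roll_up_to_raw` helper are hypothetical, and the aggregation the SDK actually performs may differ.

```python
from collections import defaultdict

# Hypothetical mapping from each engineered feature to the raw feature it came from.
feature_map = {
    "color_red": "color",
    "color_blue": "color",
    "mileage_scaled": "mileage",
}

# Made-up engineered importances for illustration only.
engineered_scores = {"color_red": 0.10, "color_blue": 0.25, "mileage_scaled": 0.40}


def roll_up_to_raw(engineered, mapping):
    """Sum the importance of every engineered feature into its raw parent feature."""
    raw = defaultdict(float)
    for name, value in engineered.items():
        raw[mapping[name]] += value
    return dict(raw)


raw_scores = roll_up_to_raw(engineered_scores, feature_map)
print(raw_scores)
```

In this toy example the two one-hot style columns combine into a single `color` score, which is why a raw feature can rank differently from any one of its engineered columns.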