# Understanding the automated ML generated forecast model using model explainability
In this exercise, you will retrieve the automated ML model you created in the previous exercise and then apply the model explainability features of the Azure Machine Learning SDK against it to gain insight into the features that most impact its predictions.

In [1]:
import os
import numpy as np
import pandas as pd
import logging

import azureml
from azureml.core import Run
from azureml.core import Workspace
from azureml.core.model import Model
from azureml.core.experiment import Experiment
from azureml.train.automl import AutoMLConfig

import pickle

# Verify AML SDK Installed
# view version history at https://pypi.org/project/azureml-sdk/#history 
print("SDK Version:", azureml.core.VERSION)

import sklearn

sklearn_version = sklearn.__version__
print('The scikit-learn version is {}.'.format(sklearn_version))

SDK Version: 1.0.43
The scikit-learn version is 0.20.3.


## Load the training and test data
Run the following cell to load the data.

In [2]:
# Load our training data set
data = pd.read_csv('./daily-battery-time-series.csv', delimiter=',')
data = data[['Date','Battery_ID','Battery_Age_Days','Number_Of_Trips','Daily_Trip_Duration','Daily_Cycles_Used', 'Lifetime_Cycles_Used', 'Battery_Rated_Cycles']]

field_to_predict = 'Daily_Cycles_Used'
X_train = data.iloc[:1000][['Date','Battery_ID','Battery_Age_Days','Daily_Trip_Duration',field_to_predict]]
X_test = data.iloc[-500:][['Date','Battery_ID','Battery_Age_Days','Daily_Trip_Duration',field_to_predict]] 

y_train = X_train.pop(field_to_predict).values
y_test = X_test.pop(field_to_predict).values
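
The split above relies on two pandas behaviors: `iloc` slices rows by position, and `pop` removes a column from the frame and returns it. The following is a minimal sketch on a made-up six-row frame (the toy values are hypothetical and are not taken from `daily-battery-time-series.csv`). It also assumes the rows are already sorted by date, which a positional split needs in order to be meaningful for a time series.

```python
import pandas as pd

# Hypothetical toy data standing in for the battery time series
toy = pd.DataFrame({
    "Date": pd.date_range("2019-01-01", periods=6, freq="D"),
    "Daily_Trip_Duration": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "Daily_Cycles_Used": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})
target = "Daily_Cycles_Used"
columns = ["Date", "Daily_Trip_Duration", target]

# First four rows for training, last two rows for testing (split by position)
X_head = toy.iloc[:4][columns].copy()
X_tail = toy.iloc[-2:][columns].copy()

# pop() drops the target column from the feature frame and returns it
y_head = X_head.pop(target).values
y_tail = X_tail.pop(target).values

print(list(X_head.columns))
print(y_head, y_tail)
```

After `pop`, the feature frames no longer contain the target column, so the model cannot accidentally see the value it is supposed to predict. The original `toy` frame is unchanged because the split works on copies.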

### Setup
To begin, you will need to provide the following information about your Azure Subscription.

**If you are using your own Azure subscription, please provide names for subscription_id, resource_group, workspace_name and workspace_region to use.** Note that the workspace needs to be of type [Machine Learning Workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/setup-create-workspace).

**If an environment is provided to you, be sure to replace XXXXX in the values below with your unique identifier.**

In the following cell, be sure to set the values for `subscription_id`, `resource_group`, `workspace_name` and `workspace_region` as directed by the comments (*these values can be acquired from the Azure Portal*).

To get these values, do the following:
1. Navigate to the Azure Portal and log in with the credentials provided.
2. From the left-hand menu, under Favorites, select `Resource Groups`.
3. In the list, select the resource group with the name similar to `XXXXX`.
4. From the Overview tab, capture the desired values.

Execute the following cell by selecting the `>|Run` button in the command bar above.

In [3]:
# Provide the Subscription ID of your existing Azure subscription
subscription_id = "8c924580-ce70-48d0-a031-1b21726acc1a" # <- needs to be the subscription with the Quick-Starts resource group

# Provide the name of the existing Resource Group
resource_group = "QuickStarts" # <- if an environment was provided to you, use the name containing your unique identifier

# Provide the Workspace Name and Azure Region of the Azure Machine Learning Workspace
workspace_name = "quick-starts-ws" # <- if an environment was provided to you, use the name containing your unique identifier
workspace_region = "eastus" # <- region of your Quick-Starts resource group

#Provide the name of the Automated ML experiment you executed previously
experiment_name = "Battery-Cycles-zst-3"

## Connect to the Azure Machine Learning Workspace

Run the following cell to connect to the Azure Machine Learning **Workspace** created in part 1.

**Important Note**: The text output below the cell will prompt you to log in. Navigate to the URL displayed and enter the code provided. Then return to this notebook and wait for the output to read `Workspace Provisioning complete.`

In [4]:
# With exist_ok = True, if the workspace already exists we get a reference to the existing workspace
ws = Workspace.create(
    name = workspace_name,
    subscription_id = subscription_id,
    resource_group = resource_group, 
    location = workspace_region,
    exist_ok = True)

print("Workspace Provisioning complete.")

Workspace Provisioning complete.


Retrieve the Run from the Experiment:

In [5]:
existing_experiment = Experiment(ws,experiment_name)
# Take the first run returned for this experiment
run = list(Run.list(existing_experiment))[0]

Examine the details of the run:

In [6]:
from azureml.widgets import RunDetails
RunDetails(run).show()


Retrieve the underlying AutoMLRun to get at the best model and child run objects:

In [7]:
from azureml.train.automl.run import AutoMLRun
automl_run = AutoMLRun(existing_experiment, run.id)
best_run, best_model = automl_run.get_output()

Use the automated ML explainer to compute the feature importance values for the best model:

In [9]:
from azureml.train.automl.automlexplainer import explain_model

# explain_model returns six values; the last two are not used here
(shap_values, expected_values,
 sorted_global_importance_values, sorted_global_importance_names,
 _, _) = explain_model(best_model, X_train, X_test,
                       best_run=best_run, y_train=y_train)

# Overall feature importance
feature_importance = dict(zip(sorted_global_importance_names, sorted_global_importance_values))
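
To build intuition for how per-row SHAP values can roll up into a single global ranking, here is a minimal sketch in plain Python. It assumes that a feature's global importance is the mean absolute value of its per-row contributions, a common convention for SHAP-based explainers; this notebook does not state how `explain_model` aggregates, so treat the function below as an illustration only. The contribution numbers are hypothetical.

```python
# Hypothetical per-row contributions: one inner list per row, one value per feature
feature_names = ["Daily_Trip_Duration", "Battery_Age_Days", "wday"]
local_contributions = [
    [0.010, -0.0001, 0.00002],
    [-0.008, 0.00002, -0.00001],
    [0.009, 0.00003, 0.00003],
]

def global_importance(names, rows):
    """Rank features by the mean absolute contribution across rows (assumed convention)."""
    n_rows = len(rows)
    means = [sum(abs(row[j]) for row in rows) / n_rows for j in range(len(names))]
    # Sort from most to least important
    return sorted(zip(names, means), key=lambda pair: pair[1], reverse=True)

ranked = global_importance(feature_names, local_contributions)
for name, value in ranked:
    print(f"{name}: {value:.6f}")
```

Taking the absolute value before averaging matters: a feature that pushes some predictions up and others down would otherwise cancel itself out and look unimportant.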


Run the following cell to render the feature importance using a Pandas DataFrame. Which feature had the greatest importance globally on the model?

In [10]:
features_df = pd.DataFrame(list(zip(sorted_global_importance_names, sorted_global_importance_values)), dtype=float)
pd.options.display.float_format = '{:.20g}'.format
features_df

| | 0 | 1 |
|---|---|---|
| 0 | Daily_Trip_Duration | 0.0090999663936463 |
| 1 | week | 5.094743458131866e-05 |
| 2 | wday | 4.679478201891244e-05 |
| 3 | Battery_Age_Days | 4.574088205361588e-05 |
| 4 | qday | 3.954948205435811e-05 |
| 5 | day | 3.4172700212751704e-05 |
| 6 | half | 1.863518682604234e-05 |
| 7 | month | 1.4828523295098928e-05 |
| 8 | quarter | 4.193429935833817e-06 |
| 9 | grain_Battery_ID_0 | 0.0 |


As you might have guessed, the `Daily_Trip_Duration` feature has the greatest impact on the `Daily_Cycles_Used` forecast, and by a wide margin: its importance is more than 100 times that of the next feature, `week`.
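
One way to make the gap concrete is to express each feature's importance as a share of the total. The sketch below uses the values from the output above, copied by hand. Normalizing to shares is only an illustrative convention chosen here; it is not something the SDK computes in this notebook.

```python
# Global importance values copied from the feature importance output above
importance = {
    "Daily_Trip_Duration": 0.0090999663936463,
    "week": 5.094743458131866e-05,
    "wday": 4.679478201891244e-05,
    "Battery_Age_Days": 4.574088205361588e-05,
    "qday": 3.954948205435811e-05,
    "day": 3.4172700212751704e-05,
    "half": 1.863518682604234e-05,
    "month": 1.4828523295098928e-05,
    "quarter": 4.193429935833817e-06,
    "grain_Battery_ID_0": 0.0,
}

# Convert each raw importance into a fraction of the total
total = sum(importance.values())
shares = {name: value / total for name, value in importance.items()}

# Identify the dominant feature and report its share
top_feature = max(shares, key=shares.get)
print(f"{top_feature}: {shares[top_feature]:.1%} of total importance")
```

Seen this way, the calendar-derived features (`week`, `wday`, `qday`, and so on) and `Battery_Age_Days` together account for only a small remainder of the total.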