<h1 style="color:#3da1da;font-size: 300%;" id="timeshap tutorial" align="center"  >TimeSHAP Tutorial - TensorFlow - AReM dataset</h1><p>&nbsp;

<a id='top_cell'></a>

## Table of contents
1. [Data Processing](#1.-Data-Processing)
  1. [Data Loading](#1.1-Data-Loading)
  2. [Data Treatment](#1.2-Data-Treatment)
2. [Model](#2.-Model)
  1. [Model Definition](#2.1-Model-Definition)
  2. [Model Training](#2.2-Model-Training)
3. [TimeSHAP](#3.-TimeSHAP)
  1. [Local Explanations](#3.1-Local-Explanations)
  2. [Global Explanations](#3.2-Global-Explanations)
  3. [Individual Plots](#3.3-Individual-Plots)
    

# TimeSHAP

TimeSHAP is a model-agnostic, recurrent explainer that builds upon KernelSHAP and extends it to the sequential domain. 

TimeSHAP computes local event/timestamp-, feature-, and cell-level attributions.
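To make the three attribution levels concrete, here is a minimal NumPy sketch of the perturbations each level relies on, assuming a single sequence of shape `(seq_len, n_feats)` and a baseline event of shape `(n_feats,)`. The helper names (`mask_events`, `mask_features`, `mask_cells`) are hypothetical and are not part of the TimeSHAP API; they only illustrate which entries get replaced by the baseline.

```python
import numpy as np

def mask_events(x, baseline, keep_events):
    """Event level: every time step (row) that is not kept becomes the baseline event."""
    z = np.tile(baseline, (x.shape[0], 1)).astype(float)
    z[keep_events] = x[keep_events]
    return z

def mask_features(x, baseline, keep_feats):
    """Feature level: every feature (column) that is not kept becomes its baseline value."""
    z = np.tile(baseline, (x.shape[0], 1)).astype(float)
    z[:, keep_feats] = x[:, keep_feats]
    return z

def mask_cells(x, baseline, keep_cells):
    """Cell level: only the listed (event, feature) pairs keep their original value."""
    z = np.tile(baseline, (x.shape[0], 1)).astype(float)
    for t, j in keep_cells:
        z[t, j] = x[t, j]
    return z

# Toy sequence with 3 events and 2 features; the baseline is all zeros.
x = np.arange(1, 7, dtype=float).reshape(3, 2)
baseline = np.zeros(2)

events_only_last = mask_events(x, baseline, [2])
first_feature_only = mask_features(x, baseline, [0])
single_cell = mask_cells(x, baseline, [(2, 1)])
```

An explainer scores many such perturbed sequences with the model and attributes the change in score to the events, features, or cells that were kept.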
    
Additionally, TimeSHAP computes global event- and feature-level explanations.
    
As sequences can be arbitrarily long, TimeSHAP also implements a pruning algorithm based on Shapley values
that finds a subset of consecutive, recent events that contribute the most to the decision.
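The pruning idea can be sketched with a two-player game: one player holds the older events, the other holds the recent ones, and the split moves backwards in time until the older group's Shapley value drops below a tolerance. The code below is a toy illustration under that assumption, with a hypothetical linear model and exact two-player Shapley values; it is not TimeSHAP's implementation.

```python
import numpy as np

def older_events_shapley(f, x, baseline, split):
    """Exact Shapley value of the 'older events' player in a two-player game.

    Player 'old' holds events x[:split]; player 'recent' holds x[split:].
    An absent player has its events replaced by the baseline event.
    """
    def value(keep_old, keep_recent):
        z = np.tile(baseline, (x.shape[0], 1)).astype(float)
        if keep_old:
            z[:split] = x[:split]
        if keep_recent:
            z[split:] = x[split:]
        return f(z)

    # Average the marginal contribution of 'old' over both player orderings.
    alone = value(True, False) - value(False, False)
    after_recent = value(True, True) - value(False, True)
    return 0.5 * (alone + after_recent)

def toy_prune(f, x, baseline, tol):
    """Scan backwards from the most recent event and return the first split
    index at which the older events' attribution falls below tol."""
    for split in range(x.shape[0] - 1, -1, -1):
        if abs(older_events_shapley(f, x, baseline, split)) < tol:
            return split
    return 0

# Hypothetical model: a weighted sum in which recent events weigh more.
seq_len = 10
weights = 0.5 ** np.arange(seq_len - 1, -1, -1)  # the last event has weight 1
f = lambda z: float(np.sum(weights * z[:, 0]))

x = np.ones((seq_len, 1))
baseline = np.zeros(1)
split = toy_prune(f, x, baseline, tol=0.05)
events_kept = seq_len - split
```

With this toy model the scan stops at `split = 4`, so only the 6 most recent events would be passed on to the more expensive event-, feature-, and cell-level explanations.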

---
# 1. Data-Processing
---

In [1]:
pip install timeshap

Collecting timeshap
  Downloading timeshap-1.0.2-py3-none-any.whl (65 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m65.9/65.9 kB[0m [31m1.2 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
Collecting feedzai-altair-theme
  Downloading feedzai_altair_theme-1.1.1-py3-none-any.whl (11 kB)
Collecting typing-extensions==4.0.*
  Downloading typing_extensions-4.0.1-py3-none-any.whl (22 kB)
Installing collected packages: typing-extensions, feedzai-altair-theme, timeshap
  Attempting uninstall: typing-extensions
    Found existing installation: typing_extensions 4.1.1
    Uninstalling typing_extensions-4.1.1:
      Successfully uninstalled typing_extensions-4.1.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-io 0.21.0 requires tensorflow-io-gcs-filesystem==0.21.0, which is not installed.
tensorflow 2.6.4 requires h5py~=3.1.0, but

In [2]:
import pandas as pd
import numpy as np
import timeshap

np.random.seed(42)

import warnings
warnings.filterwarnings('ignore')

In [3]:
from timeshap import __version__
__version__

'1.0.2'

In [4]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

## 1.1 Data-Loading

In [5]:
import os
import re

data_directories = next(os.walk("../input/arem1108/AReM"))[1]

all_csvs = []
for folder in data_directories:
    if folder in ['bending1', 'bending2']:
        continue
    folder_csvs = next(os.walk(f"../input/arem1108/AReM/{folder}"))[2]
    for data_csv in folder_csvs:
        if data_csv == 'dataset8.csv' and folder == 'sitting':
            # this dataset only has 479 instances
            # it is possible to use it, but would require padding logic
            continue
        loaded_data = pd.read_csv(f"../input/arem1108/AReM/{folder}/{data_csv}", skiprows=4)
        print(f"{folder}/{data_csv} ------ {loaded_data.shape}")
        
        csv_id = re.findall(r'\d+', data_csv)[0]
        loaded_data['id'] = csv_id
        loaded_data['all_id'] = f"{folder}_{csv_id}"
        loaded_data['activity'] = folder
        all_csvs.append(loaded_data)
all_data = pd.concat(all_csvs)
raw_model_features = ['avg_rss12', 'var_rss12', 'avg_rss13', 'var_rss13', 'avg_rss23', 'var_rss23']
all_data.columns = ['timestamp', 'avg_rss12', 'var_rss12', 'avg_rss13', 'var_rss13', 'avg_rss23', 'var_rss23', 'id', 'all_id', 'activity']

sitting/dataset15.csv ------ (480, 7)
sitting/dataset2.csv ------ (480, 7)
sitting/dataset11.csv ------ (480, 7)
sitting/dataset13.csv ------ (480, 7)
sitting/dataset10.csv ------ (480, 7)
sitting/dataset4.csv ------ (480, 7)
sitting/dataset1.csv ------ (480, 7)
sitting/dataset9.csv ------ (480, 7)
sitting/dataset7.csv ------ (480, 7)
sitting/dataset3.csv ------ (480, 7)
sitting/dataset12.csv ------ (480, 7)
sitting/dataset6.csv ------ (480, 7)
sitting/dataset14.csv ------ (480, 7)
sitting/dataset5.csv ------ (480, 7)
walking/dataset15.csv ------ (480, 7)
walking/dataset8.csv ------ (480, 7)
walking/dataset2.csv ------ (480, 7)
walking/dataset11.csv ------ (480, 7)
walking/dataset13.csv ------ (480, 7)
walking/dataset10.csv ------ (480, 7)
walking/dataset4.csv ------ (480, 7)
walking/dataset1.csv ------ (480, 7)
walking/dataset9.csv ------ (480, 7)
walking/dataset7.csv ------ (480, 7)
walking/dataset3.csv ------ (480, 7)
walking/dataset12.csv ------ (480, 7)
walking/dataset6.csv ------

## 1.2 Data Treatment

### Split into train and test

In [6]:
# choose ids to use for test
ids_for_test = np.random.choice(all_data['id'].unique(), size=4, replace=False)

d_train = all_data[~all_data['id'].isin(ids_for_test)]
d_test = all_data[all_data['id'].isin(ids_for_test)]

In [7]:
all_data.shape

(35520, 10)

###  Normalize Features

In [9]:
class NumericalNormalizer:
    """Clip numerical fields to mean ± 3 std and scale them to zero mean and unit variance."""

    def __init__(self, fields: list):
        self.metrics = {}
        self.fields = fields

    def fit(self, df: pd.DataFrame) -> None:
        # Store the per-field mean and standard deviation of the fitting data.
        means = df[self.fields].mean()
        stds = df[self.fields].std()
        for field in self.fields:
            self.metrics[field] = {'mean': means[field], 'std': stds[field]}

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        # Work on a copy so the caller's DataFrame (often a slice) is not modified.
        df = df.copy()
        for field in self.fields:
            f_mean = self.metrics[field]['mean']
            f_stddev = self.metrics[field]['std']
            # Outlier clipping to [mean - 3*std, mean + 3*std]
            df[field] = df[field].clip(f_mean - 3 * f_stddev, f_mean + 3 * f_stddev)
            if f_stddev > 1e-5:
                # Transform to zero mean and unit variance.
                df[f'p_{field}_normalized'] = (df[field] - f_mean) / f_stddev
            else:
                # (Near-)constant field: emit zeros instead of dividing by ~0.
                df[f'p_{field}_normalized'] = df[field] * 0
        return df

In [10]:
# all features are numerical; the statistics are fitted on the training split only
normalizer = NumericalNormalizer(raw_model_features)
normalizer.fit(d_train)
d_train_normalized = normalizer.transform(d_train)
d_test_normalized = normalizer.transform(d_test)
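The fit-on-train, clip, then scale pattern used above can be checked on a toy array. This is a minimal NumPy sketch with made-up values; `clip_and_scale` is a hypothetical helper, and `ddof=1` is used to match the default of pandas' `.std()`.

```python
import numpy as np

# Made-up training and test values for a single feature.
train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
test = np.array([2.0, 100.0])  # the second value is an extreme outlier

# Statistics come from the training split only.
mean = train.mean()
std = train.std(ddof=1)  # sample standard deviation, as pandas computes it

def clip_and_scale(values):
    """Clip to [mean - 3*std, mean + 3*std], then standardize."""
    clipped = np.clip(values, mean - 3 * std, mean + 3 * std)
    return (clipped - mean) / std

test_scaled = clip_and_scale(test)
```

Because clipping happens before scaling, every normalized value ends up in the range [-3, 3], which is why an extreme raw value shows up as exactly 3 (or -3) in the normalized columns.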

### Features

In [11]:
model_features = [f"p_{x}_normalized" for x in raw_model_features]
time_feat = 'timestamp'
label_feat = 'activity'
sequence_id_feat = 'all_id'

plot_feats = {
    'p_avg_rss12_normalized': "Mean Chest <-> Right Ankle",
    'p_var_rss12_normalized': "STD Chest <-> Right Ankle",
    'p_avg_rss13_normalized': "Mean Chest <-> Left Ankle",
    'p_var_rss13_normalized': "STD Chest <-> Left Ankle",
    'p_avg_rss23_normalized': "Mean Right Ankle <-> Left Ankle",
    'p_var_rss23_normalized': "STD Right Ankle <-> Left Ankle",
}

### Transform the dataset from multi-class to binary classification

In [12]:
# possible activities: ['cycling', 'lying', 'sitting', 'standing', 'walking']
# select the activity to predict
chosen_activity = 'cycling'

d_train_normalized['label'] = d_train_normalized['activity'].apply(lambda x: int(x == chosen_activity))
d_test_normalized['label'] = d_test_normalized['activity'].apply(lambda x: int(x == chosen_activity))

This example notebook requires TensorFlow!

Install it if you haven't already:
```
!pip install tensorflow
```

In [13]:
def df_to_numpy(df, model_feats, label_feat, group_by_feat, timestamp_feat):
    """Convert a long-format DataFrame into a (n_sequences, seq_len, n_feats)
    data array and a (n_sequences, 1) label array.

    Assumes every sequence has the same number of timestamps.
    """
    sequence_ids = df[group_by_feat].unique()
    sequence_length = len(df[timestamp_feat].unique())

    data_tensor = np.zeros((len(sequence_ids), sequence_length, len(model_feats)))
    labels_tensor = np.zeros((len(sequence_ids), 1))

    for i, name in enumerate(sequence_ids):
        # order the events of each sequence chronologically
        sorted_data = df[df[group_by_feat] == name].sort_values(timestamp_feat)

        labels = sorted_data[label_feat].values
        # a sequence must carry a single label across all of its events
        assert labels.sum() == 0 or labels.sum() == len(labels)
        data_tensor[i, :, :] = sorted_data[model_feats].values
        labels_tensor[i, :] = labels[0]
    return data_tensor, labels_tensor

In [14]:
X_train, y_train = df_to_numpy(d_train_normalized, model_features, 'label', sequence_id_feat, time_feat)

X_test, y_test = df_to_numpy(d_test_normalized, model_features, 'label', sequence_id_feat, time_feat)
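For intuition about the resulting shapes, the same long-to-3D conversion can be sketched on a toy frame with a sort followed by a reshape. This assumes every sequence has the same length; the column names `feat_1` and `feat_2` and all values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy long-format data: 2 sequences, 3 timestamps each, rows deliberately unordered.
toy = pd.DataFrame({
    "all_id": ["a", "a", "a", "b", "b", "b"],
    "timestamp": [500, 0, 250, 0, 250, 500],
    "feat_1": [3.0, 1.0, 2.0, 10.0, 20.0, 30.0],
    "feat_2": [0.3, 0.1, 0.2, 1.0, 2.0, 3.0],
})
feats = ["feat_1", "feat_2"]

# Sort by sequence and time so that each sequence forms one contiguous, ordered block.
ordered = toy.sort_values(["all_id", "timestamp"])
n_seq = ordered["all_id"].nunique()
seq_len = len(ordered) // n_seq

# Shape: (n_sequences, sequence_length, n_features)
X = ordered[feats].to_numpy().reshape(n_seq, seq_len, len(feats))
```

Note one difference: the reshape orders sequences by sorted id, whereas a loop over `unique()` keeps them in order of first appearance, so labels must be built with the same ordering.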

___
# 2. Model


## 2.1 Model Definition

In [15]:
import tensorflow as tf

inputs = tf.keras.layers.Input(shape=(None, 6))
lstm1 = tf.keras.layers.LSTM(64)(inputs)
ff1 = tf.keras.layers.Dense(64, activation='relu')(lstm1)
ff2 = tf.keras.layers.Dense(1, activation='sigmoid')(ff1)
model = tf.keras.models.Model(inputs=inputs, outputs=ff2)

2022-11-20 01:46:23.938998: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

## 2.2 Model Training

In [16]:
model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(0.001))

model.fit(X_train, y_train, epochs=20, batch_size=55, validation_data=(X_test, y_test))

2022-11-20 01:46:29.715373: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)


Epoch 1/20


2022-11-20 01:46:32.952410: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8005


Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f9e1830f2d0>

---
# 3. TimeSHAP
---

In [17]:
import timeshap

### Model entry point

In [18]:
f = lambda x: model.predict(x)
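As a sketch of what such an entry point is expected to do, the wrapper below enforces a simple contract: a 3-D array of shape `(n_sequences, seq_len, n_feats)` goes in and a 2-D array of scores of shape `(n_sequences, 1)` comes out. `make_entry_point` and `toy_predict` are hypothetical names used only for illustration, and the contract itself is stated here as an assumption about how the explainer calls the model.

```python
import numpy as np

def make_entry_point(predict_fn):
    """Wrap predict_fn so it always maps (n, seq_len, n_feats) -> (n, 1)."""
    def f(x):
        x = np.asarray(x, dtype=float)
        if x.ndim != 3:
            raise ValueError(f"expected a 3-D array, got shape {x.shape}")
        scores = np.asarray(predict_fn(x), dtype=float)
        # One score per sequence, as a column vector.
        return scores.reshape(x.shape[0], 1)
    return f

# Stand-in for model.predict: a sigmoid over the mean of all inputs of each sequence.
def toy_predict(x):
    return 1.0 / (1.0 + np.exp(-x.mean(axis=(1, 2))))

f_checked = make_entry_point(toy_predict)
scores = f_checked(np.zeros((4, 480, 6)))
```

Wrapping the model this way surfaces shape mistakes early, before a long explanation run starts.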

### Baseline event

In [19]:
from timeshap.utils import calc_avg_event
average_event = calc_avg_event(d_train_normalized, numerical_feats=model_features, categorical_feats=[])

In [20]:
average_event

Unnamed: 0,p_avg_rss12_normalized,p_var_rss12_normalized,p_avg_rss13_normalized,p_var_rss13_normalized,p_avg_rss23_normalized,p_var_rss23_normalized
0,0.086757,-0.535183,0.134042,-0.412379,0.118957,-0.39324


### Baseline sequence

In [21]:
from timeshap.utils import calc_avg_sequence
average_sequence = calc_avg_sequence(d_train_normalized, numerical_feats=model_features, categorical_feats=[], model_features=model_features, entity_col=sequence_id_feat)

In [22]:
average_sequence

array([[0.0000e+00, 3.9960e+01, 5.0000e-01, 1.4500e+01, 1.1850e+00,
        1.5250e+01],
       [2.5000e+02, 4.0125e+01, 6.0500e-01, 1.4750e+01, 9.7000e-01,
        1.4875e+01],
       [5.0000e+02, 3.9835e+01, 5.0000e-01, 1.4125e+01, 8.7000e-01,
        1.6625e+01],
       ...,
       [1.1925e+05, 4.0125e+01, 8.3000e-01, 1.6000e+01, 9.7000e-01,
        1.4165e+01],
       [1.1950e+05, 3.9750e+01, 6.0500e-01, 1.4000e+01, 1.2050e+00,
        1.5000e+01],
       [1.1975e+05, 3.9750e+01, 6.6000e-01, 1.4750e+01, 9.4000e-01,
        1.5000e+01]])

### Average score over baseline

In [23]:
from timeshap.utils import get_avg_score_with_avg_event
avg_score_over_len = get_avg_score_with_avg_event(f, average_event, top=480)
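Conceptually, this step scores sequences made only of the baseline event, one for each length from 1 up to `top`. The sketch below shows that idea under stated assumptions: `avg_event_scores` is a hypothetical helper, the baseline is a 1-D array rather than a DataFrame, and the real function's return type may differ.

```python
import numpy as np

def avg_event_scores(f, avg_event, max_len):
    """Score a sequence of repeated baseline events for every length 1..max_len."""
    scores = {}
    for length in range(1, max_len + 1):
        # Shape (1, length, n_feats): one sequence made of `length` baseline events.
        seq = np.tile(avg_event, (1, length, 1))
        scores[length] = float(f(seq)[0, 0])
    return scores

# Hypothetical model whose score depends only on the sequence length.
def toy_f(x):
    return np.full((x.shape[0], 1), 1.0 - 0.5 ** x.shape[1])

baseline_scores = avg_event_scores(toy_f, np.zeros(6), max_len=3)
```

Knowing the baseline score for each length matters because a recurrent model can output different scores for an "uninformative" sequence depending on how long it is.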

## 3.1 Local Explanations

### Select sequences to explain

In [24]:
positive_sequence_id = f"cycling_{np.random.choice(ids_for_test)}"
pos_x_pd = d_test_normalized[d_test_normalized['all_id'] == positive_sequence_id]

# select model features only
pos_x_data = pos_x_pd[model_features]
# convert the instance to numpy so TimeSHAP receives it
pos_x_data = np.expand_dims(pos_x_data.to_numpy().copy(), axis=0)

### Local Report on positive instance

In [25]:
from timeshap.explainer import local_report

pruning_dict = {'tol': 0.025}
event_dict = {'rs': 42, 'nsamples': 32000}
feature_dict = {'rs': 42, 'nsamples': 32000, 'feature_names': model_features, 'plot_features': plot_feats}
cell_dict = {'rs': 42, 'nsamples': 32000, 'top_x_feats': 2, 'top_x_events': 2}
local_report(f, pos_x_data, pruning_dict, event_dict, feature_dict, cell_dict=cell_dict, entity_uuid=positive_sequence_id, entity_col='all_id', baseline=average_event)

Assuming all features are model features


In [26]:
pos_x_data

array([[[-2.59368832e-02,  3.14850274e-01,  2.79856335e-01,
          1.51355233e+00, -1.75500776e-01,  8.22066977e-01],
        [-1.09064083e+00,  3.55980899e-01,  1.30055749e+00,
          1.59564151e-01,  8.39396605e-01, -7.73022905e-01],
        [-1.34294035e+00,  1.79098268e+00,  1.05753341e+00,
          1.93959171e+00,  1.00232984e+00, -7.55494444e-01],
        ...,
        [-1.04859091e+00,  2.23884948e+00,  1.25195267e+00,
          1.22174453e+00,  1.62461702e+00, -4.98410361e-01],
        [ 2.65706170e-03, -7.81756912e-02,  8.14509322e-01,
          8.36941237e-02,  3.64338257e-01, -1.77065400e-03],
        [ 3.81106333e-01, -5.67173114e-01, -1.73140559e-01,
          4.16355012e-01,  5.76347768e-01,  1.50142668e-01]]])

In [27]:
d_test_normalized

Unnamed: 0,timestamp,avg_rss12,var_rss12,avg_rss13,var_rss13,avg_rss23,var_rss23,id,all_id,activity,p_avg_rss12_normalized,p_var_rss12_normalized,p_avg_rss13_normalized,p_var_rss13_normalized,p_avg_rss23_normalized,p_var_rss23_normalized,label
0,0,41.50,0.50,2.67,2.360000,8.75,0.83,15,sitting_15,sitting,0.339056,-0.535183,-2.117333,0.416355,-1.107950,-0.562681,0
1,250,41.50,0.50,1.00,0.000000,7.50,0.87,15,sitting_15,sitting,0.339056,-0.535183,-2.442013,-0.960978,-1.353331,-0.539310,0
2,500,41.50,0.50,0.00,0.000000,7.75,0.43,15,sitting_15,sitting,0.339056,-0.535183,-2.636433,-0.960978,-1.304255,-0.796394,0
3,750,41.50,0.50,1.50,0.500000,8.00,0.00,15,sitting_15,sitting,0.339056,-0.535183,-2.344804,-0.669170,-1.255179,-1.047635,0
4,1000,41.50,0.50,2.00,0.000000,7.33,0.47,15,sitting_15,sitting,0.339056,-0.535183,-2.247594,-0.960978,-1.386703,-0.773023,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
475,118750,36.00,4.24,15.50,6.786964,17.00,3.67,5,cycling_5,cycling,-0.586042,1.174023,0.377066,3.000000,0.511567,1.096680,1
476,119000,34.00,2.12,22.50,4.500000,16.00,1.87,5,cycling_5,cycling,-0.922441,0.205169,1.738001,1.665292,0.315262,0.044972,1
477,119250,33.25,6.57,20.00,3.740000,22.67,0.94,5,cycling_5,cycling,-1.048591,2.238849,1.251953,1.221745,1.624617,-0.498410,1
478,119500,39.50,1.50,17.75,1.790000,16.25,1.79,5,cycling_5,cycling,0.002657,-0.078176,0.814509,0.083694,0.364338,-0.001771,1


## 3.2 Global Explanations

### Explain all 

TimeSHAP offers methods to explain all instances and save the results as CSV files.
The saved results can then be reused to produce global explanations and local plots without recomputing them.
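The compute-once, reload-later pattern can be sketched as follows. `load_or_compute` is a hypothetical helper written for this illustration, and the entity name and value in the toy result are made up; TimeSHAP's own handling of the `path` entries may differ.

```python
import os
import tempfile
import pandas as pd

def load_or_compute(path, compute_fn):
    """Return the cached CSV at `path` if it exists; otherwise compute and save it."""
    if os.path.exists(path):
        return pd.read_csv(path)
    result = compute_fn()
    result.to_csv(path, index=False)
    return result

calls = []

def expensive_explanation():
    # Stand-in for a long-running explanation; records how often it runs.
    calls.append(1)
    return pd.DataFrame({"Entity": ["toy_sequence"], "Shapley Value": [0.1]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "event_all.csv")
    first = load_or_compute(path, expensive_explanation)   # computes and writes the CSV
    second = load_or_compute(path, expensive_explanation)  # reads the CSV back
```

One consequence of this pattern is that a stale file silently wins: after changing the model, the data, or the tolerances, the cached CSVs should be deleted or written to a new path.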

In [28]:
from timeshap.explainer import global_report

pos_dataset = d_test_normalized[d_test_normalized['label'] == 1]
schema = list(pos_dataset.columns)
pruning_dict = {'tol': [0.05, 0.075], 'path': './prun_all_tf.csv'}
event_dict = {'path': './event_all_tf.csv', 'rs': 42, 'nsamples': 32000}
feature_dict = {'path': './feature_all_tf.csv', 'rs': 42, 'nsamples': 32000, 'feature_names': model_features, 'plot_features': plot_feats}
prun_stats, global_plot = global_report(f, pos_dataset, pruning_dict, event_dict, feature_dict, average_event, model_features, schema, sequence_id_feat, time_feat)
prun_stats

Calculating pruning algorithm
Calculating event data
Calculating feat data
Calculating pruning indexes


Unnamed: 0,Tolerance,Mean,Std
0,0.05,24.5,1.290994
1,0.075,21.25,0.5
2,No Pruning,480.0,0.0
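A table with `Tolerance`, `Mean`, and `Std` columns like the one above can be derived from per-sequence pruning indexes with a group-by. The sketch below assumes the statistics are taken over the number of events kept (the absolute value of the negative pruning index) and uses made-up entities and indexes; the library's exact computation may differ.

```python
import pandas as pd

# Made-up pruning indexes for three sequences at two tolerances.
prun = pd.DataFrame({
    "Entity": ["s1", "s2", "s3", "s1", "s2", "s3"],
    "Tolerance": [0.05, 0.05, 0.05, 0.075, 0.075, 0.075],
    "Pruning idx": [-30, -20, -25, -22, -18, -20],
})

stats = (
    prun.assign(events_kept=prun["Pruning idx"].abs())
        .groupby("Tolerance")["events_kept"]
        .agg(["mean", "std"])  # pandas uses the sample standard deviation (ddof=1)
        .reset_index()
)
```

A higher tolerance lets the algorithm discard more of the past, so the mean number of kept events shrinks as the tolerance grows.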


In [29]:
global_plot

## 3.3 Individual Plots

### Local Plots

In [30]:
from timeshap.plot import plot_temp_coalition_pruning, plot_event_heatmap, plot_feat_barplot, plot_cell_level
from timeshap.explainer import local_pruning, local_event, local_feat, local_cell_level

In [31]:
# select model features only
pos_x_data = pos_x_pd[model_features]
# convert the instance to numpy so TimeSHAP receives it
pos_x_data = np.expand_dims(pos_x_data.to_numpy().copy(), axis=0)

##### Pruning algorithm

In [32]:
pruning_dict = {'tol': 0.025}
coal_plot_data, coal_prun_idx = local_pruning(f, pos_x_data, pruning_dict, average_event, positive_sequence_id, sequence_id_feat, False)
# coal_prun_idx is a negative offset from the end of the sequence; convert it to a positive index
pruning_idx = pos_x_data.shape[1] + coal_prun_idx
pruning_plot = plot_temp_coalition_pruning(coal_plot_data, coal_prun_idx, plot_limit=40)
pruning_plot
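The conversion from the negative pruning offset to a positive index is plain array arithmetic. In this sketch the offset `-25` is a made-up value and the array only mimics the `(1, 480, 6)` shape of the instance being explained.

```python
import numpy as np

x = np.zeros((1, 480, 6))   # one sequence, 480 events, 6 features
coal_prun_idx = -25         # made-up offset: keep the 25 most recent events

# Positive index of the first event that is kept.
pruning_idx = x.shape[1] + coal_prun_idx

kept_from_positive = x[:, pruning_idx:, :]
kept_from_negative = x[:, coal_prun_idx:, :]
```

Both slices select the same 25 events; the positive form is convenient when the index has to be compared with absolute positions in the sequence.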

##### Event-level explanation

In [33]:
event_dict = {'rs': 42, 'nsamples': 32000}
event_data = local_event(f, pos_x_data, event_dict, positive_sequence_id, sequence_id_feat, average_event, pruning_idx)
event_plot = plot_event_heatmap(event_data)
event_plot

##### Feature-level explanation

In [34]:
feature_dict = {'rs': 42, 'nsamples': 32000, 'feature_names': model_features, 'plot_features': plot_feats}
feature_data = local_feat(f, pos_x_data, feature_dict, positive_sequence_id, sequence_id_feat, average_event, pruning_idx)
feature_plot = plot_feat_barplot(feature_data, feature_dict.get('top_feats'), feature_dict.get('plot_features'))
feature_plot

##### Cell-level explanation

In [35]:
cell_dict = {'rs': 42, 'nsamples': 32000, 'top_x_events': 3, 'top_x_feats': 3}
cell_data = local_cell_level(f, pos_x_data, cell_dict, event_data, feature_data, positive_sequence_id, sequence_id_feat, average_event, pruning_idx)
feat_names = list(feature_data['Feature'].values)[:-1] # exclude pruned events
cell_plot = plot_cell_level(cell_data, feat_names, feature_dict.get('plot_features'))
cell_plot

### Global Plots

In [36]:
from timeshap.explainer import prune_all, pruning_statistics, event_explain_all, feat_explain_all
from timeshap.plot import plot_global_event, plot_global_feat

pos_dataset = d_test_normalized[d_test_normalized['label'] == 1]

In [37]:
pos_dataset

Unnamed: 0,timestamp,avg_rss12,var_rss12,avg_rss13,var_rss13,avg_rss23,var_rss23,id,all_id,activity,p_avg_rss12_normalized,p_var_rss12_normalized,p_avg_rss13_normalized,p_var_rss13_normalized,p_avg_rss23_normalized,p_var_rss23_normalized,label
0,0,32.00,4.85,17.50,3.350000,22.50,3.20,15,cycling_15,cycling,-1.258841,1.452798,0.765905,0.994134,1.591245,0.822067,1
1,250,40.50,1.12,14.00,2.240000,21.75,1.30,15,cycling_15,cycling,0.170857,-0.251838,0.085437,0.346321,1.444016,-0.288069,1
2,500,40.50,2.60,11.33,4.500000,18.25,5.31,15,cycling_15,cycling,0.170857,0.424532,-0.433662,1.665292,0.756948,2.054902,1
3,750,34.50,1.50,20.67,2.870000,19.00,2.83,15,cycling_15,cycling,-0.838341,-0.078176,1.382214,0.713999,0.904177,0.605883,1
4,1000,34.50,1.50,21.25,3.270000,18.25,4.38,15,cycling_15,cycling,-0.838341,-0.078176,1.494977,0.947445,0.756948,1.511520,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
475,118750,36.00,4.24,15.50,6.786964,17.00,3.67,5,cycling_5,cycling,-0.586042,1.174023,0.377066,3.000000,0.511567,1.096680,1
476,119000,34.00,2.12,22.50,4.500000,16.00,1.87,5,cycling_5,cycling,-0.922441,0.205169,1.738001,1.665292,0.315262,0.044972,1
477,119250,33.25,6.57,20.00,3.740000,22.67,0.94,5,cycling_5,cycling,-1.048591,2.238849,1.251953,1.221745,1.624617,-0.498410,1
478,119500,39.50,1.50,17.75,1.790000,16.25,1.79,5,cycling_5,cycling,0.002657,-0.078176,0.814509,0.083694,0.364338,-0.001771,1


##### Pruning statistics

In [39]:
pruning_dict = {'tol': [0.05, 0.075], 'path': 'outputs/prun_all_tf.csv'}
prun_indexes = prune_all(f, pos_dataset, pruning_dict, average_event, model_features, schema, sequence_id_feat, time_feat)
pruning_stats = pruning_statistics(prun_indexes, pruning_dict.get('tol'))
pruning_stats

Unnamed: 0,Tolerance,Mean,Std


In [40]:
prun_indexes

Unnamed: 0,Entity,Tolerance,Pruning idx


##### Global event-level

In [41]:
event_dict = {'path': './event_all_tf.csv', 'rs': 42, 'nsamples': 32000}
event_data = event_explain_all(f, pos_dataset, event_dict, prun_indexes, average_event, model_features, schema, sequence_id_feat, time_feat, verbose = True)
event_global_plot = plot_global_event(event_data)
event_global_plot

In [51]:
event_data

Unnamed: 0,Random Seed,NSamples,Event,Shapley Value,t (event index),Entity,Tolerance
0,42,32000,Event -1,0.025192,0,cycling_15,0.050
1,42,32000,Event -2,0.039286,-1,cycling_15,0.050
2,42,32000,Event -3,0.011212,-2,cycling_15,0.050
3,42,32000,Event -4,0.015135,-3,cycling_15,0.050
4,42,32000,Event -5,0.029887,-4,cycling_15,0.050
...,...,...,...,...,...,...,...
186,42,32000,Event -18,0.016567,-17,cycling_5,0.075
187,42,32000,Event -19,0.009182,-18,cycling_5,0.075
188,42,32000,Event -20,0.013276,-19,cycling_5,0.075
189,42,32000,Event -21,0.003506,-20,cycling_5,0.075


##### Global feature-level

In [43]:
feature_dict = {'path': './feature_all_tf.csv', 'rs': 42, 'nsamples': 32000, 'feature_names': model_features, 'plot_features': plot_feats, }
feat_data = feat_explain_all(f, pos_dataset, feature_dict, prun_indexes, average_event, model_features, schema, sequence_id_feat, time_feat)
feat_global_plot = plot_global_feat(feat_data, **feature_dict)
feat_global_plot

In [44]:
feat_data

Unnamed: 0,Random Seed,NSamples,Feature,Shapley Value,Entity,Tolerance
0,42,32000,p_avg_rss12_normalized,0.048359,cycling_15,0.05
1,42,32000,p_var_rss12_normalized,0.039539,cycling_15,0.05
2,42,32000,p_avg_rss13_normalized,0.064313,cycling_15,0.05
3,42,32000,p_var_rss13_normalized,0.174675,cycling_15,0.05
4,42,32000,p_avg_rss23_normalized,0.054835,cycling_15,0.05
5,42,32000,p_var_rss23_normalized,0.059829,cycling_15,0.05
6,42,32000,Pruned Events,0.130256,cycling_15,0.05
7,42,32000,p_avg_rss12_normalized,0.047011,cycling_15,0.075
8,42,32000,p_var_rss12_normalized,0.031326,cycling_15,0.075
9,42,32000,p_avg_rss13_normalized,0.064509,cycling_15,0.075


In [47]:
schema

['timestamp',
 'avg_rss12',
 'var_rss12',
 'avg_rss13',
 'var_rss13',
 'avg_rss23',
 'var_rss23',
 'id',
 'all_id',
 'activity',
 'p_avg_rss12_normalized',
 'p_var_rss12_normalized',
 'p_avg_rss13_normalized',
 'p_var_rss13_normalized',
 'p_avg_rss23_normalized',
 'p_var_rss23_normalized',
 'label']

In [48]:
sequence_id_feat

'all_id'