# Calculating SHAP values

**ATTENTION:** Notebook language: **Python**

## Loading model and data

In [1]:
import pickle5 as pickle

In [2]:
with open('./model/model.pickle', 'rb') as fp:
    model = pickle.load(fp) 

In [3]:
import pandas as pd
import numpy as np

In [4]:
df_preprocessed = pd.read_csv('./data/data_preprocessed.csv', index_col=0)
df_raw = pd.read_csv('./data/raw_data.csv', index_col=0)

In [5]:
X_preprocessed = df_preprocessed.drop('status', axis=1)

In [6]:
y_preprocessed = df_preprocessed.status

## Calculating predictions (background prediction)

In [6]:
y_hat = pd.DataFrame(model.predict_proba(X_preprocessed)[:, 1])
y_hat.to_csv('./data/y_hat.csv')
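The `[:, 1]` slice above keeps only the second column of the `predict_proba` output. A minimal sketch of what that selects, assuming the custom model follows the common scikit-learn convention of one column per class, with column 1 holding the positive class (the array values are made up for illustration):

```python
import numpy as np

# Hypothetical predict_proba output: one row per observation,
# one column per class (column 0 = negative, column 1 = positive).
proba = np.array([
    [0.9, 0.1],
    [0.3, 0.7],
    [0.5, 0.5],
])

# Keep only the probability of the positive class, as in the cell above.
positive = proba[:, 1]
print(positive)
```

If the model orders its classes differently, the column index would need to change accordingly.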

## Creating explainer

In [7]:
import dalex as dx

In [8]:
explainer = dx.Explainer(model, X_preprocessed, y_preprocessed)

Preparation of a new explainer is initiated

  -> data              : 361035 rows 7 cols
  -> target variable   : Parameter 'y' was a pandas.Series. Converted to a numpy.ndarray.
  -> target variable   : 361035 values
  -> model_class       : scripts.RandomForestModified.RangerForestClassifierModified (default)
  -> label             : Not specified, model's class short name will be used. (default)
  -> predict function  : <function yhat_proba_default at 0x7fdfd5656d30> will be used (default)
  -> predict function  : Accepts pandas.DataFrame and numpy.ndarray.
  -> predicted values  : min = 0.00584, mean = 0.295, max = 0.995
  -> model type        : classification will be used (default)
  -> residual function : difference between y and yhat (default)
  -> residuals         : min = -0.973, mean = -0.188, max = 0.95
  -> model_info        : package scripts

A new explainer has been created!
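For orientation, here is a rough sketch of what a SHAP (Shapley) attribution computes for a single observation. It is not the dalex implementation: it enumerates every feature ordering exactly and uses one baseline row in place of the background data, whereas library implementations typically sample orderings and average over many background rows. The function `shapley_values` and the toy model `toy_predict` are hypothetical names introduced only for this illustration.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one observation, averaged over all orderings.

    Features that have not been 'revealed' yet keep their baseline value.
    Assumption: a single baseline row stands in for the background data.
    """
    n = len(x)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]            # reveal feature i
            value = predict(current)
            contributions[i] += value - previous
            previous = value
    return [c / len(orderings) for c in contributions]


# Hypothetical toy model with an interaction between features 1 and 2.
def toy_predict(row):
    return 0.1 + 0.2 * row[0] + 0.3 * row[1] * row[2]


x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_predict, x, baseline)

# The attributions add up to the gap between the prediction and the baseline.
print(phi, sum(phi), toy_predict(x) - toy_predict(baseline))
```

The interaction term is shared equally between the two features involved, which is the behaviour that makes Shapley-based attributions additive.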


## Functions to calculate explanations

In [12]:
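import os

# Folder where the SHAP results are written
path = './results'

# Create the folder on first run; otherwise just report that it is there
if not os.path.exists(path):
    os.makedirs(path)
else:
    print("The folder already exists")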

In [11]:
from scripts.calculate_SHAP import extract_preprocessed__calculate__save
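The body of `extract_preprocessed__calculate__save` is not shown in this notebook. Judging only from its `main_dir` and `task_hierarchy` arguments, it presumably writes results into a nested folder such as `./results/lewandowski/season2021`. The sketch below illustrates that assumption with a hypothetical helper, `build_output_dir`, which is not part of `scripts.calculate_SHAP`:

```python
import os
import tempfile


def build_output_dir(main_dir, task_hierarchy):
    """Hypothetical helper: create main_dir/<level 1>/<level 2>/... and return it."""
    path = os.path.join(main_dir, *task_hierarchy)
    # exist_ok=True makes repeated runs safe without a separate existence check
    os.makedirs(path, exist_ok=True)
    return path


# Use a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = build_output_dir(tmp, ['lewandowski', 'season2021'])
    created = os.path.isdir(out_dir)
    relative = os.path.relpath(out_dir, tmp)

print(created, relative)
```

Calling the helper twice with the same arguments is harmless, which matches a workflow where cells are re-run for several seasons.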

## Calculations

### Robert Lewandowski

#### Season 2021

In [12]:
subset = df_raw[np.logical_and(df_raw['player'] == 'Robert Lewandowski', df_raw['season'] == 2021)]
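The filter above can be tried on toy data. The sketch below also shows one plausible way the matching preprocessed rows could be looked up, assuming `df_raw` and `df_preprocessed` share the same index (both are read with `index_col=0`). The toy frames and the column `feature_a` are invented for illustration, and the lookup is an assumption about the helper function, not its confirmed behaviour.

```python
import pandas as pd

# Toy stand-ins for df_raw and df_preprocessed (hypothetical values).
df_raw_toy = pd.DataFrame(
    {
        'player': ['Robert Lewandowski', 'Robert Lewandowski', 'Other Player'],
        'season': [2021, 2020, 2021],
    },
    index=[10, 11, 12],
)
df_preprocessed_toy = pd.DataFrame(
    {'feature_a': [0.1, 0.2, 0.3], 'status': [1, 0, 0]},
    index=[10, 11, 12],
)

# Same condition as np.logical_and(...), written with the & operator.
mask = (df_raw_toy['player'] == 'Robert Lewandowski') & (df_raw_toy['season'] == 2021)
subset_toy = df_raw_toy[mask]

# Assumed lookup: select preprocessed rows by the shared index, then drop the target.
rows = df_preprocessed_toy.loc[subset_toy.index]
X_rows = rows.drop('status', axis=1)
print(X_rows)
```

If the two files were not index-aligned, the lookup would silently pick the wrong rows, so it is worth checking the indices before running the calculations.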

In [15]:
extract_preprocessed__calculate__save(
    main_dir='./results',
    task_hierarchy=['lewandowski', 'season2021'],
    explainer=explainer,
    subset=subset,
    df_preprocessed=df_preprocessed,
    target='status'
)

The folder already exists


#### Season 2020

In [16]:
subset = df_raw[np.logical_and(df_raw['player'] == 'Robert Lewandowski', df_raw['season'] == 2020)]

In [17]:
extract_preprocessed__calculate__save(
    main_dir='./results',
    task_hierarchy=['lewandowski', 'season2020'],
    explainer=explainer,
    subset=subset,
    df_preprocessed=df_preprocessed,
    target='status'
)

The folder already exists


#### Season 2019

In [18]:
subset = df_raw[np.logical_and(df_raw['player'] == 'Robert Lewandowski', df_raw['season'] == 2019)]

In [19]:
extract_preprocessed__calculate__save(
    main_dir='./results',
    task_hierarchy=['lewandowski', 'season2019'],
    explainer=explainer,
    subset=subset,
    df_preprocessed=df_preprocessed,
    target='status'
)

The folder already exists

