# Tell Me a Story! - SHAPstories Example

This notebook shows how to generate SHAPstories, narrative explanations built from SHAP values, using language models served through Ollama.

In [17]:
import shap
import pandas as pd
import numpy as np

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

from stories import SHAPstory, unwrap
from stories.llm_wrappers import OllamaWrapper

## FIFA Example

### Load data

In [18]:
# Load Data and Split
data = pd.read_csv("../data/FIFA_2018_Statistics.csv")
# Self-join on (Date, Opponent) to attach the goals conceded by each team
data = data.merge(
    data[["Date", "Team", "Goal Scored"]],
    left_on=["Date", "Opponent"],
    right_on=["Date", "Team"],
    suffixes=("", "_y"),
).drop(columns=["Team_y"]).rename(columns={"Goal Scored_y": "Goal against"})

# Keep only the integer-valued columns as features
feature_names = [i for i in data.columns if data[i].dtype == np.int64]
x = data[feature_names]

y = (data["Man of the Match"] == "Yes")

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

### Train a Logistic Regression Model and Evaluate Accuracy

In [31]:
logreg_pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=5000, solver="lbfgs"))
])

param_grid = {
    "classifier__C": np.logspace(-3, 3, 13)
}

cv = KFold(n_splits=5, shuffle=True, random_state=42)

logreg_search = GridSearchCV(
    estimator=logreg_pipe,
    param_grid=param_grid,
    cv=cv,
    scoring="roc_auc",   # swap for the metric you care about
    n_jobs=-1,
    refit=True
)

logreg_search.fit(x_train, y_train)
best_logreg = logreg_search.best_estimator_

predictions = logreg_search.predict(x_test)
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)

model = unwrap(logreg_search)
# Scale the test set with the fitted preprocessing steps (all pipeline steps except the classifier)
preprocessor = best_logreg[:-1]
x_test_preprocessed = preprocessor.transform(x_test)
x_test_preprocessed = pd.DataFrame(x_test_preprocessed, columns=x_test.columns)

Accuracy: 0.7307692307692307


### Manually Created Descriptions

In [20]:
task_description = """predict whether a football team will have the "Man of the Match" winner in a FIFA Worldcup match, based on the team's statistics """

input_description = "the match"

class_descriptions = {
    0: "class for the team that will not have a player who wins the 'Man of the Match'",
    1: "class for the team that will have a player who wins the 'Man of the Match'"
}

feature_desc = [
    'Number of goals scored by the team during the match.',
    'Percentage of ball possession by the team during the match.',
    'Number of attempts or shots taken by the team.',
    'Number of shots that were on target.',
    'Number of shots that went off target.',
    'Number of shots that were blocked by the opponent.',
    'Number of corner kicks taken by the team.',
    'Number of times the team was caught offside.',
    'Number of free kicks taken by the team.',
    "Number of saves made by the team's goalkeeper.",
    'Percentage of passes that successfully reached a teammate.',
    'Total number of passes made by the team.',
    "Total distance covered by the team's players during the match, in kilometers.",
    'Number of fouls committed by the team.',
    'Number of yellow cards received by the team.',
    'Number of yellow-red cards received by the team.',
    'Number of red cards received by the team.',
    'Number of goals scored by the team during the penalty shootout.',
    "Number of goals that were conceded by the team's goalkeeper.",
]

features_df = pd.DataFrame({
    "Feature name": list(x.columns),
    "Feature description": feature_desc
})

In [21]:
# llm = OllamaWrapper("gemma3:4b")  # alternative model
llm = OllamaWrapper("llama3.1:8b", temperature=0.01)

In [None]:
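# Use the scaled test set as background data for the linear explainer
masker = shap.maskers.Independent(x_test_preprocessed)
explainer = shap.LinearExplainer(model, masker=masker)
print(explainer.shap_values(x_test_preprocessed)[0])  # SHAP values for the first test instance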

[ 1.36943342 -0.35041932  0.04761816 -0.05062415  0.04482779  0.22016867
  0.23428525 -1.00833151  0.24064103 -0.01579097  0.307003    0.02927064
  0.2603713   0.18055327 -0.2190477   0.0126809  -0.         -0.06280948
  0.20469938]


### Generate Stories for the Logistic Regression Model

In [24]:
story_generator = SHAPstory(
    logreg_search.best_estimator_,
    explainer,
    llm,
    features_df,
    task_description,
    input_description,
    class_descriptions,
)
shap_df, predictions_df = story_generator.gen_variables(x_test)
prompt = story_generator.generate_prompt(x_test, predictions_df, shap_df, 0)
print(prompt)


An AI model was used to predict whether a football team will have the "Man of the Match" winner in a FIFA Worldcup match, based on the team's statistics . 
The input features to the model include data about the match. 
The target variable represents one of the following classes:
- class label 0 represents the class for the team that will not have a player who wins the 'Man of the Match'
- class label 1 represents the class for the team that will have a player who wins the 'Man of the Match'

The AI model predicted a certain instance of the dataset to belong to the class with label 1 
(i.e. class for the team that will have a player who wins the 'Man of the Match') with probability 85.57%. 

The provided SHAP table was generated to explain this
outcome. It includes every feature along with its value for that instance, and the
SHAP value assigned to it. 

The goal of SHAP is to explain the prediction of an instance by 
computing the contribution of each feature to the prediction. The
SH

In [25]:
stories = story_generator.generate_stories(x_test.iloc[:1])
print(stories[0])

As I examined the SHAP table, it became clear that the AI model had pinpointed several key factors that led to its prediction of the team winning the "Man of the Match" award. The most influential feature was the number of goals scored by the team, which contributed a significant 1.37 points to the overall prediction. This suggests that the model deemed the team's goal-scoring ability as a major strength.

However, there were also some notable negative contributions from other features. For instance, being caught offside multiple times resulted in a -1.01 point deduction, implying that the model viewed this aspect of their play as a weakness.

The team's passing accuracy and distance covered also played a significant role in the prediction, with the former contributing a positive 0.31 points and the latter adding another 0.26 points. The AI model seemed to appreciate the team's ability to maintain possession and cover long distances on the pitch.

Additionally, the number of free kicks
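
One way to check the narrative against the numbers is to rank the features by the magnitude of their SHAP values. The sketch below is a minimal, standalone illustration: the values are copied from the array printed earlier in this notebook, and it assumes they follow the same column order as the coefficient printout (`x.columns`). The helper `rank_by_impact` is hypothetical and not part of the `stories` package.

```python
# Feature names (order of x.columns) and SHAP values for the first test
# instance, both copied from the outputs shown in this notebook.
feature_names = [
    "Goal Scored", "Ball Possession %", "Attempts", "On-Target", "Off-Target",
    "Blocked", "Corners", "Offsides", "Free Kicks", "Saves",
    "Pass Accuracy %", "Passes", "Distance Covered (Kms)", "Fouls Committed",
    "Yellow Card", "Yellow & Red", "Red", "Goals in PSO", "Goal against",
]
shap_values = [
    1.36943342, -0.35041932, 0.04761816, -0.05062415, 0.04482779, 0.22016867,
    0.23428525, -1.00833151, 0.24064103, -0.01579097, 0.307003, 0.02927064,
    0.2603713, 0.18055327, -0.2190477, 0.0126809, -0.0, -0.06280948,
    0.20469938,
]


def rank_by_impact(names, values, top_k=5):
    """Return the top_k (name, value) pairs sorted by absolute SHAP value."""
    pairs = sorted(zip(names, values), key=lambda p: abs(p[1]), reverse=True)
    return pairs[:top_k]


top_features = rank_by_impact(feature_names, shap_values)
for name, value in top_features:
    print(f"{name:<25} {value:+.2f}")
```

Under that ordering assumption, the ranking puts Goal Scored and Offsides first, which matches the two features the story singles out.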

In [27]:
# Print the model coefficients together with the feature names
print("Model Coefficients:")
for feature, coeff in zip(x.columns, model.coef_[0]):
    print(f"{feature}: {coeff}")

Model Coefficients:
Goal Scored: 2.603174962718653
Ball Possession %: -0.3667828313184956
Attempts: -0.3940137557756107
On-Target: -0.18941008058530623
Off-Target: 0.3200078321708717
Blocked: -0.3335095639603849
Corners: 1.3077749468709388
Offsides: 0.9696393748889564
Free Kicks: 0.11204944408673453
Saves: -0.020303591642769604
Pass Accuracy %: 0.30569590335590757
Passes: 0.11433448525682152
Distance Covered (Kms): -0.32341688199139534
Fouls Committed: -0.12127185186592503
Yellow Card: 0.1681021394588032
Yellow & Red: -0.03248507481350835
Red: -0.3892388560444398
Goals in PSO: 0.46306906879791354
Goal against: -3.227996998385036


In [33]:
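# Sanity check: for a linear model, the SHAP value of a feature equals
# coefficient * (feature value - mean of the background data)
2.603174962718653 * \
    (x_test_preprocessed.iloc[0]["Goal Scored"] -
     x_test_preprocessed["Goal Scored"].mean())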

np.float64(1.3694334206710412)
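
The product above reproduces the SHAP value reported for Goal Scored. The same relation holds for every feature of a linear model when features are treated as independent, and the contributions then sum to the difference between the model's linear output for the instance and its average output over the background data. The following is a minimal sketch on synthetic data: the weights, intercept, and data are made up for illustration, and `linear_shap` is a hypothetical helper, not the `shap` implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear model f(x) = w @ x + b
# (for logistic regression this is the log-odds, not the probability)
w = np.array([2.0, -0.5, 0.3])
b = 0.1
background = rng.normal(size=(50, 3))  # stand-in for the masker's background data
x = rng.normal(size=3)                 # instance to explain


def linear_shap(w, x, background):
    """Per-feature contribution: coefficient * (value - background mean)."""
    return w * (x - background.mean(axis=0))


phi = linear_shap(w, x, background)

# Additivity: contributions sum to f(x) minus the average background output
f_x = w @ x + b
base_value = (background @ w + b).mean()
print(phi.sum(), f_x - base_value)
```

The two printed numbers agree up to floating-point error, which is the additivity property that lets a story attribute the prediction to individual features.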

In [13]:
y_test.iloc[0]

np.True_