# Tell Me a Story! - SHAPstories Example

This notebook shows how to generate SHAPstories with a model served by Ollama.

In [1]:
import shap
import pandas as pd
import numpy as np

from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

from stories import SHAPstory
from stories.llm_wrappers import OllamaWrapper


## FIFA Example

### Load data

In [2]:
# Load Data and Split
data = pd.read_csv("../data/FIFA_2018_Statistics.csv")
data = data.merge(
    data[["Date", "Team", "Goal Scored"]],
    left_on=["Date", "Opponent"],
    right_on=["Date", "Team"],
    suffixes=["", "_y"]).drop(columns=["Team_y"]).rename(columns={"Goal Scored_y": "Goal against"})

# Keep only the integer-valued columns as model features
feature_names = [i for i in data.columns if data[i].dtype == np.int64]
x = data[feature_names]

y = (data["Man of the Match"] == "Yes")

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
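
The self-merge above attaches each opponent's goals to a team's row as `Goal against`. A minimal sketch on a two-row toy table shows the effect. The team names and scores here are made up for illustration and are not taken from the dataset:

```python
import pandas as pd

# Hypothetical single match: one row per team, mirroring the CSV layout.
toy = pd.DataFrame({
    "Date": ["01-01-2018", "01-01-2018"],
    "Team": ["A", "B"],
    "Opponent": ["B", "A"],
    "Goal Scored": [3, 1],
})

# Join each row to its opponent's row on the same date,
# then keep only the opponent's goals under a clearer name.
toy = (
    toy.merge(
        toy[["Date", "Team", "Goal Scored"]],
        left_on=["Date", "Opponent"],
        right_on=["Date", "Team"],
        suffixes=("", "_y"),
    )
    .drop(columns=["Team_y"])
    .rename(columns={"Goal Scored_y": "Goal against"})
)

print(toy)
```

Each team's `Goal against` is simply the other team's `Goal Scored` for the same date.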

### Train a Logistic Regression Model and Check Accuracy

In [3]:
model = LogisticRegression(max_iter=5000)
model.fit(x_train, y_train)
predictions = model.predict(x_test)
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)

Accuracy: 0.7307692307692307


### Manually Created Descriptions

In [4]:
task_description = """predict whether a football team will have the "Man of the Match" winner in a FIFA Worldcup match, based on the team's statistics """

input_description = "the match"

class_descriptions = {
    0: "class for the team that will not have a player who wins the 'Man of the Match'",
    1: "class for the team that will have a player who wins the 'Man of the Match'"
}

feature_desc = [
    'Number of goals scored by the team during the match.',
    'Percentage of ball possession by the team during the match.',
    'Number of attempts or shots taken by the team.',
    'Number of shots that were on target.',
    'Number of shots that went off target.',
    'Number of shots that were blocked by the opponent.',
    'Number of corner kicks taken by the team.',
    'Number of times the team was caught offside.',
    'Number of free kicks taken by the team.',
    "Number of saves made by the team's goalkeeper.",
    'Percentage of passes that successfully reached a teammate.',
    'Total number of passes made by the team.',
    "Total distance covered by the team's players during the match, in kilometers.",
    'Number of fouls committed by the team.',
    'Number of yellow cards received by the team.',
    'Number of yellow-red cards received by the team.',
    'Number of red cards received by the team.',
    'Number of goals scored by the team during the penalty shootout.',
    "Number of goals that were conceded by the team's goalkeeper.",
]

features_df = pd.DataFrame({
    "Feature name": list(x.columns),
    "Feature description": feature_desc
})

In [5]:
llm = OllamaWrapper("gemma3:4b")

In [6]:
masker = shap.maskers.Independent(x_test)
explainer = shap.LinearExplainer(model, masker=masker)
shap_values = explainer.shap_values(x_test)

### Generate Stories for the Logistic Regression Model

In [7]:
story_generator = SHAPstory(model, explainer, llm, features_df, task_description, input_description, class_descriptions)
shap_df, predictions_df = story_generator.gen_variables(x_test)
prompt = story_generator.generate_prompt(x_test, predictions_df, shap_df, 0)
print(prompt)


An AI model was used to predict whether a football team will have the "Man of the Match" winner in a FIFA Worldcup match, based on the team's statistics . 
The input features to the model include data about the match. 
The target variable represents one of the following classes:
- class label 0 represents the class for the team that will not have a player who wins the 'Man of the Match'
- class label 1 represents the class for the team that will have a player who wins the 'Man of the Match'

The AI model predicted a certain instance of the dataset to belong to the class with label 1 
(i.e. class for the team that will have a player who wins the 'Man of the Match') with probability 87.30%. 

The provided SHAP table was generated to explain this
outcome. It includes every feature along with its value for that instance, and the
SHAP value assigned to it. 

The goal of SHAP is to explain the prediction of an instance by 
computing the contribution of each feature to the prediction. The
SH

In [None]:
stories = story_generator.generate_stories(x_test.iloc[:2])
print(stories[0])

In [None]:
# Print the model coefficients together with the feature names
print("Model Coefficients:")
for feature, coeff in zip(x.columns, model.coef_[0]):
    print(f"{feature}: {coeff}")

Model Coefficients:
Goal Scored: 1.8420094657828152
Ball Possession %: -0.04892884433415059
Attempts: -0.15978373044009475
On-Target: 0.04620357842843351
Off-Target: 0.2188836875312209
Blocked: -0.07176236929831954
Corners: 0.49654726326880705
Offsides: 0.6422003412718447
Free Kicks: 0.046069875458706784
Saves: -0.0006398866946944256
Pass Accuracy %: 0.06108223641166268
Passes: 0.0012225035667891282
Distance Covered (Kms): -0.022155142664064934
Fouls Committed: -0.032966165362065196
Yellow Card: 0.10116191283059259
Yellow & Red: -0.0011237542167242686
Red: -0.22035687006377555
Goals in PSO: 0.4310101744827301
Goal against: -2.21482805781323


In [None]:
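# Manual check: with an independent masker, the linear SHAP value should equal coef * (x - background mean)
model.coef_[0][x.columns.get_loc("Attempts")] * (x_test.iloc[0]["Attempts"] - x_test["Attempts"].mean())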

np.float64(0.0983284495015967)
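
The check above relies on the fact that, for a linear model with features treated as independent, each SHAP value reduces to the coefficient times the feature's deviation from the background mean. For logistic regression these values live in log-odds space. A minimal pure-Python sketch illustrates this together with the additivity property; the weights, intercept, and data are hypothetical and are not the FIFA model:

```python
import math

# Hypothetical linear model in log-odds space: f(x) = intercept + sum(w_i * x_i).
weights = [1.5, -0.2]
intercept = -0.5

# Hypothetical background data (two features) and the instance to explain.
background = [[1.0, 10.0], [3.0, 14.0], [2.0, 12.0]]
instance = [4.0, 9.0]


def log_odds(row):
    """Evaluate the linear model on one row of feature values."""
    return intercept + sum(w * v for w, v in zip(weights, row))


# Column means of the background data.
means = [sum(col) / len(col) for col in zip(*background)]

# Linear SHAP value per feature: coefficient * (value - background mean).
shap_values = [w * (v - m) for w, v, m in zip(weights, instance, means)]

# Additivity: the base value plus the SHAP values recovers the model output.
base_value = log_odds(means)
reconstructed = base_value + sum(shap_values)

# Convert log-odds to a probability with the logistic function.
probability = 1.0 / (1.0 + math.exp(-reconstructed))

print(shap_values, base_value, reconstructed, probability)
```

Because the SHAP values sum to the log-odds offset from the base value, a probability such as the one quoted in the prompt is obtained only after applying the logistic function to that sum.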