# Tell Me a Story! - SHAPStories Example

This notebook shows how to generate SHAPstories using Ollama models.

In [None]:
import shap
import pandas as pd
import numpy as np

from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

from stories import SHAPstory
from stories.llm_wrappers import OllamaWrapper


## FIFA Example

### Load data

In [2]:


# Load Data and Split
data = pd.read_csv("../data/FIFA_2018_Statistics.csv")

# Keep only the integer-valued columns as model features
feature_names = [i for i in data.columns if data[i].dtype == np.int64]
x = data[feature_names]

y = (data["Man of the Match"] == "Yes")

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

In [3]:
x.columns

Index(['Goal Scored', 'Ball Possession %', 'Attempts', 'On-Target',
       'Off-Target', 'Blocked', 'Corners', 'Offsides', 'Free Kicks', 'Saves',
       'Pass Accuracy %', 'Passes', 'Distance Covered (Kms)',
       'Fouls Committed', 'Yellow Card', 'Yellow & Red', 'Red',
       'Goals in PSO'],
      dtype='object')

### Train and Compare Accuracy of Various Models

In [4]:
model = LogisticRegression(max_iter=5000)
model.fit(x_train, y_train)
predictions = model.predict(x_test)
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)

Accuracy: 0.6923076923076923
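Only logistic regression is fitted above, although the section title mentions several models. A minimal sketch of such a comparison, using the `RandomForestClassifier` and `SVC` classes imported at the top, might look like the following. It runs on synthetic data from `make_classification` (an assumption, so that the snippet is self-contained); in this notebook you would pass `x_train`, `x_test`, `y_train`, `y_test` instead. The helper name `compare_models` and the hyperparameters are hypothetical choices, not part of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def compare_models(x_train, x_test, y_train, y_test):
    """Fit several classifiers and return {name: test accuracy}."""
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=5000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
        "svm": SVC(probability=True, random_state=42),
    }
    results = {}
    for name, clf in candidates.items():
        clf.fit(x_train, y_train)
        results[name] = accuracy_score(y_test, clf.predict(x_test))
    return results


# Synthetic stand-in for the FIFA data (assumption: 18 numeric features).
X, y = make_classification(n_samples=128, n_features=18, random_state=42)
splits = train_test_split(X, y, test_size=0.2, random_state=42)
for name, acc in compare_models(*splits).items():
    print(f"{name}: {acc:.3f}")
```

Note that the `shap.LinearExplainer` used later applies to linear models only; explaining the random forest or SVM would need a different explainer, such as `shap.TreeExplainer` or `shap.KernelExplainer`.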


### Manually Created Descriptions

In [None]:
task_description = """predict whether a football team will have the "Man of the Match" winner in a FIFA Worldcup match, based on the team's statistics """

input_description = "the match"

class_descriptions = {
    0: "class for the team that will not have a player who wins the 'Man of the Match'",
    1: "class for the team that will have a player who wins the 'Man of the Match'"
}

feature_desc = [
    'Number of goals scored by the team during the match.',
    'Percentage of ball possession by the team during the match.',
    'Number of attempts or shots taken by the team.',
    'Number of shots that were on target.',
    'Number of shots that went off target.',
    'Number of shots that were blocked by the opponent.',
    'Number of corner kicks taken by the team.',
    'Number of times the team was caught offside.',
    'Number of free kicks taken by the team.',
    "Number of saves made by the team's goalkeeper.",
    'Percentage of passes that successfully reached a teammate.',
    'Total number of passes made by the team.',
    "Total distance covered by the team's players during the match, in kilometers.",
    'Number of fouls committed by the team.',
    'Number of yellow cards received by the team.',
    'Number of yellow-red cards received by the team.',
    'Number of red cards received by the team.',
    'Number of goals scored by the team during the penalty shootout.'
]

features_df = pd.DataFrame({
    "Feature name": list(x.columns),
    "Feature description": feature_desc
})

In [7]:
llm = OllamaWrapper("gemma3:4b")
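Story generation presumably fails if no Ollama server is running or the model has not been pulled. The sketch below is a pre-flight check under two assumptions: the server listens on Ollama's default address `http://localhost:11434`, and its `/api/tags` endpoint lists the locally available models. The function name `ollama_has_model` is hypothetical and is not part of the `stories` package.

```python
import json
import urllib.error
import urllib.request


def ollama_has_model(model, base_url="http://localhost:11434", timeout=2.0):
    """Return True if a reachable Ollama server lists `model` locally."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return False  # server unreachable or response is not valid JSON
    names = {m.get("name") for m in payload.get("models", [])}
    return model in names


if not ollama_has_model("gemma3:4b"):
    print("Model not available; try `ollama pull gemma3:4b` and start the server.")
```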

In [8]:
masker = shap.maskers.Independent(x_test)
explainer = shap.LinearExplainer(model, masker=masker)
shap_values = explainer.shap_values(x_test)
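For a linear model with features treated as independent, SHAP values have a simple closed form: each feature contributes its coefficient times the feature's deviation from the background mean, and the explained output is the model's log-odds rather than the probability. The NumPy sketch below illustrates that formula and its additivity property. We assume this matches what `shap.LinearExplainer` computes with an `Independent` masker; the weights and data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear model: log-odds f(x) = w @ x + b (values are made up).
w = np.array([0.8, -0.3, 0.05])
b = -0.2

background = rng.normal(size=(50, 3))  # plays the role of x_test in the masker
x = background[0]                      # instance to explain

# Closed-form SHAP values under feature independence.
mean = background.mean(axis=0)
phi = w * (x - mean)

# Additivity: contributions sum to f(x) minus the average prediction.
f_x = w @ x + b
base_value = w @ mean + b
print(np.isclose(phi.sum(), f_x - base_value))
```

A positive entry in `phi` pushes the log-odds toward class 1 and a negative entry pushes it toward class 0, which is how the signs in the SHAP table are read.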

### Generate Stories for the Trained Logistic Regression Model

In [11]:
story_generator = SHAPstory(llm, explainer, features_df, task_description, input_description, class_descriptions)
shap_df, predictions_df = story_generator.gen_variables(model, x_test)
prompt = story_generator.generate_prompt(x_test, predictions_df, shap_df, 0)
print(prompt)


An AI model was used to predict whether a football team will have the "Man of the Match" winner in a FIFA Worldcup match, based on the team's statistics . 
The input features to the model include data about the match. 
The target variable represents one of the following classes:
- class label 0 represents the class for the team that will not have a player who wins the 'Man of the Match'
- class label 1 represents the class for the team that will have a player who wins the 'Man of the Match'

The AI model predicted a certain instance of the dataset to belong to the class with label 1 
(i.e. class for the team that will have a player who wins the 'Man of the Match') with probability 92.58%. 

The provided SHAP table was generated to explain this
outcome. It includes every feature along with its value for that instance, and the
SHAP value assigned to it. 

The goal of SHAP is to explain the prediction of an instance by 
computing the contribution of each feature to the prediction. The
SH

In [12]:
stories = story_generator.generate_stories(model, x_test.iloc[:2])
print(stories[0])

Based on the SHAP values, the model predicted a "Man of the Match" winner for this team due to a potent combination of attacking prowess and a dominant midfield performance. The highest positive SHAP value, associated with ‘Goal Scored’ (2), suggests the team’s offensive output was a key driver of the prediction, indicating they were likely to have a player recognized for scoring goals.  Furthermore, a high SHAP value for ‘Ball Possession %’ (59%) demonstrates a sustained control of the game, underlining the team's ability to dictate play. The team also had a strong midfield performance, indicated by ‘Pass Accuracy %’ (89%), showing precise passing that created scoring chances. Importantly, the high ‘Attempts’ (13) suggests the team aggressively created chances. However, the negative SHAP value associated with ‘Passes’ (485) may imply that while the team completed many passes, they weren’t particularly effective at moving the ball into dangerous areas, a potentially contributing factor