# Tell Me a Story! - SHAPStories Example

This notebook demonstrates how to generate SHAPstories with Ollama models.

In [1]:
import shap
import pandas as pd
import numpy as np

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

from stories import SHAPstory
from stories.stories import SHAP_TABLE_COLUMNS
from stories.llm_wrappers import OllamaWrapper
from stories.prompts import make_shap_narrative_prompt, make_rank_evaluation_prompt
from stories.explainers import make_linear_interventional_shap_explainer



## FIFA Example

### Load data

In [2]:
# Load Data and Split
data = pd.read_csv("../data/FIFA_2018_Statistics.csv")
data = (
    data.merge(
        data[["Date", "Team", "Goal Scored"]],
        left_on=["Date", "Opponent"], right_on=["Date", "Team"], suffixes=["", "_y"],
    )
    .drop(columns=["Team_y"])
    .rename(columns={"Goal Scored_y": "Goal against"})  # goals scored by the opponent
)

# Keep only the integer-valued columns as model features
feature_names = [i for i in data.columns if data[i].dtype == np.int64]
x = data[feature_names]

y = (data["Man of the Match"] == "Yes")

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
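
The self-merge in the cell above attaches, to each team's row, the goals its opponent scored in the same match. A minimal sketch of the same lookup in plain Python may make that easier to see. The rows below are made up for illustration and are not taken from the CSV; only the column names come from the notebook.

```python
# Toy rows standing in for the CSV; teams and scores are made up for illustration.
rows = [
    {"Date": "14-06-2018", "Team": "A", "Opponent": "B", "Goal Scored": 5},
    {"Date": "14-06-2018", "Team": "B", "Opponent": "A", "Goal Scored": 0},
    {"Date": "15-06-2018", "Team": "C", "Opponent": "D", "Goal Scored": 0},
    {"Date": "15-06-2018", "Team": "D", "Opponent": "C", "Goal Scored": 1},
]

# Index goals by (date, team), mirroring the right-hand side of the merge.
goals_by_team = {(r["Date"], r["Team"]): r["Goal Scored"] for r in rows}

# For each row, look up what the opponent scored on the same date.
for r in rows:
    r["Goal against"] = goals_by_team[(r["Date"], r["Opponent"])]

goals_against = [r["Goal against"] for r in rows]
print(goals_against)
```

One difference to keep in mind: the default inner merge silently drops a row whose opponent has no matching entry, whereas this dictionary lookup would raise a `KeyError`.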

### Train a Logistic Regression Model and Evaluate Accuracy

In [3]:
logreg_pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=5000, solver="lbfgs"))
])

param_grid = {
    "classifier__C": np.logspace(-3, 3, 13)
}

cv = KFold(n_splits=5, shuffle=True, random_state=42)

logreg_search = GridSearchCV(
    estimator=logreg_pipe,
    param_grid=param_grid,
    cv=cv,
    scoring="roc_auc",   # swap for the metric you care about
    n_jobs=-1,
    refit=True
)

logreg_search.fit(x_train, y_train)
best_logreg = logreg_search.best_estimator_

test_predictions = logreg_search.predict(x_test)
test_probabilities = logreg_search.predict_proba(x_test)
accuracy = accuracy_score(y_test, test_predictions)
print("Accuracy:", accuracy)

Accuracy: 0.7307692307692307
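
To make the size of the search concrete, the sketch below reproduces the values of `np.logspace(-3, 3, 13)` in plain Python and counts the model fits that the grid search performs. The variable names are illustrative. The fit count assumes `refit=True` triggers exactly one extra fit on the full training set, as it does for a single scoring metric.

```python
# Reproduce np.logspace(-3, 3, 13) without NumPy:
# 13 exponents evenly spaced between -3 and 3, i.e. half a decade apart.
n_candidates = 13
exponents = [-3 + 6 * i / (n_candidates - 1) for i in range(n_candidates)]
c_grid = [10 ** e for e in exponents]

n_splits = 5
# One fit per (candidate, fold) pair, plus one final refit on all training data.
total_fits = n_candidates * n_splits + 1

print([f"{c:.4g}" for c in c_grid])
print(total_fits)
```

Note that the search selects `C` by cross-validated ROC AUC, while the cell above reports test-set accuracy, so the two numbers measure different things.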


### Manually Created Descriptions

In [4]:
task_description = """predict whether a football team will have the "Man of the Match" winner in a FIFA Worldcup match, based on the team's statistics """

input_description = "the match"

class_descriptions = {
    0: "class for the team that will not have a player who wins the 'Man of the Match'",
    1: "class for the team that will have a player who wins the 'Man of the Match'"
}

feature_desc = [
    'Number of goals scored by the team during the match',
    'Percentage of ball possession by the team during the match',
    'Number of attempts or shots taken by the team',
    'Number of shots that were on target',
    'Number of shots that went off target',
    'Number of shots that were blocked by the opponent',
    'Number of corner kicks taken by the team',
    'Number of times the team was caught offside',
    'Number of free kicks taken by the team',
    "Number of saves made by the team's goalkeeper",
    'Percentage of passes that successfully reached a teammate',
    'Total number of passes made by the team',
    "Total distance covered by the team's players during the match, in kilometers",
    'Number of fouls committed by the team',
    'Number of yellow cards received by the team',
    'Number of yellow-red cards received by the team',
    'Number of red cards received by the team',
    'Number of goals scored by the team during the penalty shootout',
    "Number of goals that were conceded by the team's goalkeeper",
]

features_df = pd.DataFrame({
    "Feature name": list(x.columns),
    "Feature description": feature_desc
})

prompt_template = make_shap_narrative_prompt(
    task_description,
    input_description,
    class_descriptions,
)

In [5]:
#llm = OllamaWrapper("gemma3:4b")
llm = OllamaWrapper("llama3.1-greedy:latest", temperature=0.0)

In [18]:
explainer = make_linear_interventional_shap_explainer(logreg_search, x_test, max_story_features=8, only_positive_shaps=True)
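
For a linear model, interventional SHAP values have a closed form in the model's margin (log-odds) space: feature `i` contributes `w_i * (x_i - mean_i)`, where `mean_i` is the feature's mean over the background data. The sketch below illustrates that formula together with a positive-only, top-k selection. It is an assumption about what `only_positive_shaps` and `max_story_features` do, not the `stories` implementation. The helper names `linear_shap` and `select_for_story` and all numbers are hypothetical. The generated prompt describes SHAP values as probability contributions, so the library may work in a different output space.

```python
def linear_shap(weights, x, background_means):
    """Closed-form interventional SHAP values for a linear model (margin space)."""
    return [w * (xi - m) for w, xi, m in zip(weights, x, background_means)]


def select_for_story(names, shap_values, max_features, only_positive=True):
    """Hypothetical filter: keep the largest (optionally positive-only) contributions."""
    pairs = list(zip(names, shap_values))
    if only_positive:
        pairs = [(n, v) for n, v in pairs if v > 0]
    pairs.sort(key=lambda p: abs(p[1]), reverse=True)
    return pairs[:max_features]


# Made-up weights and inputs for three features.
names = ["Goal Scored", "Passes", "Fouls Committed"]
weights = [0.5, 0.25, -0.5]
x = [3.0, 10.0, 6.0]
background_means = [1.0, 8.0, 4.0]

phi = linear_shap(weights, x, background_means)
top = select_for_story(names, phi, max_features=2)
print(phi)
print(top)
```

The values sum to the difference between the model's margin at `x` and at the background mean, which is the additivity property that lets a narrative attribute the prediction to individual features.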

### Generate Stories for the Trained Logistic Regression Model

In [19]:
story_generator = SHAPstory(
    explainer, 
    llm, 
    prompt_template,
    features_df
)

sample_index = 0

x_test_0 = x_test.iloc[sample_index]
test_prediction_0 = int(test_predictions[sample_index])
test_probabilities_0 = test_probabilities[sample_index][test_prediction_0]

explanation_df = explainer.make_explanation_table(x_test_0)
explanation_df = explanation_df.merge(features_df, on="Feature name")

prompt = story_generator.generate_prompt(test_prediction_0, test_probabilities_0, explanation_df[SHAP_TABLE_COLUMNS])

print(prompt)

An AI model was used to predict whether a football team will have the "Man of the Match" winner in a FIFA Worldcup match, based on the team's statistics . The input features to the model include data about the match. The target variable represents one of the following classes:
- class label 0 represents the class for the team that will not have a player who wins the 'Man of the Match'
- class label 1 represents the class for the team that will have a player who wins the 'Man of the Match'

The AI model predicted a certain instance of the dataset to belong to the class with label 1 with probability 0.86.

The post-hoc feature attribution method used to explain the instance's predicted class is SHAP. The goal of SHAP is to explain the prediction of an instance by computing the contribution of each feature to the prediction. Each individual SHAP value is a measure of how much additional probability this feature adds or subtracts  in the predicted probability relative to the base level pro

In [25]:
stories = story_generator.generate_stories(x_test.iloc[0:1])

In [26]:
explanation_tables = []

for i, story in enumerate(stories):
    explanation_table = story["explanation_table"].copy()
    explanation_table["story_id"] = i
    descriptions = explanation_table[["Feature name", "Feature description"]].to_string(index=False)
    story["rank_evaluation_prompt"] = make_rank_evaluation_prompt(
        task_description,
        input_description,
        class_descriptions, 
        story["story"],
        feature_desc=descriptions
    )

    with open(f"evaluation_prompt_{i}.txt", "w") as fout:
        fout.write(story["rank_evaluation_prompt"])

    explanation_tables.append(explanation_table)

In [27]:
print(stories[0]["story"])

The model predicted a team to win the "Man of the Match" award with a high probability, and SHAP analysis revealed that the team's impressive goal-scoring performance was a major contributor to this prediction. With 2 goals scored, significantly above the average of 1.38, the team demonstrated exceptional attacking prowess, which likely caught the model's attention. The team's ability to successfully execute passes at an 89% rate also played a significant role in the prediction, as it indicated a high level of cohesion and teamwork. However, the model also took into account the team's relatively low distance covered during the match, which might have been seen as a potential weakness. On the other hand, the team's ability to take advantage of free kicks, with 24 attempts, was another key factor in the prediction. The model may have viewed this as an indication of the team's willingness to take risks and try new things, which could be beneficial in high-pressure situations like the Worl

In [28]:
print(stories[0]["generation_prompt"])

An AI model was used to predict whether a football team will have the "Man of the Match" winner in a FIFA Worldcup match, based on the team's statistics . The input features to the model include data about the match. The target variable represents one of the following classes:
- class label 0 represents the class for the team that will not have a player who wins the 'Man of the Match'
- class label 1 represents the class for the team that will have a player who wins the 'Man of the Match'

The AI model predicted a certain instance of the dataset to belong to the class with label 1 with probability 0.86.

The post-hoc feature attribution method used to explain the instance's predicted class is SHAP. The goal of SHAP is to explain the prediction of an instance by computing the contribution of each feature to the prediction. Each individual SHAP value is a measure of how much additional probability this feature adds or subtracts  in the predicted probability relative to the base level pro

In [24]:
pd.concat(explanation_tables).to_csv("explanations.csv", header=True, index=False)