# Setup

Run the following cell to install the necessary dependencies.

In [None]:
!pip install shap xgboost transformers

Run the following cell to import the necessary packages.

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

import numpy as np
import matplotlib.pyplot as plt
import shap
import xgboost

# Exploratory Data Analysis

Let's classify the sentiment (positive or negative) of movie reviews. For that, we will use the IMDB dataset, which contains 25,000 movie reviews from the Internet Movie Database (IMDB) together with their sentiment labels. The dataset is available through the SHAP library.

In [None]:
X, y = shap.datasets.imdb()

Take a look at the dataset (X and y) to get a feel for the data you are working with.

In [None]:
# TODO Conduct a simple exploratory data analysis (EDA), e.g.,
# 1. Print the size of the dataset (i.e., number of movie reviews)
# 2. Print one movie review
# 3. Print the labels
# 4. Plot a histogram of the review lengths
# 5. ...
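
One possible sketch for items 1 to 4 of the TODO list, assuming the reviews are Python strings and the labels are booleans or 0/1 values. The helper name `eda_summary` is hypothetical, and review length is measured in whitespace-separated words, which is only a rough proxy for tokens.

```python
import numpy as np

def eda_summary(reviews, labels):
    """Return basic statistics for a list of review strings and their binary labels."""
    labels = np.asarray(labels, dtype=int)
    # Review length in whitespace-separated words (a rough proxy for tokens)
    lengths = np.array([len(r.split()) for r in reviews])
    values, counts = np.unique(labels, return_counts=True)
    return {
        "n_reviews": len(reviews),
        "class_counts": dict(zip(values.tolist(), counts.tolist())),
        "mean_length": float(lengths.mean()),
        "median_length": float(np.median(lengths)),
        "max_length": int(lengths.max()),
        "lengths": lengths,
    }

# Hypothetical toy reviews, standing in for X and y
summary = eda_summary(["A great film.", "Dull and far too long.", "Loved it"], [True, False, True])
print(summary["n_reviews"], summary["class_counts"], summary["max_length"])
```

On the real data, `eda_summary(X, y)` would give the dataset size and class balance, `print(X[0])` shows one review, and `plt.hist(summary["lengths"], bins=50)` plots the length histogram.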

# Prepare Dataset

Split the data into a training and a test set. Use 20% of the examples for the test set.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
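
The `stratify=y` argument asks `train_test_split` to keep the class proportions of `y` in both splits. A minimal check on hypothetical toy labels (80 negatives, 20 positives) makes this visible:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy labels: 80 negative (0) and 20 positive (1) examples
y_toy = np.array([0] * 80 + [1] * 20)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)

_, _, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

# Both splits keep the original 20% positive rate
print(len(y_tr), len(y_te))
print(y_tr.mean(), y_te.mean())
```

Without `stratify`, the positive rate in a small test split can drift away from the overall rate purely by chance.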

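Use TF-IDF representations to convert the text examples into a numeric feature matrix: each entry is a term's count in a review, reweighted by the term's inverse document frequency, so words that appear in many reviews count for less. With `min_df=10` and `max_features=1000`, only terms that occur in at least 10 training reviews are kept, capped at the 1,000 most frequent ones.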

In [None]:
vectorizer = TfidfVectorizer(stop_words='english', min_df=10, max_features=1000)
X_train_vec = vectorizer.fit_transform(X_train).toarray() # .toarray() transforms the sparse matrix into a dense matrix to avoid issues with shap later on
X_test_vec = vectorizer.transform(X_test).toarray()

# Print the dimensions of the resulting TF-IDF matrices
print("Training Set:", X_train_vec.shape)
print("Test Set:", X_test_vec.shape)
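
To make the weighting concrete, here is a minimal NumPy sketch of TF-IDF that follows scikit-learn's documented defaults (lowercasing, the default token pattern, `smooth_idf=True`, `norm="l2"`). It is not scikit-learn's implementation, and it ignores the `stop_words`, `min_df` and `max_features` options used above. The function name `tfidf_like_sklearn` is hypothetical.

```python
import re
import numpy as np

def tfidf_like_sklearn(docs):
    """Minimal TF-IDF mirroring scikit-learn's default settings.
    Assumes every document contains at least one token."""
    # Default token pattern: words of two or more characters
    tokenized = [re.findall(r"(?u)\b\w\w+\b", d.lower()) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: j for j, t in enumerate(vocab)}

    # Raw term counts: rows are documents, columns are vocabulary terms
    tf = np.zeros((len(docs), len(vocab)))
    for i, toks in enumerate(tokenized):
        for t in toks:
            tf[i, index[t]] += 1

    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1
    n = len(docs)
    df = (tf > 0).sum(axis=0)
    idf = np.log((1 + n) / (1 + df)) + 1

    # Weight counts by idf, then L2-normalise each row
    tfidf = tf * idf
    tfidf /= np.linalg.norm(tfidf, axis=1, keepdims=True)
    return vocab, tfidf

vocab, M = tfidf_like_sklearn(["good movie", "bad movie", "good good plot"])
print(vocab)
print(np.round(M, 3))
```

In the second document, "bad" receives a higher weight than "movie" because it appears in fewer documents.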

# SHAP for an XGBoost Model

Instantiate and fit an XGBClassifier model, i.e., a classifier built from an ensemble of gradient-boosted decision trees.

In [None]:
model = xgboost.XGBClassifier()
model.fit(X_train_vec, y_train)
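
For binary classification, a boosted tree ensemble adds up the leaf values reached in each tree to a raw score in log-odds space and maps it to a probability with the sigmoid function. This matters for interpretation: as far as we understand, SHAP values for an XGBoost classifier are reported in this log-odds space by default. The leaf values and the zero base margin below are hypothetical numbers for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Map a log-odds score to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical leaf values reached by one review in each of four boosted trees
leaf_values = np.array([0.30, -0.12, 0.25, 0.07])
base_margin = 0.0  # assumed starting log-odds (p = 0.5)

margin = base_margin + leaf_values.sum()  # raw score in log-odds space
p_positive = sigmoid(margin)
print(margin, p_positive)
```

Because the sigmoid is monotonic, a probability above 0.5 corresponds exactly to a positive log-odds score.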

Calculate SHAP values using a shap.Explainer object. Then plot the SHAP values in a waterfall chart for a single movie review.

Starting point: https://shap.readthedocs.io/en/latest/example_notebooks/api_examples/plots/waterfall.html

In [None]:
# TODO Calculate SHAP values using a shap.Explainer object
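
SHAP values satisfy local accuracy: the base value (the average model output over a background dataset) plus the sum of a review's SHAP values equals the model output for that review. The sketch below checks this for a hypothetical linear model, where, with independent features, the SHAP value of feature i has the closed form w_i * (x_i - E[x_i]). It is not how SHAP handles XGBoost (we expect `shap.Explainer` to dispatch to a tree-specific algorithm there), but the additivity it demonstrates is the same.

```python
import numpy as np

# Hypothetical linear model f(x) = w . x + b over three TF-IDF features
w = np.array([2.0, -1.5, 0.5])
b = 0.1
X_background = np.array([[0.1, 0.0, 0.2],
                         [0.0, 0.3, 0.2],
                         [0.2, 0.3, 0.2]])
x = np.array([0.4, 0.0, 0.1])

def f(X):
    return X @ w + b

# Base value E[f(X)] over the background data
base_value = f(X_background).mean()
# Closed-form SHAP values for a linear model with independent features
phi = w * (x - X_background.mean(axis=0))

# Local accuracy: base value + sum of SHAP values recovers the prediction
print(base_value + phi.sum(), f(x))
```

In the notebook, the real values would presumably come from something like `explainer = shap.Explainer(model, X_train_vec, feature_names=vectorizer.get_feature_names_out())` followed by `shap_values = explainer(X_test_vec)`. Treat this exact call as an assumption and check it against the linked waterfall example.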

Look at a particular movie review from the test set, and print its ground-truth label and the model's predicted label.

In [None]:
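index = 2  # position of the review within the test set

print("Review Text:\n", X_test[index])
print(f"Ground Truth: {'POSITIVE' if y_test[index] else 'NEGATIVE'}")
# Slicing with index:index + 1 passes a 2D array that holds a single row
print(f"Model Prediction: {'POSITIVE' if model.predict(X_test_vec[index:index + 1])[0] == 1 else 'NEGATIVE'}")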

In [None]:
# TODO Plot a waterfall chart for that movie review using shap.plots.waterfall(...) and the calculated SHAP values
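
A waterfall chart starts at the base value E[f(X)], adds one SHAP value per row, and ends at the model output f(x), with the largest contributions closest to f(x) and the remaining features lumped into a single row. The plain-text sketch below roughly mirrors that layout; the function `waterfall_rows` is hypothetical, and its handling of `max_display` is our approximation of SHAP's behaviour, not its code.

```python
import numpy as np

def waterfall_rows(base_value, shap_values, feature_names, max_display=10):
    """Plain-text approximation of a waterfall chart."""
    shap_values = np.asarray(shap_values, dtype=float)
    order = np.argsort(-np.abs(shap_values))  # largest |SHAP value| first
    shown = order[: max_display - 1] if len(order) > max_display else order
    rest = order[len(shown):]

    rows = [(feature_names[i], shap_values[i]) for i in shown]
    if len(rest) > 0:
        # Lump all remaining features into one row
        rows.append((f"{len(rest)} other features", shap_values[rest].sum()))

    running = base_value
    lines = [f"E[f(X)] = {base_value:+.3f}"]
    for name, value in reversed(rows):  # smallest first, so the chart ends at f(x)
        running += value
        lines.append(f"{name:>20s} {value:+.3f} -> {running:+.3f}")
    lines.append(f"f(x) = {running:+.3f}")
    return lines

# Hypothetical SHAP values for three features
for line in waterfall_rows(0.1, [0.6, 0.3, -0.05], ["great", "boring", "plot"], max_display=2):
    print(line)
```

For the XGBoost model, the units on this axis are log-odds. A SHAP `Explanation` row such as `shap_values[index]` exposes `.base_values`, `.values` and `.feature_names`, and `shap.plots.waterfall(shap_values[index])` draws the real chart.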

# SHAP for a Transformer-based Model

Now, let's generate SHAP values for a Transformer-based sentiment classifier on the same dataset.

Starting point: https://shap.readthedocs.io/en/latest/example_notebooks/text_examples/sentiment_analysis/Positive%20vs.%20Negative%20Sentiment%20Classification.html

In [None]:
import transformers

In [None]:
# TODO Run a sentiment-analysis transformers pipeline
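
To compare the transformer's predictions with `y_test`, its output has to be turned into a single positive-class probability. As far as we know, a `transformers` text-classification pipeline returns a list of `{"label": ..., "score": ...}` dicts for each input (only the top label by default, all labels with `top_k=None`), and the default sentiment model uses the labels POSITIVE and NEGATIVE. The helper below assumes exactly that shape; its name and the example outputs are hypothetical.

```python
def positive_probability(scores):
    """Convert one review's pipeline output (a list of {"label", "score"} dicts)
    into P(POSITIVE), assuming binary POSITIVE/NEGATIVE labels."""
    by_label = {d["label"].upper(): d["score"] for d in scores}
    if "POSITIVE" in by_label:
        return by_label["POSITIVE"]
    # Only the NEGATIVE label was returned: infer P(POSITIVE) from it
    return 1.0 - by_label["NEGATIVE"]

# Hypothetical pipeline outputs in the assumed shape
top1 = [{"label": "NEGATIVE", "score": 0.93}]
all_scores = [{"label": "POSITIVE", "score": 0.2}, {"label": "NEGATIVE", "score": 0.8}]
print(positive_probability(top1), positive_probability(all_scores))
```

A review would then count as predicted POSITIVE when this probability exceeds 0.5.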

In [None]:
# TODO Calculate SHAP values using a shap.Explainer object
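
For text models, the "features" are tokens, and SHAP estimates each token's contribution by hiding subsets of tokens and observing how the model output changes. The sketch below computes exact Shapley values by enumerating every subset, which is only feasible for a handful of tokens; as far as we know, SHAP's text explainers use a hierarchical approximation instead. The lexicon scorer is a hypothetical stand-in for the transformer, and masked tokens are simply dropped here, whereas SHAP's text masker may substitute a mask token.

```python
from itertools import combinations
from math import factorial

# Hypothetical scorer: lexicon sum, with the sign flipped if "not" is present
LEXICON = {"great": 2.0, "boring": -1.5}

def score(tokens):
    sign = -1.0 if "not" in tokens else 1.0
    return sign * sum(LEXICON.get(t, 0.0) for t in tokens if t != "not")

def token_shapley(tokens, value_fn):
    """Exact Shapley values: weighted average marginal contribution of each
    token over all subsets of the other tokens (masked tokens are dropped)."""
    n = len(tokens)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                kept = list(subset)
                without_i = value_fn([tokens[j] for j in sorted(kept)])
                with_i = value_fn([tokens[j] for j in sorted(kept + [i])])
                phi[i] += weight * (with_i - without_i)
    return phi

tokens = ["not", "great", "film"]
phi = token_shapley(tokens, score)
print(phi)
```

Here "not" receives the whole negative contribution, "great" nets out to zero because its effect depends on whether "not" is present, and "film" gets zero because it never changes the score.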

In [None]:
# TODO Plot the SHAP values for a single movie review, e.g., using shap.plots.text(...) and the calculated SHAP values

# Practice Questions

1.   Why are LLMs sometimes bad at solving multi-step reasoning problems? In general, how can we improve the performance of LLM-based problem-solving approaches on multi-step reasoning problems?
2.   What are the advantages and disadvantages of using inherently transparent models? In which situations does it make sense to use them, and in which does it not?
3.   How can we make LLMs produce a chain of reasoning steps to solve a complex problem through prompting alone, i.e., without fine-tuning the model?
4.   How is Tree-of-Thoughts (ToT) different from Chain-of-Thought (CoT) and Input-Output/Standard Prompting (IO)? What kinds of problems can and cannot be solved with each technique?
5.   When facing a knowledge-related problem or question that is difficult to understand, which LLM prompting technique can help in such cases?
6.   When facing a clearly described problem or question where the correct answer is difficult to find, which LLM prompting technique can help in such cases?
7.   What is the plausibility-faithfulness dilemma in XAI, and how can LLM prompting techniques help to avoid or reduce it?