# Captum: Model Interpretability for PyTorch

Captum is a model interpretability library for PyTorch that provides insights into how models make predictions. It offers:

- Attribution techniques to identify input feature importance
- Tools for understanding model behavior and decision-making
- Methods to analyze neural network internals
- Visualization capabilities for model explanations
- Support for both vision and text models

```python
import torch
import numpy as np
from captum.attr import ShapleyValueSampling, LLMAttribution, TextTemplateInput, ProductBaselines
from captum.attr import IntegratedGradients
from transformers import AutoModelForCausalLM, AutoTokenizer
```


# Model and Tokenizer
We use the DistilGPT2 model and tokenizer from Hugging Face because the model:
- is reasonably small (about 82M parameters, roughly 350 MB of weights)
- runs relatively fast, even on a CPU
- fun fact: shows more pronounced biases (e.g. gender) than GPT-2, the model it was distilled from

```python
# distilgpt2 is hosted under the "distilbert" organization on the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")
```

Wrap the model in two attribution techniques: Shapley Value Sampling (perturbation-based) and Integrated Gradients (gradient-based).

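```python
# Perturbation-based: only needs forward passes, so it works on integer token IDs
svs = ShapleyValueSampling(model)
# Gradient-based: needs float inputs; for token IDs, attribute at the embedding layer instead
ingrad = IntegratedGradients(model)
```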
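Shapley Value Sampling estimates each feature's Shapley value by averaging the feature's marginal contributions over randomly sampled feature orderings. The NumPy sketch below is an illustrative re-implementation of that idea, not Captum's code. The helper `shapley_value_sampling`, the toy function `f` and the zero baseline are hypothetical. For each sampled ordering, the sketch starts from the baseline and switches features to their input values one at a time. The change in output at each step is credited to the feature that was just switched.

```python
import numpy as np

def shapley_value_sampling(f, x, baseline, n_samples=200, seed=0):
    """Monte Carlo Shapley estimate (hypothetical helper, for illustration only).

    f        : callable mapping a 1-D feature vector to a scalar
    x        : input to explain
    baseline : reference input representing "feature absent"
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    phi = np.zeros(n)
    for _ in range(n_samples):
        order = rng.permutation(n)                 # random feature ordering
        current = np.array(baseline, dtype=float)  # start from a copy of the baseline
        prev = f(current)
        for i in order:
            current[i] = x[i]       # switch feature i from baseline to input value
            out = f(current)
            phi[i] += out - prev    # marginal contribution of feature i in this ordering
            prev = out
    return phi / n_samples          # average over sampled orderings

# Toy model with an interaction term between features 0 and 1
def f(v):
    return 2.0 * v[0] + v[1] + 3.0 * v[0] * v[1] - v[2]

x = np.array([1.0, 1.0, 2.0])
baseline = np.zeros(3)
phi = shapley_value_sampling(f, x, baseline)
print(phi)
print(phi.sum(), f(x) - f(baseline))  # efficiency: the two values should match
```

The contributions in each ordering telescope, so the estimates always sum exactly to `f(x) - f(baseline)`. Only the split of the interaction term between features 0 and 1 depends on the sampled orderings, and it approaches an even split as `n_samples` grows. The cost is `n_samples * n` forward calls, which is why perturbation methods become expensive for long inputs. Captum's `ShapleyValueSampling` applies the same permutation-averaging idea to tensors and can group inputs into features through a `feature_mask`.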