# Getting started

In this notebook, we'll walk through the basic functionality of AT2.
Applying AT2 requires learning to attribute a particular model's behavior using a dataset of training examples.
Here, we'll skip that training step and use an existing instance of AT2 trained to attribute the generations of [`microsoft/Phi-4-mini-instruct`](https://huggingface.co/microsoft/Phi-4-mini-instruct), a small but performant model.

In [5]:
import torch as ch
from pathlib import Path
from transformers import AutoTokenizer, AutoModelForCausalLM
from datasets import load_dataset

In [1]:
from at2.tasks import SimpleContextAttributionTask
from at2.utils import get_model_and_tokenizer
from at2 import AT2Trainer, AT2Attributor, AT2FeatureExtractor, AT2ScoreEstimator



We'll start by loading the model and its tokenizer.

In [7]:
model_name = "microsoft/Phi-4-mini-instruct"
dtype = ch.bfloat16
attn_implementation = "flash_attention_2"
model, tokenizer = get_model_and_tokenizer(model_name, dtype=dtype, attn_implementation=attn_implementation)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
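The `"flash_attention_2"` setting above relies on the optional `flash-attn` package. As a hedged sketch, the helper below (`pick_attn_implementation` is a hypothetical name, not part of AT2) falls back to `"sdpa"`, another value accepted by `transformers`, when that package is missing. This notebook was run with `"flash_attention_2"`; whether AT2 behaves identically under another backend is an assumption that is not confirmed here.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Return "flash_attention_2" if flash-attn is installed, else "sdpa"."""
    # find_spec returns None when the package is absent; nothing is imported here.
    has_flash_attn = importlib.util.find_spec("flash_attn") is not None
    return "flash_attention_2" if has_flash_attn else "sdpa"


attn_implementation = pick_attn_implementation()
print(attn_implementation)
```

The returned string can be passed as the `attn_implementation` argument in the cell above.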

Next, we'll create an "attribution task."
An attribution task consists of an input sequence, a generated sequence, a model/tokenizer and a set of sources to which we would like to attribute the generated sequence.
In this case, the input sequence is a news article from [CNN DailyMail](https://huggingface.co/datasets/abisee/cnn_dailymail) together with a request to summarize it; the generated sequence is the model's response; and the sources are the sentences of the news article.
By pinpointing the sentences from the news article that the model uses to generate a particular statement, we obtain a "citation" for this statement.
We've defined a class, `SimpleContextAttributionTask`, that quickly creates an attribution task from an example in the dataset.

In [12]:
dataset = load_dataset("cnn_dailymail", "3.0.0", split="validation")
example = dataset[0]
context = example["article"]
query = "Summarize the article in up to three sentences."

task = SimpleContextAttributionTask(
    context=context,
    query=query,
    model=model,
    tokenizer=tokenizer,
    source_type="sentence",
)
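To make the idea of sentence-level sources concrete, here is a minimal sketch of splitting a context into sentences with character spans. It uses a naive regular expression and a hypothetical helper, `split_into_sources`; it is not how `SimpleContextAttributionTask` partitions the context, and it will mis-split abbreviations such as "Dr.".

```python
import re


def split_into_sources(context: str) -> list[tuple[int, int, str]]:
    """Return (start, end, text) per sentence, with context[start:end] == text."""
    # Naive rule: a sentence runs from a non-space character to the first
    # ., ! or ? that is followed by whitespace or the end of the text.
    # Trailing text without terminal punctuation is dropped.
    pattern = re.compile(r"\S.*?[.!?](?=\s|$)", flags=re.DOTALL)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(context)]


context = (
    "That's called a paired donation. "
    "It works on a simple swapping principle. "
    "Was it hard?"
)
for start, end, text in split_into_sources(context):
    print((start, end), text)
```

Each source keeps its span, so a score assigned to a source can be mapped back to its exact location in the article.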

The first step is to generate a response for the context (a news article) and instruction (a request to summarize it).
The `AttributionTask` class, of which `task` is an instance, handles this for us.

In [13]:
print("### Context ###")
print(context[:500] + "..." if len(context) > 500 else context)
print()
print("### Instruction ###")
print(query)
print()
# Generates a response and caches relevant information for attribution
print("### Generated response ###")
print(task.generation)

### Context ###
(CNN)Share, and your gift will be multiplied. That may sound like an esoteric adage, but when Zully Broussard selflessly decided to give one of her kidneys to a stranger, her generosity paired up with big data. It resulted in six patients receiving transplants. That surprised and wowed her. "I thought I was going to help this one person who I don't know, but the fact that so many people can have a life extension, that's pretty big," Broussard told CNN affiliate KGO. She may feel guided in her ge...

### Instruction ###
Summarize the article in up to three sentences.

### Generated response ###
Zully Broussard's selfless act of donating a kidney to a stranger led to a chain reaction, resulting in six patients receiving transplants. The process, which involved a series of paired donations, was made possible by a computer program called MatchGrid, created by David Jacobs, which quickly matches up donor pairs or chains. The chain of surgeries, which involved five surgeons, a team of physician assistants, nurses, anesthesiologists, and more than 40 support staff, is set to be completed by Friday.

Next, we'll create an `AT2Attributor` (which uses attention weights to assign an attribution score to each source) that was trained to attribute the responses of `Phi-4-mini-instruct`.

In [15]:
attributor = AT2Attributor.from_hub(task, "madrylab/at2-phi-4-mini-instruct")
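As a rough illustration of scoring sources from attention weights, the toy sketch below averages each head's attention from the generated tokens onto a source's tokens and combines the heads with per-head coefficients. The function name, the nested-list layout, and the weighted-sum form are all assumptions made for illustration; this is not the `AT2Attributor` implementation.

```python
def attention_scores(attn, head_weights, source_spans):
    """Score each source as a weighted sum over heads of mean attention mass.

    attn[h][t][c] is the attention weight of head h from generated token t
    to context token c; source_spans holds (start, end) token ranges.
    """
    scores = []
    for start, end in source_spans:
        total = 0.0
        for h, weight in enumerate(head_weights):
            # Attention mass from every generated token onto this source.
            mass = sum(sum(row[start:end]) for row in attn[h])
            # Average over generated tokens, then apply the head coefficient.
            total += weight * mass / len(attn[h])
        scores.append(total)
    return scores


# Toy data: 2 heads, 2 generated tokens, 4 context tokens, 2 sources.
attn = [
    [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]],
    [[0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.5, 0.5]],
]
head_weights = [0.5, 1.0]  # hypothetical coefficients, one per head
source_spans = [(0, 2), (2, 4)]
print(attention_scores(attn, head_weights, source_spans))
```

In this toy setup, the second source scores higher because the more heavily weighted head attends only to its tokens.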

We'll be interested in attributing a particular sentence from the response.
The `task` object provides a utility that prints the response with the character indices of each sentence.

In [16]:
task.show_target_with_indices()

[(0, 139)]Zully Broussard's selfless act of donating a kidney to a stranger led to a chain reaction, resulting in six patients receiving transplants. [(140, 328)]The process, which involved a series of paired donations, was made possible by a computer program called MatchGrid, created by David Jacobs, which quickly matches up donor pairs or chains. [(329, 507)]The chain of surgeries, which involved five surgeons, a team of physician assistants, nurses, anesthesiologists, and more than 40 support staff, is set to be completed by Friday.


Let's attribute the second sentence!
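The index pairs for each sentence behave like Python character offsets into the response, with an exclusive end. Instead of reading them off by hand, a span can be located with plain string search; `find_span` below is a hypothetical helper, not part of AT2.

```python
def find_span(response: str, sentence: str) -> tuple[int, int]:
    """Return the (start, end) character span of sentence within response."""
    start = response.find(sentence)
    if start == -1:
        raise ValueError("sentence not found in response")
    return start, start + len(sentence)


# First two sentences of the generated response shown earlier.
response = (
    "Zully Broussard's selfless act of donating a kidney to a stranger led to "
    "a chain reaction, resulting in six patients receiving transplants. "
    "The process, which involved a series of paired donations, was made "
    "possible by a computer program called MatchGrid, created by David Jacobs, "
    "which quickly matches up donor pairs or chains."
)
target = response[response.index("The process"):]
start, end = find_span(response, target)
print(start, end)
```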

In [17]:
start, end = (140, 328)
attributor.show_attribution(start=start, end=end, verbose=True)

Computing attribution scores for:
 The process, which involved a series of paired donations, was made possible by a computer program called MatchGrid, created by David Jacobs, which quickly matches up donor pairs or chains.


| | Score | Source |
|---|---|---|
| 0 | 0.006 | Jacobs paid it forward with his programming skills, creating MatchGrid, a program that genetically matches up donor pairs or chains quickly. |
| 1 | 0.003 | That changed when a computer programmer named David Jacobs received a kidney transplant. |
| 2 | 0.001 | But the power that multiplied Broussard's gift was data processing of genetic profiles from donor-recipient pairs. |
| 3 | 0.001 | That's called a paired donation. |
| 4 | 0.001 | It works on a simple swapping principle but takes it to a much higher level, according to California Pacific Medical Center in San Francisco. |
| 5 | 0.001 | It's been done before, California Pacific Medical Center said in a statement, but matching up the people in the chain has been laborious and taken a long time. |
| 6 | 0.0 | That may sound like an esoteric adage, but when Zully Broussard selflessly decided to give one of her kidneys to a stranger, her generosity paired up with big data. |
| 7 | 0.0 | So high, that it is taking five surgeons, a covey of physician assistants, nurses and anesthesiologists, and more than 40 support staff to perform surgeries on 12 people. |


We can also directly access the attribution scores as follows.

In [20]:
scores = attributor.get_attribution_scores(start=start, end=end)
scores

Computing attribution scores for:
 The process, which involved a series of paired donations, was made possible by a computer program called MatchGrid, created by David Jacobs, which quickly matches up donor pairs or chains.


array([ 1.72615051e-04,  4.25338745e-04,  1.55448914e-04,  9.44137573e-05,
        1.04904175e-04,  1.67846680e-04,  8.05854797e-05,  3.07559967e-05,
        2.38418579e-05,  1.07288361e-04,  1.25122070e-03,  6.75201416e-04,
        3.92913818e-04,  1.11579895e-04,  2.51770020e-04,  1.94549561e-04,
        1.36375427e-04,  3.28063965e-04,  1.09672546e-04,  1.91688538e-04,
        1.58309937e-04,  6.24656677e-05,  2.10762024e-04,  1.48773193e-04,
        1.35421753e-04,  9.38415527e-04,  2.80380249e-04,  3.66210938e-04,
        2.47955322e-04,  1.43051147e-04,  2.46047974e-04,  1.93595886e-04,
        5.14984131e-04,  2.70080566e-03,  1.12056732e-04,  5.58471680e-03,
        2.18391418e-04,  1.90734863e-04,  2.53677368e-04,  1.35421753e-04,
        4.17232513e-05, -6.29425049e-05, -6.86645508e-05], dtype=float32)
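The raw array presumably holds one score per source, in source order. As a small sketch, the helper below (`top_k_sources`, a hypothetical name) ranks source indices by score. It works on a plain list, so a NumPy array like the one above can be passed directly or converted with `.tolist()` first.

```python
def top_k_sources(scores, k=3):
    """Return the k highest-scoring (index, score) pairs, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]


# Values copied from the array printed above.
scores = [
    1.72615051e-04, 4.25338745e-04, 1.55448914e-04, 9.44137573e-05,
    1.04904175e-04, 1.67846680e-04, 8.05854797e-05, 3.07559967e-05,
    2.38418579e-05, 1.07288361e-04, 1.25122070e-03, 6.75201416e-04,
    3.92913818e-04, 1.11579895e-04, 2.51770020e-04, 1.94549561e-04,
    1.36375427e-04, 3.28063965e-04, 1.09672546e-04, 1.91688538e-04,
    1.58309937e-04, 6.24656677e-05, 2.10762024e-04, 1.48773193e-04,
    1.35421753e-04, 9.38415527e-04, 2.80380249e-04, 3.66210938e-04,
    2.47955322e-04, 1.43051147e-04, 2.46047974e-04, 1.93595886e-04,
    5.14984131e-04, 2.70080566e-03, 1.12056732e-04, 5.58471680e-03,
    2.18391418e-04, 1.90734863e-04, 2.53677368e-04, 1.35421753e-04,
    4.17232513e-05, -6.29425049e-05, -6.86645508e-05,
]
for index, score in top_k_sources(scores, k=3):
    print(index, round(score, 3))
```

The two largest values round to 0.006 and 0.003, matching the top rows of the table shown by `show_attribution`.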