# Multi-document Context Attribution

In this notebook, we'll walk through applying AT2 to a multi-document context attribution setting.
This involves extending the `ContextAttributionTask` class and providing the relevant document structure for attribution.

In [1]:
import torch as ch
from datasets import load_dataset

In [2]:
from at2.tasks import ContextAttributionTask
from at2.utils import get_model_and_tokenizer
from at2 import AT2Attributor, AT2ScoreEstimator



We'll start by loading a model, its tokenizer, and an existing AT2 score estimator trained for it.

In [3]:
model_name = "microsoft/Phi-4-mini-instruct"
dtype = ch.bfloat16
model, tokenizer = get_model_and_tokenizer(model_name, dtype=dtype)
score_estimator = AT2ScoreEstimator.from_hub("madrylab/at2-phi-4-mini-instruct")


Next, we'll load the dataset.
We'll be working with [Hotpot QA](https://arxiv.org/abs/1809.09600), a multi-hop question answering dataset in which answering a question requires combining information from multiple documents.

In [4]:
dataset = load_dataset("hotpot_qa", "distractor", split="validation", trust_remote_code=True)

We now extend the `ContextAttributionTask` to support this task.
Specifically, we define a `get_prompt_and_document_ranges` method that formats the documents into a prompt and records the character range each document occupies in that prompt.
Each document can be treated as a single source, or it can be split further into sources automatically (by setting the `source_type` parameter to something other than `"document"`).

In [5]:
class HotpotQAAttributionTask(ContextAttributionTask):
    def __init__(self, example, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.example = example
        self._prompt, self._document_ranges = self.get_prompt_and_document_ranges()

    @property
    def query(self):
        query = self.example["question"]
        return query

    def get_prompt_and_document_ranges(self):
        individual_sentences = self.example["context"]["sentences"]
        prompt = "Passages:\n\n"
        document_ranges = []
        for sentences in individual_sentences:
            passage = " ".join(sentences)
            document_ranges.append((len(prompt), len(prompt) + len(passage)))
            prompt += passage + "\n\n"
        prompt = f"{prompt}Query: {self.query}"
        return prompt, document_ranges

    @property
    def prompt(self):
        return self._prompt

    def _get_document_ranges(self):
        return self._document_ranges
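
To make the range bookkeeping concrete, here is a minimal standalone sketch of the same logic on two made-up passages rather than a Hotpot QA example. `build_prompt` is a hypothetical helper written for this illustration and is not part of `at2`; it only checks that slicing the prompt with each recorded range returns the original passage.

```python
def build_prompt(passages, query):
    """Concatenate passages into a prompt and record each passage's (start, end) character range."""
    prompt = "Passages:\n\n"
    ranges = []
    for passage in passages:
        start = len(prompt)  # the passage begins where the prompt currently ends
        ranges.append((start, start + len(passage)))
        prompt += passage + "\n\n"
    return f"{prompt}Query: {query}", ranges


# Toy passages, invented for illustration only.
passages = [
    "Paris is the capital of France. It lies on the Seine.",
    "Berlin is the capital of Germany.",
]
prompt, ranges = build_prompt(passages, "Which river runs through the capital of France?")

# Slicing the prompt with each range should give back the original passage.
recovered = [prompt[start:end] for start, end in ranges]
print(recovered == passages)
```

Because the ranges are computed before the separator is appended, they exclude the blank lines between passages.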

We're now ready to create a task and perform attribution!

In [6]:
task = HotpotQAAttributionTask(dataset[42], model, tokenizer, source_type="sentence")
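
With `source_type="sentence"`, each document range is split further into sentence-level sources. The sketch below illustrates the idea of turning one document range into sentence ranges. It uses a naive regular expression as a stand-in and is not the splitter `at2` uses; `split_into_sentence_ranges` is a hypothetical helper, and the text is a toy example.

```python
import re


def split_into_sentence_ranges(text, doc_range):
    """Return (start, end) character ranges within `text` for each sentence of one document."""
    doc_start, doc_end = doc_range
    document = text[doc_start:doc_end]
    ranges = []
    # Naive rule (an assumption): a sentence runs up to the next '.', '!' or '?'.
    for match in re.finditer(r"[^.!?]+[.!?]*", document):
        sentence = match.group()
        stripped = sentence.strip()
        if not stripped:
            continue
        # Skip leading whitespace so the range starts at the first real character.
        lead = len(sentence) - len(sentence.lstrip())
        start = doc_start + match.start() + lead
        ranges.append((start, start + len(stripped)))
    return ranges


# Toy prompt containing a single document, invented for illustration only.
document = "Paris is the capital of France. It lies on the Seine."
text = f"Passages:\n\n{document}\n\nQuery: Which river runs through Paris?"
doc_start = len("Passages:\n\n")
sentence_ranges = split_into_sentence_ranges(text, (doc_start, doc_start + len(document)))
sentences = [text[start:end] for start, end in sentence_ranges]
print(sentences)
```

A rule this simple breaks on abbreviations and decimal numbers, which is one reason to rely on a proper sentence tokenizer in practice.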

In [7]:
print(task.input_text)

<|user|>Passages:

Manga Bible (新約聖書 , Shinyaku Seisho ) is a five-volume manga series based on the Christian Bible created under the direction of the non-profit organization Next, a group formed by people from the manga industry.  Though first published in English, the books are originally written in Japanese and each volume is illustrated by a Japanese manga artist.  Each book is adapted from the Bible by Hidenori Kumai.  The first two books were illustrated by manga artist Kozumi Shinozawa, while the remaining three will be illustrated by a different artist.  The first book in the series, "Manga Messiah" was published in 2006 and covered the four gospels of the Bible: Matthew, Mark, Luke, and John.  "Manga Metamorphosis" (2008) covers the events in Acts and several of Paul's letters.  "Manga Mutiny" (2008, 2009) begins in Genesis and ends in Exodus.  "Manga Melech" (2010) picks up where "Manga Mutiny" left off and continues into the reign of David.  The fifth, and currently final bo

In [8]:
print(task.generation)

The Japanese manga series "I"s (アイズ, Aizu) is written and illustrated by Masakazu Katsura, who was born in 1962.


In [9]:
attributor = AT2Attributor(task, score_estimator)
attributor.show_attribution(verbose=True)

| | Score | Source |
|---|---|---|
| 0 | 0.007 | I"s (アイズ , Aizu ) is a Japanese manga series written and illustrated by Masakazu Katsura. |
| 1 | 0.003 | Masakazu Katsura (桂 正和 , Katsura Masakazu , born December 10, 1962) is a Japanese manga artist, known for several works of manga, including "Wing-man", "Shadow Lady", "DNA²", "Video Girl Ai", "I"s", and "Zetman". |
| 2 | 0.0 | Manga Bible (新約聖書 , Shinyaku Seisho ) is a five-volume manga series based on the Christian Bible created under the direction of the non-profit organization Next, a group formed by people from the manga industry. |
| 3 | 0.0 | Silver Spoon (Japanese: 銀の匙 , Hepburn: Gin no Saji ) is a Japanese manga series written and illustrated by Hiromu Arakawa, set in the fictional Ooezo Agricultural High School in Hokkaido. |
| 4 | 0.0 | Things become even more complicated when Itsuki Akiba returns to Japan; she is a girl Ichitaka was friends with in their childhood before she moved to the United States, and who had a huge crush on him. |
| 5 | 0.0 | The story's main character is 16-year-old high school student Ichitaka Seto who is in love with his classmate Iori Yoshizuki, but too shy to tell her. |
| 6 | 0.0 | The first two books were illustrated by manga artist Kozumi Shinozawa, while the remaining three will be illustrated by a different artist. |
| 7 | 0.0 | Though first published in English, the books are originally written in Japanese and each volume is illustrated by a Japanese manga artist. |
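
One simple way to use these scores downstream is to keep only the sources whose score exceeds a cutoff. The sketch below works on the scores copied from the output above, indexed by row. The `0.001` threshold and the `select_sources` helper are illustrative assumptions and are not part of `at2`.

```python
def select_sources(scores, threshold):
    """Return indices of sources scoring strictly above `threshold`, highest score first."""
    kept = [(index, score) for index, score in enumerate(scores) if score > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [index for index, _ in kept]


# Scores for rows 0 through 7, copied from the attribution output above.
scores = [0.007, 0.003, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
selected = select_sources(scores, threshold=0.001)
print(selected)  # [0, 1]
```

Here the two selected rows are the sentence naming the author of "I"s" and the sentence giving his birth year, which together support the generated answer.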
