In [9]:

!pip install -q faiss-cpu sentence-transformers transformers pypdf pdfplumber


In [10]:

import os
import faiss
import pdfplumber
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import pipeline, AutoTokenizer
from google.colab import files


In [11]:

uploaded = files.upload()
file_path = list(uploaded.keys())[0]


Saving Pay_Attention_to_What_Matters.pdf to Pay_Attention_to_What_Matters.pdf


In [12]:

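def extract_text_from_pdf(file_path):
    """Return the text of all pages, joined with spaces."""
    with pdfplumber.open(file_path) as pdf:
        # Extract each page once; skip pages that yield no text (e.g. scanned images).
        page_texts = [page.extract_text() for page in pdf.pages]
    return " ".join(text for text in page_texts if text)

document_text = extract_text_from_pdf(file_path)
print(document_text[:1000])  # Preview the first 1,000 characters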




Pay Attention to What Matters
PedroLuizSilva1,2 AntoniodeDomenico1 AliMaatouk3 FadhelAyed1
1HuaweiTechnologiesParis,France
2ÉcolePolytechnique,Palaiseau,France
3YaleUniversity,NewHaven,CT,USA
Abstract
Despite the remarkable success of Large Language Models (LLMs), they still
exhibitalimitedcapabilitytoaligntheiroutputstotheuserinstructions. Inthis
work,weintroduceasimpleandeffectivemethod,whichwenameGUIDE,that
mechanisticallyincreasesattentionscoresininstructiontokens. Tosupportthis
operation, we present Influence, a novel metric that highlights how the user’s
instructionspropagatethroughthetransformerlayersandimpacttheLLMoutput.
Our results show that GUIDE improves the accuracy of following instructions
29.4% to 60.4%, outperforming natural prompting alternatives and Supervised
Fine-Tuningupto1Mtokens.1
1 Introduction
LargeLanguageModels(LLMs)arecurrentlythestate-of-the-artofmostNLPtasks. Despitethis
success,pretrainedLLMssometimesstruggletoaccuratelyinterpretdiverseusers’instructions
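
The preview above shows many words fused together (for example `exhibitalimitedcapabilitytoaligntheiroutputstotheuserinstructions`), which hurts both embedding quality and tokenization downstream. A rough way to quantify this is to measure how many whitespace-separated tokens are implausibly long. The helper name `long_token_ratio` and the 25-character threshold are assumptions chosen for illustration, not part of the original notebook.

```python
def long_token_ratio(text, max_len=25):
    """Fraction of whitespace-separated tokens longer than max_len characters.

    A high ratio suggests the PDF extractor dropped spaces between words.
    The default threshold of 25 is an illustrative assumption.
    """
    tokens = text.split()
    if not tokens:
        return 0.0
    long_tokens = sum(1 for token in tokens if len(token) > max_len)
    return long_tokens / len(tokens)
```

If `long_token_ratio(document_text)` is high, one option worth trying is a smaller `x_tolerance` in pdfplumber's `page.extract_text(...)`, which controls how close characters must be to count as the same word. Whether that helps depends on the PDF.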

In [13]:

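def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into word-based chunks of chunk_size words, sharing overlap words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # each chunk starts this many words after the previous one
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]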

chunks = chunk_text(document_text)
print(f"Total Chunks: {len(chunks)}")


Total Chunks: 7
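
The chunk count follows directly from the window step: each chunk starts `chunk_size - overlap` words after the previous one, so a text of `n` words yields `ceil(n / step)` chunks. A minimal sketch of that arithmetic (the helper name `expected_chunk_count` is hypothetical):

```python
import math

def expected_chunk_count(n_words, chunk_size=500, overlap=50):
    """Number of chunks a sliding word window produces.

    Mirrors range(0, n_words, chunk_size - overlap) in chunk_text.
    """
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    return math.ceil(n_words / step)
```

With the defaults (step of 450 words), 7 chunks means the extracted text contains between 2,701 and 3,150 whitespace-separated tokens. Because the extractor fused many words, the true word count of the paper is likely higher.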


In [14]:

model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(chunks, convert_to_numpy=True)

dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(embeddings)
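
`faiss.IndexFlatL2` performs an exact, brute-force search and reports squared Euclidean distances. For a handful of chunks, the same computation can be sketched in plain NumPy, which makes it easier to see what `D` and `I` contain. The function name `flat_l2_search` is an assumption for illustration; it is not a FAISS API.

```python
import numpy as np

def flat_l2_search(vectors, queries, k):
    """Brute-force nearest-neighbour search using squared L2 distance.

    Returns (distances, indices), each of shape (n_queries, k),
    sorted from nearest to farthest.
    """
    vectors = np.asarray(vectors, dtype=np.float32)
    queries = np.asarray(queries, dtype=np.float32)
    # Pairwise differences: (n_queries, n_vectors, dim)
    diff = queries[:, None, :] - vectors[None, :, :]
    dists = np.sum(diff ** 2, axis=-1)
    # Stable sort so ties keep their original order
    order = np.argsort(dists, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(dists, order, axis=1), order
```

This materializes every pairwise difference in memory, so it is only suitable for small collections such as the 7 chunks here.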


In [15]:

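query = "Summarize this document"
query_embedding = model.encode([query], convert_to_numpy=True)
top_k = min(5, len(chunks))  # never request more neighbours than there are chunks
# D holds squared L2 distances, I the matching chunk indices (best match first)
D, I = index.search(query_embedding, top_k)
retrieved_chunks = [chunks[i] for i in I[0]]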

for i, chunk in enumerate(retrieved_chunks):
    print(f"--- Retrieved Chunk {i+1} ---\n{chunk}\n")


--- Retrieved Chunk 1 ---
‘Important’ + uppercase ∆=0.5 0.8 ∆=1.0 ∆=2.0 0.7 0.6 0.5 0.4 0.3 0.5 1.0 1.5 2.0 2.5 Number of Tokens ×106 (a)ProbabilityofoutputtingasummaryinFrench. )hcnerF ni tuptuo( Finetuning ∆=1 ∆=2 (b) Performance of SFT over number of tokens (in millions) usedduringtraining. Figure4: Summarizationresults: (a)GUIDEoutperformspromptengineeringtechniqueslikeusing uppercase text, and (b) GUIDE demonstrates greater accuracy than SFT up to 1 million training tokens. Needle in a haystack Figure 5 shows the probability of outputting the correct phrase over the contextlengthandthepositionoftheneedle,respectively. TheMistralmodeldemonstratesstable performanceacrossvaryingcontextlengthsandneedlepositionswithinthiswindow. Asexpected,theadditionof∆totheneedletokensconsistentlyenhancesperformancefrom87.0% to92.1%,withoptimalvaluesof∆around1. Wecanalsonotethat,onaverage,theLLMismore effectiveatretrievinginformationwhenitislocatedatthebeginningortheendofthetext. Thisisin accordancew
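
The search returns chunks ranked by distance to the query, not by their position in the paper; the top hit here comes from the results section rather than the abstract. For a summarization prompt it may read better to restore document order before joining. A minimal sketch, with `in_document_order` as a hypothetical helper:

```python
def in_document_order(chunks, indices):
    """Return the retrieved chunks sorted by their position in the document.

    Drops duplicates and out-of-range indices (FAISS uses -1 when it
    finds fewer than k neighbours).
    """
    valid = sorted({int(i) for i in indices if 0 <= int(i) < len(chunks)})
    return [chunks[i] for i in valid]
```

In this notebook it could be called as `in_document_order(chunks, I[0])` in place of the list comprehension that builds `retrieved_chunks`.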

In [17]:

# Load the summarization pipeline and its matching tokenizer
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")

def summarize(text, max_input_tokens=1024, max_output_tokens=500, min_output_tokens=60):
    # Truncate to the model's input limit; anything past max_input_tokens is dropped.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_input_tokens)
    truncated_text = tokenizer.decode(inputs["input_ids"][0], skip_special_tokens=True)

    summary = summarizer(
        truncated_text,
        truncation=True,  # guard in case re-tokenizing the decoded text exceeds the limit
        max_length=max_output_tokens,
        min_length=min_output_tokens,
        do_sample=False,  # no sampling, so the output is reproducible
    )[0]['summary_text']

    return summary

joined_text = " ".join(retrieved_chunks)
summary = summarize(joined_text)
print("=== Summary ===\n", summary)


Device set to use cpu


=== Summary ===
  GUIDE demonstrates greater accuracy than SFT up to 1 million training tokens. TheMistralmodeldemonstratesstable performanceacrossvaryingcontextlengthsandneedlepositionswithin this window. Theaddition of∆totheneedletokensconsistentlyenhancesperformancefrom87.0% to92.1%.
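
Because `summarize` truncates its input to 1,024 tokens, joining five chunks of up to 500 words each means most of the retrieved text never reaches the model, which is why the summary above only reflects the first retrieved chunk. One common workaround is to summarize each chunk separately and then summarize the combined partial summaries. The sketch below assumes a caller-supplied `summarize_fn` (for instance the notebook's `summarize`); the name `summarize_in_stages` is hypothetical.

```python
from typing import Callable, Sequence

def summarize_in_stages(chunks: Sequence[str], summarize_fn: Callable[[str], str]) -> str:
    """Two-stage summarization: summarize each chunk, then the joined partials."""
    # Stage 1: one partial summary per non-empty chunk
    partials = [summarize_fn(chunk) for chunk in chunks if chunk.strip()]
    if not partials:
        return ""
    if len(partials) == 1:
        return partials[0]
    # Stage 2: condense the partial summaries into one final summary
    return summarize_fn(" ".join(partials))
```

A possible call is `summarize_in_stages(retrieved_chunks, summarize)`. This costs one model call per chunk plus one more, and the `min_output_tokens=60` default may need lowering if the partial summaries are short.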
