# Lookback Lens 🔎 🦙

Demo code for the paper __"Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps"__


> TL;DR: We designed a method that uses only attention-map information (the lookback ratio) to detect contextual hallucinations. The resulting detector can also guide decoding to reduce hallucination.

Paper: https://arxiv.org/abs/2407.07071  
Code: https://github.com/voidism/Lookback-Lens  
Twitter discussion: https://x.com/YungSungChuang/status/1811047314397724882  
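
To make the lookback ratio concrete, here is a minimal sketch for a single attention head at a single decoding step. It assumes the paper's definition: the mean attention weight on context tokens divided by the sum of that value and the mean attention weight on newly generated tokens. The function name, argument names, and toy weights are illustrative and are not taken from the repository.

```python
from typing import List


def lookback_ratio(attn_row: List[float], num_context: int) -> float:
    """Lookback ratio for one head at one decoding step.

    attn_row holds the head's attention weights over all previous tokens:
    the first num_context entries are context tokens, the rest are tokens
    generated so far. (Names are hypothetical, for illustration only.)
    """
    context, generated = attn_row[:num_context], attn_row[num_context:]
    a_context = sum(context) / len(context)
    # Before anything has been generated there is no attention on new tokens.
    a_new = sum(generated) / len(generated) if generated else 0.0
    return a_context / (a_context + a_new)


# Toy example: 4 context tokens and 2 generated tokens (made-up weights).
row = [0.1, 0.1, 0.1, 0.1, 0.3, 0.3]
print(round(lookback_ratio(row, num_context=4), 3))  # 0.25
```

Collecting this value for every head in every layer gives one feature vector per generated token, which is what the later steps consume.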

## Installation

In [2]:
!git clone https://github.com/voidism/Lookback-Lens.git

Cloning into 'Lookback-Lens'...
remote: Enumerating objects: 3809, done.
remote: Counting objects: 100% (3809/3809), done.
remote: Compressing objects: 100% (2766/2766), done.
remote: Total 3809 (delta 998), reused 3808 (delta 997), pack-reused 0
Receiving objects: 100% (3809/3809), 22.87 MiB | 16.46 MiB/s, done.
Resolving deltas: 100% (998/998), done.


In [4]:
!cd Lookback-Lens && pip install -r requirements.txt
!cd Lookback-Lens/transformers-4.32.0 && pip install -e .

Collecting torch==2.0.0 (from -r requirements.txt (line 1))
  Downloading torch-2.0.0-cp310-cp310-manylinux1_x86_64.whl.metadata (24 kB)
Collecting numpy==1.26.2 (from -r requirements.txt (line 2))
  Downloading numpy-1.26.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting tiktoken (from -r requirements.txt (line 4))
  Downloading tiktoken-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Collecting openai (from -r requirements.txt (line 5))
  Downloading openai-1.37.0-py3-none-any.whl.metadata (22 kB)
Collecting nvidia-cuda-nvrtc-cu11==11.7.99 (from torch==2.0.0->-r requirements.txt (line 1))
  Downloading nvidia_cuda_nvrtc_cu11-11.7.99-2-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu11==11.7.99 (from torch==2.0.0->-r requirements.txt (line 1))
  Downloading nvi

In [5]:
!cd Lookback-Lens && gzip -d data/nq-open-10_total_documents_gold_at_4.jsonl.gz
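
To peek at the decompressed data from Python, a small JSON Lines reader is enough. This is a sketch: `read_jsonl` is a hypothetical helper, not part of the repository, and it makes no assumption about which fields each record contains.

```python
import gzip
import json
from typing import Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line; handles .jsonl and .jsonl.gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


# Example usage (path taken from the cell above, relative to the notebook):
# first = next(read_jsonl("Lookback-Lens/data/nq-open-10_total_documents_gold_at_4.jsonl"))
# print(sorted(first.keys()))
```

Because the helper also opens `.gz` files directly, it can be used to inspect a file before deciding whether to decompress it.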

## Skip Steps 01 & 02 by downloading the precomputed lookback ratios & annotations

In [7]:
!cd Lookback-Lens && wget "https://www.dropbox.com/scl/fi/a87iv6xw9xma6ppc5pw2h/step1and2.tar.bz?rlkey=j382rsrwu2wnfwj7sn14ai3qw&dl=0" -O step1and2.tar.bz
!cd Lookback-Lens && tar -xvf step1and2.tar.bz
!cd Lookback-Lens && mv dump/* . && rm -rf dump

--2024-07-25 05:10:46--  https://www.dropbox.com/scl/fi/a87iv6xw9xma6ppc5pw2h/step1and2.tar.bz?rlkey=j382rsrwu2wnfwj7sn14ai3qw&dl=0
Resolving www.dropbox.com (www.dropbox.com)... 162.125.2.18, 2620:100:6017:18::a27d:212
Connecting to www.dropbox.com (www.dropbox.com)|162.125.2.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://uc529bea9f92dd78483d24ccf483.dl.dropboxusercontent.com/cd/0/inline/CXVHNAdY4q2riyVmNovhSFkuH8fwaZcOvCDsd-G-Zykazi7ruDOBsVQhXL2lBzyEw1ubi859ZestX43eCc5FuvGSimZh5KBmxJ1N9zdfmdg6TT7u3jKpVnRgNlorVZkUZNHXIeNv1fVBInz6Ng-zmT62/file# [following]
--2024-07-25 05:10:46--  https://uc529bea9f92dd78483d24ccf483.dl.dropboxusercontent.com/cd/0/inline/CXVHNAdY4q2riyVmNovhSFkuH8fwaZcOvCDsd-G-Zykazi7ruDOBsVQhXL2lBzyEw1ubi859ZestX43eCc5FuvGSimZh5KBmxJ1N9zdfmdg6TT7u3jKpVnRgNlorVZkUZNHXIeNv1fVBInz6Ng-zmT62/file
Resolving uc529bea9f92dd78483d24ccf483.dl.dropboxusercontent.com (uc529bea9f92dd78483d24ccf483.dl.dropboxusercontent.com)... 162.125.65.

## Step 03: Fitting Lookback Lens Classifiers (NQ and CNN/DM)


> **Please replace `<hf_auth_token>` with your Hugging Face auth token for Llama 2 access.**



### Predefined Span

In [12]:
!cd Lookback-Lens && python step03_lookback_lens.py --auth_token <hf_auth_token> --anno_1 anno-nq-7b.jsonl --anno_2 anno-cnndm-7b.jsonl --lookback_ratio_1 lookback-ratio-nq-7b.pt --lookback_ratio_2 lookback-ratio-cnndm-7b.pt

tokenizer_config.json: 100% 1.62k/1.62k [00:00<00:00, 6.45MB/s]
tokenizer.model: 100% 500k/500k [00:00<00:00, 15.0MB/s]
tokenizer.json: 100% 1.84M/1.84M [00:00<00:00, 13.5MB/s]
special_tokens_map.json: 100% 414/414 [00:00<00:00, 2.61MB/s]
2024-07-25 05:22:33.816180: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-25 05:22:33.816260: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-25 05:22:33.992664: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
100% 3850/3850 [00:00<00:00, 4629.25it/s]
A trivial baseline: if higher lookback ratio means less hallucination.
Baseline AURO
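
The trivial baseline in the log scores each example by its lookback ratio alone and reports an AUROC. The sketch below shows what that number measures, using the pairwise-ranking definition of AUROC. It assumes label 1 means "faithful to the context"; the data are made up, and the script's own label convention may differ.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up values: mean lookback ratio per span; label 1 = faithful (assumption).
mean_ratio = [0.8, 0.6, 0.7, 0.3, 0.5]
faithful = [1, 1, 0, 0, 1]
print(round(auroc(mean_ratio, faithful), 3))  # 0.667 (4 of 6 pairs ranked correctly)
```

An AUROC near 0.5 means the raw score carries little signal on its own. The pairwise loop is quadratic, so for real data a library routine such as `sklearn.metrics.roc_auc_score` is the practical choice.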

### Sliding Window (=8)

In [13]:
!cd Lookback-Lens && python step03_lookback_lens.py --auth_token <hf_auth_token> --anno_1 anno-nq-7b.jsonl --anno_2 anno-cnndm-7b.jsonl --lookback_ratio_1 lookback-ratio-nq-7b.pt --lookback_ratio_2 lookback-ratio-cnndm-7b.pt --sliding_window 8

2024-07-25 05:32:34.530184: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-25 05:32:34.530253: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-25 05:32:34.539030: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
100% 74256/74256 [00:09<00:00, 7759.19it/s]
A trivial baseline: if higher lookback ratio means less hallucination.
Baseline AUROC: 0.477733716
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentatio
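
The sliding-window setting turns each response into many fixed-size windows instead of a few predefined spans, which is why the example count in the progress bar is much larger than in the predefined-span run. A rough sketch of the feature construction follows. It assumes a stride of 1 and simple averaging inside each window; both are assumptions, and the function name is hypothetical.

```python
from typing import List


def sliding_window_means(
    token_feats: List[List[float]], window: int = 8
) -> List[List[float]]:
    """Average per-token feature vectors over every full window (stride 1)."""
    out = []
    for start in range(len(token_feats) - window + 1):
        chunk = token_feats[start:start + window]
        # Column-wise mean: one value per feature (per attention head in practice).
        out.append([sum(col) / window for col in zip(*chunk)])
    return out


# Toy input: 4 tokens with 2 features each (made-up numbers).
feats = [[0.0, 1.0], [0.2, 0.8], [0.4, 0.6], [0.6, 0.4]]
for row in sliding_window_means(feats, window=2):
    print([round(v, 2) for v in row])
```

Each averaged vector becomes one training example for the classifier, and a response shorter than the window produces no examples under this sketch.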

## Step 04: Run Greedy vs Classifier Guided Decoding (NQ and XSum)

> **Please replace `<hf_auth_token>` with your Hugging Face auth token for Llama 2 access.**

Warning: Colab Pro is required to run decoding, because LLaMA inference needs a lot of RAM. Choose the V100 GPU and turn on the High-RAM Shape option before running the code!

Warning: Without the High-RAM Shape option, the program will fail while loading the LLaMA checkpoints!



### Greedy (NQ)

In [None]:
! cd Lookback-Lens && python step04_run_decoding.py --auth_token <hf_auth_token> --model-name meta-llama/Llama-2-7b-chat-hf --data-path data/nq-open-10_total_documents_gold_at_4.jsonl --output-path output-nq-open-greedy-decoding.jsonl --num-gpus 1

### Lookback Lens Guided Decoding (NQ)

In [None]:
! cd Lookback-Lens && python step04_run_decoding.py --auth_token <hf_auth_token> --model_name meta-llama/Llama-2-7b-chat-hf --data_path data/nq-open-10_total_documents_gold_at_4.jsonl --output_path output-nq-open-lookback-decoding.jsonl --num_gpus 1 --do_sample --guiding_classifier classifiers/classifier_anno-cnndm-7b_sliding_window_8.pkl --chunk_size 8 --num_candidates 8
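
The `--chunk_size` and `--num_candidates` flags suggest the following search loop: sample several candidate chunks, score each with the classifier on its lookback-ratio features, keep the best one, and repeat. The sketch below illustrates that idea only. `sample_chunk` and `score` are hypothetical callables standing in for the model and the classifier, and the repository's implementation may differ in detail.

```python
import random
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[List[str], List[float]]  # (chunk tokens, chunk features)


def guided_decode(
    sample_chunk: Callable[[List[str]], Candidate],  # hypothetical sampler
    score: Callable[[Sequence[float]], float],       # hypothetical classifier score
    num_chunks: int,
    num_candidates: int = 8,
) -> List[str]:
    """At every step, keep the candidate chunk with the highest score."""
    output: List[str] = []
    for _ in range(num_chunks):
        candidates = [sample_chunk(output) for _ in range(num_candidates)]
        best_tokens, _ = max(candidates, key=lambda c: score(c[1]))
        output.extend(best_tokens)
    return output


# Toy stand-ins: each "chunk" is one placeholder token with a random feature.
rng = random.Random(0)


def toy_sampler(prefix: List[str]) -> Candidate:
    ratio = rng.random()
    return [f"<chunk ratio={ratio:.2f}>"], [ratio]


def toy_score(features: Sequence[float]) -> float:
    return sum(features) / len(features)


print(guided_decode(toy_sampler, toy_score, num_chunks=2, num_candidates=4))
```

Compared with greedy decoding, this costs roughly `num_candidates` times more sampling per chunk, which is the trade-off for steering generation toward chunks the classifier rates as faithful.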

### Greedy (XSum)

In [None]:
! cd Lookback-Lens && python step04_run_decoding.py --auth_token <hf_auth_token> --model_name meta-llama/Llama-2-7b-chat-hf --data_path data/xsum-1000.jsonl --output_path output-xsum-greedy-decoding.jsonl --num_gpus 1

### Lookback Lens Guided Decoding (XSum)

In [None]:
! cd Lookback-Lens && python step04_run_decoding.py --auth_token <hf_auth_token> --model_name meta-llama/Llama-2-7b-chat-hf --data_path data/xsum-1000.jsonl --output_path output-xsum-lookback-decoding.jsonl --num_gpus 1 --do_sample --guiding_classifier classifiers/classifier_anno-cnndm-7b_sliding_window_8.pkl --chunk_size 8 --num_candidates 8

## Step 05: Run Evaluation

### Run Exact Match Evaluation (NQ)

In [None]:
!cd Lookback-Lens && python eval_exact_match.py --hyp output-nq-open-greedy-decoding.jsonl --ref data/nq-open-10_total_documents_gold_at_4.jsonl
!cd Lookback-Lens && python eval_exact_match.py --hyp output-nq-open-lookback-decoding.jsonl --ref data/nq-open-10_total_documents_gold_at_4.jsonl
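
For readers unfamiliar with exact-match scoring on open-domain QA, the sketch below shows one common variant: normalize both strings, then check whether any gold answer appears inside the prediction. This is an assumption about the general technique, not a description of what `eval_exact_match.py` does; the normalization rules and function names here are illustrative.

```python
import re
import string
from typing import Iterable


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def subspan_match(prediction: str, gold_answers: Iterable[str]) -> bool:
    """True if any normalized gold answer occurs inside the normalized prediction."""
    pred = normalize(prediction)
    return any(normalize(g) in pred for g in gold_answers)


print(subspan_match("The answer is Paris, France.", ["Paris"]))  # True
print(subspan_match("It is probably Lyon.", ["Paris"]))          # False
```

Averaging this boolean over all examples gives an accuracy-style score that can be compared between the greedy and guided outputs.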

### Run GPT-4o Evaluation (XSum)

> Please set your OpenAI API key via `OPENAI_API_KEY={your_key}`.

In [None]:
!cd Lookback-Lens && OPENAI_API_KEY={your_key} python step02_eval_gpt4o.py --hyp output-xsum-greedy-decoding.jsonl --ref data/xsum-1000.jsonl --out record-gpt4o-eval-xsum-greedy-decoding.jsonl
!cd Lookback-Lens && OPENAI_API_KEY={your_key} python step02_eval_gpt4o.py --hyp output-xsum-lookback-decoding.jsonl --ref data/xsum-1000.jsonl --out record-gpt4o-eval-xsum-lookback-decoding.jsonl