This repository contains the code and data for the paper Unsupervised Token-level Hallucination Detection from Summary Generation By-products by Andreas Marfurt and James Henderson, presented at the GEM workshop at EMNLP 2022.
Our method BART-GBP gives token-level hallucination probabilities for summaries generated by BART. We use the facebook/bart-large-cnn model made available by Hugging Face on their model hub. We first align the summary and source document with the help of BART's cross-attention, then classify aligned tokens for intrinsic hallucination and unaligned tokens for extrinsic hallucination. Our method was evaluated on CNN/DailyMail, but we expect it to perform similarly on other comparably extractive summarization datasets.
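The alignment step can be illustrated with a minimal sketch: align each summary token to the source token that receives the most cross-attention, and treat tokens whose maximum attention falls below a threshold as unaligned. The threshold value and the exact procedure here are illustrative assumptions, not the repository's actual implementation; see the paper for details.

```python
import numpy as np

def align_tokens(cross_attn, threshold=0.1):
    """Toy alignment sketch (threshold is a hypothetical value).

    cross_attn: (summary_len, source_len) array of averaged
    cross-attention weights. Returns, for each summary token, the index
    of the aligned source token, or None if it is unaligned.
    """
    cross_attn = np.asarray(cross_attn)
    best = cross_attn.argmax(axis=1)   # strongest-attended source token
    maxes = cross_attn.max(axis=1)     # strength of that attention
    return [int(b) if m >= threshold else None
            for b, m in zip(best, maxes)]
```

Aligned tokens would then be checked for intrinsic hallucination, unaligned ones for extrinsic hallucination.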
We provide the following data:
- data/frank_annotations.jsonl: Token-level hallucination annotations of 250 CNN/DM summaries with 15700 words, of which 57 (0.4%) are hallucinations (31 intrinsic, 26 extrinsic).
- data/tlhd-cnndm_annotations.jsonl: Token-level hallucination annotations of 150 CNN/DM summaries, one selected sentence per summary, 2100 words with 299 (14.2%) hallucinations (51 intrinsic, 248 extrinsic).
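The annotation files are in JSON Lines format, one record per line. A minimal loader might look like the following; the record field names are not shown here, so inspect the files for the actual schema.

```python
import json

def load_annotations(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```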
First, install conda, e.g. from Miniconda. Then create and activate the environment:
conda env create -f environment.yml
conda activate hallucination-detection
To reproduce the results of BART-GBP on the FRANK dataset, run the following steps:
- Get BART's outputs (attentions and decoding entropies): Get BART Outputs
- Compute the scores (association strength, fraction unaligned, inverse decoding entropy): Compute BART-GBP Scores
- Evaluate the scores by computing the points on the precision/recall curve: Evaluate Scores
- Plot the results: Plot Results
For the TLHD-CNNDM dataset, adjust the paths for the outputs, scores, predictions, and results accordingly.
First we need to save BART's outputs from summary generation.
python save_bart_outputs_for_alignment.py --cross_attention_layers 9 10 --encoder_layers 9 10
Since we use beam search decoding, we have to select the cross-attentions, encoder self-attentions and decoding probabilities of the eventually selected beam. This is taken care of in the BartSummarizer model by the GenerationMixinEncoderDecoder mixin.
As this is a research project, we store these outputs so that we can run multiple experiments on them. In production, one would integrate the subsequent steps into summary generation to compute hallucination probabilities in an online fashion.
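The beam-selection idea can be sketched with toy tensors: beam search produces attention tensors for every beam at every decoding step, and only the slices belonging to the eventually selected beam are kept. Shapes and names below are illustrative, not the repository's actual API.

```python
import numpy as np

def select_beam_outputs(step_attentions, beam_trace):
    """Keep only the selected beam's attention slices.

    step_attentions: list over decoding steps of arrays with shape
    (num_beams, source_len); beam_trace: the index of the selected beam
    at each step. Returns an array of shape (num_steps, source_len).
    """
    return np.stack([att[b] for att, b in zip(step_attentions, beam_trace)])
```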
The following script computes the scores for each token in the summary (sentence):
python bart_gbp_scores.py
To run evaluation, we first convert the token scores into word probabilities of hallucination:
python convert_token_scores_to_word_probs.py
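Conceptually, this step aggregates subword-token scores into one probability per word. A minimal sketch, assuming a token-to-word mapping as produced by fast tokenizers (`word_ids`) and max-aggregation, which is an assumption here, not necessarily the script's actual choice:

```python
def token_to_word_probs(token_probs, word_ids, agg=max):
    """Aggregate per-token scores into per-word probabilities.

    word_ids maps each token to its word index (None for special tokens).
    """
    by_word = {}
    for p, w in zip(token_probs, word_ids):
        if w is None:          # skip special tokens
            continue
        by_word.setdefault(w, []).append(p)
    return [agg(by_word[w]) for w in sorted(by_word)]
```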
Then we compute the points on the precision/recall curve, which result from varying the hallucination probability threshold above which a data point is classified as a hallucination. The script also computes the ROC curve:
python evaluate_bart_gbp.py
Once all results are computed, we plot the PR and ROC curves for intrinsic/extrinsic/all hallucinations with:
python plot_results.py
We've uploaded our model predictions and results for BART-GBP and the baselines here.
In case of problems or questions, open a GitHub issue or write an email to andreas.marfurt [at] idiap.ch.
The work was supported as a part of the grant Automated interpretation of political and economic policy documents: Machine learning using semantic and syntactic information, funded by the Swiss National Science Foundation (grant number CRSII5_180320).
If you use our code, data or models, please cite us.
@inproceedings{marfurt-etal-2022-corpus,
title = "Unsupervised Token-level Hallucination Detection from Summary Generation By-products",
author = "Marfurt, Andreas and
Henderson, James",
booktitle = "Proceedings of the Second Workshop on Generation, Evaluation and Metrics",
month = dec,
year = "2022",
publisher = "Association for Computational Linguistics",
}