MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents

Authors: Liyan Tang, Philippe Laban, Greg Durrett

Please check out our paper 📃: https://arxiv.org/abs/2404.10774

LLM-AggreFact Benchmark

Description

LLM-AggreFact is a fact verification benchmark. It aggregates 10 of the most up-to-date publicly available datasets on factual consistency evaluation across both closed-book and grounded generation settings. In LLM-AggreFact:

  1. Documents come from diverse sources, including Wikipedia paragraphs, interviews, and web text, and cover domains such as news, dialogue, science, and healthcare.
  2. Claims to be verified are mostly generated by recent generative models (except for one dataset of human-written claims), with no human intervention of any kind, such as injecting particular error types into model-generated claims.

Benchmark Access

Our benchmark is available on Hugging Face 🤗 as lytang/LLM-AggreFact. More benchmark details can be found on the dataset card.

from datasets import load_dataset
dataset = load_dataset("lytang/LLM-AggreFact")

The benchmark contains the following fields:

Field     Description
dataset   One of the 10 datasets in the benchmark
doc       Document used to check the corresponding claim
claim     Claim to be checked against the corresponding document
label     1 if the claim is supported by the document, 0 otherwise
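
As a quick sanity check, a single record can be inspected directly. This is a minimal sketch; rather than assuming a particular split name, it picks one programmatically:

from datasets import load_dataset

dataset = load_dataset("lytang/LLM-AggreFact")
split = list(dataset.keys())[0]   # avoid hard-coding a split name
example = dataset[split][0]       # a single record as a plain dict

print(example['dataset'], example['label'])  # source dataset and gold label
print(example['doc'][:200])                  # first 200 characters of the document
print(example['claim'])                      # the claim to verify against the doc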

MiniCheck Model Evaluation Demo

Please first clone our GitHub repo and install the necessary packages from requirements.txt.
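
Assuming a standard Python setup, that amounts to something like:

git clone https://github.com/Liyan06/MiniCheck.git
cd MiniCheck
pip install -r requirements.txt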

Our MiniCheck models are available on Hugging Face 🤗. More model details can be found in this collection. Below is a simple use case of MiniCheck. MiniCheck models are automatically downloaded from Hugging Face the first time they are used and cached in the specified directory.

from minicheck.minicheck import MiniCheck

doc = "A group of students gather in the school library to study for their upcoming final exams."
claim_1 = "The students are preparing for an examination."
claim_2 = "The students are on vacation."

# model_name can be one of ['roberta-large', 'deberta-v3-large', 'flan-t5-large']
# lytang/MiniCheck-Flan-T5-Large is downloaded from Hugging Face automatically the first time it is used
scorer = MiniCheck(model_name='flan-t5-large', device='cuda:0', cache_dir='./ckpts')
pred_label, raw_prob, _, _ = scorer.score(docs=[doc, doc], claims=[claim_1, claim_2])

print(pred_label) # [1, 0]
print(raw_prob)   # [0.9805923700332642, 0.007121307775378227]

A detailed walkthrough of the evaluation process on LLM-AggreFact, and replication of the results, is available in the notebook inference-example-demo.ipynb.
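
For a rough sense of what that evaluation looks like, here is a minimal sketch that scores the whole benchmark and reports per-dataset balanced accuracy. The choice of split and of balanced accuracy as the metric are assumptions here; the notebook above is the authoritative reference.

from collections import defaultdict

from datasets import load_dataset
from sklearn.metrics import balanced_accuracy_score

from minicheck.minicheck import MiniCheck

# Load the benchmark; pick a split programmatically rather than
# assuming a split name (check the dataset card for the actual splits).
benchmark = load_dataset("lytang/LLM-AggreFact")
split = list(benchmark.keys())[0]
data = benchmark[split]

scorer = MiniCheck(model_name='flan-t5-large', device='cuda:0', cache_dir='./ckpts')
pred_labels, _, _, _ = scorer.score(docs=data['doc'], claims=data['claim'])

# Group gold labels and predictions by source dataset.
groups = defaultdict(lambda: ([], []))
for name, gold, pred in zip(data['dataset'], data['label'], pred_labels):
    groups[name][0].append(gold)
    groups[name][1].append(pred)

# Report balanced accuracy for each of the 10 datasets.
for name in sorted(groups):
    gold, pred = groups[name]
    print(f"{name}: BAcc = {balanced_accuracy_score(gold, pred):.3f}")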

Synthetic Data Generation

Code and our 14K synthetic training examples will be available soon.

Citation

If you found our work useful, please consider citing:

@misc{tang2024minicheck,
      title={MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents}, 
      author={Liyan Tang and Philippe Laban and Greg Durrett},
      year={2024},
      eprint={2404.10774},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
