# Improving Faithfulness by Augmenting Negative Summaries from Fake Documents

Code for "Improving Faithfulness by Augmenting Negative Summaries from Fake Documents". If you find our work useful, please kindly leave a star. Thanks!
```
@article{wang-2022-ImprovingFD,
  title={Improving Faithfulness by Augmenting Negative Summaries from Fake Documents},
  author={Wang, Tianshu and Ladhak, Faisal and Durmus, Esin and He, He},
  year={2022}
}
```
This paper was accepted at EMNLP 2022.
Download the prepared negative training samples for the 3 datasets mentioned in the paper from Google Drive:

- xsum: available in the `datasets` library
- gigaword: available in the `datasets` library
- wikihow: available in the `datasets` library, but you need to manually download the data from here
You need to generate the positive-negative mapping file; see `COFE/data_preprocessing`. This repo provides an example for xsum (`create_pos_neg_index_xsum.sh`); you can modify it for other datasets.
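As a rough illustration, a positive-negative mapping pairs each positive (gold) example with its negative samples by index. The JSON layout and key names below are assumptions for illustration, not the actual format produced by `create_pos_neg_index_xsum.sh`:

```python
import json

# Hypothetical layout: map each positive training example index to the
# indices of its augmented negative summaries. The real format produced
# by create_pos_neg_index_xsum.sh may differ.
mapping = {
    "0": [0, 1],  # positive sample 0 -> negative samples 0 and 1
    "1": [2],     # positive sample 1 -> negative sample 2
}

# Serialize to a mapping file (shown in-memory here):
serialized = json.dumps(mapping, indent=2)

# During training-data assembly, load it back and look up negatives:
loaded = json.loads(serialized)
print(loaded["0"])  # -> [0, 1]
```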
To fine-tune a model, see `COFE/finetune_xsum.sh`. It uses DeepSpeed ZeRO-3 and fp16 (if your card supports it) to accelerate training. `finetune_xsum.sh` is an example for xsum; you can modify it for other datasets. Note that there are comments in the script that you need to adjust according to your environment. In particular, add your own paths to the `mapping_file_path` and `negative_train_file` arguments.
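For example, the path arguments mentioned above might be filled in like this (all paths are placeholders, not real defaults):

```shell
# Placeholder paths - replace with your own before running finetune_xsum.sh
--mapping_file_path /path/to/pos_neg_index.json \
--negative_train_file /path/to/negative_train_samples.json
```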
To decode, see `COFE/decode_xsum.sh` and set `--resume_from_checkpoint` to your best checkpoint path. The default output file is `COFE_predictions`; you can change it via `--save_name`. The output is in JSON format: a list of `[pred, gold, index]` tuples, where `index` is the sample's index in the test set, used to match predictions for downstream evaluation.
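As a sketch, the `[pred, gold, index]` output can be loaded and re-aligned with the test set like this (the example data is made up):

```python
# Illustrative predictions in the repo's stated format:
# a JSON list of [pred, gold, index] tuples.
predictions = [
    ["predicted summary B", "gold summary B", 1],
    ["predicted summary A", "gold summary A", 0],
]

# Sort by the test-set index so row i corresponds to test sample i,
# which is what downstream evaluation relies on.
predictions.sort(key=lambda triple: triple[2])

preds = [p for p, _, _ in predictions]
golds = [g for _, g, _ in predictions]
print(preds[0])  # -> "predicted summary A"
```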
For postprocessing, see the example in `COFE/data_preprocessing/xsum_postprocessing.sh`. It will generate an `xsum_res` directory containing several files:

- `res.csv` stores the source, target, and pred of each sample as one row
- `hypo.txt` & `refs.txt` for BERTScore
- `data-dev.jsonl` for FactCC
This repo provides a Python script to run this part. Pass the path to the `xsum_res` directory via the `--result_dir` argument.
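For instance, assuming `res.csv` has a `source,target,pred` header row (the exact header names are an assumption here), its rows can be read back like this:

```python
import csv
import io

# Hypothetical res.csv contents; the real file is produced by
# xsum_postprocessing.sh and its exact headers may differ.
raw = "source,target,pred\ndoc one,ref one,hyp one\ndoc two,ref two,hyp two\n"

rows = list(csv.DictReader(io.StringIO(raw)))
refs = [r["target"] for r in rows]  # gold summaries
hyps = [r["pred"] for r in rows]    # model predictions
print(len(rows), refs[0], hyps[1])
```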
To compute the extractiveness metrics (coverage, density, and compression), see the `newsroom` repo; here is a short example:
```python
from newsroom.analyze import Fragments

summary, text = "textA", "textB"
fragments = Fragments(summary, text)

# Print the paper's extractiveness metrics:
print("Coverage:", fragments.coverage())
print("Density:", fragments.density())
print("Compression:", fragments.compression())

# Extractive fragments oracle:
print("List of extractive fragments:")
print(fragments.strings())
```
For BERTScore, see the `bert-score` repo; we used this command to get the result:

```shell
bert-score -r xsum_res/refs.txt -c xsum_res/hyps.txt --lang en --rescale_with_baseline
```
We removed the FactCC part because we found FactCC is not very consistent with the other metrics, but you can still use it on your own.
For the baselines, see the `baseline` folder and follow the README in each subfolder.