Analyzing and Evaluating Faithfulness in Dialogue Summarization (EMNLP 2022)

For more details, please refer to our paper on arXiv.

Human Evaluation Results

Results and descriptions: CSV format, 150 dialogues, 5 candidate summaries per dialogue, 6 error types.
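As a rough sketch of working with the human evaluation file, the snippet below tallies error-type labels across CSV rows using only the standard library. The column name `error_type`, the helper `tally_error_types`, and the sample rows and labels are hypothetical placeholders, not the actual schema of the released CSV; check the real header and adjust. With 150 dialogues and 5 candidates each, a complete table covers 150 × 5 = 750 candidate summaries.

```python
import csv
import io
from collections import Counter

# Hypothetical column name; replace it with the real header in the released CSV.
ERROR_COLUMN = "error_type"

def tally_error_types(csv_text, column=ERROR_COLUMN):
    """Count how often each non-empty label appears in the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader if row.get(column))

# Made-up rows and labels, for illustration only.
sample = "dialogue_id,candidate_id,error_type\n1,1,error_A\n1,2,no_error\n2,1,error_A\n"
counts = tally_error_types(sample)
print(counts)

# Number of candidate summaries: 150 dialogues x 5 candidates each.
print(150 * 5)  # 750
```

For a file on disk, read it with `open(path, newline="")` and pass the text to `tally_error_types`.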

Generate FacEval Dataset

We provide our generated FacEval dataset as well as the source code to generate it automatically; the code can be further improved and extended to other datasets.

We tested the code with Python 3.7 and the requirements below:

pip install -r requirements.txt

The generated FacEval dataset is located at 'data/faceval_samples.json'.
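Since the record layout of 'data/faceval_samples.json' is not described here, a safe first step is to inspect its top-level structure before writing any loader. The sketch below does only that and assumes nothing about field names; the helper `describe_json` is hypothetical, and the path is the one given above.

```python
import json
import os

def describe_json(path):
    """Return a short description of the top-level structure of a JSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, list):
        first = data[0] if data else None
        keys = sorted(first.keys()) if isinstance(first, dict) else None
        return {"type": "list", "length": len(data), "first_item_keys": keys}
    if isinstance(data, dict):
        # Show at most the first ten keys to keep the output short.
        return {"type": "dict", "length": len(data), "keys": sorted(data.keys())[:10]}
    return {"type": type(data).__name__}

DATASET_PATH = "data/faceval_samples.json"
if os.path.exists(DATASET_PATH):
    print(describe_json(DATASET_PATH))
```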

Demo to generate factually corrupted samples:

python -m spacy download en_core_web_sm
bash data_preparation.sh
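To illustrate what a factually corrupted sample can look like, here is a minimal, hypothetical speaker-swap transformation in plain Python. It is not the repository's generation code (which relies on spaCy, per the download step above); the function `swap_names` and the example sentence are made up. A swap like this keeps the summary fluent while changing who did what, which is one kind of unfaithfulness a corrupted sample can encode.

```python
import re

def swap_names(summary, name_a, name_b):
    """Swap every whole-word occurrence of name_a and name_b in a single pass.

    One regex substitution avoids the classic bug of replacing A -> B and
    then B -> A, which would map every occurrence back to A.
    """
    pattern = re.compile(r"\b(" + re.escape(name_a) + "|" + re.escape(name_b) + r")\b")
    return pattern.sub(lambda m: name_b if m.group(0) == name_a else name_a, summary)

original = "Amanda baked cookies and will bring Jerry some tomorrow."
corrupted = swap_names(original, "Amanda", "Jerry")
print(corrupted)  # Jerry baked cookies and will bring Amanda some tomorrow.
```

The `\b` word boundaries keep names that are prefixes of other words (e.g. "Ann" inside "Annabel") from being altered.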

Evaluation Demo on BART-Large Model

We provide an evaluation demo on the BART-Large model. Because our proposed model-level evaluation needs direct access to the model's generation probabilities, the model itself must also be loaded for testing.
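Why generation probabilities are needed can be sketched as follows: under teacher forcing, a summary's log-probability is the sum of the per-token log-softmax scores of the decoder logits, and a model-level check can ask whether the model assigns a higher score to the faithful summary than to its corrupted counterpart. This pure-Python illustration uses toy logits over a 3-token vocabulary; the helper names are hypothetical, it is not the repository's scoring code, and the exact FacEval scoring rule may differ (for example, by normalizing for length).

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def sequence_log_prob(step_logits, target_ids):
    """Sum of log p(y_t | y_<t, x) from per-step decoder logits and target token ids."""
    return sum(log_softmax(logits)[tok] for logits, tok in zip(step_logits, target_ids))

def prefers_faithful(logp_faithful, logp_corrupted):
    """True if the model assigns higher probability to the faithful summary."""
    return logp_faithful > logp_corrupted

# Each candidate gets its own forward pass, because step t is conditioned on y_<t.
faithful_logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
corrupted_logits = [[2.0, 0.5, 0.1], [0.4, 0.3, 1.1]]
faithful = sequence_log_prob(faithful_logits, [0, 1])
corrupted = sequence_log_prob(corrupted_logits, [2, 2])
print(prefers_faithful(faithful, corrupted))  # True
```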

First, download the trained BART-Large model (.bin) from Google Drive and place it in the 'trained_model' folder.
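An optional pre-flight check can catch a missing checkpoint before the demo starts. This sketch assumes the demo expects at least one .bin file directly inside 'trained_model'; the helper `find_model_weights` is hypothetical and not part of the repository.

```python
from pathlib import Path

def find_model_weights(folder="trained_model"):
    """Return the .bin files directly inside `folder`, or raise if there are none."""
    path = Path(folder)
    bins = sorted(path.glob("*.bin")) if path.is_dir() else []
    if not bins:
        raise FileNotFoundError(
            f"No .bin weights found in '{folder}'; download the BART-Large checkpoint first."
        )
    return bins
```

Calling `find_model_weights()` from the repository root before running the demo raises a clear error if the download step was skipped.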

Run the following script to compute the scores on different error types for the BART-Large model:

bash eval_demo.sh
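Reporting a score per error type amounts to grouping pairwise outcomes by type and averaging. The sketch below assumes each faithful/corrupted comparison yields an `(error_type, prefers_faithful)` pair; the helper name and the error-type labels are hypothetical, and the actual format of the logs in 'scores_log' may differ.

```python
from collections import defaultdict

def accuracy_by_error_type(results):
    """results: iterable of (error_type, prefers_faithful) pairs.

    Returns {error_type: fraction of pairs where the faithful summary scored higher}.
    """
    totals = defaultdict(int)
    wins = defaultdict(int)
    for error_type, won in results:
        totals[error_type] += 1
        wins[error_type] += int(won)
    return {t: wins[t] / totals[t] for t in totals}

# Made-up outcomes with hypothetical error-type labels.
toy = [("speaker_swap", True), ("speaker_swap", False), ("negation", True)]
print(accuracy_by_error_type(toy))  # {'speaker_swap': 0.5, 'negation': 1.0}
```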

The model scores will be saved in the 'scores_log' folder.

References

If you find our work useful, please consider citing our work.

@inproceedings{wang2022analyzing,
  title={Analyzing and Evaluating Faithfulness in Dialogue Summarization},
  author={Wang, Bin and Zhang, Chen and Zhang, Yan and Chen, Yiming and Li, Haizhou},
  booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
  year={2022}
}

@article{wang2022analyzing,
  title={Analyzing and Evaluating Faithfulness in Dialogue Summarization},
  author={Wang, Bin and Zhang, Chen and Zhang, Yan and Chen, Yiming and Li, Haizhou},
  journal={arXiv preprint arXiv:2210.11777},
  year={2022}
}

Please contact Bin Wang at bwang28c@gmail.com with any issues.
