This is the code for the ACL 2023 paper *ExplainMeetSum: A Dataset for Explainable Meeting Summarization Aligned with Human Intent*.
The model, Multi-DYLE, is an extended version of [DYLE](https://github.com/Yale-LILY/DYLE) ("DYLE: Dynamic Latent Extraction for Abstractive Long-Input Summarization"), which is distributed under the MIT License, Copyright (c) 2021 Yale-LILY.
You can build the dataset by running the conversion script:

```shell
# or you can specify your own paths
python data/convert.py \
    --qmsum QMSUM_ROOT_DIRECTORY \
    --dialogue_act ACL2018_ABSSUMM_ROOT_DIRECTORY \
    --save_dir EXPLAINMEETSUM_DIRECTORY
```
The ExplainMeetSum data is an extended version of the QMSum dataset. To annotate evidence sentence by sentence, we had to split sentences as accurately as possible. We split sentences as follows:
- meeting transcripts
  - Committee: use `nltk.sent_tokenize()`
  - Academic (ICSI), Product (AMI): use the dialogue act files in ACL2018_AbsSumm
- answers in query_list (i.e., summaries)
  - use `nltk.sent_tokenize()` and merge sentences that were split wrongly (for details, refer to `sentence_split.py`)
  - the split answers are already stored in `data/SummaryEvidence`
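The merge step above can be sketched roughly as follows. Note that the heuristic shown here (gluing a fragment that starts with a lowercase letter, or is very short, back onto the previous sentence) is a hypothetical illustration, not the exact rule implemented in `sentence_split.py`:

```python
def merge_wrong_splits(sentences, min_len=3):
    """Merge fragments that a tokenizer likely split off by mistake.

    Illustrative heuristic only: a piece that starts with a lowercase
    letter or has fewer than `min_len` words is appended to the
    previous sentence instead of starting a new one.
    """
    merged = []
    for sent in sentences:
        fragment = sent.strip()
        if not fragment:
            continue
        is_suspicious = fragment[0].islower() or len(fragment.split()) < min_len
        if merged and is_suspicious:
            merged[-1] = merged[-1] + " " + fragment
        else:
            merged.append(fragment)
    return merged

# Example: "etc." caused a spurious split after the first sentence.
pieces = [
    "We discussed the budget, scope, etc.",
    "and agreed on a deadline.",
    "The meeting closed at noon.",
]
print(merge_wrong_splits(pieces))
```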
The ExplainMeetSum data should also contain `meeting_transcripts`, which do not exist in `data/SummaryEvidence`, so you need the original `QMSum/` dataset. We split the `meeting_transcripts` of the ICSI and AMI meetings in QMSum using the dialogue act files, so you also need `acl2018_abssumm/` for those files.
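To see what the conversion step consumes, here is a minimal sketch of reading a single QMSum meeting. The field names (`meeting_transcripts`, `speaker`, `content`, `general_query_list`) follow the public QMSum release format as we understand it; verify them against your local copy:

```python
import json

# One QMSum meeting is a single JSON object (one line of a .jsonl file).
# The keys below are assumed from the QMSum release format.
sample_line = json.dumps({
    "meeting_transcripts": [
        {"speaker": "PM", "content": "Let's review the remote design."},
        {"speaker": "ID", "content": "Agreed."},
    ],
    "general_query_list": [
        {"query": "Summarize the whole meeting.",
         "answer": "The team discussed the remote design."},
    ],
})

meeting = json.loads(sample_line)
# Flatten each turn into "SPEAKER: utterance" strings.
turns = [f'{t["speaker"]}: {t["content"]}' for t in meeting["meeting_transcripts"]]
print(turns)
```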
Install dependencies via:

```shell
conda create -n multidyle python=3.9.6
conda activate multidyle
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install nltk==3.6.2 pyrouge==0.1.3 transformers==4.8.1 rouge==1.0.0 datasets==1.11.0
```
To train Multi-DYLE:

```shell
sh train_multi_dyle.sh
```

You can reproduce other results by editing `config.py`.

To evaluate:

```shell
sh test_multi_dyle.sh
```