MeetingBank-utils

Utilities to pre-process MeetingBank data and reproduce results. (Home page, ACL 2023 paper).

Overview

MeetingBank is a benchmark dataset created from the city councils of 6 major U.S. cities to supplement existing datasets. It contains 1,366 meetings with over 3,579 hours of video, as well as transcripts, PDF documents of meeting minutes, agendas, and other metadata. On average, a council meeting is 2.6 hours long and its transcript contains over 28k tokens, making it a valuable testbed for meeting summarizers and for extracting structure from meeting videos. The dataset contains 6,892 segment-level summarization instances for training summarizers and evaluating their performance.

ResultsEval.py

This script evaluates the system-generated summaries. It calculates all the metrics described in the paper.

Follow the instructions to install SummerTime. Check my installation records here for some helpful tips.
Add the model-generated data from Zenodo to the "data/" directory, then run the following command:

python ResultsEval.py data/<system_results>.json
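
To give a rough sense of what a summarization metric measures, here is a minimal sketch of a unigram-overlap F1 score in the style of ROUGE-1. It is an illustration only and not the implementation used by ResultsEval.py or SummerTime: the function name `rouge1_f1` is hypothetical, tokenization is plain lowercased whitespace splitting, and no stemming is applied, so its scores will not match those of standard ROUGE packages.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary.

    Simplified sketch: lowercased whitespace tokens, no stemming.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped overlap: each token counts at most as often as it
    # appears in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Made-up example sentences, not taken from the dataset.
score = rouge1_f1(
    "the council approved the budget",
    "the council approved a budget",
)
print(f"ROUGE-1-style F1: {score:.2f}")
```

For reported numbers, use the evaluation script itself so that results stay comparable with the paper.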

Dataset

We have uploaded the dataset to Hugging Face for more convenient access to MeetingBank in your research.

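from datasets import load_dataset

# Download (or load from cache) the MeetingBank dataset from Hugging Face
meetingbank = load_dataset("huuuyeah/meetingbank")

# The dataset ships with train, test, and validation splits
train_data = meetingbank['train']
test_data = meetingbank['test']
val_data = meetingbank['validation']

def generator(data_split):
  # Yield (id, summary, transcript) for each instance in the split
  for instance in data_split:
    yield instance['id'], instance['summary'], instance['transcript']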
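
As a usage sketch for the fields above, the following computes simple length statistics over a split. It assumes only that each instance is a mapping with the `id`, `summary`, and `transcript` keys shown in the loading code. The helper names `iter_instances` and `length_stats` and the toy records are hypothetical, and token counts use whitespace splitting, which will differ from the tokenizer behind the figures quoted in the overview.

```python
from statistics import mean


def iter_instances(data_split):
    """Yield (id, summary, transcript) for each instance in a split."""
    for instance in data_split:
        yield instance["id"], instance["summary"], instance["transcript"]


def length_stats(data_split):
    """Return instance count and average whitespace-token lengths."""
    rows = list(iter_instances(data_split))
    if not rows:
        return {"instances": 0,
                "avg_summary_tokens": 0,
                "avg_transcript_tokens": 0}
    return {
        "instances": len(rows),
        "avg_summary_tokens": mean(len(s.split()) for _, s, _ in rows),
        "avg_transcript_tokens": mean(len(t.split()) for _, _, t in rows),
    }


# Toy stand-in for a dataset split (made-up records, not real data).
toy_split = [
    {"id": "toy_1", "summary": "a b", "transcript": "a b c d"},
    {"id": "toy_2", "summary": "x y z w",
     "transcript": "one two three four five six"},
]
print(length_stats(toy_split))
```

With the real dataset loaded, the same function could be called as `length_stats(meetingbank['train'])`, since iterating a split yields one dictionary per instance.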

Resources

The MeetingBank dataset will be hosted at Zenodo. The dataset will include meeting audio, transcripts, the MeetingBank main JSON file, summaries from 6 systems, and human annotations.

Download link for transcripts: zenodo.

Meeting Videos: All meeting videos can be found at https://archive.org/

Meeting Audio: HuggingFace

Acknowledgement

Please cite the following paper in work that uses this dataset:

MeetingBank: A Benchmark Dataset for Meeting Summarization
Yebowen Hu, Tim Ganter, Hanieh Deilamsalehy, Franck Dernoncourt, Hassan Foroosh, Fei Liu
In the main conference of the Association for Computational Linguistics (ACL'23), Toronto, Canada.

Bibtex

@inproceedings{hu-etal-2023-meetingbank,
    title = "MeetingBank: A Benchmark Dataset for Meeting Summarization",
    author = "Yebowen Hu and Tim Ganter and Hanieh Deilamsalehy and Franck Dernoncourt and Hassan Foroosh and Fei Liu",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
}
