YebowenHu/MeetingBank-utils

MeetingBank-utils

Utilities to pre-process MeetingBank data and reproduce results (Home page, ACL 2023 paper).

Overview

MeetingBank is a benchmark dataset created from the city councils of 6 major U.S. cities to supplement existing datasets. It contains 1,366 meetings with over 3,579 hours of video, as well as transcripts, PDF documents of meeting minutes, agendas, and other metadata. On average, a council meeting is 2.6 hours long and its transcript contains over 28k tokens, making it a valuable testbed for meeting summarizers and for extracting structure from meeting videos. The dataset contains 6,892 segment-level summarization instances for training and evaluating summarization systems.

ResultsEval.py

This script evaluates the system-generated summaries. It calculates all the metrics described in the paper.

  1. Follow the instructions to install SummerTime. Check my installation records here for some helpful tips.

  2. Add the model-generated data from Zenodo to the "data/" directory, then run the following command:

python ResultsEval.py data/<system_results>.json
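The internals of ResultsEval.py are not shown in this README. As a rough illustration of what a word-overlap summarization metric measures, here is a minimal sketch of a ROUGE-1-style unigram F1 score. The function name `unigram_f1` is hypothetical, the whitespace tokenization is an assumption, and this is not the SummerTime implementation the script relies on.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """ROUGE-1-style F1: unigram overlap between a candidate and a reference.

    Hypothetical helper for illustration only; tokenization is a plain
    lowercase whitespace split, which real ROUGE toolkits refine.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared word at most min(count) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    system = "the council approved the budget"
    gold = "the council approved a new budget"
    print(f"unigram F1: {unigram_f1(system, gold):.4f}")
```

Scores from this sketch will not match the numbers reported by the script, which should be treated as the reference for reproducing results.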

Dataset

We have uploaded the dataset on Huggingface to enable more convenient access to MeetingBank in your research.

from datasets import load_dataset
meetingbank = load_dataset("huuuyeah/meetingbank")

train_data = meetingbank['train']
test_data = meetingbank['test']
val_data = meetingbank['validation']

def generator(data_split):
  # Yield the id, reference summary, and source transcript of each instance.
  for instance in data_split:
    yield instance['id'], instance['summary'], instance['transcript']
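As a usage sketch, a generator like the one above can feed simple per-split statistics. The helper `length_stats` below is hypothetical and not part of this repository, and counting tokens by whitespace split is an assumption, so the numbers will differ from token counts reported elsewhere. The toy rows stand in for a real split such as `meetingbank['train']` so the example runs without downloading the dataset.

```python
from statistics import mean


def generator(data_split):
    """Yield (id, summary, transcript) for each instance in a split."""
    for instance in data_split:
        yield instance['id'], instance['summary'], instance['transcript']


def length_stats(data_split):
    """Average whitespace-token counts of summaries and transcripts.

    Hypothetical helper for illustration; works on any iterable of dicts
    with 'id', 'summary', and 'transcript' keys.
    """
    summary_lens, transcript_lens = [], []
    for _id, summary, transcript in generator(data_split):
        summary_lens.append(len(summary.split()))
        transcript_lens.append(len(transcript.split()))
    return {
        "instances": len(summary_lens),
        "avg_summary_tokens": mean(summary_lens) if summary_lens else 0.0,
        "avg_transcript_tokens": mean(transcript_lens) if transcript_lens else 0.0,
    }


# Toy rows standing in for a real split (assumed field names as above).
toy_split = [
    {"id": "a", "summary": "Council approves budget.",
     "transcript": "The council met and approved the budget."},
    {"id": "b", "summary": "Item tabled.",
     "transcript": "The item was tabled until next week."},
]

if __name__ == "__main__":
    print(length_stats(toy_split))
```

With the real dataset loaded, the same call would be `length_stats(meetingbank['train'])`.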

Resources

The MeetingBank dataset will be hosted at Zenodo. It will include meeting audio, transcripts, the MeetingBank main JSON file, summaries from 6 systems, and human annotations.

Download link for transcripts: zenodo.

Meeting Videos: All meeting videos can be found at https://archive.org/

Meeting Audio: HuggingFace

Acknowledgement

Please cite the following paper in work that uses this dataset:

MeetingBank: A Benchmark Dataset for Meeting Summarization
Yebowen Hu, Tim Ganter, Hanieh Deilamsalehy, Franck Dernoncourt, Hassan Foroosh, Fei Liu
In the main conference of the Association for Computational Linguistics (ACL'23), Toronto, Canada.

Bibtex

@inproceedings{hu-etal-2023-meetingbank,
    title = "MeetingBank: A Benchmark Dataset for Meeting Summarization",
    author = "Yebowen Hu and Tim Ganter and Hanieh Deilamsalehy and Franck Dernoncourt and Hassan Foroosh and Fei Liu",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
}
