compsumm

Code, datasets and supplementary appendix for the AAAI-19 paper "Comparative Document Summarisation via Classification".

Supplementary Appendix: pdf

How to use this repository?

1. Installing

If you have miniconda or anaconda, run install.sh to create a new environment, compsumm, with all dependencies installed. Otherwise, the dependencies are listed in environment.yml.

2. Dataset

The datasets are in the datasets directory in HDF5 format. There are three files, one for each of the three news topics used in the paper. Each file has the following structure:

-- data: averaged GloVe vectors of the title and first three sentences, 300-dimensional
-- y: labels created by dividing time ranges into two groups
-- yn: labels created using month for beefban and week for capital punishment and guncontrol
-- title: title of article
-- text: first three sentences
-- datetime: date of publication
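
As a rough illustration of how a row of data could be derived, the sketch below averages per-word vectors over a title and its first sentences. The tiny toy_vectors table, the whitespace tokenisation and the skipping of unknown words are all assumptions made for illustration; the preprocessing actually used for these files is not described here, and real GloVe vectors are 300-dimensional.

```python
import numpy as np

def average_embedding(text, vectors, dim):
    """Average the vectors of all known tokens in `text` (zeros if none are known)."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenisation
    known = [vectors[t] for t in tokens if t in vectors]  # assumption: skip unknown words
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)

# Hypothetical 3-dimensional stand-in for a GloVe lookup table.
toy_vectors = {
    "gun": np.array([1.0, 0.0, 0.0]),
    "control": np.array([0.0, 1.0, 0.0]),
    "debate": np.array([0.0, 0.0, 1.0]),
}

title = "Gun control debate"
first_sentences = "The debate continues"
doc_vector = average_embedding(title + " " + first_sentences, toy_vectors, dim=3)
```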

The data was split 70-20-10 into train, test and validation sets 51 times. The precomputed splits are stored in:
-- train_idxs: Matrix with each row i containing training indexes of split i.
-- test_idxs: Matrix with each row i containing test indexes of split i.
-- val_idxs: Matrix with each row i containing val indexes of split i.
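
The fields above could be combined roughly as follows to pull out one split. This is a sketch, not the code in news.py: take_split is a hypothetical helper, the indexes are assumed to be 0-based row positions into data, and a toy dict of NumPy arrays stands in for an opened HDF5 file.

```python
import numpy as np

def take_split(store, split):
    """Return (features, labels) pairs for the train, test and val parts of one split.

    `store` is any mapping from field name to array, for example an open
    h5py.File or a dict of NumPy arrays.
    """
    data = np.asarray(store["data"])
    labels = np.asarray(store["y"])
    parts = []
    for key in ("train_idxs", "test_idxs", "val_idxs"):
        idx = np.asarray(store[key])[split].astype(int)  # row `split` holds that split's indexes
        parts.append((data[idx], labels[idx]))
    return parts

# Toy stand-in with 10 documents and 2 splits; the real files hold 300-dimensional rows.
rng = np.random.default_rng(0)
perms = np.stack([rng.permutation(10) for _ in range(2)])
toy_store = {
    "data": rng.normal(size=(10, 4)),
    "y": np.array([0, 1] * 5),
    "train_idxs": perms[:, :7],
    "test_idxs": perms[:, 7:9],
    "val_idxs": perms[:, 9:],
}
(train_X, train_y), (test_X, test_y), (val_X, val_y) = take_split(toy_store, split=0)
```

With h5py installed, the same helper could be given a file opened with h5py.File(path, "r"), where path points at one of the dataset files.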

The paper uses the first 10 splits to compute error bars in the automatic evaluation results. Please see news.py for an example of loading this dataset.
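
A minimal sketch of aggregating a score over the first 10 splits is shown below. The helper summarise_over_splits and the placeholder fake_evaluate are hypothetical, and using the sample standard deviation as the error bar is an assumption; the paper may define its error bars differently.

```python
import numpy as np

def summarise_over_splits(evaluate, n_splits=10):
    """Run `evaluate(split)` for the first `n_splits` splits and return (mean, std)."""
    scores = np.array([evaluate(i) for i in range(n_splits)], dtype=float)
    return scores.mean(), scores.std(ddof=1)  # sample standard deviation as the error bar

# Hypothetical evaluation function; a real one would fit and score a summariser on split i.
def fake_evaluate(split):
    return 0.70 + 0.01 * (split % 3)

mean_score, error_bar = summarise_over_splits(fake_evaluate)
```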

3. Code

Please see the demo notebook (demo.ipynb) for example use of subm.py and grad.py.

subm.py has utility functions and a greedy optimiser for discrete optimisation.
grad.py has utility functions and an SGD optimiser for continuous optimisation. The SGD optimiser wasn't used; L-BFGS from SciPy was used instead.
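
For readers unfamiliar with greedy discrete optimisation, the generic pattern looks roughly like the sketch below. It is not the implementation in subm.py: greedy_select and the toy coverage objective are hypothetical stand-ins, and the objectives used in the paper are not reproduced here.

```python
def greedy_select(candidates, objective, budget):
    """Greedily pick up to `budget` items, each time adding the one with the largest gain."""
    selected = []
    remaining = list(candidates)
    current = objective(selected)
    for _ in range(budget):
        best_item, best_value = None, current
        for item in remaining:
            value = objective(selected + [item])
            if value > best_value:
                best_item, best_value = item, value
        if best_item is None:  # no candidate improves the objective
            break
        selected.append(best_item)
        remaining.remove(best_item)
        current = best_value
    return selected

# Toy coverage objective (an assumption for illustration): count distinct topics covered.
topics = {"a": {1, 2}, "b": {2, 3, 4}, "c": {4}, "d": {5}}

def coverage(items):
    covered = set()
    for item in items:
        covered |= topics[item]
    return len(covered)

summary = greedy_select(topics.keys(), coverage, budget=2)
```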

models.py has several models for summarisation as classifiers. The models are abstracted into a Summ class, which gives the different summarisation methods a common pattern and simplifies hyperparameter tuning. Please see the models notebook (models.ipynb) for a demo of news.py and models.py.
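
One way to picture such a common pattern is sketched below. It does not reproduce the Summ class: SummariserSketch, RandomSummariser and their methods are hypothetical names, and treating a summary as a set of prototypes that classifies documents by nearest prototype is an assumption about how "summarisation as classifiers" might be realised.

```python
import numpy as np

class SummariserSketch:
    """Hypothetical common interface: pick prototypes, then classify by nearest prototype."""

    def __init__(self, n_prototypes=2, seed=0):
        self.n_prototypes = n_prototypes  # prototypes per group (a hyperparameter)
        self.seed = seed

    def select(self, X, y):
        """Return indexes of the chosen prototypes; subclasses override this."""
        raise NotImplementedError

    def fit(self, X, y):
        idx = np.asarray(self.select(X, y))
        self.prototypes_, self.proto_labels_ = X[idx], y[idx]
        return self

    def predict(self, X):
        # 1-nearest-prototype rule under squared Euclidean distance.
        dists = ((X[:, None, :] - self.prototypes_[None, :, :]) ** 2).sum(axis=2)
        return self.proto_labels_[dists.argmin(axis=1)]

class RandomSummariser(SummariserSketch):
    """Baseline: sample `n_prototypes` documents uniformly from each group."""

    def select(self, X, y):
        rng = np.random.default_rng(self.seed)
        picks = [rng.choice(np.flatnonzero(y == g), self.n_prototypes, replace=False)
                 for g in np.unique(y)]
        return np.concatenate(picks)

# Two well-separated toy groups.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y = np.array([0, 0, 0, 1, 1, 1])
model = RandomSummariser(n_prototypes=2).fit(X, y)
predictions = model.predict(X)
```

Because every method only overrides select, hyperparameters such as n_prototypes can be tuned with the same fit and predict loop for all of them.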

utils.py has common functions such as balanced accuracy, which is used for evaluation.
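
For reference, balanced accuracy is the mean of the per-class recalls, so each class counts equally regardless of its size. The sketch below is a minimal standalone version and is not copied from utils.py; the function name and signature are assumptions.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in `y_true`."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Class 0 recall is 3/4 and class 1 recall is 1/2.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
score = balanced_accuracy(y_true, y_pred)
```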

4. Crowd-sourced evaluation results

The crowd-sourced evaluation results are in crowdflower.csv. The design and settings of this experiment are explained in the paper.

Citing

If you use this dataset, please cite this work as:

@inproceedings{bista2019compsumm,
  title={Comparative Document Summarisation via Classification},
  author={Bista, Umanga and Mathews, Alexander and Shin, Minjeong and Menon, Aditya Krishna and Xie, Lexing},
  booktitle={AAAI},
  volume={},
  pages={},
  year={2019}
}

The paper can be found on arXiv.

