
hMDS: the heterogeneous multi-document summarization corpus


MarkusZopf/hMDS


The hMDS Corpus

The hMDS corpus is a heterogeneous multi-document summarization corpus built with a novel corpus construction approach. It consists of 91 topics from three different domains. The guidelines that the annotators followed when creating the corpus are in the Guidelines.md file.

Reference

If you refer to hMDS in your publications, please cite the corresponding COLING 2016 paper:

@InProceedings{Zopf2016hMDS,
  author    = {Zopf, Markus and Peyrard, Maxime and Eckle-Kohler, Judith},
  title     = {The Next Step for Multi-Document Summarization: A Heterogeneous Multi-Genre Corpus Built with a Novel Construction Approach},
  booktitle = {Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016)},
  month     = {December},
  year      = {2016},
  address   = {Osaka, Japan},
  publisher = {Association for Computational Linguistics},
  pages     = {1535--1545},
  url       = {https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_AIPHES/publications/2016/2016_COLING_hMDS_cameraReady.pdf},
  website   = {https://github.com/AIPHES/hMDS}
}

Obtaining the Corpus

Due to copyright restrictions, we cannot make the full corpus directly available. The public parts of the corpus are in the hMDS archive files; the "input" subfolder described in the readme.txt inside these archives is missing. To mitigate this, we added link lists that reference the web pages included in the corpus (see Guidelines.md, step 6, for details), which allow the missing documents to be crawled automatically.
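The crawling step could look roughly like the Python sketch below. It is not part of the corpus distribution and rests on assumptions: it assumes each link list is a plain-text file with one URL per line (blank lines and lines starting with "#" are skipped), and the function names `read_link_list`, `fetch_url` and `crawl` are hypothetical. It saves each page under a file name derived from a hash of its URL, so it does not reproduce the original layout of the "input" folder described in readme.txt. It also returns the URLs that failed, since some pages may have moved or disappeared since 2016.

```python
import hashlib
import http.client
import pathlib
import urllib.request
from typing import Callable, Iterable, List


def read_link_list(path: pathlib.Path) -> List[str]:
    """Return the URLs in a link list.

    Assumption: one URL per line; blank lines and '#' comments are skipped.
    """
    urls = []
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download one page using only the standard library."""
    request = urllib.request.Request(url, headers={"User-Agent": "hmds-crawler-sketch"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def crawl(
    urls: Iterable[str],
    out_dir: pathlib.Path,
    fetch: Callable[[str], bytes] = fetch_url,
) -> List[str]:
    """Save each page under out_dir, named by the SHA-1 hash of its URL.

    Returns the URLs that could not be fetched so they can be retried or
    checked by hand. Network and HTTP errors (URLError, HTTPError and
    timeouts are all OSError subclasses) are caught per URL.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for url in urls:
        name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
        try:
            (out_dir / name).write_bytes(fetch(url))
        except (OSError, http.client.HTTPException):
            failed.append(url)
    return failed
```

For example, `crawl(read_link_list(pathlib.Path("links.txt")), pathlib.Path("input"))` would download every page listed in a (hypothetical) `links.txt`. The `fetch` parameter is injectable so the download function can be swapped for one that adds rate limiting or retries; please respect the terms of use of the crawled sites.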
