A software for manually creating multi-document summarization corpora and a platform for developing complex annotation tasks spanning multiple steps.
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
doc
src
LICENSE.txt
NOTICE.txt
README.md
pom.xml

README.md

MDSWriter

MDSWriter is a software for manually creating multi-document summarization corpora and a platform for developing complex annotation tasks spanning multiple steps.

Please use the following citation:

@InProceedings{Meyer:2016:ACLdemo,
  author    = {Meyer, Christian M. and Benikova, Darina and Mieskes, Margot and Gurevych, Iryna},
  title     = {MDSWriter: Annotation tool for creating high-quality multi-document summarization corpora},
  booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL): System Demonstrations},
  month     = {August},
  year      = {2016},
  address   = {Berlin, Germany},
  publisher = {Association for Computational Linguistics},
  pages     = {97--102},
  url       = {http://www.aclweb.org/anthology/P/P16/P16-4017.pdf}
}

Abstract: In this paper, we present MDSWriter, a novel open-source annotation tool for creating multi-document summarization corpora. A major innovation of our tool is that we divide the complex summarization task into multiple steps which enables us to efficiently guide the annotators and to record all their intermediate results and user–system interaction data. This allows evaluating the individual components of a complex summarization system and learning from the human composition process. MDSWriter is highly flexible and can be adapted to multiple other tasks.

Contact person: Christian M. Meyer, http://www.ukp.tu-darmstadt.de/people/meyer

Don't hesitate to send us an e-mail or report an issue, if something is broken (and it shouldn't be) or if you have further questions.

For license information, see LICENSE.txt and NOTICE.txt files.

Documentation and Tutorial

  • Video tutorial explaining our initial setup: https://www.youtube.com/channel/UC1-qTfTCnVBZklJwCj2kGDQ
  • Screenshots of the proposed seven steps for multi-document summarization: doc/screenshots.pdf
  • Corresponding annotation guidelines: doc/annotation_guidelines_en.pdf
  • Installation guide: see below

Requirements

  • Java 7 and higher
  • J2EE platform with JavaServer Pages (JSP) and WebSocket implementations (e.g., Apache Tomcat 7 and higher)
  • Maven
  • MySQL database (or other SQL database)

Installation

  • Install Java, Maven, Tomcat, and MySQL.
  • Download the source code from GitHub.
  • Import the empty schema from doc/mdswriter_schema.sql to your database.
  • Update src/main/webapp/META-INF/context.xml with your database settings.
  • Update src/main/webapp/js/st.js: Set the SERVER_URL variable to the URL the software will be depolyed to.
  • Build the software using mvn package
  • Deploy the war file from target/ to your application server.
  • Open http://localhost:8080/mdswriter (or accordingly) and try to log in using admin1:admin2.
  • Test if everything works and then import your own data into the schema.

Extensibility

Adapting MDSWriter to a new task works best if you first follow the installation guide and get the basic system to work. For developing your application, we recommend using a J2EE-ready IDE, such as Eclipse or IntelliJ. The following steps are necessary to make MDSWriter do what your application needs:

  • Define the annotation steps you want to provide. For each step, add a corresponding JSP file with the user interface to the webapp folder. You can of course reuse the existing user interfaces which should save you quite some development time. All JSP files refer to the common _header, _title, and _footer templates to ensure a similar appearance and menu. For the corresponding guidelines, you may want to add a help file to the webapp/help folder. If you care about internationalization, put all your strings into the property files at resources/i18n/ - currently we have English and German.
  • The core link between user interface (JSP) and MDSWriter server is our WebSocket communication protocol. The Java class de.tudarmstadt.aiphes.mdswriter.Message contains an overview of all predefined messages. Change the messages according to your needs and implement or reuse the corresponding business logic in de.tudarmstadt.aiphes.mdswriter.MDSWriterEndpoint and its child and helper classes. Most likely, you will require authentication and storing user-system interaction data which MDSWriter provides you without further adaptation. In case of a cross-document annotation task, you can also reuse the classes in the de.tudarmstadt.aiphes.mdswriter.doc package.
  • If necessary, make sure that you also update the database schema for your particular task.