JannikStroetgen/KRAUTS

KRAUTS

A German Temporally Annotated News Corpus

KRAUTS (Korpus of newspapeR Articles with Underlined Temporal expressionS) is a German temporally annotated news corpus accompanied by TimeML annotation guidelines for German. It was developed at Fondazione Bruno Kessler, Trento, Italy, and at the Max Planck Institute for Informatics, Saarbrücken, Germany. Our goal is to boost temporal tagging research [1] for German.

The corpus is available under a CC-BY-NC license and is described in:

  • Jannik Strötgen, Anne-Lyse Minard, Lukas Lange, Manuela Speranza, Bernardo Magnini:
    KRAUTS: A German Temporally Annotated News Corpus, LREC'18 (to appear)

KRAUTS contains articles from the daily newspaper Dolomiten and from the weekly newspaper Die Zeit.

The annotation guidelines are strongly based on the guidelines defined for Italian, i.e., the It-TimeML guidelines [2]. Our Annex to the It-TimeML guidelines contains (annotated) examples in German and extensions needed to adapt the It-TimeML guidelines to the specific morpho-syntactic features of German. It is available on the It-TimeML website.
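
For readers unfamiliar with TimeML, the following minimal sketch shows how inline TIMEX3 annotations could be read with Python's standard library. The sample sentence, the `SAMPLE` string, and the `extract_timex3` helper are hypothetical and written only for illustration; the actual file layout of the KRAUTS corpus may differ, so treat this as an assumption rather than a description of the released files.

```python
import xml.etree.ElementTree as ET

# Hypothetical TimeML-style snippet (not taken from the corpus).
# TIMEX3 elements carry an id (tid), a type, and a normalized value.
SAMPLE = (
    '<TimeML>'
    'Das Treffen fand am <TIMEX3 tid="t1" type="DATE" value="2015-05-03">'
    '3. Mai 2015</TIMEX3> statt und dauerte '
    '<TIMEX3 tid="t2" type="DURATION" value="PT2H">zwei Stunden</TIMEX3>.'
    '</TimeML>'
)


def extract_timex3(xml_text):
    """Return (tid, type, value, surface text) for every TIMEX3 element."""
    root = ET.fromstring(xml_text)
    return [
        (t.get("tid"), t.get("type"), t.get("value"), "".join(t.itertext()))
        for t in root.iter("TIMEX3")
    ]


if __name__ == "__main__":
    for row in extract_timex3(SAMPLE):
        print(row)
```

Each tuple pairs the surface expression with its normalized value, which is the information a temporal tagger is expected to produce.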

A publicly available temporal tagger for German is HeidelTime, which can be found on the HeidelTime GitHub page.

[1] Jannik Strötgen and Michael Gertz: Domain-Sensitive Temporal Tagging, Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers, 2016.
[2] Tommaso Caselli and Rachele Sprugnoli: It-TimeML, TimeML Annotation Guidelines for Italian, version 1.4, 2015.
