Jannik Strötgen edited this page Oct 7, 2016 · 17 revisions

Changelog

Version 2.2.1

SHA: f7e4c3f

  • added: temponym tagging functionality [Core]
  • added: English temponym resources [Resources]
  • fixed: parameter pos set to "no" (POSTagger.NO) works for all languages now (AllLanguagesTokenizer) [Standalone]
  • fixed: several minor issues

Version 2.1

SHA: 378e476

  • fixed: TreeTaggerWrapper no longer creates temporary files, which speeds up processing [Core]
  • fixed: Various improvements to Maven support [Maven]
  • fixed: Errors in Arabic resources [Resources]
  • fixed: Values in IntervalTagger that had been swapped for some time [Core]

Version 2.0

SHA: b9248e0

  • added: automatically-created resources for 200+ languages [Resources]
  • added: AllLanguagesTokenizer, a simple, generic, whitespace-based tokenizer that can be used with all languages
  • fixed: Minor rule improvements for some languages [Resources]

Version 1.9

SHA: d50e129

  • added: Support for Estonian [Resources]
  • added: Support for Portuguese (thanks to Zunsik Lim) [Resources]
  • added: Resource loading is now easier and looks in multiple places [Core]
  • fixed: StanfordPOSTaggerWrapper would not accept URLs as model (#26) [Core]
  • fixed: A bug where pattern replacing in rules would mangle some patterns [Core]
  • fixed: Lots of improvements to German and English resources [Resources]
  • fixed: Minor improvements for almost all other languages [Resources]

Version 1.8

SHA: b9c5832

  • fixed: Italian resources have received a major overhaul [Resources]
  • added: Support for Croatian (thanks to Luka Skukan), including a wrapper for the hunpos preprocessing tool [Resources]
  • added: Ability to use regular expressions in POS constraints of rules [Resources]
  • added: Tokenization without the Tree Tagger's Perl script [TreeTaggerWrapper]
  • fixed: various minor bugfixes in the TreeTaggerWrapper and Standalone code
  • fixed: Some code pertaining to the invocation of Processors [Core]

Version 1.7

SHA: 5ca451f

  • added: Support for calculation of BC and AD dates, and dates close to the year 0, including Arabic, Dutch, English, French, German, Italian, Spanish, Vietnamese language resources.
  • added: A preliminary version of Elena Klyachko's Russian resources [Resources]
  • fixed: A minor issue with parameter files in TreeTaggerWrapper [Core]

Version 1.6

SHA: d592d19

  • added: Chinese resources, support in TreeTaggerWrapper as well as the TempEval-2 Reader and Standalone version
  • added: Better handling of overlapping temporal expressions [Core]
  • fixed: Made TempEval-3 Reader more robust to non-TE3-inputs
  • fixed: More stable TreeTaggerWrapper parameter file recognition
  • fixed: Various minor improvements in the resources for all languages [Resources]
  • fixed: Some minor fixes for resource recognition in Standalone [Standalone]

Version 1.5

SHA: 183643e

  • added: French resources, kindly provided by Véronique Moriceau of the LIMSI-CNRS [Resources]
  • added: Support for the IntervalTagger in HeidelTime Standalone [Standalone]
  • added: Option to choose either StanfordPOSTagger or TreeTagger as HeidelTime Standalone's preprocessing engine [Standalone]
  • added: Interval resources to Vietnamese [Resources]
  • fixed: Improvements in German, English and Vietnamese resources [Resources]
  • added: Rudimentary Maven support [Meta]

Version 1.4.1

SHA: 07c89a5

  • fixed: A bug that would prevent HeidelTime Standalone from loading resources under Windows [Standalone]

Version 1.4

SHA: 07c89a5

  • added: Support for Spanish and Arabic document processing via standalone [Standalone]
  • added: Some more error handling for unexpected user input [Core]
  • fixed: Made fixes and alterations to Spanish, Italian, German, Vietnamese and Arabic resources [Resources]
  • fixed: A bug where underspecified centuries ("UNDEF-century") in resources would break processing [Core]
  • fixed: Made several improvements to the StanfordPOSTagger to work with more unconventional documents
  • fixed: Erroneous normalization of underspecified centuries; it now works according to the TIMEX standard [Resources]
  • fixed: The Windows printResourceInformation.bat script now works with paths that contain spaces [Resources]

Version 1.3

SHA: 1d2fdfa

  • added: Resources for Spanish, Italian, Vietnamese and Arabic [Resources]
  • added: TreeTaggerWrapper, a sentence-tokenization, word-tokenization and part-of-speech tagging wrapper for the popular TreeTagger. It replaces the DKPro Analysis Engines as the preprocessing component.
  • added: Support for regular expressions in normalization resources [Core]
  • added: A new annotator, IntervalTagger, that recognizes interval expressions
  • added: New UIMA Collection Reader and Consumer for our participation in the TempEval-3 challenge: TempEval3Reader and TempEval3Writer
  • added: A UIMA Analysis Engine (JVnTextProWrapper) that leverages the JVnTextPro tool to produce word and sentence tokenization as well as part-of-speech tagging for Vietnamese
  • added: A switch (-c) that allows passing the path of the config.props file (issue 3 (on Google Code)) [Standalone]
  • added: Sub-processor priorities to influence when they are being run [Core]
  • added: Switches (-v/-vv) to control the verbosity of logging messages (issue 4 (on Google Code)) [Standalone]
  • added: A descriptor parameter as well as command line switch (-locale) to specify a locale for HeidelTime to base relative date calculation on (issue 1 (on Google Code)) [Core/Standalone]
  • added: setenv and printResourceInformation batch files for Windows
  • added: An optional setting in the ACETernWriter descriptor to prevent conversion from Timex3 to Timex2
  • fixed: Charset in Dutch resources [Resources]
  • fixed: Behaviour when non-hardcoded languages are supplied: The resource folder is assumed to be the name of the language [Core]
  • fixed: Token boundary detection [Core]
  • fixed: Typos in English resources [Resources]
  • fixed: A bug where two overlapping temporal expressions would break XML-conformity (issue 5 (on Google Code)) [Standalone]
  • fixed: A bug that would break resource loading on the Mac OS platform [Core]
  • fixed: TempEval2 Reader now works properly with the Italian TempEval2 corpus
  • fixed: ACE Tern reader recognizes DCTs from the ICAB corpus
  • fixed: A bug in TempEval2Writer that would break the output of parallelized UIMA workflows
  • fixed: A bug where the OutputType was ignored when HeidelTime Standalone was used programmatically (issue 7 (on Google Code)) [Standalone]
  • fixed: Several issues in various places that made it hard to use HeidelTimeStandalone programmatically (among others, issue 6 (on Google Code)) [Standalone]

Version 1.2

SHA: f1b7a4f

  • added: links to the journal paper to the readme file
  • fixed: TIMEX3 SET expressions not being translated to TIMEX2 expressions correctly [ACETernWriter]
  • removed: dead code

Version 1.1

SHA: f814184

  • added: support code and English resources for two additional domains: COLLOQUIAL and SCIENTIFIC
  • fixed: an incorrect DCT recognition regex
  • fixed: standalone not recognizing parameters when not all upper case

Version 1.0

SHA: 7dc7ab5

  • conducted major code overhaul, mostly modularization-based refactoring
  • merged the previously independent heideltime-standalone project into the kit repository
  • fixed: made logger message source more apparent

Version 0

SHA: 38cad56

  • added: support to read the ACE Tern 2005 Corpus
  • fixed: regex recognition of strings of the form "1995-1996"
  • added: recognition logic for holidays together with English and German resources

Initial Version

SHA: 8a1262f

This release represents the state of HeidelTime's development before the use of a revision control system, i.e., the version that was available from the dbs.ifi.uni-heidelberg.de page.