Tagging of CoNLL-X files and bugfixes

@danieldk released this Oct 1, 2015 · 3 commits to master since this release

Changes in this release:

  • Fix a bug where the start/end markers could be used when handling unknown tokens (typically an unseen punctuation character). This change does not require retraining.
  • Add a utility jitar-tag-conllx to tag files that are in the CoNLL-X format. This preserves all other columns.
  • Compute interpolated scores only once.
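
As an illustration of what "preserves all other columns" means for CoNLL-X input, here is a minimal sketch. It is not the code of jitar-tag-conllx: the class and method names are hypothetical, and it assumes that the predicted tag is written to the POSTAG column (the fifth tab-separated field), which the release notes do not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jitar.
class ConllxTagSketch {
    // Zero-based index of POSTAG in a CoNLL-X line:
    // ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL.
    // Assumption: this is the column the tagger output goes to.
    static final int POSTAG_COLUMN = 4;

    /**
     * Replace the POSTAG column of every token line of one sentence,
     * leaving all other columns exactly as they were.
     */
    static List<String> replaceTags(List<String> tokenLines, List<String> tags) {
        if (tokenLines.size() != tags.size()) {
            throw new IllegalArgumentException("Need exactly one tag per token line");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < tokenLines.size(); i++) {
            // A limit of -1 keeps trailing empty columns intact.
            String[] columns = tokenLines.get(i).split("\t", -1);
            if (columns.length <= POSTAG_COLUMN) {
                throw new IllegalArgumentException("Too few columns: " + tokenLines.get(i));
            }
            columns[POSTAG_COLUMN] = tags.get(i);
            result.add(String.join("\t", columns));
        }
        return result;
    }
}
```

A caller would read one sentence (the token lines up to the next blank line), pass the FORM column to the tagger, and hand the resulting tags to `replaceTags` before writing the lines back out.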

It's Christmas!

@danieldk released this Jul 31, 2014 · 23 commits to master since this release

Changes compared to Jitar 0.1.0:

  • Add a capitalization marking to tags (as per the TnT paper). This gives an improvement of around 0.2% on German and English.
  • Add a separate unknown word distribution for words containing a dash. This provides a modest improvement for English and German.
  • API simplification (no more need to use/specify start and end markers).
  • Java-style corpus readers.
  • Unified training and tagging data structures.
  • Add a utility for 10-fold cross-validation.

The changes break existing models, so you should retrain your model when switching to Jitar 0.3.0.
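
The capitalization marking from the TnT paper amounts to doubling the tagset: each tag is paired with a flag that says whether the word it belongs to is capitalized. The sketch below shows the idea only. The class name, the method names, and the "|C" / "|c" suffixes are hypothetical and say nothing about how Jitar represents marked tags internally.

```java
// Hypothetical illustration, not part of Jitar.
class CapitalizedTagSketch {
    /**
     * Extend a tag with a capitalization flag derived from the word.
     * The "|C" / "|c" suffixes are an assumption made for this sketch.
     */
    static String markTag(String word, String tag) {
        boolean capitalized =
            !word.isEmpty() && Character.isUpperCase(word.codePointAt(0));
        return tag + (capitalized ? "|C" : "|c");
    }

    /** Recover the plain tag, for example before printing tagger output. */
    static String stripMark(String markedTag) {
        int separator = markedTag.lastIndexOf('|');
        return separator < 0 ? markedTag : markedTag.substring(0, separator);
    }
}
```

Because the marked tags form a different tag inventory from the plain ones, a model trained on one cannot be used with the other, which is consistent with the need to retrain.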

Jitar 0.1.0

@danieldk released this Oct 3, 2013 · 56 commits to master since this release

jitar-0.1.0

[maven-release-plugin]  copy for tag jitar-0.1.0
