Changes in this release:
- Fix a bug where the start/end markers could be used when handling unknown tokens (typically an unseen punctuation character). This change does not require retraining.
- Add a utility, jitar-tag-conllx, for tagging files in the CoNLL-X format. All other columns are preserved.
- Compute interpolated scores only once.
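To illustrate what preserving all other columns means for CoNLL-X input, here is a minimal sketch. It is not the code of jitar-tag-conllx: the class name ConllxColumns and its methods are hypothetical, and it assumes that the predicted tag is written to the POSTAG column (the fifth tab-separated field). Each token line is split on tabs, one field is overwritten, and the line is joined again so that every other field is carried through unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jitar: rewrites one column of CoNLL-X token lines.
final class ConllxColumns {
    // CoNLL-X token lines are tab-separated: ID, FORM, LEMMA, CPOSTAG, POSTAG, ...
    static final int FORM = 1;
    static final int POSTAG = 4; // Assumption: the column that receives the predicted tag.

    // Extracts the word forms of one sentence; these would be handed to a tagger.
    static List<String> forms(List<String> sentenceLines) {
        List<String> forms = new ArrayList<>();
        for (String line : sentenceLines) {
            forms.add(columns(line)[FORM]);
        }
        return forms;
    }

    // Returns copies of the token lines with POSTAG replaced and all other columns untouched.
    static List<String> withTags(List<String> sentenceLines, List<String> tags) {
        if (sentenceLines.size() != tags.size()) {
            throw new IllegalArgumentException("Expected one tag per token line");
        }
        List<String> tagged = new ArrayList<>();
        for (int i = 0; i < sentenceLines.size(); i++) {
            String[] cols = columns(sentenceLines.get(i));
            cols[POSTAG] = tags.get(i);
            tagged.add(String.join("\t", cols));
        }
        return tagged;
    }

    // Splits a token line; the limit -1 keeps trailing empty fields.
    private static String[] columns(String line) {
        String[] cols = line.split("\t", -1);
        if (cols.length <= POSTAG) {
            throw new IllegalArgumentException("Too few columns: " + line);
        }
        return cols;
    }
}
```

In this sketch, blank lines separating sentences would be handled by the caller, which passes one sentence at a time to forms and withTags.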
Changes compared to Jitar 0.1.0:
- Add a capitalization marking to tags (as per the TnT paper). This gives an improvement of around 0.2% on German and English.
- Add a separate unknown word distribution for words containing a dash. This provides a modest improvement for English and German.
- Simplify the API: start and end markers no longer need to be used or specified.
- Java-style corpus readers.
- Unified training and tagging data structures.
- Add a utility for 10-fold cross-validation.
The changes break existing models, so you should retrain your model when switching to Jitar 0.3.0.
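For readers unfamiliar with the 10-fold cross-validation mentioned in the list above, the following sketch shows one common way to split a corpus into folds. It is an illustration only and not the implementation of the Jitar utility: the class CrossValidationFolds is hypothetical, and assigning sentences to folds by their index modulo k is an assumption. For each fold, the held-out sentences are used for evaluation and a fresh model is trained on the rest.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jitar: splits items into k cross-validation folds.
final class CrossValidationFolds {
    // Items held out for evaluation in the given fold: those whose index i satisfies i % k == fold.
    static <T> List<T> heldOut(List<T> items, int k, int fold) {
        return select(items, k, fold, true);
    }

    // Items used for training in the given fold: all items that are not held out.
    static <T> List<T> training(List<T> items, int k, int fold) {
        return select(items, k, fold, false);
    }

    private static <T> List<T> select(List<T> items, int k, int fold, boolean inFold) {
        if (k <= 0 || fold < 0 || fold >= k) {
            throw new IllegalArgumentException("Need k > 0 and 0 <= fold < k");
        }
        List<T> selected = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            if ((i % k == fold) == inFold) {
                selected.add(items.get(i));
            }
        }
        return selected;
    }
}
```

With k = 10, a caller would loop over fold = 0 to 9, train on training(sentences, 10, fold), tag heldOut(sentences, 10, fold), and average the ten accuracy figures.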