The UD Tamil treebank is based on the Tamil Dependency Treebank, created at Charles University in Prague by Loganathan Ramasamy.

The treebank has been part of HamleDT, a collection of treebanks converted to the Prague dependency style, since 2011. Later versions of HamleDT added conversions to Stanford dependencies (2014) and to Universal Dependencies (HamleDT 3.0, 2015). The first release of Universal Dependencies to include this treebank was UD v1.2, published in November 2015. It is essentially the HamleDT conversion, but the data are not identical to HamleDT 3.0 because the conversion procedure has since been improved.

  • 2018-04-15 v2.2
    • Repository renamed from UD_Tamil to UD_Tamil-TTB.
    • Added an enhanced representation of dependencies propagated across coordination. The distinction between shared and private dependents is derived deterministically from the original Prague annotation.
  • 2017-03-01 v2.0
    • Converted to UD v2 guidelines.
    • Reconsidered the PRON vs. DET distinction.
    • Improved the advmod vs. obl distinction.
  • 2016-05-15 v1.3
    • Added Latin transliteration of lemmas and full sentences.
    • Added orthographic words (surface tokens) and their mapping to nodes.
    • Improved the conversion of AuxY.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v1.2
License: CC BY-NC-SA 3.0
Includes text: yes
Genre: news
Lemmas: converted from manual
UPOS: converted from manual
XPOS: manual native
Features: converted from manual
Relations: converted from manual
Contributors: Ramasamy, Loganathan; Zeman, Daniel
Contributing: elsewhere
