The UD Turkish Treebank, also called the IMST-UD Treebank, is a semi-automatic conversion of the IMST Treebank (Sulubacak et al., 2016), which is itself a reannotated version of the METU-Sabancı Turkish Treebank (Oflazer et al., 2003). All three treebanks share the same raw data: a set of 5,635 sentences collected from daily news reports and novels.
This treebank follows a set of morphosyntactic annotation guidelines based on those established by Çağrı Çöltekin, and later revised and restructured by Memduh Gökırmak, Francis Tyers, and Umut Sulubacak. The conversion from the IMST Treebank was done by Umut Sulubacak. The contributors would also like to thank Birsel Karakoç, Hüner Kaşıkara, and Tuğba Pamay for their discussions and insights.
- UD 2.8
  - The word "bir", if used as a determiner, gets corresponding UPOS and features.
  - Attachment of punctuation fixed using Udapi ud.FixPunct.
  - A number of other validation errors fixed.
  - Undocumented Aspect=DurPerf changed to Aspect=Dur.
- UD 2.4
  - Moved around a few sentences so that both dev and test have over 10K words again.
- UD 2.2
  - Repository renamed from UD_Turkish to UD_Turkish-IMST.
- UD 2.1
  - No change.
- UD 2.0
  - Conversion to UD v2 guidelines.
- UD 1.4
  - Fixed annotation and spelling mistakes in generated forms of multiword tokens.
- UD 1.3
  - First release in UD.
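
The UD 2.4 entry refers to the dev and test sets each having over 10K words. As a rough illustration of how such a count could be checked, here is a minimal sketch that counts sentences and syntactic words in CoNLL-U text. It assumes the standard CoNLL-U layout (tab-separated columns, `#` comment lines, blank lines between sentences) and skips multiword token ranges such as `1-2` and empty nodes such as `1.1`. The helper names `row` and `count_words` and the sample sentences are made up for this example and are not taken from the treebank.

```python
def row(tok_id, form):
    # Build a 10-column CoNLL-U line; unused columns are left as "_".
    return "\t".join([tok_id, form] + ["_"] * 8)


def count_words(conllu_text):
    """Return (sentence_count, word_count) for a CoNLL-U string."""
    sentences = 0
    words = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment such as "# text = ...".
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        tok_id = line.split("\t")[0]
        # Plain integer IDs are syntactic words; "1-2" (multiword token
        # range) and "1.1" (empty node) are not counted.
        if tok_id.isdigit():
            words += 1
    return sentences, words


# Hypothetical two-sentence sample, not real treebank data.
SAMPLE = "\n".join([
    "# text = Evdeydi.",
    row("1-2", "Evdeydi"),
    row("1", "Evde"),
    row("2", "ydi"),
    row("3", "."),
    "",
    "# text = Geldi.",
    row("1", "Geldi"),
    row("2", "."),
    "",
])

if __name__ == "__main__":
    print(count_words(SAMPLE))
```

To apply this to a real file, read it as UTF-8 (for example `open(path, encoding="utf-8").read()`, where `path` points to a dev or test file) and pass the contents to `count_words`.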