The Google Korean Universal Dependency Treebank was first converted from the Universal Dependency Treebank v2.0 (legacy) and then enhanced by Chun et al. (2018).
This is a collaborative work by (in alphabetical order):
- Jinho Choi, Emory University
- Jayeol Chun, Emory University
- Na-Rae Han, University of Pittsburgh
- Jena D. Hwang, Institute for Human & Machine Cognition
- Ryan McDonald, Google Research
- Joakim Nivre, Uppsala University
- Daniel Zeman, Institute of Formal and Applied Linguistics
The project repository: https://github.com/emorynlp/ud-korean
- Building Universal Dependency Treebanks in Korean, Jayeol Chun, Na-Rae Han, Jena D. Hwang, and Jinho D. Choi. In Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC'18, Miyazaki, Japan, 2018.
- 2019-11-15 v2.5
  - Google gave permission to drop the "NC" restriction from the license. This applies to the UD annotations (not the underlying content, of which Google claims no ownership or copyright).
- 2018-04-15 v2.2
  - Significant rework, fixed some annotation errors.
  - Added lemmas and fine-grained part-of-speech tags automatically generated by the KOMA morphological analyzer.
- 2017-11-15 v2.1
  - No changes.
- 2017-03-01 v2.0
  - Initial UD release.
===================================
Universal Dependency Treebanks v2.0
(legacy information)
===================================

=========================
Licenses and terms-of-use
=========================

For the following languages

  German, Spanish, French, Indonesian, Italian, Japanese, Korean and Brazilian
  Portuguese

we will distinguish between two portions of the data.

1. The underlying text for sentences that were annotated. This data Google
   asserts no ownership over and no copyright over. Some or all of these
   sentences may be copyrighted in some jurisdictions. Where copyrighted,
   Google collected these sentences under exceptions to copyright or implied
   license rights. GOOGLE MAKES THEM AVAILABLE TO YOU 'AS IS', WITHOUT ANY
   WARRANTY OF ANY KIND, WHETHER EXPRESS OR IMPLIED.

2. The annotations -- part-of-speech tags and dependency annotations. These are
   made available under a CC BY-SA 4.0 license. GOOGLE MAKES
   THEM AVAILABLE TO YOU 'AS IS', WITHOUT ANY WARRANTY OF ANY KIND, WHETHER
   EXPRESS OR IMPLIED. See attached LICENSE file for the text of CC BY-SA 4.0.

Portions of the German data were sampled from the CoNLL 2006 Tiger Treebank
data. Hans Uszkoreit graciously gave permission to use the underlying
sentences in this data as part of this release.

Any use of the data should reference the above plus:

  Universal Dependency Annotation for Multilingual Parsing
  Ryan McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg,
  Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang,
  Oscar Tackstrom, Claudia Bedini, Nuria Bertomeu Castello and Jungmee Lee
  Proceedings of ACL 2013

=======
Contact
=======

firstname.lastname@example.org
email@example.com
firstname.lastname@example.org

See https://github.com/ryanmcd/uni-dep-tb for more details
=== Machine-readable metadata =================================================
Data available since: UD v2.0
License: CC BY-SA 4.0
Includes text: yes
Genre: news blog
Lemmas: automatic
UPOS: converted from manual
XPOS: automatic
Features: not available
Relations: converted from manual
Contributors: McDonald, Ryan; Nivre, Joakim; Zeman, Daniel; Choi, Jinho; Han, Na-Rae; Hwang, Jena; Chun, Jayeol
Contributing: here
Contact: email@example.com
===============================================================================
(Original treebank contributors: LaMontagne, Adam; Souček, Milan; Järvinen, Timo; Radici, Alessandra)
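
Since the metadata block is intended to be read by programs, a minimal sketch of how it might be parsed is given here. It assumes the block is laid out with one `Key: value` pair per line between the two delimiter lines. The function name `parse_ud_metadata` and the shortened `sample` string are illustrative assumptions, not part of any official UD tooling.

```python
def parse_ud_metadata(readme_text):
    """Collect `Key: value` pairs from the machine-readable metadata block.

    Hypothetical helper: assumes one pair per line between the opening
    '=== Machine-readable metadata' line and a closing line of '=' characters.
    """
    metadata = {}
    inside = False
    for line in readme_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("=== Machine-readable metadata"):
            inside = True
            continue
        if inside and stripped and set(stripped) == {"="}:
            break  # closing delimiter line ends the block
        if inside and ":" in stripped:
            # Split on the first colon only, so values may contain colons.
            key, value = stripped.split(":", 1)
            metadata[key.strip()] = value.strip()
    return metadata


# Shortened sample for illustration; a real README carries more fields.
sample = """\
=== Machine-readable metadata =================================================
Data available since: UD v2.0
License: CC BY-SA 4.0
Genre: news blog
Lemmas: automatic
===============================================================================
"""

meta = parse_ud_metadata(sample)
genres = meta["Genre"].split()  # genres are space-separated in this block
print(meta["License"], genres)
```

Splitting on the first colon only keeps the sketch tolerant of values that themselves contain a colon; multi-valued fields such as `Genre` or `Contributors` would still need their own separator handling, as shown for `Genre`.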