Summary

The Google Korean Universal Dependency Treebank was first converted from the Universal Dependency Treebank v2.0 (legacy) and then enhanced by Chun et al. (2018).

Acknowledgements

This is a collaborative work by (in alphabetical order):

Jinho Choi, Emory University
Jayeol Chun, Emory University
Na-Rae Han, University of Pittsburgh
Jena D. Hwang, Institute for Human & Machine Cognition
Ryan McDonald, Google Research
Joakim Nivre, Uppsala University
Daniel Zeman, Institute of Formal and Applied Linguistics

The project repository: https://github.com/emorynlp/ud-korean

Citation

Building Universal Dependency Treebanks in Korean, Jayeol Chun, Na-Rae Han, Jena D. Hwang, and Jinho D. Choi. In Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC'18, Miyazaki, Japan, 2018.

Changelog

2022-11-15 v2.11
- Fixed right-headed apposition and non-projective punctuation.
- Symbols after numbers are units, not punctuation.
- Nouns cannot be attached as mark.
- Fixed: genitive nouns are nmod:poss, not det:poss.
- Fixed: adverbially used nominals are obl, not advmod.
- Fixed: adverbially used verbs are advcl, not advmod.
- The positive copula is always lemmatized as 이 so that the validator recognizes it.
- Fixed auxiliaries.
- Fixed: function words should be leaves.
2019-11-15 v2.5
- Google gave permission to drop the "NC" restriction from the license. This applies to the UD annotations (not the underlying content, of which Google claims no ownership or copyright).
2018-04-15 v2.2
- Significant rework, fixed some annotation errors.
- Added lemmas and fine-grained part-of-speech tags automatically generated by the KOMA morphological analyzer.
2017-11-15 v2.1
- No changes.
2017-03-01 v2.0
- Initial UD release.
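Several v2.11 changelog entries describe conventions that can be spot-checked directly in the CoNLL-U files, for example that function words should be leaves and that the copula is lemmatized as 이. The Python sketch below is a rough illustration of such a check and is not the official UD validator. The relation sets FUNCTION_DEPRELS and ALLOWED_CHILD_DEPRELS, the helper names, and the toy sentence (including its tokenization) are assumptions made for illustration, not taken from the treebank.

```python
from collections import Counter

# Assumption for this sketch: base relations whose tokens count as function words.
FUNCTION_DEPRELS = {"aux", "cop", "mark", "case", "cc", "det"}
# Assumption for this sketch: child relations tolerated under a function word.
ALLOWED_CHILD_DEPRELS = {"fixed", "goeswith", "conj", "punct"}

# CoNLL-U column indices used below (ID, LEMMA, HEAD, DEPREL).
ID, LEMMA, HEAD, DEPREL = 0, 2, 6, 7


def read_conllu(lines):
    """Yield sentences as lists of 10-column token rows.

    Comment lines, multiword-token ranges (IDs like "3-4") and
    empty nodes (IDs like "5.1") are skipped.
    """
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[ID] or "." in cols[ID]:
            continue
        sentence.append(cols)
    if sentence:
        yield sentence


def base_rel(deprel):
    """Strip a subtype, e.g. 'nmod:poss' -> 'nmod'."""
    return deprel.split(":")[0]


def check_sentence(sentence):
    """Return (lemma counts of cop tokens, sorted IDs of function words with dependents)."""
    by_id = {cols[ID]: cols for cols in sentence}
    copula_lemmas = Counter()
    non_leaf = set()
    for cols in sentence:
        if base_rel(cols[DEPREL]) == "cop":
            copula_lemmas[cols[LEMMA]] += 1
        head = by_id.get(cols[HEAD])  # None for the root (HEAD 0)
        if (head is not None
                and base_rel(head[DEPREL]) in FUNCTION_DEPRELS
                and base_rel(cols[DEPREL]) not in ALLOWED_CHILD_DEPRELS):
            non_leaf.add(head[ID])
    return copula_lemmas, sorted(non_leaf, key=int)


# Hypothetical toy sentence (not taken from the treebank).
TOY_ROWS = [
    ["1", "그는", "그", "PRON", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "학생", "학생", "NOUN", "_", "_", "0", "root", "_", "_"],
    ["3", "이다", "이", "AUX", "_", "_", "2", "cop", "_", "_"],
    ["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]
TOY_LINES = ["# sent_id = toy-1"] + ["\t".join(r) for r in TOY_ROWS] + [""]

if __name__ == "__main__":
    for sent in read_conllu(TOY_LINES):
        lemmas, bad = check_sentence(sent)
        print(dict(lemmas), bad)
```

Running it prints {'이': 1} [], because the only copula is lemmatized as 이 and no function word in the toy sentence has a dependent. To scan real data, read_conllu can be given an open file handle for one of the treebank's .conllu files instead of TOY_LINES.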

===================================
Universal Dependency Treebanks v2.0
(legacy information)
===================================

=========================
Licenses and terms-of-use
=========================

For the following languages

  German, Spanish, French, Indonesian, Italian, Japanese, Korean and Brazilian
  Portuguese

we will distinguish between two portions of the data.

1. The underlying text of the annotated sentences. Google asserts no ownership
   of and no copyright over this data. Some or all of these
   sentences may be copyrighted in some jurisdictions.  Where copyrighted,
   Google collected these sentences under exceptions to copyright or implied
   license rights.  GOOGLE MAKES THEM AVAILABLE TO YOU 'AS IS', WITHOUT ANY
   WARRANTY OF ANY KIND, WHETHER EXPRESS OR IMPLIED.

2. The annotations -- part-of-speech tags and dependency annotations. These are
   made available under a CC BY-SA 4.0 license. GOOGLE MAKES
   THEM AVAILABLE TO YOU 'AS IS', WITHOUT ANY WARRANTY OF ANY KIND, WHETHER
   EXPRESS OR IMPLIED. See the attached LICENSE file for the text of CC BY-SA 4.0.

Portions of the German data were sampled from the CoNLL 2006 Tiger Treebank
data. Hans Uszkoreit graciously gave permission to use the underlying
sentences in this data as part of this release.

Any use of the data should reference the above plus:

  Universal Dependency Annotation for Multilingual Parsing
  Ryan McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg,
  Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang,
  Oscar Täckström, Claudia Bedini, Núria Bertomeu Castelló and Jungmee Lee
  Proceedings of ACL 2013

=======
Contact
=======

ryanmcd@google.com
joakim.nivre@lingfil.uu.se
slav@google.com
See https://github.com/ryanmcd/uni-dep-tb for more details

=== Machine-readable metadata =================================================
Data available since: UD v2.0
License: CC BY-SA 4.0
Includes text: yes
Genre: news blog
Lemmas: automatic
UPOS: converted from manual
XPOS: automatic
Features: not available
Relations: converted from manual
Contributors: McDonald, Ryan; Nivre, Joakim; Zeman, Daniel; Choi, Jinho; Han, Na-Rae; Hwang, Jena; Chun, Jayeol
Contributing: here
Contact: jinho.choi@emory.edu
===============================================================================
(Original treebank contributors: LaMontagne, Adam; Souček, Milan; Järvinen, Timo; Radici, Alessandra)
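The machine-readable metadata block above consists of simple "Key: value" lines between an opening "=== Machine-readable metadata" line and a closing line of "=" characters. A minimal Python sketch for extracting those fields is shown below, assuming exactly that layout. The function names parse_metadata and split_contributors are hypothetical, and this is not the tooling the UD project itself uses to read the block.

```python
def parse_metadata(readme_text):
    """Collect "Key: value" pairs from the machine-readable metadata block.

    The block starts at a line beginning with "=== Machine-readable metadata"
    and ends at the next line consisting only of "=" characters.
    """
    meta = {}
    inside = False
    for line in readme_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("=== Machine-readable metadata"):
            inside = True
            continue
        if not inside:
            continue
        if stripped and set(stripped) == {"="}:
            break  # closing rule of the block
        if ":" in stripped:
            key, value = stripped.split(":", 1)  # split on the first colon only
            meta[key.strip()] = value.strip()
    return meta


def split_contributors(meta):
    """Split the semicolon-separated Contributors field into names."""
    raw = meta.get("Contributors", "")
    return [name.strip() for name in raw.split(";") if name.strip()]


# Excerpt of the block above, used as sample input.
SAMPLE = """=== Machine-readable metadata ===
Data available since: UD v2.0
License: CC BY-SA 4.0
Contributors: McDonald, Ryan; Nivre, Joakim
===============
(Original treebank contributors: LaMontagne, Adam; Souček, Milan; Järvinen, Timo; Radici, Alessandra)
"""

if __name__ == "__main__":
    meta = parse_metadata(SAMPLE)
    print(meta["License"], split_contributors(meta))
```

Splitting only on the first colon keeps values such as "McDonald, Ryan; Nivre, Joakim" intact, and stopping at the closing rule keeps the trailing "(Original treebank contributors: ...)" line out of the parsed fields.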
