Summary

Simplified Chinese Universal Dependencies dataset, converted from the (traditional Chinese) GSD dataset with manual corrections.

Introduction

This is a simplified Chinese version of the UD Chinese GSD treebank. It was first converted automatically from traditional to simplified Chinese with the OpenCC tool, using additional patterns to map punctuation, and then corrected by hand.
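The README does not list the punctuation patterns themselves. As a hedged illustration only: traditional Chinese text commonly uses the corner brackets 「」 and 『』 as quotation marks, while simplified (mainland) text usually uses “” and ‘’, and a character-level converter may leave these untouched. A minimal sketch of one such pattern, assuming a plain character table applied after OpenCC has converted the characters; `QUOTE_MAP` and `map_punctuation` are hypothetical names, not this treebank's actual conversion script:

```python
# Hypothetical punctuation table (assumption): corner-bracket quotes, common in
# traditional Chinese text, mapped to the curly quotes customary in simplified text.
# The real patterns used for UD_Chinese-GSDSimp are not given in this README.
QUOTE_MAP = str.maketrans({
    "「": "“",  # opening double quote
    "」": "”",  # closing double quote
    "『": "‘",  # opening single (nested) quote
    "』": "’",  # closing single (nested) quote
})


def map_punctuation(text: str) -> str:
    """Replace corner-bracket quotes with curly quotes; all other characters pass through."""
    return text.translate(QUOTE_MAP)


if __name__ == "__main__":
    # Input is assumed to be already character-converted (e.g. by OpenCC).
    print(map_punctuation("他说「你好」"))
```

A single `str.translate` pass keeps the mapping one-to-one per character; context-dependent cases (for example, deciding where East Asian punctuation is "appropriate") would still need the manual review described above.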

Changelog

  • 2019-11-15 v2.5
    • Initial release in Universal Dependencies, converted from UD_Chinese-GSD.
    • Google gave permission to drop the "NC" restriction from the license. This applies to the UD annotations (not the underlying content, of which Google claims no ownership or copyright).
    • Fixed punctuation (use East Asian punctuation where appropriate).
    • Fixed various parses and features (e.g., added Case=Ord).
    • Made some manual fixes to tokenization.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.5
License: CC BY-SA 4.0
Includes text: yes
Genre: wiki
Lemmas: converted from manual
UPOS: converted with corrections
XPOS: manual native
Features: automatic with corrections
Relations: converted from manual
Contributors: Qi, Peng; Yasuoka, Koichi
Contributing: here
Contact: pengqi@cs.stanford.edu
===============================================================================
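The metadata block above is a run of `Key: value` lines between a start rule and an end rule. A minimal sketch of how a script might read it, assuming only the layout shown in this README; `parse_ud_metadata` and `SAMPLE` are hypothetical names for illustration, not part of any official UD tooling:

```python
from typing import Dict

# A few lines copied from the metadata block above, used as sample input.
SAMPLE = """\
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.5
License: CC BY-SA 4.0
Contact: pengqi@cs.stanford.edu
===============================================================================
"""


def parse_ud_metadata(readme_text: str) -> Dict[str, str]:
    """Collect 'Key: value' pairs between the metadata start line and the closing rule."""
    metadata: Dict[str, str] = {}
    inside = False
    for line in readme_text.splitlines():
        if line.startswith("=== Machine-readable metadata"):
            inside = True  # start of the block; the header line itself holds no pair
            continue
        if inside and line.startswith("====="):
            break  # closing rule line ends the block
        if inside and ":" in line:
            # Split on the first colon only, so values may contain further colons.
            key, value = line.split(":", 1)
            metadata[key.strip()] = value.strip()
    return metadata


if __name__ == "__main__":
    print(parse_ud_metadata(SAMPLE))
```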