
Releases: Lyn4ever29/GuwenEE

GuwenEE语料库

03 Jun 01:45
fee72b4

This corpus is an event extraction corpus for Classical (ancient) Chinese. The raw data come from the Twenty-Four Histories (《二十四史》). Sentences were randomly sampled from these texts for annotation, and the annotations were produced by combining large language models with manual annotation. The corpus contains 1,000 Classical Chinese sentences, 7 event categories (first-level classes), 72 event types (second-level classes), and 1,928 events.
The data files are:

  • GuwenEE.json
    • The full corpus
  • GuwenEE_Event_Schema.json
    • The event schema
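The release notes do not document the internal layout of GuwenEE.json. The Python snippet below is a minimal sketch for a first look at the data. It assumes a hypothetical layout: a JSON array of sentence records, where each record has an `events` list and each event carries an `event_type` field. From that it tallies sentences, events, and distinct event types. The field names `text`, `events`, and `event_type` and the helper names `load_corpus` and `corpus_stats` are illustrative assumptions. Check them against the actual file and GuwenEE_Event_Schema.json before relying on the sketch.

```python
import json
from collections import Counter
from pathlib import Path


def load_corpus(path):
    """Read a JSON file and return the parsed object.

    Assumption: GuwenEE.json holds a single JSON array of sentence records.
    If the real file is JSON Lines instead, parse it line by line.
    """
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def corpus_stats(records):
    """Count sentences, events, and distinct event types.

    Hypothetical record layout (NOT confirmed by the release notes):
        {"text": "...", "events": [{"event_type": "...", ...}, ...]}
    Records without an "events" key are counted as sentences with no events.
    """
    type_counts = Counter(
        event["event_type"]
        for record in records
        for event in record.get("events", [])
    )
    return {
        "sentences": len(records),
        "events": sum(type_counts.values()),
        "event_types": len(type_counts),
        "type_counts": type_counts,
    }


if __name__ == "__main__":
    # Tiny in-memory sample using the assumed layout and placeholder type names.
    # Replace with load_corpus("GuwenEE.json") once the real field names are confirmed.
    sample = [
        {"text": "...", "events": [{"event_type": "type_a"}, {"event_type": "type_b"}]},
        {"text": "...", "events": [{"event_type": "type_a"}]},
    ]
    stats = corpus_stats(sample)
    print(stats["sentences"], stats["events"], stats["event_types"])
```

If the assumed layout matches the real file, running `corpus_stats` on the full corpus should report 1,000 sentences and 1,928 events, in line with the figures given in the release notes. This makes it a quick sanity check after download.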