
Teddy-Li/CFIGER

 
 


CFIGER: Chinese Fine-Grained entity typing under FIGER ontology

A repository for a Chinese fine-grained entity typing dataset based on the FIGER ontology. This repository is part of the software release of our paper Cross-lingual Inference with a Chinese Entailment Graph. The dataset is based on A Chinese Corpus for Fine-grained Entity Typing.

Annotation Process

The dataset was annotated through label mapping: we manually mapped each of the ~6000 ultra-fine-grained types to a FIGER type; for more details, please check our paper. The resulting mappings are here and should be placed under ./u2figer. The resulting re-annotated dataset is here; unzip the file and put its contents under the root directory.
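
As a rough illustration of what label mapping does to a single example, here is a minimal sketch. The dictionary entries and the function name `map_to_figer` are hypothetical and chosen for illustration only; the actual mappings are the released files under ./u2figer, and their on-disk format may differ from a plain dictionary.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping entries, for illustration only. The real mappings
# are the files released under ./u2figer and may be formatted differently.
U2FIGER: Dict[str, str] = {
    "歌手": "/person/artist",
    "城市": "/location/city",
    "公司": "/organization/company",
}


def map_to_figer(ultra_fine_types: Iterable[str],
                 mapping: Dict[str, str]) -> List[str]:
    """Map ultra-fine-grained labels to FIGER labels.

    Labels without a mapping are dropped, duplicates are removed,
    and the order of first appearance is preserved.
    """
    seen = set()
    figer_types: List[str] = []
    for label in ultra_fine_types:
        figer = mapping.get(label)
        if figer is not None and figer not in seen:
            seen.add(figer)
            figer_types.append(figer)
    return figer_types


# One mention annotated with three ultra-fine-grained labels,
# one of which ("音乐人") has no entry in the toy mapping.
print(map_to_figer(["歌手", "音乐人", "城市"], U2FIGER))
```

How unmapped labels are actually treated in the released data is described in the paper; dropping them here is an assumption of this sketch.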

Baselines

We updated the CFET baseline in accordance with our re-annotated data. To run the baseline, take the following steps:

  1. Download the Chinese fastText model here;
  2. Run preprocess.py in the embed, data and pred modes in turn; remember to set the correct path to the downloaded fastText model;
  3. Train with python train.py; configurations can be set in config.py;
  4. For inference on datasets in other domains, please refer to predict.py.
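
For readers unfamiliar with the fastText vectors used in step 2, the sketch below parses fastText's textual `.vec` format: a header line with the word count and dimension, followed by one word and its vector per line. It is not the repository's preprocess.py. It assumes the textual `.vec` file rather than the binary `.bin` model, and the function name `read_vec` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def read_vec(lines: Iterable[str]) -> Tuple[int, Dict[str, List[float]]]:
    """Parse fastText's textual .vec format from an iterable of lines.

    The first line is "<word_count> <dim>"; each following line is
    "<word> <v1> ... <v_dim>". Rows with the wrong length are skipped.
    """
    it = iter(lines)
    header = next(it).split()
    dim = int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # malformed row (e.g. a token containing spaces)
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


# Tiny in-memory example with made-up 3-dimensional vectors.
sample = ["2 3\n", "中国 0.1 0.2 0.3\n", "城市 1 2 3 \n"]
dim, vecs = read_vec(sample)
print(dim, sorted(vecs))
```

With a real file, the same function can be fed an open file handle, for example `read_vec(open(path, encoding="utf-8"))`, where `path` points to the downloaded vectors.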

We have also built a second baseline based on HierType which, as shown below, generalizes better than the CFET baseline. The Chinese HierType baseline can be found in a separate repository here.

Results

Evaluation results for the two baselines on the CFIGER dataset.

Citing Us

Coming soon.
