A repository for a Chinese fine-grained entity typing dataset based on the FIGER ontology. This repository is part of the software release of our paper Cross-lingual Inference with a Chinese Entailment Graph. The dataset is based on A Chinese Corpus for Fine-grained Entity Typing.
The dataset has been annotated through label mapping: we manually mapped the tokens from each of the ~6000 ultra-fine-grained types to a FIGER type; for more details, please check our paper. The resulting mappings are here; they should be put under ./u2figer. The resulting re-annotated dataset is here; unzip the file and put its contents under the root directory.
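To make the idea of label mapping concrete, here is a minimal sketch of how a mention's ultra-fine-grained labels could be converted to FIGER types with a lookup table. It is a simplification, not the code used to build the dataset: the function name `remap_labels`, the dictionary layout, and the toy entries are all assumptions made for illustration, and the format of the actual files under ./u2figer is not described here.

```python
def remap_labels(ultra_fine_labels, u2figer):
    """Map ultra-fine labels to FIGER types.

    Labels without a mapping are dropped, and duplicate FIGER types
    are kept only once, in order of first appearance.
    """
    figer_types = []
    for label in ultra_fine_labels:
        figer = u2figer.get(label)
        if figer is not None and figer not in figer_types:
            figer_types.append(figer)
    return figer_types


# Toy mapping invented for illustration; NOT taken from the released files.
toy_mapping = {
    "歌手": "/person/artist",
    "演员": "/person/actor",
    "城市": "/location/city",
}

# "明星" has no entry in the toy mapping, so it is dropped.
print(remap_labels(["歌手", "演员", "明星", "歌手"], toy_mapping))
# → ['/person/artist', '/person/actor']
```

Dropping unmapped labels is one possible design choice; the paper describes how such cases are actually handled.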
We updated the CFET baseline in accordance with our re-annotated data. To run the baseline, take the following steps:
- Download the Chinese fastText model here;
- Run preprocess.py in the modes embed, data and pred in turn, remembering to set the correct path to the downloaded fastText model;
- Train with python train.py; configurations can be set in config.py;
- For inference on datasets in other domains, please refer to predict.py.
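The preprocessing and training steps above can be sketched as a small driver script. This is only an outline under stated assumptions: the `--mode` flag is hypothetical (check preprocess.py for how the mode is actually selected), and the helper name `build_commands` is ours. The script only prints the commands; it does not run them.

```python
import sys

# Preprocessing modes named in the steps above, in the order they are run.
PREPROCESS_MODES = ["embed", "data", "pred"]


def build_commands(python=sys.executable, mode_flag="--mode"):
    """Return the command lines for preprocessing and training.

    `mode_flag` is an ASSUMPTION: this README does not say how
    preprocess.py receives its mode, so adjust it to the real interface.
    """
    commands = [[python, "preprocess.py", mode_flag, mode] for mode in PREPROCESS_MODES]
    commands.append([python, "train.py"])  # training reads its settings from config.py
    return commands


if __name__ == "__main__":
    for cmd in build_commands():
        print(" ".join(cmd))
```

To execute the steps instead of printing them, each command can be passed to `subprocess.run(cmd, check=True)` from the repository root, so that a failing step stops the pipeline before training starts.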
We have also built another baseline model based on HierType, which, as shown below, generalizes better than the present baseline. The Chinese HierType baseline can be found in another repository here.
Coming soon.