chemicaltree/GEC_CCE


Preprocessing code for Cross-Corpora Evaluation of Grammatical Error Correction

This repository contains the preprocessing code for the cross-corpora evaluation of grammatical error correction described in:

Masato Mita, Tomoya Mizumoto, Masahiro Kaneko, Ryo Nagata and Kentaro Inui. 2019. Cross-Corpora Evaluation and Analysis of Grammatical Error Correction Models — Is Single-Corpus Evaluation Enough?. In Proceedings of the 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics. Minneapolis, USA.

If you make use of this code, please cite the above paper.

Pre-requisites

Only Python 2 is supported. It is safest to install everything in a clean virtualenv.

The dependencies can be installed as follows:

pip install -r requirements.txt

(NOTE: To reproduce the exact data, you may need to use NLTK v2.0b7 for tokenization.)

Data Preparation for Cross-Corpora Evaluation

To convert the raw data (.xml) to M2 format, use the following preprocessing scripts.

## CLC-FCE
python2 clcfce_to_m2.py -in dataset/ -out output

## KJ/ICNALE
python2 kj_to_m2.py -in kj_all.raw -out output_file
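
For readers unfamiliar with the target format, the following is a minimal sketch of what an M2 block looks like, assuming the standard layout used by the m2scorer (an `S` line with the tokenized source sentence, followed by one `A` line per edit). The helper `to_m2` and the example sentence are hypothetical illustrations and are not part of the scripts in this repository.

```python
def to_m2(tokens, edits, annotator=0):
    """Render one tokenized sentence and its edits as an M2 block.

    tokens: list of source tokens.
    edits:  list of (start, end, error_type, correction) tuples, where
            start/end are token offsets and end is exclusive.
    NOTE: hypothetical helper for illustration only.
    """
    # The S line holds the space-separated source tokens.
    lines = ["S " + " ".join(tokens)]
    for start, end, error_type, correction in edits:
        # Each A line: span, error type, correction, two fixed fields,
        # and the annotator id, separated by "|||".
        lines.append(
            "A {} {}|||{}|||{}|||REQUIRED|||-NONE-|||{}".format(
                start, end, error_type, correction, annotator
            )
        )
    return "\n".join(lines)


# Hypothetical example: "goes" (token 1) should be corrected to "go".
example = to_m2(["I", "goes", "to", "school", "."], [(1, 2, "Vform", "go")])
print(example)
```

In an M2 file, blocks like this are separated by a blank line, one block per source sentence.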

Preprocessing Scripts for the Other Corpora

For the other corpora we used, such as CoNLL-2013, CoNLL-2014 and JFLEG, you can use the following official preprocessing scripts.

Evaluation

You can evaluate your systems using the following scorers (m2scorer and GLEU).
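
As a rough illustration of what an edit-based scorer reports, the sketch below computes precision, recall and F-beta from edit counts, with beta = 0.5 as is conventional for M2-based GEC evaluation. The function `f_beta` and the counts are hypothetical; treating an empty denominator as a score of 1.0 is an assumption made here for illustration. Use the official scorers for any reported numbers.

```python
def f_beta(tp, fp, fn, beta=0.5):
    """Return (precision, recall, F-beta) from edit counts.

    tp: system edits that match a gold edit
    fp: system edits that match no gold edit
    fn: gold edits the system missed
    NOTE: hypothetical helper; empty denominators are scored as 1.0
    by assumption.
    """
    precision = tp / float(tp + fp) if (tp + fp) else 1.0
    recall = tp / float(tp + fn) if (tp + fn) else 1.0
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return precision, recall, 0.0
    f = (1 + beta ** 2) * precision * recall / denom
    return precision, recall, f


# Hypothetical counts: 2 correct edits, 1 spurious edit, 3 missed edits.
p, r, f = f_beta(tp=2, fp=1, fn=3)
print("P=%.4f R=%.4f F0.5=%.4f" % (p, r, f))
```

With beta = 0.5, precision is weighted twice as heavily as recall, which reflects the preference in GEC for not introducing wrong corrections.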
