The code for EMNLP2022 paper "Improved grammatical error correction by ranking elementary edits"

AlexeySorokin/EditScorer

Improved grammatical error correction by ranking elementary edits

Code for the EMNLP 2022 paper "Improved grammatical error correction by ranking elementary edits", which provides a state-of-the-art approach to grammatical error correction.

Installation

  • pip install -r requirements.txt
  • (optional) Install ERRANT for evaluation.

Data

  • Download W&I-LOCNESS data
mkdir -p data
cd data && wget https://www.cl.cam.ac.uk/research/nl/bea2019st/data/wi+locness_v2.1.bea19.tar.gz
tar -xzvf wi+locness_v2.1.bea19.tar.gz
cd ..
  • (To reproduce finetuning and evaluation) Download the edits generated by the GECToR model
cd data
mkdir -p bea_reranking && cd bea_reranking
wget https://www.dropbox.com/s/m5dot9rp0vwkcc8/gector_variants.tar.gz
tar -xzvf gector_variants.tar.gz
cd ../..
| Checkpoint folder      | Language | Best F0.5 | Model                       | Threshold | Basic model weight |
|------------------------|----------|-----------|-----------------------------|-----------|--------------------|
| pie_bea-gector         | English  | 56.05 ⭐  | roberta-base                | 0.8       | 0.1                |
| pie_bea_ft2-gector     | English  | 57.51 ⭐  | roberta-large               | 0.8       | 0.1                |
| clang_large_ft2-gector | English  | 58.94 ⭐  | roberta-large               | 0.8       | 0.1                |
| ru_200K_gpt            | Russian  | 53.44 ✔️  | sberbank-ai/ruRoberta-large | 0.7       | 0.1                |
| ru_200K_gpt_ft1        | Russian  | 55.04 ✔️  | sberbank-ai/ruRoberta-large | 0.8       | 0.1                |

⭐ On the BEA-2019 development set. ✔️ On the RULEC-GEC test set.

Russian data

To obtain the RULEC-GEC data, follow the instructions in the RULEC-GEC repository. The zip archive with edits is available via the link; the password is the correction for the first error in the RULEC-GEC training data.

Model evaluation and application

Variants generation

  • English, GECToR: see our modification of the GECToR repository.
  • English, BERT-GEC: run beam search with a large beam size (e.g., 15) using the BERT-GEC code, then postprocess the output with
python bertgec/output_to_json.py -i BERT_GEC_OUTPUT_FOLDER/test.nbest.tok -o OUTPUT.jsonl
python bertgec/process_bert_gec_outputs.py -i OUTPUT.jsonl -s INPUT_FILE -o OUTPUT.variants -t -3.0 -j

If the data is simply a list of tokenized sentences, append the -r option to the last command.

  • Russian: variants are generated by a modification of a GPT-like model (to appear soon).

You may use your own generator as long as it produces a file in the appropriate format (use the provided GECToR edits as a reference).

Variants reranking

For each generated edit, our model returns the probability that the edit is correct and applies the edits whose probabilities are higher than the given threshold. We recommend using a threshold of 0.8 or 0.9 by default, or tuning it on the development set.
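The thresholding rule can be illustrated with a minimal sketch. It is not the repository's decoding code: the `ScoredEdit` class, its fields, and the `apply_edits` helper are hypothetical names introduced only for this example, and the edits are assumed not to overlap. The simultaneous and stagewise decoding procedures in the paper are more involved than this single pass.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredEdit:
    """Hypothetical container for one elementary edit and its score."""
    start: int         # index of the first source token to replace
    end: int           # index one past the last source token to replace
    replacement: str   # replacement text ("" deletes the span)
    prob: float        # estimated probability that the edit is correct


def apply_edits(tokens: List[str], edits: List[ScoredEdit],
                threshold: float = 0.8) -> List[str]:
    """Apply every edit whose probability is higher than the threshold."""
    accepted = [e for e in edits if e.prob > threshold]
    result = list(tokens)
    # Apply right to left so earlier indices stay valid after each change.
    # Assumption: accepted edits do not overlap.
    for edit in sorted(accepted, key=lambda e: e.start, reverse=True):
        result[edit.start:edit.end] = edit.replacement.split()
    return result


tokens = "He go to school yesterday".split()
edits = [
    ScoredEdit(1, 2, "went", 0.95),  # confident replacement, kept at 0.8
    ScoredEdit(3, 3, "the", 0.40),   # low-probability insertion, dropped at 0.8
]
print(" ".join(apply_edits(tokens, edits, threshold=0.8)))
```

Raising the threshold keeps fewer edits, which generally trades recall for precision; that trade-off is why tuning on a development set is worthwhile.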

# Faster simultaneous decoding (see the paper)
python apply_model.py -c CHECKPOINT_FOLDER -C CHECKPOINT_NAME -v TEST_VARIANTS_PATH \
  -O OUTPUT_FOLDER --n_max 8 [-m MODEL_NAME; DEFAULT=roberta-base] \
  [-T THRESHOLDS ...; DEFAULT=0.4 0.5 0.6 0.7 0.8 0.9] [-a BASIC_MODEL_WEIGHTS ...] [-r]
# Better stagewise decoding (see the paper)
python apply_staged_model.py -c CHECKPOINT_FOLDER -C CHECKPOINT_NAME -v TEST_VARIANTS_PATH \
  -O OUTPUT_FOLDER -s 8 [-m MODEL_NAME; DEFAULT=roberta-base] \
  [-T THRESHOLDS ...; DEFAULT=0.7 0.8 0.9] [-a BASIC_MODEL_WEIGHTS ...] [-r]

Add the -r flag when the variants were obtained from unlabeled data and the correct answers are not known.

  • For example, to make predictions on the development set using the checkpoints/pie_bea_ft2-gector/checkpoint_2.pt checkpoint with stagewise decoding and evaluate them for threshold=0.9, run
python apply_staged_model.py -c checkpoints/pie_bea_ft2-gector -C checkpoint_2.pt \
-i data/wi+locness/m2/ABCN.dev.gold.bea19.m2 -v data/bea_reranking/gector_variants/bea.dev.variants \
-O dump/reranking -s 8 -a 0.1
./scripts/evaluate.sh -i data/wi+locness/m2/ABCN.dev.gold.bea19.m2 -r dump/reranking/pie_bea_ft2-gector/0.9_staged.output

This should produce

=========== Span-Based Correction ============
TP	FP	FN	Prec	Rec	F0.5
2250	903	5211	0.7136	0.3016	0.5605
==============================================
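As a sanity check, the precision, recall, and F0.5 columns can be recomputed from the TP, FP, and FN counts. The sketch below uses the standard F-beta definition; the `f_beta` helper and its zero-division convention are assumptions made for this example and are not taken from the evaluation script.

```python
from typing import Tuple


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> Tuple[float, float, float]:
    """Return (precision, recall, F_beta) computed from raw counts."""
    # Assumption: an undefined ratio is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, score


# Counts from the span-based correction table above.
p, r, f = f_beta(tp=2250, fp=903, fn=5211)
print(f"Prec={p:.4f} Rec={r:.4f} F0.5={f:.4f}")
```

With beta = 0.5, precision is weighted more heavily than recall, which is why a higher threshold can improve the score even though it lowers recall.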

The combined model output for threshold 0.8 is evaluated by

./scripts/evaluate.sh -i data/wi+locness/m2/ABCN.dev.gold.bea19.m2 -r dump/reranking/pie_bea_ft2-gector/0.8_alpha=0.10_1.00_staged.output

and produces

=========== Span-Based Correction ============
TP	FP	FN	Prec	Rec	F0.5
2567	1147	4894	0.6912	0.3441	0.5751
==============================================

The larger checkpoints/clang_large_ft2-gector/checkpoint_2.pt checkpoint is used analogously:

python apply_staged_model.py -c checkpoints/clang_large_ft2-gector -C checkpoint_2.pt \
-i data/wi+locness/m2/ABCN.dev.gold.bea19.m2 -v data/bea_reranking/gector_variants/bea.dev.variants \
-O dump/reranking -m roberta-large -s 8 -a 0.1
./scripts/evaluate.sh -i data/wi+locness/m2/ABCN.dev.gold.bea19.m2 -r dump/reranking/clang_large_ft2-gector/0.8_alpha=0.10_1.00_staged.output

=========== Span-Based Correction ============
TP	FP	FN	Prec	Rec	F0.5
2678	1136	4783	0.7021	0.3589	0.5894
==============================================
  • To generate the outputs on the test set, run
python apply_staged_model.py -c checkpoints/clang_large_ft2-gector -C checkpoint_2.pt \
-v data/wi+locness/test/ABCN.test.bea19.orig -O dump/test_output -s 8 -a 0.1 -r

The *.output files for different threshold values are available in OUTPUT_FOLDER (dump/test_output in our case).
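To see which thresholds were written, the output files can be grouped by the threshold encoded in their names. This sketch assumes the naming pattern visible in the examples above (for instance `0.9_staged.output` and `0.8_alpha=0.10_1.00_staged.output`), where the file name starts with the threshold followed by an underscore; `outputs_by_threshold` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import Dict, List


def outputs_by_threshold(output_folder: str) -> Dict[float, List[Path]]:
    """Group *.output files under output_folder by their threshold prefix."""
    grouped: Dict[float, List[Path]] = {}
    # Search recursively, since outputs may sit in a per-checkpoint subfolder.
    for path in sorted(Path(output_folder).rglob("*.output")):
        prefix = path.name.split("_", 1)[0]
        try:
            threshold = float(prefix)
        except ValueError:
            continue  # skip files that do not follow the assumed pattern
        grouped.setdefault(threshold, []).append(path)
    return grouped


for threshold, paths in sorted(outputs_by_threshold("dump/test_output").items()):
    print(threshold, [p.name for p in paths])
```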

Russian

The only difference for Russian is that we use M2Scorer for evaluation:

python apply_staged_model.py -c checkpoints/ru_200K_gpt_ft1 -C checkpoint_2.pt -O dump/reranking \
 -v data/russian_reranking/gpt/test.variants -i data/russian/RULEC-GEC.test.M2 -m sberbank-ai/ruRoberta-large \
 -s 5 -a 0.1

python scripts/m2scorer/scripts/m2scorer.py dump/reranking/ru_200K_gpt_ft1/0.7_alpha\=0.10_1.00_staged.output data/russian/RULEC-GEC.test.M2

Precision   : 0.7367
Recall      : 0.2733
F_0.5       : 0.5502

Model training

python train.py -t TRAIN_VARIANTS_PATH -T TEST_VARIANTS_PATH -M 768 --loss_by_class -e EPOCHS \
  -c CHECKPOINT_FOLDER [-L INITIAL_CHECKPOINT_PATH] [-E RECALL_ESTIMATE] \
  [-m MODEL_NAME; DEFAULT=roberta-base] --save_all_checkpoints --only_generated
  • English, finetuning on W&I-LOCNESS train set using GECToR-generated edits:
python train.py -t data/bea_reranking/gector_variants/bea.train.variants -T \
data/bea_reranking/gector_variants/bea.dev.variants -M 768 --loss_by_class -e 3 \
-c checkpoints/pie_bea_ft2_rerun-gector -L checkpoints/pie_bea-gector/checkpoint_2.pt \
-E 0.4 --save_all_checkpoints --only_generated
  • Russian, finetuning on RULEC-GEC data:
python train.py -t data/russian_reranking/gpt/train.variants -T data/russian_reranking/gpt/dev.variants \
 -M 768 --loss_by_class -e 5 -c checkpoints/ru_200K_gpt_ft1 -L checkpoints/ru_200K_gpt/checkpoint_1.pt \
 -E 0.4 --save_all_checkpoints -m sberbank-ai/ruRoberta-large --only_generated
