borgr/IBGEC

IBGEC

Code and data release for the paper Inherent Biases in Grammatical Error Correction and Text Simplification.

The unseenest GitHub repo provides a convenient Python 3 implementation of the unseenest algorithm, which predicts the histogram of a discrete probability distribution from a sample that covers only some of the distribution's possible values.
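To give a feel for the problem unseenest addresses, here is a minimal sketch. It is not the unseenest algorithm or its API. It swaps in two much simpler, standard tools: the sample "fingerprint" (how many distinct values were seen exactly i times) and the Good-Turing estimate of the probability mass of values that were never sampled. The function names `fingerprint` and `good_turing_unseen_mass` are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def fingerprint(sample):
    """Map each occurrence count i to the number of distinct values seen exactly i times."""
    counts = Counter(sample)               # value -> times observed
    return dict(Counter(counts.values()))  # times observed -> number of such values


def good_turing_unseen_mass(sample):
    """Good-Turing estimate of the total probability of values absent from the sample.

    This is a simple stand-in, NOT the unseenest algorithm: it estimates only
    the missing mass, not the full histogram of the distribution.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    # Fraction of the sample made up of values that were seen exactly once.
    return fingerprint(sample).get(1, 0) / n


if __name__ == "__main__":
    sample = ["a", "a", "b", "c", "c", "c", "d"]
    print(fingerprint(sample))
    print(good_turing_unseen_mass(sample))
```

A large share of values seen only once suggests that many possible values are still unobserved, which is the situation the paper describes for valid corrections of a sentence.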

The poissonBinomial GitHub repo implements methods from several papers for computing the Poisson binomial CDF, PDF, and related quantities efficiently and conveniently.
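For readers unfamiliar with the distribution, the following is a minimal sketch of what is being computed. It is not the poissonBinomial repo's code or API, and it uses a plain quadratic-time dynamic program rather than the faster methods that repo implements. The function names `poisson_binomial_pmf` and `poisson_binomial_cdf` are hypothetical.

```python
from itertools import accumulate


def poisson_binomial_pmf(probs):
    """Return [P(K = 0), ..., P(K = n)], where K counts successes among
    n independent Bernoulli trials with success probabilities `probs`."""
    pmf = [1.0]  # with zero trials, K = 0 with certainty
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this trial fails: count stays at k
            nxt[k + 1] += mass * p      # this trial succeeds: count becomes k + 1
        pmf = nxt
    return pmf


def poisson_binomial_cdf(probs):
    """Return [P(K <= 0), ..., P(K <= n)] as running sums of the PMF."""
    return list(accumulate(poisson_binomial_pmf(probs)))


if __name__ == "__main__":
    probs = [0.1, 0.2]
    print(poisson_binomial_pmf(probs))
    print(poisson_binomial_cdf(probs))
```

When all probabilities are equal, this reduces to the ordinary binomial distribution. The dynamic program costs O(n^2) time for n trials, which is fine for small inputs.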

The batches directory contains 2,500 simplification and grammatical error correction tasks done by humans (mostly 50 examples per sentence; see the paper for details). It also contains human validation of the human-made grammatical error corrections from the data above.

Please cite this paper if you find our resources useful.

BibTeX (from the ACL Anthology):

@inproceedings{choshen-abend-2018-inherent,
    title = "Inherent Biases in Reference-based Evaluation for Grammatical Error Correction",
    author = "Choshen, Leshem  and
      Abend, Omri",
    booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2018",
    address = "Melbourne, Australia",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P18-1059",
    doi = "10.18653/v1/P18-1059",
    pages = "632--642",
    abstract = "The prevalent use of too few references for evaluating text-to-text generation is known to bias estimates of their quality (henceforth, low coverage bias or LCB). This paper shows that overcoming LCB in Grammatical Error Correction (GEC) evaluation cannot be attained by re-scaling or by increasing the number of references in any feasible range, contrary to previous suggestions. This is due to the long-tailed distribution of valid corrections for a sentence. Concretely, we show that LCB incentivizes GEC systems to avoid correcting even when they can generate a valid correction. Consequently, existing systems obtain comparable or superior performance compared to humans, by making few but targeted changes to the input. Similar effects on Text Simplification further support our claims.",
}
