Israeli Learner Corpus of Written English
Learner corpora, datasets that reflect the language of non-native speakers, are instrumental for research on language learning and development, as well as for practical applications, mainly in teaching and education. Such corpora now exist for a plethora of native--foreign language pairs, but until recently none of them reflected native Hebrew speakers, and very few reflected native Arabic speakers.
We introduce a recently released corpus of English essays authored by learners in Israel. The corpus consists of two sub-corpora: one written by native Arabic speakers and the other written mainly by native Hebrew speakers. We report on the composition and curation of the datasets; specifically, we processed the data so that both sub-corpora are now uniformly represented, facilitating seamless research and computational processing of the data. We provide statistical information on the corpora and outline a few research projects that have already used them. All the resources related to the corpus are freely available.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.