LoraineYoko/word_difficulty

Automatic Classification and Comparison of Words by Difficulty

The paper "Automatic Classification and Comparison of Words by Difficulty" has been accepted by the 27th International Conference on Neural Information Processing (ICONIP 2020).

In this repository, you can find all the resources we used and the main code.

Dataset

A dataset is made up of three parts: a reliable corpus (Corpus), a pronunciation dictionary (Pdict) and a standard leveled word list (W). Corpus and Pdict are the resources used to extract the features described in Sec. 2 of the paper; W serves as the ground truth.
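
As an illustration of how the three parts fit together, the sketch below pairs each word in W with a few features drawn from Corpus and Pdict. It is a minimal sketch under stated assumptions, not the repository's code: the function names (`word_frequencies`, `phoneme_count`, `build_dataset`) and the three features (word length, relative corpus frequency, phoneme count) are hypothetical placeholders, and the actual feature set is the one described in Sec. 2 of the paper. Pdict is assumed to map each word to a list of phonemes, in the style of CMUdict, and W is assumed to map each word to its difficulty level.

```python
from collections import Counter


def word_frequencies(corpus_tokens):
    """Relative frequency of each lower-cased token in the corpus."""
    counts = Counter(t.lower() for t in corpus_tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def phoneme_count(pdict, word):
    # Assumption: pdict maps word -> list of phonemes (CMUdict-style).
    # Words missing from the dictionary get 0 phonemes.
    return len(pdict.get(word, []))


def build_dataset(corpus_tokens, pdict, leveled_words):
    """Return (word, features, level) rows; level comes from W (ground truth).

    The three features here are illustrative placeholders only.
    """
    freqs = word_frequencies(corpus_tokens)
    rows = []
    for word, level in leveled_words.items():
        features = {
            "length": len(word),
            "frequency": freqs.get(word, 0.0),
            "phonemes": phoneme_count(pdict, word),
        }
        rows.append((word, features, level))
    return rows


if __name__ == "__main__":
    corpus = ["The", "cat", "sat", "on", "the", "mat"]
    pdict = {"cat": ["K", "AE1", "T"], "the": ["DH", "AH0"]}
    w = {"cat": 1, "the": 1, "ubiquitous": 3}
    for row in build_dataset(corpus, pdict, w):
        print(row)
```

Words from W that never occur in the corpus or Pdict still receive a row (with zero frequency and zero phonemes), so the ground truth is never silently shrunk by coverage gaps in the other two resources.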

The following tables give the sources of the Corpus for each language and the details of W.

Language   Corpus
English    New York Times (2005-2006)
English    Gutenberg
German     Parallel Corpus for German
Chinese    Wikipedia for Chinese

The details of our ground truth are listed as follows:

Results

Due to the page limit of the paper, the complete experimental results are shown here.

This table shows the classification and ranking results when the features are extracted from each of the two English corpora and from their combination. The accuracies of the baseline models and of our feature-engineering models are averaged over ten runs. Test denotes the accuracy on the test set and CV the cross-validation accuracy. MFF denotes the multi-faceted features, with word embeddings obtained from Word2Vec. (** indicates p-value ≤ 0.01 compared with the Random, FC, FO, FPOS and FLSCP baselines.)
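
The CV half of the evaluation protocol described above (accuracy averaged over ten runs) can be sketched as follows. This is an illustration under assumptions, not the paper's setup: the nearest-centroid classifier stands in for whatever models the paper actually uses, and the helper names, the fold count `k` and the per-run seeding are hypothetical choices.

```python
import random
import statistics


def fit_centroids(X, y):
    """Return {label: mean feature vector} computed from the training rows."""
    by_label = {}
    for row, label in zip(X, y):
        by_label.setdefault(label, []).append(row)
    return {
        label: [sum(col) / len(col) for col in zip(*rows)]
        for label, rows in by_label.items()
    }


def predict(centroids, row):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(row, centroids[label])),
    )


def k_fold_splits(n, k, rng):
    """Shuffle indices 0..n-1 and yield (train, test) index lists for k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def cv_accuracy(X, y, k, rng):
    """Mean accuracy over the k held-out folds of one shuffled split."""
    fold_scores = []
    for train, test in k_fold_splits(len(X), k, rng):
        centroids = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(centroids, X[i]) == y[i] for i in test)
        fold_scores.append(correct / len(test))
    return statistics.mean(fold_scores)


def mean_cv_accuracy(X, y, n_runs=10, k=5, seed=0):
    """Average CV accuracy over n_runs independently shuffled runs."""
    scores = [cv_accuracy(X, y, k, random.Random(seed + run)) for run in range(n_runs)]
    return statistics.mean(scores)
```

Each run reshuffles the data with its own seeded `random.Random`, so the ten runs differ only in how words are assigned to folds, and the reported figure is the mean of the ten per-run CV accuracies.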

This table shows the classification and ranking results when the features are extracted from the German and Chinese corpora. (** indicates p-value ≤ 0.01 compared with the Random, FC, FO, FPOS and FLSCP baselines; †† indicates p-value ≤ 0.01 compared with the Random, FC, FO and FPOS baselines; ‡‡ indicates p-value ≤ 0.01 compared with the Random and FC baselines.)
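
The significance markers follow a simple rule: a marker is attached only when the p-value against every baseline in its group is at most 0.01, and the marker covering the most baselines wins. The sketch below encodes that rule and pairs it with an exact sign-flip permutation test on paired per-run accuracies. The choice of test is an assumption (this README does not state which test produced the p-values), and `paired_permutation_p` and `significance_marker` are hypothetical names.

```python
from itertools import product


def paired_permutation_p(a, b):
    """Exact two-sided sign-flip permutation test on paired scores a[i], b[i].

    Assumption: the paper's actual significance test is not stated here;
    this is one standard choice for paired per-run accuracies.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    extreme = 0
    # Enumerate every assignment of signs to the paired differences.
    for signs in product((1, -1), repeat=len(diffs)):
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # tolerance for float round-off
            extreme += 1
    return extreme / 2 ** len(diffs)


# Marker groups, ordered from the most baselines to the fewest.
MARKERS = [
    ("**", ("Random", "FC", "FO", "FPOS", "FLSCP")),
    ("††", ("Random", "FC", "FO", "FPOS")),
    ("‡‡", ("Random", "FC")),
]


def significance_marker(p_values, alpha=0.01):
    """Return the first marker whose baselines all have p <= alpha, else ''."""
    for marker, baselines in MARKERS:
        if all(p_values.get(b, 1.0) <= alpha for b in baselines):
            return marker
    return ""
```

With ten runs, the exact test enumerates 2^10 = 1024 sign patterns, so the smallest attainable two-sided p-value is 2/1024 ≈ 0.002, which is below the 0.01 threshold.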
