Automatic Classification and Comparison of Words by Difficulty

The paper Automatic Classification and Comparison of Words by Difficulty has been accepted by the 27th International Conference on Neural Information Processing (ICONIP 2020).

This repository contains all the resources we used and the main code.

Dataset

A dataset is made up of three parts: a reliable corpus (Corpus), a pronunciation dictionary (Pdict), and a standard leveled word list (W). Corpus and Pdict are the resources used to extract the features described in Sec. 2, and W is regarded as the ground truth.
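As a rough illustration of how the three parts fit together, the sketch below pairs each word in W with features drawn from Corpus and Pdict. It is not the repository's code: the `Dataset` class, the `build_examples` function, and the two features (a raw corpus count and a phoneme count) are hypothetical stand-ins chosen for the example, and the toy data is made up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical container for the three parts of a dataset."""
    corpus: list[str]            # Corpus: tokenized running text
    pdict: dict[str, list[str]]  # Pdict: word -> phoneme sequence
    levels: dict[str, int]       # W: word -> difficulty level (ground truth)


def build_examples(ds: Dataset) -> list[tuple[str, dict[str, int], int]]:
    """Pair every leveled word with illustrative features and its label."""
    counts = Counter(token.lower() for token in ds.corpus)
    examples = []
    for word, level in sorted(ds.levels.items()):
        features = {
            # Assumed feature taken from Corpus: how often the word occurs.
            "corpus_count": counts[word],
            # Assumed feature taken from Pdict: length of the pronunciation.
            "num_phonemes": len(ds.pdict.get(word, [])),
        }
        examples.append((word, features, level))
    return examples


# Toy data, invented for this example only.
toy = Dataset(
    corpus="the cat sat on the mat".split(),
    pdict={"the": ["DH", "AH"], "cat": ["K", "AE", "T"]},
    levels={"the": 1, "cat": 2},
)
examples = build_examples(toy)
```

Each resulting tuple holds a word, its feature dictionary, and its level from W, which is the shape a classifier would typically be trained on.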

The following tables show the resources and details of the Corpus and W.

Language	Corpus
English	New York Times (2005-2006)
English	Gutenberg
German	Parallel Corpus for German
Chinese	Wikipedia for Chinese

The details of our ground truth are listed as follows:

Results

Due to space limitations in the paper, the full experimental results are shown here.

This table shows the classification and ranking results when features are extracted from the two English corpora and from their combination. The accuracy of the baseline models and of our feature engineering models is the average of ten runs. Test is the accuracy on the test set, and CV is the cross-validation accuracy. MFF denotes the multi-faceted features, which use Word2Vec to obtain word embeddings. (** indicates p-value ≤ 0.01 compared with Random, FC, FO, FPOS and FLSCP baselines.)
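The averaging over runs can be sketched as follows. This is a minimal illustration, not the evaluation code used for the paper: the function names `accuracy` and `mean_accuracy` are hypothetical, and the labels below are toy values rather than experimental results.

```python
from statistics import mean


def accuracy(predicted: list[int], gold: list[int]) -> float:
    """Fraction of words whose predicted level matches the gold level."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equal in length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def mean_accuracy(runs: list[list[int]], gold: list[int]) -> float:
    """Average the per-run accuracy over all runs (ten in the reported tables)."""
    return mean(accuracy(predicted, gold) for predicted in runs)


# Toy example with two runs instead of ten.
gold_levels = [1, 2, 3, 1]
run_predictions = [
    [1, 2, 3, 1],  # all four correct
    [1, 2, 1, 1],  # three of four correct
]
average = mean_accuracy(run_predictions, gold_levels)
```

Reporting the mean over several runs reduces the effect of random initialization and data shuffling on any single accuracy figure.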

This table shows the classification and ranking results when features are extracted from the German and Chinese corpora. (** indicates p-value ≤ 0.01 compared with Random, FC, FO, FPOS and FLSCP baselines; †† indicates p-value ≤ 0.01 compared with Random, FC, FO and FPOS baselines; ‡‡ indicates p-value ≤ 0.01 compared with Random and FC baselines.)
