Data

analogy

analogy.txt is the Chinese analogical reasoning dataset.

wordsim

240.txt and 297.txt are the wordsim-240 and wordsim-296 datasets, respectively.

The word pair OPEC 石油 in 297.txt is removed in the test.

These two datasets are conventional similarity tests for Chinese. The files are uploaded for convenience and were NOT created by the authors of the paper. However, their original source is hard to find, so I welcome any suggestions for improving the references.

Wordsim-296 is from SemEval-2012 Task 4: Evaluating Chinese Word Similarity. (Abstract) (pdf)

Non-compositional wordlist

The wordlist is uploaded as non-composition.txt.
