Transductive Classification of Chinese Hypernymy vs. Non-hypernymy Relations Based on Word Embeddings

By Chengyu Wang (https://chywang.github.io)

Introduction: This software classifies Chinese word pairs into hypernymy vs. non-hypernymy relations based on transductive non-linear projection learning. It takes as inputs two datasets of Chinese word pairs (a training set and a testing set), together with the embedding vectors of all the words that appear in those pairs. The software automatically trains the model and makes predictions over the testing set.

Paper: Wang et al. Transductive Non-linear Learning for Chinese Hypernym Prediction. ACL 2017

APIs

TransductLeaner: The main software entry-point, with five input arguments required.

w2vPath: The path of the file that holds the embeddings of all Chinese words in the training set or the testing set. Each line of the file starts with the Chinese word, followed by the values of its embedding vector. All the fields in a line are separated by a single space (' '). In practice, the embeddings can be learned by any neural language model.

NOTE: Due to the large size of neural language models, we only upload the embedding vectors of words in the training and testing sets. Please use your own neural language model instead, if you would like to try the algorithm over your datasets.

trainPath: The path of the training set in the format of "word1 \t word2 \t label" triples. As for the label, 1 is for the hypernymy relation and 0 is for the non-hypernymy relation.
testPath: The path of the testing set. The format of the testing set is the same as that of the training set.
outputPath: The path of the output file, containing the model prediction scores of all the pairs in the testing set. The output of each pair is a real value in (-1,1). (Please refer to the paper for detailed explanation.)
dimension: The dimensionality of the embedding vectors.

NOTE: The default values can be set as: "word_vectors.txt", "train.txt", "test.txt", "output.txt" and "50".
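The two input formats described above can be illustrated with a small reader. This is only a sketch under stated assumptions, not the code used inside the software: the class `InputFormatSketch`, the type `LabeledPair` and both method names are hypothetical, and the sample words are written as Unicode escapes so that the source stays ASCII-only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the released software.
public class InputFormatSketch {

    /** One "word1 \t word2 \t label" triple (1 = hypernymy, 0 = non-hypernymy). */
    static final class LabeledPair {
        final String word1;
        final String word2;
        final int label;

        LabeledPair(String word1, String word2, int label) {
            this.word1 = word1;
            this.word2 = word2;
            this.label = label;
        }
    }

    /** Reads "word v1 v2 ... vN" lines, with fields separated by a single space. */
    static Map<String, double[]> readEmbeddings(BufferedReader reader, int dimension)
            throws IOException {
        Map<String, double[]> vectors = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(" ");
            if (fields.length != dimension + 1) {
                throw new IOException("Expected " + dimension + " values, got "
                        + (fields.length - 1) + " for word " + fields[0]);
            }
            double[] vector = new double[dimension];
            for (int i = 0; i < dimension; i++) {
                vector[i] = Double.parseDouble(fields[i + 1]);
            }
            vectors.put(fields[0], vector);
        }
        return vectors;
    }

    /** Reads tab-separated "word1, word2, label" triples. */
    static List<LabeledPair> readPairs(BufferedReader reader) throws IOException {
        List<LabeledPair> pairs = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split("\t");
            if (fields.length != 3) {
                throw new IOException("Expected 3 tab-separated fields: " + line);
            }
            pairs.add(new LabeledPair(fields[0].trim(), fields[1].trim(),
                    Integer.parseInt(fields[2].trim())));
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        // In-memory samples; \u82f9\u679c means "apple", \u6c34\u679c means "fruit".
        String vectorsText = "\u82f9\u679c 0.25 2.0\n\u6c34\u679c 0.5 -1.0\n";
        String pairsText = "\u82f9\u679c\t\u6c34\u679c\t1\n";

        Map<String, double[]> vectors =
                readEmbeddings(new BufferedReader(new StringReader(vectorsText)), 2);
        List<LabeledPair> pairs =
                readPairs(new BufferedReader(new StringReader(pairsText)));

        System.out.println("words with embeddings: " + vectors.size());
        System.out.println("labeled pairs: " + pairs.size());
    }
}
```

To read real files instead of in-memory strings, a reader obtained from `java.nio.file.Files.newBufferedReader` could be passed in; whether the shipped data files are UTF-8 encoded is an assumption to verify first.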

Eval: A simple evaluation script, with three input arguments required. It outputs Precision, Recall and F1-score as the evaluation scores.

truthPath: The path of the testing set, with human-labeled results.
predictPath: The path of the model output file.
thres: A threshold in (-1,1) for the model to assign relation labels to Chinese word pairs. (Please refer to the parameter 'θ' in the paper.)

NOTE: The default values can be set as: "test.txt", "output.txt" and "0.1".
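The scores reported by the evaluation script can be illustrated with a short computation. This is a minimal sketch, not the actual Eval code: the class and method names are hypothetical, the scores are passed in as arrays because the layout of the output file is not described here, and it assumes that a pair is labeled as hypernymy when its score is strictly greater than thres.

```java
// Hypothetical helper, not part of the released software.
public class EvalSketch {

    /**
     * Returns {precision, recall, F1} for the hypernymy class.
     * Assumption: a pair is predicted as hypernymy when score > thres.
     */
    static double[] precisionRecallF1(int[] truth, double[] scores, double thres) {
        if (truth.length != scores.length) {
            throw new IllegalArgumentException("truth and scores differ in length");
        }
        int truePositives = 0;
        int falsePositives = 0;
        int falseNegatives = 0;
        for (int i = 0; i < truth.length; i++) {
            boolean predicted = scores[i] > thres;
            boolean actual = truth[i] == 1;
            if (predicted && actual) {
                truePositives++;
            } else if (predicted) {
                falsePositives++;
            } else if (actual) {
                falseNegatives++;
            }
        }
        // Guard against division by zero when a class is never predicted or never present.
        double precision = truePositives + falsePositives == 0
                ? 0.0 : (double) truePositives / (truePositives + falsePositives);
        double recall = truePositives + falseNegatives == 0
                ? 0.0 : (double) truePositives / (truePositives + falseNegatives);
        double f1 = precision + recall == 0.0
                ? 0.0 : 2.0 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f1};
    }

    public static void main(String[] args) {
        // Made-up labels and scores, for illustration only.
        int[] truth = {1, 1, 0, 0, 1};
        double[] scores = {0.8, 0.05, 0.3, -0.6, 0.4};
        double[] result = precisionRecallF1(truth, scores, 0.1);
        System.out.println("Precision = " + result[0]);
        System.out.println("Recall    = " + result[1]);
        System.out.println("F1-score  = " + result[2]);
    }
}
```

Raising thres generally trades recall for precision, so it is worth trying a few values around the suggested 0.1 on held-out data.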

Dependencies

This software was developed and run in the Java SE 1.8 environment. It will most likely run properly in other versions of Java SE as well, but this is not guaranteed.
It requires the FudanNLP toolkit for Chinese NLP analysis (https://github.com/FudanNLP/fnlp/), and the JAMA library for matrix computation (https://math.nist.gov/javanumerics/jama/). We use Jama-1.0.3.jar in this project.

Citation

If you find this software useful for your research, please cite the following paper.

@inproceedings{acl2017,
author = {Chengyu Wang and Junchi Yan and Aoying Zhou and Xiaofeng He},
title = {Transductive Non-linear Learning for Chinese Hypernym Prediction},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics},
pages = {1394--1404},
year = {2017}
}

More research work can be found at https://chywang.github.io.
