Junpliu/skip-gram

skip-gram

A simple implementation of the Skip-Gram model in PyTorch.

File organization

main.py: the training process

model.py: the model definition

getData.py: pre-processes and organizes the data (uses torch.utils.data.DataLoader to enable batching)

text8, simtext2: the data files; simtext2 is the smaller one.
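As a rough illustration of what the pre-processing step has to produce for a skip-gram model, the sketch below turns a token list into (center, context) id pairs. It is not the code in getData.py: the function names `build_vocab` and `skipgram_pairs`, the `vocab_size` and `window_size` parameters, and the convention of reserving id 0 for unknown words are all assumptions made for this example.

```python
from collections import Counter


def build_vocab(tokens, vocab_size):
    """Map the most frequent words to ids 1..vocab_size-1; id 0 is kept for unknown words."""
    most_common = Counter(tokens).most_common(vocab_size - 1)
    return {word: idx + 1 for idx, (word, _) in enumerate(most_common)}


def skipgram_pairs(token_ids, window_size):
    """Return every (center, context) pair whose distance is at most window_size."""
    pairs = []
    for i, center in enumerate(token_ids):
        start = max(0, i - window_size)
        end = min(len(token_ids), i + window_size + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, token_ids[j]))
    return pairs


tokens = "the quick brown fox jumps over the lazy dog".split()
word_to_id = build_vocab(tokens, vocab_size=5)
token_ids = [word_to_id.get(word, 0) for word in tokens]
pairs = skipgram_pairs(token_ids, window_size=1)
```

A list such as `pairs` is the kind of object that could then be wrapped in a dataset and batched with torch.utils.data.DataLoader.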

If you encounter the warning "RuntimeWarning: divide by zero encountered in true_divide sampling_p = (np.sqrt(fre_np / 0.001) + 1) * 0.001 / fre_np", then fre_np contains zeros. You are probably using a smaller dataset, so consider decreasing the value of vacabulary_size (for example, to 1000).
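The warning comes from dividing by entries of fre_np that are zero. The following is a minimal sketch of one way to evaluate the same expression without that division, assuming fre_np holds relative word frequencies. The function name `sampling_probability`, the `threshold` parameter, the choice to give unseen words probability 0, and the clipping to 1 are assumptions for this example, not the repository's code.

```python
import numpy as np


def sampling_probability(fre_np, threshold=0.001):
    """Evaluate (sqrt(f / t) + 1) * t / f only where f > 0; other entries stay 0."""
    fre_np = np.asarray(fre_np, dtype=np.float64)
    sampling_p = np.zeros_like(fre_np)
    seen = fre_np > 0  # mask of words that actually occur in the corpus
    sampling_p[seen] = (np.sqrt(fre_np[seen] / threshold) + 1) * threshold / fre_np[seen]
    # Rare words can exceed 1 under this formula; clip so the result is a probability.
    return np.minimum(sampling_p, 1.0)


# Five vocabulary slots, but only three distinct words appear in the data.
counts = np.array([50, 30, 20, 0, 0])
frequencies = counts / counts.sum()
probs = sampling_probability(frequencies)
```

Masking only hides the symptom; if many slots are empty, reducing the vocabulary size as suggested above is still the simpler fix.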

Results

The results on English text are as follows; the Chinese word vectors are still being trained.

| task | this repo | CCL2017 paper |
| --- | --- | --- |
| word relatedness | 69.88% | 69.36% |
| syntactic question | 16.84% | 54.24% |
| semantic question | 9.59% | 45.59% |
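This README does not show how the scores were computed. Assuming the syntactic and semantic rows refer to word-analogy questions ("a is to b as c is to ?"), a common way to score them is the vector-offset method sketched below. The toy vectors, the word list, and the function names `solve_analogy` and `analogy_accuracy` are made up for illustration.

```python
import numpy as np


def solve_analogy(embeddings, words, a, b, c):
    """Return the word d that best completes "a is to b as c is to d"."""
    word_to_row = {word: row for row, word in enumerate(words)}
    # Normalize rows so that dot products are cosine similarities.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    target = unit[word_to_row[b]] - unit[word_to_row[a]] + unit[word_to_row[c]]
    scores = unit @ target
    # The three question words are excluded from the candidates.
    for word in (a, b, c):
        scores[word_to_row[word]] = -np.inf
    return words[int(np.argmax(scores))]


def analogy_accuracy(embeddings, words, questions):
    """Fraction of (a, b, c, d) questions whose predicted word equals d."""
    correct = sum(solve_analogy(embeddings, words, a, b, c) == d
                  for a, b, c, d in questions)
    return correct / len(questions)


words = ["king", "queen", "man", "woman", "apple"]
embeddings = np.array([
    [1.0, 1.0, 0.0],  # king
    [1.0, 0.0, 1.0],  # queen
    [0.2, 1.0, 0.0],  # man
    [0.2, 0.0, 1.0],  # woman
    [1.0, 0.1, 0.1],  # apple
])
questions = [("man", "woman", "king", "queen"),
             ("king", "queen", "man", "woman")]
accuracy = analogy_accuracy(embeddings, words, questions)
```

With real embeddings, `words` and `embeddings` would come from the trained model's vocabulary and its input embedding matrix.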

References

Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013.

Li, Fang, and Xiaojie Wang. "Improving word embeddings for low frequency words by pseudo contexts." CCL. 2017.
