deba-iitbh/word2vec-models

Word2Vec Models

This repo contains three popular word embedding models implemented in PyTorch. Implemented models:

  • Skipgram
  • Continuous Bag of Words (CBOW)
  • Global Vectors (GloVe)
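
To make the difference between the three models concrete, here is a minimal sketch of the training signal each one consumes. It is not the code in src; the function names, the window size, and the 1/distance weighting for GloVe co-occurrences are assumptions chosen for illustration, and it uses only the Python standard library.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: one (center, context) pair per neighbour inside the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: one (context words, center) example per position."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            examples.append((context, center))
    return examples


def cooccurrence_counts(tokens: List[str], window: int = 2) -> Dict[Tuple[str, str], float]:
    """GloVe: symmetric co-occurrence counts, weighted by 1/distance (an assumption)."""
    counts: Dict[Tuple[str, str], float] = defaultdict(float)
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), i):
            weight = 1.0 / (i - j)
            counts[(word, tokens[j])] += weight
            counts[(tokens[j], word)] += weight
    return dict(counts)


# Toy sentence, made up for this example.
tokens = "the quick brown fox".split()
print(skipgram_pairs(tokens, window=1))
print(cbow_examples(tokens, window=1))
print(cooccurrence_counts(tokens, window=2))
```

Skip-gram and CBOW learn from local windows one example at a time, while GloVe first aggregates the whole corpus into a co-occurrence table and then fits vectors to it.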

Dataset

We tried to train the models on 10% of the latest Wiki dump but could not process it with the computational resources available, so we trained them on a small dataset included in the repo instead. We downloaded the Wiki XML file and preprocessed it into a .txt file using the extractor script.

Run the Code

cd src
python3 main.py

Evaluation

The word vectors are evaluated on the SimLex-999 dataset, as you can see in the notebook.
The word vectors are also generated using this notebook.
The word vector visualization can be seen here
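
For readers unfamiliar with this kind of evaluation, the sketch below shows one common way to score word vectors against a similarity benchmark such as SimLex-999: compute the cosine similarity for each word pair and report the Spearman rank correlation with the human ratings. This is an assumption about the general procedure, not a copy of the notebook. The function names are hypothetical, the vectors and scores are made-up toy values rather than real SimLex-999 entries, and loading the benchmark file is left out.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(vectors: Dict[str, Sequence[float]],
             pairs: Sequence[Tuple[str, str, float]]) -> Tuple[float, int]:
    """Return (Spearman rho, pairs used); pairs with unknown words are skipped."""
    model, human = [], []
    for w1, w2, score in pairs:
        if w1 in vectors and w2 in vectors:
            model.append(cosine(vectors[w1], vectors[w2]))
            human.append(score)
    return spearman(model, human), len(model)


# Toy vectors and ratings, invented for this example only.
toy_vectors = {"cat": [1.0, 0.0], "dog": [1.0, 0.1], "car": [0.0, 1.0], "sky": [-1.0, 0.0]}
toy_pairs = [("cat", "dog", 9.0), ("cat", "car", 5.0), ("cat", "sky", 1.0), ("cat", "zebra", 3.0)]
rho, used = evaluate(toy_vectors, toy_pairs)
print(f"Spearman rho = {rho:.3f} over {used} pairs")
```

Skipping out-of-vocabulary pairs matters when training on a small corpus, since many benchmark words may be missing; reporting the number of pairs actually used alongside the correlation keeps the score interpretable.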

References: