Tensor Decomp Embedding

Code for the paper Word Embeddings via Tensor Factorization (arXiv:1704.02686).

This project is implemented in Python 3 only.

Training

First, set up the data: create a tokenized data file with one sentence per line and the words separated by spaces. Then edit the file path (absolute or relative) in the sentences_generator function of GensimSandbox in test_gensim.py. (These names are a holdover: this repository began as a fork of Gensim whose code we planned to modify, but its purpose has since changed.)
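As an illustration of the expected data format, here is a minimal sketch of a reader for such a file. The name read_sentences and its max_sents parameter are hypothetical and are not taken from test_gensim.py; the sketch only assumes the format described above (one sentence per line, words separated by whitespace).

```python
def read_sentences(path, max_sents=None):
    """Yield each non-empty line of a tokenized corpus file as a list of words.

    Hypothetical helper for illustration only; it is not the repository's
    sentences_generator.
    """
    count = 0
    with open(path, encoding="utf-8") as corpus:
        for line in corpus:
            # Stop early once the requested number of sentences was yielded.
            if max_sents is not None and count >= max_sents:
                break
            tokens = line.split()  # words are separated by whitespace
            if not tokens:
                continue  # skip blank lines
            count += 1
            yield tokens
```

A file containing the two lines "the cat sat" and "on the mat" would yield two token lists, one per sentence. A generator keeps memory use flat on large corpora, since only one line is held at a time.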

To train a new embedding, look up a valid embedding method in the Makefile, then type "make " followed by the name of that method. You will be prompted for an experiment name; if you are not running a specific experiment, just press Enter. Assuming all dependencies are properly installed, this trains, evaluates, and saves the learned embedding.

After training, the program will save the embedding and its associated metadata to "runs//<num_sents><min_vocab_count><embedding_dim>/" for easy access and comparison.

Evaluation

To compare a list of embeddings trained via the Makefile, modify the end of embedding_comparison.py to include the names of the embeddings you wish to compare, then run "python3 embedding_comparison.py <comparison_type>".

CP Decomposition

tensor_decomp.py contains a generic framework for online CP decomposition, implemented in TensorFlow. It also includes Joint Symmetric CP Decomposition, which is described in the paper.
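For readers unfamiliar with CP decomposition, the following is a minimal NumPy sketch of what a rank-R CP model of a third-order tensor represents, and of the squared-error objective in the symmetric case where all three factor matrices are the same. It is not the code in tensor_decomp.py (which uses TensorFlow and an online training procedure); the function names, the toy sizes, and the reading of the rows of U as word embeddings are assumptions made for illustration. The joint variant is not sketched here; see the paper for its definition.

```python
import numpy as np


def cp_reconstruct(A, B, C):
    """Rebuild a third-order tensor from rank-R CP factor matrices.

    T[i, j, k] = sum over r of A[i, r] * B[j, r] * C[k, r]
    """
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def symmetric_cp_loss(T, U):
    """Squared error between T and its symmetric CP model (A = B = C = U)."""
    residual = T - cp_reconstruct(U, U, U)
    return float(np.sum(residual ** 2))


# Toy example: 5 "words" with 2-dimensional embeddings (illustrative sizes).
rng = np.random.default_rng(0)
U_true = rng.normal(size=(5, 2))

# Build a tensor that is exactly symmetric rank-2, then score two candidates.
T = cp_reconstruct(U_true, U_true, U_true)
print(T.shape)
print(symmetric_cp_loss(T, U_true))                 # loss at the true factors
print(symmetric_cp_loss(T, np.zeros_like(U_true)))  # loss at an all-zero guess
```

A training procedure would minimize this loss over U, for example by stochastic gradient descent on sampled tensor entries.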

BibTeX

@misc{1704.02686,
Author = {Eric Bailey and Shuchin Aeron},
Title = {Word Embeddings via Tensor Factorization},
Year = {2017},
Eprint = {arXiv:1704.02686},
}
