Python code for implementing embeddings in the Wasserstein space of elliptical distributions
Latest commit 6c2b4e8 by Boris Muzellec, Oct 24, 2018 (commit message: "cleaning")
.gitignore
README.md
embeddings.py
product.py
sampling.pyx
setup.py
similarity_evaluation.py
skipgram_data.pyx
skipgram_train.py
softmax.py
utils.py
wordnet_data.pyx
wordnet_evaluation.py
wordnet_train.py

Elliptical Embeddings

This repository contains Python code for computing embeddings in the Wasserstein space of elliptical distributions, as in

Boris Muzellec and Marco Cuturi, Generalizing Point Embeddings using the Wasserstein Space of Elliptical Distributions

While the code is functional and can be used to reproduce the results from the paper, it is still being refactored. A final version will be made available shortly.

Dependencies

Python 2.7.5, CuPy, Cython

Training Data

The skipgram model presented in the paper was trained on a concatenation of ukWaC and WaCkypedia_EN, both of which can be requested here.

We use all words appearing more than 100 times after tokenization (this is customisable).
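
As an illustration of the frequency cutoff described above, here is a minimal sketch of how a vocabulary could be built from a tokenized corpus. The function name `build_vocab` and the parameter `min_count` are hypothetical and are not taken from this repository; the actual preprocessing code may differ.

```python
from collections import Counter


def build_vocab(tokens, min_count=100):
    """Map each word seen strictly more than `min_count` times to an integer id.

    `build_vocab` and `min_count` are illustrative names (assumptions),
    not part of this repository's API.
    """
    counts = Counter(tokens)
    # most_common() yields (word, count) pairs sorted by decreasing count,
    # so the most frequent retained word receives id 0.
    kept = [word for word, count in counts.most_common() if count > min_count]
    return dict((word, index) for index, word in enumerate(kept))


if __name__ == "__main__":
    toy_corpus = ["the"] * 5 + ["cat"] * 3 + ["sat"]
    # With a toy threshold of 2, "sat" (seen once) is dropped.
    print(build_vocab(toy_corpus, min_count=2))
```

Raising or lowering `min_count` trades vocabulary size against coverage of rare words.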

The WordNet dataset can be obtained from nltk. The transitive closure of the hypernymy relation can be extracted using code from the Poincaré embeddings repository.
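
For readers unfamiliar with the term, the transitive closure of the hypernymy relation pairs every word with all of its ancestors, not only its direct hypernyms. The sketch below illustrates the idea on a toy edge list; it is not the code from the Poincaré embeddings repository, and the function name `transitive_closure` and the `(child, parent)` input format are assumptions made for this example.

```python
def transitive_closure(edges):
    """Return all (descendant, ancestor) pairs implied by (child, parent) edges.

    Illustrative sketch only: the name and input format are assumptions,
    and the hierarchy is expected to be acyclic.
    """
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)

    closure = set()
    for node in parents:
        # Depth-first walk up the hierarchy, starting from the direct parents.
        stack = list(parents[node])
        seen = set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            closure.add((node, ancestor))
            stack.extend(parents.get(ancestor, ()))
    return closure


if __name__ == "__main__":
    toy_edges = [("dog", "canine"), ("canine", "mammal"), ("mammal", "animal")]
    for pair in sorted(transitive_closure(toy_edges)):
        print(pair)
```

On the toy chain above, the three direct edges expand to six pairs, because "dog" is also paired with "mammal" and "animal", and "canine" with "animal".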

Usage

Before training your first embeddings, you must compile the Cython files (the .pyx sources). You can do this by running the following command:

python setup.py build_ext --inplace