word2vec-api

Simple web service providing a word embedding API. The methods are based on the Gensim Word2Vec implementation. Models are passed as parameters and must be in the word2vec text or binary format.

  • Install dependencies
pip2 install -r requirements.txt
  • Launching the service
python word2vec-api.py --model path/to/the/model [--host host --port 1234]

or

python word2vec-api.py --model /path/to/GoogleNews-vectors-negative300.bin --binary true --path /word2vec --host 0.0.0.0 --port 5000
  • Example calls
curl "http://127.0.0.1:5000/word2vec/n_similarity?ws1=Sushi&ws1=Shop&ws2=Japanese&ws2=Restaurant"
curl "http://127.0.0.1:5000/word2vec/similarity?w1=Sushi&w2=Japanese"
curl "http://127.0.0.1:5000/word2vec/most_similar?positive=indian&positive=food[&negative=][&topn=]"
curl "http://127.0.0.1:5000/word2vec/model?word=restaurant"
curl "http://127.0.0.1:5000/word2vec/model_word_set"
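
Under the hood these endpoints are thin wrappers around Gensim. A minimal sketch of the equivalent direct calls, assuming a recent Gensim and the binary Google News model (this snippet is illustrative and not part of the repository):

```python
from gensim.models import KeyedVectors

# Load a model in word2vec binary format (use binary=False for the text format).
model = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# Rough equivalents of the /similarity and /most_similar calls above.
print(model.similarity("Sushi", "Japanese"))
print(model.most_similar(positive=["indian", "food"], topn=10))
```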

Note: the "model" method returns a base64 encoding of the raw vector bytes. "model_word_set" returns a base64-encoded pickle of the model's vocabulary.
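
For scripted access, any HTTP client works. Below is an illustrative Python client (not part of the repository); it assumes the host/port/path from the launch example, that similarity scores come back as plain text, and that the decoded vector bytes are float32 (Gensim's default dtype):

```python
import base64
import pickle

import numpy as np
import requests

BASE = "http://127.0.0.1:5000/word2vec"  # assumed from the launch example above

# Similarity between two words; the score is assumed to be plain text in the body.
score = float(requests.get(BASE + "/similarity",
                           params={"w1": "Sushi", "w2": "Japanese"}).text)

# Word vector: base64-encoded raw bytes, assumed to decode as float32.
resp = requests.get(BASE + "/model", params={"word": "restaurant"})
vector = np.frombuffer(base64.b64decode(resp.content), dtype=np.float32)

# Vocabulary: base64-encoded pickle of the model's word set.
resp = requests.get(BASE + "/model_word_set")
vocab = pickle.loads(base64.b64decode(resp.content))

print(score, vector.shape, len(vocab))
```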

Where to get a pretrained model

If you do not have domain-specific data to train on, it can be convenient to use a pretrained model. Please feel free to submit additions to this list through a pull request.

| Model file | Number of dimensions | Corpus (size) | Vocabulary size | Author | Architecture | Training Algorithm | Context window size | Web page |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Google News | 300 | Google News (100B) | 3M | Google | word2vec | negative sampling | BoW, ~5 | link |
| Freebase IDs | 1000 | Google News (100B) | 1.4M | Google | word2vec, skip-gram | ? | BoW, ~10 | link |
| Freebase names | 1000 | Google News (100B) | 1.4M | Google | word2vec, skip-gram | ? | BoW, ~10 | link |
| Wikipedia+Gigaword 5 | 50 | Wikipedia+Gigaword 5 (6B) | 400,000 | GloVe | GloVe | AdaGrad | 10+10 | link |
| Wikipedia+Gigaword 5 | 100 | Wikipedia+Gigaword 5 (6B) | 400,000 | GloVe | GloVe | AdaGrad | 10+10 | link |
| Wikipedia+Gigaword 5 | 200 | Wikipedia+Gigaword 5 (6B) | 400,000 | GloVe | GloVe | AdaGrad | 10+10 | link |
| Wikipedia+Gigaword 5 | 300 | Wikipedia+Gigaword 5 (6B) | 400,000 | GloVe | GloVe | AdaGrad | 10+10 | link |
| Common Crawl 42B | 300 | Common Crawl (42B) | 1.9M | GloVe | GloVe | AdaGrad | ? | link |
| Common Crawl 840B | 300 | Common Crawl (840B) | 2.2M | GloVe | GloVe | AdaGrad | ? | link |
| Twitter (2B Tweets) | 25 | Twitter (27B) | ? | GloVe | GloVe | AdaGrad | ? | link |
| Twitter (2B Tweets) | 50 | Twitter (27B) | ? | GloVe | GloVe | AdaGrad | ? | link |
| Twitter (2B Tweets) | 100 | Twitter (27B) | ? | GloVe | GloVe | AdaGrad | ? | link |
| Twitter (2B Tweets) | 200 | Twitter (27B) | ? | GloVe | GloVe | AdaGrad | ? | link |
| Wikipedia dependency | 300 | Wikipedia (?) | 174,015 | Levy & Goldberg | word2vec modified | word2vec | syntactic dependencies | link |
| DBPedia vectors (wiki2vec) | 1000 | Wikipedia (?) | ? | Idio | word2vec | word2vec, skip-gram | BoW, 10 | link |
| 60 Wikipedia embeddings with 4 kinds of context | 25, 50, 100, 250, 500 | Wikipedia | varies | Li, Liu et al. | Skip-Gram, CBOW, GloVe | original and modified | 2 | link |
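
Note that the GloVe downloads above are plain-text vector files without the leading "<vocabulary size> <dimensions>" header that the word2vec text format requires, so they need a one-time conversion before this service can load them. A sketch using Gensim's bundled converter (the file names are assumptions based on the Stanford GloVe distribution):

```python
# Prepend the "<vocab_size> <dims>" header so the file parses as word2vec text format.
from gensim.scripts.glove2word2vec import glove2word2vec

glove2word2vec("glove.6B.300d.txt", "glove.6B.300d.word2vec.txt")
```

The converted file can then be served with python word2vec-api.py --model glove.6B.300d.word2vec.txt (no --binary flag, since it is the text format).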