


Food vectors. A live demo is at https://altosaar.github.io/food2vec/, and a blog post with more information and pretty plots is at https://jaan.io/food2vec-augmented-cooking-machine-intelligence/


Train a model on the recipes dataset to replicate the results from the blog post:

git clone git@github.com:altosaar/food2vec.git
cd food2vec/src

Visualization & embedding exploration tools

# run t-sne and make the plots for the ingredient embeddings
jupyter notebook ./src/plot_ingredients_recipes.ipynb
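
The notebook is not reproduced here. As a rough illustration of what exploring ingredient embeddings can look like, the sketch below ranks ingredients by cosine similarity to a query ingredient. The function names and the toy vectors are made up for this example and are not taken from the repo; real vectors would come from a trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def nearest_ingredients(query, embeddings, k=3):
    """Return the k ingredients most similar to `query`, best first."""
    query_vec = embeddings[query]
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in embeddings.items()
        if name != query  # skip the query itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional vectors, invented for illustration only.
toy_embeddings = {
    "basil": [0.9, 0.1, 0.0],
    "oregano": [0.8, 0.2, 0.1],
    "sugar": [0.0, 0.1, 0.9],
    "vanilla": [0.1, 0.0, 0.8],
}

if __name__ == "__main__":
    for name, score in nearest_ingredients("basil", toy_embeddings, k=2):
        print(f"{name}\t{score:.3f}")
```

With these toy vectors the herbs end up closest to each other and the sweet ingredients closest to each other, which is the kind of structure the t-SNE plots are meant to make visible.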

Embedding plot.ly plots to host them yourself



Pull requests and all feedback welcome! Please file an issue if you run into problems replicating the results.


  • get more data
  • convert jupyter notebook for plotting into one python script
  • write scripts to figure out the right vocabulary
  • fit a better model (e.g. the exact multi-class regression implemented in this repo at https://github.com/altosaar/food2vec/blob/master/src/food2vec.py). If you manage to get better results than the live demo at https://altosaar.github.io/food2vec/, just submit a pull request with the new assets/data/wordVecs.js and I'll happily update it :)
  • compare the above model embeddings to the word2vec_optimized.py embeddings
  • make the UI of the website more user-friendly and mobile-friendly
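
For the vocabulary item in the list above, one possible starting point is a simple frequency cutoff: keep only ingredients that appear in at least a minimum number of recipes. This is a minimal sketch under assumptions, not the approach used in this repo. It assumes each recipe is already a list of ingredient strings, and `build_vocabulary`, `min_count`, and the toy recipes are hypothetical names and data.

```python
from collections import Counter


def build_vocabulary(recipes, min_count=2):
    """Return the sorted ingredients that occur in at least `min_count` recipes.

    `recipes` is assumed to be an iterable of lists of ingredient strings.
    """
    counts = Counter()
    for recipe in recipes:
        # Normalize case and whitespace, and count each ingredient
        # at most once per recipe.
        counts.update({ingredient.strip().lower() for ingredient in recipe})
    return sorted(name for name, count in counts.items() if count >= min_count)


# Hypothetical recipes, invented for illustration only.
toy_recipes = [
    ["Tomato", "basil", "olive oil"],
    ["tomato", "garlic", "olive oil"],
    ["sugar", "butter", "flour"],
    ["butter", "flour", "saffron"],
]

if __name__ == "__main__":
    print(build_vocabulary(toy_recipes, min_count=2))
```

Raising `min_count` trades coverage for cleaner embeddings, since rare ingredients have too few co-occurrences to be placed reliably.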


Thanks to Anthony for open-sourcing a JavaScript embedding browser -- the one here is heavily based on it.