
We are a Machine Learning research team from Sorbonne University. Our goal for this project was to create a cross-modal retrieval system trained on the biggest dataset of cooking recipes. Such a system can retrieve the corresponding recipe given an image (food selfie), and the corresponding image given a recipe.

It was also an opportunity to compare several state-of-the-art metric learning loss functions in a new context. This first analysis gave us some ideas on how to improve the generalization of our model. Following this, we wrote two research papers on a new model, called AdaMine (for Adaptive Mining), that adds structure to the retrieval space.



Recipe-to-Image retrieval task

Given a list of ingredients and a sequence of cooking instructions, the goal is to train a statistical model to retrieve the associated image. For each recipe, the top row indicates the top 5 images retrieved by our AdaMine model, and the bottom row, by a strong baseline.
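Once recipes and images live in the same retrieval space, the task above reduces to a nearest-neighbour search. The sketch below is only an illustration: it assumes cosine similarity as the ranking score, and the function names and the toy 2-D embeddings are hypothetical, not taken from this repository.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_top_k(query, gallery, k=5):
    """Return the indices of the k gallery embeddings most similar to the query."""
    scores = [cosine(query, item) for item in gallery]
    order = sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)
    return order[:k]

# Toy 2-D embeddings (hypothetical values, for illustration only).
recipe_embedding = [1.0, 0.0]
image_embeddings = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5], [-1.0, 0.0]]
top = retrieve_top_k(recipe_embedding, image_embeddings, k=2)
print(top)
```

The same function covers the image-to-recipe direction by swapping the roles of the query and the gallery.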

Quick insight about AdaMine

Features embedding

  • The list of ingredients is embedded using a bi-LSTM.
  • The sequence of instructions is embedded using a hierarchical LSTM (a first LSTM embeds each sentence word by word, and a second LSTM embeds the outputs of the first one).
  • The ingredient and instruction representations are concatenated and embedded once again.
  • The image is embedded using a ResNet101.

Metric learning:

  • The cross-modal (texts and images) retrieval space is learned through a joint retrieval and classification loss.
  • Aligning items according to a retrieval task allows capturing the fine-grained semantics of items.
  • Aligning items according to class meta-data (e.g., hamburger, pizza, cocktail, ice-cream) allows capturing the high-level semantic information.
  • Both retrieval and classification losses are based on a triplet loss (VSE), which is improved by our proposed Adaptive Mining (AdaMine) strategy for efficient negative sampling.
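To make the difference between the two losses concrete, here is a rough sketch of how the triplets could be formed inside a batch. It is not the code of this repository: the function names are hypothetical, and we assume that items without a class label are simply left out of the classification triplets.

```python
def instance_triplets(batch_size):
    """(anchor, positive, negative) indices for the retrieval loss.

    Anchor i is an item in one modality, its positive is the paired item i in
    the other modality, and every other item j of the batch is a negative.
    """
    return [(i, i, j) for i in range(batch_size) for j in range(batch_size) if j != i]

def semantic_triplets(classes):
    """(anchor, positive, negative) indices for the classification loss.

    The positive shares the anchor's class and the negative belongs to a
    different class. Items without a class label (None) are skipped
    (an assumption made for this sketch).
    """
    triplets = []
    for a, class_a in enumerate(classes):
        if class_a is None:
            continue
        for p, class_p in enumerate(classes):
            if p == a or class_p != class_a:
                continue
            for n, class_n in enumerate(classes):
                if class_n is not None and class_n != class_a:
                    triplets.append((a, p, n))
    return triplets

# Toy batch: two pizzas, one cocktail, one unlabeled item.
classes = ["pizza", "pizza", "cocktail", None]
retrieval = instance_triplets(len(classes))
classification = semantic_triplets(classes)
print(len(retrieval), classification)
```

Retrieval triplets only know about exact pairs, whereas classification triplets pull together different items of the same class, which is where the high-level structure comes from.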

Negative sampling strategy

  • The classic triplet loss strategy takes all negative samples into account to calculate the error. However, this tends to produce a vanishing gradient.
  • The recent (VSE++) strategy takes only the hardest negative sample into account. It is usually efficient, but does not allow the model to converge on this dataset.
  • Our AdaMine strategy takes into account informative samples only (i.e., those with a non-zero loss). It corresponds to a smooth curriculum learning, starting with the classic strategy and ending with the hard samples, but without the burden of switching between strategies. AdaMine also controls the trade-off between the retrieval and classification losses throughout training.
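The three strategies above differ only in how the per-triplet hinge losses of an anchor are aggregated. The following is a minimal sketch under stated assumptions, not the implementation shipped in this repository: the function names, the margin of 0.3 and the toy similarity values are all hypothetical.

```python
def triplet_losses(sim_pos, sim_negs, margin=0.3):
    """Hinge loss of one anchor against each negative.

    sim_pos is the similarity between the anchor and its positive,
    sim_negs the similarities between the anchor and each negative.
    """
    return [max(0.0, margin + sim_neg - sim_pos) for sim_neg in sim_negs]

def aggregate(losses, strategy):
    """Combine per-triplet losses according to a negative sampling strategy."""
    if strategy == "avg":      # classic: average over all negatives
        return sum(losses) / len(losses)
    if strategy == "max":      # VSE++: keep only the hardest negative
        return max(losses)
    if strategy == "adamine":  # average over informative (non-zero) triplets only
        informative = [loss for loss in losses if loss > 0.0]
        return sum(informative) / len(informative) if informative else 0.0
    raise ValueError(f"unknown strategy: {strategy}")

# One anchor, four negatives, two of which violate the margin.
losses = triplet_losses(sim_pos=0.8, sim_negs=[0.9, 0.7, 0.1, 0.0])
results = {name: aggregate(losses, name) for name in ("avg", "max", "adamine")}
print(results)
```

Early in training most triplets violate the margin, so the adaptive average behaves like the classic one. As training progresses, fewer triplets remain informative and the average concentrates on the hard ones, while the classic average keeps shrinking because of the zeros it includes.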


1. Install Python 3

We don't provide support for Python 2. We advise you to install Python 3 with Anaconda. Then, you can create an environment.

conda create --name recipe1m python=3.7
source activate recipe1m

2. Clone & requirements

We use a high-level framework so that we can focus on the model instead of boilerplate code.

cd $HOME
git clone
cd recipe1m.bootstrap.pytorch
pip install -r requirements.txt

3. Download dataset

Please create an account on the dataset website and agree to the terms of use. This dataset was made for research and not for commercial use.

mkdir -p data/recipe1m
cd data/recipe1m
tar -xvf data_lmdb.tar
rm data_lmdb.tar
tar -xzvf recipe1M.tar.gz
rm recipe1M.tar.gz
tar -xzvf text.tar.gz
rm text.tar.gz
cd text

Note: Features extracted from resnet50 are included in data_lmdb.

Quick start

Train a model on the train/val sets

The bootstrap/ file loads the options contained in a yaml file, creates the corresponding experiment directory (in logs/recipe1m) and starts the training procedure.

For instance, you can train our best model by running:

python -m -o recipe1m/options/adamine.yaml

Then, several files are going to be created:

  • options.yaml (copy of options)
  • logs.txt (history of print)
  • logs.json (batch and epoch statistics)
  • view.html (learning curves)
  • ckpt_last_engine.pth.tar (checkpoints of last epoch)
  • ckpt_last_model.pth.tar
  • ckpt_last_optimizer.pth.tar
  • ckpt_best_eval_epoch.metric.recall_at_1_im2recipe_mean_engine.pth.tar (checkpoints of best epoch)
  • ckpt_best_eval_epoch.metric.recall_at_1_im2recipe_mean_model.pth.tar
  • ckpt_best_eval_epoch.metric.recall_at_1_im2recipe_mean_optimizer.pth.tar
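If you want to check that a run produced everything listed above, a few lines of standard-library Python are enough. This is only a convenience sketch: the helper names are hypothetical, and the file names are rebuilt from the list above rather than queried from the framework.

```python
import tempfile
from pathlib import Path

BEST_METRIC = "eval_epoch.metric.recall_at_1_im2recipe_mean"

def expected_files(best_metric=BEST_METRIC):
    """File names listed above, rebuilt from their naming pattern."""
    names = ["options.yaml", "logs.txt", "logs.json", "view.html"]
    for tag in ("last", f"best_{best_metric}"):
        for part in ("engine", "model", "optimizer"):
            names.append(f"ckpt_{tag}_{part}.pth.tar")
    return names

def missing_files(exp_dir):
    """Return the expected files that are absent from an experiment directory."""
    exp_dir = Path(exp_dir)
    return [name for name in expected_files() if not (exp_dir / name).is_file()]

# Demo on a temporary directory holding only two of the files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("options.yaml", "logs.txt"):
        (Path(tmp) / name).touch()
    missing = missing_files(tmp)
print(missing)
```

For a real run, you would call missing_files("logs/recipe1m/adamine") instead of using a temporary directory.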

Many loss functions are available in the recipe1m/options directory.

Evaluate a model on the test set

At the end of the training procedure, you can evaluate your model on the testing set. In this example, bootstrap/ loads the options from your experiment directory, resumes from the checkpoint that performed best on the validation set and starts an evaluation on the testing set instead of the validation set, while skipping the training set (train_split is empty).

python -m \
-o logs/recipe1m/adamine/options.yaml \
--exp.resume best_eval_epoch.metric.recall_at_1_im2recipe_mean \
--dataset.train_split \
--dataset.eval_split test

Note: by default, the model is evaluated on the 1k setup; more info on the 10k setup can be found in the section "Evaluate with the 10k setup".
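The checkpoint names refer to a recall_at_1_im2recipe metric. As a rough illustration of what a recall at k measures, and not the evaluation code of this repository, the sketch below assumes a similarity matrix in which the correct recipe of image i sits at column i. The function names and the toy matrix are hypothetical.

```python
def ranks_of_matches(sim):
    """1-based rank of the correct recipe for each image.

    sim[i][j] is the similarity between image i and recipe j, and the
    correct recipe of image i is assumed to be recipe i.
    """
    ranks = []
    for i, row in enumerate(sim):
        better = sum(1 for score in row if score > row[i])
        ranks.append(better + 1)
    return ranks

def recall_at_k(ranks, k):
    """Share of queries whose correct match is ranked in the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)

# Toy similarity matrix for 3 images and 3 recipes (hypothetical values).
sim = [
    [0.9, 0.1, 0.2],
    [0.8, 0.5, 0.1],
    [0.3, 0.2, 0.7],
]
ranks = ranks_of_matches(sim)
print(ranks, recall_at_k(ranks, 1), recall_at_k(ranks, 2))
```

The recipe-to-image direction is obtained by applying the same functions to the transposed matrix.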

Available (pretrained) models


Pairwise loss [paper]

python -m -o recipe1m/options/pairwise.yaml

PWC++ (Ours)

Pairwise with positive and negative margins loss

python -m -o recipe1m/options/pairwise_plus.yaml


Triplet loss (VSE) [paper]

python -m -o recipe1m/options/avg_nosem.yaml


Triplet loss with hard negative mining [paper]

python -m -o recipe1m/options/max.yaml

AdaMine_avg (Ours)

Triplet loss with semantic loss

python -m -o recipe1m/options/avg.yaml

AdaMine (Ours)

Triplet loss with semantic loss and adaptive sampling

python -m -o recipe1m/options/adamine.yaml

Features from testing set:

cd logs/recipe1m
tar -xzvf adamine.tar.gz

Lifted structure

Lifted structure loss [paper]

python -m -o recipe1m/options/lifted_struct.yaml



Useful commands

Compare experiments

python -m -d \
logs/recipe1m/adamine \
logs/recipe1m/avg \
-k eval_epoch.metric.recall_at_1_im2recipe_mean max


## eval_epoch.metric.recall_at_1_im2recipe_mean

  Place  Method      Score    Epoch
-------  --------  -------  -------
      1  adamine    0.3827       76
      2  avg        0.3201       51
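The ranking printed above boils down to taking, for each experiment, the best value of the chosen metric over its epochs (here with the max order) and sorting the experiments accordingly. The sketch below mimics that logic on made-up numbers. It is not the framework's code: the function names, the experiment names, the scores and the zero-based epoch indexing are all assumptions.

```python
def best_epoch(values, order="max"):
    """Return (best value, epoch index) for a list of per-epoch values."""
    pick = max if order == "max" else min
    epoch = pick(range(len(values)), key=lambda e: values[e])
    return values[epoch], epoch

def rank_experiments(histories, order="max"):
    """Sort experiments by their best value; returns (name, score, epoch) rows."""
    rows = [(name, *best_epoch(values, order)) for name, values in histories.items()]
    rows.sort(key=lambda row: row[1], reverse=(order == "max"))
    return rows

# Hypothetical per-epoch scores, for illustration only (not real results).
histories = {
    "exp_a": [0.10, 0.30, 0.25],
    "exp_b": [0.20, 0.22, 0.35],
}
table = rank_experiments(histories)
for place, (name, score, epoch) in enumerate(table, start=1):
    print(place, name, score, epoch)
```

Passing order="min" would rank experiments on a metric where lower is better, such as a loss.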

Use a specific GPU

CUDA_VISIBLE_DEVICES=0 python -m -o recipe1m/options/adamine.yaml

Overwrite an option

The bootstrap.pytorch framework makes it easy to overwrite a hyperparameter. In this example, we run an experiment with a non-default learning rate. We therefore also overwrite the experiment directory path, so that the new run does not write into the directory of the default one:

python -m -o recipe1m/options/adamine.yaml \ 0.0003 \
--exp.dir logs/recipe1m/adamine_lr,0.0003
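Conceptually, a dotted command-line option such as --exp.dir addresses a nested key of the yaml options. The sketch below shows one way such an override could be applied to a nested dictionary. It is a hypothetical helper for illustration, not the framework's actual implementation, and the option values are only examples built from the commands in this README.

```python
import copy

def apply_overrides(options, overrides):
    """Return a copy of options where each dotted key is set to its new value."""
    result = copy.deepcopy(options)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for key in parents:
            # Create intermediate sections if they do not exist yet.
            node = node.setdefault(key, {})
        node[leaf] = value
    return result

# Options as they could look after loading a yaml file (example values).
options = {"exp": {"dir": "logs/recipe1m/adamine", "resume": None}}
updated = apply_overrides(
    options,
    {"exp.dir": "logs/recipe1m/adamine_lr,0.0003", "exp.resume": "last"},
)
print(updated)
```

Working on a deep copy keeps the options loaded from the file untouched, which makes it easy to compare the defaults with the values actually used.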

Resume training

If a problem occurs, it is easy to resume from the last epoch by specifying the options file from the experiment directory while overwriting the exp.resume option (default is None):

python -m -o logs/recipe1m/adamine/options.yaml \
--exp.resume last

Evaluate with the 10k setup

Just as with the 1k setup, we load the best checkpoint. This time we also overwrite some options. The metrics will be displayed on your terminal at the end of the evaluation.

python -m \
-o logs/recipe1m/adamine/options.yaml \
--exp.resume best_eval_epoch.metric.recall_at_1_im2recipe_mean \
--dataset.train_split \
--dataset.eval_split test \
--model.metric.nb_bags 5 \
--model.metric.nb_matchs_per_bag 10000

Note: Metrics can be stored in a json file by adding the --misc.logs_name eval,test10k option. It will create a logs_eval,test10k.json in your experiment directory.
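The two options overwritten above suggest that the metric is computed on several bags of test pairs and then averaged. The sketch below illustrates that idea on a much smaller scale. It makes several assumptions that are not confirmed by this README: bags are drawn at random without replacement inside a bag, the final score is a plain mean over bags, and the function names and the toy metric are hypothetical.

```python
import random
from statistics import mean

def sample_bags(nb_items, nb_bags, nb_matchs_per_bag, seed=0):
    """Draw nb_bags random subsets of item indices, each without repetition."""
    rng = random.Random(seed)
    return [rng.sample(range(nb_items), nb_matchs_per_bag) for _ in range(nb_bags)]

def mean_over_bags(metric, bags):
    """Average a per-bag metric over all bags."""
    return mean(metric(bag) for bag in bags)

# Toy setup: 50 test pairs, 5 bags of 10 pairs (the command above uses 5 bags of 10000).
bags = sample_bags(nb_items=50, nb_bags=5, nb_matchs_per_bag=10)

# Hypothetical stand-in metric: share of the bag's items found in a fixed set.
retrieved = set(range(0, 50, 2))
score = mean_over_bags(lambda bag: sum(i in retrieved for i in bag) / len(bag), bags)
print(len(bags), score)
```

In a real evaluation, the metric would rank the items of each bag against each other, which is why a 10k bag is a much harder setup than a 1k bag.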



Extract your own image features



Carvalho, Micael; Cadène, Rémi; Picard, David; Soulier, Laure; Thome, Nicolas; Cord, Matthieu. Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings. In: The ACM conference on Research and Development in Information Retrieval (SIGIR).


Special thanks to the authors of im2recipe who developed Recipe1M, the dataset used in this research project.


Retrieve recipes from foodie pictures using Deep Learning and Pytorch



