EXPERT - Learning to Learn Words from Visual Scenes

Code from the paper Learning to Learn Words from Visual Scenes

Website of the project in

If you use the code, please cite the paper as:

  author = {D. Surís and D. Epstein and H. Ji and S. Chang and C. Vondrick},
  title = {Learning to Learn Words from Visual Scenes},
  journal = {European Conference on Computer Vision (ECCV)},
  year = {2020}

An example of command line execution can be found in scripts/. To reproduce the numbers from the paper, use the released pretrained models and the scripts/test_*.sh scripts.

Run python --help for information on arguments.

Be sure to have the external libraries in requirements.txt installed.


We work with the Epic Kitchens dataset for this project. To run our code, you will need to download their images and annotations.

Specifically, the annotations directory has to contain:

Pass the path to this directory with the --annotation_root argument. A compressed .tar.gz file with these four files can be downloaded here.

The images directory has to be specified using --img_root. It contains all the images with the following subfolder structure: path_to_img_root/participant_id/vid_id/frame_{frame_id:010d}.jpg. This is the default structure if you download the data from the Epic Kitchens website (download from here). For this project we only use the RGB images, not flow information.
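As a quick illustration of the layout above, here is a minimal sketch that builds the path of a single frame. The helper name frame_path and the example identifiers (P01, P01_01) are assumptions made for illustration; they are not part of the released code.

```python
from pathlib import Path


def frame_path(img_root, participant_id, vid_id, frame_id):
    """Build the path of one RGB frame under the documented layout.

    Hypothetical helper: the layout comes from the README, the function does not.
    """
    # The frame index is zero-padded to 10 digits, as in frame_{frame_id:010d}.jpg
    return Path(img_root) / participant_id / vid_id / f"frame_{frame_id:010d}.jpg"


if __name__ == "__main__":
    # Example values are illustrative only.
    print(frame_path("path_to_img_root", "P01", "P01_01", 42))
```

Checking that a few such paths exist before launching a run is a cheap way to confirm that --img_root points at the right directory.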

Pretrained models

The pretrained models reported in our paper can be found in the following links:

Each one of these is a .tar.gz file containing the files necessary to load the model (checkpoint_best.pth, config.json and tokenizer.pth).

To resume training or to test from one of these pretrained models, set --resume to True. Extract the models under the /path/to/your/checkpoints directory that you pass with the --checkpoint_dir argument, and select the specific model with the --resume_name argument.
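Before resuming, it can help to confirm that an extracted model is complete. The sketch below assumes that each archive extracts into a subfolder of the checkpoint directory named after the value given to --resume_name; that folder layout and the helper name missing_checkpoint_files are assumptions, not something the repository documents here.

```python
from pathlib import Path

# File names listed in the README for each pretrained model archive.
REQUIRED_FILES = ("checkpoint_best.pth", "config.json", "tokenizer.pth")


def missing_checkpoint_files(checkpoint_dir, resume_name):
    """Return the required files that are absent from checkpoint_dir/resume_name.

    Assumption: the model lives in a subfolder named after --resume_name.
    """
    model_dir = Path(checkpoint_dir) / resume_name
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]


if __name__ == "__main__":
    # "my_model" is a placeholder name, not one of the released models.
    missing = missing_checkpoint_files("/path/to/your/checkpoints", "my_model")
    print("missing files:", missing if missing else "none")
```

An empty list means all three files are in place under the assumed layout.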

