Who's Waldo? Linking People Across Text and Images

Download links and PyTorch implementation of "Who's Waldo? Linking People Across Text and Images", ICCV 2021.

Who's Waldo? Linking People Across Text and Images

Claire Yuqing Cui*, Apoorv Khandelwal*, Yoav Artzi, Noah Snavely, Hadar Averbuch-Elor. ICCV 2021.

Project Page | Paper


Quick Start

1. Request access to the Who's Waldo dataset.

2. Create a new conda environment

conda create --name whos-waldo
conda activate whos-waldo
pip install -r requirements.txt
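
Note that conda create without an explicit python package produces an environment without its own Python or pip, so the pip install above may fall back to a system installation. A minimal sketch that avoids this (the version pin is an assumption; the repo does not specify one):

conda create --name whos-waldo python=3.7  # version is an assumption, not pinned by the repo
conda activate whos-waldo
pip install -r requirements.txt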

3. Data preprocessing

Run the following preprocessing scripts in the environment created above. First, generate annotations:

python preprocess/generate_annotations.py --output {annotation-output-dir}
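
For example, writing the annotations to a local directory:

python preprocess/generate_annotations.py --output annotations/  # output path illustrative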

Process textual information for each split:

python preprocess/create_txtdb.py --ann {annotation-output-dir} --output {txtdb-name} --split {split}
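
For instance, the text databases for all splits can be built in one pass (the split names train/val/test are an assumption; use whichever splits ship with the dataset):

# split names assumed; txtdb output names illustrative
for split in train val test; do
  python preprocess/create_txtdb.py --ann annotations/ --output txtdb_${split} --split ${split}
done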

Process visual information for each split:

python preprocess/create_imgdb.py --output {imgdb-name} --split {split}

Note that you will need to extract image features before creating the imgdb. We used this repo for feature extraction, but you may find this PyTorch re-implementation easier to use.
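
Once features are available, the image databases can be built analogously (same assumptions on split names as above):

# split names assumed; imgdb output names illustrative
for split in train val test; do
  python preprocess/create_imgdb.py --output imgdb_${split} --split ${split}
done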

4. Set up Docker container

Run launch_container.sh with the appropriate paths for each argument.
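
The expected arguments are defined in launch_container.sh itself. Since this code builds on UNITER, an invocation following UNITER's convention might look like the sketch below, but the variable names and their order are assumptions to verify against the script:

# argument order borrowed from UNITER's launch_container.sh; verify against this repo
source launch_container.sh $TXT_DB $IMG_DB $OUTPUT $PRETRAIN_DIR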

5. Training

Create a training config file (e.g. config/train-whos-waldo.json). Inside the container, run

python train.py --config {path to training config}
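
For example, with the config file named above:

python train.py --config config/train-whos-waldo.json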

6. Inference (evaluation and visualizations)

Inside the container, run

python infer.py

with the appropriate arguments, which can be found in infer.py.
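
If the script parses its options with argparse (an assumption), the full argument list can be printed without reading the source:

python infer.py --help  # lists supported arguments, assuming argparse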

Datasheet

We provide a datasheet for our dataset here.

License

The images in our dataset are provided by Wikimedia Commons under various free licenses. These licenses permit the use, study, derivation, and redistribution of these images—sometimes with restrictions, e.g. requiring attribution and with copyleft. We provide source links, full license text, and attribution (when available) for all images, make no modifications to any image, and release these images under their original licenses. The associated captions are provided as a part of unstructured text in Wikimedia Commons, with rights to the original writers under the CC BY-SA 3.0 license. We modify these (as specified in our paper) and release such derivatives under the same license. We provide the rest of our dataset (i.e. detections, coreferences, and ground truth correspondences) under a CC BY-NC-SA 4.0 license. We provide our code under an MIT license.

Citation

@InProceedings{Cui_2021_ICCV,
    author    = {Cui, Yuqing and Khandelwal, Apoorv and Artzi, Yoav and Snavely, Noah and Averbuch-Elor, Hadar},
    title     = {Who's Waldo? Linking People Across Text and Images},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {1374-1384}
}

Acknowledgement

Our code is based on the implementation of UNITER.
