Code for the VizWiz API and evaluation of generated captions.
- python 3
- java 1.8.0 (for caption evaluation)
- ./
  - demo_vizwiz_caption_evaluation.ipynb (tutorial notebook)
- ./vizwiz_api
  - vizwiz.py: This file contains the VizWiz API class that can be used to load VizWiz dataset JSON files and analyze them.
- ./annotations
  - train.json (VizWiz-Captions training set)
  - val.json (VizWiz-Captions validation set)
  - The dataset shares the same data format as MS COCO.
- ./results
  - fake_caption_val.json (an example of fake results for running the demo)
  - Results share the same data format as MS COCO.
- ./vizwiz_eval_cap: The folder where all caption evaluation code is stored.
  - evals.py: This file includes the VizWizEvalCap class that can be used to evaluate results on VizWiz.
  - tokenizer: Python wrapper of the Stanford CoreNLP PTBTokenizer
  - bleu: BLEU evaluation code
  - rouge: ROUGE-L evaluation code
  - cider: CIDEr evaluation code
  - spice: SPICE evaluation code
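Since the annotation files share the MS COCO captions format, the JSON layout can be sketched as below. The field names follow the MS COCO format; the ids, file name, and caption texts are made-up placeholders, and the grouping step only illustrates the kind of indexing an API like vizwiz.py performs:

```python
import json

# Minimal annotation file in MS COCO captions format:
# "images" lists the images, "annotations" holds one entry per caption.
annotations = {
    "images": [
        {"id": 1, "file_name": "VizWiz_val_00000001.jpg"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "caption": "A hand holds a soup can."},
        {"id": 11, "image_id": 1, "caption": "A can of tomato soup on a table."},
    ],
}

# Round-trip through JSON, then group captions by image id.
data = json.loads(json.dumps(annotations))
captions_by_image = {}
for ann in data["annotations"]:
    captions_by_image.setdefault(ann["image_id"], []).append(ann["caption"])

print(captions_by_image[1])
```

Each image can have several caption entries; they are linked through `image_id`.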
- The primary VizWiz API is standalone.
- Download annotation files.
- For caption evaluation, you will first need to download the Stanford CoreNLP 3.6.0 code and models for use by SPICE. To do this, run:
./get_stanford_models.sh
- To run shell scripts on Windows, you can set up the Windows Subsystem for Linux. The command for Windows will then be
bash get_stanford_models.sh
- Note: SPICE will try to create a cache of parsed sentences in ./vizwiz_eval_cap/spice/cache/. This dramatically speeds up repeated evaluations. The cache directory can be moved by setting 'CACHE_DIR' in ./vizwiz_eval_cap/spice. In the same file, caching can be turned off by removing the '-cache' argument to 'spice_cmd'.
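Generated captions should follow the same layout as fake_caption_val.json, i.e. the MS COCO results format: a JSON array with one object per image containing the image id and the generated caption. A minimal sketch of writing such a file (the file name, ids, and captions here are placeholders):

```python
import json
import os
import tempfile

# One entry per image: the image id from the annotation file plus the
# generated caption (MS COCO results format).
results = [
    {"image_id": 1, "caption": "a can of soup held in a hand"},
    {"image_id": 2, "caption": "a computer keyboard on a desk"},
]

path = os.path.join(tempfile.gettempdir(), "my_caption_val.json")
with open(path, "w") as f:
    json.dump(results, f)

# The evaluation code loads a file like this and matches each entry to the
# ground-truth captions by image_id.
with open(path) as f:
    loaded = json.load(f)
print(len(loaded))
```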
- VizWiz Project
- PTBTokenizer: We use the Stanford Tokenizer which is included in Stanford CoreNLP 3.4.1.
- BLEU: BLEU: a Method for Automatic Evaluation of Machine Translation
- Meteor: Project page with related publications. We use the latest version (1.5) of the code. Changes have been made to the source code to properly aggregate the statistics for the entire corpus.
- Rouge-L: ROUGE: A Package for Automatic Evaluation of Summaries
- CIDEr: CIDEr: Consensus-based Image Description Evaluation
- SPICE: SPICE: Semantic Propositional Image Caption Evaluation
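All of the metrics above score a candidate caption against a set of reference captions. As a rough illustration only (this repo uses the official implementations listed above, not this sketch), the modified unigram precision at the core of BLEU-1 can be written as:

```python
from collections import Counter

def modified_unigram_precision(candidate, references):
    """BLEU-1 style precision: each candidate word's count is clipped by
    the maximum number of times it appears in any single reference."""
    cand_counts = Counter(candidate.lower().split())
    max_ref_counts = Counter()
    for ref in references:
        for word, count in Counter(ref.lower().split()).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    clipped = sum(min(count, max_ref_counts[word])
                  for word, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

score = modified_unigram_precision(
    "a can of soup",
    ["a hand holds a can of tomato soup", "a soup can on a table"],
)
```

The real BLEU combines clipped precisions over several n-gram orders with a brevity penalty; the full metrics also rely on the PTBTokenizer rather than whitespace splitting.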
This work is closely adapted from MS COCO API and MS COCO Caption Evaluation API.
- Xinlei Chen (CMU)
- Hao Fang (University of Washington)
- Tsung-Yi Lin (Cornell)
- Ramakrishna Vedantam (Virginia Tech)
- David Chiang (University of Notre Dame)
- Michael Denkowski (CMU)
- Alexander Rush (Harvard University)