STAIR captions: large-scale Japanese image caption dataset

STAIR Captions

We developed STAIR Captions, a large-scale Japanese image caption dataset. For more information, see the STAIR Captions website.

Annotation Format

The STAIR Captions dataset is provided as JSON files. The annotation format follows that of MS-COCO, with each caption annotation containing the following fields:

  "id"                : int,
  "image_id"          : int,
  "caption"           : str,
  "tokenized_caption" : str,

For details of the annotation format, please see the MS-COCO download page.
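
As an illustration of working with these fields, here is a minimal Python sketch that groups captions by image. It assumes the JSON keeps its records under a top-level "annotations" list, as MS-COCO caption files do, and that "tokenized_caption" is a whitespace-separated string. The sample records and the helper name group_captions_by_image are made up for this example and are not part of the dataset or any official tooling.

```python
import json
from collections import defaultdict


def group_captions_by_image(dataset):
    """Map each image_id to a list of (caption, tokens) pairs."""
    grouped = defaultdict(list)
    for ann in dataset["annotations"]:
        # Assumption: tokens are separated by whitespace.
        tokens = ann["tokenized_caption"].split()
        grouped[ann["image_id"]].append((ann["caption"], tokens))
    return dict(grouped)


# Made-up records in the documented shape; real files are far larger.
sample = json.loads("""
{
  "annotations": [
    {"id": 1, "image_id": 10,
     "caption": "犬が走っている",
     "tokenized_caption": "犬 が 走っ て いる"},
    {"id": 2, "image_id": 10,
     "caption": "白い犬",
     "tokenized_caption": "白い 犬"},
    {"id": 3, "image_id": 20,
     "caption": "赤いバス",
     "tokenized_caption": "赤い バス"}
  ]
}
""")

grouped = group_captions_by_image(sample)
for image_id in sorted(grouped):
    for caption, tokens in grouped[image_id]:
        print(image_id, caption, len(tokens))
```

To run this on the real data, replace the sample with the result of json.load on one of the extracted JSON files, opened with encoding="utf-8".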

References

  • Yuya Yoshikawa, Yutaro Shigeto, Akikazu Takeuchi, "STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset," Annual Meeting of the Association for Computational Linguistics (ACL), Short Paper, 2017. [arXiv]
  • Yuya Yoshikawa, Yutaro Shigeto, Akikazu Takeuchi, "STAIR Captions: A Large-Scale Japanese Image Caption Dataset," 23rd Annual Meeting of the Association for Natural Language Processing (NLP2017), 2017. (In Japanese) [PDF]

License

Creative Commons Attribution 4.0 License.