SketchyScene: Richly-Annotated Scene Sketches. (ECCV 2018)
Switch branches/tags
Nothing to show
Clone or download
MarkMoHR Merge pull request #3 from MarkMoHR/master
Minor modification of the webpage
Latest commit e68e7dc Sep 14, 2018

SketchyScene: Richly-Annotated Scene Sketches.

We contribute the first large-scale dataset of scene sketches, SketchyScene, with the goal of advancing research on sketch understanding at both the object and scene level. The dataset is created through a novel and carefully designed crowdsourcing pipeline, enabling users to efficiently generate large quantities of realistic and diverse scene sketches. SketchyScene contains more than 29,000 scene-level sketches, 7,000+ pairs of scene templates and photos, and 11,000+ object sketches. All objects in the scene sketches have ground-truth semantic and instance masks. The dataset is also highly scalable and extensible, easily allowing augmenting and/or changing scene composition. We demonstrate the potential impact of SketchyScene by training new computational models for semantic segmentation of scene sketches and showing how the new dataset enables several applications including image retrieval, sketch colorization, editing, and captioning, etc. The dataset and code can be found at

Please cite the corresponding paper if you found our datasets or code useful:

SketchyScene: Richly-Annotated Scene Sketches. Changqing Zou, Qian Yu, Ruofei Du, Haoran Mo, Yi-Zhe Song, Tao Xiang, Chengying Gao, Baoquan Chen, Hao Zhang. In Proceedings of European Conference on Computer Vision (ECCV), 2018.


7265 (train 5617 + val 535 + test 1113) Downloading

(Further data will come soon)


USketch is a web-driven tool for crowdsourcing the SketchyScene dataset. It is open-sourced at A live demo is located at

USketch Interface


In preparation for the reference data in USketch, we have developed a custom image crawler to acquire 7,500 cartoon scenes and 9,290 sketchy objects in 45 categories for academic use. Due to the length limitation, we explain the detailed crawling process and source code in the supplementary materials.

Object instance frequency

We selected 45 categories for our dataset, including objects and stuff classes. Specifically, we first considered several common scenes (e.g., garden, farm, dinning room, and park) and extracted 100 objects/stuff classes from them as raw candidates. Then we defined three super-classes, i.e. Weather, Object, and Field (Environment), and assigned the candidates into each super-class. Finally, we selected 45 from them by considering their combinations and commonness in real life.

Instead of asking workers to draw each object, we provided them with plenty of object sketches (each object candidate is also refer to a ``component") as candidates. In order to have enough variations in the object appearance in terms of pose and appearance, we searched and downloaded around 1,500 components for each category.