Code used to generate synthetic scenes for the Cut, Paste and Learn paper


This code generates synthetic scenes for instance/object detection. Given images of objects in isolation from multiple views and a set of background scenes, it generates full scenes containing multiple objects, along with annotation files that can be used to train an object detector. The generation approach works well with region-based object detection methods such as Faster R-CNN.


  1. OpenCV (pip install opencv-python)
  2. PIL (pip install Pillow)
  3. Poisson Blending (follow the instructions here)
  4. PyBlur (pip install pyblur)

To generate scenes, this code assumes you have object masks for all images. There is no requirement on which algorithm produces these masks, since different algorithms may work better for different applications. However, we recommend Pixel Objectness with Bilinear Pooling to generate these masks automatically. If you want to annotate the images manually, we recommend GrabCut algorithms (here, here, here).
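To illustrate why masks are needed, here is a minimal sketch of how a binary mask can yield both a bounding box and a cut-and-paste composite. It uses plain nested lists instead of OpenCV or PIL images, and the helper names `mask_bounding_box` and `paste_object` are hypothetical; they are not taken from this repository.

```python
def mask_bounding_box(mask):
    """Return (xmin, ymin, xmax, ymax) of the nonzero mask pixels, or None if there are none."""
    if not mask:
        return None
    # Rows and columns that contain at least one foreground pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not rows or not cols:
        return None
    return (cols[0], rows[0], cols[-1], rows[-1])


def paste_object(background, obj, mask, left, top):
    """Copy the masked pixels of obj onto a copy of background at offset (left, top)."""
    scene = [row[:] for row in background]  # leave the input background untouched
    for y, mask_row in enumerate(mask):
        for x, keep in enumerate(mask_row):
            sy, sx = top + y, left + x
            # Skip background pixels and anything that falls outside the scene.
            if keep and 0 <= sy < len(scene) and 0 <= sx < len(scene[0]):
                scene[sy][sx] = obj[y][x]
    return scene


# Toy data: a 3x4 object whose mask covers three pixels, pasted into a 4x5 scene.
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 0]]
obj = [[9] * 4 for _ in range(3)]
background = [[0] * 5 for _ in range(4)]

box = mask_bounding_box(mask)
scene = paste_object(background, obj, mask, left=1, top=1)
```

The box is expressed in the object's own coordinates; adding the paste offset gives the box in scene coordinates, which is the kind of value an annotation file would record.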

Setting up Defaults

The first section in the file contains paths to various files and libraries. Set them up accordingly.

The other defaults are image generation parameters that can be varied to produce scenes with different levels of clutter, occlusion, data augmentation, etc.

Running the Script

python [-h] [--selected] [--scale] [--rotation]
                            [--num NUM] [--dontocclude] [--add_distractors]
                            root exp

Create dataset with different augmentations

positional arguments:
  root               The root directory which contains the images and
  exp                The directory where images and annotation lists will be

optional arguments:
  -h, --help         show this help message and exit
  --selected         Keep only selected instances in the test dataset. Default
                     is to keep all instances in the root directory.
  --scale            Add scale augmentation. Default is to not add scale
  --rotation         Add rotation augmentation. Default is to not add rotation
  --num NUM          Number of times each image will be in dataset
  --dontocclude      Add objects without occlusion. Default is to produce
  --add_distractors  Add distractor objects. Default is to not use
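For readers who want to script around the generator, the interface above could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the usage text, not the repository's actual code: the function name `build_parser`, the paraphrased help strings, and the `type=int` and `default=1` settings for `--num` are all assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the usage text above (a sketch, not the original source)."""
    parser = argparse.ArgumentParser(
        description="Create dataset with different augmentations")
    # Positional arguments, in the order shown in the usage line.
    parser.add_argument("root", help="root directory with the input images")
    parser.add_argument("exp", help="output directory for the generated dataset")
    # Boolean switches: absent means False, present means True.
    parser.add_argument("--selected", action="store_true",
                        help="keep only selected instances")
    parser.add_argument("--scale", action="store_true",
                        help="add scale augmentation")
    parser.add_argument("--rotation", action="store_true",
                        help="add rotation augmentation")
    # ASSUMPTION: integer type and a default of 1 are guesses for illustration.
    parser.add_argument("--num", type=int, default=1,
                        help="number of times each image will be in the dataset")
    parser.add_argument("--dontocclude", action="store_true",
                        help="add objects without occlusion")
    parser.add_argument("--add_distractors", action="store_true",
                        help="add distractor objects")
    return parser


# Example invocation with hypothetical paths.
args = build_parser().parse_args(["data/objects", "out", "--scale", "--num", "3"])
```

All augmentation switches are off unless passed explicitly, which matches the "Default is to not add" wording in the help text.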

Training an object detector

The code produces all the files required to train an object detector. The output format can be used directly with Faster R-CNN and can be adapted for other object detectors. The files produced are:

  1. labels.txt - Contains the labels of the objects being trained
  2. annotations/*.xml - Contains XML annotation files with bounding box annotations for the synthetic scenes
  3. images/*.jpg - Contains JPEG image files of the synthetic scenes
  4. train.txt - Contains list of synthetic image files and corresponding annotation files
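When adapting the output for a different detector, the main task is reading the bounding boxes back out of the XML files. The sketch below assumes a Pascal VOC-style layout (`object`, `name`, and `bndbox` elements with `xmin`, `ymin`, `xmax`, `ymax`), which is what Faster R-CNN setups commonly consume; the exact schema written by this code is not stated here, so check a generated file before relying on these tag names. `build_annotation` and `read_boxes` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

BOX_TAGS = ("xmin", "ymin", "xmax", "ymax")


def build_annotation(filename, width, height, boxes):
    """Serialize (label, (xmin, ymin, xmax, ymax)) pairs as a VOC-style XML string."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    for label, coords in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in zip(BOX_TAGS, coords):
            ET.SubElement(bndbox, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_boxes(xml_text):
    """Parse a VOC-style XML string back into (label, (xmin, ymin, xmax, ymax)) pairs."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        coords = tuple(int(bndbox.find(tag).text) for tag in BOX_TAGS)
        boxes.append((obj.find("name").text, coords))
    return boxes


# Round trip with made-up labels and coordinates.
original = [("mug", (10, 20, 110, 140)), ("cereal_box", (200, 50, 320, 300))]
xml_text = build_annotation("1.jpg", 640, 480, original)
parsed = read_boxes(xml_text)
```

Converting to another detector's format then reduces to mapping each parsed tuple into that detector's expected structure.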

There are tutorials describing how one can adapt Faster R-CNN code to run on a custom dataset like:



The code was used to generate synthetic scenes for the paper Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection.

If you find our code useful in your research, please consider citing:

author = {Dwibedi, Debidatta and Misra, Ishan and Hebert, Martial},
title = {Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2017}