AutoSweep: Recovering 3D Editable Objects from a Single Photograph

ProjectPage | Paper | Video | Dataset

Xin Chen, Yuwei Li, Xi Luo, Tianjia Shao, Jingyi Yu, Kun Zhou, Youyi Zheng.

This repository contains the official implementation of the paper AutoSweep: Recovering 3D Editable Objects from a Single Photograph (TVCG 2018). Our method automatically generates 3D models from a single photograph; the models can then be edited and rearranged.


This paper presents a fully automatic framework for extracting editable 3D objects directly from a single photograph. Unlike previous methods, which recover depth maps, point clouds, or mesh surfaces, we aim to recover 3D objects that have semantic parts and can be directly edited. We base our work on the assumption that most human-made objects are composed of parts and that these parts can be well represented by generalized primitives. Our work makes an attempt towards recovering two types of primitive-shaped objects, namely generalized cuboids and generalized cylinders. To this end, we build a novel instance-aware segmentation network, GeoNet, for accurate part separation. GeoNet outputs a set of smooth part-level masks labeled as profiles and bodies. Then, in a key stage, we simultaneously identify profile-body relations and recover 3D parts by sweeping each recognized profile along its body contour, jointly optimizing the geometry to align with the recovered masks. Qualitative and quantitative experiments show that our algorithm recovers high-quality 3D models and outperforms existing methods in both instance segmentation and 3D reconstruction.


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

All material is made available under the Creative Commons BY-NC-SA 4.0 license. You can use, redistribute, and adapt the material for non-commercial purposes, as long as you give appropriate credit by citing our paper and indicate any changes that you have made.

Quick Start

This is the root folder for all parts of AutoSweep. It contains the following modules:

  • AutoSweep
    • FCIS: FCIS modified to suit the AutoSweep dataset.
    • MaskRCNN: MaskRCNN modified to suit the AutoSweep dataset.
    • DCN: Code from Deformable Convolutional Networks.
    • AxisClassifier: Classifies whether an axis is curved or straight.
    • CircleReconstruction: C++ version of the profile fitting.
    • GeonetEnd2End: An attempt to connect FCIS and DCN.
    • OneKeyGeonet: The MATLAB-based framework for the experiments and the demo.
    • AnnatationTool: Our annotation tool for the AutoSweep dataset.
  • Unity Part:
    • ObjectSnap: The modeling part of the AutoSweep project, based on Unity3D.
  • AutoSweepMatl
    • FasterRCNN: A segmentation tool used for reference. We finally selected MaskRCNN for instance segmentation.
    • Deeplab: A segmentation tool used for reference. Not suited to the instance segmentation task.
    • PointSetGeneration: Code for "A Point Set Generation Network for 3D Object Reconstruction from a Single Image".
    • 3D-R2N2: Single/multi-view image(s) to voxel reconstruction using a recurrent neural network.

Data Preparation


You can download our dataset (2.8 GB) from Google Drive or OneDrive.


Our dataset includes 11657 images with cubes and cylinders. The real dataset contains about 6000 unannotated images from ImageNet, 774 annotated images from Xiao et al., and 4883 images collected from the Internet. This dataset is further separated into 8183 training images and 3474 testing images. We use color to encode the instance and label information.

Red channel: {10, 20, ...} represents {instance 1, instance 2, ...}.
Green channel: zero represents body, nonzero represents top face.
Blue channel: 150 represents grip, 200 represents cylinder, 255 represents cube.

For example:

Label                 Color           Instance ID
Cylinder - top face   (10, 10, 200)   1
Cylinder - top face   (20, 20, 200)   2
Cylinder - body       (10, 0, 200)    1
Cube - top face       (10, 10, 255)   1
Cube - body           (10, 0, 255)    1
Grip                  (10, 0, 150)    1
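
The color encoding can be decoded with a few lines of Python. The following is a minimal sketch, not code from this repository. It reads the example colors positionally: the first channel carries the instance code, the second distinguishes top face from body, and the third carries the class code. The names `PartLabel` and `decode_label` are hypothetical, and rejecting any color that does not fit the scheme (such as background pixels) is an assumption.

```python
from typing import NamedTuple, Tuple

# Class codes from the example table; they sit in the third channel.
CLASS_BY_CODE = {150: "grip", 200: "cylinder", 255: "cube"}


class PartLabel(NamedTuple):
    category: str     # "grip", "cylinder", or "cube"
    part: str         # "top face" or "body"
    instance_id: int  # 1, 2, ...


def decode_label(color: Tuple[int, int, int]) -> PartLabel:
    """Decode one label color into category, part, and instance ID."""
    instance_code, part_code, class_code = color
    if class_code not in CLASS_BY_CODE:
        raise ValueError(f"unknown class code in third channel: {class_code}")
    # Instance codes are multiples of ten: 10 -> instance 1, 20 -> instance 2.
    if instance_code <= 0 or instance_code % 10 != 0:
        raise ValueError(f"unexpected instance code in first channel: {instance_code}")
    # Assumption: any nonzero value in the second channel marks a top face.
    part = "top face" if part_code != 0 else "body"
    return PartLabel(CLASS_BY_CODE[class_code], part, instance_code // 10)


print(decode_label((20, 20, 200)))
```

Applying `decode_label` to every distinct color in a label image would give the set of parts present in that image; grouping parts by category and instance ID then pairs each top face with its body.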


As described in our paper, the code consists of two modules: the learning module (image to mask) and the graphics module (mask to 3D mesh). The first module follows the frameworks of FCIS and Mask R-CNN, which are common Python-based learning frameworks. The second module is built on Unity3D and our own framework; its purpose is to sweep the profiles in a dynamic demo.
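
To illustrate what sweeping a profile means geometrically, here is a minimal Python sketch. It is not the Unity3D implementation in this repository. It assumes the simplest case: a circular profile swept along a straight vertical axis, with a radius that may change from ring to ring. The function name `sweep_circle` and its parameters are hypothetical; the method described in the paper also handles cuboid profiles and curved axes and optimizes the geometry against the masks, none of which is shown here.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def sweep_circle(radii: List[float], height: float, segments: int = 16) -> List[List[Vec3]]:
    """Sweep a circular profile along the +Y axis.

    Returns one ring of vertices per entry in `radii`, evenly spaced
    between y = 0 and y = height.
    """
    if len(radii) < 2:
        raise ValueError("need at least two profile radii")
    rings = []
    for i, radius in enumerate(radii):
        y = height * i / (len(radii) - 1)
        ring = [
            (
                radius * math.cos(2 * math.pi * k / segments),
                y,
                radius * math.sin(2 * math.pi * k / segments),
            )
            for k in range(segments)
        ]
        rings.append(ring)
    return rings


rings = sweep_circle([1.0, 1.2, 0.8], height=2.0, segments=8)
print(len(rings), len(rings[0]))
```

Connecting each vertex to its neighbours in the same ring and in the next ring would give the side faces of a generalized cylinder.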

If you have any questions, feel free to ask. Please refer to the code scripts for the second module.



If you find our code or paper useful, please consider citing:

  title={AutoSweep: Recovering 3D Editable Objects from a Single Photograph},
  author={Chen, Xin and Li, Yuwei and Luo, Xi and Shao, Tianjia and Yu, Jingyi and Zhou, Kun and Zheng, Youyi},
  journal={IEEE Transactions on Visualization and Computer Graphics},

Relevant Works

3-Sweep: Extracting Editable Objects from a Single Photo (SIGGRAPH ASIA 2013)
Tao Chen, Zhe Zhu, Ariel Shamir, Shi-Min Hu, Daniel Cohen-Or

Mask R-CNN (ICCV 2017)
Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick

TightCap: 3D Human Shape Capture with Clothing Tightness Field (TOG 2021)
Xin Chen, Anqi Pang, Peihao Wang, Yang Wei, Lan Xu, Jingyi Yu

