SGC-Net is a scene-graph-driven image editing method. This repository contains the implementation of our paper. If you find this code useful in your research, please consider citing:
@InProceedings{ZhangSGCNet2023,
author={Zhongping Zhang and Huiwen He and Bryan A. Plummer and Zhenyu Liao and Huayan Wang},
title={Complex Scene Image Editing by Scene Graph Comprehension},
booktitle={British Machine Vision Conference (BMVC)},
year={2023}}
The first stage of SGC-Net is trained with PyTorch 1.12.1 and CUDA 11.3.
The second stage of SGC-Net is mainly based on ControlNet. Please follow the ControlNet instructions to set up the environment.
We performed our experiments on two public datasets, CLEVR-SIMSG and Visual Genome. We provide the versions we used for model training and evaluation through the following links.
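As a quick sanity check for the first-stage environment, the sketch below compares the installed PyTorch and CUDA versions against the ones stated above. It is not part of this repository; the helper name `check_versions` and the two constants are hypothetical and introduced only for illustration.

```python
# Hypothetical helper, not part of the SGC-Net repository.
EXPECTED_TORCH = "1.12.1"  # PyTorch version stated in this README
EXPECTED_CUDA = "11.3"     # CUDA version stated in this README


def check_versions(torch_version, cuda_version):
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    # torch.__version__ may carry a local build suffix such as "1.12.1+cu113".
    base_version = torch_version.split("+")[0]
    if base_version != EXPECTED_TORCH:
        problems.append(f"PyTorch {base_version} found, expected {EXPECTED_TORCH}")
    # torch.version.cuda is None for CPU-only builds.
    if cuda_version != EXPECTED_CUDA:
        problems.append(f"CUDA {cuda_version} found, expected {EXPECTED_CUDA}")
    return problems


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        issues = check_versions(torch.__version__, torch.version.cuda)
        print("\n".join(issues) if issues else "Environment matches the README")
```

Other versions may still work; a mismatch only means the environment differs from the one used for the paper.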
Datasets | Google Drive Link |
---|---|
CLEVR-SIMSG | CLEVR-SIMSG Link |
Visual Genome | Visual Genome Link |
If you would like to obtain the original data, please consider collecting it from the official websites: CLEVR-SIMSG and Visual Genome.
Note: This repo mainly provides the code for the RoI Prediction module. The region-based image editing module runs largely in the ControlNet environment, and we have not integrated this second stage into this repo.
Run the following script to train and evaluate RoI Prediction:
python triples2roi/train_clevr_triples2roi.py
Run the following script to generate predicted bounding boxes:
python triples2roi/generate_clevr_target_box.py
Run the following script to generate masked images, which will be provided as input to the region-based image editing module:
python image_editing_CLEVR.py
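To illustrate what a masked image is in this context, here is a minimal sketch that blanks out a predicted box region and returns the matching binary mask. It is not the code in `image_editing_CLEVR.py`. It assumes boxes are normalized `(x0, y0, x1, y1)` values in `[0, 1]` and represents the image as plain nested lists; the function names `box_to_pixels` and `mask_region` are hypothetical.

```python
# Illustrative sketch only; the repository's script may work differently.
# Assumption: boxes are normalized (x0, y0, x1, y1) with values in [0, 1].


def box_to_pixels(box, width, height):
    """Convert a normalized box to clamped integer pixel bounds."""
    x0, y0, x1, y1 = box
    left = max(0, min(width, round(x0 * width)))
    top = max(0, min(height, round(y0 * height)))
    right = max(left, min(width, round(x1 * width)))
    bottom = max(top, min(height, round(y1 * height)))
    return left, top, right, bottom


def mask_region(image, box, fill=0):
    """Return (masked_image, mask) without modifying the input image.

    `image` is a list of rows, each row a list of pixel values.
    `mask` holds 1 inside the box and 0 elsewhere.
    """
    height = len(image)
    width = len(image[0])
    left, top, right, bottom = box_to_pixels(box, width, height)
    masked = [list(row) for row in image]          # copy each row
    mask = [[0] * width for _ in range(height)]
    for y in range(top, bottom):
        for x in range(left, right):
            masked[y][x] = fill                    # blank out the region
            mask[y][x] = 1                         # mark it as editable
    return masked, mask


if __name__ == "__main__":
    image = [[9] * 4 for _ in range(4)]            # toy 4x4 single-channel image
    masked, mask = mask_region(image, (0.25, 0.25, 0.75, 0.75))
    for row in masked:
        print(row)
```

In practice the same idea would typically be applied to array-based images, with the masked image and mask passed on to the region-based editing stage.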
This code is partially based on the SIMSG repository.
The early versions of our model relied on Mask-RCNN pre-trained on CLEVR.