[Preprint] [Intro] [Demo: YouTube, Bilibili]
This repo contains ReS, the dataset proposed in our paper "Repositioning the Subject within Image".
Subject repositioning aims to relocate a user-specified subject within a single image. Our proposed SEELE addresses the generative sub-tasks of subject repositioning as a unified prompt-guided inpainting task, all powered by a single diffusion generative model.
We curated a benchmark dataset called ReS. It contains 100 paired images; in each pair, the subject is repositioned while all other elements remain unchanged. The images were collected from more than 20 indoor and outdoor scenes and feature subjects from more than 50 categories, enabling effective simulation of real-world open-vocabulary applications.
The ReS dataset is available on Google Drive and Baidu Netdisk.
Unzip the file, and you will get a folder including:
pi_1.jpg         # The first view of scene i
pi_2.jpg         # The second view of scene i
pi_1_mask.png    # The visible mask of the subject in the first view
pi_1_amodal.png  # The full (amodal) mask of the subject in the first view
pi_2_mask.png    # The visible mask of the subject in the second view
pi_2_amodal.png  # The full (amodal) mask of the subject in the second view
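For reference, here is a minimal loading sketch with PIL, assuming the scene index substitutes for i in the filenames above; the folder path and scene index are placeholders:

```python
from pathlib import Path
from PIL import Image

root = Path("./ReS")   # placeholder: path to the unzipped folder
i = 1                  # placeholder: scene index

view1   = Image.open(root / f"p{i}_1.jpg")         # first view of scene i
view2   = Image.open(root / f"p{i}_2.jpg")         # second view of scene i
mask1   = Image.open(root / f"p{i}_1_mask.png")    # visible mask, first view
amodal1 = Image.open(root / f"p{i}_1_amodal.png")  # full (amodal) mask, first view
mask2   = Image.open(root / f"p{i}_2_mask.png")    # visible mask, second view
amodal2 = Image.open(root / f"p{i}_2_amodal.png")  # full (amodal) mask, second view
```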
The images were taken with two different mobile devices; some are sized 1702x1276, while others are 4032x3024. Both images in a pair share the same resolution.
The masks corresponding to these images are annotated based on SAM, with a longer side of at most 1024 pixels.
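Because the images can be much larger than the masks, you may need to upsample a mask to its image's resolution before pixel-level use. A small sketch, assuming the files follow the naming above:

```python
from PIL import Image

image = Image.open("p1_1.jpg")        # full-resolution view (e.g. 4032x3024)
mask  = Image.open("p1_1_mask.png")   # SAM-based mask, longer side at most 1024

# Nearest-neighbor resampling keeps the mask values binary when upsampling.
mask_full = mask.resize(image.size, resample=Image.NEAREST)
assert mask_full.size == image.size
```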
We provide an example script Res.py for loading the ReS dataset.
In the script, we define a class ReS that is initialized with:
res = ReS(root_dir, img_size, load_square)
The first parameter, root_dir, is the folder path; img_size is the minimum side length you want; and if load_square is set to True, the images are resized to squares.
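For example (the folder path and sizes below are placeholders, not values from the paper):

```python
from Res import ReS

# Arguments in order: root_dir, img_size, load_square
res = ReS("./ReS", 512, True)
```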
Each image pair defines two tasks, one starting from each view. If the subject is occluded in a view, we use that view only as the source image.
The __getitem__ function processes a specific task and returns a dict with:
'image': the source image
'mask': the removal mask of the subject at the source location
'gt': the target (ground-truth) image
'amodal': the complete (amodal) mask of the subject at the target location
'size': the resolution of the image
'masked_image': the source image with the removal mask applied
We assume the outputs are fed to Stable Diffusion (SD) for inpainting. Please adjust the function as needed for your convenience.
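As a rough end-to-end sketch, here is one way a sample might be passed to a Stable Diffusion inpainting pipeline via diffusers. The exact formats returned by __getitem__ depend on Res.py, so the conversions (and the to_pil helper) below are illustrative assumptions:

```python
import numpy as np
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

from Res import ReS

res = ReS("./ReS", 512, True)   # placeholder folder path and size
sample = res[0]                 # one repositioning task

def to_pil(arr):
    """Hypothetical helper: convert an array-like output of __getitem__ to PIL."""
    arr = np.asarray(arr)
    if arr.dtype != np.uint8:
        arr = (arr * 255).clip(0, 255).astype(np.uint8)
    return Image.fromarray(arr.squeeze())

image = to_pil(sample["image"])   # source image
mask  = to_pil(sample["mask"])    # removal mask at the source location

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting"
)
result = pipe(
    prompt="",        # SEELE uses learned task prompts; a plain prompt is used here
    image=image,
    mask_image=mask,
).images[0]
result.save("repositioned.png")
```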
The data are intended for research purposes, to advance progress on subject repositioning.
Due to the perspective shift between views, the size and viewpoint of the subject change after repositioning. We do not provide annotations for this change, so using the target image directly for quantitative analysis may not be accurate.
If you find the provided dataset useful, please cite our work.
@article{wang2024repositioning,
  title={Repositioning the Subject within Image},
  author={Wang, Yikai and Cao, Chenjie and Dong, Qiaole and Li, Yifan and Fu, Yanwei},
  journal={arXiv preprint arXiv:2401.16861},
  year={2024}
}