MCD-Net: Towards RGB-D Video Inpainting in Real-World Scenes

Towards the goal of RGB-D video inpainting in real-world scenes, we make two major contributions: a new model and a new dataset.

1. MCD-Net: a model in which color and depth mutually and implicitly inpaint each other
We integrate our proposed Depth Completion Network (i.e., the JSCA and SRTA modules), Depth-Activated Transformer and Fusion Network into one framework for joint color and depth inpainting in RGB-D videos, achieving state-of-the-art accuracy and runtime (see the architecture sketch after this list).

2. VID Dataset: a dataset of real RGB-D videos with densely annotated masks
We propose the first RGB-D video inpainting dataset (VID) with authentic RGB-D data and carefully made masks to support RGB-D video inpainting. A portion of the videos and masks in our VID dataset is available on Baidu or Google Drive.
We manually refine the object masks automatically generated by Track-Anything. Rows 1 and 3 display the raw masks; rows 2 and 4 display our manually corrected masks, which ensure visually pleasing object removal. We hope our VID dataset, with its prepared object masks and occlusion masks, can provide a more comprehensive, accurate and close-to-practice evaluation for RGB-D video inpainting (a data-loading sketch follows the architecture sketch below).
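The exact designs of the JSCA, SRTA, Depth-Activated Transformer and Fusion Network are described in the paper; the snippet below is only a minimal PyTorch sketch of how the two modalities guide each other in this framework. All module bodies (channel widths, the stand-in attention, the fusion head) are simplifying assumptions for illustration, not the released implementation.

```python
# Minimal, illustrative sketch of the MCD-Net data flow described above.
# Only the wiring (color-guided depth completion -> depth-activated color
# transformer -> fusion) follows this README; module internals are assumptions.
import torch
import torch.nn as nn


class DepthCompletionNet(nn.Module):
    """Stand-in for the depth branch (JSCA + SRTA): completes masked depth
    with color features as guidance."""
    def __init__(self, ch=32):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(3 + 1 + 1, ch, 3, padding=1), nn.ReLU())
        self.decode = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, color, depth, mask):
        x = torch.cat([color, depth, mask], dim=1)  # color guides depth completion
        return self.decode(self.encode(x))


class DepthActivatedTransformer(nn.Module):
    """Stand-in for the color branch: self-attention over spatial tokens,
    conditioned ('activated') on the completed depth."""
    def __init__(self, ch=32, heads=4):
        super().__init__()
        self.proj_in = nn.Conv2d(3 + 1 + 1, ch, 3, padding=1)
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)
        self.proj_out = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, color, completed_depth, mask):
        b, _, h, w = color.shape
        x = self.proj_in(torch.cat([color, completed_depth, mask], dim=1))
        tokens = x.flatten(2).transpose(1, 2)                  # (B, H*W, C)
        tokens, _ = self.attn(tokens, tokens, tokens, need_weights=False)
        x = tokens.transpose(1, 2).reshape(b, -1, h, w)
        return self.proj_out(x)


class FusionNet(nn.Module):
    """Stand-in fusion head producing the final inpainted color frame."""
    def __init__(self, ch=32):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(3 + 1, ch, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, coarse_color, completed_depth):
        return self.body(torch.cat([coarse_color, completed_depth], dim=1))


class MCDNetSketch(nn.Module):
    """Wiring only: depth is completed with color guidance, the completed depth
    then activates the color transformer, and a fusion head refines the result."""
    def __init__(self):
        super().__init__()
        self.depth_net = DepthCompletionNet()
        self.color_net = DepthActivatedTransformer()
        self.fusion = FusionNet()

    def forward(self, color, depth, mask):
        completed_depth = self.depth_net(color, depth, mask)          # color -> depth
        coarse_color = self.color_net(color, completed_depth, mask)   # depth -> color
        final_color = self.fusion(coarse_color, completed_depth)
        return final_color, completed_depth


if __name__ == "__main__":
    color = torch.rand(1, 3, 64, 64)   # masked color frame (toy resolution)
    depth = torch.rand(1, 1, 64, 64)   # masked depth frame
    mask = torch.zeros(1, 1, 64, 64)   # 1 = hole to be inpainted
    out_color, out_depth = MCDNetSketch()(color, depth, mask)
    print(out_color.shape, out_depth.shape)
```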
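For reference, here is a minimal sketch of how one clip of the VID dataset might be loaded for evaluation. The directory layout, file names and the 16-bit millimetre depth encoding assumed below are illustrative guesses rather than the dataset's documented format; adapt them to the downloaded archive.

```python
# Hypothetical loader for one frame of a VID clip; folder names and the
# depth encoding are assumptions made for this example only.
from pathlib import Path

import numpy as np
from PIL import Image


def load_rgbd_frame(clip_dir: str, frame_id: int):
    """Return (color, depth, mask) for one frame of a clip."""
    clip = Path(clip_dir)
    color = np.asarray(Image.open(clip / "color" / f"{frame_id:05d}.png"),
                       dtype=np.float32) / 255.0    # H x W x 3, values in [0, 1]
    depth = np.asarray(Image.open(clip / "depth" / f"{frame_id:05d}.png"),
                       dtype=np.float32) / 1000.0   # H x W, metres (assuming mm PNG)
    mask = np.asarray(Image.open(clip / "masks" / f"{frame_id:05d}.png")) > 0  # True = remove
    return color, depth, mask


def apply_mask(color, depth, mask):
    """Zero out the object region to produce the network's masked inputs."""
    color = color.copy()
    depth = depth.copy()
    color[mask] = 0.0
    depth[mask] = 0.0
    return color, depth
```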

📰 Abstract

Video inpainting has attracted increasing attention owing to its wide applications in intelligent video editing. However, despite tremendous progress in RGB video inpainting, existing RGB-D video inpainting models remain unable to inpaint real-world RGB-D videos, as they simply fuse color and depth via explicit feature concatenation, neglecting the natural modality gap between color and depth. Moreover, current RGB-D video inpainting datasets are synthesized with homogeneous and unrealistic RGB-D data, which is far from real-world applications and cannot provide comprehensive evaluation. To alleviate these problems and achieve real-world RGB-D video inpainting, on the one hand, we propose a Mutually-guided Color and Depth Inpainting Network (MCD-Net), where color and depth are reciprocally leveraged to inpaint each other implicitly, mitigating the modality gap and fully exploiting cross-modal association for inpainting. On the other hand, we build a Video Inpainting with Depth (VID) dataset to supply diverse and authentic RGB-D video data with various object annotation masks, enabling comprehensive evaluation of RGB-D video inpainting under real-world scenes. Experimental results on the DynaFill benchmark and our collected VID dataset demonstrate that our MCD-Net not only yields state-of-the-art quantitative performance but also achieves high-quality RGB-D video inpainting in real-world scenes.

⭐ A video example of in-the-wild RGB-D video inpainting


We feed an in-the-wild video (captured at SUSTech) to our model; it generalizes well to this unseen scene and produces flawless inpainted results.

🌟 More video examples

We provide several more real-world video examples inpainted by our MCD-Net here.


👏 Acknowledgement

We acknowledge Track-Anything, DynaFill, E2FGVI, Fuseformer, STTN, OPN, FGVC, CAP, DSTT, SPGAN and CompletionFormer for their awesome works and their spirit of open source!
