MCD-Net: Towards RGB-D Video Inpainting in Real-World Scenes

Towards the goal of RGB-D video inpainting in real-world scenes, we make two major contributions: a new model and a new dataset.

1. MCD-Net: a model in which color and depth mutually and implicitly inpaint each other
We integrate our proposed Depth Completion Network (i.e., the JSCA and SRTA modules), Depth-Activated Transformer and Fusion Network into one framework for joint color and depth inpainting in RGB-D videos, achieving state-of-the-art accuracy and runtime (see the architecture sketch after this list).

2. VID Dataset: a dataset of real RGB-D videos with densely annotated masks
We propose the first RGB-D video inpainting dataset (VID) with authentic RGB-D data and carefully made masks to support RGB-D video inpainting. A portion of the videos and masks in our VID dataset is available on Baidu or Google Drive.
We manually refine the object masks automatically generated by Track-Anything. Rows 1 and 3 display the raw masks; rows 2 and 4 display our manually corrected masks, which ensure visually pleasing object removal. We hope our VID dataset, with its prepared object masks and occlusion masks, can provide a more comprehensive, accurate and close-to-practice evaluation for RGB-D video inpainting (a data-loading sketch follows the architecture sketch below).
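The exact designs of the JSCA, SRTA, Depth-Activated Transformer and Fusion Network are described in the paper; the snippet below is only a minimal PyTorch sketch of how the two modalities guide each other in this framework. All module bodies (channel widths, the stand-in attention, the fusion head) are simplifying assumptions for illustration, not the released implementation.

```python
# Minimal, illustrative sketch of the MCD-Net data flow described above.
# Only the wiring (color-guided depth completion -> depth-activated color
# transformer -> fusion) follows this README; module internals are assumptions.
import torch
import torch.nn as nn


class DepthCompletionNet(nn.Module):
    """Stand-in for the depth branch (JSCA + SRTA): completes masked depth
    with color features as guidance."""
    def __init__(self, ch=32):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(3 + 1 + 1, ch, 3, padding=1), nn.ReLU())
        self.decode = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, color, depth, mask):
        x = torch.cat([color, depth, mask], dim=1)  # color guides depth completion
        return self.decode(self.encode(x))


class DepthActivatedTransformer(nn.Module):
    """Stand-in for the color branch: self-attention over spatial tokens,
    conditioned ('activated') on the completed depth."""
    def __init__(self, ch=32, heads=4):
        super().__init__()
        self.proj_in = nn.Conv2d(3 + 1 + 1, ch, 3, padding=1)
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)
        self.proj_out = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, color, completed_depth, mask):
        b, _, h, w = color.shape
        x = self.proj_in(torch.cat([color, completed_depth, mask], dim=1))
        tokens = x.flatten(2).transpose(1, 2)                  # (B, H*W, C)
        tokens, _ = self.attn(tokens, tokens, tokens, need_weights=False)
        x = tokens.transpose(1, 2).reshape(b, -1, h, w)
        return self.proj_out(x)


class FusionNet(nn.Module):
    """Stand-in fusion head producing the final inpainted color frame."""
    def __init__(self, ch=32):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(3 + 1, ch, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, coarse_color, completed_depth):
        return self.body(torch.cat([coarse_color, completed_depth], dim=1))


class MCDNetSketch(nn.Module):
    """Wiring only: depth is completed with color guidance, the completed depth
    then activates the color transformer, and a fusion head refines the result."""
    def __init__(self):
        super().__init__()
        self.depth_net = DepthCompletionNet()
        self.color_net = DepthActivatedTransformer()
        self.fusion = FusionNet()

    def forward(self, color, depth, mask):
        completed_depth = self.depth_net(color, depth, mask)          # color -> depth
        coarse_color = self.color_net(color, completed_depth, mask)   # depth -> color
        final_color = self.fusion(coarse_color, completed_depth)
        return final_color, completed_depth


if __name__ == "__main__":
    color = torch.rand(1, 3, 64, 64)   # masked color frame (toy resolution)
    depth = torch.rand(1, 1, 64, 64)   # masked depth frame
    mask = torch.zeros(1, 1, 64, 64)   # 1 = hole to be inpainted
    out_color, out_depth = MCDNetSketch()(color, depth, mask)
    print(out_color.shape, out_depth.shape)
```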
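For reference, here is a minimal sketch of how one clip of the VID dataset might be loaded for evaluation. The directory layout, file names and the 16-bit millimetre depth encoding assumed below are illustrative guesses rather than the dataset's documented format; adapt them to the downloaded archive.

```python
# Hypothetical loader for one frame of a VID clip; folder names and the
# depth encoding are assumptions made for this example only.
from pathlib import Path

import numpy as np
from PIL import Image


def load_rgbd_frame(clip_dir: str, frame_id: int):
    """Return (color, depth, mask) for one frame of a clip."""
    clip = Path(clip_dir)
    color = np.asarray(Image.open(clip / "color" / f"{frame_id:05d}.png"),
                       dtype=np.float32) / 255.0    # H x W x 3, values in [0, 1]
    depth = np.asarray(Image.open(clip / "depth" / f"{frame_id:05d}.png"),
                       dtype=np.float32) / 1000.0   # H x W, metres (assuming mm PNG)
    mask = np.asarray(Image.open(clip / "masks" / f"{frame_id:05d}.png")) > 0  # True = remove
    return color, depth, mask


def apply_mask(color, depth, mask):
    """Zero out the object region to produce the network's masked inputs."""
    color = color.copy()
    depth = depth.copy()
    color[mask] = 0.0
    depth[mask] = 0.0
    return color, depth
```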

📰 Abstract

Video inpainting has attracted increasing attention owing to its wide applications in intelligent video editing. However, despite tremendous progress in RGB video inpainting, existing RGB-D video inpainting models remain unable to inpaint real-world RGB-D videos, as they simply fuse color and depth via explicit feature concatenation, neglecting the natural modality gap between color and depth. Moreover, current RGB-D video inpainting datasets are synthesized with homogeneous and unrealistic RGB-D data, which is far from real-world applications and cannot provide comprehensive evaluation. To alleviate these problems and achieve real-world RGB-D video inpainting, on the one hand, we propose a Mutually-guided Color and Depth Inpainting Network (MCD-Net), where color and depth are reciprocally leveraged to inpaint each other implicitly, mitigating the modality gap and fully exploiting cross-modal association for inpainting. On the other hand, we build a Video Inpainting with Depth (VID) dataset to supply diverse and authentic RGB-D video data with various object annotation masks, enabling comprehensive evaluation of RGB-D video inpainting under real-world scenes. Experimental results on the DynaFill benchmark and our collected VID dataset demonstrate that our MCD-Net not only yields state-of-the-art quantitative performance but also achieves high-quality RGB-D video inpainting in real-world scenes.

⭐ A video example of in-the-wild RGB-D video inpainting


We feed an in-the-wild video (captured at SUSTech) to our model; it generalizes well to this unseen scene and produces flawless inpainted results.

🌟 More video examples

We provide several more real-world video examples inpainted by our MCD-Net here.


👏 Acknowledgement

We acknowledge Track-Anything, DynaFill, E2FGVI, Fuseformer, STTN, OPN, FGVC, CAP, DSTT, SPGAN and CompletionFormer for their awesome works and their spirit of open source!
