HINT

HINT: High-quality INpainting Transformer with Enhanced Attention and Mask-aware Encoding

Existing image inpainting methods leverage convolution-based downsampling to reduce spatial dimensions. This may result in information loss from corrupted images, where the available information is inherently sparse, especially in the scenario of large missing regions. Recent advances in self-attention mechanisms within transformers have led to significant improvements in many computer vision tasks, including inpainting. However, limited by computational cost, existing methods cannot fully exploit the long-range modelling capabilities of such models. In this paper, we propose an end-to-end High-quality INpainting Transformer, abbreviated as HINT, which consists of a novel mask-aware pixel-shuffle downsampling module (MPD) that preserves the visible information extracted from the corrupted image while maintaining the integrity of the information available for the high-level inferences made within the model. Moreover, we propose a Spatially-activated Channel Attention Layer (SCAL), an efficient self-attention mechanism incorporating spatial awareness to model the corrupted image at multiple scales. To further enhance the effectiveness of SCAL, motivated by recent advances in speech recognition, we introduce a sandwich structure that places feed-forward networks before and after the SCAL module. We demonstrate the superior performance of HINT compared to contemporary state-of-the-art models on four datasets: CelebA, CelebA-HQ, Places2, and Dunhuang.
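
The sandwich arrangement mentioned above is borrowed from Macaron-style blocks in speech recognition (e.g. Conformer). Purely as an unofficial sketch of that structure, the block below wraps a placeholder attention layer (standard multi-head attention standing in for SCAL, whose details are not reproduced here) between two feed-forward networks; the half-step residual scaling follows Macaron-Net and is an assumption, not necessarily HINT's exact choice:

import torch
import torch.nn as nn

class SandwichBlock(nn.Module):
    """Illustrative FFN -> attention -> FFN sandwich (not the official HINT code)."""
    def __init__(self, dim, heads=4, expansion=4):
        super().__init__()
        self.ffn1 = nn.Sequential(
            nn.LayerNorm(dim), nn.Linear(dim, dim * expansion),
            nn.GELU(), nn.Linear(dim * expansion, dim))
        self.norm = nn.LayerNorm(dim)
        # Stand-in for SCAL: ordinary multi-head self-attention.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn2 = nn.Sequential(
            nn.LayerNorm(dim), nn.Linear(dim, dim * expansion),
            nn.GELU(), nn.Linear(dim * expansion, dim))

    def forward(self, x):                       # x: (batch, tokens, dim)
        x = x + 0.5 * self.ffn1(x)              # half-step FFN before attention (Macaron style)
        a = self.norm(x)
        x = x + self.attn(a, a, a, need_weights=False)[0]
        x = x + 0.5 * self.ffn2(x)              # half-step FFN after attention
        return x

block = SandwichBlock(dim=64)
y = block(torch.randn(2, 256, 64))              # y: (2, 256, 64)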

This paper has been accepted by IEEE Transactions on Multimedia (TMM).

Paper Download: HINT: High-quality INpainting Transformer with Mask-Aware Encoding and Enhanced Attention

Overview

[Figure: overall architecture of HINT]

Mask-aware Pixel-shuffle Downsampling module (MPD)

[Figures: the Mask-aware Pixel-shuffle Downsampling (MPD) module]
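
The official MPD implementation lives in this repository's source. Purely as an illustration of the idea, the sketch below downsamples with torch.nn.PixelUnshuffle, which folds 2x2 spatial blocks into channels instead of discarding pixels the way a strided convolution can, and downsamples the mask the same way so later layers know which rearranged pixels were actually observed. The fusion step (concatenating the unshuffled mask and applying a 1x1 convolution) is an assumption for illustration, not the paper's exact design:

import torch
import torch.nn as nn

class MaskAwarePixelShuffleDown(nn.Module):
    """Unofficial sketch of the MPD idea: lossless pixel-unshuffle downsampling
    plus a mask carried alongside the features."""
    def __init__(self, in_ch, out_ch, scale=2):
        super().__init__()
        # (B, C, H, W) -> (B, C*s*s, H/s, W/s): no pixel is discarded.
        self.unshuffle = nn.PixelUnshuffle(scale)
        self.proj = nn.Conv2d((in_ch + 1) * scale * scale, out_ch, kernel_size=1)

    def forward(self, feat, mask):
        # feat: (B, in_ch, H, W); mask: (B, 1, H, W), 1 = visible, 0 = missing
        feat = self.unshuffle(feat)
        mask = self.unshuffle(mask)             # visibility folded into channels too
        feat = self.proj(torch.cat([feat, mask], dim=1))  # assumed fusion: concat + 1x1 conv
        return feat, mask

down = MaskAwarePixelShuffleDown(in_ch=48, out_ch=96)
f, m = down(torch.randn(1, 48, 64, 64), torch.ones(1, 1, 64, 64))  # f: (1, 96, 32, 32)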

News

  • Training Code
  • Pre-trained Models
  • Demo Video (coming soon)

Dataset

For the full CelebA-HQ dataset, please refer to http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html

For the full Places2 dataset, please refer to http://places2.csail.mit.edu/download.html

For the irregular mask dataset, please refer to http://masc.cs.gmu.edu/wiki/partialconv

Please use script/flist.py to create the .flist files for training and testing.
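
For example (the flag names below are an assumption based on similar inpainting repositories; check script/flist.py for its actual arguments):

python script/flist.py --path ./datasets/places2/train --output ./datasets/places2_train.flist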

Initialization

  • Clone this repo:
git clone https://github.com/ChrisChen1023/HINT
cd HINT

Python >= 3.7

PyTorch

Pre-trained model

We have released the pre-trained models on Google Drive.

Pre-trained models are provided for each dataset:

  • CelebA-HQ
  • CelebA
  • Places2
  • Dunhuang

Getting Started

Download the pre-trained model to ./checkpoints

Set up your own config.yml with the corresponding .flist paths, and copy it to the corresponding checkpoint folder. For training, set --MASK 3 for the mixed mask index,

run:

python train.py

For testing, in config.yml, set --MASK 6 for the fixed mask index, then run:

python test.py
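
For reference, the relevant part of config.yml might look like the fragment below. The key names and paths are illustrative assumptions (modelled on similar inpainting codebases); always check them against the config.yml shipped with this repository:

TRAIN_FLIST: ./datasets/places2_train.flist
TEST_FLIST: ./datasets/places2_test.flist
MASK_FLIST: ./datasets/masks.flist
MASK: 3          # training: mixed mask index (set to 6 for testing with fixed masks)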

Citation

If you find this work helpful, please cite our paper:

@ARTICLE{10458430,
  author={Chen, Shuang and Atapour-Abarghouei, Amir and Shum, Hubert P. H.},
  journal={IEEE Transactions on Multimedia}, 
  title={HINT: High-quality INpainting Transformer with Mask-Aware Encoding and Enhanced Attention}, 
  year={2024},
  volume={},
  number={},
  pages={1-12},
  keywords={Transformers;Feature extraction;Image reconstruction;Computational modeling;Task analysis;Data mining;Computer vision;Image Inpainting;Transformer;Representation Learning},
  doi={10.1109/TMM.2024.3369897}}


