You can find our paper "Here"
Image inpainting has been researched for years. From deeper and larger models to models that focus on global information, all of them aim to produce results closer to reality. In this paper, we combine a stripe window and a line-by-line feature shift to modify the Vision Transformer (ViT), reducing the computation cost while obtaining global information through oblique attention. In addition, we design a new loss function to enhance the texture and colors of the inpainted results. Finally, to validate the efficacy of our proposed model, we conduct extensive experiments on commonly used datasets (Places2 and CelebA) and compare with other state-of-the-art methods.
- Python 3.7.0
- pytorch
- opencv
- PIL
- colorama
Or see requirements.txt
Edit txt/xxx.txt (set path in config)
data_path = './txt/train_path.txt'
mask_path = './txt/train_mask_path.txt'
val_path = './txt/val_path.txt'
val_mask_path = './txt/val_mask_path.txt'
test_path = './txt/test_path.txt'
test_mask_1_60_path = './txt/test_mask_1+10.txt'
txt example
E:/Places2/data_256/00000001.jpg
E:/Places2/data_256/00000002.jpg
E:/Places2/data_256/00000003.jpg
E:/Places2/data_256/00000004.jpg
E:/Places2/data_256/00000005.jpg
⋮
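If you need to build these list files yourself, a minimal sketch like the one below works; the dataset directory and output file name are placeholders, not paths required by this repo.

```python
import glob
import os

# Placeholder paths: point data_dir at your image folder and out_txt at the list file
# referenced in model_config.yml (e.g. './txt/train_path.txt').
data_dir = 'E:/Places2/data_256'
out_txt = './txt/train_path.txt'

# Collect every .jpg under the folder and write one image path per line,
# matching the txt format shown above.
os.makedirs(os.path.dirname(out_txt), exist_ok=True)
paths = sorted(glob.glob(os.path.join(data_dir, '*.jpg')))
with open(out_txt, 'w') as f:
    f.write('\n'.join(paths) + '\n')
```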
In this implementation, masks are generated automatically by our code: stroke masks are mixed randomly to produce mask proportions ranging from 1% to 60% (a minimal generation sketch is shown below).
Stroke mask examples (from left to right: 20%–30%, 30%–40%, 40%–50%, 50%–60%)
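The actual mask generator is part of the training code; purely as an illustration of mixing random strokes up to a target coverage, here is a minimal sketch (stroke count, thickness, and the exact sampling rules are assumptions, not the settings used in the paper):

```python
import random
import cv2
import numpy as np

def random_stroke_mask(h=256, w=256, min_ratio=0.01, max_ratio=0.60):
    """Draw random strokes until hole coverage reaches a target sampled from [min_ratio, max_ratio]."""
    mask = np.zeros((h, w), np.uint8)
    target = random.uniform(min_ratio, max_ratio)
    while (mask > 0).mean() < target:
        # One random stroke: endpoints and thickness sampled uniformly.
        p1 = (random.randint(0, w - 1), random.randint(0, h - 1))
        p2 = (random.randint(0, w - 1), random.randint(0, h - 1))
        cv2.line(mask, p1, p2, 255, thickness=random.randint(5, 25))
    return mask  # 255 = hole, 0 = valid pixel; coverage may slightly overshoot the target
```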
python train.py (main settings: data_path / mask_path / val_path / val_mask_path / batch_size / train_epoch)
- Set the config path ('./config/model_config.yml')
- Set paths and parameter details in model_config.yml
Note: If training is interrupted and you need to resume it, you can set resume_ckpt and resume_D_ckpt (see the resume sketch below).
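For reference, resuming usually amounts to reloading the saved generator and discriminator states before training continues. The sketch below uses hypothetical checkpoint paths and assumes the models are built by train.py; check the actual script and model_config.yml for the exact names.

```python
import torch

# Hypothetical checkpoint paths matching resume_ckpt / resume_D_ckpt in model_config.yml.
resume_ckpt = './ckpt/G_last.pth'
resume_D_ckpt = './ckpt/D_last.pth'

# Load the saved weights onto CPU first, then hand them to the models created in train.py.
g_state = torch.load(resume_ckpt, map_location='cpu')
d_state = torch.load(resume_D_ckpt, map_location='cpu')
# generator.load_state_dict(g_state)
# discriminator.load_state_dict(d_state)
```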
python test.py (main settings: test_ckpt / test_path / test_mask_1_60_path / save_img_path)
- Set the config path ('./config/model_config.yml')
- Set paths and parameter details in model_config.yml
- Places2 & CelebA
Quantitative evaluation of inpainting on the Places2 and CelebA datasets. We report peak signal-to-noise ratio (PSNR), structural similarity (SSIM), Learned Perceptual Image Patch Similarity (LPIPS), and Fréchet inception distance (FID). ▲ denotes more parameters and ▼ denotes fewer parameters than our proposed model. (Bold marks the best result; underline marks the second best.)
All training and testing are performed on the same NVIDIA RTX 3060 GPU.
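For reference, PSNR and SSIM can be computed per image with scikit-image (≥ 0.19) as in the short sketch below; the file names are placeholders, and LPIPS and FID require their own packages (e.g. lpips and pytorch-fid).

```python
import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Placeholder file names: a ground-truth image and the corresponding inpainted output.
gt = cv2.imread('gt.png')
pred = cv2.imread('pred.png')

# Both metrics operate on 8-bit images here; channel_axis=-1 treats the last axis as color.
psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)
print(f'PSNR: {psnr:.2f} dB  SSIM: {ssim:.4f}')
```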
- Places2 & CelebA
Qualitative results on the Places2 dataset for all compared models. From left to right: masked image, CA, PC, RW, DeepFill-v2, Iconv, AOT-GAN, CRFill, TFill, SWMH-Net, and Ours. Zoom in for details.
Ablation study of GC, RDC, the MSCSWin Transformer, and the HSV loss. We report peak signal-to-noise ratio (PSNR), structural similarity (SSIM), Learned Perceptual Image Patch Similarity (LPIPS), and Fréchet inception distance (FID).
Object removal (size 256×256) results. From left to right: Original image, mask, object removal result.
This repository utilizes code from the following impressive repositories
@inproceedings{chen2023image,
  title={Image Inpainting by Mscswin Transformer Adversarial Autoencoder},
  author={Chen, Bo-Wei and Liu, Tsung-Jung and Liu, Kuan-Hsien},
  booktitle={2023 IEEE International Conference on Image Processing (ICIP)},
  pages={2040--2044},
  year={2023},
  organization={IEEE}
}
If you have any questions, feel free to contact wiwi61666166@gmail.com