This repository contains the official implementation of the following paper:
SRFormer: Permuted Self-Attention for Single Image Super-Resolution
Yupeng Zhou 1, Zhen Li 1, Chun-Le Guo 1, Song Bai 2, Ming-Ming Cheng 1, Qibin Hou 1
1TMCC, School of Computer Science, Nankai University
2ByteDance, Singapore
In ICCV 2023
[Paper] [Code] [Pretrained Model] [Visual Results] [Demo]
SRFormer is a new image SR backbone with SOTA performance. The core of SRFormer is PSA, a simple, efficient, and effective attention mechanism that builds large-range pairwise correlations with even less computational burden than the original WSA of SwinIR. SRFormer (ICCV open access link) achieves state-of-the-art performance in
- classical image SR
- lightweight image SR
- real-world image SR
The table below compares PSNR (dB) with SwinIR under the same training strategy on the DIV2K dataset (X2 SR). SRFormer clearly outperforms SwinIR with fewer parameters (10.40M vs. 11.75M) and FLOPs (2741G vs. 2868G). More results can be found here.
Model | Set5 | Set14 | B100 | Urban100 | Manga109 |
---|---|---|---|---|---|
SwinIR | 38.35 | 34.14 | 32.44 | 33.40 | 39.60 |
SRFormer (ours) | 38.45 | 34.21 | 32.51 | 33.86 | 39.69 |
Abstract: In this paper, we introduce SRFormer, a simple yet effective Transformer-based model for single image super-resolution. We rethink the design of the popular shifted window self-attention, expose and analyze several characteristic issues of it, and present permuted self-attention (PSA). PSA strikes an appropriate balance between the channel and spatial information for self-attention, allowing each Transformer block to build pairwise correlations within large windows with even less computational burden. Our permuted self-attention is simple and can be easily applied to existing super-resolution networks based on Transformers. Without any bells and whistles, we show that our SRFormer achieves a 33.86dB PSNR score on the Urban100 dataset, which is 0.46dB higher than that of SwinIR but uses fewer parameters and computations. We hope our simple and effective approach can serve as a useful tool for future research in super-resolution model design. Our code is publicly available at https://github.com/HVision-NKU/SRFormer.
You can apply PSA with just a few lines of code, significantly reducing computational complexity. The head number and relative position encoding are omitted here for simplicity; you can visit here to view the more detailed code.
## Original MSA in SwinIR:
## qkv = self.qkv(x).reshape(B_, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
# PSA compresses the channel dimension of K and V to c//4 and folds each 2x2 spatial
# neighbourhood into the channel dimension, giving n//4 tokens: (2, num_windows*b, n//4, c)
kv = self.kv(x).reshape(b_, self.permuted_window_size[0], 2, self.permuted_window_size[1], 2, 2, c//4).permute(0, 1, 3, 5, 2, 4, 6).reshape(b_, n//4, 2, -1).permute(2, 0, 1, 3)
k, v = kv[0], kv[1]  # each: (num_windows*b, n//4, c)
# PSA keeps the full channel dimension of Q: (num_windows*b, n, c)
q = self.q(x).reshape(b_, n, -1)
attn = q @ k.transpose(-2, -1)  # (num_windows*b, n, n//4)
x = (attn @ v).reshape(b_, n, c)  # (num_windows*b, n, c)
x = self.proj(x)
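Because K and V are reduced to n/4 tokens, each window's attention map has shape n × n/4 instead of n × n. As a rough illustration, for a 24×24 attention window (n = 576) the map shrinks from 576×576 to 576×144 entries, roughly a 4× saving in attention computation and memory, while every query still attends across the whole large window.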
- Installation & Dataset
- Training
- Testing
- Upscaling your own pictures
- Results
- Pretrain Models
- Citations
- License
- Acknowledgement
- Python 3.8
- PyTorch >= 1.7.0
git clone https://github.com/HVision-NKU/SRFormer.git
cd SRFormer
pip install -r requirements.txt
python setup.py develop
We use the same training and testing sets as SwinIR. The following datasets need to be downloaded for training.
Task | Training Set | Testing Set |
---|---|---|
classical image SR | DIV2K (800 training images) or DIV2K + Flickr2K (2650 images) | Set5 + Set14 + BSD100 + Urban100 + Manga109 Download all |
lightweight image SR | DIV2K (800 training images) | Set5 + Set14 + BSD100 + Urban100 + Manga109 Download all |
real-world image SR | DIV2K (800 training images) + Flickr2K (2650 images) + OST (10324 images of sky, water, grass, mountain, building, plant, animal) | RealSRSet + 5 images |
- If you do not use lmdb datasets, you may need to crop the training images into sub-images to reduce I/O time. Please follow here.
- After downloading the test datasets you need, you may also need to generate the downsampled LR images. Please follow here.
- Our code requires the HR image and the corresponding LR image to have the same name (e.g., 001_HR.img and 001_LR.img is invalid; please rename both to 001.img and save them in the HR and LR directories specified in the config). The Linux rename command can do this easily; a Python alternative is sketched after this list.
- Please download the dataset corresponding to the task and place it in the folder specified by the training option in /options/train/SRFormer.
- Follow the instructions below to train our SRFormer.
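If you prefer not to use the Linux rename command, below is a minimal Python sketch for stripping the _HR/_LR suffixes. The folder names are placeholders; point them at the HR and LR directories used in your config.

import os

# Placeholder directories; adjust them to match the HR/LR paths in your training option file.
for folder, suffix in [("datasets/DIV2K/HR", "_HR"), ("datasets/DIV2K/LR", "_LR")]:
    for name in os.listdir(folder):
        stem, ext = os.path.splitext(name)
        if stem.endswith(suffix):
            # e.g. 0001_HR.png -> 0001.png
            os.rename(os.path.join(folder, name), os.path.join(folder, stem[:-len(suffix)] + ext))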
Please note: "4" in the following instructions means four GPUs. Please modify it according to your configuration. You are also encouraged to modify the YAML files in "options/train/SRFormer/" to adjust other training settings.
# train SRFormer for classical SR task
./scripts/dist_train.sh 4 options/train/SRFormer/train_SRFormer_SRx2_scratch.yml
./scripts/dist_train.sh 4 options/train/SRFormer/train_SRFormer_SRx3_scratch.yml
./scripts/dist_train.sh 4 options/train/SRFormer/train_SRFormer_SRx4_scratch.yml
# train SRFormer for lightweight SR task
./scripts/dist_train.sh 4 options/train/SRFormer/train_SRFormer_light_SRx2_scratch.yml
./scripts/dist_train.sh 4 options/train/SRFormer/train_SRFormer_light_SRx3_scratch.yml
./scripts/dist_train.sh 4 options/train/SRFormer/train_SRFormer_light_SRx4_scratch.yml
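If you only have a single GPU, the distributed launcher is not required. Assuming the standard BasicSR entry point basicsr/train.py is present in this repository, training can likely be started directly, for example:

python basicsr/train.py -opt options/train/SRFormer/train_SRFormer_SRx2_scratch.yml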
# test SRFormer for classical SR task
python basicsr/test.py -opt options/test/SRFormer/test_SRFormer_DF2Ksrx2.yml
python basicsr/test.py -opt options/test/SRFormer/test_SRFormer_DF2Ksrx3.yml
python basicsr/test.py -opt options/test/SRFormer/test_SRFormer_DF2Ksrx4.yml
# test SRFormer for lightweight SR task
python basicsr/test.py -opt options/test/SRFormer/test_SRFormer_light_DIV2Ksrx2.yml
python basicsr/test.py -opt options/test/SRFormer/test_SRFormer_light_DIV2Ksrx3.yml
python basicsr/test.py -opt options/test/SRFormer/test_SRFormer_light_DIV2Ksrx4.yml
We provide a script so that you can use our pretrained models to upscale your own pictures. We will also release our real-world pretrained models soon.
# use SRFormer for classical SR task
python basicsr/infer_sr.py -opt options/test/SRFormer/test_SRFormer_DF2Ksrx2.yml --input_dir {dir of your pictures} --output_dir {dir of output}
python basicsr/infer_sr.py -opt options/test/SRFormer/test_SRFormer_DF2Ksrx3.yml --input_dir {dir of your pictures} --output_dir {dir of output}
python basicsr/infer_sr.py -opt options/test/SRFormer/test_SRFormer_DF2Ksrx4.yml --input_dir {dir of your pictures} --output_dir {dir of output}
# use SRFormer for lightweight SR task
python basicsr/infer_sr.py -opt options/test/SRFormer/test_SRFormer_light_DIV2Ksrx2.yml --input_dir {dir of your pictures} --output_dir {dir of output}
python basicsr/infer_sr.py -opt options/test/SRFormer/test_SRFormer_light_DIV2Ksrx3.yml --input_dir {dir of your pictures} --output_dir {dir of output}
python basicsr/infer_sr.py -opt options/test/SRFormer/test_SRFormer_light_DIV2Ksrx4.yml --input_dir {dir of your pictures} --output_dir {dir of output}
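For example, to upscale a folder of your own pictures by x2 with the classical model (both directory names below are placeholders):

python basicsr/infer_sr.py -opt options/test/SRFormer/test_SRFormer_DF2Ksrx2.yml --input_dir ./my_pictures --output_dir ./results_x2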
We provide results on classical image SR, lightweight image SR, and real-world image SR. More results can be found in the paper. The visual results of SRFormer can be found in [Visual Results].
Classical image SR
- Results of Table 4 in the paper
- Results of Figure 4 in the paper
Lightweight image SR
- Results of Table 5 in the paper
- Results of Figure 5 in the paper
Model size comparison
- Results of Table 1 and Table 2 in the Supplementary Material
Realworld image SR
- Results of Figure 8 in the paper
Official pretrained models can be downloaded from Google Drive.
To reproduce the results in the paper, download them and put them in the /PretrainModel folder.
We also thank @Phhofm for training a third-party pretrained model; you can visit here to learn more.
You may want to cite:
@article{zhou2023srformer,
title={SRFormer: Permuted Self-Attention for Single Image Super-Resolution},
author={Zhou, Yupeng and Li, Zhen and Guo, Chun-Le and Bai, Song and Cheng, Ming-Ming and Hou, Qibin},
journal={arXiv preprint arXiv:2303.09735},
year={2023}
}
This code is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license for non-commercial use only. Please note that any commercial use of this code requires formal permission prior to use.
The code is based on BasicSR, Swin Transformer, and SwinIR. Please also follow their licenses. Thanks for their awesome work.