
Question on network architecture #6

Closed
codeslake opened this issue Sep 27, 2021 · 4 comments
@codeslake commented Sep 27, 2021

Hi,

I've gone through the code and have some questions about the network architecture described in the paper and in the code.

  1. In Figure 2 of the main paper, the network has 4 Aligned Attention (aa) modules, but the code has only 3. Is there a performance decrease when you use 4 aa modules?

  2. For the aa modules defined in dcsr.py, the `scale` and `align` arguments differ depending on `self.flag_8k`.

    • Is it appropriate to assume that the `scale` values are higher when `self.flag_8k == True`, for better matching between higher-resolution features?
    • Why does the aa2 module get `align=False`? I can see that alignment is unnecessary when `scale == 1`, as patches become 1×1 tensors. However, why is `align` still set to `False` for aa2 when `self.flag_8k == True`?
  3. Would you please elaborate on the intuition behind `coarse=True`? I have not checked the patch coordinates used for evaluation in detail, but I assume `coarse` is set to `True` when the LR patch is outside the FoV of the ref patch. When `coarse == True`, the DCSR model downsamples the LR and ref images by factors of 1/16 and 1/8, respectively. Is this for roughly matching the structure, since those patches might not share a common context?

Thanks in advance,

@Tengfei-Wang (Owner)

Hi,

  1. The number of attention modules depends on the resolution factor in our experiments. Fig. 2 illustrates the general architecture of the approach; specifically, it shows the 4× SR network (also see customized dataset #2). You may increase the number to 5 for 8× SR and decrease it to 3 for 2× SR (the released code).

    • Yes. To achieve compelling matching in 8k cases, it is intuitive to increase the kernel size for a proper receptive field.
    • In SRA, we need to load the pre-trained model from the 4k cases, where `align=False` in aa2. We thus simply keep it `False` in 8k cases for consistent loading. I suppose it would also work if it were set to `True` for 8k cases (see the sketch after this list).
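
For concreteness, here is a minimal sketch of how such a configuration switch might look. The class name `AlignedAttention`, its constructor arguments, and the particular `scale` values are illustrative assumptions rather than the released dcsr.py; the only points taken from the answer above are that the 8k branch uses larger matching kernels and that `aa2` keeps `align=False` in both branches so the 4k checkpoint loads consistently.

```python
import torch.nn as nn

class AlignedAttention(nn.Module):
    """Placeholder for the aligned attention module (illustrative only)."""
    def __init__(self, scale, align):
        super().__init__()
        self.scale = scale  # matching kernel/patch size; larger -> wider receptive field
        self.align = align  # whether matched reference patches are spatially aligned

class DCSR(nn.Module):
    def __init__(self, flag_8k=False):
        super().__init__()
        if flag_8k:
            # Assumed values: larger kernels for compelling matching at 8k.
            self.aa1 = AlignedAttention(scale=4, align=True)
            # align stays False so the 4k pre-trained weights load without mismatches.
            self.aa2 = AlignedAttention(scale=2, align=False)
            self.aa3 = AlignedAttention(scale=1, align=False)
        else:  # 4k setting (the released code)
            self.aa1 = AlignedAttention(scale=2, align=True)
            self.aa2 = AlignedAttention(scale=1, align=False)  # 1×1 patches need no alignment
            self.aa3 = AlignedAttention(scale=1, align=False)

model = DCSR(flag_8k=True)
```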

@codeslake (Author)

Thanks for the quick answer!
I've added a 3rd question to the original comment. Would you please leave a comment on that one too?

@Tengfei-Wang (Owner)

> Thanks for the quick answer!
> I've added a 3rd question to the original comment. Would you please leave a comment on that one too?

Yes, you're correct. For patches within the overlapped FoV (near the central region), we perform feature matching locally in a neighboring region. For other patches (outside the overlapped region), we search the whole ref image for reference information. To improve search efficiency, we first coarsely find a candidate region, and then treat it as the reference patch.
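
To make this coarse-to-fine search concrete, here is a minimal sketch, assuming a plain cross-correlation matcher; the function name, `patch_size`, and the scoring scheme are illustrative assumptions, while the 1/16 and 1/8 downsampling factors come from the question above.

```python
import torch
import torch.nn.functional as F

def coarse_candidate_search(lr, ref, patch_size=64):
    # Minimal sketch (not the released DCSR code) of the coarse search.
    # lr:  (1, C, h, w) low-resolution query patch
    # ref: (1, C, H, W) full reference image
    # Downsample LR by 1/16 and ref by 1/8 so both reach a comparable
    # scale for rough structure matching.
    lr_small = F.interpolate(lr, scale_factor=1 / 16, mode='bilinear', align_corners=False)
    ref_small = F.interpolate(ref, scale_factor=1 / 8, mode='bilinear', align_corners=False)

    # Slide the downsampled LR patch over the downsampled ref image
    # (cross-correlation) and pick the best-scoring location. A
    # normalized correlation would be more robust; this keeps it simple.
    scores = F.conv2d(ref_small, lr_small)  # weight shape: (1, C, kh, kw)
    idx = scores.flatten().argmax()
    y = (idx // scores.shape[-1]).item()
    x = (idx % scores.shape[-1]).item()

    # Map the coarse location back to the full-resolution ref (×8) and
    # crop the candidate region, which is then treated as the ref patch.
    y_full, x_full = y * 8, x * 8
    return ref[..., y_full:y_full + patch_size, x_full:x_full + patch_size]

# Hypothetical usage with random tensors:
lr = torch.randn(1, 3, 160, 160)
ref = torch.randn(1, 3, 1024, 1024)
candidate = coarse_candidate_search(lr, ref)
```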

@codeslake (Author)

Thanks, Wang. The answers really helped!
