Some questions about your code #7

Closed
hetranger opened this issue Aug 15, 2021 · 8 comments

@hetranger

hetranger commented Aug 15, 2021

Hi, thank you for your work. But I've noticed some inconsistencies with your paper:

  1. It seems that the features extracted by the self-attention module are fed directly into the UNet decoder rather than into the cross-attention module.
  2. The proxy embedding is added to each input of the cross-attention layer, which does not seem to be mentioned in the paper.

But maybe I've misunderstood something? Looking forward to your reply.

@JiYuanFeng
Owner

Hi,
For the first point, as shown in Figure 3(b), the features extracted by the SA module serve as the key/value input of the CA module, and the gradient of the auxiliary loss does not affect the decoder part, so we use the CA module as an auxiliary head for better integration into the modular framework.
For the second point, as also shown in Figure 3(b), the proxy embedding is initialized once at the beginning and then reused iteratively across the subsequent CA layers. Thank you.
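
For concreteness, here is a minimal PyTorch-style sketch of this wiring as I read it (illustrative only, not the repository's actual code): the SA output goes straight to the decoder, and a CA module with learnable proxy embeddings as queries and the SA tokens as keys/values serves as the auxiliary head. The class names, dimensions, and the use of a single CA layer are assumptions.

```python
import torch
import torch.nn as nn

class SelfAttnEncoder(nn.Module):
    """Stand-in for the SA module: refines the flattened multi-scale tokens."""
    def __init__(self, dim=64, heads=4, layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, tokens):          # tokens: (B, N, C)
        return self.encoder(tokens)     # encoded tokens T^{K_s}

class CrossAttnAuxHead(nn.Module):
    """Stand-in for the CA module used as an aux head: learnable proxy
    embeddings (queries) attend to the SA tokens (keys/values)."""
    def __init__(self, dim=64, heads=4, num_classes=9):
        super().__init__()
        self.proxy = nn.Parameter(torch.randn(num_classes, dim))   # initialized once
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cls = nn.Linear(dim, 1)

    def forward(self, sa_tokens):       # sa_tokens: (B, N, C)
        b = sa_tokens.size(0)
        q = self.proxy.unsqueeze(0).expand(b, -1, -1)              # proxies as queries
        out, _ = self.cross_attn(q, sa_tokens, sa_tokens)          # SA tokens as K/V
        return self.cls(out).squeeze(-1)                           # auxiliary logits

sa, aux = SelfAttnEncoder(), CrossAttnAuxHead()
sa_tokens = sa(torch.randn(2, 196, 64))   # SA output is fed straight to the decoder
aux_logits = aux(sa_tokens)               # ... and also into the CA aux head
print(sa_tokens.shape, aux_logits.shape)  # (2, 196, 64) and (2, 9)
```

In this sketch, the gradient from a loss on aux_logits flows back through the SA encoder but never through the decoder, which matches the point above about the auxiliary loss not affecting the decoder part.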

@hetranger
Author

But doesn't Figure 2 show that the input of the decoder should be the output of the CA module?

@JiYuanFeng
Owner

Hi, if the CA module still preserved the features from the SA module, it would have two outputs, one for the decoder and one for the aux head. Either way, the two are equivalent.

@hetranger
Author

hetranger commented Aug 15, 2021

Hi, in your code I don't see the CA output being used for the decoder, only for the aux head. And the CA module is not used at test time.

@JiYuanFeng
Owner

Hi, as mentioned above, the two implementations are equivalent and achieve the same performance; we organize the CA module as an aux head simply for a cleaner design of the overall framework. You can stack them in series if needed; meanwhile, as an aux head, it does not add any computation during testing.
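
As a hedged illustration of the "no computation during testing" point, loosely in the style of frameworks that attach auxiliary heads (the class and argument names here are assumptions, not the repository's API), the auxiliary branch can simply be skipped outside of training:

```python
import torch
import torch.nn as nn

class ToySegmentor(nn.Module):
    """Toy model: a main decode head plus an auxiliary head used only during training."""
    def __init__(self, backbone, decode_head, aux_head):
        super().__init__()
        self.backbone, self.decode_head, self.aux_head = backbone, decode_head, aux_head

    def forward(self, x, return_loss=False):
        feats = self.backbone(x)
        out = self.decode_head(feats)
        if return_loss and self.training:
            aux_out = self.aux_head(feats)   # CA branch: runs only while training
            return out, aux_out
        return out                           # inference: the aux head is never called

model = ToySegmentor(nn.Identity(), nn.Linear(8, 4), nn.Linear(8, 4))
x = torch.randn(2, 8)
model.train()
main_out, aux_out = model(x, return_loss=True)   # training: both heads run
model.eval()
pred = model(x)                                   # testing: no extra computation
```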

@hetranger
Author

hetranger commented Aug 15, 2021

My apologies. Sec. 3.2 mentions that:

"Finally, the encoded tokens T^{K_s} is fold back to 2D features and append the uninvolved features to form the pyramid features {X_0, X_1, X_2′, X_3′, X_4′}."

Thanks for your time.

@JiYuanFeng
Owner

Hi, the encoded tokens T^{K_s} are the output of the SA modules; besides being the input of the decoder head, they are also the input of the CA module (see Figure 3(b)).
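
For illustration, here is a small sketch of the folding step quoted from Sec. 3.2 above, assuming a shared embedding dimension across levels (the helper name and all shapes are hypothetical, not the repository's code): the tokens belonging to the involved levels are reshaped back into 2D maps and joined with the untouched lower-level features to rebuild the pyramid.

```python
import torch

def fold_tokens_to_pyramid(tokens, spatial_sizes, channels, untouched):
    """tokens: (B, N, C) with N = sum(H_i * W_i) over the involved levels."""
    b = tokens.size(0)
    maps, start = [], 0
    for (h, w) in spatial_sizes:
        chunk = tokens[:, start:start + h * w, :]                 # tokens of one level
        maps.append(chunk.transpose(1, 2).reshape(b, channels, h, w))
        start += h * w
    return untouched + maps                                       # {X_0, X_1, X_2', X_3', X_4'}

x0, x1 = torch.randn(2, 64, 64, 64), torch.randn(2, 64, 32, 32)   # uninvolved levels
sizes = [(16, 16), (8, 8), (4, 4)]                                # involved levels
tokens = torch.randn(2, sum(h * w for h, w in sizes), 64)         # encoded tokens T^{K_s}
pyramid = fold_tokens_to_pyramid(tokens, sizes, 64, [x0, x1])
print([t.shape for t in pyramid])                                 # five feature maps
```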

@hetranger
Author

Yes, I missed that sentence and misunderstood Figure 2. Sorry again.
