Some questions about your code #7

Closed
hetranger opened this issue Aug 15, 2021 · 8 comments

@hetranger

hetranger commented Aug 15, 2021

Hi, thank you for your work. But I've noticed some inconsistencies with your paper:

  1. It seems that the features extracted by the self-attention module are fed directly into the UNet decoder rather than into the cross-attention module.
  2. The proxy embedding is added to each input of the cross-attention layer, which does not seem to be mentioned in the paper.

But maybe I've misunderstood something? Looking forward to your reply.

@JiYuanFeng
Owner

Hi,
For the first point, as shown in Figure 3(b), the features extracted by the SA module serve as the key/value input of the CA module, and the gradient of the auxiliary loss does not affect the decoder part, so we use the CA module as an auxiliary head for better integration into the modular framework.
For the second point, as also shown in Figure 3(b), the proxy embedding is initialized once at the beginning and then reused iteratively across the subsequent CA layers. Thank you.
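
For concreteness, here is a minimal PyTorch-style sketch of this wiring as I read it (illustrative only, not the repository's actual code): the SA output goes straight to the decoder, and a CA module with learnable proxy embeddings as queries and the SA tokens as keys/values serves as the auxiliary head. The class names, dimensions, and the use of a single CA layer are assumptions.

```python
import torch
import torch.nn as nn

class SelfAttnEncoder(nn.Module):
    """Stand-in for the SA module: refines the flattened multi-scale tokens."""
    def __init__(self, dim=64, heads=4, layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, tokens):          # tokens: (B, N, C)
        return self.encoder(tokens)     # encoded tokens T^{K_s}

class CrossAttnAuxHead(nn.Module):
    """Stand-in for the CA module used as an aux head: learnable proxy
    embeddings (queries) attend to the SA tokens (keys/values)."""
    def __init__(self, dim=64, heads=4, num_classes=9):
        super().__init__()
        self.proxy = nn.Parameter(torch.randn(num_classes, dim))   # initialized once
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cls = nn.Linear(dim, 1)

    def forward(self, sa_tokens):       # sa_tokens: (B, N, C)
        b = sa_tokens.size(0)
        q = self.proxy.unsqueeze(0).expand(b, -1, -1)              # proxies as queries
        out, _ = self.cross_attn(q, sa_tokens, sa_tokens)          # SA tokens as K/V
        return self.cls(out).squeeze(-1)                           # auxiliary logits

sa, aux = SelfAttnEncoder(), CrossAttnAuxHead()
sa_tokens = sa(torch.randn(2, 196, 64))   # SA output is fed straight to the decoder
aux_logits = aux(sa_tokens)               # ... and also into the CA aux head
print(sa_tokens.shape, aux_logits.shape)  # (2, 196, 64) and (2, 9)
```

In this sketch, the gradient from a loss on aux_logits flows back through the SA encoder but never through the decoder, which matches the point above about the auxiliary loss not affecting the decoder part.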

@hetranger
Author

But doesn't Figure 2 show that the input of the decoder should be the output of the CA module?

@JiYuanFeng
Owner

Hi, if the CA module still preserved the features from the SA module, it would have two outputs, one for the decoder and one for the aux head. Either way, the two are equivalent.

@hetranger
Author

hetranger commented Aug 15, 2021

Hi, in your code I don't see the CA output being used for the decoder, only for the aux head. And the CA module is not used at test time.

@JiYuanFeng
Owner

Hi, as mentioned above, the two implementations are equivalent and achieve the same performance; we organize the CA module as an aux head simply for a cleaner design of the overall framework. You can stack them in series if needed; meanwhile, as an aux head, it does not add any computation during testing.
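
As a hedged illustration of the "no computation during testing" point, loosely in the style of frameworks that attach auxiliary heads (the class and argument names here are assumptions, not the repository's API), the auxiliary branch can simply be skipped outside of training:

```python
import torch
import torch.nn as nn

class ToySegmentor(nn.Module):
    """Toy model: a main decode head plus an auxiliary head used only during training."""
    def __init__(self, backbone, decode_head, aux_head):
        super().__init__()
        self.backbone, self.decode_head, self.aux_head = backbone, decode_head, aux_head

    def forward(self, x, return_loss=False):
        feats = self.backbone(x)
        out = self.decode_head(feats)
        if return_loss and self.training:
            aux_out = self.aux_head(feats)   # CA branch: runs only while training
            return out, aux_out
        return out                           # inference: the aux head is never called

model = ToySegmentor(nn.Identity(), nn.Linear(8, 4), nn.Linear(8, 4))
x = torch.randn(2, 8)
model.train()
main_out, aux_out = model(x, return_loss=True)   # training: both heads run
model.eval()
pred = model(x)                                   # testing: no extra computation
```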

@hetranger
Author

hetranger commented Aug 15, 2021

My apologies. Sec. 3.2 mentions that:

"Finally, the encoded tokens T^{K_s} is fold back to 2D features and append the uninvolved features to form the pyramid features {X_0, X_1, X_2′, X_3′, X_4′}."

Thanks for your time.

@JiYuanFeng
Owner

Hi, the encoded tokens T^{K_s} are the output of the SA modules; besides being the input of the decoder head, they are also the input of the CA module (see Figure 3(b)).
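
For illustration, here is a small sketch of the folding step quoted from Sec. 3.2 above, assuming a shared embedding dimension across levels (the helper name and all shapes are hypothetical, not the repository's code): the tokens belonging to the involved levels are reshaped back into 2D maps and joined with the untouched lower-level features to rebuild the pyramid.

```python
import torch

def fold_tokens_to_pyramid(tokens, spatial_sizes, channels, untouched):
    """tokens: (B, N, C) with N = sum(H_i * W_i) over the involved levels."""
    b = tokens.size(0)
    maps, start = [], 0
    for (h, w) in spatial_sizes:
        chunk = tokens[:, start:start + h * w, :]                 # tokens of one level
        maps.append(chunk.transpose(1, 2).reshape(b, channels, h, w))
        start += h * w
    return untouched + maps                                       # {X_0, X_1, X_2', X_3', X_4'}

x0, x1 = torch.randn(2, 64, 64, 64), torch.randn(2, 64, 32, 32)   # uninvolved levels
sizes = [(16, 16), (8, 8), (4, 4)]                                # involved levels
tokens = torch.randn(2, sum(h * w for h, w in sizes), 64)         # encoded tokens T^{K_s}
pyramid = fold_tokens_to_pyramid(tokens, sizes, 64, [x0, x1])
print([t.shape for t in pyramid])                                 # five feature maps
```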

@hetranger
Author

Yes, I missed that sentence and misunderstood Figure 2. Sorry again.
