
How does the attention loss work #8

Closed
zeabin opened this issue Jan 9, 2022 · 1 comment
zeabin commented Jan 9, 2022

Hi, thanks for sharing the code.

I notice that detach() is called on the attention losses before backward() in train_step, so back-propagation cannot flow through those terms. How, then, can the attention loss have any effect?

NAD/main.py, lines 30 to 34 at d61e4d7:

cls_loss = criterionCls(output_s, target)
at3_loss = criterionAT(activation3_s, activation3_t).detach() * opt.beta3
at2_loss = criterionAT(activation2_s, activation2_t).detach() * opt.beta2
at1_loss = criterionAT(activation1_s, activation1_t).detach() * opt.beta1
at_loss = at1_loss + at2_loss + at3_loss + cls_loss
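
To see why the detach() calls matter, here is a minimal standalone PyTorch sketch (not from the repository) showing that a detached term contributes nothing to the gradient:

import torch

x = torch.randn(3, requires_grad=True)

detached_loss = (x ** 2).sum().detach() * 0.5  # detach() cuts this term out of the autograd graph
live_loss = (2 * x).sum()                      # this term stays in the graph

total = detached_loss + live_loss
total.backward()

print(x.grad)  # tensor([2., 2., 2.]): only live_loss contributes; the detached term is ignored

The same logic applies above: with detach(), the three attention-transfer terms inflate the reported at_loss value but produce zero gradient, so only cls_loss actually trains the student.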

bboylyg (Owner) commented Jan 10, 2022

Hi, thank you very much for the reminder. This error has now been fixed.
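
For readers landing here later: the fix presumably amounts to dropping the detach() calls so the attention-transfer terms stay in the autograd graph. A sketch of the corrected snippet, based on the lines quoted above rather than the actual commit:

cls_loss = criterionCls(output_s, target)
at3_loss = criterionAT(activation3_s, activation3_t) * opt.beta3  # no detach(): gradients now flow
at2_loss = criterionAT(activation2_s, activation2_t) * opt.beta2
at1_loss = criterionAT(activation1_s, activation1_t) * opt.beta1
at_loss = at1_loss + at2_loss + at3_loss + cls_loss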
