I notice that detach() is called before backward() for the attention loss in train_step, so backpropagation should not go through the attention loss. How can the attention loss work?
Hi, thanks for sharing the code.
I notice that `detach()` is called before `backward()` for the attention loss in `train_step`, and backpropagation should not go through the attention loss. So how can the attention loss work?

Referenced code: NAD/main.py, lines 30 to 34 at commit d61e4d7.
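For context, here is a minimal PyTorch sketch (not taken from NAD/main.py; the modules and the MSE criterion are placeholders) of the distinction the question turns on: detaching the loss tensor itself blocks all gradients, whereas detaching only the teacher's output still lets gradients reach the student.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the student/teacher networks and the attention loss.
student = nn.Linear(4, 4)
teacher = nn.Linear(4, 4)
criterion = nn.MSELoss()
x = torch.randn(2, 4)

# Case 1: detach the whole loss -> the resulting tensor no longer requires grad,
# so calling backward() on it would fail and nothing reaches the student.
loss_detached = criterion(student(x), teacher(x)).detach()
print(loss_detached.requires_grad)  # False

# Case 2: detach only the teacher's output -> gradients still flow through the
# student branch, which is the usual pattern in distillation-style losses.
loss = criterion(student(x), teacher(x).detach())
loss.backward()
print(student.weight.grad is not None)  # True
```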