In forward() of the MultiHeadAttention class in assignment3/cs231n/transformer_layers.py, the provided expected_self_attn_output can only be reproduced if dropout is applied to the attention weights before they are multiplied by the value matrix, that is: attention weights -> dropout -> (attention weights after dropout) x value matrix. However, your assignment instructions explicitly tell people to follow a different order: attention weights -> attention weights x value matrix -> dropout. Anyone who follows the order you actually instructed gets a self_attn_output that differs from the provided expected_self_attn_output. So the check provided in your Transformer_Captioning.ipynb is wrong: it does not match the instructions.
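To illustrate why the two orders cannot produce the same output, here is a minimal sketch in NumPy rather than the assignment's PyTorch code. It uses a single head with no projections, and the helper names (softmax, inverted_dropout, attention) and the dropout_on switch are hypothetical, not taken from transformer_layers.py. Both variants draw their dropout mask from the same seed, but one mask has the shape of the attention weights and the other the shape of the output, so the results generally disagree.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def inverted_dropout(x, p, rng):
    # Zero each element with probability p and rescale the survivors.
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def attention(q, k, v, p, seed, dropout_on="weights"):
    # Hypothetical single-head attention; dropout_on selects the order.
    rng = np.random.default_rng(seed)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores)
    if dropout_on == "weights":
        # Order that matches the expected output: dropout, then multiply by V.
        return inverted_dropout(weights, p, rng) @ v
    # Order described in the instructions: multiply by V, then dropout.
    return inverted_dropout(weights @ v, p, rng)

data_rng = np.random.default_rng(0)
q, k, v = (data_rng.standard_normal((4, 8)) for _ in range(3))

out_weights_first = attention(q, k, v, p=0.5, seed=1, dropout_on="weights")
out_output_last = attention(q, k, v, p=0.5, seed=1, dropout_on="output")

orders_differ = not np.allclose(out_weights_first, out_output_last)
print("orders differ:", orders_differ)
```

With p set to 0.0 the two variants coincide, which is consistent with the mismatch coming only from where dropout is applied.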