When I ran the unit test "test_flash_attn.py" for the function "flash_attn_varlen_func", I noticed that the values of dq produced by the backward pass differ slightly from run to run, even with a fixed random seed and the same input, while dk and dv are exactly the same every time. The forward output for the same input is also consistent across runs. Does this indicate that in FA2 the forward pass is deterministic but the backward pass is not? What could be the possible reasons for this nondeterminism?
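For reference, the repeat-and-compare check described above can be sketched roughly as follows. This is only a minimal illustration in plain Python: `compare_runs` and `run_once` are hypothetical names, not part of flash-attn or its test suite, and the sketch assumes each run returns its gradients as flat lists of finite floats keyed by name.

```python
from typing import Callable, Dict, List


def compare_runs(
    run_once: Callable[[], Dict[str, List[float]]],
    n_runs: int = 5,
) -> Dict[str, float]:
    """Call run_once() n_runs times and report, for each named output,
    the largest absolute deviation from the first run.

    run_once is a hypothetical callable: it should reset the seed, rebuild
    the same inputs, run forward and backward, and return the flattened
    results, e.g. {"out": [...], "dq": [...], "dk": [...], "dv": [...]}.
    """
    reference = run_once()
    max_diff = {name: 0.0 for name in reference}
    for _ in range(n_runs - 1):
        current = run_once()
        for name, ref_values in reference.items():
            for a, b in zip(ref_values, current[name]):
                # 0.0 means every element matched the first run exactly.
                max_diff[name] = max(max_diff[name], abs(a - b))
    return max_diff
```

With this kind of check, the behavior I am describing would show up as a small nonzero value for `dq` and exactly `0.0` for `out`, `dk` and `dv`. In a real run, the lists could come from something like `tensor.flatten().tolist()` on each gradient.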