I noticed that the memory retrieval and update happen before `apply_rotary_pos_emb`. I'm wondering whether the memory, lacking positional information, would confuse the model's perception of the order of historical information.
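For concreteness, here is a minimal sketch of the ordering being asked about: the compressive memory is read and written with the pre-RoPE queries/keys, and RoPE is applied only afterwards for the local attention. The function names, the simple sum in place of the paper's learned gate, and the per-segment positions are illustrative, not the repo's actual code:

```python
import torch
import torch.nn.functional as F


def apply_rope(x, positions):
    # Minimal rotary position embedding: rotate channel pairs by a
    # position-dependent angle (standard RoPE frequency schedule).
    d = x.shape[-1]
    inv_freq = 1.0 / (10000 ** (torch.arange(0, d, 2).float() / d))
    angles = positions[:, None].float() * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


def segment_step(q, k, v, mem, mem_norm, positions):
    # (1) Memory retrieval and update use the PRE-RoPE q/k, so the
    # compressive memory stores and retrieves content with no positional
    # signal about where past segments occurred -- this is the concern.
    sq, sk = F.elu(q) + 1, F.elu(k) + 1           # sigma(.) as in Infini-attention
    retrieved = (sq @ mem) / (sq @ mem_norm).clamp(min=1e-6)
    mem = mem + sk.transpose(-2, -1) @ v          # M <- M + sigma(K)^T V
    mem_norm = mem_norm + sk.sum(dim=0, keepdim=True).transpose(-2, -1)

    # (2) RoPE is applied only now, so only the local, within-segment
    # attention sees position information.
    q, k = apply_rope(q, positions), apply_rope(k, positions)
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    local = scores.softmax(dim=-1) @ v            # causal mask omitted for brevity

    # Simplified combination; the paper uses a learned gate here.
    return local + retrieved, mem, mem_norm


d, seg = 64, 8
mem, mem_norm = torch.zeros(d, d), torch.zeros(d, 1)
for _ in range(3):  # three consecutive segments share one memory
    q, k, v = (torch.randn(seg, d) for _ in range(3))
    out, mem, mem_norm = segment_step(q, k, v, mem, mem_norm, torch.arange(seg))
```

In this sketch `retrieved` is identical regardless of which earlier segment contributed a given key/value pair, which is exactly the order-blindness the question raises.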
From the README: "Can train 'infinite' context -- check train.gemma.infini.noclm.1Mseq.sh with 1x H100 80G (with AdamW optimizer, No gradient checkpointing)". However, I can train it with 12 GB of VRAM using 8-bit quantization and a segment size of 400.
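For anyone trying to reproduce the low-memory setup, a rough sketch of loading the model in 8-bit via `bitsandbytes`/`transformers` follows; the checkpoint name and the `segment_len` variable are illustrative (the repo's actual segment-size option may be named differently):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-2b"  # assumed checkpoint; swap in the one you train

# Load weights in 8-bit via bitsandbytes to fit in roughly 12 GB of VRAM.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

segment_len = 400  # segment size used in the 12 GB run described above
```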