I noticed that the memory retrieval and update happen before `apply_rotary_pos_emb`. I'm wondering whether the memory, lacking positional information, would confuse the model's perception of the order of historical information.
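For concreteness, here is a minimal sketch of the ordering being asked about: the compressive memory is read and written with the pre-RoPE queries/keys, and RoPE is applied only afterwards for the local attention. The function names, the simple sum in place of the paper's learned gate, and the per-segment positions are illustrative, not the repo's actual code:

```python
import torch
import torch.nn.functional as F


def apply_rope(x, positions):
    # Minimal rotary position embedding: rotate channel pairs by a
    # position-dependent angle (standard RoPE frequency schedule).
    d = x.shape[-1]
    inv_freq = 1.0 / (10000 ** (torch.arange(0, d, 2).float() / d))
    angles = positions[:, None].float() * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


def segment_step(q, k, v, mem, mem_norm, positions):
    # (1) Memory retrieval and update use the PRE-RoPE q/k, so the
    # compressive memory stores and retrieves content with no positional
    # signal about where past segments occurred -- this is the concern.
    sq, sk = F.elu(q) + 1, F.elu(k) + 1           # sigma(.) as in Infini-attention
    retrieved = (sq @ mem) / (sq @ mem_norm).clamp(min=1e-6)
    mem = mem + sk.transpose(-2, -1) @ v          # M <- M + sigma(K)^T V
    mem_norm = mem_norm + sk.sum(dim=0, keepdim=True).transpose(-2, -1)

    # (2) RoPE is applied only now, so only the local, within-segment
    # attention sees position information.
    q, k = apply_rope(q, positions), apply_rope(k, positions)
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    local = scores.softmax(dim=-1) @ v            # causal mask omitted for brevity

    # Simplified combination; the paper uses a learned gate here.
    return local + retrieved, mem, mem_norm


d, seg = 64, 8
mem, mem_norm = torch.zeros(d, d), torch.zeros(d, 1)
for _ in range(3):  # three consecutive segments share one memory
    q, k, v = (torch.randn(seg, d) for _ in range(3))
    out, mem, mem_norm = segment_step(q, k, v, mem, mem_norm, torch.arange(seg))
```

In this sketch `retrieved` is identical regardless of which earlier segment contributed a given key/value pair, which is exactly the order-blindness the question raises.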
From the README: "Can train 'infinite' context -- check train.gemma.infini.noclm.1Mseq.sh with 1x H100 80G (with AdamW optimizer, No gradient checkpointing)". However, I can train it with 12 GB of VRAM using 8-bit quantization and a segment size of 400.
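For anyone trying to reproduce the low-memory setup, a rough sketch of loading the model in 8-bit via `bitsandbytes`/`transformers` follows; the checkpoint name and the `segment_len` variable are illustrative (the repo's actual segment-size option may be named differently):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-2b"  # assumed checkpoint; swap in the one you train

# Load weights in 8-bit via bitsandbytes to fit in roughly 12 GB of VRAM.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

segment_len = 400  # segment size used in the 12 GB run described above
```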