If I can get RootPainter working with an 8-bit optimiser, it could reduce memory requirements and speed up training.
See https://arxiv.org/pdf/2303.10181.pdf, which states: "The use of 8-bit optimiser reduces the GPU memory utilised and the convergence time. The more interesting observation is that in almost all cases (except ViT), it also converges to a better solution, yielding a small performance improvement."
One concern is training stability. They also note: "One reason for the degradation in performance in transformers when using the 8-bit optimiser could be instability during training." Since RootPainter's U-Net is a CNN rather than a transformer, this risk may be less relevant here, but it is worth checking during testing.
See: Dettmers, T., Lewis, M., Shleifer, S., Zettlemoyer, L.: 8-bit optimizers via block-wise quantization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=shpkpVXzo3h
Implementation: https://github.com/TimDettmers/bitsandbytes
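A minimal sketch of what the change could look like, assuming the training code builds a standard `torch.optim` optimiser that can be swapped for the bitsandbytes 8-bit variant. The model, learning rate, and `make_optimizer` helper are hypothetical stand-ins, not RootPainter's actual code; the `try/except` fallback keeps training working when bitsandbytes is not installed.

```python
# Sketch: drop-in swap of a 32-bit optimiser for the bitsandbytes 8-bit one.
# Assumption: bitsandbytes is installed (pip install bitsandbytes); otherwise
# we fall back to torch.optim.Adam so the rest of the loop is unchanged.
import torch
import torch.nn as nn

try:
    import bitsandbytes as bnb

    def make_optimizer(params):
        # 8-bit Adam with block-wise quantised optimiser state
        return bnb.optim.Adam8bit(params, lr=1e-4)
except ImportError:
    def make_optimizer(params):
        return torch.optim.Adam(params, lr=1e-4)

# Tiny stand-in for the segmentation network (hypothetical, not RootPainter's U-Net)
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)
optimizer = make_optimizer(model.parameters())

# One dummy training step to confirm the optimiser behaves as usual
x = torch.randn(2, 3, 8, 8)
target = torch.zeros(2, 1, 8, 8)
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Because the 8-bit optimiser exposes the same `step()`/`zero_grad()` interface, the training loop itself should not need to change, which would make it easy to compare memory use and convergence against the current optimiser.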