If I can get RootPainter working with an 8-bit optimiser, it could reduce memory requirements and speed up training.
See https://arxiv.org/pdf/2303.10181.pdf, which states: "The use of 8-bit optimiser reduces the GPU memory utilised and the convergence time. The more interesting observation is that in almost all cases (except ViT), it also converges to a better solution, yielding a small performance improvement."
One concern is training stability. They also note: "One reason for the degradation in performance in transformers when using the 8-bit optimiser could be instability during training." Since RootPainter's U-Net is a CNN rather than a transformer, this risk may be less relevant here, but it is worth checking during testing.
See: Dettmers, T., Lewis, M., Shleifer, S., Zettlemoyer, L.: 8-bit optimizers via block-wise quantization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=shpkpVXzo3h
Implementation: https://github.com/TimDettmers/bitsandbytes
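A minimal sketch of what the change could look like, assuming the training code builds a standard `torch.optim` optimiser that can be swapped for the bitsandbytes 8-bit variant. The model, learning rate, and `make_optimizer` helper are hypothetical stand-ins, not RootPainter's actual code; the `try/except` fallback keeps training working when bitsandbytes is not installed.

```python
# Sketch: drop-in swap of a 32-bit optimiser for the bitsandbytes 8-bit one.
# Assumption: bitsandbytes is installed (pip install bitsandbytes); otherwise
# we fall back to torch.optim.Adam so the rest of the loop is unchanged.
import torch
import torch.nn as nn

try:
    import bitsandbytes as bnb

    def make_optimizer(params):
        # 8-bit Adam with block-wise quantised optimiser state
        return bnb.optim.Adam8bit(params, lr=1e-4)
except ImportError:
    def make_optimizer(params):
        return torch.optim.Adam(params, lr=1e-4)

# Tiny stand-in for the segmentation network (hypothetical, not RootPainter's U-Net)
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)
optimizer = make_optimizer(model.parameters())

# One dummy training step to confirm the optimiser behaves as usual
x = torch.randn(2, 3, 8, 8)
target = torch.zeros(2, 1, 8, 8)
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Because the 8-bit optimiser exposes the same `step()`/`zero_grad()` interface, the training loop itself should not need to change, which would make it easy to compare memory use and convergence against the current optimiser.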