Adam (Adaptive Moment Estimation) is an adaptive learning rate optimizer that combines ideas from [SGD] with momentum and [RMSprop] to automatically scale the learning rate:
- an exponentially weighted average of the past gradients to provide direction (the first moment)
- an exponentially weighted average of the squared past gradients to adapt the learning rate to each parameter (the second moment), as sketched in the update rule below
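For reference, this is the standard Adam update rule from the original paper (Kingma & Ba, 2015), where $g_t$ is the gradient at step $t$, $\beta_1$ and $\beta_2$ are the decay rates of the two averages, $\eta$ is the learning rate, and $\epsilon$ is a small stability constant:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
\theta_t &= \theta_{t-1} - \eta \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$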
bitsandbytes also supports paged optimizers, which take advantage of CUDA's unified memory to automatically page optimizer state from GPU to CPU memory when GPU memory is exhausted.
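As a minimal usage sketch (the model, input, and hyperparameter values here are placeholders, assuming a standard PyTorch training loop), the paged 8-bit Adam variant is a drop-in replacement for `torch.optim.Adam`:

```py
import torch
import bitsandbytes as bnb

# Placeholder model; any torch.nn.Module works the same way.
model = torch.nn.Linear(1024, 1024).cuda()

# Follows the torch.optim.Adam interface; optimizer state is stored
# in 8-bit and paged between GPU and CPU under memory pressure.
optimizer = bnb.optim.PagedAdam8bit(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

x = torch.randn(8, 1024, device="cuda")
loss = model(x).pow(2).mean()  # dummy loss for illustration
loss.backward()
optimizer.step()
optimizer.zero_grad()
```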
[[autodoc]] bitsandbytes.optim.Adam
    - __init__

[[autodoc]] bitsandbytes.optim.Adam8bit
    - __init__

[[autodoc]] bitsandbytes.optim.Adam32bit
    - __init__

[[autodoc]] bitsandbytes.optim.PagedAdam
    - __init__

[[autodoc]] bitsandbytes.optim.PagedAdam8bit
    - __init__

[[autodoc]] bitsandbytes.optim.PagedAdam32bit
    - __init__