Adam (Adaptive Moment Estimation) is an adaptive learning rate optimizer that combines ideas from [SGD] with momentum and [RMSprop] to automatically scale the learning rate:
- an exponentially weighted average of the past gradients to provide direction (the first moment)
- an exponentially weighted average of the squared past gradients to adapt the learning rate to each parameter (the second moment), as sketched in the update rule below
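For reference, this is the standard Adam update rule from the original paper (Kingma & Ba, 2015), where $g_t$ is the gradient at step $t$, $\beta_1$ and $\beta_2$ are the decay rates of the two averages, $\eta$ is the learning rate, and $\epsilon$ is a small stability constant:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
\theta_t &= \theta_{t-1} - \eta \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$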
bitsandbytes also supports paged optimizers, which take advantage of CUDA's unified memory to automatically page optimizer state from GPU to CPU memory when GPU memory is exhausted.
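As a minimal usage sketch (the model, input, and hyperparameter values here are placeholders, assuming a standard PyTorch training loop), the paged 8-bit Adam variant is a drop-in replacement for `torch.optim.Adam`:

```py
import torch
import bitsandbytes as bnb

# Placeholder model; any torch.nn.Module works the same way.
model = torch.nn.Linear(1024, 1024).cuda()

# Follows the torch.optim.Adam interface; optimizer state is stored
# in 8-bit and paged between GPU and CPU under memory pressure.
optimizer = bnb.optim.PagedAdam8bit(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

x = torch.randn(8, 1024, device="cuda")
loss = model(x).pow(2).mean()  # dummy loss for illustration
loss.backward()
optimizer.step()
optimizer.zero_grad()
```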
[[autodoc]] bitsandbytes.optim.Adam
    - __init__

[[autodoc]] bitsandbytes.optim.Adam8bit
    - __init__

[[autodoc]] bitsandbytes.optim.Adam32bit
    - __init__

[[autodoc]] bitsandbytes.optim.PagedAdam
    - __init__

[[autodoc]] bitsandbytes.optim.PagedAdam8bit
    - __init__

[[autodoc]] bitsandbytes.optim.PagedAdam32bit
    - __init__