# Adam

Adam (Adaptive moment estimation) is an adaptive learning rate optimizer that combines ideas from [SGD] with momentum and [RMSprop] to automatically scale the learning rate. It maintains two running statistics, combined in the update rule shown below:

- an exponentially weighted average of past gradients, which provides the update direction (the first moment)
- an exponentially weighted average of past squared gradients, which adapts the learning rate to each parameter (the second moment)
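
For reference, this is the standard Adam update from the original paper, with gradient g_t, decay rates β₁ and β₂, learning rate η, and a small stabilizing constant ε:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t} \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
\theta_t &= \theta_{t-1} - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$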

bitsandbytes also supports paged optimizers, which take advantage of CUDA's unified memory to automatically page optimizer state from the GPU to the CPU when GPU memory is exhausted.
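
As a minimal sketch, a paged optimizer is a drop-in replacement for its `torch.optim` counterpart in an ordinary training step; the tiny model and hyperparameters below are placeholders:

```python
import torch
import bitsandbytes as bnb

# Placeholder model; any PyTorch module on the GPU works the same way.
model = torch.nn.Linear(1024, 1024).cuda()

# Optimizer state is allocated in CUDA unified memory and paged to the
# CPU only when GPU memory runs out, so there is no cost otherwise.
optimizer = bnb.optim.PagedAdam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

loss = model(torch.randn(16, 1024, device="cuda")).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```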

## Adam[[api-class]]

[[autodoc]] bitsandbytes.optim.Adam
    - __init__

## Adam8bit

[[autodoc]] bitsandbytes.optim.Adam8bit
    - __init__

## Adam32bit

[[autodoc]] bitsandbytes.optim.Adam32bit
    - __init__

## PagedAdam

[[autodoc]] bitsandbytes.optim.PagedAdam
    - __init__

## PagedAdam8bit

[[autodoc]] bitsandbytes.optim.PagedAdam8bit
    - __init__

## PagedAdam32bit

[[autodoc]] bitsandbytes.optim.PagedAdam32bit
    - __init__