My use case for this library is mostly BERT models, as opposed to LMs of Megatron size or larger. In that context, ZeRO is mainly useful for fitting larger batch sizes and increasing throughput. For that reason, I'm wondering if and when you plan to add a ZeRO-compatible LAMB optimizer.