> ZeRO stage 2 does not support gradient accumulation, if you need gradient accumulation please use stage 1

Stage 2 significantly reduces the memory footprint. Is there a plan to support gradient accumulation for stage 2? Thanks.