gradient accumulation for stage 2 #373

@eric-haibin-lin

ZeRO stage 2 does not support gradient accumulation, if you need gradient accumulation please use stage 1

Stage 2 significantly reduces the memory footprint. Is there a plan to support gradient accumulation for stage 2? Thanks.
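For anyone landing here in the meantime, the stage 1 fallback that the message suggests can be sketched as a config like the one below. This is only a sketch under assumptions: the key names (`train_micro_batch_size_per_gpu`, `gradient_accumulation_steps`, `train_batch_size`, `zero_optimization.stage`) follow the DeepSpeed config documentation as I understand it and may differ in the version this issue was filed against. `make_zero_config` and the batch sizes are made up for illustration and are not part of DeepSpeed.

```python
import json


def make_zero_config(stage, micro_batch_per_gpu, accumulation_steps, world_size):
    """Build a DeepSpeed-style config dict.

    Hypothetical helper for illustration only; it is not a DeepSpeed API.
    """
    return {
        # Samples processed per GPU in one forward/backward pass.
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        # Number of micro-batches whose gradients are summed before an optimizer step.
        "gradient_accumulation_steps": accumulation_steps,
        # Effective global batch size: micro batch * accumulation steps * number of GPUs.
        "train_batch_size": micro_batch_per_gpu * accumulation_steps * world_size,
        # Stage 1 is the fallback named in the message quoted above.
        "zero_optimization": {"stage": stage},
    }


# Illustrative values (assumed): 2 GPUs, micro batch of 4, 8 accumulation steps.
config = make_zero_config(stage=1, micro_batch_per_gpu=4, accumulation_steps=8, world_size=2)
print(json.dumps(config, indent=2))
```

Once stage 2 supports accumulation, the only change on the user side would presumably be switching `stage` to 2, which is why the memory savings make this worth asking for.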

Labels: user-question (Questions about DeepSpeed.)
