
Allow flexible and easy-to-configure HSDP #19502

Closed
@Liyang90


Description & Motivation

The FSDPStrategy can use a hybrid sharding strategy to shard across smaller sets of ranks within the global distributed group. However, it is not flexible enough to let users easily specify the sharding scale.

Pitch

As described above, hybrid sharding shards across smaller sets of ranks within the global distributed group. Currently there are two paths to use it in Lightning:

  1. Specify sharding_strategy as one of the hybrid sharding strategies. This shards within one node and replicates across nodes.
  2. Specify sharding_strategy as one of the hybrid sharding strategies, and provide process_group as a kwarg to FSDPStrategy. This lets the user specify how large the sharding scale is. However, it is not easy for the user to insert the torch distributed group-creation code and prepare the process_group ahead of time, because Lightning handles torch.distributed init_process_group automatically in the Trainer or the Fabric launcher (see the sketch after this list).
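For illustration, here is a minimal sketch of what path 2 amounts to today. It assumes torch.distributed is already initialized (which Lightning normally does inside the Trainer/Fabric launch, hence the difficulty of preparing the groups ahead of time) and uses a hypothetical shard size of 16 ranks; for FSDP's hybrid strategies, process_group is a (shard group, replicate group) pair.

import torch.distributed as dist
from lightning.pytorch.strategies import FSDPStrategy

shard_size = 16  # hypothetical sharding scale
world_size = dist.get_world_size()
rank = dist.get_rank()
assert world_size % shard_size == 0

# Ranks that shard the model parameters together: contiguous blocks of `shard_size`.
shard_groups = [
    dist.new_group(list(range(start, start + shard_size)))
    for start in range(0, world_size, shard_size)
]
# Ranks that hold replicas of the same shard: one rank from each block.
replicate_groups = [
    dist.new_group(list(range(offset, world_size, shard_size)))
    for offset in range(shard_size)
]

# Hybrid sharding takes the (shard, replicate) pair via the process_group kwarg.
strategy = FSDPStrategy(
    sharding_strategy="HYBRID_SHARD",
    process_group=(shard_groups[rank // shard_size], replicate_groups[rank % shard_size]),
)

Note that every rank has to execute every new_group call in the same order, which is why all groups are created even though each rank only keeps its own pair.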

So I'm looking forward to an easier way to use HSDP within Lightning, like:
FSDPStrategy(..., sharding_strategy="HYBRID_SHARD", fsdp_size=16)
to easily shard at the specified scale, and let Lightning handle the process_group preparation for the PyTorch FSDP wrapper.
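Internally, one way Lightning could honor such an fsdp_size argument (the parameter name is the proposed one, not an existing API) is to derive a 2D device mesh after it has initialized the default process group, assuming a PyTorch version whose FSDP accepts a device_mesh for hybrid sharding:

from torch.distributed.device_mesh import init_device_mesh

def build_hybrid_mesh(world_size: int, fsdp_size: int):
    # Outer dimension replicates across shard groups; inner dimension shards within a group.
    assert world_size % fsdp_size == 0, "fsdp_size must divide the world size"
    return init_device_mesh(
        "cuda",
        (world_size // fsdp_size, fsdp_size),
        mesh_dim_names=("replicate", "shard"),
    )

The mesh (or the equivalent pair of process groups) would then be forwarded to the FSDP wrapper by the strategy, so the user only ever specifies the sharding scale.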

Alternatives

No response

Additional context

No response

cc @Borda @awaelchli @carmocca

Labels: discussion (In a discussion stage) · feature (Is an improvement or enhancement) · strategy: fsdp (Fully Sharded Data Parallel)
