
Make DDP/FSDP a regular transform #122

Open
t-vi opened this issue Apr 3, 2024 · 5 comments
Assignees: t-vi
Labels: distributed · enhancement (New feature or request) · help wanted (Extra attention is needed)

Comments

@t-vi
Collaborator

t-vi commented Apr 3, 2024

🚀 Feature

Make DDP/FSDP a regular transform (which, to a large part, means making transforms flexible enough to support this).

Motivation

Currently, DDP/FSDP is not a regular transform, which leads to issues like #94 and limits composability and sequencing.
A key piece is that a DDP/FSDP transform would need to make the prologue adjustments that we currently make during tracing with DDP/FSDP, so transforms need to be allowed to mutate prologues. This is in line with the needs of other transforms that change prologue signatures (LoRA, quantization, but also value-and-grad things), so this generalization should happen in any case.

cc @carmocca @awaelchli @crcrpar

@t-vi t-vi added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels Apr 3, 2024
@t-vi t-vi self-assigned this Apr 3, 2024
@IvanYashchuk
Collaborator

What is meant by making DDP/FSDP a regular transform? What are you planning to do?
Today it's not a transform at all, as I commented in #94 (comment): thunder.distributed.ddp/fsdp only annotate parameters for tracing. This is also described in the tutorial https://github.com/Lightning-AI/lightning-thunder/blob/main/notebooks/dev_tutorials/fsdp_tutorial.ipynb

I don't see how sharding could happen after the thunder.jit(model) call. What ideas do you have?
The current workflow (sketched below) is:

  1. Shard the model: done with thunder.distributed.fsdp(model), or with torch.distributed.fsdp.FullyShardedDataParallel in PyTorch.
  2. Set up the optimizer using the sharded model, so that the optimizer state is sharded as well.
  3. Call thunder.jit(sharded_model) (or torch.compile(sharded_model) in PyTorch).
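
For concreteness, a minimal sketch of this workflow, assuming a torchrun launch and a placeholder MyModel module:

```python
import torch
import torch.distributed as dist

import thunder
import thunder.distributed

dist.init_process_group(backend="nccl")  # torchrun sets the required env vars
device = torch.device("cuda", dist.get_rank() % torch.cuda.device_count())

model = MyModel().to(device)  # MyModel stands in for any torch.nn.Module

# 1. Shard the model across ranks.
sharded_model = thunder.distributed.fsdp(model)

# 2. Create the optimizer from the sharded parameters, so its state is sharded too.
optimizer = torch.optim.Adam(sharded_model.parameters(), lr=1e-3)

# 3. Jit-compile the already-sharded model.
jitted_model = thunder.jit(sharded_model)
```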

@t-vi
Collaborator Author

t-vi commented Apr 11, 2024

I would like to move step 3 up (for thunder.jit), i.e. call thunder.jit first and apply the sharding afterwards.
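
Something along these lines (a hypothetical sketch: applying fsdp to an already-jitted module is precisely what this issue proposes to enable, not what the current API supports):

```python
import torch
import thunder
import thunder.distributed

# 3. (now first) Jit the unsharded model.
jitted_model = thunder.jit(model)

# 1. Apply FSDP as a regular transform on the jitted module.
#    Hypothetical call order: this is the proposal, not today's API.
sharded_model = thunder.distributed.fsdp(jitted_model)

# 2. Set up the optimizer on the now-sharded parameters.
optimizer = torch.optim.Adam(sharded_model.parameters(), lr=1e-3)
```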

@IvanYashchuk
Collaborator

Is the preferred order then 3 -> 1 -> 2?

@t-vi
Collaborator Author

t-vi commented Apr 22, 2024

So, per discussions with @crcrpar and @IvanYashchuk (thank you!):

  • The transform needs to modify the model as well as the prologue and compute traces (and invalidate or modify old cache entries).
  • It needs to come before the autograd transform and stay compatible with it.
  • We need a good way to represent the changes to the model state; the current goal is to put this on the ThunderModule and leave the user's modules intact. This is why Change prologue details to prepare for fsdp as a transform #228 simplifies parameter access, and it is also why we want to access the thunder module in the prologue (which has come up before as well). See the sketch after this list.

(obviously, the good ideas are Masaki's and Ivan's, the not-so-good ones my own)
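
To make these requirements concrete, here is a rough sketch of what such a transform could look like. All class and method names are hypothetical and purely illustrative; they are not thunder's actual transform API.

```python
# Hypothetical sketch: none of these names are thunder's real transform API.
class FSDPTransform:
    """Sharding as a 'regular' transform applied to a jitted module."""

    def transform_module(self, thunder_module):
        # Shard the parameters and record the sharded state on the
        # ThunderModule, leaving the user's original nn.Module intact.
        ...

    def transform_prologue(self, prologue_trace):
        # Mutate the prologue so it unpacks and checks the *sharded*
        # parameters; this is the prologue mutation this issue asks for.
        # Cache entries built against the old prologue must be invalidated
        # or updated.
        ...

    def transform_compute(self, compute_trace):
        # Insert the collectives (e.g. all-gather before use, reduce-scatter
        # of gradients) into the compute trace; this has to run before the
        # autograd transform and stay compatible with it.
        ...
```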

@mruberry
Collaborator

Triage review: let's start the design review with a draft PR to discuss.
