
Conversation

kshitij12345 (Collaborator) commented Sep 5, 2025

Related #2338

TODO:

  • Figure out why we see `Error from segmentation group 1: The singleton Communicator isn't available. This is most likely because the instance wasn't successfully initialized due to lack of a multi-process running (e.g. mpirun or torchrun).` only when running this primitive. Need to set environment variables for nvFuser multi-device to work; see changes to helper.py and the sketch below.
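
For context, a minimal sketch of what setting those environment variables can look like, assuming the nvFuser Communicator picks up the standard torch.distributed rendezvous variables that torchrun/mpirun would normally provide (the exact variables set in helper.py are not reproduced here):

```python
# Hedged sketch: manually provide the rendezvous variables that torchrun would
# normally set, so a single-process test run can still initialize the
# communicator. The variables actually changed in helper.py may differ.
import os

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")
os.environ.setdefault("LOCAL_RANK", "0")
```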

@kshitij12345 changed the title from "[WIP] DTensor: Add _grouped_mm torch and prim" to "[WIP] DTensor: Add torch symbol and prim for _grouped_mm" on Sep 5, 2025
@github-actions bot added the ci label on Sep 24, 2025
@kshitij12345 force-pushed the dtensor-prims._grouped_mm branch from 7d785d7 to 9221690 on October 2, 2025 10:21
@github-actions bot removed the ci label on Oct 2, 2025
@kshitij12345 changed the title from "[WIP] DTensor: Add torch symbol and prim for _grouped_mm" to "[DTensor] Add torch symbol and prim for _grouped_mm" on Oct 2, 2025
@kshitij12345 self-assigned this on Oct 2, 2025
@kshitij12345 added the DTensor label (Issues about DTensor support in Thunder) on Oct 2, 2025
@kshitij12345 marked this pull request as ready for review on October 2, 2025 10:32

t-vi (Collaborator) commented Oct 2, 2025

We need to make the access to torch._grouped_mm conditional or bump the min torch version.

@kshitij12345 requested a review from crcrpar on October 2, 2025 12:31

kshitij12345 (Collaborator, Author) replied:

> We need to make the access to torch._grouped_mm conditional or bump the min torch version.

Have made the access conditional, thanks @t-vi
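
For reference, a minimal sketch of one way such a guard can look (an illustration, not necessarily the exact change in this PR):

```python
# Hedged sketch: only register the torch._grouped_mm symbol/prim when the
# installed torch build actually exposes the operator.
import torch

if hasattr(torch, "_grouped_mm"):
    # register the thunder torch symbol and DTensor prim for _grouped_mm here
    ...
```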

kshitij12345 (Collaborator, Author) commented Oct 2, 2025

I have pushed a couple of commits after changing the PR status from draft to ready, but the Lit job hasn't been triggered.


Comment on lines +255 to +268
"input_shardings",
[
(
[
Shard(
-1,
)
],
[
Shard(1),
],
[Replicate()],
),
],
Collaborator

QQ: what's the type of input_shardings? A tuple of two lists of Shard/Replicate placements?

Collaborator Author

It is:

[
  ([Shard(-1)], [Shard(1)], [Replicate()]),
]

NOTE: Each element of the tuple is a Sequence[Placement], as expected by distribute_tensor.

Doc: https://docs.pytorch.org/docs/stable/distributed.tensor.html#torch.distributed.tensor.distribute_tensor
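
For illustration, a hedged sketch of how each of the three placement sequences could be passed to distribute_tensor, assuming they correspond to the two matrix inputs and the group-offsets tensor; the mesh size, tensor names, and shapes below are assumptions, not the test's actual code:

```python
# Hedged sketch: distribute the three grouped-mm inputs according to one
# parametrization. Assumes a 2-rank torchrun launch; names/shapes are made up.
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import Replicate, Shard, distribute_tensor

mesh = init_device_mesh("cuda", (2,))

a = torch.randn(8, 16, device="cuda", dtype=torch.bfloat16)          # (M, K)
b = torch.randn(4, 16, 32, device="cuda", dtype=torch.bfloat16)      # (G, K, N)
offs = torch.tensor([2, 4, 6, 8], device="cuda", dtype=torch.int32)  # group offsets into M

placements = ([Shard(-1)], [Shard(1)], [Replicate()])

d_a = distribute_tensor(a, mesh, placements[0])        # shard a along its last dim
d_b = distribute_tensor(b, mesh, placements[1])        # shard b along dim 1
d_offs = distribute_tensor(offs, mesh, placements[2])  # replicate the offsets
```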

Labels
DTensor (Issues about DTensor support in Thunder)

3 participants