Add plugins documentation #2207

Conversation
t-vi left a comment
Supergood to have those documented. Thank you!
> This plugin applies the necessary transforms to bucket and synchronize gradients across
> multiple processes, using a specified process group for communication.
> See https://github.com/pytorch/pytorch/blob/v2.7.0/torch/nn/parallel/distributed.py#L326 for more details.
How about cross-referencing DDP instead of referencing the PyTorch docs of a specific version?
e.g. `` :class:`~torch.nn.parallel.distributed.DistributedDataParallel` `` would work
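
For illustration, the suggested cross-reference would make the docstring read roughly like this (a sketch; the class name and surrounding wording are paraphrased from the quoted snippet, not copied from the PR):

```python
class DDP:
    """Applies the transforms needed to bucket and synchronize gradients across
    multiple processes, using a specified process group for communication.

    See :class:`~torch.nn.parallel.distributed.DistributedDataParallel` for the
    corresponding PyTorch implementation.
    """
```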
> Args:
>   bucket_size_in_mb: float, default 25.0
>     Size in megabytes of the gradient bucket in DDP.
>   broadcast_from: int | None, default None
>     Global rank ID to broadcast model parameters from at initialization. If None, no explicit broadcast is performed.
>   process_group: Optional[ProcessGroup], default is the current default process group
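
To make the arguments concrete, here is a minimal usage sketch. The `thunder.plugins` import path and the `plugins=` argument to `thunder.jit` are assumptions rather than something stated in this thread; the parameter names come from the docstring above.

```python
import torch
import torch.distributed as dist

import thunder
from thunder.plugins import DDP  # assumed import path

# Launched via torchrun, so the default process group can be initialized.
dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Linear(1024, 1024).cuda()

# Bucket gradients in 25 MB chunks and broadcast initial parameters from rank 0.
jitted_model = thunder.jit(
    model,
    plugins=[DDP(bucket_size_in_mb=25.0, broadcast_from=0)],
)
```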
> Args:
>   device: torch.device | None, default None
>     Device on which to place sharded modules. If None, modules remain on their existing devices.
>   broadcast_from: int | None, default None
>     Global rank ID to broadcast parameters from before sharding. If None, no broadcast is performed.
>   sharding_strategy: FSDPType, default FSDPType.ZERO2
>     Strategy for parameter sharding (e.g., ZERO2 for sharding both parameters and optimizer state).
>   bucketing_strategy: FSDPBucketingStrategy, default FSDPBucketingStrategy.NONE
>     Bucketing strategy to use when saving or loading FSDP checkpoints.
>   move_state_dict_to_cpu: bool, default False
>     Whether to move the state dict parameters to CPU after serialization to reduce GPU memory usage.
>   ddp_bucket_size_in_mb: float, default 25.0
>     Bucket size in megabytes for the DDP transform when used in a combined mesh with FSDP.
>   process_group: Optional[ProcessGroup or DeviceMesh], default is the current default process group
>     The process group or device mesh to use for distributed communication. If None, uses the default process group.
same. I think we can skip type annotations and default values
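
For comparison, a hedged sketch of the FSDP plugin's arguments in use. `FSDPType` and `FSDPBucketingStrategy` are taken from the docstring above, but their import path, the `thunder.plugins` module, and the `plugins=` argument to `thunder.jit` are assumptions:

```python
import torch
import torch.distributed as dist

import thunder
from thunder.distributed import FSDPBucketingStrategy, FSDPType  # assumed import path
from thunder.plugins import FSDP  # assumed import path

dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
device = torch.device("cuda", local_rank)

model = torch.nn.Linear(4096, 4096)

# ZeRO-2 style sharding, no bucketing, parameters broadcast from rank 0 before sharding.
jitted_model = thunder.jit(
    model,
    plugins=[
        FSDP(
            device=device,
            broadcast_from=0,
            sharding_strategy=FSDPType.ZERO2,
            bucketing_strategy=FSDPBucketingStrategy.NONE,
        )
    ],
)
```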
| """ | ||
| Plugin for enabling FP8 precision via NVIDIA Transformer Engine, enabling higher throughput of matrix operations in FP8. | ||
| See `lightning-thunder/thunder/executors/transformer_engineex.py` for implementation details. |
Can we reference this file with a relative path?
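
A short sketch of enabling the FP8 plugin; the import path and the `plugins=` argument are again assumed, and NVIDIA Transformer Engine plus FP8-capable hardware are required for the plugin to have any effect:

```python
import torch

import thunder
from thunder.plugins import FP8  # assumed import path

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 4096),
).to(device="cuda", dtype=torch.bfloat16)

# Matmul-heavy layers are lowered to Transformer Engine FP8 kernels where supported.
jitted_model = thunder.jit(model, plugins=[FP8()])
out = jitted_model(torch.randn(64, 4096, device="cuda", dtype=torch.bfloat16))
```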
> model weights, reducing memory footprint and improving
> throughput for both training and inference.
> See https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L889 for more details.
it'd be better if this link were a permalink
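
And a matching sketch for the int4 quantization plugin. The class name is taken from the PR description, but the import path, the no-argument constructor, and the `plugins=` usage are assumptions; bitsandbytes must be installed for the quantization to apply:

```python
import torch

import thunder
from thunder.plugins import QuantizeInt4  # assumed import path

model = torch.nn.Linear(4096, 4096).to(device="cuda", dtype=torch.bfloat16)

# Linear weights are quantized to 4 bits via bitsandbytes, shrinking the memory footprint.
jitted_model = thunder.jit(model, plugins=[QuantizeInt4()])
out = jitted_model(torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16))
```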

Before submitting
What does this PR do?
This PR helps first-time users understand plugins better by adding documentation for the DDP, FSDP, QuantizeInt4, FP8 and ReduceOverhead plugins.
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃
cc @Borda @lantiga