Conversation

@CodersAcademy006
Contributor

This PR adds a 3D patch embedding module for video inputs, enabling
Diffusion Transformer (DiT)–style video models in Megatron-LM.

The module converts video tensors [B, C, T, H, W] into a sequence of
tokens [B, N, D] using a single Conv3D projection, consistent with ViT
and DiT architectures.

  • Introduces VideoPatchEmbed under megatron/core/vision
  • Uses Conv3D for efficient linear projection over spatiotemporal patches
  • Produces Transformer-ready token sequences
  • Includes unit test validating shape correctness
  • No impact on existing models or training paths

This PR is intentionally minimal and self-contained.
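For reference, the described behavior can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: the class name `VideoPatchEmbed` comes from the description above, but the constructor arguments, defaults, and internal layout are assumptions and may differ from the implementation in `megatron/core/vision`.

```python
# Hypothetical sketch of a 3D patch embedding as described in this PR;
# actual signatures in megatron/core/vision may differ.
import torch
import torch.nn as nn

class VideoPatchEmbed(nn.Module):
    """3D patch embedding: [B, C, T, H, W] -> [B, N, D] via a single Conv3d."""

    def __init__(self, in_channels=3, embed_dim=768, patch_size=(2, 16, 16)):
        super().__init__()
        # A Conv3d whose kernel equals its stride applies one linear
        # projection per non-overlapping spatiotemporal patch, the 3D
        # analogue of the ViT/DiT patchify step.
        self.proj = nn.Conv3d(
            in_channels, embed_dim,
            kernel_size=patch_size, stride=patch_size,
        )

    def forward(self, x):
        # x: [B, C, T, H, W]
        x = self.proj(x)                  # [B, D, T/pt, H/ph, W/pw]
        x = x.flatten(2).transpose(1, 2)  # [B, N, D] with N = (T/pt)*(H/ph)*(W/pw)
        return x

# Shape check in the spirit of the unit test mentioned above:
embed = VideoPatchEmbed()
video = torch.randn(2, 3, 8, 32, 32)  # [B, C, T, H, W]
tokens = embed(video)
print(tokens.shape)                   # torch.Size([2, 16, 768])
```

With patch size (2, 16, 16), an 8-frame 32x32 clip yields 4 x 2 x 2 = 16 tokens of width 768, giving a Transformer-ready sequence.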

Fixes #2796 (Part 3: DiT reference wiring)

@CodersAcademy006 CodersAcademy006 requested review from a team as code owners January 5, 2026 09:40
@copy-pr-bot

copy-pr-bot bot commented Jan 5, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.




Development

Successfully merging this pull request may close these issues.

Feature Request: Video Generation Model Support (Wan, DiT architectures)
