Prevent double serialization inside Flask server by tdene · Pull Request #3653 · NVIDIA/Megatron-LM

tdene · 2026-03-02T11:00:02Z

What does this PR do?

⚠️ For major changes (either in lines of code or in its impact), please make sure to first share a design doc with the team. If you're unsure what's the best way to do so, contact the @mcore-oncall.

Contribution process

flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]

Pre-checks

I want this PR in a versioned release and have added the appropriate Milestone (e.g., Core 0.8)
I have added relevant unit tests
I have added relevant functional tests
I have added proper typing to my code (see the Typing guidelines)
I have added relevant documentation
I have run the autoformatter.sh on my PR

Code review

The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into the `main` branch

Feel free to message or comment the @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

(Step 1): Add PR label `Expert Review`

(Step 2): Collect the expert reviewers' reviews

Attach the Expert Review label when your PR is ready for review.
GitHub auto-assigns expert reviewers based on your changes. They will get notified and pick up your PR soon.

⚠️ Only proceed to the next step once all reviewers have approved, merge conflicts are resolved, and CI is passing.
Final Review might get declined if these requirements are not fulfilled.

(Step 3): Final Review

Add Final Review label
GitHub auto-assigns final reviewers based on your changes. They will get notified and pick up your PR soon.

(Optional Step 4): Cherry-pick into release branch

If this PR also needs to be merged into core_r* release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.

For MRs into the `dev` branch

The proposed review process for `dev` branch is under active discussion.

MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Merging your PR

Any member of core-adlr and core-nemo will be able to merge your PR.

copy-pr-bot · 2026-03-02T11:00:05Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

tests/unit_tests/inference/test_data_parallel_inference_coordinator.py

.../core/inference/text_generation_server/dynamic_text_gen_server/endpoints/chat_completions.py

megatron/core/inference/inference_client.py

examples/inference/gpt/gpt_dynamic_inference_with_coordinator.py

megatron/core/inference/text_generation_server/dynamic_text_gen_server/endpoints/completions.py

santhnm2 · 2026-03-11T16:20:25Z

megatron/core/inference/text_generation_server/dynamic_text_gen_server/endpoints/completions.py

+            )
+            # Unwrap ("tensor", [...]) tuples from serialize() into plain lists.
+            result = {
+                k: v[1] if isinstance(v, (list, tuple)) and len(v) == 2 and v[0] == "tensor" else v


This conversion seems potentially error-prone; can we move it into a small helper function like _unwrap_completed_request so that it is easier to debug if necessary?

santhnm2 · 2026-03-11T16:22:13Z

tests/unit_tests/inference/test_data_parallel_inference_coordinator.py

+        prompt_tokens = (
+            torch.arange(len(prompt.split())) if isinstance(prompt, str) else torch.tensor(prompt)
+        )


Can you add a comment that you're doing this to mock tokenization? It looked wrong to me at first glance, but then I realized this was a test file.
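A dependency-free sketch of the mocked tokenization in the diff above, carrying the kind of comment the reviewer asks for. It uses plain Python lists instead of `torch.arange` / `torch.tensor` so it runs without PyTorch; `mock_tokenize` is a hypothetical name, not a function in the test file.

```python
from typing import List, Sequence, Union


def mock_tokenize(prompt: Union[str, Sequence[int]]) -> List[int]:
    # Mock tokenization for tests only, not a real tokenizer:
    # a string prompt becomes one fake token ID per whitespace-separated
    # word (0, 1, 2, ...), mirroring torch.arange(len(prompt.split())).
    if isinstance(prompt, str):
        return list(range(len(prompt.split())))
    # A prompt that is already a sequence of token IDs is passed through,
    # mirroring torch.tensor(prompt).
    return list(prompt)


if __name__ == "__main__":
    print(mock_tokenize("hello world foo"))  # prints [0, 1, 2]
    print(mock_tokenize([5, 9]))  # prints [5, 9]
```

Only the token count matters for the coordinator test, so sequential fake IDs are enough; the values themselves carry no meaning.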

tdene · 2026-03-11T18:02:59Z

/claude review

claude

LGTM

svcnvidia-nemo-ci · 2026-03-12T15:34:41Z

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23010112598


Fix optional chat_completions returnables

1402637

tdene force-pushed the tde/fix_double_serialize branch from 97dd320 to beac7eb March 2, 2026 11:18

Prevent double serialization

52e3d10

tdene force-pushed the tde/fix_double_serialize branch from beac7eb to 52e3d10 March 2, 2026 12:32

lmcafee-nvidia reviewed Mar 2, 2026

examples/inference/gpt/gpt_dynamic_inference_with_coordinator.py (Outdated, resolved)

examples/inference/gpt/gpt_dynamic_inference_with_coordinator.py (Outdated, resolved)

lmcafee-nvidia reviewed Mar 2, 2026

megatron/core/inference/text_generation_server/dynamic_text_gen_server/endpoints/completions.py (Outdated, resolved)

lmcafee-nvidia reviewed Mar 2, 2026

megatron/core/inference/text_generation_server/dynamic_text_gen_server/endpoints/completions.py (Outdated, resolved)

Address reviewer comments

75d863d

tdene marked this pull request as ready for review March 2, 2026 18:28

tdene requested review from a team as code owners March 2, 2026 18:28

svcnvidia-nemo-ci requested a review from a team March 2, 2026 18:29

svcnvidia-nemo-ci added this to the Core 0.16 milestone Mar 2, 2026

copy-pr-bot bot temporarily deployed to test March 2, 2026 18:29 Inactive

tdene added the Expert Review [deprecated] label Mar 2, 2026

tdene added 3 commits March 3, 2026 14:49

Merge commit '98495afaf5433403e8351e696ba6cefa94024b50' into gh/main

3141d85

Merge remote-tracking branch 'gh/main' into HEAD

41d70f4

Merge branch 'gh/main' into tde/fix_double_serialize

fb4da49

copy-pr-bot bot temporarily deployed to test March 4, 2026 10:23 Inactive

lmcafee-nvidia approved these changes Mar 4, 2026

Merge remote-tracking branch 'gh/main' into tde/fix_double_serialize

f3077ee

tdene force-pushed the tde/fix_double_serialize branch from 23160db to f3077ee March 6, 2026 16:16

lint the fixed merge conflict

39aeacd

copy-pr-bot bot temporarily deployed to test March 6, 2026 17:47 Inactive

Merge remote-tracking branch 'gh/main' into tde/fix_double_serialize

6d0d21a

svcnvidia-nemo-ci added the Final Review label Mar 6, 2026

copy-pr-bot bot temporarily deployed to test March 6, 2026 22:28 Inactive

copy-pr-bot bot temporarily deployed to test March 11, 2026 13:46 Inactive

santhnm2 approved these changes Mar 11, 2026

Address reviewer comments

0827047

copy-pr-bot bot temporarily deployed to test March 11, 2026 16:39 Inactive

kvareddy approved these changes Mar 11, 2026

Merge remote-tracking branch 'gh/main' into tde/fix_double_serialize

b32c548

copy-pr-bot bot temporarily deployed to test March 11, 2026 17:40 Inactive

claude bot reviewed Mar 11, 2026

shanmugamr1992 approved these changes Mar 11, 2026

svcnvidia-nemo-ci added the Approved label and removed the Final Review label Mar 11, 2026

tdene added 2 commits March 11, 2026 18:28

Fix another merge; how did upstream not fail?

62e664f

Merge remote-tracking branch 'gh/main' into tde/fix_double_serialize

ec8e2b9

copy-pr-bot bot temporarily deployed to test March 11, 2026 23:31 Inactive

Fix unit test

87364cb

copy-pr-bot bot temporarily deployed to test March 12, 2026 01:50 Inactive

I hope this doesn't break next merge

01d43e5

copy-pr-bot bot temporarily deployed to test March 12, 2026 03:23 Inactive

Merge remote-tracking branch 'gh/main' into tde/fix_double_serialize

f1dfe8a

copy-pr-bot bot temporarily deployed to test March 12, 2026 14:13 Inactive

tdene enabled auto-merge March 12, 2026 14:22

tdene added this pull request to the merge queue Mar 12, 2026

Merged via the queue into NVIDIA:main with commit e08dc9d Mar 12, 2026
52 of 53 checks passed

tdene deleted the tde/fix_double_serialize branch March 12, 2026 16:17

chtruong814 added a commit to chtruong814/Megatron-LM that referenced this pull request Mar 13, 2026

Revert "Prevent double serialization inside Flask server (NVIDIA#3653)"

37280fa

This reverts commit e08dc9d. Signed-off-by: Charlie Truong <chtruong@nvidia.com>

chtruong814 mentioned this pull request Mar 13, 2026

ci: Mark TestCoordinator.test_throughput as flaky #3849

Merged

5 tasks

chtruong814 added a commit to chtruong814/Megatron-LM that referenced this pull request Mar 13, 2026

Revert "Revert "Prevent double serialization inside Flask server (NVIDIA#3653)""

c276cde

This reverts commit 37280fa. Signed-off-by: Charlie Truong <chtruong@nvidia.com>


6 participants