Track errors through the inference return path #3776
```python
entry = self.requests[request_id]
request = entry.record[-1]
```
Should this be

```python
request = self.requests[request_id]
entry = request.record[-1]
```
I understand what you mean.
But unfortunately, `self.requests` is a misnomer: it is a `Dict[int, RequestEntry]`, where each `RequestEntry` contains a `DynamicInferenceRequestRecord`, and each `DynamicInferenceRequestRecord` contains a `list[DynamicInferenceRequest]`.
So if anything, we should rename `self.requests` to `self.request_entries`.
I see, can you make it this then?

```python
request_entry = self.requests[request_id]
request = request_entry.record[-1]
```
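To make the naming discussion concrete, here is a minimal sketch of the nesting described above. The class and attribute names are taken from the thread; the fields and helper method are illustrative assumptions, not the actual Megatron-LM definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical stand-ins for the classes named in the discussion;
# the fields here are assumptions for illustration only.
@dataclass
class DynamicInferenceRequest:
    prompt: str

@dataclass
class DynamicInferenceRequestRecord:
    # A record holds the history of one request; index [-1] is the latest.
    requests: List[DynamicInferenceRequest] = field(default_factory=list)

    def __getitem__(self, idx):
        return self.requests[idx]

@dataclass
class RequestEntry:
    record: DynamicInferenceRequestRecord = field(
        default_factory=DynamicInferenceRequestRecord
    )

class Engine:
    def __init__(self):
        # Despite the name, each value is a RequestEntry, not a request,
        # which is why self.request_entries would be the clearer name.
        self.requests: Dict[int, RequestEntry] = {}

    def latest_request(self, request_id: int) -> DynamicInferenceRequest:
        request_entry = self.requests[request_id]  # a RequestEntry
        request = request_entry.record[-1]         # latest DynamicInferenceRequest
        return request
```

With this shape, the suggested `request_entry` / `request` naming matches what each variable actually holds.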
/claude test

/claude review
```python
if failed_errors:
    error_detail = "; ".join(failed_errors)
    logger.error(f"Inference request(s) failed: {error_detail}")
    return Response(f"Inference request(s) failed: {error_detail}", status=400)
```
Bug: HTTP 400 indicates a client error ("bad request"), but inference failures can also be server-side (e.g. ERROR_TRANSIENT). Returning 400 for transient/server errors is misleading to clients — they may not retry when they should. Consider using 500 (or 503 for transient errors) instead, or at minimum differentiating based on the event type since you already distinguish ERROR_TRANSIENT vs ERROR_NONTRANSIENT.
Hi Claude, thank you for your feedback. I have addressed this concern!
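One way to address the concern above is to pick the status code from the failure event types. This is a sketch only; `ERROR_TRANSIENT` and `ERROR_NONTRANSIENT` are stand-ins for the event types mentioned in the review, and the helper name is hypothetical.

```python
# Hypothetical constants standing in for the event types named in the review.
ERROR_TRANSIENT = "ERROR_TRANSIENT"
ERROR_NONTRANSIENT = "ERROR_NONTRANSIENT"

def status_for_failures(event_types):
    """Map inference failure event types to an HTTP status code.

    503 (Service Unavailable) when every failure is transient, signalling
    that the client may retry; 500 (Internal Server Error) otherwise.
    """
    if event_types and all(t == ERROR_TRANSIENT for t in event_types):
        return 503
    return 500
```

The handler could then return `Response(msg, status=status_for_failures(types))` instead of a hard-coded 400, so retryable failures are distinguishable from permanent ones.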
```python
if failed_errors:
    error_detail = "; ".join(failed_errors)
    logger.error(f"Inference request(s) failed: {error_detail}")
    return f"Inference request(s) failed: {error_detail}", 400
```
Same HTTP 400 concern as in `chat_completions.py`: transient/server-side errors should not be reported as 400.
Also a minor inconsistency: `chat_completions.py` returns `Response(msg, status=400)` while this file returns a tuple `(msg, 400)`. Both work in Flask/Quart, but it would be cleaner to use the same style (the existing exception handler on line 120 already uses the tuple form, so the tuple is fine here; just noting the cross-file inconsistency).
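Both styles end up as the same HTTP response, which is why either is acceptable. A framework-free sketch of that normalization (a hypothetical helper illustrating the idea, not Flask's actual implementation):

```python
# Hypothetical normalizer showing why Response(msg, status=...) and the
# (msg, status) tuple form are interchangeable in Flask/Quart handlers.
class SimpleResponse:
    def __init__(self, body, status=200):
        self.body = body
        self.status = status

def normalize(rv):
    """Coerce a view return value into a SimpleResponse."""
    if isinstance(rv, SimpleResponse):
        return rv
    if isinstance(rv, tuple) and len(rv) == 2:
        body, status = rv
        return SimpleResponse(body, status)
    return SimpleResponse(rv)

a = normalize(SimpleResponse("Inference request(s) failed: boom", status=500))
b = normalize(("Inference request(s) failed: boom", 500))
assert (a.body, a.status) == (b.body, b.status)
```

Since the framework normalizes both forms identically, the choice is purely stylistic, which is the reviewer's point about consistency.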
/claude review
🔄 Merge queue validation started! You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23253541987 |
This reverts commit ee00a70.
What does this PR do?

Contribution process

Pre-checks

Code review

Feel free to message or comment @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.

Step 1: Mark PR as "Ready for Review"

.github/CODEOWNERS. Final Review might get declined if these requirements are not fulfilled.

Step 2: Final Review

For PRs that change megatron/core, once all expert reviewers have approved, the Final Review label is applied automatically and final reviewers are assigned. For PRs outside megatron/core, this step is skipped.

Step 3: Approved

Once all required reviewers have approved, the Approved label is applied automatically.

Merge

Any member of mcore-engineers will be able to merge your PR.

For MRs into the `dev` branch

The proposed review process for the `dev` branch is under active discussion. MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.