perf(is-seg): swap pydantic response for dataclass twins on workflow path (stacked on #22) #29
Open
aseembits93 wants to merge 1 commit into
…path
Stacks on top of PR#22 (optimize-rfdetr-seg: Triton fusion + CUDA graphs
+ scratch caching). See PR#28 for the same change against main.
`InferenceModelsInstanceSegmentationAdapter.postprocess` built a full
pydantic tree per frame — `Point × V` per polygon vertex,
`InstanceSegmentationPrediction × N`, then
`InstanceSegmentationInferenceResponse`. The workflow block then called
`response.model_dump(by_alias=True, exclude_none=True)` to get a plain
dict for `sv.Detections.from_inference`. Neither validation nor the
serializer is needed on that path — the block only consumes the dict.
This change adds slotted dataclass twins (`PointDC`,
`InferenceResponseImageDC`, `InstanceSegmentationPredictionDC`,
`InstanceSegmentationInferenceResponseDC`) plus `_is_pred_dc_to_dict`
and `_is_response_dc_to_dict` helpers that emit the exact dict
`model_dump(by_alias=True, exclude_none=True)` produces (same keys,
same `class` alias, same None-omission).
The adapter gates on `kwargs.get("source") == "workflow-execution"`
and returns the dataclass response on that path. Every other caller —
HTTP `response_model` at `http_api.py:1640`, `isinstance`-based cache
dispatch at `cache/serializers.py:71`, `draw_predictions`
visualization — keeps the pydantic path untouched.
The v3 workflow block detects the dataclass via `isinstance` and calls
`_is_response_dc_to_dict`; it falls back to `model_dump` for any other
response type.
Microbench (4 dets × 6-vertex polygon, construct + dump):
* pydantic: ~81 us/frame
* dataclass: ~34 us/frame (2.43x faster)
End-to-end (rfdetr-seg-nano TRT + Triton preproc + Triton fullpost +
CUDA graphs, vehicles_312px.mp4, 538 frames, 4 runs each, on top of
optimize-rfdetr-seg HEAD c1406a8):
* baseline (pydantic): 152.93 FPS mean
* dataclass: 156.54 FPS mean (+3.6 FPS, +2.4%)
Bit-exact parity verified: `_is_response_dc_to_dict(dc)` byte-equals
`pyd.model_dump(by_alias=True, exclude_none=True)` for mixed inputs
(varying polygon lengths, empty list, mutation of .time/.inference_id
post-construct).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Stacked on #22. Non-stacked version against `main` is #28. Open this PR once #22 lands; then its diff against `main` will be identical to #28's (modulo target-branch differences).

`InferenceModelsInstanceSegmentationAdapter.postprocess` built a full pydantic tree per frame: `Point × V` per polygon vertex, `InstanceSegmentationPrediction × N`, then `InstanceSegmentationInferenceResponse`. The workflow block then called `response.model_dump(by_alias=True, exclude_none=True)` to get a plain dict for `sv.Detections.from_inference`. Neither the validation nor the serializer is needed on that path, because the block only consumes the dict.

This change adds slotted dataclass twins (`PointDC`, `InferenceResponseImageDC`, `InstanceSegmentationPredictionDC`, `InstanceSegmentationInferenceResponseDC`) plus `_is_pred_dc_to_dict` / `_is_response_dc_to_dict` helpers that emit the exact dict `model_dump(by_alias=True, exclude_none=True)` produces.

The adapter gates on `kwargs.get("source") == "workflow-execution"` and returns the dataclass response on that path. Every other caller (HTTP `response_model` at `http_api.py:1640`, `isinstance`-based cache dispatch at `cache/serializers.py:71`, `draw_predictions` visualization) keeps the pydantic path untouched.

The v3 workflow block detects the dataclass via `isinstance` and calls `_is_response_dc_to_dict`; it falls back to `model_dump` for any other response type.

Numbers
Microbench (4 dets × 6-vertex polygon, construct + dump):
* pydantic: ~81 us/frame
* dataclass: ~34 us/frame

Δ ≈ 2.4× faster.

End-to-end on top of #22's full stack (rfdetr-seg-nano TRT + Triton preproc + Triton fullpost + CUDA graphs, `vehicles_312px.mp4`, 538 frames, 4 runs each):
* baseline (pydantic): 152.93 FPS mean
* dataclass: 156.54 FPS mean

Δ ≈ +3.6 FPS (+2.4%).
Correctness
Bit-exact parity: `_is_response_dc_to_dict(dc) == pyd.model_dump(by_alias=True, exclude_none=True)` on mixed inputs including varying polygon lengths, empty polygons, and post-construction mutation of `.time` / `.inference_id` (assigned by `Model.infer_from_request` at `inference/core/models/base.py:154-157`). `@dataclass(slots=True)` permits the reassignment because those fields are declared.

Test plan
* `pytest tests/workflows/unit_tests/core_steps/models/roboflow/instance_segmentation/test_v3.py`: 23/23 pass
* Bit-exact parity (`_is_response_dc_to_dict(dc) == pyd.model_dump(by_alias=True, exclude_none=True)`)
* Call `/infer/instance_segmentation` locally and confirm JSON is byte-identical to pre-change (expected: adapter falls through to pydantic when `source != "workflow-execution"`)

🤖 Generated with Claude Code