perf(is-seg): swap pydantic response for dataclass twins on workflow path (stacked on #22) by aseembits93 · Pull Request #29 · aseembits93/inference

aseembits93 · 2026-04-30T22:27:19Z

Summary

Stacked on #22. Non-stacked version against main is #28. Open this PR once #22 lands; then its diff against main will be identical to #28's (modulo target-branch differences).

InferenceModelsInstanceSegmentationAdapter.postprocess built a full pydantic tree per frame — Point × V per polygon vertex, InstanceSegmentationPrediction × N, then InstanceSegmentationInferenceResponse. The workflow block then called response.model_dump(by_alias=True, exclude_none=True) to get a plain dict for sv.Detections.from_inference. Neither the validation nor the serializer is needed on that path — the block only consumes the dict.

This change adds slotted dataclass twins (PointDC, InferenceResponseImageDC, InstanceSegmentationPredictionDC, InstanceSegmentationInferenceResponseDC) plus _is_pred_dc_to_dict / _is_response_dc_to_dict helpers that emit the exact dict model_dump(by_alias=True, exclude_none=True) produces.

The adapter gates on kwargs.get("source") == "workflow-execution" and returns the dataclass response on that path. Every other caller — HTTP response_model at http_api.py:1640, isinstance-based cache dispatch at cache/serializers.py:71, draw_predictions visualization — keeps the pydantic path untouched.

The v3 workflow block detects the dataclass via isinstance and calls _is_response_dc_to_dict; it falls back to model_dump for any other response type.

Numbers

Microbench (4 dets × 6-vertex polygon, construct + dump):

	µs/frame
pydantic	~81
dataclass	~34

Δ ≈ 2.4× faster.

End-to-end on top of #22's full stack (rfdetr-seg-nano TRT + Triton preproc + Triton fullpost + CUDA graphs, vehicles_312px.mp4, 538 frames, 4 runs each):

FPS	run 1	run 2	run 3	run 4	mean
baseline (optimize-rfdetr-seg HEAD)	152.55	152.62	153.23	153.33	152.93
this change	156.33	156.88	157.10	155.86	156.54

Δ ≈ +3.6 FPS (+2.4%).
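The means and the delta can be rechecked from the per-run figures in the table; the variable names below are chosen for this example.

```python
from statistics import mean

baseline_fps = [152.55, 152.62, 153.23, 153.33]
change_fps = [156.33, 156.88, 157.10, 155.86]

baseline_mean = mean(baseline_fps)
change_mean = mean(change_fps)
delta_fps = change_mean - baseline_mean
delta_pct = delta_fps / baseline_mean * 100

print(round(baseline_mean, 2))  # 152.93
print(round(change_mean, 2))    # 156.54
print(round(delta_fps, 2))      # 3.61
print(round(delta_pct, 1))      # 2.4
```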

Correctness

Bit-exact parity: _is_response_dc_to_dict(dc) == pyd.model_dump(by_alias=True, exclude_none=True) on mixed inputs including varying polygon lengths, empty polygons, and post-construction mutation of .time / .inference_id (assigned by Model.infer_from_request at inference/core/models/base.py:154-157). @dataclass(slots=True) permits the reassignment because those fields are declared.

Test plan

pytest tests/workflows/unit_tests/core_steps/models/roboflow/instance_segmentation/test_v3.py — 23/23 pass
Parity test (_is_response_dc_to_dict(dc) == pyd.model_dump(by_alias=True, exclude_none=True))
Microbench
End-to-end FPS benchmark on optimize-rfdetr-seg's full Triton stack
HTTP regression: hit /infer/instance_segmentation locally and confirm JSON is byte-identical to pre-change (expected — adapter falls through to pydantic when source != "workflow-execution")

🤖 Generated with Claude Code

…path

Stacks on top of PR#22 (optimize-rfdetr-seg: Triton fusion + CUDA graphs + scratch caching). See PR#28 for the same change against main.

`InferenceModelsInstanceSegmentationAdapter.postprocess` built a full pydantic tree per frame: `Point × V` per polygon vertex, `InstanceSegmentationPrediction × N`, then `InstanceSegmentationInferenceResponse`. The workflow block then called `response.model_dump(by_alias=True, exclude_none=True)` to get a plain dict for `sv.Detections.from_inference`. Neither validation nor the serializer is needed on that path, because the block only consumes the dict.

This change adds slotted dataclass twins (`PointDC`, `InferenceResponseImageDC`, `InstanceSegmentationPredictionDC`, `InstanceSegmentationInferenceResponseDC`) plus `_is_pred_dc_to_dict` and `_is_response_dc_to_dict` helpers that emit the exact dict `model_dump(by_alias=True, exclude_none=True)` produces (same keys, same `class` alias, same None-omission).

The adapter gates on `kwargs.get("source") == "workflow-execution"` and returns the dataclass response on that path. Every other caller (HTTP `response_model` at `http_api.py:1640`, `isinstance`-based cache dispatch at `cache/serializers.py:71`, `draw_predictions` visualization) keeps the pydantic path untouched.

The v3 workflow block detects the dataclass via `isinstance` and calls `_is_response_dc_to_dict`; falls back to `model_dump` for any other response type.

Microbench (4 dets × 6-vertex polygon, construct + dump):
* pydantic: ~81 us/frame
* dataclass: ~34 us/frame (2.43x faster)

End-to-end (rfdetr-seg-nano TRT + Triton preproc + Triton fullpost + CUDA graphs, vehicles_312px.mp4, 538 frames, 4 runs each, on top of optimize-rfdetr-seg HEAD c1406a8):
* baseline (pydantic): 152.93 FPS mean
* dataclass: 156.54 FPS mean (+3.6 FPS, +2.4%)

Bit-exact parity verified: `_is_response_dc_to_dict(dc)` byte-equals `pyd.model_dump(by_alias=True, exclude_none=True)` for mixed inputs (varying polygon lengths, empty list, mutation of .time/.inference_id post-construct).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

aseembits93 mentioned this pull request Apr 30, 2026

perf(rfdetr-seg): Triton fusion + CUDA graphs + scratch caching + dataclass response (109 → 156 FPS) #30 (Open, 4 tasks)

aseembits93 wants to merge 1 commit into optimize-rfdetr-seg from perf/is-seg-workflow-dataclasses-stacked
