Skip to content

perf(is-seg): swap pydantic response for dataclass twins on workflow path#28

Open
aseembits93 wants to merge 1 commit into
mainfrom
perf/is-seg-workflow-dataclasses
Open

perf(is-seg): swap pydantic response for dataclass twins on workflow path#28
aseembits93 wants to merge 1 commit into
mainfrom
perf/is-seg-workflow-dataclasses

Conversation

@aseembits93
Copy link
Copy Markdown
Owner

Summary

InferenceModelsInstanceSegmentationAdapter.postprocess built a full pydantic tree per frame — Point × V per polygon vertex, InstanceSegmentationPrediction × N, then InstanceSegmentationInferenceResponse. The workflow block then called response.model_dump(by_alias=True, exclude_none=True) to get a plain dict for sv.Detections.from_inference. Neither the validation nor the serializer is needed on that path — the block only consumes the dict.

This change adds slotted dataclass twins (PointDC, InferenceResponseImageDC, InstanceSegmentationPredictionDC, InstanceSegmentationInferenceResponseDC) plus _is_pred_dc_to_dict / _is_response_dc_to_dict helpers that emit the exact dict model_dump(by_alias=True, exclude_none=True) produces (same keys, same alias \"class\", same exclude_none behavior, same mask_format=\"polygon\" constant).

The adapter gates on kwargs.get(\"source\") == \"workflow-execution\" (and not return_in_rle) and returns the dataclass response on that path. Every other caller — HTTP response_model at http_api.py:1640, isinstance-based cache dispatch at cache/serializers.py:71, draw_predictions visualization, RLE response mode — keeps the pydantic path untouched.

The v3 workflow block detects the dataclass via isinstance and calls _is_response_dc_to_dict; it falls back to model_dump for any other response type (e.g. if a non-rfdetr backend is ever bound to the same block).

Why bother

We already tried model_construct on this branch's ancestor; it was 2× slower than pydantic's Rust-validated __init__. Swapping to dataclasses works because the saving isn't in construction alone — it's construction + model_dump combined. Pydantic v2's serializer is Python-heavy for nested types with aliases + exclude_none, whereas the hand-rolled dict walk is a dozen dict literals.

Numbers

Microbench (4 dets × 6-vertex polygon, construct + dump):

µs/frame
pydantic 81
dataclass 33

Δ = −48 µs/frame (2.43× faster).

End-to-end (rfdetr-seg-nano TRT + Triton preproc + Triton fullpost + CUDA graphs, vehicles_312px.mp4, 538 frames, 4 runs each):

run 1 run 2 run 3 run 4 mean
baseline 153.72 154.27 153.11 153.63 153.68
this change 159.00 157.62 157.18 157.03 157.71

Δ = +4.03 FPS (+2.6%).

Correctness

Parity tested on real inputs: _is_response_dc_to_dict(dc) byte-equals pyd.model_dump(by_alias=True, exclude_none=True) (modulo detection_id UUIDs from default_factory, which both paths generate). Tests cover:

  • Mixed detection set with varying polygon lengths (0, 3, 6 vertices)
  • Post-construction mutation of .time / .inference_id by Model.infer_from_request
  • Empty-predictions edge case

Model.infer_from_request assigns response.time and response.inference_id at inference/core/models/base.py:154,157. Those two fields are declared in InstanceSegmentationInferenceResponseDC, so the slotted dataclass permits the reassignment.

Test plan

  • pytest tests/workflows/unit_tests/core_steps/models/roboflow/instance_segmentation/test_v3.py — 23/23 pass
  • Parity test (_is_response_dc_to_dict(dc) == pyd.model_dump(by_alias=True, exclude_none=True))
  • Microbench
  • End-to-end FPS benchmark
  • HTTP regression: hit /infer/instance_segmentation against a local RF-DETR seg model, confirm the JSON response is byte-identical to pre-change (should be — adapter falls through to the pydantic branch because the HTTP request doesn't set source=\"workflow-execution\")
  • RLE path: send a request with response_mask_format=\"rle\" via workflows, confirm it still goes through the pydantic InstanceSegmentationRLEPrediction branch (gate excludes RLE)

🤖 Generated with Claude Code

…path

`InferenceModelsInstanceSegmentationAdapter.postprocess` built a full
pydantic tree per frame:

    Point × V    (per polygon vertex)
    InstanceSegmentationPrediction × N
    InstanceSegmentationInferenceResponse

The workflow block then called `response.model_dump(by_alias=True,
exclude_none=True)` to get a plain dict for `sv.Detections.from_inference`.
Neither the pydantic validation nor the serializer machinery is needed on
that path; the block only consumes the dict.

This change adds slotted dataclass twins in `inference.py`:

    PointDC
    InferenceResponseImageDC
    InstanceSegmentationPredictionDC
    InstanceSegmentationInferenceResponseDC

plus `_is_pred_dc_to_dict` / `_is_response_dc_to_dict` helpers that emit
the exact dict `model_dump(by_alias=True, exclude_none=True)` would
produce (same keys, same alias `"class"`, same `exclude_none` behavior,
same `mask_format="polygon"` constant).

The adapter gates on `kwargs.get("source") == "workflow-execution"` (and
`not return_in_rle`) and returns the dataclass response on that path.
Every other caller (HTTP `response_model`, `isinstance`-based cache
dispatch, `draw_predictions` visualization, RLE response mode) keeps the
pydantic path untouched.

The v3 workflow block detects the dataclass via isinstance and calls
`_is_response_dc_to_dict` instead of `model_dump`; it falls back to
`model_dump` for any other response type.

Microbench (4 dets × 6-vertex polygon, construct + dump):
  * pydantic:  81 us/frame
  * dataclass: 33 us/frame  (2.43x faster)

End-to-end benchmark (rfdetr-seg-nano TRT + Triton preproc + Triton
fullpost + CUDA graphs, vehicles_312px.mp4, 538 frames, 4 runs each;
measured on branch optimize-rfdetr-seg but the gain composes
identically on main):

  * baseline: 153.68 FPS mean
  * this change: 157.71 FPS mean  (+4.0 FPS, +2.6%)

Bit-exact parity verified: `_is_response_dc_to_dict(dc)` equals
`pyd.model_dump(by_alias=True, exclude_none=True)` for all test inputs;
mutation of `.time` / `.inference_id` by `Model.infer_from_request`
works on the dataclass because those fields are declared and slotted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants