
feat: introduce MaxJsonSize trait for JSON encode buffer sizing #126

Merged
MathiasKoch merged 6 commits into master from fix/jobs-commands-defender-qos
May 6, 2026


@MathiasKoch MathiasKoch commented May 6, 2026

Summary

Update::max_size() (jobs and commands) and the provisioning register buffer used hardcoded sizes that are too small for non-trivial payloads. When the payload does not fit, serde_json_core::to_slice returns PayloadError::BufferSize and the publish surfaces as JobError::Mqtt / Error::BufferSize, so what is really a buffer-size bug looks like an MQTT/broker problem.

Observed on factbird-edge factory_reset: the post-cleanup IN_PROGRESS update carries a 3-participant report (~700 bytes JSON) and overflows the hardcoded 512-byte buffer every time.

Bumping the constants high penalises mqttrust/no_std users who pay for an embedded buffer they don't need. Make the size caller-known via the type system instead.

What's new

/// Upper bound on the JSON-serialized size of `Self`. Used by `ToPayload`
/// implementations on generic-Serialize wrapper types to size their encode
/// buffers.
pub trait MaxJsonSize: serde::Serialize {
    const MAX_JSON_SIZE: usize;
}

impl MaxJsonSize for () {
    const MAX_JSON_SIZE: usize = 4;
}

Serialize is a super-trait — a MAX_JSON_SIZE only makes sense for a JSON-encodable type, and use sites get a single tighter bound (S: MaxJsonSize) instead of the compound S: Serialize + MaxJsonSize.
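
A minimal sketch of how a generic wrapper might use the single bound to size its encode buffer. To keep it compilable without external crates, `Serialize` below is a local stand-in for `serde::Serialize`; the `Update` shape, `FRAMING_OVERHEAD`, and its value of 64 are assumptions for illustration, not the crate's actual definitions.

```rust
/// Local stand-in for `serde::Serialize` so the sketch needs no dependencies.
pub trait Serialize {}

/// Mirrors the trait introduced in this PR.
pub trait MaxJsonSize: Serialize {
    const MAX_JSON_SIZE: usize;
}

impl Serialize for () {}

impl MaxJsonSize for () {
    // `()` serializes to the four bytes `null`.
    const MAX_JSON_SIZE: usize = 4;
}

/// Hypothetical allowance for the envelope fields around the payload.
pub const FRAMING_OVERHEAD: usize = 64;

/// Hypothetical wrapper; only the bound and the size computation matter here.
pub struct Update<S: MaxJsonSize> {
    pub status_details: Option<S>,
}

impl<S: MaxJsonSize> Update<S> {
    /// `S: MaxJsonSize` alone is enough: the super-trait already guarantees
    /// that `S` is serializable, and the const gives its worst-case size.
    pub fn max_size() -> usize {
        S::MAX_JSON_SIZE + FRAMING_OVERHEAD
    }
}
```

With these assumed numbers, `Update::<()>::max_size()` evaluates to 68.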

Sites updated

| Site | Before | After |
| --- | --- | --- |
| `jobs::Update<'a, S>` (and `report_progress` / `succeed_job` / `fail_job` / `publish_and_wait`) | `S: Serialize`, hardcoded 512 | `S: MaxJsonSize`, `S::MAX_JSON_SIZE + framing` |
| `commands::Update<'a, R>` (and `succeed` / `publish_and_wait`) | `R: Serialize`, hardcoded 2048 | `R: MaxJsonSize`, `R::MAX_JSON_SIZE + framing` |
| `provisioning::FleetProvisioner::provision*` parameters | `impl Serialize`, hardcoded 1024 | `impl MaxJsonSize`, `P::MAX_JSON_SIZE + framing` |
| `transfer::status_details::StatusDetailsExt` | (no size info) | gains `const MAX_EXTRA_JSON_SIZE: usize` |
| `StatusDetails`, `CombinedStatusDetails<E>` | (no `MaxJsonSize`) | impl `MaxJsonSize`; combined delegates to `E::MAX_EXTRA_JSON_SIZE` |
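
The delegation in the last row could look roughly like the sketch below. The struct layouts, the base size of 128, and the `FirmwareInfo` type are assumptions made for illustration, and the `Serialize` super-trait is omitted so the block compiles on its own.

```rust
/// Simplified copy of the PR's trait (the real one also requires `Serialize`).
pub trait MaxJsonSize {
    const MAX_JSON_SIZE: usize;
}

/// Implemented by user types that add their own fields to the status details.
pub trait StatusDetailsExt {
    /// Worst-case JSON size of the extra fields only.
    const MAX_EXTRA_JSON_SIZE: usize;
}

/// Stand-in for the built-in status details; 128 is an assumed bound.
pub struct StatusDetails;

impl MaxJsonSize for StatusDetails {
    const MAX_JSON_SIZE: usize = 128;
}

/// Stand-in for the combined type: built-in fields plus the user's extras.
pub struct CombinedStatusDetails<E> {
    pub base: StatusDetails,
    pub extra: E,
}

impl<E: StatusDetailsExt> MaxJsonSize for CombinedStatusDetails<E> {
    // The user declares only the size of the fields they contribute.
    const MAX_JSON_SIZE: usize =
        <StatusDetails as MaxJsonSize>::MAX_JSON_SIZE + E::MAX_EXTRA_JSON_SIZE;
}

/// Hypothetical user extension carrying a short firmware version string.
pub struct FirmwareInfo;

impl StatusDetailsExt for FirmwareInfo {
    const MAX_EXTRA_JSON_SIZE: usize = 32;
}
```

Under these assumed values the combined bound is 128 + 32 = 160 bytes.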

Breaking change

Callers passing a custom status_details (jobs), result (commands), or provisioning parameters type need to add an impl:

impl MaxJsonSize for MyStatusDetails {
    const MAX_JSON_SIZE: usize = 1024;
}

StatusDetailsExt impls likewise need to declare MAX_EXTRA_JSON_SIZE.
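
One way to pick the constant is to add up the worst-case encoding field by field. The sketch below does that for a hypothetical two-field type; the field names, the 32-byte message limit, and the hand-rolled `to_json` (standing in for serde, and valid only for messages that need no escaping) are all assumptions.

```rust
/// Simplified copy of the PR's trait (the real one also requires `Serialize`).
pub trait MaxJsonSize {
    const MAX_JSON_SIZE: usize;
}

/// Caller-guaranteed upper bound on `message`, in bytes.
pub const MAX_MESSAGE_LEN: usize = 32;

/// Hypothetical caller-defined status details with bounded fields.
pub struct MyStatusDetails {
    /// Encodes to at most three digits ("255").
    pub step: u8,
    /// At most `MAX_MESSAGE_LEN` bytes, with no characters that need escaping.
    pub message: String,
}

impl MaxJsonSize for MyStatusDetails {
    // `{"step":` + 3 digits + `,"message":"` + message + `"}`
    const MAX_JSON_SIZE: usize = r#"{"step":"#.len()
        + 3
        + r#","message":""#.len()
        + MAX_MESSAGE_LEN
        + r#""}"#.len();
}

impl MyStatusDetails {
    /// Hand-rolled encoder used only to check the bound in this sketch.
    pub fn to_json(&self) -> String {
        format!(r#"{{"step":{},"message":"{}"}}"#, self.step, self.message)
    }
}
```

For this layout the bound works out to 8 + 3 + 12 + 32 + 2 = 57 bytes, which is exactly what the largest instance encodes to.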

Test plan

  • cargo check (default features)
  • cargo check --no-default-features --features std,mqtt_greengrass
  • cargo test --no-default-features --features std,mqtt_mqttrust,commands_cbor,metric_cbor --lib — 88/88 still passing
  • Hardware: factbird-edge factory_reset reaches a terminal state in cloud (CleanupCompleted progress reaches the cloud after the caller's status_details type declares an appropriate MAX_JSON_SIZE)

Follow-ups (not in this PR)

  • JobError::Mqtt mapping in jobs/stream.rs discards the underlying error type. Surfacing the typed error would prevent future buffer-overflow / serialization issues from being misdiagnosed as broker problems.
  • shadows::ShadowRoot::MAX_PAYLOAD_SIZE follows a similar per-trait const pattern. Could optionally be unified under MaxJsonSize later, but that's a wider refactor.
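
As a rough illustration of the first follow-up, an error type that keeps its cause lets callers tell a local buffer overflow apart from a broker failure. The enum shapes below are hypothetical and only borrow the names mentioned in this PR; they are not the crate's current definitions.

```rust
/// Hypothetical encode-side error.
#[derive(Debug, PartialEq)]
pub enum PayloadError {
    BufferSize,
}

/// Hypothetical transport-level error that wraps encode failures.
#[derive(Debug, PartialEq)]
pub enum MqttError {
    Timeout,
    Payload(PayloadError),
}

/// Hypothetical `JobError` that carries the cause instead of discarding it.
#[derive(Debug, PartialEq)]
pub enum JobError {
    Mqtt(MqttError),
}

impl From<MqttError> for JobError {
    fn from(e: MqttError) -> Self {
        JobError::Mqtt(e)
    }
}

impl JobError {
    /// True when the failure is a local encode-buffer problem, not a broker one.
    pub fn is_buffer_overflow(&self) -> bool {
        matches!(
            self,
            JobError::Mqtt(MqttError::Payload(PayloadError::BufferSize))
        )
    }
}

/// Stand-in for a publish that fails when the payload exceeds its buffer.
pub fn publish(payload_len: usize, buffer_len: usize) -> Result<(), JobError> {
    if payload_len > buffer_len {
        return Err(MqttError::Payload(PayloadError::BufferSize).into());
    }
    Ok(())
}
```

A 700-byte payload against a 512-byte buffer would then report `is_buffer_overflow() == true` rather than an opaque `Mqtt` error.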

Same pattern as #122 — bump remaining request/response topic pairs
from AtMostOnce to AtLeastOnce so a transient broker disconnect
between SUBSCRIBE and the broker's accepted/rejected reply doesn't
silently drop the response, and so QoS-1-spooled publishes survive
a brief reconnect window instead of being dropped.

* jobs/stream.rs: bump JobAgent::subscribe (notify-next +
  describe-accepted), the describe publish, report_progress publish,
  and publish_and_wait's update/accepted+rejected subscriptions.
* commands/stream.rs: bump CommandAgent::subscribe (executions/+/request)
  and report_in_progress publish.
* defender_metrics/mod.rs: bump publish_and_subscribe's accepted+rejected
  subs and the metric publish.

Transfer's data_interface stays at QoS 0 — those are high-volume OTA
data blocks with their own retry semantics, same as before #122.
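
The resulting policy can be summarised as a small mapping. This is only an illustration of the rule described above; the `Traffic` enum and `qos_for` function are hypothetical and do not exist in the crate, where the QoS is set at each call site.

```rust
/// The two QoS levels discussed in this commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QoS {
    AtMostOnce,
    AtLeastOnce,
}

/// Hypothetical classification of the traffic described above.
#[derive(Debug, Clone, Copy)]
pub enum Traffic {
    /// Request/response topic pairs (jobs, commands, defender metrics).
    RequestResponse,
    /// High-volume OTA data blocks with their own retry semantics.
    OtaDataBlock,
}

/// Control-plane traffic must survive a brief reconnect; bulk data need not.
pub fn qos_for(traffic: Traffic) -> QoS {
    match traffic {
        Traffic::RequestResponse => QoS::AtLeastOnce,
        Traffic::OtaDataBlock => QoS::AtMostOnce,
    }
}
```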

Discovered during factbird-edge factory_reset end-to-end testing:
job-manager's report_progress publish ~5–10s after a deployment-driven
cleanup returned `JobError::Mqtt` because the broker had briefly
dropped the connection during the cleanup, and QoS 0 publishes were
silently lost.
@MathiasKoch MathiasKoch force-pushed the fix/jobs-commands-defender-qos branch from 724feb5 to 83b4700 May 6, 2026 09:04
@MathiasKoch MathiasKoch changed the title from "fix(jobs): bump Update::max_size from 512 to 8192" to "feat(jobs): make Update encode-buffer size caller-configurable" May 6, 2026
`Update::max_size()` (jobs and commands) and the provisioning register
buffer were hardcoded values that overflow on non-trivial payloads —
`serde_json_core::to_slice` returns `PayloadError::BufferSize` and the
publish surfaces as `JobError::Mqtt` / `Error::BufferSize`, so what's
really a buffer-size bug looks like an MQTT/broker problem.

Observed on factbird-edge factory_reset: the post-cleanup `IN_PROGRESS`
update carries a 3-participant report (~700 bytes JSON) and overflows
the hardcoded 512-byte buffer in `jobs::Update::max_size()` every time.

Bumping the constants high penalises resource-constrained mqttrust
users who pay for an embedded buffer they don't need. Make the size
caller-known via the type system instead:

* New `MaxJsonSize: Serialize` super-trait in `mqtt/mod.rs`. Implementer
  declares `const MAX_JSON_SIZE: usize` — the worst-case JSON-encoded
  size of `Self`. `Serialize` is a super-trait because a max-size
  hint only makes sense for a JSON-encodable type; this gives use sites
  a single tighter bound (`S: MaxJsonSize`) instead of the compound
  `S: Serialize + MaxJsonSize`.

* `jobs::Update<'a, S>` bound changes `S: Serialize` → `S: MaxJsonSize`.
  `ToPayload::max_size` returns `S::MAX_JSON_SIZE + framing`. Same for
  `report_progress`, `succeed_job`, `fail_job`, `publish_and_wait`.

* `commands::Update<'a, R>` bound changes `R: Serialize` →
  `R: MaxJsonSize`. Same for `succeed`, `publish_and_wait`. Replaces
  the hardcoded 2048 with `R::MAX_JSON_SIZE + framing`.

* `provisioning::FleetProvisioner::provision*` parameters bound changes
  `impl Serialize` → `impl MaxJsonSize`. The DeferredPayload buffer is
  sized from `P::MAX_JSON_SIZE + framing` instead of a hardcoded 1024.

* `transfer::status_details::StatusDetailsExt` gains
  `const MAX_EXTRA_JSON_SIZE: usize`. `StatusDetails` and
  `CombinedStatusDetails<E>` impl `MaxJsonSize` — the latter delegates
  to `E::MAX_EXTRA_JSON_SIZE` so users only declare the size of their
  own contributed fields.

* Built-in impls: `() => 4` (serializes to "null"). Test types and
  `RejectDetails` get explicit impls.

Breaking change for callers who pass a custom `status_details` /
command `result` / provisioning `parameters` type — they need to add
`impl MaxJsonSize for MyType { const MAX_JSON_SIZE: usize = N; }`.
@MathiasKoch MathiasKoch force-pushed the fix/jobs-commands-defender-qos branch from 83b4700 to b14526c May 6, 2026 09:45
@MathiasKoch MathiasKoch changed the title from "feat(jobs): make Update encode-buffer size caller-configurable" to "feat: introduce MaxJsonSize trait for JSON encode buffer sizing" May 6, 2026
MathiasKoch and others added 4 commits May 6, 2026 22:00
The bound bump on `Update<S>` / `provision*` ripples through the
integration tests, which use their own custom Serialize types.

* `tests/common/file_handler.rs`: TestStatusDetails gains
  MAX_EXTRA_JSON_SIZE (firmware_version is short).
* `tests/provisioning.rs`: Parameters<'a> gets MaxJsonSize, and the
  FleetProvisioner::provision[_cbor] turbofish gains a third inferred
  generic for the new P type parameter.
* commands::ResultMap moves its MaxJsonSize impl from a private
  test-only block to data_types.rs so integration tests can use it.
`Update<'_, ResultMap>::max_size()` is now `R::MAX_JSON_SIZE +
UPDATE_FRAMING_OVERHEAD` = 4096 + 1280 = 5376, which exceeded the test
client's 4096-byte TX ring buffer. mqttrust's `grant_async` future stays
pending forever when the requested size exceeds buffer capacity, so the
publish would silently retry 3× (5+10+15s) before returning
`Error::Timeout` — surfaced to the test as `succeed: Mqtt`.
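
The arithmetic above, and the kind of up-front check that would turn the silent stall into an immediate error, can be sketched as follows. The three sizes come from this commit message; `check_grant` and `GrantError` are hypothetical helpers, not part of mqttrust.

```rust
/// `ResultMap`'s declared bound, per the commit message.
pub const RESULT_MAP_MAX_JSON_SIZE: usize = 4096;
/// Framing overhead added by `Update`, per the commit message.
pub const UPDATE_FRAMING_OVERHEAD: usize = 1280;
/// Capacity of the test client's TX ring buffer, per the commit message.
pub const TX_BUFFER_CAPACITY: usize = 4096;

/// Worst-case encode size requested for `Update<'_, ResultMap>`.
pub const fn update_max_size() -> usize {
    RESULT_MAP_MAX_JSON_SIZE + UPDATE_FRAMING_OVERHEAD
}

/// Hypothetical error for a request that can never be satisfied.
#[derive(Debug, PartialEq)]
pub enum GrantError {
    ExceedsCapacity { requested: usize, capacity: usize },
}

/// Hypothetical pre-check: fail fast instead of waiting on a grant that
/// cannot succeed because the request is larger than the whole buffer.
pub fn check_grant(requested: usize, capacity: usize) -> Result<(), GrantError> {
    if requested > capacity {
        return Err(GrantError::ExceedsCapacity { requested, capacity });
    }
    Ok(())
}
```

With the numbers from the commit message, the request is 5376 bytes against a 4096-byte buffer, so the check fails immediately.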
@MathiasKoch MathiasKoch merged commit 55bdf96 into master May 6, 2026
5 checks passed
@MathiasKoch MathiasKoch deleted the fix/jobs-commands-defender-qos branch May 6, 2026 20:53

2 participants