feat(#1068): deprecate per-task timeout_seconds override (task-shape demotion PR 1)#1070
Merged
Conversation
…demotion PR 1) First of the six task-shape demotions from docs/planning/ACTOR_MODEL_TASK_DEMOTION_MAP.md. The per-task ParallelTaskRequest.timeout_seconds override is deprecated — the agent's execution_timeout_seconds (#665) / schedule cap (#913, clamped to the agent cap by #929) is the authoritative deadline. - routers/chat.py: _resolve_deprecated_task_timeout() normalizes the override once at /task entry — clamps it to the agent cap (closing the prior unclamped escape around the #929 invariant that schedules already respect) and emits a deprecation warning on any use. The common no-override path is unchanged and skips the cap lookup. - models.py / MCP chat_with_agent: mark the field DEPRECATED in its surface. - tests: hermetic unit coverage of the clamp / warn / pass-through branches (AST-extracts the pure helper to avoid the router's heavy import graph). Honored-but-clamped this release; field removed in a follow-up after one release of soak (additive-first). Refs #945, #991.
2711697 to
d2a1006
Compare
…n soak gate
Logs rotate; the deletion soak gate ("has the per-task timeout override gone
quiet?") shouldn't depend on log retention. On each deprecated use, bump a
Redis HASH (deprecation:task_timeout_seconds) keyed by UTC day plus a
last_seen stamp, self-expiring 180d after traffic stops. The gate becomes a
cheap `redis-cli HGETALL deprecation:task_timeout_seconds` before the
field-deletion PR.
Best-effort — wrapped so a Redis hiccup never affects task dispatch; only fires
on the warning path, so no-override requests don't touch Redis.
Audit confirmed zero in-repo originators of the field: event_subscriptions
doesn't send timeout_seconds, and backlog_service only replays the
already-clamped persisted value. So any counted hit attributes to an
external/out-of-repo caller — the exact soak target.
10 tasks
|
Resolve by running |
…-timeout # Conflicts: # src/backend/routers/chat.py
Contributor
|
Merged to |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
First of the six task-shape demotions from
docs/planning/ACTOR_MODEL_TASK_DEMOTION_MAP.md(shipped in #991). Deprecates the per-tasktimeout_secondsoverride onParallelTaskRequest— the agent'sexecution_timeout_seconds(#665) / schedule cap (#913, clamped to the agent cap by #929) is the authoritative deadline.Additive-first: honored-but-clamped this release, field removed in a follow-up after one release of soak.
Closes #1068. Part of #945.
What changed
routers/chat.py— new pure helper_resolve_deprecated_task_timeout()normalizes the override once at/taskentry (mutating the request in place so every downstream site — capacity acquire, sync/asyncexecute_task, persisted backlog payload — sees one resolved value and the warning fires exactly once). It:models.py/ MCPchat_with_agent— field markedDEPRECATED (#1068)in its surface.tests/unit/test_1068_task_timeout_demotion.py— hermetic unit coverage of clamp / warn / pass-through / boundary / never-exceeds-cap. AST-extracts the pure helper to avoid the router's heavy import graph (same convention astest_backlog.py).Scope / audit
git grep timeout_secondsaudited. OnlyParallelTaskRequest.timeout_seconds+ the MCPchat_with_agentper-task param are in scope. The scheduler (src/scheduler/service.py, usesschedule.timeout_seconds), fan-out (own outer-deadline model),internal.py, andsessions.pyall use separate models/owners — untouched.Test plan
pytest tests/unit/test_1068_task_timeout_demotion.py)py_compileclean (chat.py, models.py)uvicorn --reloadrunning the edit):99999→ clamped to 3600, logs[#1068] … exceeds the agent cap 3600s; clamping30→ honored, logs[#1068] … deprecated … agent cap (3600s) is authoritative[#1068]line (hot path untouched)node_modules); MCP change is a.describe()string only, no type change.Follow-up
After one release of soak: delete the field from
ParallelTaskRequest, drop the MCP param, and updatebacklog_service.pyreconstruction. That's the second half of demotion PR 1.🤖 Generated with Claude Code
Soak gate (how the follow-up deletion PR knows it's safe)
The field is deleted in a follow-up, but only once it's gone quiet. Two facts make that gate trustworthy:
Zero in-repo originators. Audited: nothing in Trinity originates
ParallelTaskRequest.timeout_seconds—event_subscriptions.pydoesn't send it,backlog_service.pyonly replays the already-clamped persisted value. The field is set exclusively by external/out-of-repo callers (MCPchat_with_agent, agent-authored skills in their own template repos, external API scripts, per-instance eu2 configs). Those are structurally invisible to a static audit of this repo — the runtime signal is the only way to find them.Durable usage counter. Each deprecated use bumps a Redis hash that survives log rotation:
Before opening the deletion PR, check there are no hits for the trailing soak window (≥1 release). Self-expires 180d after traffic stops.
Fail-safe on deletion:
ParallelTaskRequesthas nomodel_config, so Pydantic defaults toextra="ignore". A straggler that still sendstimeout_secondsafter deletion has it silently dropped and falls back to the agent cap — degrades, doesn't 422. So even a caller both the audit and the soak miss fails safe.Optional upgrade: a
B-04-style canary invariant could push a Slack alert if the counter is still non-zero near a planned deletion date, replacing the manual HGETALL check. Not built here.