Skip to content

[Feat] Automation V1 — Scheduled Agent Tasks, Created via Chat (HITL) or JSON#1443

Merged
MODSetter merged 87 commits into
MODSetter:devfrom
CREDO23:feature-automations
May 28, 2026
Merged

[Feat] Automation V1 — Scheduled Agent Tasks, Created via Chat (HITL) or JSON#1443
MODSetter merged 87 commits into
MODSetter:devfrom
CREDO23:feature-automations

Conversation

@CREDO23
Copy link
Copy Markdown
Contributor

@CREDO23 CREDO23 commented May 28, 2026

Summary

Adds Automations v1: persisted, schedule-triggered agent_task runs created from chat via a single create_automation tool that drafts an AutomationCreate JSON, surfaces it in a HITL approval card (with inline JSON edit), and saves on approve. A direct create-via-JSON path is also available for technical users.

What's in v1

  • Persistence: automations, automation_triggers, automation_runs (+ Alembic 144 / 145).
  • Scheduled trigger: cron + IANA timezone, claimed via Celery beat with FOR UPDATE SKIP LOCKED.
  • Action: agent_task — invokes multi_agent_chat with auto-decision HITL resume loop.
  • RBAC: read / create / update / delete / execute gated end-to-end.
  • API: vertical-sliced /api/v1/automations + nested triggers and runs.
  • Web: list, empty state, detail (definition + triggers + runs), create-via-chat HITL card with inline JSON edit, create-via-JSON path, sidebar entry under Inbox.

Out of scope (deferred)

  • Event-based triggers (v2)
  • Manual / "Run now" trigger (parked, pending redesign)
  • In-process event bus

Testing

  • End-to-end: chat → drafter → approval card → save → list → detail.
  • One scheduled fire validated locally after the _validate_inputs passthrough fix (was silently dropping fired_at / last_fired_at / static_inputs when no schema was declared).

High-level PR Summary

This PR adds Automations v1: a complete system for creating, managing, and executing scheduled agent tasks. Users can create automations via chat using a single create_automation tool (with human-in-the-loop approval) or directly via JSON. The system includes persistence (3 new tables: automations, triggers, runs), a Celery-based scheduler that claims due triggers using FOR UPDATE SKIP LOCKED, a sequential plan executor with retries, and full RBAC integration. The only v1 action is agent_task (invokes multi_agent_chat with auto-approval/rejection loops), and the only v1 trigger is schedule (cron + IANA timezone). The web UI offers list/detail views, inline JSON editing in the approval card, and a raw-JSON creation path. All templates are rendered in a sandboxed Jinja2 environment with an audited 15-filter allowlist. Event triggers, MCP integration, and tight single-purpose actions are deferred to later phases.

⏱️ Estimated Review Time: 3+ hours

💡 Review Order Suggestion
Order File Path
1 automation-design-plan.md
2 surfsense_backend/alembic/versions/144_add_automation_tables.py
3 surfsense_backend/alembic/versions/145_add_automations_permissions_to_roles.py
4 surfsense_backend/app/automations/__init__.py
5 surfsense_backend/app/automations/persistence/enums/automation_status.py
6 surfsense_backend/app/automations/persistence/enums/run_status.py
7 surfsense_backend/app/automations/persistence/enums/trigger_type.py
8 surfsense_backend/app/automations/persistence/models/automation.py
9 surfsense_backend/app/automations/persistence/models/trigger.py
10 surfsense_backend/app/automations/persistence/models/run.py
11 surfsense_backend/app/automations/actions/types.py
12 surfsense_backend/app/automations/actions/store.py
13 surfsense_backend/app/automations/triggers/types.py
14 surfsense_backend/app/automations/triggers/store.py
15 surfsense_backend/app/automations/schemas/definition/envelope.py
16 surfsense_backend/app/automations/schemas/api/automation.py
17 surfsense_backend/app/automations/schemas/api/trigger.py
18 surfsense_backend/app/automations/schemas/api/run.py
19 surfsense_backend/app/automations/templating/allowlist.py
20 surfsense_backend/app/automations/templating/environment.py
21 surfsense_backend/app/automations/templating/render.py
22 surfsense_backend/app/automations/actions/agent_task/params.py
23 surfsense_backend/app/automations/actions/agent_task/dependencies.py
24 surfsense_backend/app/automations/actions/agent_task/invoke.py
25 surfsense_backend/app/automations/actions/agent_task/definition.py
26 surfsense_backend/app/automations/triggers/schedule/cron.py
27 surfsense_backend/app/automations/triggers/schedule/dispatch.py
28 surfsense_backend/app/automations/triggers/schedule/definition.py
29 surfsense_backend/app/automations/dispatch/run.py
30 surfsense_backend/app/automations/runtime/step.py
31 surfsense_backend/app/automations/runtime/executor.py
32 surfsense_backend/app/automations/tasks/execute_run.py
33 surfsense_backend/app/automations/tasks/schedule_tick.py
34 surfsense_backend/app/automations/services/automation.py
35 surfsense_backend/app/automations/services/trigger.py
36 surfsense_backend/app/automations/services/run.py
37 surfsense_backend/app/automations/api/automation.py
38 surfsense_backend/app/automations/api/trigger.py
39 surfsense_backend/app/automations/api/run.py
40 surfsense_backend/app/agents/multi_agent_chat/main_agent/tools/automation/prompt.py
41 surfsense_backend/app/agents/multi_agent_chat/main_agent/tools/automation/create.py
42 surfsense_backend/app/db.py
43 surfsense_backend/app/routes/__init__.py
44 surfsense_backend/app/celery_app.py
45 surfsense_web/contracts/types/automation.types.ts
46 surfsense_web/lib/apis/automations-api.service.ts
47 surfsense_web/lib/automations/describe-cron.ts
48 surfsense_web/atoms/automations/automations-query.atoms.ts
49 surfsense_web/atoms/automations/automations-mutation.atoms.ts
50 surfsense_web/hooks/use-automations.ts
51 surfsense_web/hooks/use-automation.ts
52 surfsense_web/hooks/use-automation-runs.ts
53 surfsense_web/app/dashboard/[search_space_id]/automations/page.tsx
54 surfsense_web/app/dashboard/[search_space_id]/automations/automations-content.tsx
55 surfsense_web/app/dashboard/[search_space_id]/automations/components/automations-table.tsx
56 surfsense_web/app/dashboard/[search_space_id]/automations/[automation_id]/page.tsx
57 surfsense_web/app/dashboard/[search_space_id]/automations/[automation_id]/automation-detail-content.tsx
58 surfsense_web/app/dashboard/[search_space_id]/automations/new/page.tsx
59 surfsense_web/components/tool-ui/automation/create-automation.tsx
60 surfsense_web/components/layout/providers/LayoutDataProvider.tsx
61 surfsense_backend/app/agents/new_chat/tools/registry.py

Need help? Join our Discord

CREDO23 added 30 commits May 25, 2026 21:48
Foundation layer for the parallel refactor of stream_new_chat.py.
Extracts the StreamResult dataclass (tracks per-turn streaming state)
and a small set of shared utilities (resume_step_prefix, safe_float).

Add-only; no existing code imports from this package yet. Existing
stream_new_chat.py keeps its inline equivalents until cutover.
…ents todos

Extracts two pure context helpers used during input-state assembly:

* mentioned_docs.format_mentioned_surfsense_docs_as_context: renders the
  user's @-mentioned SurfSense docs into the LLM context block.
* deepagents_todos.extract_todos_from_deepagents: pulls the in-progress
  todo list from a deep-agents state snapshot for the title generator.

Add-only; existing call sites in stream_new_chat.py remain untouched
until cutover.
…cement

Extracts the desktop_local_folder file-operation contract helpers:

* contract_enforcement_active: gates the contract on filesystem mode.
* evaluate_file_contract_outcome: scores tool outputs as success/no-op.
* log_file_contract: structured logging of contract verdicts.

This is the unit responsible for catching agents that claim to have
written/edited a file without actually invoking the filesystem tool.

Add-only; stream_new_chat.py keeps its inline duplicates until cutover.
…r_thread

Extracts the agent-construction wrapper that the chat streamers call to
materialize the LangGraph agent for a given thread. Centralizes how we
pass the agent factory plus checkpointer, runtime context, and the
in-memory content builder.

Add-only; pre-existing inline equivalent in stream_new_chat.py stays
until cutover.
Extracts the inner agent-streaming driver previously inlined as
_stream_agent_events in stream_new_chat.py.

stream_agent_events drives graph_stream.event_stream.stream_output and,
after the agent finishes, performs the post-stream safety-net work:

* commit any pending content the agent never explicitly finished
* evaluate file-operation contract outcomes and emit the appropriate
  contract verdict for desktop_local_folder turns

This unit is what flows/shared/stream_loop.py wraps in the rate-limit
recovery while-loop. Add-only; no existing wiring uses it yet.
Six small, single-purpose modules shared by the upcoming new_chat and
resume_chat orchestrators:

* llm_bundle: dispatches negative config_id to the YAML loader and
  non-negative config_id to the DB loader, returning (llm, AgentConfig).
* pre_stream_setup: builds the connector service, resolves the
  Firecrawl API key, and returns the chat checkpointer.
* first_frames: iter_initial_frames + iter_final_frames emit the canonical
  message-start / step-start / idle / finish / done SSE envelope.
* finalize_emit: iter_token_usage_frame emits the per-turn usage frame
  from a TokenAccumulator summary.
* finally_cleanup: close_session_and_clear_ai_responding and run_gc_pass
  centralize the finally-block bookkeeping.
* span: open_chat_request_span / set_agent_mode / close_chat_request_span /
  record_outcome_attrs wrap the OpenTelemetry chat_request span.

Add-only; these are not yet wired into stream_new_chat.py.
Centralizes the premium-credits lifecycle for chat turns:

* needs_premium_quota: gate check (premium user + non-fallback config).
* PremiumReservation: dataclass capturing reservation state + token totals.
* reserve_premium / finalize_premium / release_premium: idempotent
  reservation, commit, and rollback used by the orchestrators.

Add-only; legacy stream_new_chat.py keeps its inline quota handling
until cutover.
Extracts handle_terminal_exception: the shared except-branch behavior for
the chat orchestrators. Classifies the raised exception, logs the
structured chat_stream error event, and emits the terminal-error SSE
frame + done sentinel via the streaming service.

Add-only; nothing imports it yet.
…eam loop

Two cooperating modules that wrap stream_agent_events with in-stream
recovery from provider 429s:

* rate_limit_recovery: can_recover_provider_rate_limit truth-table
  guard, reroute_to_next_auto_pin (selects the next eligible auto-pin
  config and reloads the LLM bundle), log_rate_limit_recovered.
* stream_loop: run_stream_loop drives stream_agent_events in a
  while-True loop, delegating recovery to a flow-supplied RecoverFn
  callback so new_chat and resume_chat can share the same loop while
  keeping their own nonlocal state.

Add-only; not yet wired into any orchestrator.
Extracts finalize_assistant_message: the post-stream server-side write
of the final assistant message (with content parts + token usage)
guarded by asyncio.shield + shielded_async_session so a client
disconnect cannot abort the persist.

Add-only; legacy stream_new_chat.py keeps its inline finalize block
until cutover.
Seven focused modules that the upcoming new_chat orchestrator
composes:

* auto_pin: resolve_initial_auto_pin selects the initial config (with
  vision-capable filtering and error classification).
* llm_capability: check_image_input_capability blocks routing an
  image-bearing turn to a known text-only model.
* runtime_context: build_new_chat_runtime_context assembles the
  SurfSenseContextSchema for a new-chat turn.
* persistence_spawn: spawn_set_ai_responding_bg, spawn_persist_user_task,
  spawn_persist_assistant_shell_task, and await_persist_task background
  the four pre-stream DB writes so they overlap with agent build.
* initial_thinking_step: build_initial_thinking_step +
  iter_initial_thinking_step_frame produce the very first thinking-1 SSE
  step ("Understanding your request" / "Analyzing referenced content").
* title_gen: spawn_title_task + maybe_emit_title_update +
  await_pending_title_update background the thread-title generator and
  interleave its update into the stream when ready.
* input_state: build_new_chat_input_state assembles the LangGraph
  input_state (history bootstrap, mentions resolution, context blocks,
  human-message construction). The heavy one.

Add-only; no orchestrator yet (next commit).
…chat

Slim composition root for the new-chat streaming flow. Sequences:

1. validate inputs and load the LLM bundle (negative id => YAML)
2. open the OTEL chat_request span; set agent_mode tag
3. spawn the four pre-stream DB writes (set-ai-responding, persist
   user turn, persist assistant shell, first-assistant probe)
4. reserve premium quota (with free-fallback retry on denial)
5. build connector + checkpointer + agent + input_state
6. emit first frames (message-start, step-start, initial thinking step)
7. spawn the background title generator
8. run the shared stream_loop with a flow-local _recover closure that
   reroutes to the next auto-pin config on provider 429s
9. finalize: emit terminal title/token frames, shielded assistant
   finalize, release-or-finalize premium quota, close session, GC,
   record OTEL outcome

Public entry-point flows/new_chat/__init__ re-exports stream_new_chat.

Existing wiring (routes, tests) still imports the legacy function from
app.tasks.chat.stream_new_chat. Cutover is a later commit.
…ules

Three focused modules used by the upcoming resume-chat orchestrator:

* runtime_context: build_resume_chat_runtime_context assembles the
  SurfSenseContextSchema for a resume turn (handles empty mention
  lists, since resume requests do not carry fresh @-mentions).
* assistant_shell: persist_resume_assistant_shell writes a fresh
  assistant row for the resumed turn so the post-stream finalize
  has a target.
* resume_routing: build_resume_routing collects the pending
  interrupts across paused subagents and slices the flat list of
  ResumeDecision[] into the correct (thread, subagent) buckets so
  LangGraph routes each decision back to the right paused tool call.

Add-only; no orchestrator yet (next commit).
…public API

Slim composition root for the resume-chat streaming flow. Mirrors the
new_chat orchestrator but specialized for resumed turns:

* no fresh user turn, no title generation, no image-capability gate
* persists a fresh assistant shell for the resumed turn
* applies build_resume_routing to dispatch user decisions to the
  correct paused subagent before invoking the agent
* shares the same stream_loop + flow-local _recover closure for in-
  stream provider rate-limit recovery

Also lands flows/__init__.py, which becomes the public chat-flow API:

    from app.tasks.chat.streaming.flows import stream_new_chat, stream_resume_chat

Existing wiring (routes, contract test) still imports from the legacy
app.tasks.chat.stream_new_chat module. Cutover is the next phase.
Adds 34 tests under tests/unit/tasks/chat/streaming/ that cover the
new flows tree against the legacy stream_new_chat.py module to gate
the upcoming cutover. Coverage:

* Public entry points: stream_new_chat and stream_resume_chat are
  async generator functions whose parameter signatures (name, kind,
  annotation, default) match the legacy versions one-for-one. Uses a
  normalized-annotation comparison so PEP-563 vs eager-annotation
  representation differences are tolerated.
* Extracted helpers: image-capability gate, runtime-context builders
  for new-chat and resume-chat, LLM-bundle dispatcher, premium-quota
  needs check + reservation dataclass, rate-limit recovery truth
  table, persistence-spawn registration/self-unregistration, await
  helpers.
* SSE frame iterators: iter_initial_frames + iter_final_frames emit
  the canonical sequence; iter_token_usage_frame skips on None.
* Initial thinking step: 4 parametrized branches (text, image-only,
  empty, mentioned-docs), long-query truncation, many-docs collapse.

These tests are scaffolding for the cutover and will be removed once
the legacy module is deleted.
Track the initial v2 design document for the SurfSense automation feature.
This is the baseline snapshot of the design before applying the v1-minimum
scope narrowing (capability trimming, MCP deferral, queue-routing deferral).
Subsequent commits trim this down to the v1 scope.
Reduce the §3 Capability dataclass from ten fields to five:
id, description, input_schema, output_schema, handler. Removed
fields (name, required_credentials, side_effects,
expected_duration_seconds, cost_estimate) are reintroduced only when a
concrete consumer feature demands them. The v1 invariant is that a
Capability is a typed, named, callable unit and every consumer
(executor, agent tool layer, future HTTP API) sees the same five-field
shape.
Remove the two-tier registry, MCP database schema, harvester pseudocode,
and the lazy per-worker closure cache from §3. v1 ships with a single
in-memory native registry; the MCP design is reintroduced in Phase 4
along with the rest of the integration-tooling surface.

The deferral is additive: the v1 registry interface is the same callable
surface a Phase-4 MCP harvester will register into. No design rewrite
between phases.
Update §3 (Credentials), §7.1 (Dispatcher common path), §8 (Duration
classes and queue routing), and §13 (Decisions locked) to reflect the
v1-minimum scope:

- Credentials block in §3 collapses to a deferred-to-Phase-2 note. The
  three guarantees (no creds in definition, no creds in LLM context,
  per-call resolution) return unchanged when Phase 2 ships external
  capabilities.
- Cost-estimate pre-check in the dispatcher's common path is removed.
  Mid-flight budget kill in the executor still enforces budget_cap_usd.
- Queue routing by expected_duration_seconds is deferred. Single
  automations_default queue in v1.
- Decisions 24, 25, 26, 32-37, 38-41 marked deferred with explicit
  return phase. Three new v1-minimum decisions added (5-field
  Capability, measured-not-declared cost, single queue).

All deferrals are additive: the original designs return as-is when
warranted; nothing is rewritten between phases.
§9 (Data model): drop from six tables to three. v1 ships automations,
automation_triggers, automation_runs only. domain_events deferred to
Phase 3 (event trigger); mcp_connections/mcp_tools deferred to Phase 4
(MCP integration). Remove the table definitions for the deferred ones
and replace with a deferred-tables note pointing to the consuming
phase.

automation_triggers.type enum narrowed to schedule|manual for v1.
Webhook and event types ship with their respective phases. secret_hash
column deferred to Phase 2 alongside the webhook trigger.

automation_runs.cost_usd column deferred until at least one v1
capability records token-level cost — additive when reintroduced.

§14 (Phase 1) reorganized into four explicit steps matching the work
we're about to do: scaffolding + schemas + empty registries (step 1),
then registry population (step 2), then executor (step 3), then NL
authoring + UI (step 4). The current commit batch lands step 1 only.
Create app/automations/ with the SRP-per-file / grouped-folders layout
that mirrors app/agents/multi_agent_chat/. Twelve __init__.py files,
each a thin re-export with a single-line docstring describing the
subpackage's role, no exports yet (filled in subsequent commits).

Tree:
  app/automations/
  ├── persistence/
  │   ├── enums/      (status / type enums; one per file)
  │   └── models/     (SQLAlchemy tables; one per file)
  ├── schemas/
  │   ├── definition/ (the JSON envelope, broken by concern)
  │   ├── triggers/   (per-trigger config schemas)
  │   └── actions/    (per-action config schemas)
  └── registries/
      ├── capabilities/  (types.py + store.py)
      ├── actions/       (types.py + store.py)
      └── triggers/      (types.py + store.py)

The persistence/ folder is named to avoid surfsense_backend/.gitignore's
data/ ignore rule, which silently masked the original data/ name and
its contents from version control.

Isolation invariant: the module imports only from app.db (foundational
Base + FK targets, unavoidable) and stdlib / SQLAlchemy / Pydantic.
No imports from app.agents.*, app.services.*, app.tasks.*, app.routes.*
or any other business-logic module. Confirmed importable with no side
effects.
Three enums (one file each) plus three models (one file each), all
under app/automations/persistence/. The module imports from app.db
only (Base/BaseModel/TimestampMixin and FK targets searchspaces.id /
user.id); no business-logic imports.

Enums:
  - AutomationStatus: active | paused | archived
  - RunStatus: pending | running | succeeded | failed | cancelled
    | timed_out
  - TriggerType: schedule | manual (Phase-2/3 add webhook | event)

Models:
  - Automation: search_space-scoped, created_by_user_id (SET NULL),
    name + description, status enum, definition JSONB, version int,
    updated_at with onupdate.
  - AutomationTrigger: FK → automations (CASCADE), type enum, config
    JSONB, enabled bool, last_fired_at. Webhook secret_hash is omitted
    until Phase 2.
  - AutomationRun: FK → automations (CASCADE), nullable trigger_id
    (SET NULL — null = manual via UI), status enum,
    definition_snapshot for immutable history, trigger_payload /
    resolved_inputs / step_results / output / artifacts / error JSONB
    columns, started_at / finished_at timestamps, agent_session_id for
    linking to the LangGraph trace. cost_usd column omitted until at
    least one v1 capability records token-level cost.

Verified: Base.metadata exposes all three table names; columns and
enums introspect as documented; no linter errors.
Migration 144 -> 143. Matches the SQLAlchemy models added in commit 7
and the v1 data model in automation-design-plan.md §9.

Up:
  - CREATE TYPE automation_status / automation_trigger_type /
    automation_run_status (PostgreSQL ENUMs created first because the
    tables reference them).
  - CREATE TABLE automations with FK to searchspaces (CASCADE) and
    user (SET NULL); five indexes matching the SQLAlchemy model.
  - CREATE TABLE automation_triggers with FK to automations
    (CASCADE); four indexes.
  - CREATE TABLE automation_runs with FK to automations (CASCADE) and
    automation_triggers (SET NULL — null trigger_id == manual via UI);
    four indexes.

Down: drops every index, table, and ENUM in reverse-dependency order
so the migration is reversible without ON DELETE side effects.

Verified: `alembic history` resolves 143 -> 144 (head) cleanly.

domain_events (Phase 3) and mcp_connections / mcp_tools (Phase 4) ship
in their own migrations when the consuming feature lands; this
migration only covers the three v1 tables.
Three layers of Pydantic models under app/automations/schemas/, one
file per concern (SRP), matching the envelope in
automation-design-plan.md §5.

definition/ — the editable envelope persisted in
automations.definition:
  - envelope.py       AutomationDefinition (top-level shape)
  - plan_step.py      PlanStep (one step in the sequential plan)
  - inputs.py         InputsBlock (the inputs JSON Schema wrapper)
  - execution.py      ExecutionBlock (timeouts, retries, concurrency,
                                      budget cap, on_failure plan)
  - metadata.py       MetadataBlock (tags + created_from_nl + extras)
  - trigger_spec.py   TriggerSpec (one entry in triggers[])

triggers/ — per-trigger config schemas, dispatched by registry on the
TriggerSpec.type discriminator:
  - schedule.py       ScheduleTriggerConfig(cron, timezone)
  - manual.py         ManualTriggerConfig() — empty in v1

actions/ — per-action config schemas, dispatched by registry on the
PlanStep.action discriminator:
  - agent_task.py     AgentTaskActionConfig(prompt, tools, model,
                                            output_schema)

Design properties verified by an inline smoke test:
  - The §5 worked example round-trips through model_validate_json /
    model_dump_json byte-for-byte (InputsBlock uses
    serialize_by_alias so the JSON key stays "schema" not
    "schema_").
  - Envelope rejects unknown top-level keys (extra="forbid").
  - MetadataBlock tolerates unknown keys (extra="allow").
  - ExecutionBlock defaults apply when the block is omitted.
  - retry_backoff and concurrency are typed as Literal — bogus
    values rejected at validation time.
  - Per-type configs enforce their required fields (cron + timezone
    on schedule; non-empty prompt on agent_task).

The envelope keeps trigger and action configs as untyped dicts on
purpose — per-type validation is a registry-driven dispatch (commit
10), keeping the envelope free of every-type-knows-every-type
coupling.
Three registries under app/automations/registries/, each as its own
folder with the same SRP-per-file split (types.py for the dataclass,
store.py for the in-memory dict + register/get/all functions). All
three start empty; concrete entries land when the user signs off on
which capabilities / actions / triggers to include (step 2).

Capability (locked at v1-minimum five fields — see commit 2):
  - id, description, input_schema, output_schema, handler
  - CapabilityHandler = Callable[[dict[str, Any]], Awaitable[Any]]
  - Frozen, slotted dataclass (immutable post-registration).

ActionDefinition (v1-trim of design plan §4):
  - type, name, description, config_schema, handler
  - Defers output_contract (handled per-step by agent_task's
    config.output_schema), uses_capabilities (no static analysis
    needed until >1 action ships), and produces_artifacts (deferred
    alongside the artifact pipeline).

TriggerDefinition (declarative, no handler):
  - type, description, config_schema, payload_schema
  - No handler field — firing is a single dispatcher's
    responsibility, not a per-trigger one.

store.py contract for all three:
  - register_*: idempotent at process startup, raises on duplicate
  - get_*: returns None on miss
  - all_*: returns a defensive copy of the registry dict

Verified by an inline smoke test (10 checks): empty initial state,
registration and lookup work, duplicates raise, frozen dataclasses
reject mutation, snapshots are copies, handlers are awaitable.

Isolation invariant audit: grep across the full app/automations/
tree shows only three app.* imports, all of them
``from app.db import BaseModel, TimestampMixin`` in the model files.
No imports from app.agents.*, app.services.*, app.tasks.*,
app.routes.*, or any other business-logic module.
Cut the docstrings and Field(description=...) text across the entire
automations/ tree down to single-line intent statements, matching the
multi_agent_chat conciseness style:

- Module docstrings: one line stating what the file is.
- Class docstrings: deleted when the class name + module docstring
  already cover intent; kept only where they add a constraint or
  rationale not visible in the signature.
- Pydantic Field descriptions: short noun phrases / clauses, not
  full sentences. Reasoning that belonged in the design plan moved
  out of the code.
- Enum values: per-value docstrings replaced with terse inline
  comments where the meaning isn't obvious from the name.

Behaviour is unchanged. The same 33 files, same public surface, same
imports — verified by re-running the 10-point registry smoke test and
the 8-point schema round-trip / constraint suite from commits 9 and
10.

LOC: 1180 → 691 (-42%).
A run can contain zero, one, or N agent_task steps. A single
agent_session_id at the run level holds at most one of them, so the
column is the wrong shape for the data.

Per-step session ids (LangGraph thread/checkpoint reference for an
agent_task step) live inside step_results[i] alongside the rest of
the per-step bag (status, timings, output). Each agent step records
its own; non-agent steps record nothing. Run-level "primary session"
is a UI concern, not a schema concern.

Trade-off: trace -> run reverse lookup is now a JSONB query, not an
index hit. Usually traversal goes run -> trace; if the reverse
becomes hot we add a GIN index on step_results or a generated
column — both additive.

Changes:
- AutomationRun: drop the agent_session_id column; module docstring
  notes where per-step session ids now live.
- Migration 144: drop the column from the CREATE TABLE; downgrade
  unchanged.

Safe to edit migration 144 in place (vs. add 145 with ALTER ... DROP):
this branch has not shipped and the table has never existed in any
deployed database.
Re-apply the trim style after the prior refactor commit re-introduced
a multi-line docstring on AutomationRun.

- AutomationRun: drop the four-line docstring explaining where
  per-step session ids live; move the note to a single-line inline
  comment right above ``step_results`` where it's actionable.
- AutomationDefinition: drop the design-plan cross-reference; the
  module docstring already establishes what the file is.

No behaviour change.
CREDO23 added 13 commits May 27, 2026 22:29
Manual-as-a-standalone-trigger conflates "user clicks Run now" with the
trigger model and forces ad-hoc input plumbing on the caller. Remove the
unreachable surface so the tree reflects reality (schedule is the only
v1 trigger).

- Unregister `manual`: drop import from triggers/__init__.py
- Delete `app/automations/triggers/manual/`
- Drop `RunService.dispatch_manual` (RunService is now read-only)
- Drop `POST /automations/{id}/run` and `RunDispatched` schema
- Keep `TriggerType.MANUAL` Python + PG enum value (reserved, documented)
  to avoid an Alembic round-trip when Run-now is redesigned
…ove → save)

Single tool exposed to the main agent. The main agent passes a natural-language
`intent`; a focused drafter sub-LLM turns it into a full AutomationCreate JSON;
that JSON is surfaced via request_approval (action_type "automation_create") so
the user can edit/approve it on a frontend card; on approval the tool persists
via AutomationService. Three phases, one tool call.

Scope split:
- main agent sees only `intent: str` (no schema knowledge leaks into the calling
  graph) — prompt fragments scoped accordingly.
- drafter sub-LLM owns the schema + few-shot intent→JSON examples — lives in
  the generating graph's prompt (tools/automation/prompt.py).

Files:
- main_agent/tools/automation/{create.py, prompt.py, __init__.py}: new tool
  + drafter system prompt with two few-shot intent→JSON examples.
- system_prompt/prompts/tools/create_automation/{description.md, example.md}:
  intent-only guidance for the main agent.
- main_agent/tools/index.py: add create_automation to the main-agent allowlist.
- new_chat/tools/registry.py: deferred-import factory to break the
  multi_agent_chat ↔ registry cycle; one ToolDefinition entry.
Backend already defined automations:create/read/update/delete/execute and
seeded them on Owner/Editor/Viewer roles, but the Settings → Roles UI was
missing the metadata to render them properly.

- backend: add PERMISSION_DESCRIPTIONS entries for the 5 automations perms so
  the role editor stops falling back to "Permission for automations:create".
- frontend: add automations to CATEGORY_CONFIG (Workflow icon, slotted between
  podcasts and connectors) so the role editor groups them as a real section.
- frontend: extend the three ROLE_PRESETS — Editor and Contributor get
  create/read/update/execute (mirroring backend Editor); Viewer gets read.

Prep work for the automations frontend; canPerform/usePermissionGate already
handle the runtime gating, so no new hook is needed.
DELETE endpoints in the automations API return 204; calling .json() on
an empty body throws SyntaxError. Treat 204 as data=null and skip
schema validation so callers can opt out of response bodies without
errors or spurious schema-mismatch warnings.

Also drops a pre-existing 'unknown → BodyInit' type error on the
non-JSON body branch via a narrow cast (caller is responsible for
passing a real BodyInit when Content-Type isn't application/json).
Foundation for the v1 automations UI. Mirrors backend Pydantic schemas
into Zod and wires the data layer end-to-end so feature surfaces can
be built on top.

contracts/types/automation.types.ts:
- AutomationStatus, TriggerType, RunStatus enums.
- AutomationDefinition envelope (PlanStep, TriggerSpec, Execution,
  Metadata, Inputs).
- AutomationCreate/Update/Detail/Summary/List + listParams.
- TriggerCreate/Update/Detail.
- RunSummary/Detail/List + runListParams.

lib/apis/automations-api.service.ts:
- list/get/create/update/delete automations.
- add/update/remove triggers (sub-resource).
- list/get runs (read-only sub-resource).
- safeParse on every write, 204-safe deletes.

atoms/automations/:
- automationsListAtom (active search space, first page).
- 6 mutation atoms with toast + cache invalidation.

hooks/:
- use-automations.ts wraps the list atom.
- use-automation.ts: parameterized detail by id.
- use-automation-runs.ts: useAutomationRuns + useAutomationRun.

lib/query-client/cache-keys.ts: automations namespace (list, detail,
runs, run) keyed by (id, limit, offset) where relevant.

Smoke: zod round-trip OK on backend-shape payloads (Automation,
AutomationCreate, Trigger, Run); typecheck clean for new files;
biome clean.
Vertical slice at /dashboard/[id]/automations. The page is read-only by
default; every action gates on backend automations:* permissions via a
co-located permissions hook so adding/removing surfaces stays a
one-file change.

Route:
- page.tsx — server boundary; extracts search_space_id.
- automations-content.tsx — client orchestrator (loading / no-access /
  error / empty / table branches).

Components (one concern per file):
- automations-header.tsx — title + count + "Create via chat" CTA.
- automations-table.tsx + automation-row.tsx — name/status/updated
  columns; row name links to detail (PR4).
- automation-status-badge.tsx — active / paused / archived pill.
- automation-row-actions.tsx — ⋯ menu with pause/resume + delete,
  gated on canUpdate / canDelete. Archived rows hide the toggle.
- delete-automation-dialog.tsx — destructive confirm; mentions FK
  cascade explicitly so users know triggers/runs go too.
- automations-empty-state.tsx — zero-state pointing to chat (creation
  is intent-driven via the create_automation HITL tool, not a form).
- automations-loading.tsx — skeleton rows in the same shell so the
  layout doesn't shift on data arrival.
- automation-triggers-summary.tsx — small cron-describer (daily,
  weekdays, weekly, monthly, hourly) + timezone for the detail page.
  Kept inline since v1 only registers schedule.

Hooks:
- use-automation-permissions.ts — single source of truth for the
  slice's canCreate/canRead/canUpdate/canDelete/canExecute gates,
  backed by myAccessAtom.

Pause/resume and delete reuse the PR2 mutation atoms, so list +
detail caches stay coherent without bespoke invalidation.

Out of scope (later PRs):
- detail route (definition viewer + triggers manager) — PR4
- raw JSON editor — PR5
- nav entry / sidebar wiring — small follow-up PR
Adds an "Automations" nav entry rendered explicitly between Inbox and
(on mobile) Documents, mirroring how those two are pulled out of the
nav list and rendered above the chat sections. The icon is Workflow
to match settings/RBAC labelling.

LayoutDataProvider:
- Adds the entry to navItems pointing at /dashboard/[id]/automations.
- Marks isActive via pathname so the row highlights on the route.
- Tags /automations as a workspace-panel page so it renders in the
  centered settings-style viewport (same chrome as Team / settings).

Sidebar:
- Pulls out automationsItem alongside inboxItem and documentsItem.
- Renders it between them.
- Excludes its URL from footerNavItems so it doesn't double-render.

Page-level RBAC still gates the actual view; the sidebar entry is
always visible (consistent with Inbox/Documents which are also not
gated at the nav layer).

Anonymous (FreeLayoutDataProvider) intentionally not touched —
automations is an authenticated feature.
The empty-state card already hosts the primary "Create via chat" CTA;
keeping the header button on the same screen showed two identical
buttons. Adds an optional ``showCreateCta`` prop to AutomationsHeader
(default true) and turns it off only in the empty branch so the card
stays the focal point.
Vertical slice at /dashboard/[id]/automations/[automation_id]. Branches
in the orchestrator are: perms loading → skeleton, no-access → access
denied panel, bad id → not-found, fetch loading → skeleton, fetch
error → not-found, loaded → header + definition + triggers.

Route:
- page.tsx — server boundary; extracts both ids.
- automation-detail-content.tsx — client orchestrator.

Header:
- automation-detail-header.tsx — back link, name, status badge,
  description, pause/resume + delete actions. Delete navigates back to
  the list via a new onDeleted hook on DeleteAutomationDialog so the
  list page (where the row just vanishes) stays unaffected.
- automation-not-found.tsx — 404/403/NaN-id panel. We don't
  distinguish missing vs. forbidden in the UI.

Definition (read-only in v1):
- automation-definition-section.tsx — wrapper Card; renders goal +
  tags + execution defaults + inputs schema (if present) + plan.
- plan-step-card.tsx — one step (when, output_as, retries, timeout,
  params JSON).
- execution-summary.tsx — timeout / max_retries / backoff /
  concurrency + on_failure step count.
- inputs-schema-preview.tsx — formatted JSON of inputs.schema; only
  rendered when the definition declares inputs.

Triggers:
- automation-triggers-section.tsx — wrapper Card, "Add via chat" CTA
  (creation is intent-driven, same philosophy as automations).
- trigger-card.tsx — schedule + timezone + cron, last/next fire
  hints, static_inputs JSON, enable Switch and remove button.
- delete-trigger-dialog.tsx — confirm + mutation atom.

Shared:
- lib/describe-cron.ts — moved out of automation-triggers-summary.tsx
  so both list and detail can describe schedules consistently
  (daily/weekdays/weekly/monthly/hourly, raw cron fallback).

Loading:
- automation-detail-loading.tsx — same shell as the loaded view so the
  layout doesn't jump on data arrival.

RBAC: each interactive surface is independently gated
(canUpdate/canDelete/canCreate) so the orchestrator stays thin and the
component tree is self-documenting about what each action requires.

Out of scope (later PRs):
- Editing definition / trigger params (raw-JSON path) — PR5
- Run history — PR6
Closes the create loop in chat: the agent describes user intent → the
drafter sub-LLM produces an AutomationCreate JSON → this card surfaces
a structured preview → approve persists; reject cancels. Edits flow
through chat refinement (re-call with a refined intent), not in-card,
so the card stays simple and the multi-turn checkpointer carries the
context.

Tool UI (components/tool-ui/automation/):
- create-automation.tsx — entry dispatcher + ApprovalCard chrome
  (pending/processing/complete/rejected via useHitlPhase) + SavedCard
  (links to the detail page) + InvalidCard (lists drafter validation
  issues) + ErrorCard (verbatim message). Rejection result is hidden
  because the approval card itself shows the rejected phase inline.
- automation-draft-preview.tsx — structured preview body: name +
  description + goal, triggers (humanised cron + tz + static-input
  keys), plan steps (step_id → action), and a collapsible raw JSON
  for power users.

Wiring:
- components/tool-ui/index.ts — re-export.
- features/chat-messages/timeline/tool-registry/registry.ts —
  register create_automation → CreateAutomationToolUI (dynamic import,
  same pattern as other connector tools).
- contracts/enums/toolIcons.tsx — Workflow icon + "Create automation"
  display name so fallback chrome (and timeline headers) are honest.

Shared util:
- lib/automations/describe-cron.ts — lifted from the route slice's
  lib/ folder since both the dashboard slice and the new approval card
  now render schedule descriptions. Slice imports updated; the now-
  empty slice lib/ folder is gone.

Backend prompt fragments:
- main_agent/system_prompt/.../create_automation/description.md and
  the tool's docstring no longer promise in-card edits. They make the
  refinement path explicit: if the user wants changes after seeing the
  draft, they reply in chat and the agent calls the tool again with a
  refined intent.

v1 deliberately excludes:
- In-card edit form / right-side edit panel — defer until we see real
  demand. The chat refinement loop covers the common case.
- approve_always / persistent allow rules — automations are a single
  artifact, not a repeated mutation, so the "trust this kind of call"
  affordance doesn't apply.
Recent runs card under triggers. Each row expands lazily to fetch the
full run (step results, output, artifacts, error). 20-row cap for now;
real pagination lands if usage demands it.
@vercel
Copy link
Copy Markdown

vercel Bot commented May 28, 2026

@CREDO23 is attempting to deploy a commit to the Rohan Verma's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b73d2fd7-0565-49bf-8010-38a5463aeef0

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@CREDO23 CREDO23 marked this pull request as draft May 28, 2026 12:34
CREDO23 added 12 commits May 28, 2026 15:38
Cover the cron + IANA timezone + UTC normalization contract for the
schedule trigger: next-match strictly-after, DST offset shift across
spring-forward, malformed cron / unknown timezone rejection, and the
ScheduleTriggerParams Pydantic gate that surfaces InvalidCronError as
ValidationError at the API boundary.

8 tests, pure unit (no DB, no mocks).
Cover the input-validation contract dispatch_run relies on:

- no declared schema → inputs pass through unchanged (regression site
  that previously stripped runtime keys like fired_at / last_fired_at
  and broke Jinja templates).
- declared schema, valid inputs → passthrough validated.
- declared schema, invalid inputs → DispatchError (uniform exception
  type, not raw jsonschema.ValidationError).

Plus the DispatchError exception identity (Exception subclass, message
preserved, isinstance-friendly for the dispatch layer's consumers).

4 tests, pure unit.
execute_step (6 tests): happy path, when=falsy → skipped, unknown action
→ ActionNotFound failure, retry budget exhaustion (attempts = 1 +
max_retries), retry recovery, and template-rendering of step params
against the run context.

with_retries (3 tests): first-try success returns attempts=1, recovery
returns the actual attempt that produced the result, and exhaustion
re-raises the last exception with the handler called 1 + max_retries
times.

All tests use backoff="none" to keep wall-clock time zero; timeout
testing is intentionally skipped (would need >= 1s per the int contract,
and exhaustion already locks that any Exception triggers retry).
render.py (4): variable substitution, StrictUndefined raises on missing
keys, evaluate_predicate coerces to bool, render_value walks dicts/lists
and renders string leaves.

filters.py (4): slugify produces URL-safe output, date formats datetime
with strftime, date(None) → "" so templates can write
{{ inputs.last_fired_at | date }} on first run, date(str) passes through.

environment.py (4): the sandbox boundary — disallowed Jinja built-ins
(e.g. pprint) raise, and the finalize hook coerces non-string outputs
to predictable wire shapes (datetime → ISO, None → "", dict → JSON).

context.py (1): build_run_context exposes {run, inputs, steps} with the
exact shape every plan template body relies on.

13 tests total, all pure unit.
…alize)

auto_decide.build_auto_decisions (3): produces one decision per
action_request entry, defaults to one decision for legacy scalar
interrupts, and skips malformed interrupts silently so a misbehaving
tool can't take down the whole agent_task step.

finalize.extract_final_assistant_message (4): string-content AIMessage
returned verbatim, list-of-parts content concatenated (skipping
non-text parts like tool_use), walks back past trailing ToolMessages
to find the last AIMessage, and returns None when no extractable text
is present (so callers can branch on silence vs. empty).

7 tests, pure unit.
definition/ (29 tests): the envelope (defaults, extra=forbid, empty
plan/name rejection), Inputs schema-alias roundtrip (Python schema_ ↔
wire schema), PlanStep numeric bounds + addressing-field constraints,
Execution production defaults stability (10-min timeout, 2 retries,
exponential backoff, drop_if_running) + closed-set Literal gates,
Metadata's exceptional extra="allow" contract, and TriggerSpec type
requirement.

api/ (9 tests): AutomationCreate/Update cascade-validate into the
nested definition, reject unknown payload fields, enforce name length;
TriggerCreate exposes safe defaults (enabled=True, params={},
static_inputs={}) and rejects unknown TriggerType strings at the
boundary.

All pure unit, no DB.
…ared fixtures

Top-level tests that span multiple submodules:

- test_stores.py (7): the trigger + action registry contracts — register
  round-trip, unknown type → None (not raise), duplicate registration
  rejected, defensive snapshot from all_*.
- test_definition_types.py (2): params_schema property on both
  ActionDefinition and TriggerDefinition reflects the Pydantic model.
- test_persistence_enums.py (3): exact string values + member sets of
  AutomationStatus / RunStatus / TriggerType — the postgres-mirrored
  contract that breaks stored rows if drifted.
- test_import_registrations.py (2): the bundled agent_task action and
  schedule trigger self-register on package import (canary for the
  side-effect import chain).

conftest.py adds isolated_action_registry / isolated_trigger_registry
fixtures: snapshot + restore of the module-level _REGISTRY dicts so
tests that add their own definitions don't leak across the suite.

14 tests, pure unit.
…ry PoolTimeout

The shared AsyncPostgresSaver caches DB connections in a module-level
pool. Cached connections are bound to the asyncio loop that opened
them, but `run_async_celery_task` discards the loop on each task's
exit — so after the first task the pool holds connections pointing
to a dead loop, and the next automation hangs 30s before failing
with `PoolTimeout: couldn't get a connection after 30.00 sec`.

Swap agent_task to `InMemorySaver`; automation runs only need state
within one Celery task, so nothing is lost. Site-local TODO tracks
the proper future fix (dispose the checkpointer pool around each
Celery task, mirroring `_dispose_shared_db_engine`).
@CREDO23 CREDO23 marked this pull request as ready for review May 28, 2026 19:29
@MODSetter MODSetter merged commit 4dda02c into MODSetter:dev May 28, 2026
13 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants