Add Ricky cloud workflow scheduling commands #85

Merged
khaliqgant merged 1 commit into main from codex/cloud-workflow-schedule-ricky
May 10, 2026

khaliqgant (Member) commented May 9, 2026

Summary

  • add `ricky schedule <artifact> --cron/--at` and `ricky schedules` CLI surfaces
  • route Ricky scheduling through the Relay SDK `@agent-relay/sdk/workflows` schedule helpers instead of duplicating Cloud endpoint calls
  • document scheduling commands in help text and cover them in CLI tests
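
To make the `--cron`/`--at` surface concrete, here is a minimal sketch of how the arguments after `ricky schedule` might be validated before they are handed to the Relay SDK schedule helpers. It is not the code from this PR: `ScheduleRequest`, `parseScheduleArgs`, and the five-field cron check are assumptions for illustration, and the SDK call itself is deliberately left out.

```typescript
// Hypothetical request shape; not taken from the Ricky source.
type ScheduleRequest =
  | { artifact: string; kind: "cron"; cron: string }
  | { artifact: string; kind: "at"; at: Date };

// Parses the arguments that follow `ricky schedule`,
// e.g. ["workflows/nightly.ts", "--cron", "0 9 * * 1"].
function parseScheduleArgs(args: string[]): ScheduleRequest {
  const [artifact, ...rest] = args;
  if (!artifact || artifact.startsWith("--")) {
    throw new Error("Usage: ricky schedule <artifact> --cron <expr> | --at <time>");
  }

  let cron: string | undefined;
  let at: string | undefined;
  for (let i = 0; i < rest.length; i += 2) {
    const flag = rest[i];
    const value = rest[i + 1];
    if (value === undefined) throw new Error(`Missing value for ${flag}`);
    if (flag === "--cron") cron = value;
    else if (flag === "--at") at = value;
    else throw new Error(`Unknown flag: ${flag}`);
  }

  // Exactly one trigger: a recurring cron expression or a one-off time.
  if ((cron === undefined) === (at === undefined)) {
    throw new Error("Pass exactly one of --cron or --at");
  }
  if (cron !== undefined) {
    // Assumption: five whitespace-separated cron fields.
    const fields = cron.trim().split(/\s+/);
    if (fields.length !== 5) {
      throw new Error(`Expected a 5-field cron expression, got "${cron}"`);
    }
    return { artifact, kind: "cron", cron: fields.join(" ") };
  }
  const when = new Date(at as string);
  if (Number.isNaN(when.getTime())) throw new Error(`Invalid --at time: ${at}`);
  return { artifact, kind: "at", at: when };
}
```

Rejecting the "both flags" and "neither flag" cases in one place keeps the later SDK call free of ambiguous input.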

Verification

  • npm run typecheck
  • npx vitest run src/surfaces/cli/commands/cli-main.test.ts
  • npm run build

Companion Relay SDK/CLI PR: AgentWorkforce/relay#831
Companion Cloud PR: https://github.com/AgentWorkforce/cloud/pull/511

coderabbitai (Bot) commented May 9, 2026

Warning

Rate limit exceeded

@khaliqgant has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 35 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: daecf635-9b8b-44ed-8a35-fca6e8acdcbf

📥 Commits

Reviewing files that changed from the base of the PR and between 98f3c30 and 21d3a61.

📒 Files selected for processing (5)
  • src/cloud/api/index.ts
  • src/cloud/api/workflow-schedules.ts
  • src/surfaces/cli/commands/cli-main.test.ts
  • src/surfaces/cli/commands/cli-main.ts
  • src/surfaces/cli/flows/power-user-parser.ts

khaliqgant force-pushed the codex/cloud-workflow-schedule-ricky branch from 0578180 to 21d3a61 on May 9, 2026, 19:45

github-actions (Bot, Contributor) commented May 9, 2026

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-09T19-46-28-000Z-HEAD-provider
Mode: provider
Git SHA: 6335c66

Passed: 5 | Needs human: 42 | Reviewable: 42 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Scope**  
- Extend the **Ricky** repository to surface that it now leverages the **Agent Assistant** (AA) for deeper coordination, verification, and authoring.  
- Update documentation, CLI help, and the internal `ricky` package to reference AA‑specific APIs and contracts.  
- Add a deterministic verification gate that checks AA‑contract compliance during the CI pipeline.  
- Provide a minimal test suite that validates the new AA integration points.

---

## 1. Repository Changes

| Path | Change | Reason |
|------|--------|--------|
| `README.md` | Add “Deep Agent Assistant Integration” section; update badge to `AA‑v2.1` | Public‑facing docs must advertise the new capability. |
| `docs/architecture.md` | Insert diagram and description of the AA coordination layer (request routing, verification gates, feedback loops). | Gives developers a mental model of the new workflow. |
| `ricky/__init__.py` | Export `agent_assistant` submodule; bump `__version__` to `0.4.0`. | Makes the AA API part of the public surface. |
| `ricky/agent_assistant/__init__.py` | New package exposing `AAClient`, `AAContract`, and `AAValidator`. | Encapsulates all AA interactions. |
| `ricky/agent_assistant/cl
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review (as of 2026‑05‑09)**  

| Metric | Current State | Target / SLA | Gap / Action |
|--------|---------------|--------------|--------------|
| **Workflow contracts defined** | 7 of 12 active agents have a `turn‑context` contract in their `workflow.yaml`. | 100 % of production agents. | Write contracts for the remaining 5 agents (e.g., `data‑ingest`, `alert‑router`, `nlp‑summarizer`). |
| **Deterministic verification gates** | 4 agents use the `turn‑context‑gate` step; 8 rely on informal tests. | Every agent must include a gate that asserts `context.id` and `context.version` are present and immutable across the turn. | Add gate to the 8 agents; create a shared `turn‑context‑gate.yml` template. |
| **Review artifacts** | PRs for `turn‑context` changes include a “Context Impact” checklist 62 % of the time. | 100 % checklist compliance. | Update the PR template (`.github/PULL_REQUEST_TEMPLATE.md`) to make the checklist mandatory. |
| **Validation loops (80‑100)** | Average of 42 loops per turn for agents that have the gate; 12 loops for those without. | ≥80 loops per turn (full coverage of edge‑case permutations). | Expand the `turn‑co
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
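
For reviewers judging the rubric above, the following sketch shows one way an adapter could carry the Ricky envelope through a shared turn context without dropping fields. Every name here is hypothetical: `RickyEnvelope` only mirrors the fields listed under Must, and `SharedTurnContext` is a stand-in, since the real `@agent-assistant/turn-context` shape does not appear in this PR.

```typescript
// Field names are illustrative, derived from the rubric's wording.
interface RickyEnvelope {
  requestId: string;
  sourceMetadata: Record<string, unknown>;
  structuredSpec: unknown;
  invocationRoot: string;
  mode: string;
  stageMode: string;
  specPath?: string;
  metadata: Record<string, unknown>;
  specText: string;
}

// Stand-in for a shared turn context (assumption, not the real package type).
interface SharedTurnContext {
  turnId: string;
  attributes: Record<string, unknown>;
}

const ENVELOPE_FIELDS = [
  "requestId", "sourceMetadata", "structuredSpec", "invocationRoot",
  "mode", "stageMode", "specPath", "metadata", "specText",
] as const;

// Adapter in: the whole envelope travels as one opaque attribute.
function toTurnContext(envelope: RickyEnvelope): SharedTurnContext {
  return { turnId: envelope.requestId, attributes: { ricky: { ...envelope } } };
}

// Adapter out: Ricky-specific data is read back only at the boundary.
function fromTurnContext(ctx: SharedTurnContext): RickyEnvelope {
  const stored = ctx.attributes["ricky"];
  if (stored === undefined) throw new Error("turn context carries no Ricky envelope");
  return stored as RickyEnvelope;
}

// Regression helper: names every field that changed or vanished in the round trip.
function droppedFields(before: RickyEnvelope, after: RickyEnvelope): string[] {
  return ENVELOPE_FIELDS.filter(
    (field) => JSON.stringify(before[field]) !== JSON.stringify(after[field]),
  );
}
```

A round-trip test that asserts `droppedFields` returns an empty list is a cheap way to check the "Drop Ricky-specific workflow metadata" failure mode.
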
REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move the **workflow‑generation, validation, debugging, staged‑CLI UX, and blocker/evidence‑wording** capabilities into the **Agent Assistant** component.

---

## Rationale (aligned with AGENTS.md & workflow standards)

| Capability | Current Placement | Why it belongs in Agent Assistant |
|------------|-------------------|-----------------------------------|
| **Workflow generation** | Distributed across CLI helpers and ad‑hoc scripts. | Agent Assistant already owns the *contract‑first* authoring pipeline (see `workflow/contract/`). Centralizing generation ensures a single source of truth for the **workflow contract** and lets the assistant emit deterministic `*.wf.yaml` artifacts. |
| **Validation (deterministic gates)** | Scattered in CI jobs and manual checks. | Validation is a core **reliability gate**. Embedding it in Agent Assistant enables the **80‑to‑100 validation loop** (see “validation loops” in the specs) to be enforced automatically before any PR merge. |
| **Debugging utilities** | Separate `debug/` scripts that invoke the runtime directly. | Agent Assistant can expose a **debug‑mode flag** (`--debug`) that runs the workflow in a sandbox, captures
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: 12 weeks (May 15 – Aug 7 2024)*  

---

## 1. Slice Goal
Deploy the next‑generation reliability & coordination features of Ricky to **30 % of active AgentWorkforce teams** (≈ 45 teams) while establishing a repeatable, gated adoption pipeline that can be scaled to 100 % in subsequent slices.

---

## 2. Scope & Boundaries  

| **In‑Scope** | **Out‑of‑Scope** |
|--------------|-----------------|
| • New **deterministic verification gates** for workflow contracts (pre‑merge, post‑merge, runtime)  | • Re‑architecting the core execution engine (deferred to v3) |
| • **80‑to‑100 validation loops** on a representative sample of 5 workflows per team | • Migration of legacy “ad‑hoc” scripts not under version control |
| • Automated **review artifacts** (contract diff, gate‑report, blocker log) | • UI redesign of the AgentWorkforce console |
| • **Scoped branch/PR strategy**: `adopt/ricky‑v2‑<team‑id>` | • Direct production pushes without PR (must go through gate) |
| • Training & “adoption playbook” for team leads | • Custom integrations that require external vendor approval |

---

## 3. Milestones & Deliverables 
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky × Slack / Web Support – Agent‑Assistant Package Design  

Below is a **complete, contract‑first workflow package** that lets end‑users invoke Ricky from Slack (or a web chat widget) and receive deterministic, review‑ready assistance for workflow reliability, coordination, and authoring tasks.

| Component | Purpose | Interface | Contract (inputs / outputs) | Verification Gate |
|-----------|---------|-----------|-----------------------------|--------------------|
| **1️⃣ Ricky‑Core Service** | Core engine that parses user intent, selects the appropriate AgentWorkforce sub‑workflow, and returns a structured response. | HTTP POST `/ricky/v1/execute` (JSON) | **Input**: `{ "sessionId": string, "channel": "slack"|"web", "payload": { "text": string, "metadata": object } }`  <br>**Output**: `{ "status":"ok"|"error", "reply": string, "actions": [{ "type":"link"|"button", "label":string, "url":string }] , "nextStep": string? }` | Unit‑test suite (≥ 90 % coverage) + contract test with Pact (consumer: Slack‑Bot, provider: Ricky‑Core). |
| **2️⃣ Slack‑Adapter** | Thin wrapper that translates Slack events into the Core contract and posts replies back to Slack. | Slack Events API (mess
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What the command does**

Running  

```bash
ricky --mode local --spec "generate a workflow for package checks"
```

invokes Ricky in spec‑only mode. In this mode:

  1. Specification parsing – Ricky reads the --spec string, validates its syntax against the AGENTS.md contract, and expands it into a workflow definition (YAML/JSON) that follows the shared authoring rules.
  2. Static verification – The generated workflow is passed through the deterministic verification gates:
    • Schema validation (workflow‑schema.json)
    • Contract compliance (inputs/outputs, required steps, naming conventions)
    • Safety checks (no circular dependencies, no privileged actions without explicit requires: admin flag)
  3. Artifact emission – The resulting workflow file is written to ./ricky-output/<timestamp>-package-checks.yaml and a verification report (verification-report.txt) is printed to stdout.
  4. No execution – Because --run is omitted, Ricky does not launch any agents or invoke any external tools. It only produces the declarative artifact.

Typical output

[INFO] Parsing spec: "generate a workflow for package checks"
[INF
...[truncated]

Must

  • Say generation is the default and execution was not requested.
  • Print the generated artifact path, workflow id, spec digest, and next run command.
  • Avoid showing execution evidence for a generation-only request.

Must Not

  • Imply the workflow ran automatically.
  • Present a generation-only result as execution success.
  • Hide the opt-in commands for running the artifact.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
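
As a reference point for this rubric, a generation-only summary could be rendered along these lines. This is a sketch under stated assumptions: `GenerationResult` and `renderGenerationSummary` are hypothetical names, and the run command is passed in by the caller rather than guessed here.

```typescript
// Hypothetical result shape for a generation-only request.
interface GenerationResult {
  artifactPath: string;
  workflowId: string;
  specDigest: string;
}

// Renders the four items the rubric asks for and states plainly that nothing ran.
function renderGenerationSummary(result: GenerationResult, runCommand: string): string {
  return [
    "Workflow generated (not executed).",
    `Artifact:    ${result.artifactPath}`,
    `Workflow id: ${result.workflowId}`,
    `Spec digest: ${result.specDigest}`,
    "Execution was not requested.",
    `To run it:   ${runCommand}`,
  ].join("\n");
}
```

The function has no field for execution evidence, so a generation-only result cannot be presented as a run.
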
REVIEW cli.first-run-copy-is-compact-and-truthful (cli-behavior/manual)

User message

Render Ricky's first-run CLI onboarding for a new user.

Ricky output

**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the Ricky CLI. It follows the **Ricky repository conventions** (AGENTS.md), the **workflow standards** (deterministic gates, review artifacts, 80‑100 validation loops), and the **shared authoring rules** (scoped branches/PRs, honest blocker reporting).

---

## 1️⃣ Repository Layout (for reference)

ricky/
├─ .github/
│ └─ workflows/
│ └─ onboarding.yml # CI gate for onboarding script
├─ docs/
│ └─ onboarding.md # Human‑readable guide (generated)
├─ src/
│ └─ cli/
│ └─ onboarding/
│ ├─ init.py
│ └─ main.py # <‑‑ entry point
├─ tests/
│ └─ onboarding/
│ └─ test_onboarding.py
├─ .ricky/
│ └─ config.yaml # user‑specific config (generated)
├─ pyproject.toml
└─ README.md


---

## 2️⃣ Onboarding Contract (YAML)

```yaml
# .ricky/onboarding_contract.yaml
name: Ricky First‑Run Onboarding
description: |
  Interactive CLI that bootstraps a new user’s environment,
  validates toolchain, creates a scoped config, and registers the user
  with the central AgentWorkfor
...[truncated]

Must

  • Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
  • End every branch with a concrete next step.
  • Advertise only commands that are currently implemented.

Must Not

  • Sound like a launch page or documentation dump.
  • Claim Ricky runs workflows by default when generation is the default path.
  • Require web or Slack onboarding before CLI use.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Ricky – Handling an Empty or Missing Specification File**

Below is the concrete, deterministic workflow you should follow whenever a user submits an empty spec or a spec file that cannot be found. The contract is written as a reusable **Ricky‑Spec‑Repair** workflow that can be dropped into any repository that follows the AgentWorkforce standards.

---

## 1️⃣ Workflow Contract: `Ricky‑Spec‑Repair`

| **Stage** | **Input** | **Output** | **Verification Gate** |
|-----------|-----------|------------|-----------------------|
| **Detect** | `spec_path` (string) – path supplied by the user (may be empty or non‑existent) | `detect_status` ∈ {`FOUND`, `EMPTY`, `MISSING`} | `assert exists(spec_path) or spec_path == ""` |
| **Report‑Blocker** | `detect_status` = `EMPTY` or `MISSING` | Issue #`<branch‑id>` opened with label **`blocker/spec-missing`** and a templated comment (see §2) | `assert issue.created && issue.labels.contains('blocker/spec-missing')` |
| **Create‑Placeholder** (optional) | `detect_status` = `MISSING` | New file `specs/<branch‑id>.md` containing a **Spec Skeleton** (see §3) | `assert file_exists && file_contains('# Specification Skeleton')` |
| **Request‑Spec** | `det
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
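
The behavior this rubric describes can be sketched as a small formatter. `CliFailure` and `formatFailure` are hypothetical names; the sketch assumes the caller supplies a recovery command that actually exists and only illustrates the stage label and the verbose-only stack trace.

```typescript
// Hypothetical failure record; `recovery` must be a command the CLI really supports.
interface CliFailure {
  stage: "generation" | "execution";
  message: string;
  recovery: string;
  cause?: unknown;
}

// Formats a user-facing failure. The stack trace is appended only in verbose mode.
function formatFailure(failure: CliFailure, verbose: boolean): string {
  const label = failure.stage === "generation" ? "Generation" : "Execution";
  const lines = [`${label} failed: ${failure.message}`, `Next step: ${failure.recovery}`];
  if (verbose && failure.cause instanceof Error && failure.cause.stack) {
    lines.push("", failure.cause.stack);
  }
  return lines.join("\n");
}

// Usage: an empty spec is a generation failure, reported without a trace by default.
const emptySpecFailure: CliFailure = {
  stage: "generation",
  message: "the spec is empty",
  recovery: 'ricky --mode local --spec "<describe the workflow>"',
  cause: new Error("empty spec"),
};
console.log(formatFailure(emptySpecFailure, false));
```
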
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)

🔧 Configuration
• Active workflow contracts : 0
• Pending PRs / branches : none

🛠️ System Health
• Scheduler : RUNNING
• Queue length : 0
• Last heartbeat :

⚠️ Blockers
• Google provider not confirmed – actions requiring Google APIs will be skipped.
• GitHub provider not confirmed – repository sync / PR automation disabled.

📝 Next Steps

  1. Run provider checks:
    ricky provider check --google
    ricky provider check --github
  2. Resolve any authentication or network issues reported.
  3. Re‑run ricky status to see updated connectivity.

--- End of Report ---

Must

  • Report unknown or not-connected provider state honestly.
  • Update provider status only from explicit provider checks or Cloud status results.
  • Give concrete setup guidance for Cloud when relevant.

Must Not

  • Mark Google or GitHub connected because guidance text was shown.
  • Invent a provider connection URL or OAuth flow.
  • Show empty fields with no recovery guidance when config is missing.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
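
One way to enforce "update provider status only from explicit provider checks or Cloud status results" is to make the evidence source part of the type. The sketch below is illustrative only; `ProviderState`, `Evidence`, and `applyEvidence` are assumed names, not Ricky's implementation.

```typescript
type ProviderName = "google" | "github";
type ProviderState = "unknown" | "connected" | "not-connected";
type ProviderStatus = Record<ProviderName, ProviderState>;

// Only the first two sources carry a verdict; showing guidance proves nothing.
type Evidence =
  | { source: "provider-check" | "cloud-status"; provider: ProviderName; connected: boolean }
  | { source: "guidance-shown"; provider: ProviderName };

const initialStatus: ProviderStatus = { google: "unknown", github: "unknown" };

// Returns a new status; evidence without a verdict leaves the state untouched.
function applyEvidence(current: ProviderStatus, evidence: Evidence): ProviderStatus {
  if (evidence.source === "guidance-shown") return current;
  const next: ProviderStatus = { ...current };
  next[evidence.provider] = evidence.connected ? "connected" : "not-connected";
  return next;
}
```

Because `guidance-shown` has no `connected` field, marking a provider connected from guidance text alone does not type-check.
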
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**Workflow: “GitHub Primitive Change”**  
*Purpose*: Apply a small, self‑contained change to a repository (e.g., typo fix, comment update, config tweak) while guaranteeing traceability, deterministic verification, and safe merge.  

---  

## 1. Scope & Branching Contract  

| Item | Definition |
|------|------------|
| **Change Type** | Primitive (single‑file, < 200 LOC, no API impact) |
| **Target Repo** | Any GitHub repo under the organization’s `github.com/<org>/<repo>` namespace |
| **Branch Naming** | `feat/primitive‑<short‑desc>-<YYYYMMDD>` |
| **PR Title Template** | `feat: <short‑desc> (primitive change)` |
| **PR Description Template** | <ul><li>**Why** – short rationale</li><li>**What** – list of modified files</li><li>**How** – steps to reproduce locally</li><li>**Verification** – deterministic gates (see §3)</li></ul> |
| **Reviewers** | • Primary reviewer – code‑owner of the affected path <br>• Secondary reviewer – **Ricky‑CI** bot (automated checks) |
| **Merge Strategy** | Squash‑merge only; require **2 approvals** + **all gates passed** |

---

## 2. Step‑by‑Step Execution Contract  

| Step | Owner | Action | Output Artifact |
|------|-------|--------|------------
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
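
The registry-backed matching this rubric asks for might look like the sketch below. It is a toy: the `Skill` shape, the trigger-overlap score, and the 0.5 threshold are assumptions made for illustration, not Ricky's matcher.

```typescript
interface Skill { id: string; triggers: string[] }
interface SkillMatch { id: string; confidence: number; evidence: string[] }
interface SkillSelection { matches: SkillMatch[]; warning?: string }

// Ranks registry skills by the fraction of their triggers found in the spec.
// A missing registry or an empty result yields a warning, never an error.
function matchSkills(spec: string, registry: Skill[] | undefined, threshold = 0.5): SkillSelection {
  if (!registry || registry.length === 0) {
    return { matches: [], warning: "skill registry unavailable; continuing without skills" };
  }
  const text = spec.toLowerCase();
  const matches = registry
    .map((skill) => {
      const evidence = skill.triggers.filter((t) => text.indexOf(t.toLowerCase()) !== -1);
      const confidence = skill.triggers.length === 0 ? 0 : evidence.length / skill.triggers.length;
      return { id: skill.id, confidence, evidence };
    })
    .filter((match) => match.confidence >= threshold)
    .sort((a, b) => b.confidence - a.confidence);
  return matches.length > 0 ? { matches } : { matches, warning: "no skill cleared the threshold" };
}
```

Returning `evidence` next to `confidence` keeps the trigger evidence available for artifacts or JSON output instead of hiding it.
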
REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow Title:** Claude‑Review → Codex‑Implement Pipeline  

**Purpose** – Automate a deterministic “review‑then‑code” cycle where Claude (LLM‑reviewer) validates design/spec changes and Codex (LLM‑coder) generates the implementation. The workflow is scoped to a single feature branch and produces a PR that can be merged only after passing all verification gates.

---

## 1. Scope & Branching Model
| Item | Definition |
|------|------------|
| **Feature Branch** | `feature/<ticket‑id>-<short‑name>` (e.g., `feature/1234‑auth‑login`) |
| **Base Branch** | `main` (protected) |
| **PR Target** | `main` (via PR) |
| **Workflow Trigger** | Push to a feature branch *or* manual dispatch from the CI dashboard. |

---

## 2. Contract (Inputs / Outputs)

| Contract | Details |
|----------|---------|
| **Input Artifact** | `design.md` – Markdown spec placed at repo root of the feature branch. |
| **Claude Review Output** | `review-report.json` containing: <br>• `status`: `PASS` / `FAIL` <br>• `issues`: array of `{line, severity, message}` <br>• `review‑summary`: free‑text |
| **Codex Implementation Output** | `src/` – generated source files (or updated files) committed to the same branch. |

...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
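
The precedence in this rubric (explicit spec hint first, then skill metadata, then the project default) can be written down in a few lines. `ToolDecision` and `selectRunner` are hypothetical names, and the runner strings are placeholders, not the values Ricky uses.

```typescript
type DecisionSource = "spec-hint" | "skill-default" | "project-default";

interface ToolDecision {
  step: string;
  runner: string;
  source: DecisionSource;
  reason: string;
}

// Resolves the runner for one step. Explicit spec hints always win.
function selectRunner(
  step: string,
  specHints: Record<string, string>,
  skillDefaults: Record<string, string>,
  projectDefault: string,
): ToolDecision {
  const hinted = specHints[step];
  if (hinted !== undefined) {
    return { step, runner: hinted, source: "spec-hint", reason: `spec names ${hinted} for ${step}` };
  }
  const fromSkill = skillDefaults[step];
  if (fromSkill !== undefined) {
    return { step, runner: fromSkill, source: "skill-default", reason: `spec is silent; skill metadata suggests ${fromSkill}` };
  }
  return { step, runner: projectDefault, source: "project-default", reason: "spec and skills are silent; using project default" };
}

// Usage for "use Claude to review and Codex to implement".
const hints = { review: "claude", implement: "codex" };
const decisions = ["review", "implement", "test"].map((step) =>
  selectRunner(step, hints, { review: "codex", test: "codex" }, "claude"),
);
// The serialized decisions could serve as the audit record the rubric mentions.
console.log(JSON.stringify(decisions, null, 2));
```
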
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is a deterministic post‑processing flag that takes an existing workflow JSON/YAML and rewrites the **`task.description`** and **`task.acceptanceGate`** fields for every step. The goal is to make each step **action‑oriented**, **measurable**, and **verifiable** while keeping the original logical flow intact.

Below is a concrete contract, the verification gate you should run after refinement, and a short “how‑to” guide.

---

## 1. Contract – `refine_workflow`  

| Field | Type | Required? | Description |
|-------|------|-----------|-------------|
| `inputWorkflow` | `string` (path or inline JSON/YAML) | ✅ | The workflow to be refined. |
| `outputPath` | `string` (file path) | ✅ | Destination for the refined workflow. |
| `styleGuide` | `enum` (`concise`, `verbose`, `compliance`) | ❌ (default=`concise`) | Tone for the rewritten text. |
| `gateTemplate` | `string` (Jinja‑like) | ❌ | Custom template for acceptance gates. If omitted, the built‑in “SMART” template is used. |
| `maxIterations` | `int` (1‑5) | ❌ (default=`1`) | How many refinement passes to run (useful for heavy‑handed rewrites). |

**Outpu
...[truncated]

Must

  • Keep the deterministic unrefined path as the default.
  • Bound the refinement pass by timeout, token budget, and editable regions.
  • Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

Must Not

  • Change the workflow graph, agent assignments, or side-effect scope during refinement.
  • Fail the whole generation if optional refinement times out.
  • Hide warnings when the deterministic artifact is returned unchanged.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
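
A bounded, opt-in refinement pass with fallback to the deterministic artifact could be structured as below. This is a sketch with assumed names (`Workflow`, `refineWithFallback`, `withTimeout`); it models the timeout and the editable region (step descriptions only) and does not model a token budget.

```typescript
interface Step { id: string; agent: string; description: string }
interface Workflow { steps: Step[] }
interface RefineOutcome { workflow: Workflow; warnings: string[] }

// Rejects if the wrapped promise does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("refinement timed out")), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Runs the optional refinement. Any timeout, provider error, graph change, or
// validation failure returns the deterministic artifact with a visible warning.
async function refineWithFallback(
  base: Workflow,
  refine: (workflow: Workflow) => Promise<Workflow>,
  validate: (workflow: Workflow) => boolean,
  timeoutMs: number,
): Promise<RefineOutcome> {
  try {
    const refined = await withTimeout(refine(base), timeoutMs);
    const sameGraph =
      refined.steps.length === base.steps.length &&
      refined.steps.every((s, i) => s.id === base.steps[i]?.id && s.agent === base.steps[i]?.agent);
    if (!sameGraph) {
      return { workflow: base, warnings: ["refinement changed steps or agents; kept deterministic artifact"] };
    }
    if (!validate(refined)) {
      return { workflow: base, warnings: ["refined workflow failed validation; kept deterministic artifact"] };
    }
    return { workflow: refined, warnings: [] };
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    return { workflow: base, warnings: [`refinement skipped: ${reason}`] };
  }
}
```

Generation never fails because of the optional pass: every failure path resolves with the unrefined workflow and a warning.
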
REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)

User message

Generate a workflow for the ricky --version spec.

Ricky output

**Workflow: `ricky --version` Specification & Release**  
*Repository:* `github.com/agentworkforce/ricky`  
*Target Branch:* `main` → `release/vX.Y.Z` (semantic version bump)  

---

## 1. Scope & Objectives
- Define the exact CLI contract for `ricky --version`.
- Implement deterministic unit‑tests and integration‑tests.
- Automate version bump, changelog entry, and release artifact generation.
- Enforce review, verification, and blocker reporting per AgentWorkforce standards.

---

## 2. Contract (Public Interface)

| Element | Description | Type | Example |
|---------|-------------|------|---------|
| Command | `ricky --version` | CLI flag | `ricky --version` |
| Output  | Semantic version string prefixed with `ricky ` | `string` | `ricky 2.4.1` |
| Exit code | `0` on success, non‑zero on internal error | `int` | `0` |
| Side‑effects | None (pure read‑only) | – | – |
| Platform support | Linux, macOS, Windows (via Go cross‑compile) | – | – |

**Formal contract (Go interface)**  

```go
type VersionProvider interface {
    // Version returns the full version string as displayed by `ricky --version`.
    // Must be deterministic and contain exactly one space between the binary name
...[truncated]

Must

  • Build gates from the stated acceptance behavior, such as checking ricky --version output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
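
A behavior-grounded gate for this case would judge what the command prints, not what the bundle's source looks like. The sketch below assumes the caller has already captured `stdout` and the exit code of `ricky --version` and knows the expected version. `checkVersionGate` is a hypothetical helper, and the whole-token match is an assumption about the output format.

```typescript
interface CommandRun { stdout: string; exitCode: number }
interface GateResult { pass: boolean; reason: string }

// Passes only when the command exited cleanly and printed the expected version
// as a whole token, so "1.2.3" does not pass for an output of "11.2.30".
function checkVersionGate(run: CommandRun, expectedVersion: string): GateResult {
  if (run.exitCode !== 0) {
    return { pass: false, reason: `ricky --version exited with code ${run.exitCode}` };
  }
  const printed = run.stdout.trim();
  const found = printed.split(/\s+/).some((token) => token === expectedVersion);
  if (!found) {
    return { pass: false, reason: `expected version ${expectedVersion}, got "${printed}"` };
  }
  return { pass: true, reason: `output contains version ${expectedVersion}` };
}
```

Unlike a `grep` over `dist/ricky.js`, this gate fails when the binary prints the wrong version even though the source still has the expected shape.
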
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large collection of independent artifacts (e.g., data files, code modules, model checkpoints) through a deterministic validation pipeline, automatically apply fix‑scripts where possible, and re‑run until all artifacts are either **PASS** or **BLOCKED**. The workflow is scoped to a single branch (`batch‑validation`) and produces a single PR that aggregates the final state of every artifact.

---

## 1. Repository Layout (contract)

/artifacts/
<artifact‑id>/ # one folder per artifact
source/ # raw input files
config.yaml # per‑artifact config
tests/ # validation test suite (pytest‑compatible)
fix/ # optional fix scripts (bash/python)
.status.json # workflow‑generated status record
/workflows/
batch_validation.yml # this workflow (GitHub Actions syntax)
/.github/
workflows/
batch_validation.yml # symlink to /workflows/batch_validation.yml


*Each artifact folder is **self‑contained** – no cross‑artifact dependencies.*  

**`.status.js
...[truncated]

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shaped the Generated Ricky Workflow  

| Skill | What the Skill Enforces | Direct Effect on the Generated Workflow |
|-------|------------------------|------------------------------------------|
| **Concrete Contract Definition** | Every step declares **inputs**, **outputs**, and **type‑checked schemas** (JSON‑Schema or Pydantic). | The workflow begins with a `contract` block that lists required artifacts (e.g., `source_repo`, `target_branch`). Down‑stream steps reference these names, guaranteeing that no step can run without the exact data shape the previous step produced. |
| **Deterministic Verification Gates** | After each critical transformation a **gate** runs a pure‑function test (e.g., `assert len(changed_files) > 0`). | The generated `verify_changes` step aborts the pipeline early if the diff is empty, preventing unnecessary CI runs and making the pipeline’s outcome reproducible. |
| **Review Artifacts as First‑Class Outputs** | Every step that mutates code also emits a **review artifact** (e.g., `code_review.md`, `diff.patch`). | The `generate_review` step creates a markdown summary that is automatically attached to the PR. Reviewe
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

**Ricky – Workflow Generation Assistant**

---

### 📋 Quick Summary  
To produce a usable, test‑able workflow we need a **minimal, concrete product specification**. With only a vague description we can’t reliably:

1. Define the **workflow contract** (inputs, outputs, success/failure conditions).  
2. Set **deterministic verification gates** (unit‑/integration‑test criteria).  
3. Scope the **branch/PR boundaries** and **review artifacts** (design doc, test plan, CI config).  

Below is a short checklist. Fill in the items that apply to your product idea, and I’ll generate a complete workflow (including contract, verification gates, and PR scaffolding) in under 100 validation loops.

---

### ✅ Required Details (please provide)

| # | Category | What I need from you |
|---|----------|----------------------|
| 1 | **Goal & Scope** | One‑sentence description of the product feature or service the workflow should support. |
| 2 | **Primary Actors** | Who/what triggers the workflow? (e.g., API call, UI event, scheduled job, external webhook). |
| 3 | **Inputs** | List of required input fields, types, and validation rules. |
| 4 | **Outputs** | Expected success payload (format, fields) 
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s Immediate Response Plan**

| Phase | Action | Why | Deliverable / Artifact |
|-------|--------|-----|------------------------|
| 1️⃣ Detect & Log | • Pull the execution trace from the orchestration engine (e.g., Temporal, Airflow). <br>• Identify the timed‑out step (Step A) and the “zombie” worker (Worker B) that stayed alive past its heartbeat window. | Provides a deterministic, reproducible record of the failure point. | `failure‑trace.json` (timestamped, immutable) |
| 2️⃣ Blocker Report | • Open a **Blocker Issue** in the workflow repo (e.g., `workflow‑<name>-blocker-<date>.md`). <br>• Tag the issue with `#blocker`, `#timeout`, `#zombie‑worker`. <br>• Include the trace, logs, and a short “impact” summary (e.g., “no artifacts produced; downstream pipelines stalled”). | Guarantees visibility to all stakeholders and prevents downstream merges until resolved. | GitHub Issue #\<id> |
| 3️⃣ Isolate & Quarantine | • Abort the current run via the orchestration API (idempotent). <br>• Mark the run status as **FAILED** with a custom error code `ERR_TIMEOUT_ZOMBIE`. <br>• Spin up a **sandbox branch** `fix/timeout‑zombie‑<date>` from the last known good commit. | Prevents further
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.
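The rubric above can be read as "classify first, then decide on retry". The following is a minimal sketch of that ordering, not Ricky's actual implementation: the `StepEvidence` and `Classification` shapes, the `classifyStep` function, and the retry policy are all hypothetical, and only the failure class names are taken from the Must list.

```typescript
// Failure classes named after the categories in the Must list above.
type FailureClass =
  | "agent-runtime-opacity"
  | "timeout"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

// Hypothetical evidence shape; the real Ricky evidence model may differ.
interface StepEvidence {
  stepId: string;
  timedOut: boolean;
  workerAlive: boolean;
  artifactsProduced: number;
  rawLog: string; // kept verbatim so raw evidence survives classification
}

interface Classification {
  stepId: string;
  failureClass: FailureClass;
  retryAllowed: boolean;
  rawLog: string;
}

// Assumed policy: a retry is only considered once the class is known.
function classifyStep(e: StepEvidence): Classification {
  let failureClass: FailureClass = "unknown";
  if (e.timedOut) {
    failureClass = "timeout";
  } else if (e.workerAlive && e.artifactsProduced === 0) {
    // The worker is alive but silent, so its state cannot be observed.
    failureClass = "agent-runtime-opacity";
  }
  return {
    stepId: e.stepId,
    failureClass,
    retryAllowed: failureClass !== "unknown",
    rawLog: e.rawLog,
  };
}
```

In the scenario from the user message, the timed-out step and the silent worker would land in different classes, and anything unrecognized stays `unknown` with its raw log attached rather than being retried.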

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – “Detect‑and‑Purge Stale Relay State”**  
*Purpose*: Prevent a local AgentWorkforce run from using out‑of‑date `.agent‑relay/`, `.relay/`, or `.trajectories/` artifacts that can cause nondeterministic behavior, failed verification gates, or hidden blockers.  

| Item | Specification |
|------|----------------|
| **Trigger** | `ricky workflow start <workflow‑name>` (or any `ricky run …` entry point). |
| **Pre‑condition** | The working tree is clean (no uncommitted changes) **and** the target workflow definition (`.ricky/workflows/<name>.yml`) exists. |
| **Input** | None (implicit from cwd). |
| **Output** | - `PASS` – clean state, workflow proceeds.<br>- `FAIL` – stale state detected, abort with remediation instructions. |
| **Side‑effects** | Optional automatic purge (configurable). No commit is made automatically. |
| **Exit codes** | `0` – success, `1` – stale state detected, `2` – internal error. |
| **Owner** | `ricky/relay‑state‑guard` (branch `feature/relay‑state‑guard`). |
| **Review artifacts** | - Unit tests (`*_test.go` or `*_test.py`).<br>- Integration test matrix (local, CI, Docker).<br>- Documentation (`docs/relay‑state‑guard.md`). |
| **Verifica
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.
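As an illustration of the expectations above, here is a small sketch of a pre-launch check that records observed paths and recommends quarantine without deleting anything. The directory names come from the user message; `checkStaleState`, `StaleStateReport`, and the decision to treat mere presence as a stand-in for staleness are assumptions made for this example.

```typescript
// Runtime state directories named in the user message.
const RUNTIME_STATE_DIRS = [".agent-relay", ".relay", ".trajectories"] as const;

// Hypothetical report shape.
interface StaleStateReport {
  classification: "clean" | "environment-contamination";
  observedPaths: string[];
  action: "none-needed" | "recommend-quarantine";
  safeToLaunch: boolean;
}

// Pure function: takes the top-level entries of the workspace and never
// touches the filesystem, so it cannot delete or overwrite state.
function checkStaleState(workspaceEntries: readonly string[]): StaleStateReport {
  // Tolerate trailing slashes such as ".relay/".
  const present = new Set(workspaceEntries.map((e) => e.replace(/\/+$/, "")));
  const observedPaths: string[] = RUNTIME_STATE_DIRS.filter((d) => present.has(d));
  if (observedPaths.length === 0) {
    return { classification: "clean", observedPaths, action: "none-needed", safeToLaunch: true };
  }
  return {
    classification: "environment-contamination",
    observedPaths,
    action: "recommend-quarantine",
    safeToLaunch: false,
  };
}
```

A caller could feed it a directory listing of the workspace root and surface `observedPaths` to the user before deciding whether to launch.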

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Quick‑Fix Checklist**

| Step | Action | Command / UI | Expected Result |
|------|--------|--------------|-----------------|
| 1️⃣ | **Confirm the active marker** | `cat .ricky/run/active.marker` (or open the “Run Marker” file in the workspace root) | Shows the UUID of the run that the system thinks is still alive. |
| 2️⃣ | **Verify the run is really dead** | `ricky status`  <br>or check the CI/CD dashboard for a running Ricky/Relay job | No live process should be listed. If a process *is* running, note its PID/Job‑ID. |
| 3️⃣ | **Terminate stray process (if any)** | - On Linux/macOS: `kill -9 <PID>`  <br>- On Windows: `taskkill /PID <PID> /F` | Process disappears from `ricky status`. |
| 4️⃣ | **Clear the stale marker** | `rm -f .ricky/run/active.marker`  <br>or use the UI “Clear Run Marker” button (if available) | Marker file is removed. |
| 5️⃣ | **Refresh workspace state** | `git status` (ensure no pending changes) <br>`ricky reset` (optional) | Workspace is clean and ready for a new run. |
| 6️⃣ | **Run a sanity test** | `ricky dry‑run` | Should complete without “another run is active” error. |
| 7️⃣ | **Start your intended run** | `ricky run <workflow‑name>` | Executes nor
...[truncated]

Must

  • Report the active marker, run id, or status path when available.
  • Ask the user to inspect, wait for, or explicitly clear the active run.
  • Avoid launching a competing run that could corrupt evidence.

Must Not

  • Silently start another run.
  • Hide the existing run marker from the user.
  • Treat the conflict as a generic failure with no recovery path.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Mode**  

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script). It assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failing workflow artifact is stored in `./artifacts/<workflow‑id>.json`. The previous successful run ID is supplied as `PREV_RUN_ID`.

---

## 1️⃣ Preconditions (Verification Gate)

| Check | Command | Expected outcome |
|-------|---------|------------------|
| **CLI installed** | `aw --version` | `aw version x.y.z` |
| **Workflow definition present** | `ls ./workflows/<workflow‑name>.yaml` | File exists |
| **Artifact readable** | `cat ./artifacts/<workflow‑id>.json` | Valid JSON |
| **Previous run reachable** | `aw run get $PREV_RUN_ID --status` | Returns `COMPLETED` or `SUCCESS` |
| **Auto‑fix feature enabled** | `aw config get auto_fix` | `true` |

If any gate fails, abort and fix the environment before proceeding.

---

## 2️⃣ Execution Contract

```bash
#!/usr/bin/env bash
set -euo pipefail

# ------------------------------------------------------------------
# INPUTS (replace placeholders)
# -------------------------------------------------------------
...[truncated]

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.
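The bounded loop described by this rubric can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the shipped behavior: `runWithAutoFix`, `RunFn`, `RepairFn`, and the result shapes are hypothetical names, and the `repair` callback merely stands in for asking the Workforce workflow persona to repair the artifact.

```typescript
// Hypothetical shapes for one run and one attempt summary.
interface RunResult {
  ok: boolean;
  failedStep: string | null;
  runId: string;
}

interface AttemptSummary {
  attempt: number;
  ok: boolean;
  resumedFrom: string | null;
  note: string;
}

interface AutoFixOutcome {
  trackingRunId: string; // one id for the whole repair/resume sequence
  ok: boolean;
  attempts: AttemptSummary[];
}

type RunFn = (opts: { startFrom: string | null; previousRunId: string | null }) => RunResult;
type RepairFn = (failedStep: string) => boolean; // true if the artifact was repaired

function runWithAutoFix(
  trackingRunId: string,
  maxAttempts: number,
  run: RunFn,
  repair: RepairFn,
): AutoFixOutcome {
  const attempts: AttemptSummary[] = [];
  let startFrom: string | null = null;
  let previousRunId: string | null = null;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = run({ startFrom, previousRunId });
    attempts.push({
      attempt,
      ok: result.ok,
      resumedFrom: startFrom,
      note: result.ok ? "passed" : `failed at ${result.failedStep ?? "unknown step"}`,
    });
    if (result.ok) return { trackingRunId, ok: true, attempts };
    // Stop when the budget is spent, the failed step is unknown, or repair fails.
    if (result.failedStep === null || attempt === maxAttempts) break;
    if (!repair(result.failedStep)) break;
    // Resume from the failed step with the previous run id.
    startFrom = result.failedStep;
    previousRunId = result.runId;
  }
  return { trackingRunId, ok: false, attempts };
}
```

The same `trackingRunId` is returned no matter how many underlying runs happen, and every attempt is summarized even when the budget is exhausted.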

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Debug Checklist**

| Step | Command / Action | What to Look For |
|------|------------------|-----------------|
| 1️⃣  | `ricky logs workflows/foo.ts` | Full execution trace. Identify the first **ERROR** or **FAIL** line. |
| 2️⃣  | `ricky run workflows/foo.ts --debug` | Runs the workflow with step‑by‑step console output and shows the exact input/output of each node. |
| 3️⃣  | Open the generated **artifact bundle** | `./.ricky/artifacts/foo_<timestamp>.zip` – unzip and inspect `manifest.json`, `contract.json`, and any `*.log` files. |
| 4️⃣  | Verify **contract compliance** | `ricky verify contracts/foo.contract.ts` – ensures the workflow’s inputs/outputs still match the declared contract. |
| 5️⃣  | Run the **deterministic gate** manually | `ricky gate --workflow workflows/foo.ts --gate deterministic` – reproduces the exact state the gate checks. |
| 6️⃣  | Check **branch/PR boundaries** | If the workflow lives in a feature branch, run `ricky status` to see if there are pending PR reviews or blocked dependencies. |
| 7️⃣  | Look for **blocker reports** | `ricky blockers list` – any open blockers will be listed with a short description and owner. |
| 8️⃣  | Re‑run wi
...[truncated]

Must

  • Preserve one-attempt behavior when auto-fix is disabled.
  • Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
  • Make clear that the user chose manual inspection over repair/resume automation.

Must Not

  • Start a repair loop despite --no-auto-fix.
  • Suppress the diagnosis because no repair was attempted.
  • Present the failure as a completed repair attempt.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**

Below is the concrete, repeat‑able contract that Ricky follows whenever a developer asks to run a workflow written in TypeScript that lives in the primary local path of the repository (e.g., `./workflows/`). The contract is expressed as a **deterministic execution plan** with explicit verification gates, review artifacts, and a bounded validation loop (80‑100 ms per gate) to guarantee reliability and reproducibility.

---

## 1. Preconditions (Gate A)

| Check | Action | Pass Criteria |
|------|--------|---------------|
| **Repo root detection** | `git rev-parse --show-toplevel` | Must resolve to a directory that contains `package.json` and a `workflows/` folder. |
| **Node version** | `node -v` | Must be `>=18.0.0`. |
| **TypeScript compiler** | `tsc -v` | Must be installed globally or as a dev‑dependency (`npm list typescript`). |
| **Workflow artifact existence** | `ls ./workflows/<name>.ts` | File must exist and be readable. |
| **No uncommitted changes** | `git status --porcelain` | Output must be empty (or developer explicitly opts‑in to run with dirty state). |

*If any check fails, Ricky repo
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky – Boundary‑Escalation Workflow for Persistent Structural Failures**

When a structural failure remains after an initial fix, the workflow must enforce deterministic verification, clear ownership, and a bounded escalation path. The contract below defines the exact steps, artefacts, and gates you should follow to move the issue out of the “boundary” and back into a tractable state.

---

## 1. Trigger – “Boundary Reached” Event
| Field | Value |
|-------|-------|
| **Event Type** | `STRUCTURAL_FAILURE_PERSISTED` |
| **Origin** | Automated test suite / CI pipeline |
| **Condition** | Failure persists after **Fix Attempt #N** (N ≥ 1) and verification gate `POST_FIX_PASS` returns **FAIL** |
| **Payload** | `{ failure_id, repo, branch, commit_sha, attempt_number, logs_url, test_report_url }` |

*The event is emitted by the CI orchestrator (e.g., GitHub Actions) and captured by the **Ricky‑Boundary‑Handler** service.*

---

## 2. Immediate Automated Response
1. **Create a “Boundary” Issue** in the repository’s `#infra‑failures` project board.  
   - Title: `⚠️ Boundary Reached – Structural Failure <failure_id>`  
   - Labels: `boundary`, `structural-failure`, `needs‑triage`  
   -
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Consolidated Report**

---

### 1. Scope & Data Sources
| Item | Description |
|------|-------------|
| **Workflow IDs** | `wf‑001` … `wf‑120` (all runs in the last 30 days) |
| **Run Store** | `ricky‑run‑archive/2024‑05‑*` (JSONL files) |
| **Metrics Extracted** | Success rate, mean latency, error‑type distribution, retry count, gate‑pass/fail, reviewer comments |
| **Time Window** | 2024‑04‑09 00:00 UTC → 2024‑05‑09 23:59 UTC |
| **Verification Gates** | `gate‑schema‑v2`, `gate‑ci‑pass`, `gate‑security‑scan` |

---

### 2. Aggregation Contract (Deterministic)

```yaml
digest:
  period: "2024-04-09..2024-05-09"
  total_runs: int
  successes: int
  failures: int
  success_rate: float   # successes / total_runs
  avg_latency_ms: float
  p95_latency_ms: float
  error_breakdown:
    - error_type: str
      count: int
      pct_of_failures: float
  retry_stats:
    avg_retries: float
    max_retries: int
  gate_summary:
    passed: int
    failed: int
    fail_by_gate:
      - gate_name: str
        count: int
  blockers:
    - run_id: str
      reason: str
      owner: str
      created_at: timestamp
  reviewers:
    - run_id: str
      reviewer: str
 
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

**Ricky Slack Surface – “Workflow Debugger”**  
*Version 0.1 – Design Draft*  

---

## 1. Overview
The **Workflow Debugger** Slack surface gives engineers and product owners a deterministic, low‑latency way to inspect, step‑through, and repair Ricky‑managed workflows directly from a Slack channel. It follows the Ricky repository conventions:

* **Contract‑first** – all interactions are defined as typed JSON messages.  
* **Verification gates** – every request passes schema validation, auth checks, and a deterministic “debug‑state” snapshot test before execution.  
* **Review artifacts** – every change to the surface lives in a scoped branch (`slack/debugger‑<ticket>`) and is gated by a PR checklist.  
* **80‑100 validation loops** – the surface is exercised in a CI matrix that runs 80‑100 random workflow‑debug scenarios per commit.  
* **Blocker reporting** – any failure in the validation matrix is auto‑opened as a GitHub issue with the `debugger‑blocked` label.  

---

## 2. Interaction Contract

| Direction | Event Type | Payload Schema (JSON) | Required Auth Scopes |
|-----------|------------|----------------------|----------------------|
| **User → Bot** | `debug.start` | `{ "
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through the same normalization, executor, evidence, and specialist stages.
  • Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

  • Put workflow generation or diagnosis domain logic directly in the Slack handler.
  • Degrade Slack to a developer shortcut with weaker routing than CLI.
  • Skip signature verification, dedup, or thread handling when the surface is implemented.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**New Web‑Handoff Type – `workflowSubmit`**

| Element | Definition |
|---------|------------|
| **Name** | `workflowSubmit` |
| **Purpose** | Allows a web client (e.g., UI, external service) to submit a complete AgentWorkforce workflow specification together with an execution *mode* (e.g., `dryRun`, `run`, `preview`). The backend validates, stores, and optionally triggers the workflow. |
| **Location in `handoffTypes.json`** | Add a new entry under the top‑level `web` object. |
| **Schema** | See **JSON‑Schema** below. |
| **Supported Modes** | `dryRun` – validate only, no side‑effects.<br>`preview` – validate and generate a visual preview (no execution).<br>`run` – full validation + enqueue for execution. |
| **Response** | A deterministic `handoffResult` object (see below) containing `status`, `workflowId`, `validationReport`, and optional `previewUrl`. |
| **Security** | Must be sent over HTTPS, include an `Authorization: Bearer <token>` header. The token is checked against the **WorkflowSubmit** scope in the IAM policy. |
| **Rate‑limit** | 30 submissions per minute per user. |
| **Audit** | Every submission creates an immutable audit record (`audit.workflowSubmit`) with the r
...[truncated]

Must

  • Add a handoff variant that normalizes into LocalInvocationRequest or CloudGenerateRequest.
  • Keep auth/session validation in the web surface before normalization.
  • Ensure downstream code does not need to know the request came from web.

Must Not

  • Short-circuit directly from the web handler to executors.
  • Create a parallel domain model for web requests.
  • Make the web surface the owner of local or Cloud routing semantics.
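One way to picture the normalization boundary this rubric asks for is the sketch below. `LocalInvocationRequest` and `CloudGenerateRequest` are named in the Must list, but their fields here are invented for illustration, as are `WebWorkflowHandoff` and `normalizeWebHandoff`; the real types likely carry more than a spec.

```typescript
// Hypothetical web handoff payload: a workflow spec plus a mode.
interface WebWorkflowHandoff {
  source: "web";
  spec: string;
  mode: "local" | "cloud";
  sessionValidated: boolean; // set by the web surface before normalization
}

// Hypothetical field layouts for the two request types named in the rubric.
interface LocalInvocationRequest {
  kind: "local";
  spec: string;
}

interface CloudGenerateRequest {
  kind: "cloud";
  spec: string;
}

type NormalizedRequest = LocalInvocationRequest | CloudGenerateRequest;

// The normalizer only translates. It does not call executors, and its output
// carries no trace of the web origin, so downstream code stays surface-agnostic.
function normalizeWebHandoff(handoff: WebWorkflowHandoff): NormalizedRequest {
  if (!handoff.sessionValidated) {
    throw new Error("auth/session must be validated in the web surface before normalization");
  }
  return handoff.mode === "local"
    ? { kind: "local", spec: handoff.spec }
    : { kind: "cloud", spec: handoff.spec };
}
```

Downstream stages would then accept only `NormalizedRequest`, which keeps a parallel web-specific domain model from forming.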

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)

User message

A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.

Ricky output

**Decision:** Generate a new workflow draft.

**Rationale**

1. **Input Type** – You supplied a *workflow specification* together with *conversation context* but no existing runnable workflow artifact (e.g., a `workflow.yaml` or CI pipeline). This indicates the request is to create a fresh, concrete workflow rather than debug or restart an existing one.

2. **Action Matrix**  
| Input | Matching Action |
|-------|-----------------|
| Existing workflow code + error logs | Debug / Restart |
| Completed run + metrics | Analyze |
| Multiple agents needing synchronization | Coordinate |
| **Spec + context only** | **Generate** |

3. **Contract Requirements** – Generation will produce:
   - A deterministic `workflow.yaml` adhering to the **AgentWorkforce workflow schema** (versioned, with explicit `inputs`, `steps`, `outputs`).
   - A **verification gate** block that lists required unit‑test suites, lint checks, and a CI gate to enforce 80‑100 validation loops before merge.
   - A **review artifact** stub (`REVIEW.md`) outlining expected reviewer responsibilities and blocker‑reporting guidelines.

4. **Scope** – The generated workflow will be confined to a single feature branch (`feature
...[truncated]

Must

  • Normalize the tool call through the MCP/Claude handoff path.
  • Treat conversation context as advisory metadata.
  • Make the spec itself sufficient for routing whenever possible.

Must Not

  • Require the user to rewrite the spec manually as a workflow.
  • Put domain routing logic inside the MCP tool handler itself.
  • Ignore provided mode, workflow path, or prior decisions metadata.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.cloud-api-versioning (surfaces-ingress/manual)

User message

Change the Cloud Ricky generate API response to include a new field and change one existing error code name.

Ricky output

**Scope**  
- Add a new field **`requestId`** (string, UUID) to the successful JSON payload returned by **`POST /v1/generate`**.  
- Rename error code **`INVALID_PROMPT`** to **`PROMPT_VALIDATION_FAILED`** in the error‑response schema and all references.  

All changes are limited to the **`cloud-ricky`** service repository (no cross‑service impact).  

---  

## 1. Contract Update (API Specification)

| Endpoint | Method | Current Success Schema | **New Success Schema** |
|----------|--------|------------------------|------------------------|
| `/v1/generate` | POST | ```json { "output": "...", "model": "gpt‑4", "usage": { "tokens": 123 } } ``` | ```json { "output": "...", "model": "gpt‑4", "usage": { "tokens": 123 }, **"requestId": "550e8400-e29b-41d4-a716-446655440000"** } ``` |

| Error Code | Current Name | **New Name** |
|------------|--------------|--------------|
| 400‑INVALID_PROMPT | `INVALID_PROMPT` | **`PROMPT_VALIDATION_FAILED`** |

*Update the OpenAPI 3.1 document (`api/openapi.yaml`):*  

```yaml
components:
  schemas:
    GenerateResponse:
      type: object
      properties:
        output:
          type: string
        model:
          type: string
        usage:
...[truncated]

Must

  • Allow additive response fields within /api/v1/ricky/....
  • Reject renaming or changing the meaning of an existing error code inside the same version.
  • Require a new API version for breaking request/response changes.

Must Not

  • Remove or rename existing response fields in v1.
  • Add a new required

Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

@khaliqgant khaliqgant merged commit 14dd2fd into main May 10, 2026
2 checks passed
@khaliqgant khaliqgant deleted the codex/cloud-workflow-schedule-ricky branch May 10, 2026 18:01