Conversation
Rename the design folder to reflect the broader scope and reorganize into a top-level RFC plus per-work-package subfolders. - Top-level: RFC covering prompt variables, JSON value handling, template rendering semantics (mustache as new default for new apps, curly deprecated, fstring/jinja2), the per-service variable matrix, decisions, work-package layering (B1-B3 backend, F1-F3 frontend, D1 docs), rollout, test plan, future directions for sharing the prompt template across services. - wp-b1-runtime-foundation/: scoped to judge backend patch (provider /secret resolution + temperature removal) and the low-level rendering helper extraction. Plan, implementation notes, QA, research, variable-and-template analysis, and status all aligned with the RFC's WP-B1 scope; helper boundary explicitly excludes message-rendering and JSON-return rendering (those move to WP-B2).
Patches the LLM-as-a-judge runtime to share the provider/secret resolution
path with chat/completion and extracts the per-mode template substitution
logic into a single helper module so WP-B2/WP-B3 can build on it without
re-touching the judge or `PromptTemplate`.
Phase 1 — judge backend patch (`auto_ai_critique_v0`):
- resolve provider settings via `SecretsManager.ensure_secrets_in_workflow()` +
`SecretsManager.get_provider_settings_from_workflow(model)` (custom and
self-hosted models configured in Model Hub now reach the judge);
- raise `InvalidSecretsV0Error` with the selected model when settings are
missing, matching chat/completion;
- route the LLM call through `mockllm.acompletion` under
`mockllm.user_aws_credentials_from(provider_settings)` (replaces the
module-level `litellm.openai_key = ...` pattern; scrubs ECS/Lambda role
env vars for the duration of the call);
- stop sending `temperature=0.01`. Newer providers reject the kwarg and
the judge has no UI to configure it.
Phase 2 — low-level template helper (`agenta.sdk.utils.templating`):
- `render_template(*, template, mode, context) -> str` covering `curly`,
`fstring`, `jinja2` (mustache lands in WP-B3);
- typed `UnresolvedVariablesError(ValueError)` carries the unresolved set
so call sites can format their preferred message text;
- both call sites — `PromptTemplate._format_with_template` (chat/completion)
and the judge's `_format_with_template` — funnel through it. Public
behavior is unchanged: `PromptTemplate` keeps its legacy
`"Unreplaced variables in curly template: ['x'].{Hint}"` wording (pinned
by a regression test); the judge keeps its silent-return-on-Jinja-error
contract.
Tests (sdk/oss/tests/pytest/unit/, 249/249 passing):
- `test_auto_ai_critique_v0_runtime.py` — provider resolution (standard +
custom), missing-settings error, no-temperature, response_format /
json_schema forwarding, context aliases, result normalization;
- `test_render_template_helper.py` — each mode + JSONPath / JSON Pointer
/ literal-key-first / whole-object compact JSON / sandbox violation, plus
call-site message-text regression tests for both `PromptTemplate` and
the judge handler.
- status log: record Phase 1 + Phase 2 completion and the post-review cleanup pass (typed `UnresolvedVariablesError`, dead-helper removal, resolver de-duplication, message-text regression tests). - code-review/: scope, findings, risks, questions, summary, scorecard from the review pass.
…ring
Two bugs surfaced while reviewing the WP-B1 rendering helper for special-character
handling:
1. Backslash doubling. _render_curly defensively called .replace("\\", "\\\\") on
every substitution value. The defensive escape was meant to neutralize regex
backreferences, but re.sub with a function callable does not interpret
backslash escapes in the return value (Python's documented behavior). Net
effect: every backslash in a user-supplied value reached the LLM doubled —
e.g. a Windows-style path with one backslash arrived with two. Drop the
.replace; values now round-trip correctly.
2. Empty placeholder leak. resolve_dot_notation("", data) short-circuited to
data because the post-split(".") loop never executed, so the runtime
serialized the whole context dict (including any secrets, ground-truth
columns, trace fields, etc.) into the prompt whenever a template contained
{{}}. resolve_dot_notation now raises on empty expr, which surfaces as a
normal UnresolvedVariablesError.
Tests:
- sdk/oss/tests/pytest/unit/test_render_template_helper.py grew from 21 to 81
tests covering curly basics, placeholder syntax (whitespace / multiple /
repeated / multi-line / unicode), value coercion, value safety (no recursive
rendering, backslash round-trip, regex backref round-trip), error contract
(unresolved set, deep misses, mid-path scalars, empty placeholder), regex
edge cases (triple/quadruple braces, mismatched braces, embedded newlines),
fstring (escape, format specs, index access, value safety), jinja2 (raw
blocks, filters, conditionals, undefined behavior, sandbox violations), and
call-site preservation. Both bug fixes are pinned by regression tests.
- Full SDK unit suite: 309/309 passing.
Docs:
- New docs/design/prompt-runtime-unification/appendix-rendering-edge-cases.md
documents the template/value boundary, per-mode escape mechanisms, the
curly-mode escape gap, frontend↔backend extractor mismatches, and what's
pinned by tests.
- WP-B3 in the RFC now carries an explicit note that brace escaping for curly
is an open question and that mustache (greenfield) is the cleanest place to
land an explicit escape mechanism.
Companion to qa.md (which covers unit tests). Walks through a real-stack verification of the WP-B1 changes plus the rendering review pass: - Section A: new functionality — custom and self-hosted models in the judge, via UI and direct calls. Includes the temperature-removal check for reasoning models that previously rejected the hard-coded temperature=0.01. - Sections B–C: regression coverage for variable rendering across chat, completion, and judge — every curly mode feature (top-level, nested, array, JSONPath, JSON Pointer, literal-key-first, whitespace, repeated, multiple), fstring brace-escape, jinja2 filters/conditionals/raw blocks, sandbox blocking. Plus the two bug-fix verifications (backslash round-trip, empty placeholder no longer leaks context). - Section D: same regression matrix exercised directly via the API rather than the playground, to isolate transport from rendering. - Section E: UX touch-ups for the new error paths. Closes with a side note on SDK-direct usage of LLM-as-a-judge: the canonical path (evaluation service / runtime) is unchanged; the only behavior shift is for bare-script callers that previously relied on env-var key pickup instead of bootstrapping the workflow context. Documents the risk and the mitigation direction for WP-B2.
These were internal review notes that don't belong in the shipped design workspace. Remove the code-review/ subfolder; everything user-facing (plan, implementation-notes, qa, manual-qa-checklist, status, README) stays.
…I/agenta into feat/llm-judge-chat-unification
The legacy admin_router.create_accounts endpoint and the new fastapi/accounts/router.create_accounts both emit operation IDs that generate the same TypeScript method name in Fern client codegen. Excluding the legacy route from the OpenAPI schema removes the collision at the source, eliminating the need for downstream Fern post-processors to disambiguate the generated method.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughCentralizes prompt rendering into a new render_template helper (curly/fstring/jinja2), updates PromptTemplate and runtime handlers to use it, patches auto_ai_critique_v0 to use workflow-scoped provider settings and omit temperature from LLM calls, adds unit tests and RFC documentation, and bumps multiple package versions. ChangesPrompt runtime unification — WP‑B1 (templating, handler, SDK tests, docs)
Miscellaneous — version bumps, API metadata, contributors, README, frontend UI change
Sequence Diagram(s)sequenceDiagram
participant Handler as auto_ai_critique_v0
participant Secrets as SecretsManager
participant Renderer as render_template
participant MockLLM as mockllm.acompletion
Handler->>Secrets: ensure_secrets_in_workflow()
Handler->>Secrets: get_provider_settings_from_workflow(model)
Secrets-->>Handler: provider_settings or null
alt provider_settings missing
Handler->>Handler: raise InvalidSecretsV0Error
else provider_settings present
Handler->>Renderer: render messages/aliases with render_template(...)
Handler->>MockLLM: user_aws_credentials_from(provider_settings) (enter)
Handler->>MockLLM: acompletion(messages, response_format, **provider_settings)
MockLLM-->>Handler: LLM response
Handler->>Handler: normalize/parse response (JSON parsing, result normalization)
Handler-->>Caller: evaluation result / errors
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
feat(sdk): prompt runtime unification + WP-B1 implementation
chore(api): hide duplicate /admin/accounts route from OpenAPI
There was a problem hiding this comment.
Actionable comments posted: 8
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 05208aa5-9c45-4cee-b89a-399a3038a73e
📒 Files selected for processing (17)
docs/design/prompt-runtime-unification/README.mddocs/design/prompt-runtime-unification/appendix-rendering-edge-cases.mddocs/design/prompt-runtime-unification/findings.mddocs/design/prompt-runtime-unification/wp-b1-runtime-foundation/README.mddocs/design/prompt-runtime-unification/wp-b1-runtime-foundation/implementation-notes.mddocs/design/prompt-runtime-unification/wp-b1-runtime-foundation/manual-qa-checklist.mddocs/design/prompt-runtime-unification/wp-b1-runtime-foundation/plan.mddocs/design/prompt-runtime-unification/wp-b1-runtime-foundation/qa.mddocs/design/prompt-runtime-unification/wp-b1-runtime-foundation/research.mddocs/design/prompt-runtime-unification/wp-b1-runtime-foundation/status.mddocs/design/prompt-runtime-unification/wp-b1-runtime-foundation/variable-and-template-analysis.mdsdk/agenta/sdk/engines/running/handlers.pysdk/agenta/sdk/utils/resolvers.pysdk/agenta/sdk/utils/templating.pysdk/agenta/sdk/utils/types.pysdk/oss/tests/pytest/unit/test_auto_ai_critique_v0_runtime.pysdk/oss/tests/pytest/unit/test_render_template_helper.py
✅ Files skipped from review due to trivial changes (4)
- docs/design/prompt-runtime-unification/wp-b1-runtime-foundation/plan.md
- docs/design/prompt-runtime-unification/findings.md
- docs/design/prompt-runtime-unification/wp-b1-runtime-foundation/research.md
- docs/design/prompt-runtime-unification/wp-b1-runtime-foundation/status.md
Railway Preview Environment
Updated at 2026-05-05T11:45:02.197Z |
fix: open create prompt modal when navigating from onboarding screen
docs: add Devarsh05 as a contributor for bug
|
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
web/oss/src/components/GetStarted/GetStarted.tsx (1)
49-63: ⚡ Quick winConsolidate workspace-context resolution to avoid fallback drift.
The same
waitForWorkspaceContext+buildPostLoginPathblock is repeated in three callbacks, and fallback behavior has already diverged (/wvs/apps). Extracting a shared resolver (and a shared fallback constant) will keep navigation behavior consistent.♻️ Suggested refactor sketch
+const WORKSPACE_FALLBACK_PATH = "/apps" + +const resolveWorkspacePath = useCallback(async () => { + const context = await waitForWorkspaceContext({ + timeoutMs: 5000, + requireProjectId: true, + requireWorkspaceId: true, + requireOrgData: true, + }) + return buildPostLoginPath(context) +}, []) const navigateToDestination = useCallback(async () => { try { - const context = await waitForWorkspaceContext({ ... }) - const path = buildPostLoginPath(context) + const path = await resolveWorkspacePath() router.push(path) } catch (e) { console.error("Failed to resolve workspace context", e) - router.push("/w") + router.push(WORKSPACE_FALLBACK_PATH) } -}, [router]) +}, [router, resolveWorkspacePath]) // same replacement pattern in handleSelection + handleNextAlso applies to: 71-85, 93-111
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 04fd73be-9acf-4d5b-9a84-6510043acdad
📒 Files selected for processing (3)
.all-contributorsrcREADME.mdweb/oss/src/components/GetStarted/GetStarted.tsx
✅ Files skipped from review due to trivial changes (2)
- .all-contributorsrc
- README.md
New version v0.98.1 in