[release] v0.98.1 by github-actions[bot] · Pull Request #4262 · Agenta-AI/agenta

github-actions · 2026-05-05T07:57:12Z

New version v0.98.1 in

(web)
web/oss
web/ee
sdk
api
services

Rename the design folder to reflect the broader scope and reorganize into a top-level RFC plus per-work-package subfolders. - Top-level: RFC covering prompt variables, JSON value handling, template rendering semantics (mustache as new default for new apps, curly deprecated, fstring/jinja2), the per-service variable matrix, decisions, work-package layering (B1-B3 backend, F1-F3 frontend, D1 docs), rollout, test plan, future directions for sharing the prompt template across services. - wp-b1-runtime-foundation/: scoped to judge backend patch (provider /secret resolution + temperature removal) and the low-level rendering helper extraction. Plan, implementation notes, QA, research, variable-and-template analysis, and status all aligned with the RFC's WP-B1 scope; helper boundary explicitly excludes message-rendering and JSON-return rendering (those move to WP-B2).

Patches the LLM-as-a-judge runtime to share the provider/secret resolution path with chat/completion and extracts the per-mode template substitution logic into a single helper module so WP-B2/WP-B3 can build on it without re-touching the judge or `PromptTemplate`. Phase 1 — judge backend patch (`auto_ai_critique_v0`): - resolve provider settings via `SecretsManager.ensure_secrets_in_workflow()` + `SecretsManager.get_provider_settings_from_workflow(model)` (custom and self-hosted models configured in Model Hub now reach the judge); - raise `InvalidSecretsV0Error` with the selected model when settings are missing, matching chat/completion; - route the LLM call through `mockllm.acompletion` under `mockllm.user_aws_credentials_from(provider_settings)` (replaces the module-level `litellm.openai_key = ...` pattern; scrubs ECS/Lambda role env vars for the duration of the call); - stop sending `temperature=0.01`. Newer providers reject the kwarg and the judge has no UI to configure it. Phase 2 — low-level template helper (`agenta.sdk.utils.templating`): - `render_template(*, template, mode, context) -> str` covering `curly`, `fstring`, `jinja2` (mustache lands in WP-B3); - typed `UnresolvedVariablesError(ValueError)` carries the unresolved set so call sites can format their preferred message text; - both call sites — `PromptTemplate._format_with_template` (chat/completion) and the judge's `_format_with_template` — funnel through it. Public behavior is unchanged: `PromptTemplate` keeps its legacy `"Unreplaced variables in curly template: ['x'].{Hint}"` wording (pinned by a regression test); the judge keeps its silent-return-on-Jinja-error contract. Tests (sdk/oss/tests/pytest/unit/, 249/249 passing): - `test_auto_ai_critique_v0_runtime.py` — provider resolution (standard + custom), missing-settings error, no-temperature, response_format / json_schema forwarding, context aliases, result normalization; - `test_render_template_helper.py` — each mode + JSONPath / JSON Pointer / literal-key-first / whole-object compact JSON / sandbox violation, plus call-site message-text regression tests for both `PromptTemplate` and the judge handler.

- status log: record Phase 1 + Phase 2 completion and the post-review cleanup pass (typed `UnresolvedVariablesError`, dead-helper removal, resolver de-duplication, message-text regression tests). - code-review/: scope, findings, risks, questions, summary, scorecard from the review pass.

… error

…ring Two bugs surfaced while reviewing the WP-B1 rendering helper for special-character handling: 1. Backslash doubling. _render_curly defensively called .replace("\\", "\\\\") on every substitution value. The defensive escape was meant to neutralize regex backreferences, but re.sub with a function callable does not interpret backslash escapes in the return value (Python's documented behavior). Net effect: every backslash in a user-supplied value reached the LLM doubled — e.g. a Windows-style path with one backslash arrived with two. Drop the .replace; values now round-trip correctly. 2. Empty placeholder leak. resolve_dot_notation("", data) short-circuited to data because the post-split(".") loop never executed, so the runtime serialized the whole context dict (including any secrets, ground-truth columns, trace fields, etc.) into the prompt whenever a template contained {{}}. resolve_dot_notation now raises on empty expr, which surfaces as a normal UnresolvedVariablesError. Tests: - sdk/oss/tests/pytest/unit/test_render_template_helper.py grew from 21 to 81 tests covering curly basics, placeholder syntax (whitespace / multiple / repeated / multi-line / unicode), value coercion, value safety (no recursive rendering, backslash round-trip, regex backref round-trip), error contract (unresolved set, deep misses, mid-path scalars, empty placeholder), regex edge cases (triple/quadruple braces, mismatched braces, embedded newlines), fstring (escape, format specs, index access, value safety), jinja2 (raw blocks, filters, conditionals, undefined behavior, sandbox violations), and call-site preservation. Both bug fixes are pinned by regression tests. - Full SDK unit suite: 309/309 passing. Docs: - New docs/design/prompt-runtime-unification/appendix-rendering-edge-cases.md documents the template/value boundary, per-mode escape mechanisms, the curly-mode escape gap, frontend↔backend extractor mismatches, and what's pinned by tests. - WP-B3 in the RFC now carries an explicit note that brace escaping for curly is an open question and that mustache (greenfield) is the cleanest place to land an explicit escape mechanism.

Companion to qa.md (which covers unit tests). Walks through a real-stack verification of the WP-B1 changes plus the rendering review pass: - Section A: new functionality — custom and self-hosted models in the judge, via UI and direct calls. Includes the temperature-removal check for reasoning models that previously rejected the hard-coded temperature=0.01. - Sections B–C: regression coverage for variable rendering across chat, completion, and judge — every curly mode feature (top-level, nested, array, JSONPath, JSON Pointer, literal-key-first, whitespace, repeated, multiple), fstring brace-escape, jinja2 filters/conditionals/raw blocks, sandbox blocking. Plus the two bug-fix verifications (backslash round-trip, empty placeholder no longer leaks context). - Section D: same regression matrix exercised directly via the API rather than the playground, to isolate transport from rendering. - Section E: UX touch-ups for the new error paths. Closes with a side note on SDK-direct usage of LLM-as-a-judge: the canonical path (evaluation service / runtime) is unchanged; the only behavior shift is for bare-script callers that previously relied on env-var key pickup instead of bootstrapping the workflow context. Documents the risk and the mitigation direction for WP-B2.

These were internal review notes that don't belong in the shipped design workspace. Remove the code-review/ subfolder; everything user-facing (plan, implementation-notes, qa, manual-qa-checklist, status, README) stays.

…I/agenta into feat/llm-judge-chat-unification

The legacy admin_router.create_accounts endpoint and the new fastapi/accounts/router.create_accounts both emit operation IDs that generate the same TypeScript method name in Fern client codegen. Excluding the legacy route from the OpenAPI schema removes the collision at the source, eliminating the need for downstream Fern post-processors to disambiguate the generated method.

vercel · 2026-05-05T07:57:17Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	May 5, 2026 11:04am

coderabbitai · 2026-05-05T07:57:31Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Centralizes prompt rendering into a new render_template helper (curly/fstring/jinja2), updates PromptTemplate and runtime handlers to use it, patches auto_ai_critique_v0 to use workflow-scoped provider settings and omit temperature from LLM calls, adds unit tests and RFC documentation, and bumps multiple package versions.

Changes

Prompt runtime unification — WP‑B1 (templating, handler, SDK tests, docs)

Layer / File(s)	Summary
Data / Types `sdk/agenta/sdk/utils/templating.py`	Add `TemplateMode`, `UnresolvedVariablesError`, `_coerce_to_str`, and public `render_template(template, mode, context)` implementing `curly`, `fstring`, and `jinja2`.
Lookup behavior `sdk/agenta/sdk/utils/resolvers.py`	`resolve_dot_notation` now raises `KeyError` for empty expressions to prevent `{{}}` resolving to the whole context.
Core integration `sdk/agenta/sdk/utils/types.py`	`PromptTemplate._format_with_template` delegates to `render_template`, validates format, and maps `UnresolvedVariablesError` / `KeyError` / Jinja errors into `TemplateFormatError`.
Handler wiring `sdk/agenta/sdk/engines/running/handlers.py`	`_format_with_template` delegates to shared renderer; `auto_ai_critique_v0` now calls `SecretsManager.ensure_secrets_in_workflow()` + `get_provider_settings_from_workflow(model)`, raises `InvalidSecretsV0Error` when missing, and invokes `mockllm.acompletion` under `mockllm.user_aws_credentials_from(provider_settings)` passing `**provider_settings` and omitting `temperature`.
Tests `sdk/oss/tests/pytest/unit/test_render_template_helper.py`, `sdk/oss/tests/pytest/unit/test_auto_ai_critique_v0_runtime.py`	Add exhaustive unit tests for `render_template` (curly/fstring/jinja2) and runtime tests for `auto_ai_critique_v0` covering secret resolution, provider settings, omission of `temperature`, response_format forwarding, template aliases, normalization, and error contracts.
Docs / Process `docs/design/prompt-runtime-unification/**`	Add RFC, appendix, findings, WP‑B1 implementation notes/plan/QA/manual checklist/status/research/variable-analysis documenting rules, test plans, sequencing, and edge-case appendices.

Miscellaneous — version bumps, API metadata, contributors, README, frontend UI change

Layer / File(s)	Summary
Manifest updates `api/pyproject.toml`, `sdk/pyproject.toml`, `services/pyproject.toml`, `web/package.json`, `web/ee/package.json`, `web/oss/package.json`	Bumped package versions from `0.98.0` → `0.98.1` in six manifest files; no other manifest fields changed.
API metadata `api/oss/src/routers/admin_router.py`	Added `include_in_schema=False` to the `/accounts` POST route decorator (no handler/signature changes).
Contributors / README `.all-contributorsrc`, `README.md`	Added contributor Devarsh Prajapati and updated All Contributors badge/count and table entry.
Frontend UI `web/oss/src/components/GetStarted/GetStarted.tsx`	Use Jotai atom `setOnboardingWidgetActivationAtom` to open the create-prompt onboarding widget for `"test_prompt"`; redirect fallback changed to `"/apps"` and callback deps updated.

Sequence Diagram(s)

sequenceDiagram
    participant Handler as auto_ai_critique_v0
    participant Secrets as SecretsManager
    participant Renderer as render_template
    participant MockLLM as mockllm.acompletion

    Handler->>Secrets: ensure_secrets_in_workflow()
    Handler->>Secrets: get_provider_settings_from_workflow(model)
    Secrets-->>Handler: provider_settings or null
    alt provider_settings missing
        Handler->>Handler: raise InvalidSecretsV0Error
    else provider_settings present
        Handler->>Renderer: render messages/aliases with render_template(...)
        Handler->>MockLLM: user_aws_credentials_from(provider_settings) (enter)
        Handler->>MockLLM: acompletion(messages, response_format, **provider_settings)
        MockLLM-->>Handler: LLM response
        Handler->>Handler: normalize/parse response (JSON parsing, result normalization)
        Handler-->>Caller: evaluation result / errors
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Agenta-AI/agenta#4231: Directly related upstream work implementing WP‑B1 runtime unification and the shared render_template helper.
Agenta-AI/agenta#4252: Related change that also modifies the FastAPI /accounts route decorator to add include_in_schema=False.
Agenta-AI/agenta#4249: Related edits touching SDK runtime/template stack and PromptTemplate behavior.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.38% which is insufficient. The required threshold is 60.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title '[release] v0.98.1' clearly summarizes the main change: a version release across multiple packages.
Description check	✅ Passed	The description lists the affected packages (web, web/oss, web/ee, sdk, api, services) and indicates a new version is being released, which relates to the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch release/v0.98.1

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

feat(sdk): prompt runtime unification + WP-B1 implementation

chore(api): hide duplicate /admin/accounts route from OpenAPI

coderabbitai

Actionable comments posted: 8

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 05208aa5-9c45-4cee-b89a-399a3038a73e

📥 Commits

Reviewing files that changed from the base of the PR and between 7e85a68 and 8a1b14d.

📒 Files selected for processing (17)

docs/design/prompt-runtime-unification/README.md
docs/design/prompt-runtime-unification/appendix-rendering-edge-cases.md
docs/design/prompt-runtime-unification/findings.md
docs/design/prompt-runtime-unification/wp-b1-runtime-foundation/README.md
docs/design/prompt-runtime-unification/wp-b1-runtime-foundation/implementation-notes.md
docs/design/prompt-runtime-unification/wp-b1-runtime-foundation/manual-qa-checklist.md
docs/design/prompt-runtime-unification/wp-b1-runtime-foundation/plan.md
docs/design/prompt-runtime-unification/wp-b1-runtime-foundation/qa.md
docs/design/prompt-runtime-unification/wp-b1-runtime-foundation/research.md
docs/design/prompt-runtime-unification/wp-b1-runtime-foundation/status.md
docs/design/prompt-runtime-unification/wp-b1-runtime-foundation/variable-and-template-analysis.md
sdk/agenta/sdk/engines/running/handlers.py
sdk/agenta/sdk/utils/resolvers.py
sdk/agenta/sdk/utils/templating.py
sdk/agenta/sdk/utils/types.py
sdk/oss/tests/pytest/unit/test_auto_ai_critique_v0_runtime.py
sdk/oss/tests/pytest/unit/test_render_template_helper.py

✅ Files skipped from review due to trivial changes (4)

docs/design/prompt-runtime-unification/wp-b1-runtime-foundation/plan.md
docs/design/prompt-runtime-unification/findings.md
docs/design/prompt-runtime-unification/wp-b1-runtime-foundation/research.md
docs/design/prompt-runtime-unification/wp-b1-runtime-foundation/status.md

github-actions · 2026-05-05T08:49:14Z

Railway Preview Environment


Status	Destroyed (PR closed)

Updated at 2026-05-05T11:45:02.197Z

fix: open create prompt modal when navigating from onboarding screen

docs: add Devarsh05 as a contributor for bug

CLAassistant · 2026-05-05T09:43:23Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
4 out of 5 committers have signed the CLA.

✅ mmabrouk
✅ bekossy
✅ jp-agenta
✅ Devarsh05
❌ allcontributors[bot]
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

coderabbitai

🧹 Nitpick comments (1)

web/oss/src/components/GetStarted/GetStarted.tsx (1)
49-63: ⚡ Quick win

Consolidate workspace-context resolution to avoid fallback drift.

The same waitForWorkspaceContext + buildPostLoginPath block is repeated in three callbacks, and fallback behavior has already diverged (/w vs /apps). Extracting a shared resolver (and a shared fallback constant) will keep navigation behavior consistent.
♻️ Suggested refactor sketch
+const WORKSPACE_FALLBACK_PATH = "/apps"
+
+const resolveWorkspacePath = useCallback(async () => {
+  const context = await waitForWorkspaceContext({
+    timeoutMs: 5000,
+    requireProjectId: true,
+    requireWorkspaceId: true,
+    requireOrgData: true,
+  })
+  return buildPostLoginPath(context)
+}, [])

 const navigateToDestination = useCallback(async () => {
   try {
-    const context = await waitForWorkspaceContext({ ... })
-    const path = buildPostLoginPath(context)
+    const path = await resolveWorkspacePath()
     router.push(path)
   } catch (e) {
     console.error("Failed to resolve workspace context", e)
-    router.push("/w")
+    router.push(WORKSPACE_FALLBACK_PATH)
   }
-}, [router])
+}, [router, resolveWorkspacePath])

 // same replacement pattern in handleSelection + handleNext
Also applies to: 71-85, 93-111

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 04fd73be-9acf-4d5b-9a84-6510043acdad

📥 Commits

Reviewing files that changed from the base of the PR and between 14d6f85 and f1dd38b.

📒 Files selected for processing (3)

.all-contributorsrc
README.md
web/oss/src/components/GetStarted/GetStarted.tsx

✅ Files skipped from review due to trivial changes (2)

.all-contributorsrc
README.md

…olver

mmabrouk and others added 22 commits April 28, 2026 11:53

docs: plan llm judge chat unification

390e026

docs: address llm judge plan review

fd4c513

Guard _load_jinja2() call in except handler to avoid masking original…

477ee62

… error

Merge branch 'main' into feat/llm-judge-chat-unification

d9af811

docs(wp-b1): drop internal code-review notes from PR

d1862e5

These were internal review notes that don't belong in the shipped design workspace. Remove the code-review/ subfolder; everything user-facing (plan, implementation-notes, qa, manual-qa-checklist, status, README) stays.

internal CR with findings

ef2c1fc

Updated findings

633d513

Merge branch 'release/v0.96.11' into feat/llm-judge-chat-unification

e8566b0

Merge branch 'feat/llm-judge-chat-unification' of github.com:Agenta-A…

c38227a

…I/agenta into feat/llm-judge-chat-unification

fix findings

140c210

Merge branch 'main' into feat/llm-judge-chat-unification

6caded2

Merge branch 'main' into feat/llm-judge-chat-unification

eda3a7e

fix: open create prompt modal when navigating from onboarding screen

ceec9ae

fix: wait for router.isReady before reading create_prompt query param

a0534e4

fix: resolve merge conflict with upstream/main

8d2e07f

v0.98.1

7e85a68

dosubot Bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label May 5, 2026

vercel Bot deployed to Preview May 5, 2026 07:58 View deployment

Merge pull request #4231 from Agenta-AI/feat/llm-judge-chat-unification

8a1b14d

feat(sdk): prompt runtime unification + WP-B1 implementation

dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels May 5, 2026

Merge branch 'release/v0.98.1' into fix/onboarding-modal-not-opening

e2d67f4

vercel Bot deployed to Preview May 5, 2026 08:31 View deployment

Merge pull request #4252 from Agenta-AI/fix/api/admin-accounts-openapi

14d6f85

chore(api): hide duplicate /admin/accounts route from OpenAPI

vercel Bot deployed to Preview May 5, 2026 08:37 View deployment

coderabbitai Bot reviewed May 5, 2026

View reviewed changes

bekossy and others added 5 commits May 5, 2026 11:37

fix: update onboarding modal activation and clean up router usage

a2fbb9f

docs: update README.md [skip ci]

2acdc29

docs: update .all-contributorsrc [skip ci]

728beac

Merge pull request #4260 from Devarsh05/fix/onboarding-modal-not-opening

8ddc013

fix: open create prompt modal when navigating from onboarding screen

Merge pull request #4263 from Agenta-AI/all-contributors/add-Devarsh05

f1dd38b

docs: add Devarsh05 as a contributor for bug

vercel Bot deployed to Preview May 5, 2026 09:44 View deployment

coderabbitai Bot reviewed May 5, 2026

View reviewed changes

fix: add validation for empty variable references in dot-notation res…

7df5210

…olver

vercel Bot deployed to Preview May 5, 2026 10:45 View deployment

fix: refactor AWS credentials handling in mockllm context

3284371

vercel Bot deployed to Preview May 5, 2026 11:04 View deployment

bekossy enabled auto-merge May 5, 2026 11:44

bekossy approved these changes May 5, 2026

View reviewed changes

bekossy merged commit 474eec0 into main May 5, 2026
30 of 31 checks passed

dosubot Bot added the lgtm This PR has been approved by a maintainer label May 5, 2026

Conversation

github-actions Bot commented May 5, 2026

Uh oh!

vercel Bot commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Railway Preview Environment

Uh oh!

CLAassistant commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

vercel Bot commented May 5, 2026 •

edited

Loading

coderabbitai Bot commented May 5, 2026 •

edited

Loading

github-actions Bot commented May 5, 2026 •

edited

Loading

CLAassistant commented May 5, 2026 •

edited

Loading