Skip to content

feat(sdk): unify structured prompt rendering#4331

Merged
bekossy merged 8 commits into
release/v0.99.9from
feat/wp-b2-rendering-unification
May 15, 2026
Merged

feat(sdk): unify structured prompt rendering#4331
bekossy merged 8 commits into
release/v0.99.9from
feat/wp-b2-rendering-unification

Conversation

@mmabrouk
Copy link
Copy Markdown
Member

Summary

Implements WP-B2 for prompt runtime unification.

  • Adds agenta.sdk.utils.rendering with render_messages(...), render_json_like(...), and typed StructuredRenderingError.
  • Routes PromptTemplate.format(...) through the structured renderer while preserving TemplateFormatError for chat/completion callers.
  • Routes auto_ai_critique_v0(...) through the shared renderer for prompt messages and judge json_schema rendering.
  • Aligns judge Jinja failures to raise PromptFormattingV0Error instead of silently sending unrendered content to the LLM.
  • Adds pure renderer tests and call-site tests for PromptTemplate and LLM-as-a-judge.
  • Adds/updates WP-B2 design, QA, and status docs.

Validation

  • cd sdks/python && uv run ruff format agenta/sdk/utils/rendering.py agenta/sdk/utils/types.py agenta/sdk/engines/running/handlers.py oss/tests/pytest/unit/test_structured_rendering.py oss/tests/pytest/unit/test_auto_ai_critique_v0_runtime.py oss/tests/pytest/unit/test_prompt_template_extensions.py oss/tests/pytest/unit/test_jinja2_sandbox.py oss/tests/pytest/unit/test_render_template_helper.py
  • cd sdks/python && uv run ruff check --fix agenta/sdk/utils/rendering.py agenta/sdk/utils/types.py agenta/sdk/engines/running/handlers.py oss/tests/pytest/unit/test_structured_rendering.py oss/tests/pytest/unit/test_auto_ai_critique_v0_runtime.py oss/tests/pytest/unit/test_prompt_template_extensions.py oss/tests/pytest/unit/test_jinja2_sandbox.py oss/tests/pytest/unit/test_render_template_helper.py
  • cd sdks/python && uv run pytest oss/tests/pytest/unit -q

Result: 411 passed, 3 warnings.

@vercel
Copy link
Copy Markdown

vercel Bot commented May 14, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agenta-documentation Ready Ready Preview, Comment May 15, 2026 1:46pm

Request Review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 14, 2026

Review Change Stack

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 88d425b0-e91f-4459-8a49-5d20a660b8dc

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR implements WP-B2 rendering unification by introducing a shared structured rendering layer (render_messages, render_json_like) that unifies prompt message and JSON-like value templating across completion/chat and LLM-as-a-judge paths, with comprehensive RFC/planning documentation, integration into PromptTemplate and handlers, and test coverage verifying correct behavior and error handling.

Changes

WP-B2 Rendering Unification

Layer / File(s) Summary
WP-B2 RFC, research, planning, QA documentation, and status tracking
docs/design/prompt-runtime-unification/README.md, docs/design/prompt-runtime-unification/wp-b2-rendering-unification/*
RFC document, research documentation, design plan, QA strategy, status tracking, and workspace README comprehensively specify the WP-B2 structured rendering unification feature, its four-phase implementation plan, test strategy, validation goals, and implementation progress.
Structured rendering module for messages and JSON-like values
sdks/python/agenta/sdk/utils/rendering.py
New rendering.py module with render_messages(...) and render_json_like(...) functions that render template strings within message content (text parts only, preserving non-text parts like images and audio) and recursively within JSON-like nested structures (lists/mappings), validating message shapes and detecting key collisions. Errors are wrapped as StructuredRenderingError with detailed paths and original error context.
PromptTemplate integration with structured renderers
sdks/python/agenta/sdk/utils/types.py
PromptTemplate.format(...) delegates message rendering to render_messages(...) and response-format variable substitution to render_json_like(...). New _template_error_from_structured_error(...) helper centralizes conversion of StructuredRenderingError to TemplateFormatError, with specialized messages for unresolved variables and Jinja2 failures.
Handler integration: _format_with_template and auto_ai_critique_v0 updates
sdks/python/agenta/sdk/engines/running/handlers.py
_format_with_template() removes Jinja2-specific silent-failure behavior. auto_ai_critique_v0() removes early response-format initialization, switches prompt formatting to render_messages(...), and renders json_schema via render_json_like(...) before building response_format, ensuring consistent error handling with completion/chat.
New test module for structured rendering behavior
sdks/python/oss/tests/pytest/unit/test_structured_rendering.py
New comprehensive test module verifying render_messages and render_json_like correctly handle message objects, dict-form messages, text-part vs non-text-part preservation, Jinja error wrapping with paths, key collision detection, and input immutability.
Integration tests for PromptTemplate and judge handler rendering
sdks/python/oss/tests/pytest/unit/test_prompt_template_extensions.py, sdks/python/oss/tests/pytest/unit/test_auto_ai_critique_v0_runtime.py, sdks/python/oss/tests/pytest/unit/test_jinja2_sandbox.py, sdks/python/oss/tests/pytest/unit/test_render_template_helper.py
Updated and new tests verifying PromptTemplate.format(...) renders response-format JSON-schema fields with templated properties, auto_ai_critique_v0 renders variables inside json_schema before LLM calls, template errors raise exceptions without reaching the LLM, and Jinja2 sandbox violations now raise instead of silently returning payloads.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Agenta-AI/agenta#4231: WP-B1 refactoring of _format_with_template and low-level render_template that this PR builds upon by introducing the new structured rendering layer above it.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 29.27% which is insufficient. The required threshold is 60.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(sdk): unify structured prompt rendering' directly and accurately describes the main change: the introduction of unified structured prompt rendering via a new rendering module.
Description check ✅ Passed The description provides a comprehensive summary of the implementation, explaining key additions (rendering module with structured functions), routing changes (PromptTemplate and auto_ai_critique_v0), error behavior alignment, and test coverage—all directly related to the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/wp-b2-rendering-unification

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread sdks/python/agenta/sdk/utils/types.py Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
sdks/python/oss/tests/pytest/unit/test_jinja2_sandbox.py (1)

21-22: ⚡ Quick win

Assert a concrete exception type for sandbox violations.

Line 21 uses pytest.raises(Exception), which can pass on unrelated failures. Please assert the actual contract type (or wrapper type) to keep this test precise.

Suggested tightening
+from agenta.sdk.utils.lazy import _load_jinja2
...
 def test_handlers_jinja2_blocks_ssti_payload() -> None:
-    with pytest.raises(Exception):
+    _, TemplateError = _load_jinja2()
+    with pytest.raises(TemplateError):
         _format_with_template(
             content=SSTI_PAYLOAD,
             format="jinja2",
             kwargs={},
         )
sdks/python/oss/tests/pytest/unit/test_render_template_helper.py (1)

786-789: ⚡ Quick win

Use a specific exception assertion here, not Exception.

Line 788 is too broad and may hide unrelated failures. Match the concrete Jinja sandbox exception contract (consistent with the rest of this file).

Suggested tightening
 def test_handlers_format_with_template_jinja2_raises_on_sandbox_violation():
     payload = "{{ lipsum.__globals__['os'].popen('id').read() }}"
-    with pytest.raises(Exception):
+    _, TemplateError = _load_jinja2()
+    with pytest.raises(TemplateError):
         _format_with_template(content=payload, format="jinja2", kwargs={})

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a774c3b6-f63f-470d-b5c0-c819b76d7fce

📥 Commits

Reviewing files that changed from the base of the PR and between 0fae4a5 and f7cbeb7.

📒 Files selected for processing (15)
  • docs/design/prompt-runtime-unification/README.md
  • docs/design/prompt-runtime-unification/wp-b2-rendering-unification/README.md
  • docs/design/prompt-runtime-unification/wp-b2-rendering-unification/plan.md
  • docs/design/prompt-runtime-unification/wp-b2-rendering-unification/qa.md
  • docs/design/prompt-runtime-unification/wp-b2-rendering-unification/research.md
  • docs/design/prompt-runtime-unification/wp-b2-rendering-unification/rfc.md
  • docs/design/prompt-runtime-unification/wp-b2-rendering-unification/status.md
  • sdks/python/agenta/sdk/engines/running/handlers.py
  • sdks/python/agenta/sdk/utils/rendering.py
  • sdks/python/agenta/sdk/utils/types.py
  • sdks/python/oss/tests/pytest/unit/test_auto_ai_critique_v0_runtime.py
  • sdks/python/oss/tests/pytest/unit/test_jinja2_sandbox.py
  • sdks/python/oss/tests/pytest/unit/test_prompt_template_extensions.py
  • sdks/python/oss/tests/pytest/unit/test_render_template_helper.py
  • sdks/python/oss/tests/pytest/unit/test_structured_rendering.py

Comment thread sdks/python/agenta/sdk/utils/types.py Outdated
@mmabrouk mmabrouk marked this pull request as ready for review May 14, 2026 15:45
@dosubot dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. documentation Improvements or additions to documentation enhancement New feature or request python Pull requests that update Python code tests labels May 14, 2026
@mmabrouk mmabrouk marked this pull request as draft May 14, 2026 15:45
@mmabrouk mmabrouk marked this pull request as ready for review May 15, 2026 10:44
@dosubot dosubot Bot added the refactoring A code change that neither fixes a bug nor adds a feature label May 15, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 15, 2026

Railway Preview Environment

Status Destroyed (PR converted to draft)

Updated at 2026-05-15T11:29:42.394Z

@mmabrouk
Copy link
Copy Markdown
Member Author

mmabrouk commented May 15, 2026

QA

  • Smoke tests in playground an llm-as-a-judge playground done

Potential next test

Regression Tests

  • 5. Multipart content messages: Create a prompt in chat with image_url or file content parts alongside text parts — verify text parts render and non-text parts pass through unchanged
  • 6. None content messages: A prompt with an assistant message that has content: null (tool-call-only messages) — verify it doesn't crash
  • 7. response_format with json_schema: Create a prompt with response_format.type = "json_schema" and placeholders inside the schema (e.g., "description": "Evaluate {{topic}}") — verify the schema is rendered before the LLM call
  • 8. response_format with json_object: Same but with type: "json_object" — verify no schema is attached
  • 10. Fallback configs response_format: If using fallback model configs with their own response_format, verify those schemas are also rendered

Edge Cases

  • 11. Jinja2 sandbox violation in playground: Enter a prompt with {{ lipsum.globals['os'].popen('id').read() }} — verify it raises a clear error, NOT silently sends the raw template to the LLM
  • 12. Jinja2 sandbox violation in judge evaluator: Same payload in an LLM-as-a-judge prompt template — verify the evaluation fails with a PromptFormattingV0Error, not a silent pass-through
  • 13. Missing variable in curly template: Use {{name}} but don't provide name in inputs — verify a clear error message mentioning the missing variable
  • 14. Empty messages list: A prompt with no messages — verify it doesn't crash (should produce an empty rendered list)
  • 15. Key collision in json_schema: A schema with both "{{field}}" and "name" where field resolves to "name" — verify it raises an error about the collision rather than silently overwriting

@mmabrouk mmabrouk requested a review from jp-agenta May 15, 2026 11:26
@mmabrouk mmabrouk marked this pull request as draft May 15, 2026 11:29
@mmabrouk mmabrouk marked this pull request as ready for review May 15, 2026 11:29
@mmabrouk mmabrouk marked this pull request as draft May 15, 2026 11:34
Comment thread sdks/python/agenta/sdk/utils/rendering.py
Comment thread sdks/python/agenta/sdk/utils/rendering.py
Comment thread sdks/python/agenta/sdk/utils/rendering.py
Comment thread sdks/python/agenta/sdk/utils/rendering.py
Comment thread sdks/python/agenta/sdk/utils/rendering.py Outdated
Comment thread sdks/python/agenta/sdk/utils/rendering.py
Comment thread sdks/python/agenta/sdk/utils/rendering.py
Comment thread sdks/python/agenta/sdk/utils/types.py Outdated
@junaway junaway changed the base branch from main to release/v0.99.9 May 15, 2026 12:17
Copy link
Copy Markdown
Contributor

@junaway junaway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! @mmabrouk

All comment are non-blocking.

@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label May 15, 2026
@mmabrouk mmabrouk marked this pull request as ready for review May 15, 2026 13:41
@dosubot dosubot Bot added the SDK label May 15, 2026
@bekossy bekossy merged commit 5b49124 into release/v0.99.9 May 15, 2026
13 of 14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation enhancement New feature or request lgtm This PR has been approved by a maintainer python Pull requests that update Python code refactoring A code change that neither fixes a bug nor adds a feature SDK size:XXL This PR changes 1000+ lines, ignoring generated files. tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants