Fix : Fix revisions commit endpoints silently discard unknown fields …#4415
Fix : Fix revisions commit endpoints silently discard unknown fields …#4415justin212407 wants to merge 2 commits into
Conversation
…in data Signed-off-by: justin212407 <charlesjustin2124@gmail.com>
|
@justin212407 is attempting to deploy a commit to the agenta projects Team on Vercel. A member of the Team first needs to authorize it. |
📝 WalkthroughSummary by CodeRabbit
WalkthroughThis PR updates revision data models across the backend API and Python SDK to forbid unknown fields via Pydantic's ChangesStrict validation for revision data models
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR hardens revision commit validation across git-backed entities to prevent silent data loss when clients send stale/unknown top-level fields inside the data payload (e.g., legacy ag_config). It does so by configuring the relevant Pydantic *RevisionData models to forbid extra fields, causing FastAPI to return HTTP 422 with a precise validation error instead of accepting-and-dropping data.
Changes:
- Set
model_config = ConfigDict(extra="forbid")on workflow/application/evaluator revision data models in the Python SDK (which the OSS API reuses for workflows). - Set
model_config = ConfigDict(extra="forbid")onQueryRevisionDataandTestsetRevisionDataDTOs in the OSS API. - Update Pydantic imports to include
ConfigDictwhere needed.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
sdks/python/agenta/sdk/models/workflows.py |
Forbid unknown keys in workflow/application/evaluator revision data payloads via ConfigDict(extra="forbid"). |
sdks/python/agenta/sdk/models/testsets.py |
Forbid unknown keys in SDK TestsetRevisionData. |
api/oss/src/core/testsets/dtos.py |
Forbid unknown keys in API TestsetRevisionData to avoid silent drops on commits. |
api/oss/src/core/queries/dtos.py |
Forbid unknown keys in API QueryRevisionData to avoid silent drops on commits. |
api/oss/src/core/evaluators/dtos.py |
Forbid unknown keys in evaluator revision data payloads. |
api/oss/src/core/applications/dtos.py |
Forbid unknown keys in application revision data payloads. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| class WorkflowRevisionData(BaseModel): | ||
| model_config = ConfigDict(extra="forbid") | ||
| uri: Optional[str] = None |
There was a problem hiding this comment.
The manual curl verification confirms the 422 behavior works correctly for /applications/revisions/commit. Acceptance tests for all five domains would be a good follow-up. Happy to add those here in this pr or a new one as preffered.
There was a problem hiding this comment.
Actionable comments posted: 2
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 353f9ae9-b40f-4677-9aa8-c06dbfd5f49c
📒 Files selected for processing (6)
api/oss/src/core/applications/dtos.pyapi/oss/src/core/evaluators/dtos.pyapi/oss/src/core/queries/dtos.pyapi/oss/src/core/testsets/dtos.pysdks/python/agenta/sdk/models/testsets.pysdks/python/agenta/sdk/models/workflows.py
|
|
||
|
|
||
| class QueryRevisionData(BaseModel): | ||
| model_config = ConfigDict(extra="forbid") |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether query payload sanitization swallows validation failures and removes data.
rg -n -C4 '_sanitize_persisted_query_revision_data|model_validate\(|except Exception|payload\.pop\("data"' api/oss/src/core/queries/service.pyRepository: Agenta-AI/agenta
Length of output: 1062
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Inspect QueryRevisionData definition =="
ls -la api/oss/src/core/queries/dtos.py
wc -l api/oss/src/core/queries/dtos.py
rg -n "class QueryRevisionData|model_config" api/oss/src/core/queries/dtos.py
# show around the forbid-extra config
sed -n '120,180p' api/oss/src/core/queries/dtos.py
echo
echo "== Inspect _sanitize_persisted_query_revision_data usage & surrounding flow =="
wc -l api/oss/src/core/queries/service.py
rg -n "_sanitize_persisted_query_revision_data" api/oss/src/core/queries/service.py
# show the whole sanitizer + immediate callsite area
sed -n '1,220p' api/oss/src/core/queries/service.py
echo
echo "== Check router layer behavior when 'data' is missing =="
# locate router methods handling persisted query revisions
rg -n "persist|revision|payload\\[\"data\"\\]|payload\\.pop\\(\"data\"|sanitize" api/oss/src/apis/fastapi/ api/oss/src/core/queries -S || true
# try to show relevant router/service integration
rg -n "core\\.queries|queries\\.service|QueryRevision" api/oss/src/apis/fastapi -S || trueRepository: Agenta-AI/agenta
Length of output: 50373
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Locate queries router and commit endpoint =="
ls -la api/oss/src/apis/fastapi/queries || true
rg -n "commit_query_revision|/queries/.*/revisions/commit|QueryRevisionCommit" api/oss/src/apis/fastapi/queries -S || true
rg -n "commit.*revision|revisions/commit|QueryRevisionCommit" api/oss/src/apis/fastapi/queries -S || true
echo
echo "== Inspect queries router handler body around commit =="
# Try best-effort: grab the first matching handler and print a small window
match_file="$(rg -l "commit_query_revision" api/oss/src/apis/fastapi/queries -S || true)"
if [ -n "${match_file}" ]; then
echo "Router file: ${match_file}"
rg -n "commit_query_revision" "${match_file}" -S | head -n 20
line="$(rg -n "commit_query_revision" "${match_file}" -S | head -n 1 | cut -d: -f1)"
start=$((line-40)); end=$((line+80))
sed -n "${start},${end}p" "${match_file}" | nl -ba | head -n 200
else
echo "No commit_query_revision call found in queries router."
fi
echo
echo "== Inspect core QueryRevisionCommit DTO and commit service flow =="
rg -n "class QueryRevisionCommit|QueryRevisionCommit\\b" api/oss/src/core/queries/dtos.py api/oss/src/core/queries/service.py -S
# Show commit_query_revision method implementation region
rg -n "async def commit_query_revision" api/oss/src/core/queries/service.py -S
l="$(rg -n "async def commit_query_revision" api/oss/src/core/queries/service.py -S | head -n 1 | cut -d: -f1)"
start=$((l-10)); end=$((l+80))
sed -n "${start},${end}p" api/oss/src/core/queries/service.py | nl -ba | head -n 200
echo
echo "== Inspect FastAPI request models for queries revision commit (extras behavior) =="
# Look for the request model used in router for commit endpoint
rg -n "RevisionCommit|QueryRevisionCommit|revisions/commit" api/oss/src/apis/fastapi/queries -S || true
# If there is a models.py, inspect it briefly
if [ -f api/oss/src/apis/fastapi/queries/models.py ]; then
echo "Found api/oss/src/apis/fastapi/queries/models.py"
wc -l api/oss/src/apis/fastapi/queries/models.py
rg -n "RevisionCommit|QueryRevisionCommit|extra" api/oss/src/apis/fastapi/queries/models.py -S || true
rg -n "class .*Commit|def .*commit|revision_commit" api/oss/src/apis/fastapi/queries/models.py -S || true
else
echo "No api/oss/src/apis/fastapi/queries/models.py"
ls -la api/oss/src/apis/fastapi/queries || true
fiRepository: Agenta-AI/agenta
Length of output: 3379
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Inspect QueryRevisionCommit DTO definition (type of .data) =="
rg -n "class QueryRevisionCommit\\b" api/oss/src/core/queries/dtos.py
l="$(rg -n "class QueryRevisionCommit\\b" api/oss/src/core/queries/dtos.py | head -n 1 | cut -d: -f1)"
start=$((l-30)); end=$((l+120))
sed -n "${start},${end}p" api/oss/src/core/queries/dtos.py
echo
echo "== Inspect FastAPI request model for commit =="
rg -n "class QueryRevisionCommitRequest" api/oss/src/apis/fastapi/queries/models.py
l="$(rg -n "class QueryRevisionCommitRequest" api/oss/src/apis/fastapi/queries/models.py | head -n 1 | cut -d: -f1)"
start=$((l-20)); end=$((l+60))
sed -n "${start},${end}p" api/oss/src/apis/fastapi/queries/models.py
echo
echo "== Inspect router handler for commit =="
l="$(rg -n "async def commit_query_revision\\b" api/oss/src/apis/fastapi/queries/router.py | head -n 1 | cut -d: -f1)"
start=$((l-30)); end=$((l+90))
sed -n "${start},${end}p" api/oss/src/apis/fastapi/queries/router.py
echo
echo "== Inspect QueriesService.commit_query_revision around sanitize =="
l="$(rg -n "async def commit_query_revision\\b" api/oss/src/core/queries/service.py | head -n 1 | cut -d: -f1)"
start=$((l-40)); end=$((l+80))
sed -n "${start},${end}p" api/oss/src/core/queries/service.pyRepository: Agenta-AI/agenta
Length of output: 13704
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Find call sites of commit_query_revision (core service) =="
rg -n "commit_query_revision\\(" api/oss/src/core/queries -S
rg -n "commit_query_revision\\(" api/oss/src -S | head -n 200
echo
echo "== Inspect how QueryRevisionCommit is constructed in those call sites =="
# Show small windows around each occurrence in core/queries/service.py
rg -n "commit_query_revision\\(" api/oss/src/core/queries/service.py -n -S
for l in $(rg -n "commit_query_revision\\(" api/oss/src/core/queries/service.py -S | cut -d: -f1 | head -n 20); do
start=$((l-20)); end=$((l+40));
echo "--- service.py around line ${l} ---";
sed -n "${start},${end}p" api/oss/src/core/queries/service.py | nl -ba | head -n 120
done
echo
echo "== Check router model validation config for QueryRevisionCommitRequest =="
# QueryRevisionCommitRequest just wraps QueryRevisionCommit; check if any custom config is present
l="$(rg -n "class QueryRevisionCommitRequest\\b" api/oss/src/apis/fastapi/queries/models.py | head -n 1 | cut -d: -f1)"
start=$((l-30)); end=$((l+40))
sed -n "${start},${end}p" api/oss/src/apis/fastapi/queries/models.pyRepository: Agenta-AI/agenta
Length of output: 1240
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Find QueryRevisionCommit constructions =="
rg -n "QueryRevisionCommit\\(" api/oss/src/core/queries/service.py api/oss/src/core/queries -S
rg -n "QueryRevisionCommit\\(" api/oss/src -S | head -n 200
echo
echo "== Inspect service.py around internal commit callers (~1052, ~1278) =="
sed -n '980,1120p' api/oss/src/core/queries/service.py | cat -n
echo
sed -n '1200,1360p' api/oss/src/core/queries/service.py | cat -n
echo
echo "== Search for model_construct or raw dict usage for QueryRevisionData =="
rg -n "model_construct\\(|QueryRevisionData\\s*=|QueryRevisionData\\(" api/oss/src/core/queries -S
rg -n "QueryRevisionData\\.model_validate\\(|QueryRevisionData\\(" api/oss/src -S | head -n 200Repository: Agenta-AI/agenta
Length of output: 13125
Avoid silently dropping QueryRevisionData when payload sanitization fails
_sanitize_persisted_query_revision_datacatchesExceptionfromQueryRevisionData.model_validate(...)and returnsNone;_sanitize_query_revision_payloadthen doespayload.pop("data", None), so sanitizer/model-validation failures get converted into “no data”.- For the FastAPI commit route, extra fields should already be rejected because
QueryRevisionDatausesConfigDict(extra="forbid")and the router acceptsQueryRevisionCommitRequest(nestedQueryRevisionCommit.data: Optional[QueryRevisionData]); still, invaliddatareaching the sanitizer should be surfaced/logged rather than silently dropped.
|
|
||
|
|
||
| class TestsetRevisionData(BaseModel): | ||
| model_config = ConfigDict(extra="forbid") |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether testset payload sanitization swallows validation failures and removes data.
rg -n -C4 '_sanitize_persisted_testset_revision_data|model_validate\(|except Exception|payload\.pop\("data"' api/oss/src/core/testsets/service.pyRepository: Agenta-AI/agenta
Length of output: 2164
Avoid silently dropping invalid testset revision payloads
TestsetRevisionData forbids extras, but api/oss/src/core/testsets/service.py wraps TestsetRevisionData.model_validate(data) in try/except Exception and returns None; the caller then executes payload.pop("data", None) when sanitization fails. This converts validation failures into a silent removal of the revision payload’s data field instead of surfacing an error.
Although Pydantic v2 inherits model_config from WorkflowRevisionData, declaring extra="forbid" explicitly on ApplicationRevisionData and EvaluatorRevisionData (both API and SDK) makes the strict-validation contract grep-friendly and matches the per-domain expectation in Agenta-AI#4315. Also drop the now-unused ConfigDict import from workflows/dtos.py; WorkflowRevisionData inherits its strict config from the SDK base. Picked up from PR Agenta-AI#4415. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Closing this since #4345 got merged. |
Summary
What changed: Added
model_config = ConfigDict(extra="forbid")to all*RevisionDataclasses across all five affected domains:api/oss/src/core/applications/dtos.pyapi/oss/src/core/evaluators/dtos.pyapi/oss/src/core/queries/dtos.pyapi/oss/src/core/testsets/dtos.pysdks/python/agenta/sdk/models/workflows.pysdks/python/agenta/sdk/models/testsets.pyThis causes Pydantic to reject unknown fields in the
datapayload during revision commit, returning HTTP 422 with a validation error naming the offending field.Why: Fixes issue #4315 : revision commit endpoints silently discarded unknown top-level fields inside
data(e.g., the legacyag_configwrapper), while still returning HTTP 200 withcount: 1, creating a false sense of success. Users had no signal that their data was lost.What problem it solves:
jp-agenta: "I'd extend the fix to all RevisionData, that is to all git-backed entities")Testing
Verified locally with curl against the running server.
Stale payload -
data: {"ag_config": {...}}now returns HTTP 422:{ "detail": [ { "type": "extra_forbidden", "loc": ["body", "application_revision_commit", "data", "ag_config"], "msg": "Extra inputs are not permitted", "input": { "prompt": { "messages": [{"role": "system", "content": "sys"}] } } } ] }Valid payload -
data: {"parameters": {...}}continues to return HTTP 200 withcount: 1. Noextra_forbiddenerror.Demo
Before - stale
ag_configpayload silently accepted, returnedcount: 1, storeddata: {}After - stale
ag_configpayload rejected with HTTP 422 and clear error naming the offending fieldChecklist
Related Issues
data#4315 - Revisions commit endpoints silently discard unknown fields indataunsupported#4173 - Tracing: query↔ingest schema asymmetryunsupportedFiles Changed
api/oss/src/core/applications/dtos.py- Addedmodel_config = ConfigDict(extra="forbid")to*RevisionDataclassesapi/oss/src/core/evaluators/dtos.py- Addedmodel_config = ConfigDict(extra="forbid")to*RevisionDataclassesapi/oss/src/core/queries/dtos.py- Addedmodel_config = ConfigDict(extra="forbid")to*RevisionDataclassesapi/oss/src/core/testsets/dtos.py- Addedmodel_config = ConfigDict(extra="forbid")to*RevisionDataclassessdks/python/agenta/sdk/models/workflows.py- Addedmodel_config = ConfigDict(extra="forbid")toWorkflowRevisionDatasdks/python/agenta/sdk/models/testsets.py- Addedmodel_config = ConfigDict(extra="forbid")to*RevisionDataclasses