UN-3403 [FEAT] Agentic table extractor plugin with multi-agent LLM-powered table extraction #1914
jaseemjaskp merged 108 commits into main from
Conversation
Conflicts resolved:
- docker-compose.yaml: Use main's dedicated dashboard_metric_events queue for worker-metrics
- PromptCard.jsx: Keep tool_id matching condition from our async socket feature
- PromptRun.jsx: Merge useEffect import from main with our branch
- ToolIde.jsx: Keep fire-and-forget socket approach (spinner waits for socket event)
- SocketMessages.js: Keep both session-store and socket-custom-tool imports + updateCusToolMessages dep
- SocketContext.js: Keep simpler path-based socket connection approach
- usePromptRun.js: Keep Celery fire-and-forget with socket delivery over polling
- setupProxy.js: Accept main's deletion (migrated to Vite)
for more information, see https://pre-commit.ci
… into feat/execution-backend
| Filename | Overview |
|---|---|
| frontend/src/components/custom-tools/prompt-card/PromptCardItems.jsx | Adds AgenticTableChecklist plugin slot and isAgenticTableReady state. Removes the enforceType === TABLE guard from TableExtractionSettingsBtn, making it render for all prompt types — filtering now depends solely on plugin internals. |
| workers/file_processing/structure_tool_task.py | Partitions outputs into agentic/regular, validates agentic settings, dispatches each agentic prompt to a dedicated executor, then optionally runs the legacy pipeline. Missing log_events_id in agentic ExecutionContext may prevent IDE log streaming. |
| workers/ide_callback/tasks.py | Reshapes agentic executor output to map the tables list under prompt_key. Replaces outputs wholesale, discarding any sibling keys the executor might return alongside tables. |
| backend/prompt_studio/prompt_studio_core_v2/views.py | Adds an agentic-table fast path in fetch_response that builds the payload via the cloud plugin, dispatches to a dedicated Celery queue, and returns 202. The is_first_prompt_run query is copied identically from the existing non-agentic path. |
| unstract/sdk1/src/unstract/sdk1/llm.py | Adds complete_vision for multimodal (text + image) completions. Follows the same structure as complete() — error handling, usage recording, and LLMResponseCompat wrapping all match the existing pattern. |
| workers/executor/executors/legacy_executor.py | Adds a defensive agentic_table skip guard in _apply_type_conversion and refactors email handling to use the shared _convert_scalar_answer helper, aligned with updated tests. |
| backend/prompt_studio/prompt_studio_v2/migrations/0014_alter_toolstudioprompt_enforce_type.py | Adds agentic_table to the enforce_type choices in the migration, correctly chaining from 0013. |
| workers/tests/test_answer_prompt.py | Updates NA-sanitization test expectations from preserved to None, consistent with the refactored _sanitize_null_values behavior change. |
Sequence Diagram
```mermaid
sequenceDiagram
    participant UI as Prompt Studio UI
    participant BE as Backend (views.py)
    participant Plugin as Cloud Plugin
    participant Celery as Celery (agentic_table queue)
    participant Executor as AgenticTable Executor
    participant CB as IDE Callback Worker
    UI->>BE: POST /fetch_response (enforce_type=agentic_table)
    BE->>Plugin: build_agentic_table_payload(...)
    Plugin-->>BE: context, cb_kwargs
    BE->>Celery: dispatch_with_callback(context, on_success=ide_prompt_complete)
    BE-->>UI: 202 Accepted {task_id, run_id}
    Celery->>Executor: execute table extraction (page-by-page)
    Executor-->>CB: {tables, page_count, ...}
    CB->>CB: reshape outputs[prompt_key] = tables
    CB->>BE: update_prompt_output(outputs)
    BE-->>UI: WebSocket event (run complete)
```
Comments Outside Diff (1)
frontend/src/components/custom-tools/prompt-card/PromptCardItems.jsx, lines 397-402 (link)

**`TableExtractionSettingsBtn` renders for all enforce types after guard removal**

The `enforceType === TABLE` guard was dropped, so `TableExtractionSettingsBtn` now mounts for every prompt type (text, number, boolean, etc.) whenever the cloud plugin is available. Filtering now depends entirely on the plugin component's internal logic: if the plugin renders the button unconditionally, users will see a "Table Extraction Settings" gear on every prompt card regardless of type.

The `enforceType` prop is still forwarded to the component, so the fix can live inside the plugin; but removing the OSS-side guard without a corresponding guard in the plugin (which can't be verified here) is a regression path that silently shows the button where it shouldn't appear.
---
workers/file_processing/structure_tool_task.py, lines 455-463

**`log_events_id` absent from agentic table `ExecutionContext`**

The legacy `ExecutionContext` includes `log_events_id=StateStore.get("LOG_EVENTS_ID") or ""` so the executor can stream log events back to the IDE. The agentic table context omits this field entirely. During an IDE agentic-table run, the executor worker won't know which log-events channel to write to, so real-time log lines won't appear in the Prompt Studio UI.

Consider adding `log_events_id=StateStore.get("LOG_EVENTS_ID") or ""` to the `at_ctx` constructor to match the legacy pipeline.
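The suggested fix can be sketched in miniature. `StateStore` and `ExecutionContext` below are simplified stand-ins for the repo's classes (their real signatures aren't shown in this review), so treat this as an illustration of the pattern, not the actual code:

```python
from dataclasses import dataclass

# Stand-in for the worker's thread-local state store (assumed shape).
_STATE: dict = {"LOG_EVENTS_ID": "channel-123"}


class StateStore:
    @staticmethod
    def get(key: str):
        return _STATE.get(key)


@dataclass
class ExecutionContext:
    run_id: str
    # The fix: carry the IDE log channel so the executor can stream logs.
    log_events_id: str = ""


# Build the agentic context the same way the legacy pipeline does.
at_ctx = ExecutionContext(
    run_id="run-1",
    log_events_id=StateStore.get("LOG_EVENTS_ID") or "",
)
print(at_ctx.log_events_id)  # → channel-123
```

The `or ""` fallback mirrors the legacy path: a missing channel degrades to "no streaming" rather than `None` leaking into downstream string handling.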
---
workers/ide_callback/tasks.py, lines 846-851

**Wholesale `outputs` replacement drops sibling executor keys**

The reshape replaces `outputs` entirely with a single-key dict, discarding any other fields the agentic executor might return alongside `"tables"` (e.g. page counts, partial-failure info). An in-place remap that only moves the tables value under the prompt key would be safer and leave other keys intact for future use.

Reviews (4): Last reviewed commit: "Merge branch 'main' into feat/agentic-ta..."
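The lossless remap suggested here can be sketched with plain dicts. The payload shape (`tables`, `page_count`, `headers`) comes from this review; the helper name is hypothetical:

```python
def reshape_agentic_outputs(outputs: dict, prompt_key: str) -> dict:
    """Nest the full executor payload under the prompt key so sibling
    keys (page_count, headers, ...) survive the reshape.

    Lossy (current) approach:   {prompt_key: outputs["tables"]}
    Lossless alternative shown: {prompt_key: outputs}
    """
    return {prompt_key: outputs}


executor_payload = {
    "tables": [["h1", "h2"], ["a", "b"]],
    "page_count": 3,
    "headers": ["h1", "h2"],
}
reshaped = reshape_agentic_outputs(executor_payload, "invoice_lines")
print(reshaped["invoice_lines"]["page_count"])  # → 3
```

Consumers that only want the tables still reach them via `reshaped[prompt_key]["tables"]`, so nothing is lost by wrapping the whole payload.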
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
frontend/src/components/custom-tools/prompt-card/PromptCardItems.jsx (1)
300-306: ⚠️ Potential issue | 🟡 Minor

**Keep the table-settings button behind an enforce-type gate.**
This now renders the settings entry for every prompt as soon as the plugin is installed, including text/number/email prompts. That is confusing at best, and it makes it easier to save table-specific config on incompatible prompt types.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@frontend/src/components/custom-tools/prompt-card/PromptCardItems.jsx` around lines 300 - 306, The TableExtractionSettingsBtn is being rendered for all prompts; guard its render with the enforce-type check so the settings only appear for table-enforced prompts. Update the conditional around TableExtractionSettingsBtn in PromptCardItems.jsx (the JSX block that currently uses TableExtractionSettingsBtn, promptDetails?.prompt_id, enforceType, setAllTableSettings) to require a table-specific enforceType (e.g., enforceType === 'table' or enforceType?.includes('table')) in addition to TableExtractionSettingsBtn before rendering the component, so incompatible prompt types won’t show the table settings.
🧹 Nitpick comments (5)
unstract/sdk1/src/unstract/sdk1/llm.py (1)

390-390: Avoid per-call global mutation of `litellm.drop_params`. Line 390 reassigns a module-global already initialized at line 33; this contradicts the module-level intent to avoid repeated global mutation per request.

♻️ Proposed fix

```diff
- litellm.drop_params = True
```

frontend/src/hooks/usePromptRun.js (1)

19-23: Prefer a config-driven timeout instead of a fixed 16-minute constant. Line 23 can silently drift from server adapter settings across environments. Consider sourcing this value from backend-exposed config (with a buffer applied client-side) to avoid premature UI timeout regressions after infra changes.

docker/docker-compose.yaml (1)

532-532: Good queue addition; mirror this default in all deployment targets. Line 532 is correct for local/dev, but please ensure Helm/chart and runtime env defaults include `celery_executor_agentic_table` as well, or agentic-table jobs can remain unconsumed in some environments.

backend/prompt_studio/prompt_studio_output_manager_v2/output_manager_helper.py (1)

173-179: Use centralized enforce-type constants here to avoid string drift. The new `agentic_table` branch is correct, but this block is still string-literal based. Switching to shared constants will prevent future typo/divergence bugs.

♻️ Suggested refactor

```diff
+from prompt_studio.prompt_studio_core_v2.constants import (
+    ToolStudioPromptKeys as TSPKeys,
+)
 ...
-        if prompt.enforce_type in {
-            "json",
-            "table",
-            "record",
-            "line-item",
-            "agentic_table",
-        }:
+        if prompt.enforce_type in {
+            TSPKeys.JSON,
+            TSPKeys.TABLE,
+            TSPKeys.RECORD,
+            TSPKeys.LINE_ITEM,
+            TSPKeys.AGENTIC_TABLE,
+        }:
             output = json.dumps(output)
```

workers/file_processing/structure_tool_task.py (1)

402-442: Consider defensive access for `llm` and `name` keys to provide clearer error messages. Lines 414 and 442 use direct key access (`at_output["llm"]`, `at_output[_SK.NAME]`) which will raise `KeyError` with a generic traceback if missing. Since the validation block (lines 302-313) only checks `agentic_table_settings`, these fields aren't validated beforehand. If the export process guarantees these keys, this is acceptable. Otherwise, wrapping in explicit checks would produce actionable error messages matching the style at lines 305-313.

🔧 Optional: Add explicit validation for required output keys

```diff
     for at_output in agentic_table_outputs:
         at_settings = at_output.get("agentic_table_settings") or {}
+        if not at_output.get(_SK.NAME) or not at_output.get("llm"):
+            return ExecutionResult.failure(
+                error=(
+                    "Agentic table output is missing required 'name' or 'llm' key. "
+                    "Re-export the tool from Prompt Studio."
+                )
+            ).to_dict()
         if not at_settings.get("target_table") or not at_settings.get("json_structure"):
```
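The defensive-validation idea above can be isolated into a small helper. Key names here use plain strings (`"name"`, `"llm"`) where the real code uses `_SK.NAME`; the function name is hypothetical:

```python
def validate_agentic_output(at_output: dict):
    """Return an actionable error message if required keys are missing,
    else None. Sketch of the defensive check suggested above; not the
    repo's actual validation code."""
    missing = [key for key in ("name", "llm") if not at_output.get(key)]
    if missing:
        return (
            f"Agentic table output is missing required key(s) {missing}. "
            "Re-export the tool from Prompt Studio."
        )
    return None


print(validate_agentic_output({"name": "t1", "llm": {"model": "m"}}))  # → None
print(validate_agentic_output({"name": "t1"}))
```

Running the check before building the dispatch params turns a generic `KeyError` traceback into a message the user can act on.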
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/prompt_studio/prompt_studio_core_v2/prompt_studio_helper.py`:
- Around line 1574-1590: The single-pass prompt filter must also exclude
agentic-table prompts to prevent them from being bundled into legacy single-pass
execution; update the single-pass filter logic (the code that currently excludes
only TSPKeys.TABLE and TSPKeys.RECORD) to additionally exclude
TSPKeys.AGENTIC_TABLE by checking prompt_instance.enforce_type ==
TSPKeys.AGENTIC_TABLE (same symbol used in the single-prompt branch) so
agentic-table prompts follow the payload_modifier_plugin path and do not end up
in legacy_executor silent skips.
In `@frontend/src/components/custom-tools/prompt-card/PromptCardItems.jsx`:
- Around line 94-95: The `isAgenticTableReady` state is being used globally
causing non-agentic prompts to be blocked; scope this readiness to only
agentic_table prompts by initializing and updating `isAgenticTableReady` based
on `promptDetails?.prompt_type === 'agentic_table'` (use the
`promptDetails`/`promptId` context) and reset it to true or undefined when
`promptDetails.prompt_type` changes away from 'agentic_table'; update the places
that read this flag (components/functions `Header`, `PromptOutput`, and any
setters in `PromptCardItems.jsx` such as the `setIsAgenticTableReady` usage) so
they only disable run buttons when the current prompt is of type 'agentic_table'
and the readiness flag is false.
In `@workers/executor/executors/legacy_executor.py`:
- Around line 1873-1885: The current guard in the legacy executor silently
returns when output_type == "agentic_table", leaving
structured_output[prompt_name] unset; change this to raise an explicit exception
instead so the run fails visibly: in the same block that checks output_type ==
"agentic_table" (using variables output_type and prompt_name and logger),
replace the silent return with raising a clear exception (e.g., RuntimeError or
ValueError) that includes prompt_name and a message stating the prompt was
misrouted and should have been dispatched to the agentic_table executor; keep
the logger.warning call if you want a log entry before raising so the error is
recorded.
In `@workers/ide_callback/tasks.py`:
- Around line 395-403: The current branch for cb.get("is_agentic_table")
incorrectly replaces the full executor payload with only outputs["tables"],
discarding fields like page_count and headers; instead, preserve the entire
payload by nesting it under the prompt key before calling
update_prompt_output(): when cb.get("is_agentic_table") and prompt_key is set,
set outputs = {prompt_key: outputs} (if outputs is already a dict, wrap that
dict; if it isn't, wrap the original value as-is) so update_prompt_output()
receives the complete agentic-table payload (reference symbols: cb, prompt_key,
outputs, update_prompt_output, is_agentic_table).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 745f3b34-3732-4f3c-9564-7de5c201cfcd
📒 Files selected for processing (21)
- backend/prompt_studio/prompt_studio_core_v2/constants.py
- backend/prompt_studio/prompt_studio_core_v2/prompt_studio_helper.py
- backend/prompt_studio/prompt_studio_core_v2/static/select_choices.json
- backend/prompt_studio/prompt_studio_core_v2/views.py
- backend/prompt_studio/prompt_studio_output_manager_v2/output_manager_helper.py
- backend/prompt_studio/prompt_studio_registry_v2/constants.py
- backend/prompt_studio/prompt_studio_registry_v2/prompt_studio_registry_helper.py
- backend/prompt_studio/prompt_studio_v2/migrations/0014_alter_toolstudioprompt_enforce_type.py
- backend/prompt_studio/prompt_studio_v2/models.py
- docker/docker-compose.yaml
- frontend/src/components/custom-tools/prompt-card/Header.jsx
- frontend/src/components/custom-tools/prompt-card/PromptCardItems.jsx
- frontend/src/components/custom-tools/prompt-card/PromptOutput.jsx
- frontend/src/hooks/usePromptRun.js
- unstract/sdk1/src/unstract/sdk1/llm.py
- workers/executor/executors/legacy_executor.py
- workers/executor/executors/retrievers/fusion.py
- workers/executor/executors/retrievers/keyword_table.py
- workers/file_processing/structure_tool_task.py
- workers/ide_callback/tasks.py
- workers/tests/test_answer_prompt.py
jaseemjaskp left a comment

Additional Review Findings
Beyond what CodeRabbit and Greptile already flagged, here are additional issues found during a deeper review:
Critical
1. Silent incomplete export when payload_modifier plugin is missing
backend/prompt_studio/prompt_studio_registry_v2/prompt_studio_registry_helper.py — the new elif prompt.enforce_type == AGENTIC_TABLE block (around line 375)
When exporting an agentic_table prompt without the payload_modifier plugin available, the if payload_modifier_plugin: guard silently skips the call to export_agentic_table_settings. The export succeeds without agentic_table_settings, and the user only discovers this at document-processing time when structure_tool_task.py validation fails with "Re-export the tool from Prompt Studio."
This is a "fail later" anti-pattern — the failure should happen at export time:
```python
elif prompt.enforce_type == PromptStudioRegistryKeys.AGENTIC_TABLE:
    payload_modifier_plugin = get_plugin("payload_modifier")
    if not payload_modifier_plugin:
        raise OperationNotSupported(
            "Agentic table export requires the payload_modifier plugin."
        )
    modifier_service = payload_modifier_plugin["service_class"]()
    output = modifier_service.export_agentic_table_settings(...)
```

Important
2. Missing prompt_key silently skips callback reshaping
workers/ide_callback/tasks.py — around line 397
When is_agentic_table=True but prompt_key is empty/missing, the if prompt_key: guard skips reshaping silently. The raw executor output ({"tables": [...], "page_count": ..., "headers": [...]}) gets persisted as-is with zero logging. Should log an error and fail explicitly rather than persisting malformed data.
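A guard that fails loudly might look like the sketch below. `persist_agentic_result` is a hypothetical helper (the real logic lives inline in the callback task), and the error message is illustrative:

```python
import logging

logger = logging.getLogger(__name__)


def persist_agentic_result(cb: dict, outputs: dict) -> dict:
    """Refuse to persist raw agentic-table payloads when no prompt_key
    is available, instead of silently storing malformed data."""
    prompt_key = cb.get("prompt_key")
    if cb.get("is_agentic_table") and not prompt_key:
        logger.error(
            "Agentic table callback missing prompt_key; refusing to persist raw payload"
        )
        raise ValueError("Agentic table callback received no prompt_key")
    # Normal path: nest the payload under the prompt key.
    return {prompt_key: outputs} if prompt_key else outputs
```

Failing here surfaces the bug at callback time, where the run context is still available, rather than leaving a mis-shaped row to be discovered later.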
3. No error handling around agentic table dispatch in views.py
backend/prompt_studio/prompt_studio_core_v2/views.py — lines ~512-562
The entire agentic table dispatch block (plugin instantiation, build_agentic_table_payload, dispatch_with_callback) runs without any try/except. Compare this to the existing indexing dispatch which wraps dispatch_with_callback in try/except with cleanup logic. If the cloud plugin's build_agentic_table_payload raises or the Celery broker is down, users get an opaque 500 with no actionable information.
4. Single agentic_table failure aborts ALL remaining prompts
workers/file_processing/structure_tool_task.py — around line 430
In the agentic_table dispatch loop, if any single prompt fails (if not at_result.success: return at_result.to_dict()), the function returns immediately — all subsequent agentic prompts AND the entire regular legacy pipeline are abandoned. For a tool with 10 prompts where only 1 is agentic_table, a failure in that one prompt produces zero output for all 10. At minimum, log the broader impact (how many prompts were abandoned).
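One way to avoid the early return is to collect per-prompt failures and keep dispatching, as in this sketch. The function and the `{"success": ..., "output": ..., "error": ...}` result shape are assumptions for illustration, not the repo's actual dispatch API:

```python
def run_agentic_prompts(prompts: dict, dispatch):
    """Dispatch each agentic prompt, collecting failures instead of
    aborting the whole run on the first error. Returns (results, errors)."""
    results, errors = {}, {}
    for name, params in prompts.items():
        result = dispatch(params)
        if not result.get("success"):
            errors[name] = result.get("error", "unknown error")
            continue  # keep going; other prompts still produce output
        results[name] = result["output"]
    return results, errors


def fake_dispatch(params):
    # Simulate one failing prompt among three.
    if params == 2:
        return {"success": False, "error": "boom"}
    return {"success": True, "output": params * 10}


results, errors = run_agentic_prompts({"a": 1, "b": 2, "c": 3}, fake_dispatch)
print(results)  # → {'a': 10, 'c': 30}
print(errors)   # → {'b': 'boom'}
```

Whether partial success should still fail the overall run is a policy decision, but with this shape the caller can at least report which prompts were affected instead of producing zero output for all of them.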
5. 16-minute SOCKET_TIMEOUT_MS applies to ALL prompt types
frontend/src/hooks/usePromptRun.js — line 19
The timeout increase from 5→16 minutes is global. For regular text/number/email prompts that should complete in seconds, a stalled request now takes 16 minutes to surface a timeout error. Consider making the timeout type-aware (e.g. keep 5min for regular prompts, 16min for agentic_table).
Suggestions
6. Inaccurate comments referencing non-existent terminology
- `workers/executor/executors/legacy_executor.py`: references "Layer 2 in workers/file_processing/structure_tool_task.py", but "Layer 2" doesn't appear anywhere in the codebase
- `workers/file_processing/structure_tool_task.py`: references "populated by Layer 1 export"; same issue
- `workers/file_processing/structure_tool_task.py` (~line 670): the comment says "Use local variables so tool_metadata[_SK.OUTPUTS] is preserved for METADATA.json serialization downstream in _write_tool_result", but this is factually incorrect: `_write_tool_result()` does not read `tool_metadata[_SK.OUTPUTS]`. The real reason is to feed only regular prompts into `answer_params` while keeping the full list for the agentic dispatch loop.
7. complete_vision() docstring omits key behavioral differences from complete()
unstract/sdk1/src/unstract/sdk1/llm.py — around line 488
The docstring says "Same error handling, usage tracking, and metrics as complete()" but doesn't mention:
- Does NOT support `extract_json` or `post_process_fn` post-processing
- Does NOT prepend the adapter's system prompt (unlike `complete()`, which builds `[{"role": "system", ...}, {"role": "user", ...}]` internally)
Callers reading "same as complete()" might expect feature parity.
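For context, a multimodal user message in the OpenAI/LiteLLM style looks like the sketch below. The builder function is hypothetical; how the SDK's `complete_vision()` actually assembles its messages isn't shown in this review, and note the absence of any system message, which is the second difference flagged above:

```python
def build_vision_messages(prompt: str, image_url: str) -> list:
    """OpenAI/LiteLLM-style multimodal message list: one user message
    whose content mixes a text part and an image_url part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]


msgs = build_vision_messages("Describe this table", "https://example.com/page1.png")
print(msgs[0]["role"])  # → user
```

Documenting this shape (and the missing system prompt) in the `complete_vision()` docstring would make the divergence from `complete()` explicit to callers.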
8. Significant test coverage gaps
No tests added for:
- `complete_vision()`: 90-line new public method, zero coverage
- Structure tool task partitioning/dispatch logic: core routing with zero tests
- IDE callback agentic table reshaping: 2-3 test cases needed in existing `TestIdePromptComplete`
- Legacy executor `agentic_table` guard: single test case needed

The IDE callback reshaping test is highest ROI: it catches critical data-loss scenarios and the test infrastructure already exists in workers/tests/test_ide_callback.py.
Read from SOURCE instead of INFILE when dispatching to the agentic_table executor. INFILE gets overwritten with JSON output by the regular pipeline, causing PDFium parse errors when the agentic_table executor tries to process it as a PDF. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@workers/file_processing/structure_tool_task.py`:
- Around line 302-313: The error text claims "target_table / json_structure /
instructions" but the code only validates target_table and json_structure;
update the validation to either include instructions as required or remove it
from the message. Concretely, in the loop over agentic_table_outputs (variables
at_output, at_settings) adjust the if-condition to also check
at_settings.get("instructions") when instructions should be required, or change
the ExecutionResult.failure message (the f-string that references
at_output[_SK.NAME]) to only mention target_table / json_structure if
instructions are optional.
- Around line 492-498: The all-agentic branch currently sets pipeline_elapsed =
0.0 which causes METADATA.json to record zero pipeline time; instead measure
wall-clock time spent in the agentic dispatch and set pipeline_elapsed to that
duration before calling _write_tool_result. Specifically, around the agentic
dispatch loop that produces agentic_results (the "Step 6a" loop), capture start
= time.monotonic() before entering the loop and end = time.monotonic() after it
completes, compute pipeline_elapsed = end - start, and replace the hard-coded
0.0 in the else branch (where structured_output and metadata.agentic_only are
set) so _write_tool_result(...) receives the measured duration. Ensure you
import/time function usage is consistent with the rest of the module.
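The timing fix described above reduces to a standard monotonic-clock measurement around the dispatch loop. `timed_dispatch` is a hypothetical wrapper; the real code would time the "Step 6a" loop inline:

```python
import time


def timed_dispatch(dispatch_all):
    """Measure wall-clock time of the agentic dispatch so METADATA.json
    records a real duration instead of a hard-coded 0.0."""
    start = time.monotonic()  # monotonic: immune to system clock adjustments
    results = dispatch_all()
    pipeline_elapsed = time.monotonic() - start
    return results, pipeline_elapsed


results, elapsed = timed_dispatch(lambda: {"table_1": {"tables": []}})
print(elapsed >= 0.0)  # → True
```

`time.monotonic()` is the right clock here: `time.time()` can jump backwards under NTP correction and produce negative elapsed values.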
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5707d676-d319-4a8e-a6a3-b00c1cf1f27d
📒 Files selected for processing (1)
workers/file_processing/structure_tool_task.py
|
@jaseemjaskp
|
|
_sanitize_null_values behavior change (Low) — This was intentional. The previous behavior of passing literal "NA" strings downstream caused issues with type coercion in destination connectors (e.g., a NUMBER field receiving the string "NA" instead of
Socket timeout bump is global (Low) — Same reasoning as comment #5 from the previous batch: the 16-min client timeout trails the |
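The intent described above (mapping literal "NA" strings to real nulls before values reach destination connectors) can be sketched as below. This is a hedged illustration: the exact token list and function signature in the actual `_sanitize_null_values` are assumptions.

```python
# Tokens treated as null-like; the real set used by the pipeline may differ.
NULL_TOKENS = {"na", "n/a", "null", "none", ""}


def sanitize_null_values(value):
    """Map literal null-like strings to None so a NUMBER column never
    receives the string "NA" and fails type coercion downstream."""
    if isinstance(value, str) and value.strip().lower() in NULL_TOKENS:
        return None
    return value
```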
|
[Scope] Unrelated behavior changes bundled into this PR: two changes have nothing to do with agentic_table and aren't mentioned in the PR description.
Please split these into a separate PR with a proper description of the intended behavior change. That keeps this PR's review/revert history clean and makes git blame actually useful for the agentic_table work. |
@chandrasekharan-zipstack This was merged in a previous commit to main. Will update the description. |
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@unstract/sdk1/src/unstract/sdk1/llm.py`:
- Around line 409-415: complete_vision currently calls litellm.completion
directly (using completion_kwargs) which bypasses the retry policy used by
complete() and streaming calls; change the implementation so the call to
litellm.completion is invoked through the same retry wrapper used by the
existing complete()/streaming code path (i.e., use the internal retry helper
that complete() uses) and pass messages and completion_kwargs (after popping
"cost_model") through that wrapper so transient provider/rate-limit errors are
retried consistently.
- Around line 401-403: Remove the per-call global mutation of LiteLLM by
deleting the assignment "litellm.drop_params = True" inside the try block in
unstract/sdk1/src/unstract/sdk1/llm.py; rely on the module-level initialization
(set at import) instead, and if per-call behavior is required use a local/config
variable rather than mutating the global litellm.drop_params flag.
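The two comments above (route `complete_vision` through the same retry policy as `complete()`, and stop mutating the global `litellm.drop_params` per call) both reduce to reusing a shared retry helper. The sketch below is generic and stdlib-only: `with_retries`, its backoff parameters, and the `is_transient` classifier are assumptions standing in for whatever internal helper `complete()` already uses.

```python
import random
import time


def with_retries(call, *, attempts=3, base_delay=1.0, is_transient=lambda e: True):
    """Invoke `call` with exponential backoff on transient errors, so
    vision and text completions share one retry policy."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception as exc:
            if attempt == attempts or not is_transient(exc):
                raise
            # Exponential backoff with jitter before the next attempt.
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.0))
```

A `complete_vision` implementation would then wrap its provider call, e.g. `with_retries(lambda: litellm.completion(messages=messages, **completion_kwargs))`, while `drop_params` stays as a module-level setting made once at import.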
In `@workers/file_processing/structure_tool_task.py`:
- Around line 416-437: The code currently does a hard lookup at at_output["llm"]
when building agentic_params, which can raise KeyError for older/malformed
exports; update the handling to either (a) validate presence of "llm" during
readiness checks for agentic_table_outputs (the same place that validates
target_table/json_structure) or (b) change the build of agentic_params in the
loop to access the key safely (e.g., use at_output.get("llm") and if missing
return the same user-friendly failure path via ExecutionResult.failure with a
clear message), ensuring any fallback behavior is documented in comments around
agentic_table_outputs/agentic_params and preserving the existing user-facing
re-export guidance.
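The safe-lookup fix suggested above can be sketched as an up-front readiness check. Names mirror the loop variables in `structure_tool_task.py`, but `REQUIRED_KEYS`, the `"name"` fallback, and the failure message wording are assumptions for illustration.

```python
# Keys that must be present on the exported prompt output and its settings
# before building agentic_params; checked up front instead of letting
# at_output["llm"] raise KeyError on older or malformed exports.
REQUIRED_KEYS = ("llm",)
REQUIRED_SETTINGS = ("target_table", "json_structure")


def validate_agentic_output(at_output, at_settings):
    """Return a user-friendly error string if the prompt is not ready,
    else None."""
    missing = [k for k in REQUIRED_KEYS if not at_output.get(k)]
    missing += [k for k in REQUIRED_SETTINGS if not at_settings.get(k)]
    if missing:
        name = at_output.get("name", "<unknown prompt>")
        return (
            f"Prompt '{name}' is missing: {', '.join(missing)}. "
            "Please re-export the tool."
        )
    return None
```

On a non-None result the caller would take the existing failure path via `ExecutionResult.failure` rather than raising.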
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8344653e-1517-4cc7-b3e5-051e7675662a
📒 Files selected for processing (2)
unstract/sdk1/src/unstract/sdk1/llm.py
workers/file_processing/structure_tool_task.py
Signed-off-by: harini-venkataraman <115449948+harini-venkataraman@users.noreply.github.com>
Frontend Lint Report (Biome): ✅ All checks passed! No linting or formatting issues found. |
|
Test Results Summary
Runner Tests - Full Report
SDK1 Tests - Full Report
|
🧹 Nitpick comments (1)
workers/file_processing/structure_tool_task.py (1)
433-468: Add per-prompt dispatch logging for the agentic loop. The legacy branch logs a single `Dispatching structure_pipeline: ...` line at 475–482, but the agentic loop dispatches one executor call per prompt with no equivalent log. Given each call can run for minutes (EXECUTOR_TIMEOUT=3600s) and may be repeated across multiple prompts, a brief `logger.info` before/after `dispatcher.dispatch` per prompt (with prompt name and elapsed time) would meaningfully aid triage of stuck or slow runs without changing behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@workers/file_processing/structure_tool_task.py` around lines 433 - 468, Add per-prompt dispatch logging inside the agentic loop around the call to dispatcher.dispatch so each prompt logs when it starts and when it finishes with elapsed time; specifically, just before calling dispatcher.dispatch(at_ctx, timeout=EXECUTOR_TIMEOUT) log a brief logger.info including the prompt identifier (use at_output[_SK.NAME] or at_settings.get("target_table") as available) and execution_id/file_execution_id, capture start = time.time(), then after the dispatch completes log another logger.info with the same identifiers plus success status and elapsed = time.time() - start; keep behavior unchanged (still return at_result.to_dict() on failure) and add only lightweight log lines near the dispatcher.dispatch call in the agentic_table_outputs loop.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@workers/file_processing/structure_tool_task.py`:
- Around line 433-468: Add per-prompt dispatch logging inside the agentic loop
around the call to dispatcher.dispatch so each prompt logs when it starts and
when it finishes with elapsed time; specifically, just before calling
dispatcher.dispatch(at_ctx, timeout=EXECUTOR_TIMEOUT) log a brief logger.info
including the prompt identifier (use at_output[_SK.NAME] or
at_settings.get("target_table") as available) and
execution_id/file_execution_id, capture start = time.time(), then after the
dispatch completes log another logger.info with the same identifiers plus
success status and elapsed = time.time() - start; keep behavior unchanged (still
return at_result.to_dict() on failure) and add only lightweight log lines near
the dispatcher.dispatch call in the agentic_table_outputs loop.
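The logging nitpick above can be sketched as a small wrapper around the dispatch call. This is an illustration only: `dispatch` and `prompt_name` are stand-ins for `dispatcher.dispatch` and `at_output[_SK.NAME]`, and the log message wording is an assumption.

```python
import logging
import time

logger = logging.getLogger(__name__)


def dispatch_with_logging(dispatch, prompt_name, execution_id, *args, **kwargs):
    """Log start/finish and elapsed time around a single per-prompt
    executor dispatch, without changing its behavior."""
    logger.info(
        "Dispatching agentic_table prompt '%s' (execution_id=%s)",
        prompt_name, execution_id,
    )
    start = time.monotonic()
    result = dispatch(*args, **kwargs)
    logger.info(
        "Finished agentic_table prompt '%s' in %.2fs (success=%s)",
        prompt_name, time.monotonic() - start,
        getattr(result, "success", None),
    )
    return result
```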
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e937efe2-09e7-4f5c-b781-6350d62c7d14
📒 Files selected for processing (2)
workers/executor/executors/legacy_executor.py
workers/file_processing/structure_tool_task.py
🚧 Files skipped from review as they are similar to previous changes (1)
- workers/executor/executors/legacy_executor.py
What
- `AgenticTableSettings` CRUD backend (pluggable app) with per-prompt configuration for the extractor (LLM adapter, page range, parallel pages, highlight toggle)
- `AgenticTableSettings` modal for configuring the extractor and `AgenticTableChecklist` for real-time prompt readiness validation
Why
- Support for the `agentic_table` enforce type
How
Backend
- `agentic_table_settings_v2` pluggable app with model, serializer, views, URL routing, and validation service
- `AgenticTableSettingsViewSet`: full CRUD with `update_or_create` semantics; returns the saved instance (with `id`) so the frontend can PATCH
- `PromptValidationView`: LLM-powered prompt analysis endpoint that checks whether a prompt contains a target table, a JSON structure, and instructions; uses `get_or_create` to avoid 404 chicken-and-egg issues
- Builds `agentic_table` execution payloads with adapter UUIDs from the profile
- Dispatches to the dedicated `agentic_table` queue
Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
- New endpoints are namespaced under `/prompt-studio/prompt/agentic-table/`
- New `agentic_table` enforce type; no existing enforce types are modified
- New UI renders only when `enforceType === "agentic_table"` is selected
- The `create` view status code change (201 for new, 200 for update) is internal to this feature
- Everything else is guarded by the `agentic_table` type check
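The upsert semantics described for `AgenticTableSettingsViewSet` (one settings row per prompt, 201 on first save and 200 on subsequent updates) can be sketched framework-free. The in-memory dict stands in for the Django model and `update_or_create`; field names and the store shape are assumptions.

```python
# In-memory stand-in for the AgenticTableSettings table, keyed by prompt id.
_settings_store = {}


def upsert_settings(prompt_id, payload):
    """Create-or-update settings for a prompt; return the saved record
    (with its id, so the caller can PATCH later) and an HTTP-style status."""
    created = prompt_id not in _settings_store
    record = _settings_store.setdefault(prompt_id, {"id": prompt_id})
    record.update(payload)
    return record, (201 if created else 200)
```

Returning the saved instance on both paths is what lets the frontend switch from POST to PATCH without a separate fetch.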
Related Issues or PRs
- UN-3403
Dependencies Versions / Env Variables
Notes on Testing
Backend Tests
Run the agentic table settings test suite:
Manual Testing
Select `agentic_table` on a fresh prompt card -> type a prompt -> verify no 404 -> configure the LLM adapter -> verify the checkboxes update
Attached in respective cloud PR.
...
Checklist
I have read and understood the Contribution Guidelines.