ReverseMerge: V0.163.4 hotfix by pk-zipstack · Pull Request #1980 · Zipstack/unstract

pk-zipstack · 2026-05-21T05:27:09Z

What

Back-merge of v0.163.4-hotfix (released as v0.163.5) into main, with merge-conflict resolution carried on this branch (the hotfix branch itself is push-protected, so the merge commit lives here).

Net new content for main:

2dab564b [HOTFIX] Bump litellm to 1.83.10 from PyPI to clear CVE-2026-42208 (#1976)

The other two hotfix commits on v0.163.4-hotfix are already in main via the squash back-merge #1946:

c34f2409 [HOTFIX] Add IAM Role / Instance Profile auth mode to AWS Bedrock adapter (#1944)
e6bb412e [HOTFIX] Use importlib.util.find_spec for pluggable worker discovery (#1918)

Why

Per the Hotfix Deployment Guide Step 9, hotfix branches must be back-merged to main after production deployment so the fixes carry forward into future releases.

How

Use rebase merge (preferred) or squash merge, not the default merge commit. Per the guide, open the merge-button dropdown and select the appropriate option.

Conflict resolution

Five files conflicted when running `git merge origin/main` on v0.163.4-hotfix:

File	Resolution
`unstract/sdk1/src/unstract/sdk1/adapters/base1.py`	Took main — main's `_BEDROCK_AUTH_TYPES = {None, "access_keys", "iam_role", "bearer_token"}` is a strict superset of the hotfix branch's IAM-Role-only version
`unstract/sdk1/src/unstract/sdk1/adapters/llm1/static/bedrock.json`	Took main — schema includes all three auth modes
`unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/bedrock.json`	Took main — same reasoning
`unstract/sdk1/tests/test_bedrock_adapter.py`	Took main — covers all three auth modes
`uv.lock`	Took main's, then re-ran `uv lock` to apply our litellm 1.83.10 pin and switch the source back to the PyPI registry

Conflicts came from #1944 (hotfix's IAM Role) overlapping with main's newer #1952 (UN-3152 [FEAT] Add AWS_BEARER_TOKEN_BEDROCK auth). My litellm changes from #1976 did NOT cause any conflicts.

Validation

Ran the relevant test suites against the merged tree:

unstract/sdk1/tests/patches/test_litellm_cohere_timeout.py → 6/6 passed (litellm patch still binds on 1.83.10)
unstract/sdk1/tests/test_bedrock_adapter.py → 37/37 passed (bedrock auth modes all working with main's version)

Can this PR break any existing features? If yes, please list possible items. If no, please explain why.

No. The only new content reaching main is the litellm 1.82.3 → 1.83.10 bump (released to production as v0.163.5). Bedrock-adapter behavior on main is unchanged — we kept main's version of every conflicting file.

Database Migrations

None

Env Config

None

Relevant Docs

Related Issues or PRs

[HOTFIX] Bump litellm to 1.83.10 from PyPI to clear CVE-2026-42208 #1976 (litellm CVE hotfix — merged into v0.163.4-hotfix, released as v0.163.5)
ReverseMerge: V0.163.4 hotfix #1979 (previous back-merge PR — closed, conflicts unresolved; superseded by this PR)
ReverseMerge: V0.163.2 hotfix #1946 (prior back-merge that carried #1918 and #1944 to main)
UN-3152 [FEAT] Add AWS_BEARER_TOKEN_BEDROCK auth to Bedrock LLM and Embedding adapters #1952 (AWS_BEARER_TOKEN_BEDROCK on main — source of the bedrock-file conflicts)

Dependencies Versions

litellm: 1.82.3 (Zipstack GitHub fork) → 1.83.10 (PyPI)
tiktoken: ~=0.9.0 → ~=0.12.0 (required by litellm 1.83.10's pin)

Notes on Testing

Tested in #1976 and validated in production via the v0.163.5 release. Bedrock adapter test suite re-run locally post-merge — all 37 tests pass.

Screenshots

N/A

Checklist

I have read and understood the Contribution Guidelines.

…1918)

* [FIX] Use importlib.util.find_spec for pluggable worker discovery

_verify_pluggable_worker_exists() previously checked for the literal file `pluggable_worker/<name>/worker.py` on disk, which breaks when the plugin has been compiled to a .so (Nuitka, Cython, or any C extension) — the module is perfectly importable but the pre-check rejects it because only the .py extension is considered.

Replace the filesystem check with importlib.util.find_spec(), which is Python's standard way to ask "is this module resolvable by the import system?". It honors every registered finder — source .py, compiled .so, bytecode .pyc, namespace packages, zipimports — so the function now matches what its docstring claims: verifying the module can be loaded, not that a specific file extension is present.

Behavior is preserved for existing deployments:
- Images with no `pluggable_worker/<name>/` subpackage → find_spec raises ModuleNotFoundError (ImportError subclass) → returns False.
- Images with source .py → find_spec resolves the .py → returns True.
- Images with compiled .so → find_spec resolves the .so → returns True.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [FIX] Handle ValueError from find_spec in pluggable worker verification

Greptile-flagged edge case: importlib.util.find_spec() can raise ValueError (not just ImportError) when sys.modules has a partially initialised module entry with __spec__ = None from a prior failed import. Broaden the except to catch both.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [FIX] Resolve api-deployment worker directory from enum import path

worker.py:452 did worker_type.value.replace("-", "_") to derive the on-disk dir name. All WorkerType enum values already use underscores, so the replace was a no-op; for API_DEPLOYMENT whose dir is "api-deployment" (hyphen), it resolved to "api_deployment" and the os.path.exists() check failed. Boot then logged a spurious "❌ Worker directory not found: /app/api_deployment" at ERROR level.

The task registration path (builder + celery autodiscover via to_import_path) is unaffected, so this was purely log noise — but noise at ERROR level that masks real failures in log scans.

Fix: derive the directory from the authoritative to_import_path() which already handles the hyphen case (api_deployment -> api-deployment).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
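The find_spec-based check described in #1918 can be sketched as a standalone function. This is an illustration under assumptions: the name `pluggable_worker_exists` and passing a fully dotted module path are hypothetical, since the real `_verify_pluggable_worker_exists()` internals are not shown in this PR.

```python
import importlib.util


def pluggable_worker_exists(module_path: str) -> bool:
    """Return True if the import system can resolve ``module_path``.

    Hypothetical stand-in for _verify_pluggable_worker_exists(); the caller
    is assumed to pass something like "pluggable_worker.<name>.worker".
    """
    try:
        # find_spec consults every registered finder, so .py, .pyc, compiled
        # .so, namespace packages and zipimport all count as resolvable.
        return importlib.util.find_spec(module_path) is not None
    except (ImportError, ValueError):
        # ModuleNotFoundError (an ImportError subclass): a parent package is
        # missing. ValueError: sys.modules holds an entry whose __spec__ is
        # None, e.g. left behind by a prior failed import.
        return False
```

Catching both exception types is what makes the function a pure yes/no probe: a missing parent package and a half-initialised module both read as "not available" instead of crashing worker boot.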

…pter (#1944)

* [FEAT] Allow Bedrock to fall through to boto3's default credential chain

Match the S3/MinIO connector pattern: when AWS access keys are left blank on the Bedrock LLM and embedding adapter forms, drop them from the kwargs dict so boto3's default credential chain handles authentication. This unlocks IAM role / instance profile / IRSA / AWS Profile scenarios on hosts that already have ambient AWS credentials (e.g. EKS workers with IRSA, EC2 with an instance profile).

- llm1/static/bedrock.json: clarify access-key descriptions to mention IRSA and instance profile (already non-required at v0.163.2 base).
- embedding1/static/bedrock.json: drop aws_access_key_id and aws_secret_access_key from top-level required; same description fix; expose aws_profile_name for parity with the LLM form.
- base1.py: AWSBedrockLLMParameters and AWSBedrockEmbeddingParameters now strip empty access-key values from the validated kwargs before returning, so empty strings don't override boto3's default chain. AWSBedrockEmbeddingParameters fields gain explicit None defaults and an aws_profile_name field.

Backward-compatible: existing adapters with access keys filled in continue to work unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [FEAT] Add Authentication Type selector to Bedrock adapter form

Add an explicit `auth_type` selector with two options, making the auth choice clear to users:
- "Access Keys" (default): existing flow, keys required
- "IAM Role / Instance Profile (on-prem AWS only)": no fields; relies on boto3's default credential chain (IRSA on EKS, task role on ECS, instance profile on EC2). Description on the selector explicitly notes this option is only for AWS-hosted Unstract deployments.

The form-only auth_type field is stripped before LiteLLM validation in both AWSBedrockLLMParameters.validate() and AWSBedrockEmbeddingParameters.validate(). Empty access keys continue to be stripped so boto3 falls through to the default chain even when the access_keys arm is selected without values (matches the S3/MinIO connector pattern).

Backward-compatible: legacy adapters without auth_type behave as "Access Keys" mode (the default), and existing keys are forwarded unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [REVIEW] Address Bedrock auth_type review feedback

Fixes the P0/P1 issues raised by greptile-apps and jaseemjaskp on PR #1944.

Behaviour fixes:
- Stale-key leak in IAM Role mode: switching an existing adapter from Access Keys to IAM Role would carry truthy stored access keys through the strip-empty-only loop, so boto3 silently authenticated with the old long-lived credentials instead of falling through to the host's IRSA / instance-profile identity. Both LLM and embedding paths were affected.
- Silent acceptance of unknown auth_type: a typo (e.g. "access_key") or a malformed payload from a non-UI client passed through the dict comprehension untouched, with no enum guard.
- Cross-field validation gap: explicit Access Keys mode with blank or whitespace-only values silently fell through to the default credential chain instead of surfacing the misconfiguration.

Implementation:
- Add a module-level _resolve_bedrock_aws_credentials helper used by both AWSBedrockLLMParameters.validate() and AWSBedrockEmbeddingParameters.validate(), so the auth-type contract is expressed once.
- Validates auth_type against an allowlist (None | "access_keys" | "iam_role"); raises ValueError on anything else.
- iam_role: unconditionally drops aws_access_key_id and aws_secret_access_key.
- access_keys (explicit): requires non-blank values; raises ValueError if either is empty or whitespace-only.
- Legacy (auth_type absent): retains the lenient strip behaviour so pre-PR adapter configurations continue to deserialise unchanged.
- Restore aws_region_name as required (no `= None` default) on AWSBedrockEmbeddingParameters; only credentials may legitimately be absent.
- Drop the orphan aws_profile_name field from embedding1/static/bedrock.json: it was added for parity with the LLM form but lives outside the auth_type oneOf and contradicts the selector's "no further input" semantics. The LLM form already had aws_profile_name pre-PR and is left alone for backwards compatibility.

Tests:
- New tests/test_bedrock_adapter.py covers 15 cases across LLM and embedding adapters: legacy-no-auth-type, explicit access_keys with valid/blank/whitespace keys, iam_role with stale/no keys, unknown auth_type rejection, cross-field validation, and preservation of unrelated params (model_id, aws_profile_name, region, thinking).

Skipped (P2 nice-to-have):
- Comment-scope clarification, MinIO reference rewording, validate-mutates-caller's-dict, and the LLM form description nit about aws_profile_name visibility. These don't change behaviour and can be addressed in a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
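The auth-type contract listed in #1944 can be sketched as a pure function. This is a hypothetical reconstruction from the commit text, not the code in base1.py: the name `resolve_bedrock_aws_credentials` is illustrative, it follows #1944's three-mode allowlist (main's `_BEDROCK_AUTH_TYPES` additionally accepts `"bearer_token"`), and, unlike the original, it copies its input instead of mutating the caller's dict.

```python
from typing import Any, Optional

# Allowlist as described in #1944; main later adds "bearer_token" (#1952).
_AUTH_TYPES: tuple[Optional[str], ...] = (None, "access_keys", "iam_role")
_KEY_FIELDS = ("aws_access_key_id", "aws_secret_access_key")


def resolve_bedrock_aws_credentials(params: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch of the _resolve_bedrock_aws_credentials contract."""
    resolved = dict(params)  # work on a copy; the caller's dict is untouched
    # auth_type is a form-only field and must never reach LiteLLM.
    auth_type = resolved.pop("auth_type", None)

    if auth_type not in _AUTH_TYPES:
        raise ValueError(f"Unknown Bedrock auth_type: {auth_type!r}")

    if auth_type == "iam_role":
        # Drop keys unconditionally so stale stored credentials cannot
        # override boto3's default chain (IRSA, task role, instance profile).
        for field in _KEY_FIELDS:
            resolved.pop(field, None)
    elif auth_type == "access_keys":
        # Explicit mode: blank or whitespace-only keys are a misconfiguration.
        for field in _KEY_FIELDS:
            value = resolved.get(field)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{field} is required for auth_type 'access_keys'")
    else:
        # Legacy config (no auth_type): leniently strip empty values only.
        for field in _KEY_FIELDS:
            if not resolved.get(field):
                resolved.pop(field, None)
    return resolved
```

Rejecting an unknown mode outright, rather than passing it through, is what closes the "silent acceptance" gap; the iam_role branch is what closes the stale-key leak.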

…1976)

Hotfix for cloud v0.159.3 (OSS v0.163.4). Customer scanner flagged litellm 1.82.3 for CVE-2026-42208 (SQL injection in litellm proxy auth path, affects 1.81.16-1.83.6). We do not use litellm.proxy, but vulnerability scanners flag the installed package regardless of which code path is reachable.

Bump to 1.83.10 — the exact version recommended by the upstream advisory (v1.83.10-stable) and the smallest jump that clears the CVE range while keeping python-dotenv==1.0.1 compatible (1.83.14 would force bumping python-dotenv across 7+ pyproject.toml files). Only tiktoken needed to move 0.9 -> 0.12 to satisfy litellm's pin. Switch source back to PyPI now that the PyPI quarantine is over, reversing the temporary fork in #1873.

Cohere embed timeout patch: verified that litellm/llms/cohere/embed/handler.py is byte-identical between v1.82.3, v1.83.10-stable, and v1.83.14-stable (the timeout-not-forwarded bug fixed in #1848 is still present upstream — BerriAI/litellm#14635 remains OPEN). Version guard bumped 1.82.3 -> 1.83.10; 6/6 patch tests pass on the new version, confirming the monkey-patch still binds correctly.

Other cleanup from #1873:
- Drop git apt-install from worker-unified and tool Dockerfiles (no git-sourced deps remain in any uv.lock)
- Bump tool versions: structure 0.0.100 -> 0.0.101, classifier 0.0.79 -> 0.0.80, text_extractor 0.0.75 -> 0.0.76

Note on root uv.lock churn: the v0.163.4 root uv.lock had a pre-existing corruption (banks v2.4.1 entry pointing at banks-2.2.0 wheel) that blocked incremental resolution. Regenerated from scratch.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
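The exact-version guard described above (patch on 1.83.10, otherwise warn and skip) can be sketched as a small helper. This is an illustration under assumptions: `apply_if_version_matches` and its parameters are hypothetical, and the actual litellm_cohere_timeout.py assigns copied handler functions directly rather than going through a generic helper. The `version_lookup` parameter exists only so the sketch can be exercised without litellm installed.

```python
import logging
from importlib import metadata
from typing import Any, Callable

logger = logging.getLogger(__name__)

# Exact version whose upstream handler source was re-verified (from this PR).
PATCHED_LITELLM_VERSION = "1.83.10"


def apply_if_version_matches(
    target: Any,
    attr: str,
    replacement: Callable[..., Any],
    dist: str = "litellm",
    expected: str = PATCHED_LITELLM_VERSION,
    version_lookup: Callable[[str], str] = metadata.version,
) -> bool:
    """Set ``target.attr = replacement`` only if ``dist`` is exactly ``expected``.

    Any other installed version, or a missing distribution, logs a warning
    and leaves the original attribute untouched (skip-and-warn).
    """
    try:
        installed = version_lookup(dist)
    except metadata.PackageNotFoundError:
        logger.warning("%s is not installed; skipping patch", dist)
        return False
    if installed != expected:
        logger.warning(
            "%s %s does not match patched version %s; skipping patch",
            dist, installed, expected,
        )
        return False
    setattr(target, attr, replacement)
    return True
```

An exact-match guard trades convenience for safety: every bump forces a re-check of the copied upstream bodies, at the cost of the patch silently switching off if that re-check is forgotten (the gap raised in the review below).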

Conflicts resolved:
- unstract/sdk1/src/unstract/sdk1/adapters/base1.py
- unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/bedrock.json
- unstract/sdk1/src/unstract/sdk1/adapters/llm1/static/bedrock.json
- unstract/sdk1/tests/test_bedrock_adapter.py
  → Took main's version. Main integrates the IAM Role auth (from #1944 / back-merge #1946) AND the newer AWS_BEARER_TOKEN_BEDROCK auth (#1952). The hotfix-branch version only had IAM Role, so main's content is a strict superset.
- uv.lock
  → Regenerated from main's version with our litellm 1.83.10 pin applied (from #1976). litellm now sourced from PyPI registry (no more Zipstack GitHub fork).

Validation:
- tests/patches/test_litellm_cohere_timeout.py: 6/6 pass
- tests/test_bedrock_adapter.py: 37/37 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-21T05:27:24Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 73e21b71-f46d-4c7e-ae2e-3c5aab339389

📥 Commits

Reviewing files that changed from the base of the PR and between 4a661e5 and 7386c26.

📒 Files selected for processing (1)

unstract/sdk1/src/unstract/sdk1/patches/litellm_cohere_timeout.py

Summary by CodeRabbit

Bug Fixes
- Ensure request timeouts are forwarded for Cohere embed requests to prevent hangs.
Chores
- Bumped Classifier, Text Extractor, and Structure tool versions and updated public tool metadata.
- Updated SDK dependencies (tiktoken ~=0.12.0, litellm 1.83.10).
- Slimmed Docker images by removing unused packages.

Walkthrough

This PR coordinates incremental version updates across the tool ecosystem: structure (0.0.100→0.0.101), classifier (0.0.79→0.0.80), and text-extractor (0.0.75→0.0.76) tools receive version bumps and git dependency removal in their container builds. The litellm SDK is pinned to 1.83.10 with a corresponding timeout propagation patch for Cohere embeddings. Tool registry and environment configurations are synchronized to reflect the new tool versions.

Changes

Tool Version Bumps, Dependency Cleanup, and SDK Litellm Update

Layer / File(s)	Summary
Structure tool version and Dockerfile cleanup `tools/structure/Dockerfile`, `tools/structure/src/config/properties.json`	Structure tool Dockerfile removes git dependency; properties.json version is bumped from 0.0.100 to 0.0.101.
Classifier tool version and Dockerfile cleanup `tools/classifier/Dockerfile`, `tools/classifier/src/config/properties.json`	Classifier tool Dockerfile removes git dependency; properties.json version is bumped from 0.0.79 to 0.0.80.
Text extractor tool version and Dockerfile cleanup `tools/text_extractor/Dockerfile`, `tools/text_extractor/src/config/properties.json`	Text extractor tool Dockerfile removes git dependency; properties.json version is bumped from 0.0.75 to 0.0.76.
Worker base image dependency cleanup `docker/dockerfiles/worker-unified.Dockerfile`	Worker unified Dockerfile removes git from base stage system dependencies.
Litellm SDK update to 1.83.10 with timeout propagation `unstract/sdk1/pyproject.toml`, `unstract/sdk1/src/unstract/sdk1/patches/litellm_cohere_timeout.py`	SDK pyproject.toml pins litellm to 1.83.10 and bumps tiktoken to ~=0.12.0, removing the custom git source; the litellm Cohere timeout patch is updated to version 1.83.10 and forwards the timeout parameter in both sync and async embedding calls.
Registry and environment configuration synchronization `backend/sample.env`, `unstract/tool-registry/tool_registry_config/public_tools.json`	Tool registry public_tools.json updates classify, text_extractor, and their dependent tool-classifier, tool-text-extractor image versions to match new tool releases; backend sample.env updates structure tool image URL and tag to 0.0.101.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	Title 'ReverseMerge: V0.163.4 hotfix' is specific and clearly describes the PR as a back-merge of a hotfix branch, which aligns with the changeset's primary purpose.
Description check	✅ Passed	PR description is comprehensive and complete, addressing all required template sections including What, Why, How, break analysis, migrations, env config, docs, related issues, dependencies, testing notes, and the contribution guidelines checklist.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


greptile-apps · 2026-05-21T05:30:11Z

Greptile Summary

This back-merge PR brings the v0.163.5 hotfix into main, with the only net-new change being the litellm bump from a Zipstack GitHub fork at v1.82.3 to the official PyPI release at 1.83.10 (addressing CVE-2026-42208). Conflict resolution retained main's versions of all Bedrock-adapter files.

litellm source switch: Removes [tool.uv.sources] git fork override and pins litellm==1.83.10 from PyPI; tiktoken is bumped to ~=0.12.0 to satisfy the new litellm dependency pin.
Dockerfile cleanup: git is removed from apt-get installs in all four Dockerfiles now that litellm is no longer fetched from a git source.
Patch update: litellm_cohere_timeout.py version guard and copied function bodies are updated from 1.82.3 to 1.83.10, with all relevant lock files regenerated consistently.

Confidence Score: 5/5

Safe to merge — the only functional change reaching main is the litellm switch from a Zipstack GitHub fork to the official PyPI 1.83.10 release, which has already been deployed to production as v0.163.5.

The litellm bump has been validated in production. The Cohere timeout patch, lock files, and Dockerfile cleanup are all internally consistent. Bedrock-adapter files are unchanged from main's version. No database migrations or API surface changes are introduced.

No files require special attention. The multiple uv.lock updates are mechanical regenerations and all point to litellm 1.83.10 from PyPI.

Important Files Changed

Filename	Overview
unstract/sdk1/pyproject.toml	Switches litellm from Zipstack git fork (v1.82.3) to PyPI (==1.83.10); tiktoken constraint bumped from ~=0.9.0 to ~=0.12.0 to match new litellm requirements.
unstract/sdk1/src/unstract/sdk1/patches/litellm_cohere_timeout.py	Version guard and copied function bodies updated to 1.83.10; docstring clarified to explain skip-and-warn behavior on version mismatch.
docker/dockerfiles/worker-unified.Dockerfile	Removes git from apt-get install; no longer needed now that litellm is fetched from PyPI rather than a git source.
tools/structure/Dockerfile	Removes git from apt-get install alongside tool version bump to 0.0.101.
unstract/tool-registry/tool_registry_config/public_tools.json	Tool registry versions updated consistently: classifier 0.0.79→0.0.80, text_extractor 0.0.75→0.0.76.
uv.lock	Root lock file updated: litellm now resolves to 1.83.10 from PyPI registry instead of the Zipstack git fork.

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[litellm dependency] -->|before| B[Zipstack GitHub fork\nv1.82.3\ngit source]
    A -->|after| C[PyPI registry\n==1.83.10\nCVE-2026-42208 fixed]

    C --> D[pyproject.toml\nlitellm==1.83.10\ntiktoken~=0.12.0]
    C --> E[Dockerfiles\ngit removed from apt-get]
    C --> F[litellm_cohere_timeout.py\n_PATCHED_LITELLM_VERSION\n1.82.3 → 1.83.10]

    F --> G{Version matches\n1.83.10?}
    G -->|Yes| H[Apply Cohere\ntimeout patch]
    G -->|No| I[Log warning\nSkip patch]
```

Reviews (2): Last reviewed commit: "[FIX] Align cohere patch docstring with ..."

jaseemjaskp

Multi-agent review summary (Code Reviewer · Comment Analyzer · Silent Failure Hunter · Test Analyzer · Type Design · Code Simplifier)

The back-merge body is small (litellm 1.82.3 → 1.83.10 + tiktoken 0.9 → 0.12 + patch version constant + Dockerfile git removal + tool-image tag bumps). Findings concentrate on unstract/sdk1/src/unstract/sdk1/patches/litellm_cohere_timeout.py. Inline comments below cover findings on diff lines; the items below are out-of-diff but worth tracking before merge.

Out-of-diff findings (cannot anchor inline)

MEDIUM — Two stand-alone uv.locks still pin litellm==1.82.3 (the CVE-flagged version). unstract/workflow-execution/uv.lock (~L731) and unstract/connectors/uv.lock (~L1236) were not regenerated. Both exist on main too — so this is pre-existing on main, not introduced by this PR — but since the explicit purpose of this back-merge is to carry the CVE-2026-42208 fix forward, regenerating these locks here would close the loop. Fix: run `uv lock --upgrade-package litellm` in each of those package dirs and include the updated lockfiles, OR open a follow-up issue noting these locks are runtime-unused.
HIGH — Test gap: no test fails on litellm version drift. _PATCHED_LITELLM_VERSION (line 30) is intended to force re-verification on bump, but every existing test in unstract/sdk1/tests/patches/test_litellm_cohere_timeout.py calls _patched_embedding directly and bypasses the guard. A bump to 1.83.11 would log a warning, silently disable the patch, and CI would still pass. Add test_pinned_version_matches_installed_litellm() asserting importlib.metadata.version("litellm") == _PATCHED_LITELLM_VERSION.
HIGH — Test gap: no upstream-signature compatibility test. The patched functions are verbatim copies (+ one kwarg) of upstream litellm.llms.cohere.embed.handler.{embedding,async_embedding}. If upstream renames a kwarg or removes a positional, the monkey-patched callable raises a TypeError on the first Bedrock Cohere call. Add inspect.signature(...)-based parity tests against the upstream symbols.
MEDIUM — Async path untested. _patched_async_embedding (patch lines 74–137) and the aembedding=True dispatch branch (patch lines 164–182) are only covered by binding-identity assertions; no test exercises timeout forwarding through the async client.
MEDIUM — Partial-patch application risk (patch file, lines 215–217, outside the diff). The three monkey-patch assignments run sequentially with no try/except. If _bedrock_embed.cohere_embedding = _patched_embedding fails after the two cohere-handler assignments succeed (e.g. litellm refactors the bedrock module path), the cohere handler is patched but bedrock keeps calling the unpatched embedding — "None seconds" timeouts return intermittently with no clear signal. Consider wrapping the three assignments in try/except, reverting on partial failure, and emitting an error-level log.
LOW — Skip-patch log severity. When the version guard skips (today's log at line 34), users hit the timeout bug their patch was meant to prevent. logger.warning will get filtered by most aggregators. Consider logger.error so it surfaces in Sentry. (Adjacent to but distinct from greptile's existing comment on line 32 which flagged the silent-skip control flow.)
NIT — unstract/sdk1/src/unstract/sdk1/utils/callback_manager.py has no tests around tiktoken.encoding_for_model(...) (callback_manager.py:102,107), so the 0.9 → 0.12 tiktoken bump has no CI safety net. Low risk; tiktoken's encode API has been stable.
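Two of the out-of-diff suggestions above, the upstream-signature parity test and all-or-nothing application of the monkey-patch assignments, can be sketched with the standard library alone. Both helpers (`signature_diff`, `apply_patches_atomically`) are hypothetical; in the repo they would presumably be called with the upstream `litellm.llms.cohere.embed.handler` functions and the patch's `_patched_embedding` / `_patched_async_embedding`.

```python
import inspect
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def signature_diff(upstream: Callable[..., Any], patched: Callable[..., Any]) -> list[str]:
    """Describe parameter-level drift between an upstream function and its copy."""
    up = inspect.signature(upstream).parameters
    pa = inspect.signature(patched).parameters
    problems = [f"missing in patch: {name}" for name in up if name not in pa]
    problems += [f"extra in patch: {name}" for name in pa if name not in up]
    for name in up.keys() & pa.keys():
        # kind catches positional vs keyword-only moves; default catches changed defaults.
        if up[name].kind != pa[name].kind or up[name].default != pa[name].default:
            problems.append(f"changed: {name}")
    if not problems and list(up) != list(pa):
        problems.append("parameter order differs")
    return sorted(problems)


def apply_patches_atomically(patches: list[tuple[Any, str, Any]]) -> bool:
    """Apply (target, attribute, replacement) triples all-or-nothing."""
    applied: list[tuple[Any, str, Any]] = []
    try:
        for target, attr, replacement in patches:
            original = getattr(target, attr)  # AttributeError if upstream moved it
            setattr(target, attr, replacement)
            applied.append((target, attr, original))
    except Exception:
        # Roll back in reverse order so no call site stays half-patched.
        for target, attr, original in reversed(applied):
            setattr(target, attr, original)
        logger.exception("Monkey-patch failed part-way; all assignments reverted")
        return False
    return True
```

Returning a list of differences rather than a bool means a failing `assert signature_diff(...) == []` names the parameter that drifted, and `logger.exception` logs at ERROR level, which matches the review's request for an error-level signal.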

Verified clean

Dockerfile git removal (docker/dockerfiles/worker-unified.Dockerfile, tools/{classifier,structure,text_extractor}/Dockerfile) is safe — no remaining git+https uv sources in any updated lockfile.
Tool image tag bumps consistent across sample.env, properties.json, and public_tools.json.
litellm==1.83.10 pin in unstract/sdk1/pyproject.toml:30 matches _PATCHED_LITELLM_VERSION so the guard activates on this build.
tiktoken~=0.12.0 matches litellm 1.83.10's transitive requirement.
No new types introduced (type-design analyzer found nothing actionable).
Code-simplifier: no meaningful simplifications recommended — most candidates are inside the verbatim-copy regions where the "exactly one change vs. upstream" invariant is more valuable than micro-cleanup.

Reviewer flagged that the docstring claimed the patch is "confirmed in every release between 1.82.3 and 1.83.14-stable", but the guard at _PATCHED_LITELLM_VERSION activates only on the exact pinned version. A future maintainer reading the old text could reasonably expect bumping to e.g. 1.83.11 to keep the fix active; in reality it silently turns off. Rewritten to reference _PATCHED_LITELLM_VERSION as the single source of truth and to drop the rot-prone "as of 2026-05-20" calendar date. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sonarqubecloud · 2026-05-21T06:40:53Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

github-actions · 2026-05-21T06:41:08Z

Test Results

Summary

✅ Runner Tests: 11 passed, 0 failed (11 total)
✅ SDK1 Tests: 334 passed, 0 failed (334 total)

Runner Tests - Full Report

filepath	function	passed	SUBTOTAL
runner/src/unstract/runner/clients/test_docker.py	test_logs	1	1
runner/src/unstract/runner/clients/test_docker.py	test_cleanup	1	1
runner/src/unstract/runner/clients/test_docker.py	test_cleanup_skip	1	1
runner/src/unstract/runner/clients/test_docker.py	test_client_init	1	1
runner/src/unstract/runner/clients/test_docker.py	test_get_image_exists	1	1
runner/src/unstract/runner/clients/test_docker.py	test_get_image	1	1
runner/src/unstract/runner/clients/test_docker.py	test_get_container_run_config	1	1
runner/src/unstract/runner/clients/test_docker.py	test_get_container_run_config_without_mount	1	1
runner/src/unstract/runner/clients/test_docker.py	test_run_container	1	1
runner/src/unstract/runner/clients/test_docker.py	test_get_image_for_sidecar	1	1
runner/src/unstract/runner/clients/test_docker.py	test_sidecar_container	1	1
TOTAL		11	11

SDK1 Tests - Full Report

* [HOTFIX] Use importlib.util.find_spec for pluggable worker discovery (#1918) * [FIX] Use importlib.util.find_spec for pluggable worker discovery _verify_pluggable_worker_exists() previously checked for the literal file `pluggable_worker/<name>/worker.py` on disk, which breaks when the plugin has been compiled to a .so (Nuitka, Cython, or any C extension) — the module is perfectly importable but the pre-check rejects it because only the .py extension is considered. Replace the filesystem check with importlib.util.find_spec(), which is Python's standard way to ask "is this module resolvable by the import system?". It honors every registered finder — source .py, compiled .so, bytecode .pyc, namespace packages, zipimports — so the function now matches what its docstring claims: verifying the module can be loaded, not that a specific file extension is present. Behavior is preserved for existing deployments: - Images with no `pluggable_worker/<name>/` subpackage → find_spec raises ModuleNotFoundError (ImportError subclass) → returns False. - Images with source .py → find_spec resolves the .py → returns True. - Images with compiled .so → find_spec resolves the .so → returns True. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * [FIX] Handle ValueError from find_spec in pluggable worker verification Greptile-flagged edge case: importlib.util.find_spec() can raise ValueError (not just ImportError) when sys.modules has a partially initialised module entry with __spec__ = None from a prior failed import. Broaden the except to catch both. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * [FIX] Resolve api-deployment worker directory from enum import path worker.py:452 did worker_type.value.replace("-", "_") to derive the on-disk dir name. 
All WorkerType enum values already use underscores, so the replace was a no-op; for API_DEPLOYMENT whose dir is "api-deployment" (hyphen), it resolved to "api_deployment" and the os.path.exists() check failed. Boot then logged a spurious "❌ Worker directory not found: /app/api_deployment" at ERROR level. The task registration path (builder + celery autodiscover via to_import_path) is unaffected, so this was purely log noise — but noise at ERROR level that masks real failures in log scans. Fix: derive the directory from the authoritative to_import_path() which already handles the hyphen case (api_deployment -> api-deployment). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * [HOTFIX] Add IAM Role / Instance Profile auth mode to AWS Bedrock adapter (#1944) * [FEAT] Allow Bedrock to fall through to boto3's default credential chain Match the S3/MinIO connector pattern: when AWS access keys are left blank on the Bedrock LLM and embedding adapter forms, drop them from the kwargs dict so boto3's default credential chain handles authentication. This unlocks IAM role / instance profile / IRSA / AWS Profile scenarios on hosts that already have ambient AWS credentials (e.g. EKS workers with IRSA, EC2 with an instance profile). - llm1/static/bedrock.json: clarify access-key descriptions to mention IRSA and instance profile (already non-required at v0.163.2 base). - embedding1/static/bedrock.json: drop aws_access_key_id and aws_secret_access_key from top-level required; same description fix; expose aws_profile_name for parity with the LLM form. - base1.py: AWSBedrockLLMParameters and AWSBedrockEmbeddingParameters now strip empty access-key values from the validated kwargs before returning, so empty strings don't override boto3's default chain. AWSBedrockEmbeddingParameters fields gain explicit None defaults and an aws_profile_name field. 
Backward-compatible: existing adapters with access keys filled in continue to work unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [FEAT] Add Authentication Type selector to Bedrock adapter form

Add an explicit `auth_type` selector with two options, making the auth choice clear to users:
- "Access Keys" (default): existing flow, keys required
- "IAM Role / Instance Profile (on-prem AWS only)": no fields; relies on boto3's default credential chain (IRSA on EKS, task role on ECS, instance profile on EC2).

The selector's description explicitly notes that this option is only for AWS-hosted Unstract deployments.

The form-only auth_type field is stripped before LiteLLM validation in both AWSBedrockLLMParameters.validate() and AWSBedrockEmbeddingParameters.validate(). Empty access keys continue to be stripped, so boto3 falls through to the default chain even when the access_keys arm is selected without values (matching the S3/MinIO connector pattern).

Backward-compatible: legacy adapters without auth_type behave as "Access Keys" mode (the default), and existing keys are forwarded unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [REVIEW] Address Bedrock auth_type review feedback

Fixes the P0/P1 issues raised by greptile-apps and jaseemjaskp on PR #1944.

Behaviour fixes:
- Stale-key leak in IAM Role mode: switching an existing adapter from Access Keys to IAM Role would carry truthy stored access keys through the strip-empty-only loop, so boto3 silently authenticated with the old long-lived credentials instead of falling through to the host's IRSA / instance-profile identity. Both LLM and embedding paths were affected.
- Silent acceptance of unknown auth_type: a typo (e.g. "access_key") or a malformed payload from a non-UI client passed through the dict comprehension untouched, with no enum guard.
- Cross-field validation gap: explicit Access Keys mode with blank or whitespace-only values silently fell through to the default credential chain instead of surfacing the misconfiguration.

Implementation:
- Add a module-level _resolve_bedrock_aws_credentials helper used by both AWSBedrockLLMParameters.validate() and AWSBedrockEmbeddingParameters.validate(), so the auth-type contract is expressed once.
- Validates auth_type against an allowlist (None | "access_keys" | "iam_role"); raises ValueError on anything else.
- iam_role: unconditionally drops aws_access_key_id and aws_secret_access_key.
- access_keys (explicit): requires non-blank values; raises ValueError if either is empty or whitespace-only.
- Legacy (auth_type absent): retains the lenient strip behaviour so pre-PR adapter configurations continue to deserialise unchanged.
- Restore aws_region_name as required (no `= None` default) on AWSBedrockEmbeddingParameters; only credentials may legitimately be absent.
- Drop the orphan aws_profile_name field from embedding1/static/bedrock.json: it was added for parity with the LLM form but lives outside the auth_type oneOf and contradicts the selector's "no further input" semantics. The LLM form already had aws_profile_name pre-PR and is left alone for backwards compatibility.

Tests:
- New tests/test_bedrock_adapter.py covers 15 cases across LLM and embedding adapters: legacy-no-auth-type, explicit access_keys with valid/blank/whitespace keys, iam_role with stale/no keys, unknown auth_type rejection, cross-field validation, and preservation of unrelated params (model_id, aws_profile_name, region, thinking).

Skipped (P2 nice-to-have):
- Comment-scope clarification, MinIO reference rewording, validate-mutates-caller's-dict, and the LLM form description nit about aws_profile_name visibility. These don't change behaviour and can be addressed in a follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [HOTFIX] Bump litellm to 1.83.10 from PyPI to clear CVE-2026-42208 (#1976)

Hotfix for cloud v0.159.3 (OSS v0.163.4). A customer scanner flagged litellm 1.82.3 for CVE-2026-42208 (SQL injection in the litellm proxy auth path, affecting 1.81.16-1.83.6). We do not use litellm.proxy, but vulnerability scanners flag the installed package regardless of which code path is reachable.

Bump to 1.83.10: the exact version recommended by the upstream advisory (v1.83.10-stable) and the smallest jump that clears the CVE range while keeping python-dotenv==1.0.1 compatible (1.83.14 would force bumping python-dotenv across 7+ pyproject.toml files). Only tiktoken needed to move (0.9 -> 0.12) to satisfy litellm's pin.

Switch the source back to PyPI now that the PyPI quarantine is over, reversing the temporary fork in #1873.

Cohere embed timeout patch: verified that litellm/llms/cohere/embed/handler.py is byte-identical between v1.82.3, v1.83.10-stable, and v1.83.14-stable (the timeout-not-forwarded bug fixed in #1848 is still present upstream; BerriAI/litellm#14635 remains OPEN). Version guard bumped 1.82.3 -> 1.83.10; 6/6 patch tests pass on the new version, confirming the monkey-patch still binds correctly.

Other cleanup from #1873:
- Drop git apt-install from worker-unified and tool Dockerfiles (no git-sourced deps remain in any uv.lock)
- Bump tool versions: structure 0.0.100 -> 0.0.101, classifier 0.0.79 -> 0.0.80, text_extractor 0.0.75 -> 0.0.76

Note on root uv.lock churn: the v0.163.4 root uv.lock had a pre-existing corruption (banks v2.4.1 entry pointing at the banks-2.2.0 wheel) that blocked incremental resolution, so it was regenerated from scratch.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [FIX] Align cohere patch docstring with version-guard semantics

Reviewer flagged that the docstring claimed the patch is "confirmed in every release between 1.82.3 and 1.83.14-stable", but the guard at _PATCHED_LITELLM_VERSION activates only on the exact pinned version. A future maintainer reading the old text could reasonably expect bumping to e.g. 1.83.11 to keep the fix active; in reality it silently turns off. The docstring is rewritten to reference _PATCHED_LITELLM_VERSION as the single source of truth and to drop the rot-prone "as of 2026-05-20" calendar date.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
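
A minimal sketch of the find_spec-based check described in the #1918 commits above, assuming the worker module lives at the dotted path `pluggable_worker.<name>.worker`. The function name `verify_pluggable_worker_exists` and that exact path are illustrative, not the repository's actual code. The sketch catches both exception types the second #1918 commit mentions: ImportError (including ModuleNotFoundError) when a parent package is missing, and ValueError when `sys.modules` holds an entry whose `__spec__` is None.

```python
import importlib.util


def verify_pluggable_worker_exists(worker_name: str) -> bool:
    """Return True if the worker module is resolvable by the import system.

    Hypothetical layout: the module is assumed to be
    ``pluggable_worker.<worker_name>.worker``.
    """
    module_path = f"pluggable_worker.{worker_name}.worker"
    try:
        # find_spec asks every registered finder (.py, .pyc, .so, namespace
        # packages, zipimport) whether the module can be located. Parent
        # packages are imported; the worker module itself is not executed.
        return importlib.util.find_spec(module_path) is not None
    except (ImportError, ValueError):
        # ImportError (incl. ModuleNotFoundError): a parent package is missing.
        # ValueError: sys.modules holds an entry whose __spec__ is None,
        # e.g. a partially initialised module from a prior failed import.
        return False
```

Note that find_spec imports the parent packages of a dotted name, so `pluggable_worker/__init__.py` and `pluggable_worker/<name>/__init__.py` do run; only the target module is left unexecuted.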
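
The api-deployment directory bug from the third #1918 commit can be reproduced with a hypothetical two-member enum. The `GENERAL` member and the body of `to_import_path()` are assumptions made for illustration; the commit only states that `to_import_path()` maps api_deployment to api-deployment and that the old code used `worker_type.value.replace("-", "_")`.

```python
import os
from enum import Enum


class WorkerType(str, Enum):
    """Hypothetical subset of the worker enum; values use underscores."""

    GENERAL = "general"  # assumed member, included only for contrast
    API_DEPLOYMENT = "api_deployment"

    def to_import_path(self) -> str:
        # Assumed mapping: only API_DEPLOYMENT lives in a hyphenated directory.
        if self is WorkerType.API_DEPLOYMENT:
            return "api-deployment"
        return self.value


def worker_directory_old(base_dir: str, worker_type: WorkerType) -> str:
    # Buggy derivation: values never contain "-", so replace() is a no-op.
    return os.path.join(base_dir, worker_type.value.replace("-", "_"))


def worker_directory(base_dir: str, worker_type: WorkerType) -> str:
    # Fixed derivation: reuse the mapping task registration already relies on.
    return os.path.join(base_dir, worker_type.to_import_path())
```

Deriving both the directory and the import path from one method keeps the boot-time existence check and Celery autodiscovery from drifting apart.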
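
The auth-type contract from the [REVIEW] commit of #1944 can be sketched as a single pure function. This is an illustrative sketch, not the actual `_resolve_bedrock_aws_credentials` in base1.py: the function name, the copy-on-entry behaviour, and the error messages are assumptions. It models only the three values allowed on the hotfix branch; main's `_BEDROCK_AUTH_TYPES` also accepts "bearer_token" (from #1952), which is not handled here.

```python
from typing import Any

# Allowlist as described in the #1944 review commit (hotfix branch only).
_ALLOWED_AUTH_TYPES = {None, "access_keys", "iam_role"}
_KEY_FIELDS = ("aws_access_key_id", "aws_secret_access_key")


def resolve_bedrock_aws_credentials(params: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of params with credentials resolved per auth_type."""
    resolved = dict(params)  # copy, so the caller's dict is not mutated
    # auth_type is form-only and must never reach LiteLLM.
    auth_type = resolved.pop("auth_type", None)
    if auth_type not in _ALLOWED_AUTH_TYPES:
        raise ValueError(f"Unsupported auth_type: {auth_type!r}")

    if auth_type == "iam_role":
        # Drop even truthy stored keys so boto3 uses the host identity.
        for field in _KEY_FIELDS:
            resolved.pop(field, None)
    elif auth_type == "access_keys":
        # Explicit mode: blank or whitespace-only keys are a misconfiguration.
        for field in _KEY_FIELDS:
            value = resolved.get(field)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{field} is required when auth_type is 'access_keys'")
    else:
        # Legacy configs (no auth_type): strip only empty values (None or "").
        for field in _KEY_FIELDS:
            if not resolved.get(field):
                resolved.pop(field, None)
    return resolved
```

Copying the input rather than mutating it also sidesteps the "validate mutates caller's dict" nit that the review commit deferred to a follow-up.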
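
To see why 1.83.10 clears the advisory while 1.82.3 does not, compare versions as integer tuples rather than strings. This sketch assumes the range quoted in #1976 (1.81.16-1.83.6) is inclusive at both ends and handles only plain dotted releases, with no "-stable" or pre-release suffixes; a production check would normally use a full version parser such as the `packaging` library.

```python
def parse_release(version: str) -> tuple[int, ...]:
    # Handles plain dotted releases such as "1.83.10" only; suffixes like
    # "-stable" or pre-release tags are out of scope for this sketch.
    return tuple(int(part) for part in version.split("."))


# Affected range as quoted in the commit message, assumed inclusive.
AFFECTED_FIRST = parse_release("1.81.16")
AFFECTED_LAST = parse_release("1.83.6")


def is_affected(version: str) -> bool:
    return AFFECTED_FIRST <= parse_release(version) <= AFFECTED_LAST
```

Plain string comparison would get this wrong: `"1.83.10" <= "1.83.6"` is True lexicographically, so a naive check would still flag the bumped version.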
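
The exact-match behaviour that the docstring fix in #1976 warns about can be illustrated with a small guard. The function name `should_apply_cohere_patch` and its optional argument are hypothetical; only the idea of a single `_PATCHED_LITELLM_VERSION` constant that must equal the installed version (bumped to 1.83.10) comes from the commit message. `importlib.metadata.version` is the standard-library way to read an installed distribution's version.

```python
from importlib import metadata
from typing import Optional

# Mirrors the single-source-of-truth constant named in the commit message.
_PATCHED_LITELLM_VERSION = "1.83.10"


def should_apply_cohere_patch(installed_version: Optional[str] = None) -> bool:
    """Hypothetical guard: patch only when litellm is exactly the pinned version."""
    if installed_version is None:
        try:
            installed_version = metadata.version("litellm")
        except metadata.PackageNotFoundError:
            return False  # litellm not installed: nothing to patch
    # Equality, not a range check: 1.83.11 would silently skip the patch.
    return installed_version == _PATCHED_LITELLM_VERSION
```

Because the comparison is an equality check, bumping litellm without also updating the constant disables the patch instead of failing loudly, which is the behaviour the rewritten docstring now documents.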

chandrasekharan-zipstack and others added 4 commits April 21, 2026 11:58

greptile-apps Bot reviewed May 21, 2026

Comment thread unstract/sdk1/src/unstract/sdk1/patches/litellm_cohere_timeout.py

jaseemjaskp self-requested a review May 21, 2026 05:52

jaseemjaskp reviewed May 21, 2026

jaseemjaskp self-requested a review May 21, 2026 07:13

jaseemjaskp approved these changes May 21, 2026

jaseemjaskp merged commit d912c6d into main May 21, 2026
10 checks passed

jaseemjaskp deleted the back-merge/v0.163.4-hotfix-to-main branch May 21, 2026 07:14

ReverseMerge: V0.163.4 hotfix #1980

jaseemjaskp merged 5 commits into main from back-merge/v0.163.4-hotfix-to-main

pk-zipstack commented May 21, 2026

coderabbitai Bot commented May 21, 2026

❌ Failed checks (1 warning)

greptile-apps Bot commented May 21, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

jaseemjaskp left a comment

sonarqubecloud Bot commented May 21, 2026

github-actions Bot commented May 21, 2026

3 participants

pk-zipstack commented May 21, 2026

What

Why

How

Conflict resolution

Validation

Can this PR break any existing features? If yes, please list possible items. If no, please explain why.

Database Migrations

Env Config

Relevant Docs

Related Issues or PRs

Dependencies Versions

Notes on Testing

Screenshots

Checklist

coderabbitai Bot commented May 21, 2026

Summary by CodeRabbit

Walkthrough

Changes

❌ Failed checks (1 warning)

jaseemjaskp left a comment

Multi-agent review summary (Code Reviewer · Comment Analyzer · Silent Failure Hunter · Test Analyzer · Type Design · Code Simplifier)

Out-of-diff findings (cannot anchor inline)

Verified clean

sonarqubecloud Bot commented May 21, 2026

Quality Gate passed

github-actions Bot commented May 21, 2026

Test Results
