Conversation
…o CORS Production socket connections were failing for `*.env.us-central.unstract.com` because python-socketio does exact-string comparison on `cors_allowed_origins`, so a literal `*` pattern silently rejected every real subdomain. - Add `CORS_ALLOWED_ORIGIN_REGEXES` derived from `WEB_APP_ORIGIN_URL_WITH_WILD_CARD`. - Wire SocketIO via `_RegexOrigin` whose `__eq__` does the regex match — single list entry covers all wildcard subdomains, no library subclass needed. - Normalize `WEB_APP_ORIGIN_URL` through `urlparse` so trailing slashes / paths in env are stripped (also fixes the `…com//oauth-status/` double-slash). - Add startup guard for malformed env values. Resolves item #1 of UN-3439. Items #2/#3 (decoupling indexing from Socket.io, fallback) are owned separately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Summary by CodeRabbit
WalkthroughWEB_APP_ORIGIN_URL is validated and normalized into a canonical origin, a wildcard origin, and an anchored subdomain regex. CORS settings now include regex-based origin patterns, and the socket.io server accepts both static origins and regex-origin matchers for CORS checks. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant SocketIO_Server
participant CORS_Matcher
participant Settings
Client->>SocketIO_Server: Connect (Origin header)
SocketIO_Server->>Settings: Retrieve CORS lists & regexes
Settings-->>SocketIO_Server: Return CORS_ALLOWED_ORIGINS + CORS_ALLOWED_ORIGIN_REGEXES
SocketIO_Server->>CORS_Matcher: Validate Origin against static origins or RegexOrigin
alt Origin in static list
CORS_Matcher-->>SocketIO_Server: Allow
else Origin matches RegexOrigin (full-match)
CORS_Matcher-->>SocketIO_Server: Allow
else No match
CORS_Matcher-->>SocketIO_Server: Reject
end
SocketIO_Server-->>Client: Accept or reject connection
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 5/8 reviews remaining, refill in 17 minutes and 20 seconds.Comment |
|
| Filename | Overview |
|---|---|
| backend/utils/cors_origin.py | New module: RegexOrigin for SocketIO eq-based CORS matching and normalize_web_app_origin for canonical URL parsing — well-anchored regex, fullmatch, re.escape, startup ValueError guard; no significant issues. |
| backend/utils/log_events.py | Builds _cors_allowed_origins by merging exact-string CORS_ALLOWED_ORIGINS with RegexOrigin wrappers from CORS_ALLOWED_ORIGIN_REGEXES; wires result into socketio.Server — logic correct. |
| backend/backend/settings/base.py | Delegates WEB_APP_ORIGIN_URL parsing to normalize_web_app_origin; adds CORS_ALLOWED_ORIGIN_REGEXES for django-cors-headers; minor — subdomain regex is always emitted even for localhost default, which is harmless. |
| backend/utils/tests/test_cors_origin.py | Comprehensive 48-case test suite covering URL normalization, startup guard, regex safety (ReDoS, lookalike, trailing newline, wrong scheme), and SocketIO in-operator routing — thorough. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
ENV["WEB_APP_ORIGIN_URL env var\n(e.g. https://us-central.unstract.com/)"]
NORM["normalize_web_app_origin()\ncors_origin.py"]
ENV --> NORM
NORM -->|ValueError| FAIL["Startup crash\n(fail-fast guard)"]
NORM --> ORIGIN["origin\nhttps://us-central.unstract.com"]
NORM --> WILDCARD["wildcard_origin\nhttps://*.us-central.unstract.com"]
NORM --> REGEX["subdomain_regex\n^https://[^/]+\\.us-central\\.unstract\\.com$"]
ORIGIN --> CORS_ALLOWED["Django CORS_ALLOWED_ORIGINS\n(exact match)"]
REGEX --> CORS_REGEX["Django CORS_ALLOWED_ORIGIN_REGEXES\n(regex match via django-cors-headers)"]
REGEX --> REGEXORIGIN["RegexOrigin(pattern)\nlog_events.py"]
CORS_ALLOWED --> SIO_LIST["_cors_allowed_origins list"]
REGEXORIGIN --> SIO_LIST
SIO_LIST --> SIO["socketio.Server\ncors_allowed_origins=_cors_allowed_origins"]
BROWSER["Browser Origin header\nhttps://dev.env.us-central.unstract.com"]
BROWSER -->|"origin in list → __eq__"| REGEXORIGIN
BROWSER -->|"HTTP request"| CORS_REGEX
Reviews (4): Last reviewed commit: "UN-3439 [FIX] Address remaining CodeRabb..." | Re-trigger Greptile
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/backend/settings/base.py`:
- Around line 72-84: parsed_url.netloc can preserve casing and default ports,
causing Django CORS string-matching to fail; change construction of
WEB_APP_ORIGIN_URL and WEB_APP_ORIGIN_URL_WITH_WILD_CARD to use
parsed_url.hostname lowercased and append the port only when it's present and
non-default for the scheme (omit :80 for http and :443 for https); also validate
the scheme is 'http' or 'https' and raise a ValueError otherwise, and update the
surrounding comments to explain canonicalization and port-handling; locate
references to parsed_url, WEB_APP_ORIGIN_URL and
WEB_APP_ORIGIN_URL_WITH_WILD_CARD to implement these changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: bd894288-c4c7-441d-8702-0c646dced116
📒 Files selected for processing (2)
backend/backend/settings/base.pybackend/utils/log_events.py
muhammad-ali-e
left a comment
There was a problem hiding this comment.
Automated review (Comment Analyzer / PR Test Analyzer / Silent Failure Hunter / Type Design Analyzer / Code Reviewer / Code Simplifier passes). Findings inline; the re.match→re.fullmatch issue and the regex-construction concern raised by greptile/coderabbit are intentionally not duplicated here.
…ble RegexOrigin, tests Addresses five review comments on #1938: 1. coderabbitai (Major) — RFC 6454 canonicalization. Browsers serialize `Origin` headers with a lowercase host and no explicit default ports; `parsed_url.netloc` preserved both, so `https://APP.EXAMPLE.COM:443` would silently fail to match the browser's `https://app.example.com`. Switch to `parsed_url.hostname` + drop default ports, and reject non-http(s) schemes at startup. 2. greptile (P2) — `re.fullmatch` instead of `re.match`. With `re.match` plus `$`, a candidate ending in `\n` matches because `$` is allowed before an optional trailing newline. `fullmatch` removes the ambiguity. 3. self — `_RegexOrigin.__hash__` violated `a == b ⇒ hash(a) == hash(b)` (one fixed pattern hash vs. many matching strings). Today this is masked because python-socketio uses linear `__eq__` on a list, but if the allow-list is ever wrapped in a set, every legitimate subdomain would silently be rejected — exactly the failure mode UN-3439 closes. Make instances unhashable so the contract can't be broken. 4. self — No regression tests. Add `backend/utils/tests/test_cors_origin.py` (33 cases) covering: regex match/no-match, lookalike spoofing, scheme mismatch, trailing-newline rejection, non-string equality protocol, unhashability, ReDoS bounds, URL normalization (case, default ports, trailing slash, paths, queries), startup-guard rejections (empty, no-scheme, non-browser-scheme, no-host), and end-to-end via the same `RegexOrigin` path SocketIO uses. 5. self — Over-clever wildcard-to-regex builder. The `split('*').join(re.escape, ...)` construction generalised to N wildcards but the input has exactly one; replace with a direct rf-string that's self-evident on review. Refactor for testability: extract `RegexOrigin` and `normalize_web_app_origin` into `backend/utils/cors_origin.py` (Django-free, importable from settings and tests). Settings now delegates to one helper call; `log_events.py` imports `RegexOrigin`. No behavioural change beyond what each comment fixes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/utils/cors_origin.py`:
- Around line 64-67: Accessing parsed.port can raise ValueError and bypass the
nice config error; update the logic around parsed, default_port, and netloc in
cors_origin.py (the block that computes default_port = {"http": 80, "https":
443}[parsed.scheme] and checks parsed.port) to catch ValueError from
parsed.port, and re-raise a new error with the same actionable message you use
for invalid WEB_APP_ORIGIN_URL so operators always see the normalized config
error; ensure you still preserve the existing behavior of including the port in
netloc when it is valid and non-default.
In `@backend/utils/tests/test_cors_origin.py`:
- Around line 74-81: The test test_no_redos in
backend/utils/tests/test_cors_origin.py uses a too-strict absolute timing
assertion (< 0.05s) which flakes on CI; update the assertion for RegexOrigin to
use a looser bound (e.g., 0.2–0.5s) or compare against a baseline run time
(e.g., run the same check twice and assert the second is within a small
multiplier of the first) so the hostile input check remains fast but not
CI-sensitive; modify the assertion line in test_no_redos accordingly to use the
new threshold or relative comparison.
- Around line 69-73: The set-literal assertion test uses {ro} which is flagged
as B018 but the noqa on that line incorrectly suppresses B015; update the
suppression to match the actual lint code by changing the trailing comment for
the set-literal assertion (the line with "{ro}") to use "# noqa: B018" so the
linter no longer reports B018 for that test (refer to the test lines using
hash(ro) and {ro} to locate the two assertions).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 30170be8-11ac-421a-91fd-f4bc711dbc41
📒 Files selected for processing (4)
backend/backend/settings/base.pybackend/utils/cors_origin.pybackend/utils/log_events.pybackend/utils/tests/test_cors_origin.py
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/backend/settings/base.py
The Sonar quality gate failed with C reliability + 5 security hotspots, all
on the new test file:
- S905 (Bug, Major) — `{ro}` flagged as no-side-effect statement (Sonar
doesn't see the implicit `__hash__` call). Drove the C reliability rating.
Fix: use `len({ro})` so the side effect is via an explicit function call;
test still asserts the same `TypeError`.
- S5727 (Code Smell, Critical) — `assert ro != None` is tautological and
doesn't exercise `__eq__`. Switch to `(ro == None) is False` which directly
tests that `NotImplemented` falls back to identity-equality.
- S5332 × 5 (Hotspots) — `http://` and `ftp://` literals in test data.
These are intentional inputs proving the rejection logic. Annotate with
`# NOSONAR` and an explanatory comment so the hotspots can be marked
reviewed.
No production code changed; tests still 33/33 passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sonar S5727 correctly inferred that ``ro == None`` is statically always False (NotImplemented falls back to identity), making the assertion look tautological. The intent is to lock the protocol contract: ``__eq__`` must return the ``NotImplemented`` sentinel for non-strings. Test that directly via ``ro.__eq__(None) is NotImplemented`` instead of going through ``==``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/utils/tests/test_cors_origin.py`:
- Around line 61-67: Replace the direct equality check against None with an
explicit call to the equality magic method so the test asserts the
NotImplemented contract: in test_non_string_returns_not_implemented call
ro.__eq__(None) and assert it is NotImplemented (leave the existing (ro == 42)
is False assertion as-is or similarly replace it with ro.__eq__(42) is
NotImplemented if you want both non-string cases checked explicitly); reference
the RegexOrigin class and its __eq__ method to locate the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6f266d4e-d52a-46aa-9e23-7382cd6fa8d2
📒 Files selected for processing (1)
backend/utils/tests/test_cors_origin.py
…DoS bound Two minor follow-ups from the second CodeRabbit pass: - `parsed.port` is a property that raises ValueError on malformed/out-of-range inputs (e.g. `:abc`, `:99999`). That bypassed our normalized config-error message and surfaced as a stack trace. Wrap the access and re-raise with the same actionable text. Adds two test cases (`https://example.com:abc`, `https://example.com:99999`) to lock the new behaviour. - The 50ms ReDoS timing bound is too tight for noisy CI runners. Loosen to 500ms — still orders of magnitude below what catastrophic backtracking would produce. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Test ResultsSummary
Runner Tests - Full Report
SDK1 Tests - Full Report
|
|
|
All clean — ready for re-review. SonarCloud quality gate now passing on the latest commit ( Comments addressed end-to-end across the two review passes:
The fix itself remains scoped to UN-3439 item #1 (websocket connectivity restoration); items #2 (decoupling indexing from Socket.io) and #3 (indexing fallback) remain owned separately. |
* UN-3439 [FIX] Accept wildcard subdomain origins in SocketIO and Django CORS (#1938) * UN-3439 [FIX] Accept wildcard subdomain origins in SocketIO and Django CORS Production socket connections were failing for `*.env.us-central.unstract.com` because python-socketio does exact-string comparison on `cors_allowed_origins`, so a literal `*` pattern silently rejected every real subdomain. - Add `CORS_ALLOWED_ORIGIN_REGEXES` derived from `WEB_APP_ORIGIN_URL_WITH_WILD_CARD`. - Wire SocketIO via `_RegexOrigin` whose `__eq__` does the regex match — single list entry covers all wildcard subdomains, no library subclass needed. - Normalize `WEB_APP_ORIGIN_URL` through `urlparse` so trailing slashes / paths in env are stripped (also fixes the `…com//oauth-status/` double-slash). - Add startup guard for malformed env values. Resolves item #1 of UN-3439. Items #2/#3 (decoupling indexing from Socket.io, fallback) are owned separately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-3439 [FIX] Address PR review: canonical origin, fullmatch, unhashable RegexOrigin, tests Addresses five review comments on #1938: 1. coderabbitai (Major) — RFC 6454 canonicalization. Browsers serialize `Origin` headers with a lowercase host and no explicit default ports; `parsed_url.netloc` preserved both, so `https://APP.EXAMPLE.COM:443` would silently fail to match the browser's `https://app.example.com`. Switch to `parsed_url.hostname` + drop default ports, and reject non-http(s) schemes at startup. 2. greptile (P2) — `re.fullmatch` instead of `re.match`. With `re.match` plus `$`, a candidate ending in `\n` matches because `$` is allowed before an optional trailing newline. `fullmatch` removes the ambiguity. 3. self — `_RegexOrigin.__hash__` violated `a == b ⇒ hash(a) == hash(b)` (one fixed pattern hash vs. many matching strings). Today this is masked because python-socketio uses linear `__eq__` on a list, but if the allow-list is ever wrapped in a set, every legitimate subdomain would silently be rejected — exactly the failure mode UN-3439 closes. Make instances unhashable so the contract can't be broken. 4. self — No regression tests. Add `backend/utils/tests/test_cors_origin.py` (33 cases) covering: regex match/no-match, lookalike spoofing, scheme mismatch, trailing-newline rejection, non-string equality protocol, unhashability, ReDoS bounds, URL normalization (case, default ports, trailing slash, paths, queries), startup-guard rejections (empty, no-scheme, non-browser-scheme, no-host), and end-to-end via the same `RegexOrigin` path SocketIO uses. 5. self — Over-clever wildcard-to-regex builder. The `split('*').join(re.escape, ...)` construction generalised to N wildcards but the input has exactly one; replace with a direct rf-string that's self-evident on review. Refactor for testability: extract `RegexOrigin` and `normalize_web_app_origin` into `backend/utils/cors_origin.py` (Django-free, importable from settings and tests). Settings now delegates to one helper call; `log_events.py` imports `RegexOrigin`. No behavioural change beyond what each comment fixes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-3439 [FIX] Address SonarCloud quality gate The Sonar quality gate failed with C reliability + 5 security hotspots, all on the new test file: - S905 (Bug, Major) — `{ro}` flagged as no-side-effect statement (Sonar doesn't see the implicit `__hash__` call). Drove the C reliability rating. Fix: use `len({ro})` so the side effect is via an explicit function call; test still asserts the same `TypeError`. - S5727 (Code Smell, Critical) — `assert ro != None` is tautological and doesn't exercise `__eq__`. Switch to `(ro == None) is False` which directly tests that `NotImplemented` falls back to identity-equality. - S5332 × 5 (Hotspots) — `http://` and `ftp://` literals in test data. These are intentional inputs proving the rejection logic. Annotate with `# NOSONAR` and an explanatory comment so the hotspots can be marked reviewed. No production code changed; tests still 33/33 passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-3439 [FIX] Remove last S5727 code smell — test __eq__ via dunder Sonar S5727 correctly inferred that ``ro == None`` is statically always False (NotImplemented falls back to identity), making the assertion look tautological. The intent is to lock the protocol contract: ``__eq__`` must return the ``NotImplemented`` sentinel for non-strings. Test that directly via ``ro.__eq__(None) is NotImplemented`` instead of going through ``==``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-3439 [FIX] Address remaining CodeRabbit nits — port validation, ReDoS bound Two minor follow-ups from the second CodeRabbit pass: - `parsed.port` is a property that raises ValueError on malformed/out-of-range inputs (e.g. `:abc`, `:99999`). That bypassed our normalized config-error message and surfaced as a stack trace. Wrap the access and re-raise with the same actionable text. Adds two test cases (`https://example.com:abc`, `https://example.com:99999`) to lock the new behaviour. - The 50ms ReDoS timing bound is too tight for noisy CI runners. Loosen to 500ms — still orders of magnitude below what catastrophic backtracking would produce. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ReverseMerge: V0.161.4 hotfix (#1943) * Change csp to report only * [HOTFIX] Bool-parse ENABLE_HIGHLIGHT_API_DEPLOYMENT env var (v0.161.4) (#1939) [HOTFIX] Bool-parse ENABLE_HIGHLIGHT_API_DEPLOYMENT env var (#1937) [FIX] Bool-parse ENABLE_HIGHLIGHT_API_DEPLOYMENT env var os.environ.get returns the raw string when the variable is set, so ENABLE_HIGHLIGHT_API_DEPLOYMENT="False" was truthy in Python (any non-empty string is truthy). Wrap in CommonUtils.str_to_bool so "False" / "false" / "0" actually evaluate to False. The setting is consumed by the cloud configuration plugin's spec default (ConfigSpec.default in plugins/configuration/cloud_config.py) on cloud and on-prem builds. With this fix, an admin who explicitly sets the env var to a falsy string sees highlight data stripped as expected. Co-authored-by: vishnuszipstack <117254672+vishnuszipstack@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Deepak K <89829542+Deepak-Kesavan@users.noreply.github.com> Co-authored-by: vishnuszipstack <117254672+vishnuszipstack@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-3448 [FIX] Remove vestigial `uv pip install` line in uv-lock-automation workflow (#1941) * UN-3448 [FIX] Add --system flag to uv pip install in uv-lock-automation workflow Modern uv requires uv pip install to run inside a virtual environment OR with the explicit --system flag. The workflow currently has neither, so it errors out: error: No virtual environment found for Python 3.12.9; run `uv venv` to create an environment, or pass `--system` to install into a non-virtual environment This breaks every PR that touches a pyproject.toml (the workflow's paths filter triggers on those). Last successful run was 2026-04-01, before a behaviour change in uv or astral-sh/setup-uv@v7. The --system flag is exactly what the error message suggests and is correct here — we install pip into the runner's system Python; the downstream uv-lock.sh script creates its own venvs as needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-3448 [FIX] Remove vestigial `uv pip install` line per review Per @jaseemjaskp's review: the pre-step `uv pip install ... pip` does nothing useful for this workflow. The downstream uv-lock.sh script uses uv sync at line 74, which manages its own venvs internally and never invokes pip directly: $ grep -rn 'pip' docker/scripts/uv-lock-gen/ docker/scripts/uv-lock-gen/uv-lock.sh:2:set -o pipefail Only match is pipefail (shell option), no real pip references. Removing the line entirely is cleaner than papering over with --system. The line was likely copy-pasted from a sibling workflow that legitimately needed pip in the system Python. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ReverseMerge: V0.163.2 hotfix (#1946) * [HOTFIX] Use importlib.util.find_spec for pluggable worker discovery (#1918) * [FIX] Use importlib.util.find_spec for pluggable worker discovery _verify_pluggable_worker_exists() previously checked for the literal file `pluggable_worker/<name>/worker.py` on disk, which breaks when the plugin has been compiled to a .so (Nuitka, Cython, or any C extension) — the module is perfectly importable but the pre-check rejects it because only the .py extension is considered. Replace the filesystem check with importlib.util.find_spec(), which is Python's standard way to ask "is this module resolvable by the import system?". It honors every registered finder — source .py, compiled .so, bytecode .pyc, namespace packages, zipimports — so the function now matches what its docstring claims: verifying the module can be loaded, not that a specific file extension is present. Behavior is preserved for existing deployments: - Images with no `pluggable_worker/<name>/` subpackage → find_spec raises ModuleNotFoundError (ImportError subclass) → returns False. - Images with source .py → find_spec resolves the .py → returns True. - Images with compiled .so → find_spec resolves the .so → returns True. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * [FIX] Handle ValueError from find_spec in pluggable worker verification Greptile-flagged edge case: importlib.util.find_spec() can raise ValueError (not just ImportError) when sys.modules has a partially initialised module entry with __spec__ = None from a prior failed import. Broaden the except to catch both. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * [FIX] Resolve api-deployment worker directory from enum import path worker.py:452 did worker_type.value.replace("-", "_") to derive the on-disk dir name. All WorkerType enum values already use underscores, so the replace was a no-op; for API_DEPLOYMENT whose dir is "api-deployment" (hyphen), it resolved to "api_deployment" and the os.path.exists() check failed. Boot then logged a spurious "❌ Worker directory not found: /app/api_deployment" at ERROR level. The task registration path (builder + celery autodiscover via to_import_path) is unaffected, so this was purely log noise — but noise at ERROR level that masks real failures in log scans. Fix: derive the directory from the authoritative to_import_path() which already handles the hyphen case (api_deployment -> api-deployment). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * [HOTFIX] Add IAM Role / Instance Profile auth mode to AWS Bedrock adapter (#1944) * [FEAT] Allow Bedrock to fall through to boto3's default credential chain Match the S3/MinIO connector pattern: when AWS access keys are left blank on the Bedrock LLM and embedding adapter forms, drop them from the kwargs dict so boto3's default credential chain handles authentication. This unlocks IAM role / instance profile / IRSA / AWS Profile scenarios on hosts that already have ambient AWS credentials (e.g. EKS workers with IRSA, EC2 with an instance profile). - llm1/static/bedrock.json: clarify access-key descriptions to mention IRSA and instance profile (already non-required at v0.163.2 base). - embedding1/static/bedrock.json: drop aws_access_key_id and aws_secret_access_key from top-level required; same description fix; expose aws_profile_name for parity with the LLM form. - base1.py: AWSBedrockLLMParameters and AWSBedrockEmbeddingParameters now strip empty access-key values from the validated kwargs before returning, so empty strings don't override boto3's default chain. AWSBedrockEmbeddingParameters fields gain explicit None defaults and an aws_profile_name field. Backward-compatible: existing adapters with access keys filled in continue to work unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * [FEAT] Add Authentication Type selector to Bedrock adapter form Add an explicit `auth_type` selector with two options, making the auth choice clear to users: - "Access Keys" (default): existing flow, keys required - "IAM Role / Instance Profile (on-prem AWS only)": no fields; relies on boto3's default credential chain (IRSA on EKS, task role on ECS, instance profile on EC2). Description on the selector explicitly notes this option is only for AWS-hosted Unstract deployments. The form-only auth_type field is stripped before LiteLLM validation in both AWSBedrockLLMParameters.validate() and AWSBedrockEmbeddingParameters. validate(). Empty access keys continue to be stripped so boto3 falls through to the default chain even when the access_keys arm is selected without values (matches the S3/MinIO connector pattern). Backward-compatible: legacy adapters without auth_type behave as "Access Keys" mode (the default), and existing keys are forwarded unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * [REVIEW] Address Bedrock auth_type review feedback Fixes the P0/P1 issues raised by greptile-apps and jaseemjaskp on PR #1944. Behaviour fixes: - Stale-key leak in IAM Role mode: switching an existing adapter from Access Keys to IAM Role would carry truthy stored access keys through the strip-empty-only loop, so boto3 silently authenticated with the old long-lived credentials instead of falling through to the host's IRSA / instance-profile identity. Both LLM and embedding paths were affected. - Silent acceptance of unknown auth_type: a typo (e.g. "access_key") or a malformed payload from a non-UI client passed through the dict comprehension untouched, with no enum guard. - Cross-field validation gap: explicit Access Keys mode with blank or whitespace-only values silently fell through to the default credential chain instead of surfacing the misconfiguration. Implementation: - Add a module-level _resolve_bedrock_aws_credentials helper used by both AWSBedrockLLMParameters.validate() and AWSBedrock EmbeddingParameters.validate(), so the auth-type contract is expressed once. - Validates auth_type against an allowlist (None | "access_keys" | "iam_role"); raises ValueError on anything else. - iam_role: unconditionally drops aws_access_key_id and aws_secret_access_key. - access_keys (explicit): requires non-blank values; raises ValueError if either is empty or whitespace-only. - Legacy (auth_type absent): retains the lenient strip behaviour so pre-PR adapter configurations continue to deserialise unchanged. - Restore aws_region_name as required (no `= None` default) on AWSBedrockEmbeddingParameters; only credentials may legitimately be absent. - Drop the orphan aws_profile_name field from embedding1/static/bedrock.json: it was added for parity with the LLM form but lives outside the auth_type oneOf and contradicts the selector's "no further input" semantics. The LLM form already had aws_profile_name pre-PR and is left alone for backwards compatibility. Tests: - New tests/test_bedrock_adapter.py covers 15 cases across LLM and embedding adapters: legacy-no-auth-type, explicit access_keys with valid/blank/whitespace keys, iam_role with stale/no keys, unknown auth_type rejection, cross-field validation, and preservation of unrelated params (model_id, aws_profile_name, region, thinking). Skipped (P2 nice-to-have): - Comment-scope clarification, MinIO reference rewording, validate-mutates-caller'\''s-dict, and the LLM form description nit about aws_profile_name visibility. These don'\''t change behaviour and can be addressed in a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --------- Co-authored-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Athul <89829560+athul-rs@users.noreply.github.com> * batch notification --------- Co-authored-by: ali <117142933+muhammad-ali-e@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com> Co-authored-by: Deepak K <89829542+Deepak-Kesavan@users.noreply.github.com> Co-authored-by: vishnuszipstack <117254672+vishnuszipstack@users.noreply.github.com> Co-authored-by: Praveen Kumar <praveen@zipstack.com> Co-authored-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Athul <89829560+athul-rs@users.noreply.github.com>



What
CORS_ALLOWED_ORIGIN_REGEXESto Django settings, derived from the existingWEB_APP_ORIGIN_URL_WITH_WILD_CARD. Any subdomain of the configured netloc is now accepted by django-cors-headers._RegexOriginhelper inutils/log_events.pyand wire it intosocketio.Server(cors_allowed_origins=…). The python-socketio engine.io handshake now applies the same wildcard semantics, so connections from*.env.us-central.unstract.comare no longer rejected with 401 beforeconnect()runs.WEB_APP_ORIGIN_URLviaurlparseso trailing slashes / paths in the env value are stripped. Apex origins now match browserOriginheaders, and OAuth redirects no longer produce…com//oauth-status/.WEB_APP_ORIGIN_URL(missing scheme or netloc) raisesValueErrorat import time instead of silently producing a regex that matches nothing real.Why
*.env.us-central.unstract.com, blocking prompt-studio indexing workflows. python-socketio does exact-string comparison oncors_allowed_origins, so a literal*pattern silently rejected every real subdomain.How
WEB_APP_ORIGIN_URL_WITH_WILD_CARDon*,re.escapeeach static segment, join with[^/]+, anchor with^…$. Single source of truth for both Django HTTP CORS and SocketIO._RegexOrigin.__eq__returns the regex match result. python-socketio checks origin viaorigin in allowed_origins, which Python routes to our__eq__— a single list entry covers all wildcard subdomains without subclassing the library.__hash__is provided too in case future versions of socketio swap to a set-based check.cors_allowed_origins=API. Stable across socketio releases.Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
connect()remains the real authn gate; CORS is defense in depth.Database Migrations *
Env Config *
WEB_APP_ORIGIN_URLsemantics tightened: must bescheme://host[:port](already the documented form). A trailing slash is now tolerated; a malformed value raises at startup.Related Issues or PRs
Notes on Testing
_RegexOriginequality protocol, regex-injection safety, ReDoS safety, default-localhost behaviour).WEB_APP_ORIGIN_URL=https://us-central.unstract.com/:https://dev.env.us-central.unstract.comacceptedhttps://test.env.us-central.unstract.comacceptedhttps://us-central.unstract.com(apex) acceptedhttps://evil.comrejectedhttp://app.us-central.unstract.com(wrong scheme) rejectedhttps://us-central.unstract.com.evil.com(lookalike) rejectedChecklist *
_RegexOrigin.__eq__-trick docstring)🤖 Generated with Claude Code