Extract SPOG org-id from cluster http_path for non-Thrift requests #816
Open
msrathore-db wants to merge 1 commit into
For all-purpose-compute Thrift connections on SPOG (custom-URL) hosts, the http_path is /sql/protocolv1/o/<workspace-id>/<cluster-id> and the workspace ID is encoded in the path itself. PoPP routes the Thrift request correctly off the /o/<wsid>/ segment, so the connection succeeds without an explicit ?o= query parameter.

Other requests on the same connection (telemetry uploads to /telemetry-ext, feature-flag fetches, SEA REST calls) hit different paths that don't carry the workspace ID. Previously _extract_spog_headers only looked at ?o= in the http_path, so the x-databricks-org-id header was never set for cluster URLs without ?o=. On SPOG hosts PoPP then had no workspace context for these requests and redirected them to /login, silently dropping telemetry.

Extend _extract_spog_headers to also extract the workspace ID from the cluster path segment as a fallback when ?o= is absent. Priority order: explicit caller header > ?o= query param > /o/<wsid>/ path segment.

Adds five unit tests covering the new cluster-path extraction, leading slash, query-param-wins priority, explicit-header-wins priority, and a warehouse-path regression guard.

Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
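The priority order described in the commit message can be sketched roughly as follows. This is an illustration only, not the code in the diff: the function name `extract_spog_headers`, its `existing_headers` dict parameter, and the case-insensitive header check are assumptions made for the example. Only the header name, the `?o=` parameter, and the regex are taken from the PR text.

```python
import re
from urllib.parse import parse_qs

ORG_ID_HEADER = "x-databricks-org-id"

# Pattern as quoted in the PR description.
CLUSTER_PATH_ORG_ID_RE = re.compile(r"(?:^|/)sql/protocolv1/o/(\d+)/[^/?]+")


def extract_spog_headers(http_path, existing_headers=None):
    """Hypothetical stand-in for Session._extract_spog_headers.

    Returns the extra headers to attach, or {} when nothing should be added.
    """
    if not http_path:
        return {}

    # 1. An explicit caller-supplied header always wins: add nothing.
    #    (Comparing names case-insensitively is an assumption of this sketch.)
    supplied = {name.lower() for name in (existing_headers or {})}
    if ORG_ID_HEADER in supplied:
        return {}

    # 2. Next, an explicit ?o=<wsid> query parameter.
    path, _, query = http_path.partition("?")
    org_ids = parse_qs(query).get("o")
    if org_ids:
        return {ORG_ID_HEADER: org_ids[0]}

    # 3. Finally, fall back to the /o/<wsid>/ segment of a cluster path.
    match = CLUSTER_PATH_ORG_ID_RE.search(path)
    if match:
        return {ORG_ID_HEADER: match.group(1)}

    return {}
```

With this shape, a cluster path that also carries `?o=` yields the query-parameter value, and a warehouse path without `?o=` yields no header at all.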
Summary
- `sql/protocolv1/o/<workspace-id>/<cluster-id>`.
- `Session._extract_spog_headers` now extracts the workspace ID from the `/o/<wsid>/` path segment as a fallback when `?o=<wsid>` is not present, and sets it as the `x-databricks-org-id` header so the workspace-scoped HTTP transports (telemetry, feature flags, SEA) can route correctly.
- Priority order: explicit caller header ▶ `?o=` in `http_path` ▶ `/sql/protocolv1/o/<wsid>/` path segment.

Why
On a SPOG host the workspace identity has to be in either the URL or the `x-databricks-org-id` header so PoPP can route a request to the right workspace. For all-purpose cluster Thrift this comes for free: the workspace ID is in the `/o/<wsid>/` segment of the http_path, so PoPP routes Thrift via `routing_reason=workspace-id` and the session opens fine without `?o=`.

Connection-scoped HTTP clients used for telemetry, feature flags, and the SEA backend talk to different paths (`/telemetry-ext`, `/api/...`) that do not carry the workspace ID. The previous extraction only looked at `?o=` in the query string, so on a cluster http_path without `?o=` no `x-databricks-org-id` header was ever attached. PoPP fell back to default (account) routing on those endpoints and responded with a 303 redirect to `/login`, silently dropping every telemetry batch.

What changes
- `src/databricks/sql/session.py`: `Session._CLUSTER_PATH_ORG_ID_RE` matching `(?:^|/)sql/protocolv1/o/(\d+)/[^/?]+`. `_extract_spog_headers` now checks the caller-header guard first, then `?o=`, then the cluster path segment. Empty `http_path` returns `{}` as before. The log line reports which source produced the workspace ID.
- `tests/unit/test_session.py`: new tests in the `TestSpogHeaders` class:
  - cluster path without `?o=` → header extracted from `/o/<wsid>/`
  - cluster path with a leading `/` → same
  - cluster path with `?o=` → query-param value wins
  - explicit caller header → explicit header wins
  - warehouse path without `?o=` → no header (regression guard: the new regex must not match warehouse paths)

Test plan
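As a standalone check of the pattern quoted under What changes, the snippet below runs it against a few representative paths. The helper name `org_id_from_path` and all workspace, cluster, and warehouse IDs are invented for illustration. The `/sql/1.0/warehouses/<id>` shape is assumed here as the warehouse http_path format that the regression guard refers to.

```python
import re

# Pattern as quoted in the PR description.
CLUSTER_PATH_ORG_ID_RE = re.compile(r"(?:^|/)sql/protocolv1/o/(\d+)/[^/?]+")


def org_id_from_path(http_path):
    """Hypothetical helper: return the workspace ID captured by the pattern, or None."""
    match = CLUSTER_PATH_ORG_ID_RE.search(http_path)
    return match.group(1) if match else None


# Made-up paths mapped to the workspace ID the pattern should capture.
CASES = {
    # Cluster path with and without a leading slash.
    "sql/protocolv1/o/1234567890/0123-456789-abcdef": "1234567890",
    "/sql/protocolv1/o/1234567890/0123-456789-abcdef": "1234567890",
    # Warehouse path: must not match.
    "/sql/1.0/warehouses/abc123": None,
    # Missing cluster ID after the workspace segment: [^/?]+ needs one character.
    "sql/protocolv1/o/1234567890/": None,
    # Non-numeric workspace segment: (\d+) does not match.
    "sql/protocolv1/o/not-digits/cluster": None,
    # "sql" must start the string or follow a "/", so "xsql" is rejected.
    "xsql/protocolv1/o/1/c": None,
}

for path, expected in CASES.items():
    assert org_id_from_path(path) == expected, path
```

The `(?:^|/)` prefix is what lets the pattern accept both the bare and the leading-slash form while still rejecting a path where `sql` is only the tail of a longer segment.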
- `pytest tests/unit/test_session.py::TestSpogHeaders -v`: 12 tests pass (5 new).
- `pytest tests/unit/test_session.py`: full file passes (26/26).
- ... (`peco.azuredatabricks.net`) all-purpose cluster: telemetry POST `/telemetry-ext` flipped from `HTTP 303 → /login` to `HTTP 200 OK` once the path-segment extraction populated `x-databricks-org-id`. Same shape of fix here.

Out of scope
- The Node.js driver (`databricks/databricks-sql-nodejs`) already extracts the org ID from both query param and path segment (see `extractWorkspaceId` / `hasMalformedOrgParam` in `lib/DBSQLClient.ts`).
- The Go driver (`databricks/databricks-sql-go`) has a separate PR for the same fix.
- The ADBC driver (`adbc-drivers/databricks`) has the same parsing limitation in `PropertyHelper.ParseOrgIdFromProperties` and needs a corresponding fix in its own repo.

This pull request and its description were written with assistance from Claude Code.