
Extract SPOG org-id from cluster http_path for non-Thrift requests #816

Open
msrathore-db wants to merge 1 commit into databricks:main from msrathore-db:fix-spog-org-id-extract-cluster-path

@msrathore-db
Contributor

Summary

  • Fixes silent telemetry loss on SPOG (custom-URL) hosts when connecting to an all-purpose cluster via an http_path like sql/protocolv1/o/<workspace-id>/<cluster-id>.
  • Session._extract_spog_headers now extracts the workspace ID from the /o/<wsid>/ path segment as a fallback when ?o=<wsid> is not present, and sets it as the x-databricks-org-id header so the workspace-scoped HTTP transports (telemetry, feature flags, SEA) can route correctly.
  • Priority order preserved: explicit caller header > ?o= in http_path > /sql/protocolv1/o/<wsid>/ path segment.

Why

On a SPOG host the workspace identity has to be in either the URL or the x-databricks-org-id header so PoPP can route a request to the right workspace. For all-purpose cluster Thrift this is free — the workspace ID is in the /o/<wsid>/ segment of the http_path, so PoPP routes Thrift via routing_reason=workspace-id and the session opens fine without ?o=.

Connection-scoped HTTP clients used for telemetry, feature flags, and the SEA backend talk to different paths (/telemetry-ext, /api/...) that do not carry the workspace ID. The previous extraction only looked at ?o= in the query string, so on a cluster http_path without ?o= no x-databricks-org-id header was ever attached. PoPP fell back to default (account) routing on those endpoints and responded with a 303 redirect to /login — silently dropping every telemetry batch.

What changes

src/databricks/sql/session.py

  • New compiled regex Session._CLUSTER_PATH_ORG_ID_RE matching (?:^|/)sql/protocolv1/o/(\d+)/[^/?]+.
  • _extract_spog_headers now checks the caller-header guard first, then ?o=, then the cluster path segment. Empty http_path returns {} as before. The log line reports which source produced the workspace ID.
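
A minimal sketch of the lookup order described above, not the code in this PR. The regex is the one quoted in the first bullet; the function name `extract_spog_headers`, its parameters, and the use of `urllib.parse` are assumptions made for illustration, and the real `Session._extract_spog_headers` may differ in signature and logging.

```python
import re
from typing import Dict, Optional
from urllib.parse import parse_qs, urlsplit

ORG_ID_HEADER = "x-databricks-org-id"

# Regex as quoted in the PR description.
CLUSTER_PATH_ORG_ID_RE = re.compile(r"(?:^|/)sql/protocolv1/o/(\d+)/[^/?]+")


def extract_spog_headers(
    http_path: Optional[str],
    caller_headers: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical stand-in for Session._extract_spog_headers."""
    # 1. Caller-header guard: an explicit header always wins, so add nothing.
    if any(name.lower() == ORG_ID_HEADER for name in (caller_headers or {})):
        return {}

    # Empty http_path yields no headers.
    if not http_path:
        return {}

    parts = urlsplit(http_path)

    # 2. ?o=<wsid> in the query string.
    org_ids = parse_qs(parts.query).get("o")
    if org_ids:
        return {ORG_ID_HEADER: org_ids[0]}

    # 3. Fallback: /o/<wsid>/ segment of an all-purpose cluster path.
    match = CLUSTER_PATH_ORG_ID_RE.search(parts.path)
    if match:
        return {ORG_ID_HEADER: match.group(1)}

    return {}
```

With these assumptions, a cluster path such as `sql/protocolv1/o/1234567890/0101-123456-abcdefgh` (an invented ID) produces the header from the path segment, while a warehouse path without `?o=` produces no header.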

tests/unit/test_session.py

  • Five new tests under the existing TestSpogHeaders class:
    • Cluster path without ?o= → header extracted from /o/<wsid>/
    • Cluster path with leading / → same
    • Cluster path with ?o= → query-param value wins
    • Cluster path + explicit caller header → caller wins
    • Warehouse path without ?o= → no header (regression guard: the new regex must not match warehouse paths)
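
The path-only cases in this list (cluster path, leading slash, warehouse regression guard) can be checked against the regex in isolation. The sketch below is illustrative and is not the test code in this PR: the helper `org_id_from_path`, the `PATH_CASES` table, and the sample IDs are invented for the example.

```python
import re

# Regex as quoted in the PR description.
CLUSTER_PATH_ORG_ID_RE = re.compile(r"(?:^|/)sql/protocolv1/o/(\d+)/[^/?]+")


def org_id_from_path(http_path):
    """Return the workspace ID from a cluster http_path, or None if absent."""
    match = CLUSTER_PATH_ORG_ID_RE.search(http_path)
    return match.group(1) if match else None


# (http_path, expected workspace ID) pairs; all IDs are made up.
PATH_CASES = [
    ("sql/protocolv1/o/1234567890/0101-123456-abcdefgh", "1234567890"),
    ("/sql/protocolv1/o/1234567890/0101-123456-abcdefgh", "1234567890"),
    # Regression guard: warehouse paths must not match.
    ("/sql/1.0/warehouses/abc123", None),
]


def failing_cases():
    """Return the cases whose extracted ID differs from the expectation."""
    return [
        (path, expected, org_id_from_path(path))
        for path, expected in PATH_CASES
        if org_id_from_path(path) != expected
    ]
```

An empty list from `failing_cases()` means the regex accepts both cluster-path forms and rejects the warehouse path.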

Test plan

  • pytest tests/unit/test_session.py::TestSpogHeaders -v — 12 tests pass (5 new).
  • pytest tests/unit/test_session.py — full file passes (26/26).
  • Behavior validated on the OSS JDBC equivalent fix against Prod SPOG (peco.azuredatabricks.net) all-purpose cluster: telemetry POST /telemetry-ext flipped from HTTP 303 → /login to HTTP 200 OK once the path-segment extraction populated x-databricks-org-id. Same shape of fix here.

Out of scope

  • The corresponding OSS JDBC driver fix is open as "Extract SPOG org-id from cluster httpPath for non-Thrift requests" (databricks-jdbc#1472).
  • The Node.js connector (databricks/databricks-sql-nodejs) already extracts the org ID from both query param and path segment (see extractWorkspaceId/hasMalformedOrgParam in lib/DBSQLClient.ts).
  • The Go connector (databricks/databricks-sql-go) has a separate PR for the same fix.
  • The C# ADBC driver (adbc-drivers/databricks) has the same parsing limitation in PropertyHelper.ParseOrgIdFromProperties and needs a corresponding fix in its own repo.

This pull request and its description were written with assistance from Claude Code.

For all-purpose-compute Thrift connections on SPOG (custom-URL) hosts the
http_path is /sql/protocolv1/o/<workspace-id>/<cluster-id> and the
workspace ID is encoded in the path itself. PoPP routes the Thrift
request correctly off the /o/<wsid>/ segment, so the connection succeeds
without an explicit ?o= query parameter.

Other requests on the same connection (telemetry uploads to
/telemetry-ext, feature-flag fetches, SEA REST calls) hit different
paths that don't carry the workspace ID. Previously _extract_spog_headers
only looked at ?o= in the http_path, so the x-databricks-org-id header
was never set for cluster URLs without ?o=. On SPOG hosts PoPP then had
no workspace context for these requests and redirected them to /login,
silently dropping telemetry.

Extend _extract_spog_headers to also extract the workspace ID from the
cluster path segment as a fallback when ?o= is absent. Priority order:
explicit caller header > ?o= query param > /o/<wsid>/ path segment.

Adds five unit tests covering the new cluster-path extraction, leading
slash, query-param-wins priority, explicit-header-wins priority, and a
warehouse-path regression guard.

Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>