fix(bigquery-analytics): route Storage Write API appends to the dataset's region#2
Merged
caohy1988 added a commit that referenced this pull request on Jun 1, 2026
The "Check for hardcoded googleapis.com endpoints" step in .github/workflows/check-file-contents.yml uses `grep -lE 'https?://[a-zA-Z0-9.-]+\.googleapis\.com'` to find files that should also declare an `.mtls.googleapis.com` counterpart for dynamic endpoint selection.

The regex matches any googleapis.com URL, including OAuth 2.0 scope URLs like https://www.googleapis.com/auth/cloud-platform and .../auth/bigquery, which are identity strings, not API endpoints. They don't have mTLS counterparts and never will. Any file that legitimately declares an OAuth scope (very common for ADK plugins integrating Google APIs) trips the gate even when no real endpoint is hardcoded.

Fix: add a second pass that filters the candidate set down to files that have at least one googleapis.com URL OUTSIDE the OAuth scope namespace (i.e. not matching `googleapis.com/auth/`). The mTLS check runs only against that filtered set.

Verified against four synthesized cases:
- only_oauth.py (only OAuth scopes) → ignored ✓
- real_endpoint.py (endpoint, no mTLS) → flagged ✓
- real_endpoint_with_mtls (endpoint + mTLS) → passes ✓
- mixed.py (OAuth + endpoint, no mTLS) → flagged ✓

No effect on the surrounding `logger`, `from __future__`, or `cli` import checks. CI policy intent is unchanged: real hardcoded googleapis.com endpoints must still declare their `.mtls` counterpart.

Refs:
- #2 (the BQAA Storage Write regional routing fix that surfaced this false positive)
- GoogleCloudPlatform/BigQuery-Agent-Analytics-SDK#262
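The two-pass filter can be restated in Python so its logic is easy to check against the four cases above. This is a sketch, not the workflow's actual shell step: the helper names `has_real_endpoint` and `violates_mtls_policy` are hypothetical, and it assumes the mTLS check is a plain substring test for `.mtls.googleapis.com` anywhere in the file.

```python
import re

# Same pattern the workflow step feeds to `grep -lE`.
ENDPOINT_RE = re.compile(r"https?://[a-zA-Z0-9.-]+\.googleapis\.com")
# OAuth 2.0 scope URLs live under this path; they are identities, not endpoints.
OAUTH_SCOPE_PREFIX = "/auth/"
# Assumption: "declares an mTLS counterpart" means this substring appears in the file.
MTLS_MARKER = ".mtls.googleapis.com"


def has_real_endpoint(text: str) -> bool:
    """True if any googleapis.com URL in `text` is outside the OAuth scope namespace."""
    for match in ENDPOINT_RE.finditer(text):
        # Inspect what immediately follows the host: "/auth/..." means an OAuth scope.
        if not text.startswith(OAUTH_SCOPE_PREFIX, match.end()):
            return True
    return False


def violates_mtls_policy(text: str) -> bool:
    """Flag files that hardcode a real endpoint but never mention an .mtls counterpart."""
    return has_real_endpoint(text) and MTLS_MARKER not in text


# The four synthesized cases, as in-memory file contents.
cases = {
    "only_oauth.py": 'SCOPES = ["https://www.googleapis.com/auth/cloud-platform"]',
    "real_endpoint.py": 'HOST = "https://bigquerystorage.googleapis.com"',
    "real_endpoint_with_mtls": (
        'HOST = "https://bigquerystorage.googleapis.com"\n'
        'MTLS_HOST = "https://bigquerystorage.mtls.googleapis.com"'
    ),
    "mixed.py": (
        'SCOPE = "https://www.googleapis.com/auth/bigquery"\n'
        'HOST = "https://bigquery.googleapis.com"'
    ),
}

for name, text in cases.items():
    print(name, "flagged" if violates_mtls_policy(text) else "ok")
```

Checking the character right after the matched host, rather than excluding any line that mentions `/auth/`, keeps `mixed.py` flagged: its OAuth scope is ignored while its real endpoint still counts.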
…et's region

The Storage Write API `AppendRows` streaming RPC does not auto-populate the request-routing header, so the plugin's writes were always routed to the US multiregion. Writes to a dataset in any other region (e.g. northamerica-northeast1) failed with a "session not found" / stream-not-found error and no rows were ever written, which surfaced to users as `session_id` (and every other column) failing to propagate.

Set the `x-goog-request-params: write_stream=<stream>` routing header on the `append_rows` call, matching what `google.cloud.bigquery_storage_v1.writer` does internally, so requests reach the region that owns the write stream. US-multiregion behavior is unchanged.

Adds a regression test asserting the routing header is passed.

Fixes GoogleCloudPlatform/BigQuery-Agent-Analytics-SDK#262
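The header value is `write_stream=` followed by the stream's full resource name. A minimal sketch of building it, assuming the plugin appends to a table's `_default` stream; the helper names and any project/dataset/table values passed to them are hypothetical.

```python
from typing import Tuple

# Header name from the Storage Write API docs and google.cloud.bigquery_storage_v1.writer.
ROUTING_HEADER = "x-goog-request-params"


def default_stream_name(project: str, dataset: str, table: str) -> str:
    """Resource name of a table's `_default` write stream."""
    return f"projects/{project}/datasets/{dataset}/tables/{table}/streams/_default"


def routing_metadata(stream_name: str) -> Tuple[Tuple[str, str], ...]:
    """gRPC metadata that lets the backend route AppendRows to the stream's region."""
    return ((ROUTING_HEADER, f"write_stream={stream_name}"),)


stream = default_stream_name("my-project", "agent_logs", "events")
print(routing_metadata(stream))
```

Because the stream name embeds the dataset, the backend can resolve the dataset's region from the header alone, which is exactly the information a streaming call cannot supply up front.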
Summary
Fixes cross-region BigQuery logging failures reported in GoogleCloudPlatform/BigQuery-Agent-Analytics-SDK#262: logging works when the destination dataset is in the `US` multiregion or `us-central1`, but fails with a `"session not found"` error in single regions such as `northamerica-northeast1`.
Root cause
The plugin writes rows via the Storage Write API's streaming `AppendRows` RPC. `AppendRows` is a bidirectional-streaming RPC, so the GAPIC client cannot auto-populate the implicit request-routing header (it has no unary request body to read `write_stream` from). Without that header, every append is routed to the US multiregion by default:

- `us-central1` resolves within the `US` multiregion → the `_default` stream is found → works.
- `northamerica-northeast1` is a single region → the US-routed backend has no knowledge of the stream created there → `"session not found"` / stream-not-found, the append fails, and no rows are ever written.

This surfaced to users as `session_id` (and every other column) "failing to propagate"; in reality the rows never landed because the writes were misrouted. The `session_id` plumbing itself is fine.

Per the Storage Write API docs: "if you write data to any region except the `US` multiregion, you must include the following header in your requests: `x-goog-request-params: write_stream=<stream_name>`." The official `google.cloud.bigquery_storage_v1.writer.AppendRowsStream` helper sets exactly this header (`writer.py`); the raw async `append_rows` this plugin calls does not.
Fix
Set the routing header explicitly on the `append_rows` call so the request reaches the region that owns the write stream. US-multiregion behavior is unchanged (the header is correct there too). This is a one-line functional fix that mirrors Google's own client-library helper.
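A minimal sketch of the call-site change, assuming a GAPIC write client whose `append_rows` accepts `requests=` and `metadata=` keyword arguments (as the generated `BigQueryWriteClient` and `BigQueryWriteAsyncClient` do). `append_rows_routed` is a hypothetical helper, not the plugin's code; a `MagicMock` stands in for the client so the header can be asserted without credentials, in the spirit of the regression test listed under Tests.

```python
from typing import Any, Sequence, Tuple
from unittest import mock

Metadata = Sequence[Tuple[str, str]]


def append_rows_routed(
    write_client: Any,
    requests: Any,
    stream_name: str,
    extra_metadata: Metadata = (),
) -> Any:
    """Call `write_client.append_rows` with the regional routing header attached.

    Caller metadata goes first and the routing header is appended last.
    The return value is passed through untouched so the caller consumes it
    exactly as it did before the fix.
    """
    metadata = tuple(extra_metadata) + (
        ("x-goog-request-params", f"write_stream={stream_name}"),
    )
    return write_client.append_rows(requests=requests, metadata=metadata)


# Regression-style check: a MagicMock stands in for the GAPIC write client,
# so no credentials or network access are needed.
fake_client = mock.MagicMock()
stream = "projects/p/datasets/d/tables/t/streams/_default"
append_rows_routed(fake_client, iter([]), stream)

sent = fake_client.append_rows.call_args.kwargs["metadata"]
assert ("x-goog-request-params", f"write_stream={stream}") in sent
```

Appending the header to caller-supplied metadata, rather than replacing it, roughly follows how `AppendRowsStream` chains its own metadata with the routing header.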
Tests
- `test_append_rows_sets_regional_routing_header` (deploys the plugin with `location="northamerica-northeast1"` and asserts the routing header is passed to `append_rows`).
- 229 passed.
Notes / out of scope
- The regional endpoint (`bigquerystorage.<location>.rep.googleapis.com`) is only required for strict data-residency guarantees; the routing header alone resolves the functional cross-region failure here. It can be a follow-up if data-residency support is desired.
- The `location` param is intentionally not passed to the synchronous `bigquery.Client`: `TestDatasetLocationHandling` documents that omitting it lets BigQuery infer location from the dataset and avoids silent view-creation failures for non-US datasets. Left as-is.