fix(bigquery): set default dataset from schema in adjust_engine_params#40776

Draft
aminghadersohi wants to merge 2 commits into apache:master from aminghadersohi:aminghadersohi/fix-bigquery-schema-default-dataset

Conversation

Contributor

@aminghadersohi aminghadersohi commented Jun 4, 2026

SUMMARY

BigQueryEngineSpec.adjust_engine_params() ignored the schema parameter. BigQuery requires fully qualified table names (project.dataset.table) unless a default dataset is configured via the SQLAlchemy URL database component (bigquery://project/dataset). Without this, any SQL with unqualified table names fails:

Table must be qualified with a dataset
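
The failure mode above can be illustrated with a simplified model of name resolution. This is only a sketch, not BigQuery's actual resolver: `resolve_table` and its parameters are hypothetical names, and the rule it encodes (one-part names need a default dataset) is taken from the description in this PR.

```python
from typing import Optional


def resolve_table(name: str, project: str, default_dataset: Optional[str] = None) -> str:
    """Return a project.dataset.table name (hypothetical, simplified model)."""
    parts = name.split(".")
    if len(parts) > 3:
        raise ValueError(f"Unexpected table reference: {name!r}")
    if len(parts) == 3:
        # Already fully qualified: project.dataset.table
        return name
    if len(parts) == 2:
        # dataset.table: only the project is missing
        return f"{project}.{name}"
    if default_dataset:
        # Bare table name: resolved against the default dataset
        return f"{project}.{default_dataset}.{name}"
    # No default dataset configured: the error quoted in this PR
    raise ValueError("Table must be qualified with a dataset")
```

With a default dataset of `sales`, the bare name `orders` resolves to `proj.sales.orders` in this model; without one, the call raises.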

Fix: propagate schema to the URL database component so unqualified table names resolve to schema.table_name.

When both catalog and schema are provided: catalog sets the host (project), schema sets the database (default dataset), giving bigquery://catalog/schema.
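
The catalog and schema handling can be sketched without SQLAlchemy as follows. `SketchURL` is a hypothetical stand-in that models only the host and database parts of a URL, and `adjust` is an illustrative helper rather than the real `adjust_engine_params()` signature; the rendering rule for an empty database is an assumption of this sketch.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SketchURL:
    """Hypothetical stand-in for a SQLAlchemy URL (host and database only)."""

    host: str
    database: Optional[str] = None

    def set(self, **changes: Optional[str]) -> "SketchURL":
        # Immutable update: returns a new object instead of mutating.
        return replace(self, **changes)

    def render(self) -> str:
        # Assumption of this sketch: an empty database adds no path segment.
        suffix = f"/{self.database}" if self.database else ""
        return f"bigquery://{self.host}{suffix}"


def adjust(
    uri: SketchURL,
    catalog: Optional[str] = None,
    schema: Optional[str] = None,
) -> SketchURL:
    if catalog:
        # The catalog becomes the project (URL host).
        uri = uri.set(host=catalog, database="")
    if schema:
        # The schema becomes the default dataset (URL database).
        uri = uri.set(database=schema)
    return uri
```

In this sketch, `adjust(SketchURL("project"), schema="sales").render()` gives `bigquery://project/sales`, and passing both a catalog and a schema replaces the host as well as the database.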

Also fixes test_calculated_column_in_order_by: the birth_names fixture table has schema="public" from the Postgres test database. After the fix, that value was passed to BigQuery as the default dataset, triggering a real GCP credential check. The test now clears table.schema before calling get_query_str so it stays focused on ORDER BY SQL generation.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A — backend-only change.

TESTING INSTRUCTIONS

pytest tests/unit_tests/db_engine_specs/test_bigquery.py::test_adjust_engine_params_schema_as_dataset -v
pytest tests/unit_tests/db_engine_specs/test_bigquery.py::test_adjust_engine_params_catalog_as_host -v
pytest tests/integration_tests/db_engine_specs/bigquery_tests.py::TestBigQueryDbEngineSpec::test_calculated_column_in_order_by -v

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration
  • Introduces new feature or API
  • Removes existing feature or API

BigQuery requires fully qualified table names (project.dataset.table)
unless the SQLAlchemy URL database component is set to a default dataset.
Previously, the schema parameter was ignored, causing 'Table must be
qualified with a dataset' errors when the chatbot called execute_sql
without explicit dataset qualification.

netlify Bot commented Jun 4, 2026

Deploy Preview for superset-docs-preview ready!

Name Link
🔨 Latest commit 1ceea04
🔍 Latest deploy log https://app.netlify.com/projects/superset-docs-preview/deploys/6a21cceba65c510008ec6244
😎 Deploy Preview https://deploy-preview-40776--superset-docs-preview.netlify.app

codecov Bot commented Jun 4, 2026

Codecov Report

❌ Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.85%. Comparing base (a6d2c95) to head (ea79ce8).
⚠️ Report is 114 commits behind head on master.

Files with missing lines                Patch %   Lines
superset/db_engine_specs/bigquery.py    0.00%     2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (a6d2c95) and HEAD (ea79ce8). Click for more details.

HEAD has 9 uploads less than BASE
Flag      BASE (a6d2c95)   HEAD (ea79ce8)
python    10               3
unit      3                1
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #40776      +/-   ##
==========================================
- Coverage   63.94%   55.85%   -8.10%     
==========================================
  Files        2658     2660       +2     
  Lines      143011   143673     +662     
  Branches    32866    33002     +136     
==========================================
- Hits        91454    80242   -11212     
- Misses      49994    62715   +12721     
+ Partials     1563      716     -847     

Flag      Coverage Δ
hive      39.75% <0.00%> (-0.01%) ⬇️
mysql     ?
postgres  ?
presto    41.34% <0.00%> (-0.03%) ⬇️
python    42.64% <0.00%> (-17.32%) ⬇️
sqlite    ?
unit      100.00% <ø> (ø)

…chema propagation

adjust_engine_params now sets the BigQuery default dataset from the schema
parameter. The birth_names table has schema="public" from the Postgres test
database, which was being passed to BigQuery as the default dataset, triggering
a real credential check and failing in CI without GCP credentials.

Clear table.schema before calling get_query_str so the test stays focused on
its actual intent: verifying ORDER BY SQL generation for calculated columns.

Copilot AI left a comment

Pull request overview

This PR updates the BigQuery engine spec so that when Superset provides a schema, it is propagated into the SQLAlchemy URL’s database component to act as BigQuery’s default dataset (enabling unqualified table references), and it adjusts tests accordingly.

Changes:

  • Update BigQueryEngineSpec.adjust_engine_params() to set the URL database from schema (default dataset behavior).
  • Add a unit test covering the schema→dataset adjustment behavior.
  • Update an integration test to clear a Postgres-derived schema to avoid unintended BigQuery credential checks during SQL generation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File / Description
superset/db_engine_specs/bigquery.py
    Implements schema→URL database propagation so BigQuery can use a default dataset for unqualified table names.
tests/unit_tests/db_engine_specs/test_bigquery.py
    Adds coverage for the new schema-as-dataset behavior in adjust_engine_params().
tests/integration_tests/db_engine_specs/bigquery_tests.py
    Clears table.schema to avoid leaking the Postgres "public" schema into BigQuery URL construction during the test.

Comment on lines +465 to +469
url = make_url("bigquery://project")

# Without schema, URL is unchanged
uri = BigQueryEngineSpec.adjust_engine_params(url, {})[0]
assert str(uri) == "bigquery://project"

Comment on lines 743 to +748
if catalog:
    uri = uri.set(host=catalog, database="")
if schema:
    # Setting database to schema makes it the BigQuery default dataset,
    # so unqualified table names in SQL resolve to schema.table_name.
    uri = uri.set(database=schema)