fix(bigquery): set default dataset from schema in adjust_engine_params #40776
aminghadersohi wants to merge 2 commits
BigQuery requires fully qualified table names (project.dataset.table) unless the SQLAlchemy URL database component is set to a default dataset. Previously, the schema parameter was ignored, causing 'Table must be qualified with a dataset' errors when the chatbot called execute_sql without explicit dataset qualification.
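The resolution rule described above can be illustrated with a small, standard-library-only sketch. It is a toy model for illustration, not BigQuery's or the SQLAlchemy dialect's actual code; `build_url` and `resolve_table` are hypothetical helper names that do not appear in the PR.

```python
from typing import Optional


def build_url(project: str, dataset: Optional[str] = None) -> str:
    """Build a BigQuery-style SQLAlchemy URL string.

    The path segment (the URL's `database` component) plays the role of the
    default dataset. Hypothetical helper, for illustration only.
    """
    return f"bigquery://{project}/{dataset}" if dataset else f"bigquery://{project}"


def resolve_table(name: str, project: str, default_dataset: Optional[str] = None) -> str:
    """Toy model of how a table reference is qualified.

    Assumption: names with three parts are already fully qualified, names with
    two parts only need the project, and bare names need a default dataset.
    """
    parts = name.split(".")
    if len(parts) == 3:
        return name
    if len(parts) == 2:
        return f"{project}.{name}"
    if not default_dataset:
        # Mirrors the error text quoted in the PR description.
        raise ValueError("Table must be qualified with a dataset")
    return f"{project}.{default_dataset}.{name}"
```

With a default dataset, `resolve_table("birth_names", "project", "my_dataset")` yields a fully qualified name; without one, the same call raises, which is the failure mode this PR addresses.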
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #40776 +/- ##
==========================================
- Coverage 63.94% 55.85% -8.10%
==========================================
Files 2658 2660 +2
Lines 143011 143673 +662
Branches 32866 33002 +136
==========================================
- Hits 91454 80242 -11212
- Misses 49994 62715 +12721
+ Partials 1563 716 -847
…chema propagation

`adjust_engine_params` now sets the BigQuery default dataset from the `schema` parameter. The `birth_names` table has `schema="public"` from the Postgres test database, which was being passed to BigQuery as the default dataset, triggering a real credential check and failing in CI without GCP credentials. Clear `table.schema` before calling `get_query_str` so the test stays focused on its actual intent: verifying ORDER BY SQL generation for calculated columns.
Pull request overview
This PR updates the BigQuery engine spec so that when Superset provides a schema, it is propagated into the SQLAlchemy URL’s database component to act as BigQuery’s default dataset (enabling unqualified table references), and it adjusts tests accordingly.
Changes:
- Update `BigQueryEngineSpec.adjust_engine_params()` to set the URL `database` from `schema` (default dataset behavior).
- Add a unit test covering the schema→dataset adjustment behavior.
- Update an integration test to clear a Postgres-derived schema to avoid unintended BigQuery credential checks during SQL generation.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `superset/db_engine_specs/bigquery.py` | Implements schema→URL database propagation so BigQuery can use a default dataset for unqualified table names. |
| `tests/unit_tests/db_engine_specs/test_bigquery.py` | Adds coverage for the new schema-as-dataset behavior in `adjust_engine_params()`. |
| `tests/integration_tests/db_engine_specs/bigquery_tests.py` | Clears `table.schema` to avoid leaking the Postgres "public" schema into BigQuery URL construction during the test. |
    url = make_url("bigquery://project")

    # Without schema, URL is unchanged
    uri = BigQueryEngineSpec.adjust_engine_params(url, {})[0]
    assert str(uri) == "bigquery://project"
    if catalog:
        uri = uri.set(host=catalog, database="")
    if schema:
        # Setting database to schema makes it the BigQuery default dataset,
        # so unqualified table names in SQL resolve to schema.table_name.
        uri = uri.set(database=schema)
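The catalog and schema handling in this snippet can be exercised in isolation with a rough sketch. It assumes a hypothetical `SketchURL` dataclass standing in for SQLAlchemy's immutable URL object (only `host`, `database`, and a `set()` method are modeled), and `adjust` is an illustrative function name, not the Superset method itself.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SketchURL:
    """Hypothetical, minimal stand-in for an immutable SQLAlchemy URL."""

    host: str
    database: str = ""

    def set(self, **changes: str) -> "SketchURL":
        # Return a modified copy, mimicking the immutable-URL style used above.
        return replace(self, **changes)

    def __str__(self) -> str:
        suffix = f"/{self.database}" if self.database else ""
        return f"bigquery://{self.host}{suffix}"


def adjust(
    uri: SketchURL,
    catalog: Optional[str] = None,
    schema: Optional[str] = None,
) -> SketchURL:
    """Illustrative version of the catalog/schema logic from the diff."""
    if catalog:
        # The catalog replaces the project (host) and resets the dataset.
        uri = uri.set(host=catalog, database="")
    if schema:
        # The schema becomes the default dataset (URL database component).
        uri = uri.set(database=schema)
    return uri


base = SketchURL("project")
print(adjust(base))                                   # bigquery://project
print(adjust(base, schema="my_dataset"))              # bigquery://project/my_dataset
print(adjust(base, catalog="other", schema="my_ds"))  # bigquery://other/my_ds
```

Applying the catalog branch first matters: it clears `database`, so a schema passed in the same call is set afterwards and is not lost.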
SUMMARY
`BigQueryEngineSpec.adjust_engine_params()` ignored the `schema` parameter. BigQuery requires fully qualified table names (`project.dataset.table`) unless a default dataset is configured via the SQLAlchemy URL `database` component (`bigquery://project/dataset`). Without this, any SQL with unqualified table names fails with a 'Table must be qualified with a dataset' error.

Fix: propagate `schema` to the URL `database` component so unqualified table names resolve to `schema.table_name`.

When both `catalog` and `schema` are provided: `catalog` sets the host (project), `schema` sets the database (default dataset), giving `bigquery://catalog/schema`.

Also fixes `test_calculated_column_in_order_by`: the `birth_names` fixture table has `schema="public"` from the Postgres test database; after the fix, that was being passed to BigQuery as the default dataset, triggering a real GCP credential check. Clears `table.schema` before `get_query_str` to keep the test focused on ORDER BY SQL generation.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A — backend-only change.
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION