Add BigQuery query client and dialect support #1839
shangyian merged 17 commits into DataJunction:main
Conversation
Implements a direct BigQuery integration following the same pattern as the existing Snowflake client. Adds `BigQueryClient` for table introspection via INFORMATION_SCHEMA, registers `bigquery` as a supported dialect with sqlglot transpilation, and exposes it as an optional install extra (`datajunction-server[bigquery]`). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
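The introspection path described above can be sketched against the google-cloud-bigquery API. Everything below (the optional-import guard, query text, and function name) is illustrative, not the PR's actual code; only the general pattern — a parameterized INFORMATION_SCHEMA.COLUMNS query behind an optional extra — comes from the description:

```python
# Sketch: optional import guard for the `bigquery` extra, plus a
# parameterized INFORMATION_SCHEMA.COLUMNS introspection query.
# Names and layout are illustrative, not the PR's code.
try:
    from google.cloud import bigquery
    BIGQUERY_AVAILABLE = True
except ImportError:  # pip install 'datajunction-server[bigquery]' not done
    bigquery = None
    BIGQUERY_AVAILABLE = False

COLUMNS_QUERY = """
SELECT column_name, data_type, ordinal_position
FROM `{project}.{dataset}`.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = @table_name
ORDER BY ordinal_position
"""

def get_columns_for_table(project, dataset, table):
    """Return (name, type) pairs for a table via INFORMATION_SCHEMA."""
    if not BIGQUERY_AVAILABLE:
        raise RuntimeError("google-cloud-bigquery is not installed")
    client = bigquery.Client(project=project)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("table_name", "STRING", table),
        ]
    )
    sql = COLUMNS_QUERY.format(project=project, dataset=dataset)
    rows = client.query(sql, job_config=job_config).result()
    return [(row.column_name, row.data_type) for row in rows]
```

Using a query parameter for the table name (rather than string formatting) avoids SQL injection through table identifiers supplied by callers.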
- Import QueryJobConfig and ScalarQueryParameter at module level so tests can patch them (accessing via bigquery=None failed) - Fix BIGNUMERIC/BIGDECIMAL to use DecimalType(38, 38) since DJ's DecimalType caps max_precision at 38 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
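The BIGNUMERIC fix above caps precision at DJ's limit of 38. A hypothetical type-mapping function shows the idea; the simple-type table and the string return values are stand-ins, since the real mapping produces DJ ColumnType objects:

```python
# Hypothetical sketch of BigQuery -> DJ type mapping. The real code
# returns ColumnType instances; strings stand in for them here.
MAX_PRECISION = 38  # DJ's DecimalType caps max_precision at 38

def map_bigquery_type(data_type: str) -> str:
    simple = {
        "STRING": "string",
        "INT64": "bigint",
        "FLOAT64": "double",
        "BOOL": "boolean",
        "TIMESTAMP": "timestamp",
        "DATE": "date",
    }
    if data_type in simple:
        return simple[data_type]
    if data_type in ("NUMERIC", "DECIMAL"):
        # BigQuery NUMERIC is fixed at precision 38, scale 9
        return "decimal(38, 9)"
    if data_type in ("BIGNUMERIC", "BIGDECIMAL"):
        # BIGNUMERIC is wider than 38 digits, but DJ caps precision at 38,
        # hence the DecimalType(38, 38) mentioned in the commit above
        return f"decimal({MAX_PRECISION}, {MAX_PRECISION})"
    return "string"  # fallback for unmapped types
```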
shangyian
left a comment
Hi @colinmf, thanks for your contribution!
A few thoughts here:
DJ's design intentionally separates semantic layer concerns (e.g. datajunction-server) from query execution (the query service, packaged as datajunction-query). I see that right now you've only implemented the schema introspection part of the BigQueryClient — that's consistent with how Snowflake is handled in query_clients/snowflake.py, so this looks good.
That said, if the intent is to eventually support query execution through this client too, that's where I'd push back. Query execution in the semantic layer creates scaling problems, since the two have very different resource profiles. If that comes up, the right home would be a BigQuery query service implementation in datajunction-query.
datajunction-server/datajunction_server/query_clients/bigquery.py
Mirrors SnowflakeClient's _get_database_from_engine approach: parses the GCP project from the engine URI netloc (bigquery://my-gcp-project) so different DJ catalogs can point to different GCP projects. Also adds BigQuery env config example to .env and updates tests. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* Add engine URI project resolution to BigQueryClient

  Mirrors SnowflakeClient's _get_database_from_engine approach: parses the GCP project from the engine URI netloc (bigquery://my-gcp-project) so different DJ catalogs can point to different GCP projects. Also adds BigQuery env config example to .env and updates tests.

* Address review comments on BigQueryClient

  - Add sqlglot to bigquery extra for dialect transpilation support
  - Add BIGQUERY_AVAILABLE import coverage tests (True/False paths)
  - Add BigQuery config documentation with examples to QueryClientConfig
  - Remove redundant 0-based index comment in get_columns_for_table
…_client improvements

- Keep comprehensive _get_project_from_engine (host, path, query param fallbacks)
- Use _get_client(project=...) from fork to pass resolved project to BigQuery client
- Merge test suites: retain all URI parsing tests + fork's _get_client injection test
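The host/path/query-param fallbacks described in this commit can be sketched with the standard library's URL parsing; the function name is illustrative, not the PR's code:

```python
# Sketch of resolving a GCP project from an engine URI, with the
# host -> path -> query-param fallbacks described above.
# The function name is illustrative, not the PR's code.
from urllib.parse import urlparse, parse_qs

def get_project_from_engine(uri: str):
    parsed = urlparse(uri)
    if parsed.netloc:                 # bigquery://my-gcp-project
        return parsed.netloc
    path = parsed.path.strip("/")
    if path:                          # bigquery:///my-gcp-project
        return path.split("/")[0]
    params = parse_qs(parsed.query)
    if "project" in params:           # bigquery://?project=my-gcp-project
        return params["project"][0]
    return None                       # nothing to resolve; caller decides
```

Each fallback is tried in order, so the netloc form used by the Snowflake-style engine URIs wins when present.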
Add BigQuery as a supported engine type in the query service, following the existing Snowflake pattern. Supports project config via extra_params, credentials via config or GOOGLE_APPLICATION_CREDENTIALS env var, and Application Default Credentials as fallback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
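The credentials precedence this commit describes — explicit config, then the env var, then Application Default Credentials — can be sketched as follows. The helper names are hypothetical, and a later commit in this PR simplifies the env-var step by letting bigquery.Client() handle ADC natively:

```python
# Illustrative credentials precedence for building the BigQuery client.
# Function names are hypothetical, not the query service's code.
import os

def resolve_credentials_path(explicit):
    """Explicit config wins, then the env var; None means fall back to
    Application Default Credentials."""
    return explicit or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")

def make_bigquery_client(project, credentials_path=None):
    from google.cloud import bigquery            # requires the bigquery extra
    from google.oauth2 import service_account

    path = resolve_credentials_path(credentials_path)
    if path:
        creds = service_account.Credentials.from_service_account_file(path)
        return bigquery.Client(project=project, credentials=creds)
    return bigquery.Client(project=project)      # Application Default Credentials
```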
Thanks for the review! Agreed, and thanks for the guidance. For query execution, we've added BigQuery support in datajunction-query.
Cover credentials path, location, env var fallback, error handling, multi-row and empty results in datajunction-query. Add client project override, location, factory with all options, unsupported type, engine URI project override, and credentials precedence tests in datajunction-server. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wrap BigQuery rows in iter() to match Stream (Iterator) type. Apply ruff format to test file. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
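The typing mismatch this commit fixes can be illustrated generically: a plain iterable of rows (such as a list) is not itself an Iterator, but iter() adapts any iterable to one, satisfying an Iterator-typed Stream. The row values below are stand-ins:

```python
# Why wrapping in iter() matters: an iterable is not necessarily an
# Iterator, but iter() adapts any iterable to one.
from collections.abc import Iterator

rows = [("a", 1), ("b", 2)]      # stand-in for BigQuery result rows
assert not isinstance(rows, Iterator)

stream = iter(rows)              # now satisfies an Iterator-typed Stream
assert isinstance(stream, Iterator)
assert next(stream) == ("a", 1)
```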
Mock QueryJobConfig and ScalarQueryParameter which are None in CI (google-cloud-bigquery not installed), matching the pattern used by other get_columns_for_table tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cover empty path segment fallthrough (131->134) and query params without project key (136->145) to reach 100% branch coverage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DJ generates SQL with catalog-prefixed table names (e.g. my_catalog.dataset.table) but BigQuery interprets three-part names as project.dataset.table. Since the BQ client already has the project configured, strip the catalog prefix so BigQuery receives dataset.table references. Also remove the GOOGLE_APPLICATION_CREDENTIALS env var fallback from credentials_path — let bigquery.Client() handle ADC natively. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
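Stripping the catalog prefix so a three-part name becomes dataset.table can be sketched with a word-boundary regex; the function and approach here are illustrative, not the PR's code:

```python
# Sketch of stripping DJ's catalog prefix from generated SQL so BigQuery
# sees dataset.table instead of reading the catalog as a GCP project.
# The regex approach is illustrative, not the PR's code.
import re

def strip_catalog_prefix(sql: str, catalog: str) -> str:
    # my_catalog.dataset.table -> dataset.table; the \b word boundary
    # avoids touching identifiers that merely end with the catalog name
    return re.sub(rf"\b{re.escape(catalog)}\.", "", sql)
```

A real implementation would likely rewrite the parsed AST (e.g. via sqlglot) rather than the SQL text, to avoid false matches inside string literals.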
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@shangyian Thanks for approving! Let me know how the merge/release works and whether I need to do anything on my end.
Summary
Adds full BigQuery support to DataJunction — schema introspection in datajunction-server and query execution in datajunction-query.

datajunction-server: BigQuery schema introspection & dialect
- BigQueryClient — direct query client implementing BaseQueryServiceClient (same pattern as SnowflakeClient)
- Column introspection via INFORMATION_SCHEMA.COLUMNS with parameterized queries, mapped to DJ ColumnType
- BIGQUERY dialect — added to the Dialect enum, registered with SQLGlotTranspilationPlugin
- _create_configured_query_client() supports the bigquery type
- Optional install extra: pip install 'datajunction-server[bigquery]'

datajunction-query: BigQuery query execution
- BIGQUERY engine type — added to the EngineType enum in djqs/config.py
- run_bigquery_query() — executes queries via google.cloud.bigquery.Client, following the run_snowflake_query pattern
- Credentials: credentials_path from engine extra_params, GOOGLE_APPLICATION_CREDENTIALS env var fallback, or Application Default Credentials
- run_query() routes EngineType.BIGQUERY to the new function

Files changed
datajunction-server:
- query_clients/bigquery.py — BigQueryClient with column introspection and type mapping
- models/dialect.py — BIGQUERY = "bigquery" enum value
- transpilation.py — BIGQUERY dialect registration
- query_clients/__init__.py — exports BigQueryClient
- utils.py — bigquery case in _create_configured_query_client()
- pyproject.toml — bigquery = ["google-cloud-bigquery>=3.0.0"] optional extra
- tests/.../bigquery_query_client_test.py — tests for bigquery.py
- tests/utils_test.py

datajunction-query:
- djqs/config.py — BIGQUERY = "bigquery" engine type
- djqs/engine.py — run_bigquery_query() + routing in run_query()
- pyproject.toml — google-cloud-bigquery>=3.11.0 dependency
- tests/api/queries_test.py
- tests/config.djqs.yml

DJ terminology mapping
Configuration
Test plan
- Tests for the query client (bigquery.py)
- Tests for query execution (engine.py)

🤖 Generated with Claude Code