Add BigQuery query client and dialect support#1839

Merged
shangyian merged 17 commits into DataJunction:main from colinmf:colinmf/bigquery-integration
Mar 8, 2026
Conversation

@colinmf
Contributor

@colinmf colinmf commented Mar 7, 2026

Summary

Adds full BigQuery support to DataJunction — schema introspection in datajunction-server and query execution in datajunction-query.

datajunction-server: BigQuery schema introspection & dialect

  • BigQueryClient — direct query client implementing BaseQueryServiceClient (same pattern as SnowflakeClient)
    • Introspects columns via INFORMATION_SCHEMA.COLUMNS with parameterized queries
    • Maps all BigQuery types (INT64, FLOAT64, NUMERIC, BIGNUMERIC, STRING, TIMESTAMP, etc.) to DJ ColumnType
    • Project resolution order: engine URI → client config → catalog name fallback
    • Supports service account credentials (JSON file or dict) and Application Default Credentials
  • BIGQUERY dialect — added to Dialect enum, registered with SQLGlotTranspilationPlugin
  • Config factory_create_configured_query_client() supports bigquery type
  • Optional install: pip install 'datajunction-server[bigquery]'
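The type mapping described above can be sketched as a lookup table. This is an illustrative stand-in, not the actual server code: the real mapping targets DJ's ColumnType classes in datajunction-server, while plain type strings are used here to keep the example self-contained.

```python
# Illustrative BigQuery -> DJ type mapping (names approximate the PR's
# description; the real mapping lives in query_clients/bigquery.py).
BIGQUERY_TO_DJ = {
    "INT64": "bigint",
    "FLOAT64": "double",
    "NUMERIC": "decimal(38, 9)",
    # BIGNUMERIC is capped at precision 38 because DJ's DecimalType
    # limits max_precision to 38 (see the commit notes below).
    "BIGNUMERIC": "decimal(38, 38)",
    "STRING": "string",
    "BOOL": "boolean",
    "TIMESTAMP": "timestamp",
    "DATE": "date",
}

def map_bigquery_type(bq_type: str) -> str:
    """Map a BigQuery column type to a DJ type string, defaulting to string."""
    return BIGQUERY_TO_DJ.get(bq_type.upper(), "string")
```

Unknown types fall back to a string type, which is a common safe default for introspection clients.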

datajunction-query: BigQuery query execution

  • BIGQUERY engine type — added to EngineType enum in djqs/config.py
  • run_bigquery_query() — executes queries via google.cloud.bigquery.Client, following the run_snowflake_query pattern
  • Credential handling: credentials_path from engine extra_params, GOOGLE_APPLICATION_CREDENTIALS env var fallback, or Application Default Credentials
  • Routing: run_query() routes EngineType.BIGQUERY to the new function
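The execution flow above can be sketched as follows. This is a minimal sketch, not the actual djqs code: the real run_bigquery_query() lives in djqs/engine.py, and the client_factory parameter is a testing hook added here so the example runs without GCP credentials.

```python
def run_bigquery_query(sql, extra_params, client_factory=None):
    """Sketch of the run_bigquery_query flow: resolve project/location/
    credentials from extra_params, build a client, run the query."""
    project = extra_params.get("project")
    location = extra_params.get("location")
    credentials_path = extra_params.get("credentials_path")

    if client_factory is not None:
        # Injection point for tests (hypothetical; not in the real code).
        client = client_factory(project=project, location=location)
    else:
        from google.cloud import bigquery
        from google.oauth2 import service_account

        if credentials_path:
            creds = service_account.Credentials.from_service_account_file(
                credentials_path,
            )
            client = bigquery.Client(
                project=project, credentials=creds, location=location,
            )
        else:
            # Application Default Credentials are picked up automatically.
            client = bigquery.Client(project=project, location=location)

    rows = client.query(sql).result()
    return [dict(row) for row in rows]
```

The real implementation streams rows rather than materializing a list; the list comprehension here just keeps the sketch short.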

Files changed

Component File Change
server query_clients/bigquery.py New BigQueryClient with column introspection and type mapping
server models/dialect.py BIGQUERY = "bigquery" enum value
server transpilation.py Register BigQuery with SQLGlot plugin
server query_clients/__init__.py Lazy import for BigQueryClient
server utils.py bigquery case in _create_configured_query_client()
server pyproject.toml bigquery = ["google-cloud-bigquery>=3.0.0"] optional extra
server tests/.../bigquery_query_client_test.py 38 tests — 100% branch coverage on bigquery.py
server tests/utils_test.py Factory tests for BigQuery client creation
djqs djqs/config.py BIGQUERY = "bigquery" engine type
djqs djqs/engine.py run_bigquery_query() + routing in run_query()
djqs pyproject.toml google-cloud-bigquery>=3.11.0 dependency
djqs tests/api/queries_test.py 7 tests — credentials, location, env var, errors, empty results
djqs tests/config.djqs.yml BigQuery test engine and catalog

DJ terminology mapping

DJ concept BigQuery equivalent
catalog GCP project
schema BigQuery dataset
table table name

Configuration

# Server — schema introspection
QUERY_CLIENT__TYPE=bigquery
QUERY_CLIENT__CONNECTION__PROJECT=my-gcp-project
QUERY_CLIENT__CONNECTION__CREDENTIALS_PATH=/path/to/sa.json  # optional

# Query service — engine extra_params
# project, credentials_path, location
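For the query service side, a hypothetical engine entry might look like the fragment below. The field names follow the extra_params keys listed above, but the exact shape should be checked against tests/config.djqs.yml in this PR.

```yaml
# Hypothetical djqs engine config — illustrative only.
engines:
  - name: bigquery-prod
    type: bigquery
    extra_params:
      project: my-gcp-project
      credentials_path: /path/to/sa.json   # optional; ADC is used if omitted
      location: US
```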

Test plan

  • 38 server BigQuery tests pass (100% branch coverage on bigquery.py)
  • 7 djqs BigQuery tests pass (100% coverage on engine.py)
  • All pre-commit hooks pass (ruff, mypy, format)
  • Live-tested SQL generation with deployed BigQuery nodes — single metrics, multi-metric, dimensions, filters, cubes all generate valid BigQuery dialect SQL
  • End-to-end query execution against a real BigQuery instance (requires GCP credentials)

🤖 Generated with Claude Code

Implements a direct BigQuery integration following the same pattern as
the existing Snowflake client. Adds `BigQueryClient` for table
introspection via INFORMATION_SCHEMA, registers `bigquery` as a
supported dialect with sqlglot transpilation, and exposes it as an
optional install extra (`datajunction-server[bigquery]`).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
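The INFORMATION_SCHEMA introspection mentioned above can be sketched roughly as below. The SQL shape and helper name are illustrative, not the actual client code; in the real client the table name is bound with google-cloud-bigquery's QueryJobConfig and ScalarQueryParameter rather than string formatting.

```python
# Illustrative INFORMATION_SCHEMA query builder (hypothetical helper).
# Only the project/dataset are interpolated; the table name stays a
# bound parameter (@table_name) to avoid injection.
INTROSPECTION_SQL = """
SELECT column_name, data_type, ordinal_position
FROM `{project}.{dataset}`.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = @table_name
ORDER BY ordinal_position
"""

def build_introspection_query(project: str, dataset: str) -> str:
    """Return the column-introspection SQL for one dataset."""
    return INTROSPECTION_SQL.format(project=project, dataset=dataset)
```

In BigQuery, INFORMATION_SCHEMA.COLUMNS is scoped per dataset, which is why the dataset appears in the FROM clause rather than a WHERE filter.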
@netlify

netlify bot commented Mar 7, 2026

Deploy Preview for thriving-cassata-78ae72 canceled.

Name Link
🔨 Latest commit 432927c
🔍 Latest deploy log https://app.netlify.com/projects/thriving-cassata-78ae72/deploys/69acb04c9799940008e76172

colinmf and others added 4 commits March 7, 2026 15:44
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Import QueryJobConfig and ScalarQueryParameter at module level so
  tests can patch them (accessing via bigquery=None failed)
- Fix BIGNUMERIC/BIGDECIMAL to use DecimalType(38, 38) since DJ's
  DecimalType caps max_precision at 38

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Collaborator

@shangyian shangyian left a comment

Hi @colinmf, thanks for your contribution!

A few thoughts here:

DJ's design intentionally separates semantic layer concerns (e.g. datajunction-server) from query execution (the query service, which is packaged up in datajunction-query). I see that right now you've only implemented the schema introspection part of the BigQueryClient - that's consistent with how Snowflake is handled in query_clients/snowflake.py, so this looks good.

That said, if the intent is to eventually support query execution through this client too, that's where I'd push back. Query execution in the semantic layer creates scaling problems since they have very different resource profiles. If that comes up, the right home would be a BigQuery query service implementation in datajunction-query.

Mirrors SnowflakeClient's _get_database_from_engine approach: parses
the GCP project from the engine URI netloc (bigquery://my-gcp-project)
so different DJ catalogs can point to different GCP projects.

Also adds BigQuery env config example to .env and updates tests.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@colinmf colinmf marked this pull request as draft March 7, 2026 16:53
colinmf and others added 5 commits March 7, 2026 18:02
* Add engine URI project resolution to BigQueryClient

Mirrors SnowflakeClient's _get_database_from_engine approach: parses
the GCP project from the engine URI netloc (bigquery://my-gcp-project)
so different DJ catalogs can point to different GCP projects.

Also adds BigQuery env config example to .env and updates tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Address review comments on BigQueryClient

- Add sqlglot to bigquery extra for dialect transpilation support
- Add BIGQUERY_AVAILABLE import coverage tests (True/False paths)
- Add BigQuery config documentation with examples to QueryClientConfig
- Remove redundant 0-based index comment in get_columns_for_table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…_client improvements

- Keep comprehensive _get_project_from_engine (host, path, query param fallbacks)
- Use _get_client(project=...) from fork to pass resolved project to BigQuery client
- Merge test suites: retain all URI parsing tests + fork's _get_client injection test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
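The comprehensive URI resolution described in the commit above (host, then path, then query-param fallbacks) can be sketched with the standard library. The helper name mirrors the one mentioned in the commit but the body is illustrative, not the actual implementation.

```python
from urllib.parse import parse_qs, urlparse

def get_project_from_engine_uri(uri: str):
    """Resolve the GCP project from an engine URI such as
    bigquery://my-gcp-project, trying the host, then the first
    path segment, then a ?project= query parameter."""
    parsed = urlparse(uri)
    if parsed.netloc:
        return parsed.netloc
    segments = [s for s in parsed.path.split("/") if s]
    if segments:
        return segments[0]
    params = parse_qs(parsed.query)
    if "project" in params:
        return params["project"][0]
    return None
```

This ordering lets different DJ catalogs point to different GCP projects simply by varying the engine URI.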
Add BigQuery as a supported engine type in the query service, following
the existing Snowflake pattern. Supports project config via extra_params,
credentials via config or GOOGLE_APPLICATION_CREDENTIALS env var, and
Application Default Credentials as fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@colinmf
Contributor Author

colinmf commented Mar 7, 2026

Hi @colinmf, thanks for your contribution!

A few thoughts here:

DJ's design intentionally separates semantic layer concerns (eg datajunction-server) from query execution (the query service, which is packaged up in datajunction-query). I see that right now you've only implemented the schema introspection part of the BigQueryClient - that's consistent with how Snowflake is handled in query_clients/snowflake.py, so this looks good.

That said, if the intent is to eventually support query execution through this client too, that's where I'd push back. Query execution in the semantic layer creates scaling problems since they have very different resource profiles. If that comes up, the right home would be a BigQuery query service implementation in datajunction-query.

@shangyian

Thanks for the review

Agreed, and thanks for the guidance. The BigQueryClient in datajunction-server remains schema-introspection only (consistent with the Snowflake pattern in query_clients/snowflake.py).

For query execution, we've added BigQuery support in datajunction-query following the existing Snowflake pattern: EngineType.BIGQUERY in config, run_bigquery_query() in engine.py, with credentials via extra_params or GOOGLE_APPLICATION_CREDENTIALS.

colinmf and others added 4 commits March 7, 2026 22:02
Cover credentials path, location, env var fallback, error handling,
multi-row and empty results in datajunction-query. Add client project
override, location, factory with all options, unsupported type, engine
URI project override, and credentials precedence tests in
datajunction-server.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wrap BigQuery rows in iter() to match Stream (Iterator) type.
Apply ruff format to test file.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Mock QueryJobConfig and ScalarQueryParameter which are None in CI
(google-cloud-bigquery not installed), matching the pattern used by
other get_columns_for_table tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cover empty path segment fallthrough (131->134) and query params
without project key (136->145) to reach 100% branch coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@colinmf colinmf marked this pull request as ready for review March 7, 2026 22:18
@colinmf colinmf requested a review from shangyian March 7, 2026 22:19
colinmf and others added 2 commits March 8, 2026 00:03
DJ generates SQL with catalog-prefixed table names (e.g. my_catalog.dataset.table)
but BigQuery interprets three-part names as project.dataset.table. Since the BQ
client already has the project configured, strip the catalog prefix so BigQuery
receives dataset.table references. Also remove the GOOGLE_APPLICATION_CREDENTIALS
env var fallback from credentials_path — let bigquery.Client() handle ADC natively.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
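The catalog-prefix rewrite described in the commit above can be sketched for a single table reference. This is an assumption-laden simplification (the real rewrite operates on generated SQL, not bare identifiers, and the helper name is hypothetical):

```python
def strip_catalog_prefix(table_ref: str) -> str:
    """Drop the leading DJ catalog from a three-part name so BigQuery
    sees dataset.table; the client's configured project supplies the
    first part of the fully qualified name."""
    parts = table_ref.split(".")
    if len(parts) == 3:
        return ".".join(parts[1:])
    return table_ref
```

Without this, BigQuery would read my_catalog.dataset.table as project.dataset.table and fail to find the project named my_catalog.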
Collaborator

@shangyian shangyian left a comment

This looks good, thanks for addressing the comments @colinmf!

@colinmf
Contributor Author

colinmf commented Mar 8, 2026

This looks good, thanks for addressing the comments @colinmf!

@shangyian Thanks for approving! Let me know how the merge/release process works and whether I need to do anything on my end.

@shangyian shangyian merged commit 3d522dc into DataJunction:main Mar 8, 2026
17 checks passed
@shangyian shangyian mentioned this pull request Mar 13, 2026
