Allow Bigquery Emulator settings to be set #1017

OTooleMichael · 2023-11-10T15:57:40Z · edited by MichelleArk

resolves #358
docs dbt-labs/docs.getdbt.com/#

This optionally allows api_endpoint to be set. This is the supported BigQuery way of overriding the HTTP endpoint, similar to Snowflake's endpoint override, and it is needed to connect to an emulator or proxy. Issue #358 references this.

Problem

Using a BigQuery emulator is useful in local development, but it cannot currently be configured via the existing config
Setting api_endpoint is also useful for security and proxying

Solution

This simply adds a key to the config and sets the connection option if set.
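A minimal sketch of what "adds a key to the config and sets the connection option if set" could look like. The BigQueryCredentialsSketch dataclass and the client_kwargs helper are hypothetical names, not the adapter's actual code; the only real API assumed is that google.cloud.bigquery.Client accepts a client_options argument (a dict or ClientOptions) whose api_endpoint overrides the default HTTP endpoint.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class BigQueryCredentialsSketch:
    """Hypothetical stand-in for the adapter's credentials object."""

    project: str
    api_endpoint: Optional[str] = None  # the new, optional profile key


def client_kwargs(creds: BigQueryCredentialsSketch) -> Dict[str, Any]:
    """Build keyword arguments for google.cloud.bigquery.Client.

    client_options is only added when api_endpoint is set, so profiles
    without the key keep the current behaviour (default Google endpoint).
    """
    kwargs: Dict[str, Any] = {"project": creds.project}
    if creds.api_endpoint:
        kwargs["client_options"] = {"api_endpoint": creds.api_endpoint}
    return kwargs
```

The result would be splatted into the client constructor alongside whatever credentials the profile already resolves, e.g. bigquery.Client(**client_kwargs(creds), credentials=...). Keeping the option conditional is what makes the change backwards compatible.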

Checklist

I have read the contributing guide and understand what's expected of me
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

cla-bot · 2023-11-10T15:57:45Z

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above. If you have questions about the CLA, or if you believe you've received this message in error, please reach out through a comment on this PR.

CLA has not been signed by users: @OTooleMichael

OTooleMichael · 2023-11-10T16:29:36Z

Does anything need to be done to retrigger now that the CLA is signed?

orlevii · 2023-11-14T17:03:57Z

I need the same functionality.
I started implementing the same feature, and I would consider adding support for AnonymousCredentials as well (it's the approach bigquery-emulator suggests).

See my draft:
https://github.com/dbt-labs/dbt-bigquery/pull/1027/files#diff-e4d9ba3b4b6c6c5709431db14344ec0e23226f700e9250f819247ffbb6b112acR360-R361

OTooleMichael · 2023-11-14T17:17:00Z

Hey @orlevii, I saw that.
AnonymousCredentials seem like a larger API change for dbt (although in the end I'd imagine both are pretty small).

Furthermore, the BigQuery emulator works happily with any credentials; at the end of the day it just ignores them. I think its demo code suggests AnonymousCredentials because, in a vacuum where one must pick a credential type, it makes the most sense (dbt, though, already has the other auth methods implemented).

Beyond that, the emulator you mentioned isn't the only one (and it isn't my only target), so if AnonymousCredentials support is needed it could be a follow-up PR.

Hopefully the team reviews the PR and I can add it or not according to their preferences, whichever will get it merged quickest. :)

CyberHippo · 2024-01-22T11:00:27Z

Hi, I would love to see this merged !

mesmacosta · 2024-02-14T16:42:16Z

Hi, I've got a really similar use case; looking forward to getting this merged!


jtcohen6 · 2024-03-22T18:30:09Z · edited

@OTooleMichael Thanks for the PR! @MichelleArk and I tried taking this for a spin alongside goccy/bigquery-emulator.

We found a few issues while using the two together:

1. Creating schemas: bigquery-emulator does not support creating schemas via StandardSQL (goccy/bigquery-emulator#167, "Failed to execute 'CreateSchema' statements"), only via the Python client method (create_dataset). (A few years ago we switched dbt-bigquery to using StandardSQL instead of the client method for schema creation: #183, "Try using SQL for create_schema".)
2. Uploading seeds: It looks like dbt tries to create a table with UNKNOWN type. I suspect one of the client methods required for BQ seed uploading doesn't work as expected. (Maybe it would work to provide data types for the seeds explicitly; we didn't get a chance to try this.)
3. Getting post-query metadata: After dbt builds a table or runs a select statement, it asks BigQuery for the number of rows produced via client.get_table. This doesn't seem to be supported by the emulator:

  File "/Users/michelleark/.asdf/installs/python/3.11.0/lib/python3.11/site-packages/dbt/adapters/base/impl.py", line 347, in execute
    return self.connections.execute(sql=sql, auto_begin=auto_begin, fetch=fetch, limit=limit)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/michelleark/src/dbt-bigquery/dbt/adapters/bigquery/connections.py", line 549, in execute
    query_table = client.get_table(query_job.destination)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/michelleark/.asdf/installs/python/3.11.0/lib/python3.11/site-packages/google/cloud/bigquery/client.py", line 1077, in get_table
    path = table_ref.path
           ^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'path'

I don't think (1) + (2) are hard blockers — we could manually create the schema/dataset, and we just avoided using seeds — but I do think (3) makes it impossible to use dbt with the emulator we tried.
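For (1), the manual workaround can be scripted with the Python client instead of StandardSQL. A minimal sketch, assuming client is a google.cloud.bigquery.Client already pointed at the emulator; ensure_dataset is a hypothetical helper, while Client.create_dataset(..., exists_ok=True) is the real client method mentioned above.

```python
from typing import Any


def ensure_dataset(client: Any, project: str, dataset: str) -> str:
    """Create project.dataset via the client API rather than CREATE SCHEMA.

    exists_ok=True makes the call idempotent, so it is safe to run before
    every dbt invocation against a fresh emulator.
    """
    dataset_id = f"{project}.{dataset}"
    client.create_dataset(dataset_id, exists_ok=True)
    return dataset_id
```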

Questions:

Are you using goccy/bigquery-emulator, or a different emulator? We haven't done extensive research, but that one seems to be the most feature rich, well maintained, and widely used.
Does it make sense to merge this PR if there aren't any known emulators that we can document supporting? My current inclination is no, though I'm open to hearing disagreement. If there are a handful of backwards-compatible changes we could make to the dbt-bigquery adapter that would fix the issues outlined above (1+2, maybe 3) to work around the limitations in the existing emulators, we could be open to that, but not if the change-set risks introducing regressions to the standard functionality with real Google BigQuery. If there is a set of changes to make within the emulator so that it more closely mirrors the real BQ APIs, that would feel even better.
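For (3), one backwards-compatible adapter change of the kind described above might be to skip the row-count lookup when the job has no destination table, instead of crashing. A minimal sketch, not the adapter's code: rows_in_destination is hypothetical, while client.get_table and Table.num_rows are real google-cloud-bigquery APIs. Whether silently skipping the lookup is acceptable against real BigQuery, where the traceback suggests destination is expected to be set, would still need checking.

```python
from typing import Any, Optional


def rows_in_destination(client: Any, query_job: Any) -> Optional[int]:
    """Return the row count of the query's destination table, or None.

    Some emulators leave query_job.destination as None, which makes
    client.get_table() raise AttributeError ('NoneType' object has no
    attribute 'path'). Returning None lets the caller report "unknown".
    """
    destination = getattr(query_job, "destination", None)
    if destination is None:
        # Skip the metadata lookup rather than failing the whole run.
        return None
    table = client.get_table(destination)
    return table.num_rows
```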

OTooleMichael · 2024-03-23T12:15:31Z

@jtcohen6 and @MichelleArk, thank you both sincerely for taking the time to review the PR, and I apologise for the lingering issue with one of the unit tests. Rest assured, I'll address it promptly.

In response to your queries:

Overall, I believe the PR aligns with our objectives and should be merged. It seems there might be some slight misinterpretation or oversight regarding its purpose. :) This PR essentially enables users to use an emulator; the specifics of the emulator's functionality aren't within dbt's purview. To draw an analogy: if BigQuery itself were to malfunction, for example by computing SUM() incorrectly, that would not be a dbt bug. As another example of something already possible today, a dbt user on Postgres can point the host at an in-memory PG emulator, regardless of how complete its functionality is (it is often limited). If an emulator fails to replicate BigQuery's behaviour, it's an issue for the emulator's developers to address.

Moreover, there are additional use cases to consider. My primary motivation was facilitating an in-house emulator and proxy setup. Both of these are made achievable with minimal effort through this PR. In my view, the internals of non-DBT server elements don't fall under the direct responsibility of the DBT team.

I've developed a parser that validates SQL after Jinja rendering, identifying broken references and syntax errors without direct access to BigQuery or any data movement. This significantly speeds up CI/CD, often removing the need for a live connection. At the moment, the Snowflake dbt connector's endpoint override is serving as a workaround: the CI profile uses the Snowflake dialect with its endpoint pointed at the emulator server, and the emulator then does the extra work of translating dbt's queries from the Snowflake dialect back to BigQuery before starting its actual validation.

In a previous project involving Snowflake, I employed similar techniques using in-house SQLFluff rules for security and design linting post-Jinja processing.

Additionally, various proxying use cases emerge, where a server intermediates requests to and from BigQuery, implementing checks for deprecation warnings, security, permissions, and monitoring. For instance:

Dynamically migrating column references in-flight based on business rules.
Enforcing security and permissions beyond BigQuery's capabilities.
Implementing in-flight query approvals or just-in-time decryption designs.
Enforcing query pattern checks for cost or security reasons.
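The last use case is the simplest to picture. Below is a deliberately naive sketch of a pre-forwarding check that such a proxy might run; the rules, the check_query name and the raw_pii dataset are all hypothetical, and a real implementation would parse the SQL rather than pattern-match it (these regexes also match inside string literals and comments).

```python
import re
from typing import Iterable, List

SELECT_STAR = re.compile(r"\bselect\s+\*", re.IGNORECASE)


def check_query(sql: str, blocked_datasets: Iterable[str] = ()) -> List[str]:
    """Return policy violations; an empty list means the proxy may forward."""
    violations: List[str] = []
    if SELECT_STAR.search(sql):
        violations.append("SELECT * is not allowed (cost)")
    for dataset in blocked_datasets:
        # Match the dataset name as a whole identifier followed by a dot,
        # e.g. `proj.raw_pii.users`, but not `proj.my_raw_pii.users`.
        if re.search(rf"\b{re.escape(dataset)}\.", sql, re.IGNORECASE):
            violations.append(f"reference to blocked dataset {dataset!r}")
    return violations
```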

In essence, there are numerous reasons to redirect queries to different endpoints, applicable across CI/CD, development, and production environments. Some are directly related to DBT, while others are broader system requirements. Simplifying wider CI setups by patching a single ENV variable (e.g., BQ_URL) for the entire system, including DBT, Airflow, etc., underscores the versatility and value of this PR.
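A sketch of the single-variable idea, assuming the BQ_URL name from the example above: any Python component in the stack (a script, an Airflow task) could resolve the endpoint the same way and pass the result as client_options to google.cloud.bigquery.Client. On the dbt side, a profile could read the same variable with dbt's env_var() Jinja function, assuming the endpoint key this PR adds. The helper name is hypothetical.

```python
import os
from typing import Any, Dict, Mapping, Optional


def client_options_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Return client_options for google.cloud.bigquery.Client from BQ_URL.

    An unset or blank BQ_URL yields {} so the client keeps using the
    default Google endpoint; env defaults to the process environment.
    """
    env = os.environ if env is None else env
    url = env.get("BQ_URL", "").strip()
    return {"api_endpoint": url} if url else {}
```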

I'm happy to hop on a call / go through more examples / code if needed

Allow Bigquery Emulator settings to be set

9622652

OTooleMichael requested a review from a team as a code owner November 10, 2023 15:57

OTooleMichael requested a review from McKnight-42 November 10, 2023 15:57

OTooleMichael mentioned this pull request Nov 10, 2023

[CT-1391] [Feature] Add support for BigQuery emulator #358 (open)

Switch Connection ordering on api_enpoint

c13d264

cla-bot added the cla:yes label Nov 11, 2023

Merge branch 'main' into patch-1

49b848e

colin-rogers-dbt added the ok to test label Dec 4, 2023

Merge branch 'main' into patch-1

85cbcb7

OTooleMichael and others added 10 commits February 27, 2024 17:11

Basic test to lock in that "api_endpoint" is a credentials field that should be passed on to BQ

b2aa0df

Example ApiEndpoint typo

03b6a6d

Merge branch 'main' into patch-1

03514b5

Merge branch 'main' into patch-1

4a853b3

Merge branch 'main' into patch-1

a374786

Merge branch 'main' into patch-1

326d6ae

Linting/Black bump

9478ae9

Correct call signature

a9ee098

Merge branch 'main' into patch-1

9cfb1a6

Merge branch 'main' into patch-1

ca5c885

OTooleMichael added 3 commits March 24, 2024 18:47

Break out client mock args for clarity

28009c9

Clear caching between tests

5d51658

Merge branch 'main' into patch-1

b5ec4a3

graciegoheen assigned MichelleArk and jtcohen6 Mar 26, 2024

OTooleMichael added 6 commits March 27, 2024 09:49

Merge branch 'main' into patch-1

7912f75

Merge branch 'main' into patch-1

c2b2b49

Merge branch 'main' into patch-1

08c033d

Merge branch 'main' into patch-1

bbc2826

Merge branch 'main' into patch-1

eea3241

Merge branch 'main' into patch-1

7673691
