Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-1391] [Feature] Add support for BigQuery emulator #358

Open
3 tasks done
bendiktv2 opened this issue Oct 25, 2022 · 11 comments · May be fixed by #1017
Open
3 tasks done

[CT-1391] [Feature] Add support for BigQuery emulator #358

bendiktv2 opened this issue Oct 25, 2022 · 11 comments · May be fixed by #1017
Labels
enhancement New feature or request help_wanted Extra attention is needed

Comments

@bendiktv2
Copy link

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Add support for profile-configuration and connection to a custom BigQuery endpoint, to allow for emulating BigQuery.

https://docs.getdbt.com/reference/warehouse-setups/bigquery-setup

In addition to "project", "dataset" etc in the profiles.yml, one could specify a "endpoint"-config.

One could then run a BigQuery-emulator (such as https://github.com/goccy/bigquery-emulator), instead of having to run against the real BigQuery.

Note that this is already supported by the bq CLI tool.

Describe alternatives you've considered

No response

Who will this benefit?

This will benefit teams that want to run DBT unit-tests using https://github.com/mjirv/dbt-datamocktool in CI, to allow multiple branches to run at the same time, without conflicting.

Are you interested in contributing this feature?

No response

Anything else?

No response

@bendiktv2 bendiktv2 added enhancement New feature or request triage labels Oct 25, 2022
@github-actions github-actions bot changed the title [Feature] Add support for BigQuery emulator [CT-1391] [Feature] Add support for BigQuery emulator Oct 25, 2022
@jtcohen6 jtcohen6 removed the triage label Oct 30, 2022
@jtcohen6
Copy link
Contributor

@bendiktv2 Thanks for opening!

https://github.com/goccy/bigquery-emulator

This is cool!! I didn't know about GCP emulators at all. Have you been using this for some time? Any gotchas worth calling out?

Note that this is already supported by the bq CLI tool.

I tried looking quickly for docs on this; is it https://cloud.google.com/sdk/gcloud/reference/alpha/emulators?

I have no strong opposition to supporting this — it makes a ton of sense for unit testing, and it's possible because ZetaSQL being published and open source — I'd just want to see us doing some light diligence and internal education before documenting this as an officially supported plugin feature. I expect we'll be getting questions on it from lots of interested folks!

@jtcohen6 jtcohen6 added help_wanted Extra attention is needed awaiting_response labels Oct 30, 2022
@bendiktv2
Copy link
Author

We have not been using it yet, but have been considering it for other purposes (like integration-testing a program that loads data into bigquery), and wanted to check out the feasibility of using it for DBT to reduce the time and complexity in our build-pipeline.

As for the bq-tool, you can see it in use in the goccy/bigquery-emulator Readme: https://github.com/goccy/bigquery-emulator#2-call-endpoint-from-bq-client:

bq --api http://0.0.0.0:9050 query --project_id=test "SELECT * FROM dataset1.table_a WHERE id = 1"

@ustato
Copy link

ustato commented Nov 9, 2022

i’m working on it!
https://github.com/ustato/dbt-bigquery

You can confirm that the command dbt run works well:
https://github.com/ustato/dbt-bigquery-template

Don’t forget to run this command beforehand:

cd dbt-bigquery-template
poetry install

docker run --rm -p 9050:9050 -v "$(pwd)/emulator.yml:/data/emulator.yml" -it ghcr.io/goccy/bigquery-emulator:latest --project=local --data-from-yaml=/data/emulator.yml

You can this:

cd dbt
poetry run dbt --debug run --project-dir . --profiles-dir .

However, I am now wondering how to address this error...
I think I can reproduce it by deleting all the parts about anonymous in emulator.yml .

404 GET http://0.0.0.0:9050/bigquery/v2/projects/local/datasets/anonymous/tables/anonymous?prettyPrint=false: dataset anonymous is not found

@andrewegel
Copy link

Hi there - We too are looking at leveraging dbt-bigquery with the bigquery-emulator - Merely applying the patch ustato@e9a65da on top of 1.4.1 and updating our dbt profiles made this work essentially.

The only issue we ran into was missing projects, datasets, and not having sufficient seed data in the emulator. @ustato My guess is you haven't told the bq-emulatror about the anonymous dataset in emulator.yml that your dbt model is referencing - If you share your dbt model code & emulator.yml I may be able to tell.

@Dileep17
Copy link

Hi, Could you please let me know if big-query emulator is supported already?
As @jtcohen6 mentioned this would enable unit testing without connecting to big-query.

@Dileep17
Copy link

@andrewegel Glad to hear you got dbt-bigquery working with bigquery-emulator. Im trying to get the integration working. I applied patch ustato@e9a65da on dbt-bigquery 1.5.0b3. I'm facing below issues.

  1. When materialization is view, create or replace view .... query is sent from dbt-bigquery to bigquery-emulator and process hangs. bigquery-emulator becomes unresponsive!

  2. When materialization is table, dbt run fails with 'NoneType' object has no attribute 'path' error. Though the expected table is created with correct data, dbt run is failing !

could you please let me know if you faced above and how they are resolved?
If you have public implementation on this integration, please share the link. Thank you in advance.

@mgguo
Copy link

mgguo commented May 23, 2023

@Dileep17 I ran into the same error as you mentioned - 'NoneType' object has no attribute 'path', would you mind sharing how to solve it if you've figured out how to solve it. Thanks in advance!

@Dileep17
Copy link

@mgguo couldn't get that working!

@mgguo
Copy link

mgguo commented May 24, 2023

@Dileep17 Yeah, I think it's because the emulator API didn't return the expected response as the regular gcp bigquery API. i.e. the destination property here is not set => https://github.com/ustato/dbt-bigquery/blob/9dca9ebcd99f199b685bb35bac7eb32ff8e6624b/dbt/adapters/bigquery/connections.py#L497 which seems having caused the NoneType error.

@ustato have you got any chance to look back at this patch? were you able to get it working? thanks!

@andrewegel
Copy link

andrewegel commented May 24, 2023

I did a little sleuthing through code, and from what I can see is from here:

https://github.com/googleapis/python-bigquery/blob/main/google/cloud/bigquery/job/base.py#L693-L702

The (part of) response back from BQ-the-real-service shows:

{
        'configuration': {
                'jobType': 'QUERY',
                'query': {
                        'query': 'create or replace table local.foo.bar AS (...)'
                        'destinationTable': {
                                'projectId': 'local',
                                'datasetId': 'foo',
                                'tableId': 'bar'
                        },
                        'priority': 'INTERACTIVE',
                        'useLegacySql': False
                }
        },

Where-as with the emulator the entire configuration.query.destinationTable block is absent:

{
        'configuration': {
                'jobType': 'QUERY',
                'query': {
                        'query': 'create or replace table local.foo.bar AS (...)',
                        'priority': 'INTERACTIVE',
                        'useLegacySql': False
                },
                'labels': {
                        'dbt_invocation_id': 'aa33cc6a-b2ee-4f07-b0b5-0851a473e30f'
                },
        },

This is likely because BQ-the-real-service supports the create or replace table local.foo.bar AS (...) syntax and populates the destinationTable fields in it's response to the clients - DBT-bigquery uses the create or replace table local.foo.bar AS (...) syntax to create new models as opposed to sending the destinationTable (Eg the bq command: bq query --destination_table local.foo.bar 'select * from baz.biz.buzz') with a normal select.

This could either be a feature request in dbt-bigquery to use this pattern (Specifying destinationTable in the API call) over create or replace table local.foo.bar AS (...) syntax, or this could be a bigquery-emulator feature request to have the emulator populate destinationTable in the response when it detects a create or replace table local.foo.bar AS (...) syntax.

Short term work around could be to patch https://github.com/ustato/dbt-bigquery/blob/9dca9ebcd99f199b685bb35bac7eb32ff8e6624b/dbt/adapters/bigquery/connections.py#L497

            if query_job.destination:
                num_rows = client.get_table(query_job.destination).num_rows
            else: 
                num_rows = 0

As this purely seems to be just for reporting rows on the console, as the sample DBT project works with this "patch":

17:34:26.439698 [info ] [Thread-1  ]: 1 of 1 OK created sql table model test_interface.sample ........................ [SELECT (0.0 rows, None processed) in 0.17s]

I opened an issue in that project goccy/bigquery-emulator#197 on the behavior difference.

@OTooleMichael
Copy link

I made a PR for this which is working fine, and is less of a change that PR above.
#1017

Allowing the setting of api_endpoint is also useful for more than just the emulator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help_wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants