
Allow BigQuery to default on project name #2908

Merged
merged 12 commits into from Dec 3, 2020

Conversation

@max-sixty (Contributor) commented Nov 24, 2020

resolves #2828

Description

As discussed in the issue.

I couldn't quite get my head around the tests; seems like there's lots of nesting there — what's the easiest way to test this?

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Nov 24, 2020
@jtcohen6 (Contributor) left a comment:

Nice start @max-sixty!

I ran into an issue testing this out locally.

what's the easiest way to test this?

Fair question. We don't use oauth for our integration tests, since both CircleCI and Azure DevOps connect via service account. I'll think about whether we have any way of mocking this.

def get_bigquery_defaults(cls):
    """Returns (credentials, project_id)."""
    # Cached, because the underlying implementation shells out.
    return google.auth.default(scopes=cls.SCOPE)
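The caching mentioned in the comment could look something like this: a minimal sketch (not dbt's actual implementation) that memoizes a module-level lookup with functools.lru_cache, since google.auth.default() may shell out to gcloud:

```python
# Sketch only: memoize the ADC lookup so repeated calls don't re-shell out.
# `scopes` should be hashable (e.g. a tuple or None) to serve as a cache key.
from functools import lru_cache


@lru_cache(maxsize=None)
def get_bigquery_defaults(scopes=None):
    """Returns (credentials, project_id) from application-default credentials."""
    import google.auth  # lazy import; requires the google-auth package
    return google.auth.default(scopes=scopes)
```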
Contributor:

When I test this locally, cls.get_bigquery_defaults() seems to return

(<google.oauth2.credentials.Credentials object at 0x112630ca0>, None)

resulting in the database remaining None below, and returning the following error from the google cloud client:

OSError: Project was not passed and could not be determined from the environment.

I don't think this is an issue on my end, but curious to hear what might be different, if you've managed to get this working locally.

Contributor (author):

What auth are you using? I'm using ADC — i.e. running gcloud auth login --update-adc.

What do you get from:

In [1]: from google.auth import default

In [2]: default()
Out[2]: (<google.oauth2.credentials.Credentials at 0x10a19e850>, '{foo_project}')

Contributor:

Ah ok, this was definitely on me:

In [1]: from google.auth import default

In [2]: default()
Out[2]: (<google.oauth2.credentials.Credentials at 0x10a19e850>, None)

I originally set up local gcloud oauth a while ago, so I updated the auth as you recommended, and the second arg now returns my default project.

Even so, I'm still running into this issue when dbt tries to get information about the None database by running list_None:

google.api_core.exceptions.BadRequest: 400 GET https://bigquery.googleapis.com/bigquery/v2/projects/None/datasets?maxResults=10000: Invalid project ID 'None'. Project IDs must contain 6-63 lowercase letters, digits, or dashes. Some project IDs also include domain name separated by a colon. IDs must start with a letter and may not end with a dash.

I think we may need to fill in the missing database/project earlier on, ideally as a __post_init__ on BigQueryCredentials, because dbt needs the database value in more places than just its connection client. This would also matter if users want to use the Jinja context variables target.database/target.project.
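A rough sketch of that idea; the field names are simplified and the injectable `defaults_fn` callable is only there to make the fallback testable, so treat this as an illustration rather than dbt's actual BigQueryCredentials:

```python
# Illustrative only: fill a missing database/project from ADC at construction
# time, so every later consumer of credentials.database sees a real value.
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


def _adc_defaults() -> Tuple[object, Optional[str]]:
    # Stand-in for google.auth.default(); returns (credentials, project_id).
    import google.auth
    return google.auth.default()


@dataclass
class BigQueryCredentialsSketch:
    database: Optional[str] = None
    # Injectable for tests; defaults to the real ADC lookup.
    defaults_fn: Callable[[], Tuple[object, Optional[str]]] = _adc_defaults

    def __post_init__(self) -> None:
        if self.database is None:
            _credentials, project = self.defaults_fn()
            if project is None:
                raise RuntimeError(
                    "No project given and none found in ADC; try "
                    "`gcloud auth login --update-adc`."
                )
            self.database = project
```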

Contributor (author):

Great, good point. I pushed a change. Generally I would put that sort of function at the module level, but I made these methods since the code seems fairly class-focused at the moment. Let me know which you prefer.

Contributor (author):

What's list_None? What's the test you're running locally to test this?

Contributor:

One of the first things dbt does at the start of a compile/run is grab metadata to populate its relational cache. It will log something to the effect of:

Acquiring new bigquery connection "list_[project_name]".

This was showing up for me as:

Acquiring new bigquery connection "list_None".

But now it looks good! target.database/target.project are also working as expected.

Contributor (author):

Great — thanks for checking @jtcohen6 !

@jtcohen6 (Contributor) left a comment:

This is looking great! Could you add a changelog entry, and add yourself to the list of contributors?

Looking at #2805, we added some unit tests to mock the BQ connection and ensure that the types and required fields all lined up. We might do well to add the same here, to ensure that dbt is happy with a profile that's missing a database/project.

I don't think we have a way to integration-test this, given our current CI setup. Although it's hardly a substitute for automated testing, I'll start using this default-project profile setup for my local connections to BigQuery.
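One way such a unit test could look; the helper name and test shape are assumptions rather than dbt's actual test suite, and the google.auth module is stubbed so nothing shells out or needs real credentials:

```python
# Sketch: stub google.auth so google.auth.default() is deterministic, then
# check that an explicit project wins and the ADC project is the fallback.
import sys
import types

_fake_auth = types.ModuleType("google.auth")
_fake_auth.default = lambda scopes=None: (object(), "adc-project")
_fake_google = types.ModuleType("google")
_fake_google.auth = _fake_auth
sys.modules["google"] = _fake_google
sys.modules["google.auth"] = _fake_auth


def resolve_project(profile_project=None):
    # Hypothetical stand-in for the PR's fallback logic.
    if profile_project is not None:
        return profile_project
    import google.auth
    _credentials, project = google.auth.default()
    return project
```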

Also, though I appreciate that you separated out the one-line fix in #2907, I'd be fine with your pulling it in here, since it touches more or less the same code.


- database: str
+ # Most DBs have this as required, but BigQuery is Optional, and mypy
+ # doesn't seem to allow overriding the type in `BigQueryCredentials`
+ database: Optional[str]
Contributor:

We also set database to Optional[str] in dbt-spark, which likewise inherits/replaces the base Credentials dataclass. I wonder why mypy doesn't mind it there.
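For illustration (class names simplified, not dbt's real hierarchy): once the base field is widened to Optional[str], the subclass needs no override at all, which sidesteps any mypy complaint about redeclaring a field's type:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Credentials:
    # Widened here so adapters like BigQuery can leave it unset.
    database: Optional[str] = None


@dataclass
class BigQueryCredentialsExample(Credentials):
    pass  # inherits Optional[str]; no type override for mypy to reject
```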

Contributor (author):

Great point, thanks @jtcohen6. I think I was confusing errors: the error it actually raised was "Attributes without a default cannot follow attributes with one". Let's try the latest push.
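The error being referred to is the standard dataclass field-ordering rule. A minimal reproduction (field names illustrative; CPython's wording of the TypeError differs slightly from the message quoted above):

```python
from dataclasses import dataclass
from typing import Optional

# A field without a default may not follow one with a default.
raised = False
try:
    @dataclass
    class Broken:
        database: Optional[str] = None  # has a default
        schema: str                     # no default -> TypeError
except TypeError:
    raised = True

# One fix: give the later fields defaults too.
@dataclass
class Works:
    database: Optional[str] = None
    schema: Optional[str] = None
```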

Contributor (author):

Also, does this repo actually run mypy? I see a lot of type annotations in the code, but also a lot of errors when running mypy, mostly in the test path.


Contributor (author):

FWIW, if these were in the mypy config explicitly, then mypy wouldn't raise errors in an editor.

Contributor (author):

""" Returns (credentials, project_id) """
# This method is copied from ` BigQueryConnectionManager`, because it's
# required in both classes.
# We could move this & the scopes to the module level.
Contributor:

Generally I would have that sort of function at the module level; but put them as methods as it seems to be fairly class-focused atm. Let me know which you prefer.

Good point. @gshank @kwigley does either of you have a preference here?

@kwigley (Contributor) commented Dec 1, 2020:

I think it makes sense to move this to module level, there's no state involved and it's shared 👍 go for it!

Contributor (author):

Cool, done!

Contributor:

Looks like there are just a few small pep8 errors:

plugins/bigquery/dbt/adapters/bigquery/connections.py:48:1: E302 expected 2 blank lines, found 1
plugins/bigquery/dbt/adapters/bigquery/connections.py:50:4: W291 trailing whitespace
plugins/bigquery/dbt/adapters/bigquery/connections.py:52:1: W293 blank line contains whitespace
plugins/bigquery/dbt/adapters/bigquery/connections.py:53:52: W291 trailing whitespace

max-sixty and others added 2 commits December 2, 2020 10:41

@jtcohen6 (Contributor) commented Dec 2, 2020:

Is this failure a flake, or caused by this change? https://app.circleci.com/pipelines/github/fishtown-analytics/dbt/1600/workflows/12269ee7-f010-40a0-bcb5-ba2a957d19fe/jobs/33567

Looks like this was just a fluke. Re-running from failed now

@jtcohen6 (Contributor) left a comment:

Nice work @max-sixty! Thanks for delving into our testing flows as well. Just needs a changelog entry and then this is ready to ship.

@max-sixty (author):

Great, done!

@jtcohen6 (Contributor) left a comment:

very cool! thanks for the contribution :)

@jtcohen6 jtcohen6 merged commit e7c2422 into dbt-labs:dev/kiyoshi-kuromiya Dec 3, 2020
@max-sixty (author):

Cheers for the guidance @jtcohen6 !

Successfully merging this pull request may close these issues.

Use default google cloud project if not supplied?