[CT-162] Upgrade from the __tables__ construct to the information_schema.tables construct #113

Open
Fraser-Isbester opened this issue Feb 4, 2022 · 8 comments · May be fixed by #1213

Comments

@Fraser-Isbester

Describe the feature

The use of [project_id].[dataset_id].__tables__ has been deprecated in favor of [project_id].[dataset_id].information_schema.tables. This matters because the former cannot be accessed with metadata-only permissions (it requires getData permissions). Switching would allow secure docs generation and schema-only tests to run in a lower-privilege environment.
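
For illustration, the two constructs can be compared with a pair of queries. This is a minimal sketch, not dbt's actual catalog query: `my-project` and `my_dataset` are placeholder names, and only a few identifying columns are selected from each view.

```sql
-- Legacy construct: one row per table in the dataset.
-- Per this issue, reading it requires getData permissions.
select
    project_id,
    dataset_id,
    table_id
from `my-project.my_dataset.__TABLES__`;

-- Proposed replacement: the same identifying fields under
-- INFORMATION_SCHEMA column names.
select
    table_catalog,  -- the project
    table_schema,   -- the dataset
    table_name,
    table_type
from `my-project`.my_dataset.INFORMATION_SCHEMA.TABLES;
```

One caveat for a migration: `__TABLES__` also exposes `row_count` and `size_bytes`, which do not appear to have direct counterparts in `INFORMATION_SCHEMA.TABLES`, so those statistics would likely need another source.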

Describe alternatives you've considered

  • You can grant the service account dataViewer access to all datasets managed by dbt and all datasets imported as sources, but this could be a massive over-privilege.
  • You can grant dataViewer individually on these specific tables, but that becomes a huge headache: you end up with a grant for every new dataset and must always keep the grants up to date, or you'll end up with a failure.
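
As a rough sketch of what these two alternatives look like in practice, the grants could be issued with BigQuery's DCL statements. The project, dataset, table, and service account names below are all hypothetical placeholders.

```sql
-- First alternative: dataset-level grant.
-- Broad: covers the data in every table of the dataset.
grant `roles/bigquery.dataViewer`
on schema `my-project`.my_dataset
to "serviceAccount:dbt-docs@my-project.iam.gserviceaccount.com";

-- Second alternative: table-level grant.
-- Narrower, but has to be repeated and kept up to date per table.
grant `roles/bigquery.dataViewer`
on table `my-project`.my_dataset.my_table
to "serviceAccount:dbt-docs@my-project.iam.gserviceaccount.com";
```

Both forms grant read access to the data itself, which is exactly what a metadata-only workflow is trying to avoid.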

Additional context

None.

Who will this benefit?

Anyone in a high-security or high-compliance environment who wants to use external dbt actors for certain tasks (GitHub Actions, for instance).

Are you interested in contributing this feature?

Sure!

@Fraser-Isbester Fraser-Isbester added enhancement New feature or request triage labels Feb 4, 2022
@github-actions github-actions bot changed the title Upgrade from the __tables__ construct to the information_schema.tables construct [CT-162] Upgrade from the __tables__ construct to the information_schema.tables construct Feb 4, 2022
@VersusFacit
Contributor

@Fraser-Isbester Always enjoy a good security-conscious patch. Just to make sure I understand: is it correct that the roles in BigQuery have changed, and that using __tables__ without information_schema means having to give the user too many access permissions?

As for changing the code, I've got one reference:

        concat(project_id, '.', dataset_id, '.', table_id) as relation_id,

dbt/include/bigquery/macros/catalog.sql

We'd have to look at other references to these fields in the codebase to ensure we've got good coverage. We always like a test on these.
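
Assuming the catalog query were switched to `INFORMATION_SCHEMA.TABLES`, the expression referenced above might map onto the renamed columns roughly as follows. This is a sketch of the column mapping only, not the actual patch; `my-project` and `my_dataset` are placeholders.

```sql
select
    -- table_catalog, table_schema and table_name stand in for
    -- project_id, dataset_id and table_id from __TABLES__.
    concat(table_catalog, '.', table_schema, '.', table_name) as relation_id
from `my-project`.my_dataset.INFORMATION_SCHEMA.TABLES;
```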

You still interested in contributing?

@github-actions
Contributor

github-actions bot commented Aug 7, 2022

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Aug 7, 2022
@hassan-mention-me

PR to fix this issue - #238

@jtcohen6 jtcohen6 removed the Stale label Aug 8, 2022
@sungchun12
Contributor

Current macro workaround for this: https://github.com/GeneralMills/gmi_common_dbt_utils/blob/main/macros/bq_catalog.sql

@github-actions
Contributor

github-actions bot commented Apr 5, 2023

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@dbeatty10
Contributor

Re-opening since #897 is about the same thing.

@Luiscri

Luiscri commented Aug 29, 2023

+1 on this. In my organization, different teams are responsible for different data sources. I have hit the situation where I've been given access to some tables of a dataset instead of the full dataset, because it contains other tables that I'm not meant to access. When I try to build the docs for my dbt project, I get an error because I don't have getData permission on the dataset and can't access the deprecated __TABLES__ table. If INFORMATION_SCHEMA were used instead, this wouldn't be a problem, because I could be given the metadataViewer role on the dataset without compromising the data contained in its tables.

@nathangriffiths-cdx

nathangriffiths-cdx commented Oct 10, 2023

We have also just run into this issue: our GitHub Actions service account has the "BigQuery Metadata Viewer" role, but dbt docs generate still fails with permissions errors, apparently due to the references to __TABLES__ instead of INFORMATION_SCHEMA.TABLES.

@mikealfare mikealfare added tech_debt and removed enhancement New feature or request labels Feb 13, 2024
@mikealfare mikealfare linked a pull request May 1, 2024 that will close this issue