[ADAP-365] [Feature] Add DBT support for adding Tag Templates to BigQuery tables and columns #595

Closed
3 tasks done
garsir opened this issue Mar 9, 2023 · 4 comments
Labels
enhancement New feature or request refinement Product or leadership input needed Stale

Comments

@garsir
Contributor

garsir commented Mar 9, 2023

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

I would like dbt-bigquery to support attaching tag templates to BigQuery tables and columns, so that the tags can be used in Dataplex.

This should be configured on the dbt side, similarly to how descriptions can currently be added to tables and columns in dbt.

I think they could be declared in the model's YAML, at the column level like this:

models:
  - name: tag_template_table
    columns:
      - name: field
        tag_template:
          - tag_template_name: projects//locations/eu/tagTemplates/template_test
            values:
              - example

and at the table level like this:

models:
  - name: tag_template_table
    tag_template:
      - tag_template_name: projects//locations/eu/tagTemplates/template_test
        values:
          - example
    columns:
      - name: field
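
To make the two placements concrete, here is a minimal Python sketch of how the proposed `tag_template` entries could be flattened into one list of tag requests once the YAML has been parsed. The `tag_template`, `tag_template_name`, and `values` keys come from the proposal; `TagRequest` and `collect_tag_requests` are hypothetical names, not existing dbt-bigquery code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class TagRequest:
    """One tag to attach; column is None for a table-level tag."""
    model: str
    column: Optional[str]
    template: str
    values: tuple


def collect_tag_requests(model: Dict[str, Any]) -> List[TagRequest]:
    """Flatten table-level and column-level tag_template entries of one model."""
    requests: List[TagRequest] = []

    def add(entries, column):
        for entry in entries or []:
            requests.append(
                TagRequest(
                    model=model["name"],
                    column=column,
                    template=entry["tag_template_name"],
                    values=tuple(entry.get("values", [])),
                )
            )

    add(model.get("tag_template"), None)  # table level
    for col in model.get("columns", []):
        add(col.get("tag_template"), col["name"])  # column level
    return requests


# This dict mirrors the table-level YAML example after parsing.
example_model = {
    "name": "tag_template_table",
    "tag_template": [
        {
            "tag_template_name": "projects//locations/eu/tagTemplates/template_test",
            "values": ["example"],
        }
    ],
    "columns": [{"name": "field"}],
}
for request in collect_tag_requests(example_model):
    print(request)
```

Treating both levels as the same kind of request, with `column` left empty for tables, would let a single code path apply them.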

Tag templates are managed through the Data Catalog client rather than the BigQuery client. This may mean a new Data Catalog adapter will need to be added in addition to the BigQuery adapter.
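
As a rough sketch of what that could involve, the snippet below looks up the Data Catalog entry for a BigQuery table and attaches a tag through the `google-cloud-datacatalog` Python library. It assumes every template field is a string field and that values arrive as a mapping from field id to value, neither of which the proposal specifies. `bigquery_linked_resource` and `attach_tag` are hypothetical helpers, and the project and dataset names are placeholders.

```python
from typing import Dict, Optional


def bigquery_linked_resource(project: str, dataset: str, table: str) -> str:
    """Build the resource name Data Catalog uses to look up a BigQuery table."""
    return (
        f"//bigquery.googleapis.com/projects/{project}"
        f"/datasets/{dataset}/tables/{table}"
    )


def attach_tag(
    project: str,
    dataset: str,
    table: str,
    template_name: str,
    fields: Dict[str, str],
    column: Optional[str] = None,
):
    """Attach one tag to a table, or to a single column when `column` is set."""
    # Imported lazily so this module loads without google-cloud-datacatalog.
    from google.cloud import datacatalog_v1

    client = datacatalog_v1.DataCatalogClient()
    entry = client.lookup_entry(
        request={"linked_resource": bigquery_linked_resource(project, dataset, table)}
    )

    tag = datacatalog_v1.Tag()
    tag.template = template_name
    if column is not None:
        tag.column = column
    for field_id, value in fields.items():
        # Assumption: every template field is a string field.
        tag.fields[field_id] = datacatalog_v1.types.TagField()
        tag.fields[field_id].string_value = value

    return client.create_tag(parent=entry.name, tag=tag)


# Placeholder names; only the resource name is built here, no API call is made.
print(bigquery_linked_resource("my-project", "my_dataset", "tag_template_table"))
```

Calling `attach_tag` needs Google Cloud credentials and an existing tag template, so it is left uncalled here.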

Describe alternatives you've considered

No response

Who will this benefit?

This feature will allow more BigQuery metadata to be configured inside dbt, so users can tag their datasets more comprehensively and build better data dictionaries in Dataplex or another tool.

Are you interested in contributing this feature?

yes

Anything else?

No response

@garsir garsir added enhancement New feature or request triage labels Mar 9, 2023
@github-actions github-actions bot changed the title [Feature] Add DBT support for adding Tag Templates to BigQuery tables and columns [ADAP-365] [Feature] Add DBT support for adding Tag Templates to BigQuery tables and columns Mar 9, 2023
@Fleid
Contributor

Fleid commented Mar 13, 2023

Hi @garsir, I see this feature is marked as preview in the doc, do you know if that's something recent / still being worked on?

I like it in principle. There's a Python library, so it should be straightforward to import.
But I don't see it as a priority, and there is already a lot we need to catch up on here.

I'll mark the issue as help_wanted, but we need to hash out the details of the ergonomics before starting work.
I'm not sure I understand the full life cycle of these tags.

@jtcohen6 is that something that should be surfaced on contracts rather than on models?

@Fleid Fleid added refinement Product or leadership input needed and removed triage labels Mar 13, 2023
@jonnydford

@Fleid
The bit that's marked as an Overview is in preview, but Tag Templates (and associating Tags with datasets/tables/columns) are currently GA.

Overviews and Data Stewards aren't currently available in the API and are only editable in the GCP Console, so Tag Templates are the only addition we can currently make.

There are a few examples of how these could be used, which may help with the implementation.

Firstly, on a dataset or table you could tag it with some Data Governance metadata (data owner, data product, has_pii, expiry date, etc).
Or maybe on a table you'd want to set some Data Freshness metadata (SLAs, how many times a day the data should be updated, last updated timestamp).
Or you may want to add a Source description to a dataset or table (source="database", ingested_by="datastream", ingestion_type="continuous").
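
Metadata like this mixes types: `has_pii` is a boolean, an expiry date is a timestamp, and `source` is a string. Here is a small Python sketch of how dbt could pick the matching Data Catalog field kind for each value. The kind names mirror the `TagField` attributes `bool_value`, `double_value`, `string_value`, and `timestamp_value`; `field_kind` is a hypothetical helper, the sample values are made up, and enum fields are left out.

```python
from datetime import datetime, timezone


def field_kind(value) -> str:
    """Return the TagField attribute that would hold `value`."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "bool_value"
    if isinstance(value, (int, float)):
        return "double_value"
    if isinstance(value, datetime):
        return "timestamp_value"
    if isinstance(value, str):
        return "string_value"
    raise TypeError(f"unsupported tag value type: {type(value).__name__}")


# Made-up governance values, for illustration only.
governance_tag = {
    "data_owner": "analytics-team",
    "has_pii": True,
    "expiry_date": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "updates_per_day": 4,
}
for field_id, value in governance_tag.items():
    print(field_id, "->", field_kind(value))
```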

At the column level the above could also apply, but you could add things like Data Quality as well.
For each column you could record whether it has been deduped, whether it is the primary key, the number of nulls, a description of any known data quality issues, etc.

As you can see from the above, some Tag Template entries will be defined statically (or at least changed infrequently by the dbt project owner), and some could be values you'd want dbt to update for you.
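
One way to picture that split, as a minimal Python sketch: static values come from the project's YAML, while dynamic values are computed at run time and take precedence. The function name `build_tag_fields`, the field ids, and the callables are all hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict


def build_tag_fields(
    static_fields: Dict[str, Any],
    dynamic_fields: Dict[str, Callable[[], Any]],
) -> Dict[str, Any]:
    """Merge static values with freshly computed dynamic ones."""
    fields = dict(static_fields)
    for field_id, compute in dynamic_fields.items():
        fields[field_id] = compute()  # dynamic values win on conflict
    return fields


static = {"is_primary_key": False, "deduped": True}
dynamic = {
    "null_count": lambda: 0,  # placeholder; a real run would query the table
    "last_updated": lambda: datetime.now(timezone.utc),
}
print(build_tag_fields(static, dynamic))
```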

@github-actions
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Sep 11, 2023
@github-actions
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Sep 19, 2023