Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAP-501] [Feature] Leverage Bigquery Comments at Model level #685

Closed
3 tasks done
matt-winkler opened this issue Apr 27, 2023 · 14 comments
Closed
3 tasks done

[ADAP-501] [Feature] Leverage Bigquery Comments at Model level #685

matt-winkler opened this issue Apr 27, 2023 · 14 comments
Assignees
Labels
enhancement New feature or request refinement Product or leadership input needed Stale

Comments

@matt-winkler
Copy link
Contributor

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Related to 3145, current functionality is documented here

Currently, users can specify query comments OOB using a config in the dbt_project.yml:

query-comment:
  job-label: True/False
  comment: 'my comment'
  append: True/False

It's also straightforward to use a custom macro in the comment field which can read arbitrary configs on each model, an update the comment string accordingly.

# in dbt_project.yml
query-comment:
  job-label: True/False
  comment: "{{ query_comment(node)}} "
  append: True/False

# in some_model.sql
{{config(query_labels={"k1": "v1", "k2": "v2"})}}

# in macros/query_comment.sql
{% macro query_comment(node) %}
...
{% for key, value in node.config.query_label.items() %}
  # appends to built in comments for the node.
{% endfor %}
...
{% endmacro %}

This pattern is highly desirable from a cost-tracking perspective with bigquery. The configs here seem fairly straightforward to implement, so we should add this to dbt-core instead of asking users to do it. The goal is to enable users to only specify {{config(query_labels={"k1": "v1", "k2": "v2"})}} instead of having to override the built in query_comment macro.

This should also work on tests.

Describe alternatives you've considered

Continue forcing users to override the built-in.

Who will this benefit?

All dbt-bigquery users.

Are you interested in contributing this feature?

Yes

Anything else?

No response

@matt-winkler matt-winkler added enhancement New feature or request triage labels Apr 27, 2023
@github-actions github-actions bot changed the title [Feature] Leverage Bigquery Comments at Model level [ADAP-501] [Feature] Leverage Bigquery Comments at Model level Apr 27, 2023
@algarbar
Copy link

+1

Present behavior is not consistent across adapters; it should be. Also, the diction of "query comment" is confusing for folks looking for query labels (not comments). Hence, following this design makes the capability more accessible.

As mentioned by the author, the scope of change should be limited to the config override. Thus, no job-label flag override should be required within dbt_project.yml. Inclusion of query_label denotes intent for a model.

@dbeatty10
Copy link
Contributor

Nice proposal and insights @matt-winkler and @algarbar 🤩

How do you see this as similar to or different from the query_tag model config in dbt-snowflake?

Present behavior is not consistent across adapters; it should be

Keen to hear more @algarbar -- what do you think we should do to make it more consistent?

@algarbar
Copy link

@dbeatty10 - Spot on; it is quite similar.

Key distinctions result due to data platform design. Snowflake's query_tag is a string limited to 2000 characters. Often this is input in JSON format for variant casting. BigQuery enables query_labels as {K,V} pairs accessed through an array of structs (E.G. see JOBS_BY_USER). Thus a dbt Snowflake's model would implement config(..., query_tag='{"k1":"v1", "k2":"v2"}'). dbt BigQuery's equivalent of config(..., query_labels={"k1:"v1", "k2:"v2"}). (Note Snowflake's string encapsulation)

RE adapter consistency: dbt-snowflake presently implements Snowflake's query_tag as a model config override option. Within dbt-bigquery, the functional equivalent of BigQuery's query_labels, is not present. Adding that option in the model config (in some manner) would bring the consistency alluded in that comment.

@dbeatty10
Copy link
Contributor

Thanks for this info @algarbar !

Refining the behavior

@matt-winkler and @algarbar:

Would you see the implementation of this feature as the following?

  • Enable a new model-level config named query_labels
  • Accept any of the following:
    • key/value pairs (like {"k1": "v1", "k2": "v2", ...}), or
    • a list of keys (like ["k1", "k2", ...], or
    • a single key (like "k1")
  • When specified, these labels are applied to the materialized table/view and will show up in the JOBS_BY_USER view
  • If (and only if) job-label=True, then these are also rendered within a comment in the SQL that materializes the table/view

@algarbar
Copy link

algarbar commented May 1, 2023

Thank you @dbeatty10 -

Enable a new model-level config named query_labels

  • Correct

Accept any of the following:

  • key/value pairs (like {"k1": "v1", "k2": "v2", ...}),
  • Correct. Another alternative for community consideration is an array of JSON pairs mimicking BQ's design. Example:
    config=(..., query_labels= [{"k1":"v1"},{"k2":"v2"}])
    

When specified, these labels are applied to the materialized table/view and will show up in the JOBS_BY_USER view

  • Small correction. These labels would be applied to the queries NOT the table/view. Table/view labels are already available under the dbt labels option.

If (and only if) job-label=True, then these are also rendered within a comment in the SQL that materializes the table/view

  • There is no relationship of job-label to text injection in queries as an inline comment. That feature is already available in dbt. But, it also is not available as a config override without overloading the query_comment macro in dbt_profiles.yml.
  • It's a great idea though, if you're thinking to also club in this feature. They are highly related.
  • Your referenced link states:
  • If query-comment.job-label is set to true, dbt will include the query comment items, if a dictionary, or the comment string, as job labels on the query it executes.

  • The important bit here^^ is that it's still just changing labels, not comments.

@dbeatty10
Copy link
Contributor

🙏 Thanks for all the helpful responses @algarbar !

Interested to hear your thoughts @matt-winkler, especially as it relates to query comments since that seemed to be your motivating use case.

@matt-winkler
Copy link
Contributor Author

Hey @dbeatty10 I think it makes good sense to have an interface on the BQ side (without requiring the end user to override / set custom macros), such that the experience is similar to snowflake. The function of this in BQ is similar enough / the same as on Snowflake that we should just do it.

@dbeatty10 dbeatty10 added the refinement Product or leadership input needed label May 3, 2023
@dbeatty10 dbeatty10 self-assigned this May 3, 2023
@dbeatty10
Copy link
Contributor

We need to do a little more refining on this before we can specify the acceptance criteria.

@matt-winkler does the following sound like your overall intent and key exit criteria?

The overall intent:

  • Enable a model-level query_labels config that works in concert with the pre-existing query-comment behavior

Acceptance criteria:

  • Enable a new model-level config named query_labels
    • e.g.
    {{ config(query_labels={"k1": "v1", "k2": "v2"}) }}
    
  • When query-comment.job-label is True, then convert any query_labels config into a JSON comment within the query
    • e.g.
    /* {"app": "dbt", "dbt_version": "0.15.0rc2", "profile_name": "debug",
    "target_name": "dev", "node_id": "model.dbt2.my_model", "query_labels": {"k1": "v1", "k2": "v2"}} */

@matt-winkler
Copy link
Contributor Author

@dbeatty10 yes, and also we should make sure that this behavior supports analyzing jobs by label in the jobs_by_user view.

@github-actions github-actions bot added the triage label May 5, 2023
@matt-winkler
Copy link
Contributor Author

matt-winkler commented May 9, 2023

@dbeatty10 I think the above is what's needed from an end user perspective. I'd propose the following as additional AC:

  • Default query_comment macro includes logic to add custom query_labels config to the comment
  • Validation of config'd comment
  • integration tests

Any reason not to go ahead and start a PR?

@algarbar
Copy link

Consider: PR379 proposes query_tag for consistency to Snowflake. We could do the same here.

PR to also include query_comment (good suggestion above from @dbeatty10).

Any other consideration?

@matt-winkler
Copy link
Contributor Author

@dbeatty10 Would love your thoughts on requirements / where we go from here.

Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 27, 2023
Copy link
Contributor

github-actions bot commented Dec 4, 2023

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 4, 2023
@dataders dataders removed the triage label Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request refinement Product or leadership input needed Stale
Projects
None yet
Development

No branches or pull requests

4 participants