Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration test for query-comment as BigQuery job label #28

Closed
jtcohen6 opened this issue Mar 22, 2021 · 6 comments
Closed

Integration test for query-comment as BigQuery job label #28

jtcohen6 opened this issue Mar 22, 2021 · 6 comments
Labels
enhancement New feature or request good_first_issue Good for newcomers Stale

Comments

@jtcohen6
Copy link
Contributor

Describe the feature

dbt-labs/dbt-core#3145 did something really cool: By setting in dbt_project.yml:

query-comment:
  job-label: true

dbt will supply the key-value pairs from a query comment dictionary (or the full value as one key-value pair, if the query comment is a non-dict string) to BigQuery as a job label for the queries representing node execution:

Screen Shot 2021-03-04 at 12 33 39 PM

In order to really test that this is working, and avoid regression, we should add an integration test that:

  • Runs a model with the default query-comment and checks that the labels appropriately registered, by querying INFORMATION_SCHEMA.JOBS_BY_USER. (We'd want that query to filter on user_email and creation_time.)
  • Runs a model with a stringified query-comment, checks the same
  • Runs a model with a stringified and too-long (>128 char) comment, ensures an error is returned
  • Runs a model with job-label: false and ensures that only the default job label (invocation_id) is applied

Describe alternatives you've considered

Not integration testing this.

dbt-labs/dbt-core#3145 added unit testing for the python dict/string handling functions, so the nuts and bolts are in place. In order for this to be a feature that dbt-bigquery users can feel comfortable relying on, we need to ensure that it will continue working in future releases, which may touch query-comment functionality in indirect ways.

@anthonymichaelclark
Copy link

@jtcohen6 I'd be willing to take a crack at this!

Can I ask a question about how you'd implement these tests? I started doodling around, hit a bit of a block around running the query you suggested. This function stub in my test...

class TestQueryComment(DBTIntegrationTest):
...
    def get_job_labels(self):
       # TODO -- change query, just making sure it runs first
        query = "SELECT * FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT" 
        vals = self.run_sql_bigquery(query, fetch="all")
        return vals

...raises an AttributeError at line 313 here

https://github.com/fishtown-analytics/dbt/blob/e775f2b38e52e8b9f25f8ac98c942d22b2c8c55d/plugins/bigquery/dbt/adapters/bigquery/connections.py#L306-L316

Did a little digging and looks like the reason is that the query_header is None on BigQueryConnectionManager when you run a query using DBTIntegrationTest.run_sql. Did a little further digging and looks like query_header is actually set here

https://github.com/fishtown-analytics/dbt/blob/e775f2b38e52e8b9f25f8ac98c942d22b2c8c55d/core/dbt/parser/manifest.py#L135-L151

Which would explain why I can do a dbt run but can't run_sql -- we don't pass through a manifest in the second case. Any suggestions on how to proceed?

@jtcohen6
Copy link
Contributor Author

@anthonymichaelclark Glad to have you on board!

It doesn't surprise me to hear that the integration test run_sql_bigquery method runs without a manifest, and therefore without a query_comment. That should be okay, so long as you're just trying to grab the values put in JOBS_BY_PROJECT by the previous dbt run step.

I haven't had a chance to dive into the code here just yet, so just from quickly looking—I don't follow why line 313 would raise an AttributeError, if the condition in line 312 gives the all-clear?

@anthonymichaelclark
Copy link

anthonymichaelclark commented May 25, 2021

@jtcohen6 Ah, should've included that bit in my test stub -- I've set the project_config in the test class like so:

class TestQueryComment(DBTIntegrationTest):
...
    @property
    def project_config(self):
        return {
            'config-version': 2,
            'query-comment': {
                'job-label': True
            }
        }

So when I call run_sql, it's using that project_config, so the connection instance ends up having the query_comment and query_comment.job_label attributes from line 312 in connections.py, but not query_header.comment.query_comment

https://github.com/fishtown-analytics/dbt/blob/e775f2b38e52e8b9f25f8ac98c942d22b2c8c55d/plugins/bigquery/dbt/adapters/bigquery/connections.py#L306-L316

@jtcohen6
Copy link
Contributor Author

jtcohen6 commented Jun 1, 2021

@anthonymichaelclark Ah... because query_header is defined for standard tasks that have access to a manifest, but not for the run_sql method that we define as a convenience during integration tests. That does feel needlessly confusing, but I suppose it's an artifact of how we've wired up the test suite.

I suppose one straightforward way of fixing this, enough to unblock the integration test without changing any functionality in practice, would be to update the conditional logic:

if (
    self.profile.query_comment and
    self.profile.query_comment.job_label and
    self.query_header.comment.query_comment
): 

That doesn't seem outlandish to me. What do you think?

@anthonymichaelclark
Copy link

@jtcohen6 Yeah that'd work! Wasn't sure if I should be looking for a "deeper" fix -- I'll do that^ and get on with writing the tests

@jtcohen6 jtcohen6 transferred this issue from dbt-labs/dbt-core Oct 12, 2021
@jtcohen6 jtcohen6 added enhancement New feature or request good_first_issue Good for newcomers labels Oct 12, 2021
@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good_first_issue Good for newcomers Stale
Projects
None yet
Development

No branches or pull requests

2 participants