
Include the snowflake query id in the run results output #7

Closed
jsnb-devoted opened this issue Oct 7, 2021 · 2 comments · Fixed by #109
Labels
enhancement New feature or request good_first_issue Good for newcomers

Comments

@jsnb-devoted

jsnb-devoted commented Oct 7, 2021

Describe the feature

As an analytics engineer, I want a quick way to get the query id of a model run (or even a test/seed/snapshot) so I can quickly see the query in the Snowflake UI.

Describe alternatives you've considered

We've implemented a macro, run as a post-hook, that gets the full URL of the query and logs it to the console. This works great, except that we have to call SELECT last_query_id() on every model run, and, more importantly, the hook does not execute if the query fails.

I acknowledge that the compiled SQL is available as an alternative, but we run Airflow on Kubernetes, so the artifacts are not readily available. We do persist the artifacts in S3, but I think it would be an improved experience to go from an Airflow log to the Snowflake UI in one click.
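For illustration, a sketch of the post-hook approach described above might look like the following (the macro name is hypothetical; run_query, log, and execute are standard dbt context members, and last_query_id() is a standard Snowflake function):

```sql
-- Hypothetical post-hook macro: fetch the id of the model's last query
-- and log it. Note the limitations described above: this costs an extra
-- SELECT per model, and post-hooks are skipped when the model query fails.
{% macro log_snowflake_query_id() %}
  {% if execute %}
    {% set qid = run_query("select last_query_id()").columns[0].values()[0] %}
    {{ log("Snowflake query id: " ~ qid, info=True) }}
  {% endif %}
{% endmacro %}
```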

Additional context

I can see that the Snowflake query id is being logged for failed queries if you enable debug mode, but that output is far too verbose for this use case. I would want just the query URL/id logged on every run, so that a user can go from log => Snowflake UI with the fewest steps possible.

Who will this benefit?

I think this would benefit all dbt-snowflake users who use the Snowflake UI for debugging/performance monitoring.

Are you interested in contributing this feature?

I wouldn't mind contributing but I would need some guidance on what option is most amenable to dbt's design. I'm not sure if the plugins are meant to interact with the run results functionality in dbt core.

@jtcohen6
Contributor

Hey @jsnb-devoted, thanks for opening this awesome issue, and so sorry for the delay getting back to you!

Good news is, I think this is going to be quite straightforward to implement—a good first issue, even—and I'd welcome a contribution for it :)

> I'm not sure if the plugins are meant to interact with the run results functionality in dbt core.

There's a carved-out place for just this sort of thing, in run_results.json, called AdapterResponse. Today, this includes things like the success/error code and number of rows_affected. As it turns out, it would be quite easy to add in query_id as well, since it's available right on the cursor that we use to execute queries.
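For context, the adapter response is serialized into run_results.json under each result's adapter_response key. With the proposed field, an entry would look roughly like this (schematic and abbreviated; the query_id value is illustrative):

```json
{
  "unique_id": "model.my_project.my_model",
  "status": "success",
  "adapter_response": {
    "_message": "SUCCESS 42",
    "code": "SUCCESS",
    "rows_affected": 42,
    "query_id": "01a2b3c4-0000-0000-0000-000000000000"
  }
}
```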

We'll need to define a new subclass, SnowflakeAdapterResponse, in connections.py (similar to how dbt-bigquery does this):

from dataclasses import dataclass
from typing import Optional
from dbt.contracts.connection import AdapterResponse  # import path may vary by dbt-core version

@dataclass
class SnowflakeAdapterResponse(AdapterResponse):
    query_id: Optional[str] = None

Then, we can simply update the get_response method to return a SnowflakeAdapterResponse that includes query_id:

    @classmethod
    def get_response(cls, cursor) -> SnowflakeAdapterResponse:
        code = cursor.sqlstate

        if code is None:
            code = "SUCCESS"

        return SnowflakeAdapterResponse(
            _message=f"{code} {cursor.rowcount}",
            rows_affected=cursor.rowcount,
            code=code,
            query_id=cursor.sfqid  # public query-id property on the Snowflake connector cursor
        )

It doesn't look like we have automated tests to verify the contents of AdapterResponse today (since it varies by adapter, and can be anything). Ultimately, testing locally and verifying is probably enough.
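As a self-contained way to sanity-check the shape locally, here's a minimal sketch; the AdapterResponse stand-in mirrors the fields of dbt's real class, and the query_id value is illustrative:

```python
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class AdapterResponse:
    # Stand-in for dbt's AdapterResponse, carrying the same fields.
    _message: str
    code: Optional[str] = None
    rows_affected: Optional[int] = None

@dataclass
class SnowflakeAdapterResponse(AdapterResponse):
    query_id: Optional[str] = None

# Simulate what get_response would build from a successful cursor.
resp = SnowflakeAdapterResponse(
    _message="SUCCESS 42",
    rows_affected=42,
    code="SUCCESS",
    query_id="01a2b3c4-0000-0000-0000-000000000000",  # illustrative id
)
print(asdict(resp))
```

Because SnowflakeAdapterResponse is a plain dataclass subclass, the extra field serializes alongside the existing ones with no further plumbing.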

One limitation to call out: AdapterResponse only records information about the last query run. In materializations with multiple queries—on Snowflake, that's delete+insert incremental models—we'll only get the query_id of the insert. We also won't get IDs for any hooks that run. That's a limitation across all dbt adapters, and something we need to think about in the future, when we have appetite to significantly rework materializations.

@jtcohen6 jtcohen6 added good_first_issue Good for newcomers and removed triage labels Oct 22, 2021
@jsnb-devoted
Author

@jtcohen6 thanks so much! I'll try to carve out some time to contribute. I took a quick look at the BigQuery implementation and it looks pretty straightforward.

I hear you re: the multiple query materializations. The macro we wrote to log the queries had the same limitation.
