
Include the snowflake query id in the run results output #7

Closed
jsnb-devoted opened this issue Oct 7, 2021 · 2 comments · Fixed by #109
Labels
enhancement New feature or request good_first_issue Good for newcomers

Comments

@jsnb-devoted

jsnb-devoted commented Oct 7, 2021

Describe the feature

As an analytics engineer, I want a quick way to get the query id of a model run (or even a test/seed/snapshot) so I can quickly see the query in the Snowflake UI.

Describe alternatives you've considered

We've implemented a macro, run as a post-hook, that gets the full URL of the query and logs it to the console. This works great, except that we have to call SELECT last_query_id() on every model run, and, more importantly, the hook does not execute if the query fails.

I acknowledge that the compiled SQL is available as an alternative, but we run Airflow on Kubernetes, so the artifacts are not readily available. We do persist the artifacts in S3, but I think it would be an improved experience to go from an Airflow log to the Snowflake UI in one click.
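For illustration, a sketch of the post-hook approach described above might look like the following (the macro name is hypothetical; run_query, log, and execute are standard dbt context members, and last_query_id() is a standard Snowflake function):

```sql
-- Hypothetical post-hook macro: fetch the id of the model's last query
-- and log it. Note the limitations described above: this costs an extra
-- SELECT per model, and post-hooks are skipped when the model query fails.
{% macro log_snowflake_query_id() %}
  {% if execute %}
    {% set qid = run_query("select last_query_id()").columns[0].values()[0] %}
    {{ log("Snowflake query id: " ~ qid, info=True) }}
  {% endif %}
{% endmacro %}
```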

Additional context

I can see that the Snowflake query id is being logged for failed queries if you enable debug mode, but that output is far too verbose for this use case. I would want just the query URL/id logged on every run, so that a user can go from log => Snowflake UI with the fewest steps possible.

Who will this benefit?

I think this would benefit all dbt-snowflake users who use the Snowflake UI for debugging/performance monitoring.

Are you interested in contributing this feature?

I wouldn't mind contributing but I would need some guidance on what option is most amenable to dbt's design. I'm not sure if the plugins are meant to interact with the run results functionality in dbt core.

@jtcohen6
Contributor

Hey @jsnb-devoted, thanks for opening this awesome issue, and so sorry for the delay getting back to you!

Good news is, I think this is going to be quite straightforward to implement—a good first issue, even—and I'd welcome a contribution for it :)

> I'm not sure if the plugins are meant to interact with the run results functionality in dbt core.

There's a carved-out place for just this sort of thing, in run_results.json, called AdapterResponse. Today, this includes things like the success/error code and number of rows_affected. As it turns out, it would be quite easy to add in query_id as well, since it's available right on the cursor that we use to execute queries.
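For context, the adapter response is serialized into run_results.json under each result's adapter_response key. With the proposed field, an entry would look roughly like this (schematic and abbreviated; the query_id value is illustrative):

```json
{
  "unique_id": "model.my_project.my_model",
  "status": "success",
  "adapter_response": {
    "_message": "SUCCESS 42",
    "code": "SUCCESS",
    "rows_affected": 42,
    "query_id": "01a2b3c4-0000-0000-0000-000000000000"
  }
}
```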

We'll need to define a new subclass, SnowflakeAdapterResponse, in connections.py (similar to how dbt-bigquery does this):

from dataclasses import dataclass
from typing import Optional
from dbt.contracts.connection import AdapterResponse  # import path may vary by dbt-core version

@dataclass
class SnowflakeAdapterResponse(AdapterResponse):
    query_id: Optional[str] = None

Then, we can simply update the get_response method to return a SnowflakeAdapterResponse that includes query_id:

    @classmethod
    def get_response(cls, cursor) -> SnowflakeAdapterResponse:
        code = cursor.sqlstate

        if code is None:
            code = "SUCCESS"

        return SnowflakeAdapterResponse(
            _message=f"{code} {cursor.rowcount}",
            rows_affected=cursor.rowcount,
            code=code,
            query_id=cursor.sfqid  # public query-id property on the Snowflake connector cursor
        )

It doesn't look like we have automated tests to verify the contents of AdapterResponse today (since it varies by adapter, and can be anything). Ultimately, testing locally and verifying is probably enough.
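As a self-contained way to sanity-check the shape locally, here's a minimal sketch; the AdapterResponse stand-in mirrors the fields of dbt's real class, and the query_id value is illustrative:

```python
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class AdapterResponse:
    # Stand-in for dbt's AdapterResponse, carrying the same fields.
    _message: str
    code: Optional[str] = None
    rows_affected: Optional[int] = None

@dataclass
class SnowflakeAdapterResponse(AdapterResponse):
    query_id: Optional[str] = None

# Simulate what get_response would build from a successful cursor.
resp = SnowflakeAdapterResponse(
    _message="SUCCESS 42",
    rows_affected=42,
    code="SUCCESS",
    query_id="01a2b3c4-0000-0000-0000-000000000000",  # illustrative id
)
print(asdict(resp))
```

Because SnowflakeAdapterResponse is a plain dataclass subclass, the extra field serializes alongside the existing ones with no further plumbing.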

One limitation to call out: AdapterResponse only records information about the last query run. In materializations with multiple queries—on Snowflake, that's delete+insert incremental models—we'll only get the query_id of the insert. We also won't get IDs for any hooks that run. That's a limitation across all dbt adapters, and something we need to think about in the future, when we have appetite to significantly rework materializations.

@jtcohen6 jtcohen6 added good_first_issue Good for newcomers and removed triage labels Oct 22, 2021
@jsnb-devoted
Author

@jtcohen6 thanks so much! I'll try to carve out some time to contribute. I took a quick look at the BigQuery implementation and it looks pretty straightforward.

I hear you re: the multiple query materializations. The macro we wrote to log the queries had the same limitation.
