Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-704] [CT-703] [Feature] Display the Query ID in the printed logs when dbt runs #165

Closed
1 task done
mpavillet opened this issue May 31, 2022 · 4 comments
Closed
1 task done
Labels
enhancement New feature or request Stale

Comments

@mpavillet
Copy link

Is this your first time opening an issue?

Describe the Feature

I have been hunting down some performance issues recently and was having a hard time finding the query related to my model.
It would be extremely useful to display the query_id in the printed logs:
image

Describe alternatives you've considered

I am using Snowflake and look for queries fired by DBT using the search history (or query history tables)

Who will this benefit?

Any dbt user who would like to get more insights into the query fired by dbt.
It depends on the database but will usually give useful information such as the explain plan.
This helps anyone who needs to investigate what's happening in the database.

Are you interested in contributing this feature?

I'd need help on where to start but happy to help!

Anything else?

https://getdbt.slack.com/archives/C2JRTGLLS/p1653515842370159

@mpavillet mpavillet added enhancement New feature or request triage labels May 31, 2022
@github-actions github-actions bot changed the title [Feature] Display the Query ID in the printed logs when dbt runs [CT-703] [Feature] Display the Query ID in the printed logs when dbt runs May 31, 2022
@jtcohen6
Copy link
Contributor

jtcohen6 commented Jun 1, 2022

@mpavillet Thanks for opening! I think this issue is most relevant to the dbt-snowflake adapter, so I'm going to transfer the issue over there. dbt-core already has some constructs in place that allow adapters to enrich logs and result objects with this kind of metadata.

As of v1.1 (#109), the "adapter response" object in structured logs and the run_results artifact includes the query_id:

      "thread_id": "Thread-1",
      "execution_time": 2.0358169078826904,
      "adapter_response": {
        "_message": "SUCCESS 1",
        "code": "SUCCESS",
        "rows_affected": 1,
        "query_id": "01a4a750-0401-8e90-000d-3783085d0572"
      },
      "message": "SUCCESS 1",
      "failures": null,
      "unique_id": "model.testy.some_model"
    }

We also include the query ID in debug-level logs (logs/dbt.log) any time a query encounters an error, e.g. when a model fails to build:

except Exception as e:
if isinstance(e, snowflake.connector.errors.Error):
logger.debug("Snowflake query id: {}".format(e.sfqid))

10:01:01.033411 [debug] [Thread-1  ]: On model.testy.some_model: /* {"app": "dbt", "dbt_version": "1.3.0a1", "profile_name": "sandbox-snowflake", "target_name": "dev", "node_id": "model.testy.some_model"} */

  create or replace  view ad_hoc.dev_jerco.some_model 
  
   as (
    select asljasdflj as id
  );
10:01:01.033506 [debug] [Thread-1  ]: Opening a new connection, currently in state closed
10:01:02.398978 [debug] [Thread-1  ]: Snowflake adapter: Snowflake query id: 01a4a759-0401-8e90-000d-3783085d05ae
10:01:02.399485 [debug] [Thread-1  ]: Snowflake adapter: Snowflake error: 000904 (42000): SQL compilation error: error line 4 at position 11
invalid identifier 'ASLJASDFLJ'
10:01:02.400060 [debug] [Thread-1  ]: finished collecting timing info
10:01:02.400388 [debug] [Thread-1  ]: On model.testy.some_model: Close
10:01:02.862960 [debug] [Thread-1  ]: Database Error in model some_model (models/some_model.sql)
  000904 (42000): SQL compilation error: error line 4 at position 11
  invalid identifier 'ASLJASDFLJ'
  compiled SQL at target/run/testy/models/some_model.sql

I realize that's less helpful when trying to debug queries that have succeeded, but run for longer than expected. For that, there's the query_id included in run_results.json, as well as two other tools that can be useful to you when searching in QUERY_HISTORY:

  • query tags, for filtering the history view
  • query comments/headers, which will appear in the QUERY_TEXT column of the history view, and include more detailed information about the specific model running

Do you feel like that gets you what you're looking for here?

@jtcohen6 jtcohen6 removed the triage label Jun 1, 2022
@jtcohen6 jtcohen6 transferred this issue from dbt-labs/dbt-core Jun 1, 2022
@github-actions github-actions bot changed the title [CT-703] [Feature] Display the Query ID in the printed logs when dbt runs [CT-704] [CT-703] [Feature] Display the Query ID in the printed logs when dbt runs Jun 1, 2022
@mpavillet
Copy link
Author

Thank you for the answer Jeremy. I am able to find the query id and used the query history table/UI for this, this feature is more to make one's life easier.
I will have a look at the run_result file!

@robscriva
Copy link
Contributor

Just wanting to chime in here as we leverage query_tag to do the reverse, ie tag the queries in Snowflake with some model info. Some info on what we're sticking in our query_tag here: #133

One of the challenges with the model <-> query_id relationship is that it's a one to many, ie incremental models generate multiple queries, ie a CTAS+MERGE.

@jtcohen6 apologies for not searching for it, are both query_ids captured in that instance?

@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Stale
Projects
None yet
Development

No branches or pull requests

3 participants