
Provide jobIds behind model creation in run_results.json #92

Closed
ggiill opened this issue Sep 3, 2019 · 7 comments · Fixed by #225 or #250
Labels
enhancement New feature or request good_first_issue Good for newcomers

Comments


ggiill commented Sep 3, 2019

In BigQuery, each time SQL is executed, a job is created and assigned a unique ID (for example, job_9JVepH9O1bzemNfB6xZzxz_mfFkY). When dbt runs, it checks the results and status of these jobs.
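To illustrate what a job ID gives you (a hedged sketch, not part of the original issue): the google-cloud-bigquery client can look a job up by its ID with Client.get_job. The helper below is hypothetical; it only assumes the client object exposes that method.

```python
def summarize_job(client, job_id, location="US"):
    """Fetch a BigQuery job by ID and return the fields useful for debugging.

    `client` is expected to behave like google.cloud.bigquery.Client,
    whose get_job(job_id, location=...) returns the job resource.
    This helper is an illustration, not dbt code.
    """
    job = client.get_job(job_id, location=location)
    return {
        "job_id": job.job_id,
        "state": job.state,                      # e.g. "DONE"
        "query": getattr(job, "query", None),    # only query jobs carry SQL
    }
```

With a job_id recorded in run_results.json, a call like `summarize_job(client, "job_9JVepH9O1bzemNfB6xZzxz_mfFkY")` would retrieve the exact job behind a model build.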

We've found run_results.json extremely useful. Being able to tie its results (errors, compiled SQL, variables, tags, etc.) back to the specific BigQuery job that executed them would make it even more useful for debugging and troubleshooting.

Describe alternatives you've considered

You could cross-reference dbt logs with BigQuery logs in Stackdriver Logging and approximate which job corresponded to a model build, either by timestamp proximity or by matching the compiled SQL, but neither method is deterministic.

Additional context

This is specific to BigQuery for now, but I am sure there are parallels in other databases (transaction ID? query ID?).

Another aspect to consider would be whether to provide multiple jobIds if there is a macro called in the model that executes additional queries against the database.

Who will this benefit?

Anyone that wants to tie run results back to specific jobs for additional debugging would benefit from this feature.


bodschut commented Oct 7, 2020

I also think this would be an interesting enhancement. Any plans to pick this up for a future release?


jtcohen6 commented Oct 7, 2020

Another aspect to consider would be whether to provide multiple jobIds if there is a macro called in the model that executes additional queries against the database.

This would be our biggest challenge with implementing this today. In the meantime, we do include the invocation_id in the job labels, which should allow you to tie dbt's run_results.json back to BigQuery's INFORMATION_SCHEMA.JOBS_BY_* views.
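As a hedged sketch of that lookup (not from the thread): read the invocation_id from run_results.json metadata and filter JOBS_BY_PROJECT on a matching job label. The label key and the region qualifier below are assumptions; check what your dbt version actually emits.

```python
import json

def jobs_lookup_sql(run_results_path, label_key="dbt_invocation_id"):
    """Build a JOBS_BY_PROJECT query filtered on the dbt invocation_id label.

    Assumptions (verify against your environment):
      - run_results.json carries metadata.invocation_id (true for current
        artifact schemas; older versions may differ).
      - dbt attaches the invocation_id to jobs under `label_key`.
      - jobs ran in the `region-us` multi-region.
    """
    with open(run_results_path) as f:
        invocation_id = json.load(f)["metadata"]["invocation_id"]
    return (
        "SELECT job_id, state, query\n"
        "FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT\n"
        "WHERE EXISTS (\n"
        "  SELECT 1 FROM UNNEST(labels) AS label\n"
        f"  WHERE label.key = '{label_key}' AND label.value = '{invocation_id}'\n"
        ")"
    )
```

Running the generated SQL in BigQuery would list every job issued by that single dbt invocation.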


bodschut commented Oct 7, 2020

Thanks for the info @jtcohen6

Linking to the JOBS_BY_* views is indeed the primary goal. Having the invocation_id in the job labels solves this, so there is no need for me to have the job_id in the logs or in run_results.json.

In what dbt release will the invocation_id label be included?

github-actions commented

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@jtcohen6 jtcohen6 transferred this issue from dbt-labs/dbt-core Dec 16, 2021
jtcohen6 commented Dec 16, 2021

Missed this when I transferred BigQuery-specific issues a few months ago. This is analogous to dbt-labs/dbt-snowflake#7, and many of the same implementation details apply. I'm going to mark this as a good first issue in case anyone is interested in picking it up.

@jtcohen6 jtcohen6 added enhancement New feature or request good_first_issue Good for newcomers labels Dec 16, 2021

akshaan commented Jan 13, 2022

@jtcohen6 I'd love to take a crack at implementing this. Do you have any tips on where to get started / what an implementation would look like?

jtcohen6 commented

@akshaan Sorry for the delayed response here! If you're still interested in contributing, we'd still love to have the change.

The implementation here should look very similar to the change proposed in dbt-labs/dbt-snowflake#40. It's just a matter of adding job_id as an attribute of BigQueryAdapterResponse here:

@dataclass
class BigQueryAdapterResponse(AdapterResponse):
    bytes_processed: Optional[int] = None
    job_id: Optional[str] = None  # should this be Optional, or str = '' ?

And then accessing the job_id off the QueryJob further down:

        # for each statement type
        job_id = query_job.job_id

        # and then
        response = BigQueryAdapterResponse(
            _message=message,
            rows_affected=num_rows,
            code=code,
            bytes_processed=bytes_processed,
            job_id=job_id
        )
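For illustration, here is a self-contained version of that sketch. AdapterResponse and the QueryJob stand-in below are simplified stand-ins for dbt's real classes, not the actual dbt-bigquery code.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AdapterResponse:
    """Simplified stand-in for dbt's AdapterResponse base class."""
    _message: str
    code: Optional[str] = None
    rows_affected: Optional[int] = None

@dataclass
class BigQueryAdapterResponse(AdapterResponse):
    bytes_processed: Optional[int] = None
    job_id: Optional[str] = None  # the new field proposed in this issue

def response_from_job(query_job, message, code, num_rows, bytes_processed):
    """Mirror the snippet above: copy job_id off the finished QueryJob.

    `query_job` only needs a `job_id` attribute here, as a
    google.cloud.bigquery QueryJob would have.
    """
    return BigQueryAdapterResponse(
        _message=message,
        code=code,
        rows_affected=num_rows,
        bytes_processed=bytes_processed,
        job_id=query_job.job_id,
    )
```

With this in place, the job_id would flow into run_results.json alongside the existing adapter response fields, which is exactly what the original issue asks for.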
