
Provide jobIds behind model creation in run_results.json #92

Closed
ggiill opened this issue Sep 3, 2019 · 7 comments · Fixed by #225 or #250
Labels
enhancement New feature or request good_first_issue Good for newcomers

Comments


ggiill commented Sep 3, 2019

In BigQuery, each time SQL is executed, a job is created and assigned a unique ID (for example, job_9JVepH9O1bzemNfB6xZzxz_mfFkY). When dbt runs, it checks the results and status of these jobs.
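To illustrate what a job ID gives you (a hedged sketch, not part of the original issue): the google-cloud-bigquery client can look a job up by its ID with Client.get_job. The helper below is hypothetical; it only assumes the client object exposes that method.

```python
def summarize_job(client, job_id, location="US"):
    """Fetch a BigQuery job by ID and return the fields useful for debugging.

    `client` is expected to behave like google.cloud.bigquery.Client,
    whose get_job(job_id, location=...) returns the job resource.
    This helper is an illustration, not dbt code.
    """
    job = client.get_job(job_id, location=location)
    return {
        "job_id": job.job_id,
        "state": job.state,                      # e.g. "DONE"
        "query": getattr(job, "query", None),    # only query jobs carry SQL
    }
```

With a job_id recorded in run_results.json, a call like `summarize_job(client, "job_9JVepH9O1bzemNfB6xZzxz_mfFkY")` would retrieve the exact job behind a model build.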

We've found run_results.json extremely useful. Being able to tie its results (errors, compiled SQL, variables, tags, etc.) back to the specific BigQuery job that executed them would make it even more useful for debugging and troubleshooting.

Describe alternatives you've considered

You could cross-reference dbt logs with BigQuery logs in Stackdriver Logging and approximate which job corresponded to a model build, either by timestamp proximity or by matching the compiled SQL, but neither method is deterministic.

Additional context

This is specific to BigQuery for now, but I am sure there are parallels in other databases (transaction ID? query ID?).

Another aspect to consider would be whether to provide multiple jobIds if there is a macro called in the model that executes additional queries against the database.

Who will this benefit?

Anyone that wants to tie run results back to specific jobs for additional debugging would benefit from this feature.


bodschut commented Oct 7, 2020

I also think this would be an interesting enhancement. Any plans to pick this up for a future release?


jtcohen6 commented Oct 7, 2020

Another aspect to consider would be whether to provide multiple jobIds if there is a macro called in the model that executes additional queries against the database.

This would be our biggest challenge with implementing this today. In the meantime, we do include the invocation_id in the job labels, which should allow you to tie dbt's run_results.json back to BigQuery's INFORMATION_SCHEMA.JOBS_BY_* views.
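As a hedged sketch of that lookup (not from the thread): read the invocation_id from run_results.json metadata and filter JOBS_BY_PROJECT on a matching job label. The label key and the region qualifier below are assumptions; check what your dbt version actually emits.

```python
import json

def jobs_lookup_sql(run_results_path, label_key="dbt_invocation_id"):
    """Build a JOBS_BY_PROJECT query filtered on the dbt invocation_id label.

    Assumptions (verify against your environment):
      - run_results.json carries metadata.invocation_id (true for current
        artifact schemas; older versions may differ).
      - dbt attaches the invocation_id to jobs under `label_key`.
      - jobs ran in the `region-us` multi-region.
    """
    with open(run_results_path) as f:
        invocation_id = json.load(f)["metadata"]["invocation_id"]
    return (
        "SELECT job_id, state, query\n"
        "FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT\n"
        "WHERE EXISTS (\n"
        "  SELECT 1 FROM UNNEST(labels) AS label\n"
        f"  WHERE label.key = '{label_key}' AND label.value = '{invocation_id}'\n"
        ")"
    )
```

Running the generated SQL in BigQuery would list every job issued by that single dbt invocation.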


bodschut commented Oct 7, 2020

Thanks for the info @jtcohen6

Linking to the JOBS_BY_* views is indeed the primary goal. Having the invocation_id in the job labels solves this, so there is no need for me to have the job_id in the logs or in run_results.json.

In what dbt release will the invocation_id label be included?

github-actions commented

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@jtcohen6 jtcohen6 transferred this issue from dbt-labs/dbt-core Dec 16, 2021
jtcohen6 commented Dec 16, 2021

Missed this when I transferred BigQuery-specific issues a few months ago. This is analogous to dbt-labs/dbt-snowflake#7, and many of the same implementation details apply. I'm going to mark this as a good first issue in case anyone is interested in picking it up.

@jtcohen6 jtcohen6 added enhancement New feature or request good_first_issue Good for newcomers labels Dec 16, 2021

akshaan commented Jan 13, 2022

@jtcohen6 I'd love to take a crack at implementing this. Do you have any tips on where to get started / what an implementation would look like?

jtcohen6 commented

@akshaan Sorry for the delayed response here! If you're still interested in contributing, we'd still love to have the change.

The implementation here should look very similar to the change proposed in dbt-labs/dbt-snowflake#40. It's just a matter of adding job_id as an attribute of BigQueryAdapterResponse here:

@dataclass
class BigQueryAdapterResponse(AdapterResponse):
    bytes_processed: Optional[int] = None
    job_id: Optional[str] = None  # should this be Optional, or str = '' ?

And then accessing the job_id off the QueryJob further down:

        # for each statement type
        job_id = query_job.job_id

        # and then
        response = BigQueryAdapterResponse(
            _message=message,
            rows_affected=num_rows,
            code=code,
            bytes_processed=bytes_processed,
            job_id=job_id
        )
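For illustration, here is a self-contained version of that sketch. AdapterResponse and the QueryJob stand-in below are simplified stand-ins for dbt's real classes, not the actual dbt-bigquery code.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AdapterResponse:
    """Simplified stand-in for dbt's AdapterResponse base class."""
    _message: str
    code: Optional[str] = None
    rows_affected: Optional[int] = None

@dataclass
class BigQueryAdapterResponse(AdapterResponse):
    bytes_processed: Optional[int] = None
    job_id: Optional[str] = None  # the new field proposed in this issue

def response_from_job(query_job, message, code, num_rows, bytes_processed):
    """Mirror the snippet above: copy job_id off the finished QueryJob.

    `query_job` only needs a `job_id` attribute here, as a
    google.cloud.bigquery QueryJob would have.
    """
    return BigQueryAdapterResponse(
        _message=message,
        code=code,
        rows_affected=num_rows,
        bytes_processed=bytes_processed,
        job_id=query_job.job_id,
    )
```

With this in place, the job_id would flow into run_results.json alongside the existing adapter response fields, which is exactly what the original issue asks for.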
