Raise a more helpful error if BigQuery job label is too long #3612

Closed
1 of 5 tasks
edbizarro opened this issue Jul 22, 2021 · 6 comments · Fixed by #3703
Labels: bigquery, enhancement, good_first_issue

Comments

@edbizarro

Describe the bug

After migrating to v0.20.0 my jobs keep failing with the following error:

Database Error in model stg_facebookads__ads (models/staging/facebook_ads/stg_facebookads__ads.sql)
  Label value "__database____REDACTED____schema____dev_edbizarro____identifier____stg_facebookads__ads__" has invalid characters.
  compiled SQL at target/run/insights_lab/models/staging/facebook_ads/stg_facebookads__ads.sql

Steps To Reproduce

Configure job labels in dbt_project.yml:

query-comment:
  comment: "{{ query_comment(node) }}"
  append: false
  job-label: true

Run any model

Expected behavior

Screenshots and log output

If applicable, add screenshots or log output to help explain your problem.

System information

Which database are you using dbt with?

  • [ ] postgres
  • [ ] redshift
  • [x] bigquery
  • [ ] snowflake
  • [ ] other (specify: ____________)

The output of dbt --version:

❯ dkc exec dbt dbt --version
installed version: 0.20.0
   latest version: 0.20.0

Up to date!

Plugins:
  - redshift: 0.20.0
  - snowflake: 0.20.0
  - bigquery: 0.20.0
  - postgres: 0.20.0

The operating system you're using:
Arch Linux

The output of python --version:

Running through official docker image

Additional context

Add any other context about the problem here.

@edbizarro added the bug and triage labels on Jul 22, 2021
@jtcohen6
Contributor

jtcohen6 commented Jul 27, 2021

@edbizarro I see you switched on query-comment.job-label, which is new in v0.20.0. The way that feature works:

  • dbt tries to read your query comment as a dictionary. If it can, it passes each key-value pair as a separate job label, "sanitizing" each value by replacing whitespace and special characters with underscores.
  • If your query comment can't be read as a dictionary, dbt passes the entire (sanitized) query comment as a single job label.
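
The two cases above can be sketched roughly as follows. This is a minimal illustration, not dbt's actual code: the function names, the regular expression, the use of JSON parsing, and the `query_comment` fallback key are all assumptions made for the example.

```python
import json
import re

# Hypothetical pattern: treat anything other than lowercase letters,
# digits, underscores, and hyphens as a character to replace.
_UNSAFE = re.compile(r"[^a-z0-9_-]")


def sanitize_label(value: str) -> str:
    """Replace whitespace and special characters with underscores."""
    return _UNSAFE.sub("_", value.strip().lower())


def labels_from_comment(comment: str) -> dict:
    """Turn a query comment into job labels (illustrative only)."""
    try:
        parsed = json.loads(comment)
    except ValueError:
        parsed = None

    if isinstance(parsed, dict):
        # Case 1: one label per key-value pair.
        return {
            sanitize_label(str(key)): sanitize_label(str(value))
            for key, value in parsed.items()
        }

    # Case 2: the whole sanitized comment becomes a single label.
    return {"query_comment": sanitize_label(comment)}
```

With this sketch, a comment such as `{"app": "dbt", "model": "my model"}` yields two short labels, while a free-text comment is collapsed into one label whose length grows with the comment.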

The latter is what's happening here. Unfortunately, the error BigQuery is giving back is a bit misleading:

Label value "__database____REDACTED____schema____dev_edbizarro____identifier____stg_facebookads__ads__" has invalid characters.

The real issue here is that labels are limited to 63 characters in length (docs), and this string is 89 characters in length. If I shorten the string to 63 characters, everything works just fine.
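
As a quick check of the arithmetic above, counting the characters in the label from the error message (with the database name redacted, as in the report) confirms that it exceeds the limit. The constant name is illustrative only:

```python
# BigQuery label values may be at most 63 characters long.
MAX_LABEL_LENGTH = 63

# Label value copied from the error message in this issue.
label = "__database____REDACTED____schema____dev_edbizarro____identifier____stg_facebookads__ads__"

print(len(label))                      # 89
print(len(label) <= MAX_LABEL_LENGTH)  # False
```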

In the original PR for this feature, we discussed potential approaches for handling too-long labels: #3145 (comment). The options are:

  1. Truncate, hash, or otherwise handle the label length within dbt. This would happen silently, and could result in indistinguishable label values.
  2. Raise an error within dbt.
  3. Do nothing, and return any errors from BigQuery.
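
To illustrate the concern with the first option, here is a small sketch of how silent truncation could make two different label values indistinguishable. The helper and the example values are hypothetical:

```python
MAX_LABEL_LENGTH = 63


def truncate_label(value: str) -> str:
    """Hypothetical helper: silently cut a label down to the limit."""
    return value[:MAX_LABEL_LENGTH]


# Two distinct labels that share a long prefix (64 characters here).
long_schema = "dev_" + "x" * 60
label_a = f"{long_schema}__stg_facebookads__ads"
label_b = f"{long_schema}__stg_facebookads__campaigns"

# After truncation, nothing that distinguished the two models is left.
print(truncate_label(label_a) == truncate_label(label_b))  # True
```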

We picked the third option. Given the lack of clarity in BigQuery's error message, and the ensuing confusion indicated by this issue, I think there's good reason to prefer the second: we should raise a compilation error any time query-comment.job-label is switched on and a label value would be longer than 63 characters.

That should be a straightforward change. Is it something you'd be interested in contributing @edbizarro?

In any event, you'll need to work around this error by doing one of the following:

  1. Refactoring your query_comment macro to return a dictionary
  2. Refactoring your query_comment macro to return a shorter string (post-sanitization)
  3. Switching off query-comment.job-label

@jtcohen6 added the bigquery, enhancement, and good_first_issue labels and removed the bug and triage labels on Jul 27, 2021
@jtcohen6 changed the title from 'Database Error in model: Label value "..." has invalid characters' to 'Raise a more helpful error if BigQuery job label is too long' on Jul 28, 2021
@sungchun12
Contributor

@jtcohen6 I'm happy to take this issue on if @edbizarro doesn't want to!

@jtcohen6
Contributor

jtcohen6 commented Aug 5, 2021

@sungchun12 I'd love that!

@sungchun12
Contributor

@jtcohen6 This is officially in my personal backlog, and I'll spend the next week or so focusing on it!

I'll be diving into #3145. I'm assuming the solution will be to extend the validations and raise specific error messages in functions like this one, and/or to create another function dedicated to verifying string length after sanitization.

@sungchun12 self-assigned this on Aug 5, 2021
@edbizarro
Author

edbizarro commented Aug 6, 2021

@sungchun12 Sure! I'd really like to take on this challenge, but unfortunately I'm a little short on time these next few weeks, so I'd love for someone else to take this on. Thanks!

@jtcohen6
Contributor

jtcohen6 commented Aug 6, 2021

@sungchun12 Yes! I think _sanitize_label raising a ValidationException if passed a string longer than 63 characters will get the job done.
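
A minimal sketch of what that suggestion could look like is below. It is not the change that was eventually merged: the regular expression and the error message are assumptions, and `LabelTooLongError` is a stand-in defined here so the example is self-contained (the comment above proposes dbt's `ValidationException` instead).

```python
import re

# BigQuery label values may be at most 63 characters long.
MAX_LABEL_LENGTH = 63

# Hypothetical pattern for characters to replace with underscores.
_UNSAFE = re.compile(r"[^a-z0-9_-]")


class LabelTooLongError(ValueError):
    """Stand-in exception for this sketch only."""


def sanitize_label(value: str) -> str:
    """Sanitize a label value and fail early if the result is too long."""
    sanitized = _UNSAFE.sub("_", value.strip().lower())
    if len(sanitized) > MAX_LABEL_LENGTH:
        raise LabelTooLongError(
            f'Job label value "{sanitized}" is {len(sanitized)} characters '
            f"long; BigQuery allows at most {MAX_LABEL_LENGTH}."
        )
    return sanitized
```

Checking the length after sanitization matters because the limit applies to the value that is actually sent to BigQuery, not to the raw query comment.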

3 participants