Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expect_row_values_to_have_recent_data fails in BigQuery for TIMESTAMP columns #104

Closed
kdw2126 opened this issue Jul 25, 2021 · 15 comments · Fixed by #147
Closed

expect_row_values_to_have_recent_data fails in BigQuery for TIMESTAMP columns #104

kdw2126 opened this issue Jul 25, 2021 · 15 comments · Fixed by #147
Labels
bug Something isn't working

Comments

@kdw2126
Copy link
Contributor

kdw2126 commented Jul 25, 2021

If you add a expect_row_values_to_have_recent_data argument to a TIMESTAMP column, you will get a failure driven by two characteristics of the existing implementation:

  • This comparison fails because BigQuery does not allow TIMESTAMP columns to be compared with DATE objects using the <= operator.
  • This comparison fails because BigQuery does not allow TIMESTAMP columns to be compared with DATESTAMP using the <= operator.

Given that a master time zone is specified in vars under dbt_project.yml, it seems possible to rewrite these comparisons using the appropriate recasting statements to allow columns of a TIMESTAMP type to be compatible with the expect_row_values_to_have_recent_data test.

@clausherther
Copy link
Contributor

Thanks for this issue @kdw2126 ! This timestamp/date issue has been a problem for a while, I think even dbt_utils.recency has this. I'd be happy for any PRs if you have ideas around this, but I'll mull this over next week when I have some spare cycles.

@clausherther
Copy link
Contributor

Closed via #145

@ivan-toriya
Copy link
Contributor

Closed via #145

It didn't fix the issue.

No matching signature for operator < for argument types: TIMESTAMP, DATETIME. Supported signature: ANY < ANY

@clausherther
Copy link
Contributor

@nirakon 😩 it passed tests on BQ, but I'll reopen this and look into it. I think this more of an issue with dbt_date.now() returning a datetime.

@clausherther clausherther reopened this Feb 25, 2022
@bascheibler
Copy link

I think it's the dbt_utils.dateadd macro that is returning a datetime. It calls the datetime_add function from BigQuery, which returns a datetime data type.

So the inner CTE ("max_recency") casts it as timestamp, but the outer CTE ("dbt_internal_test") casts it as datetime.

I'm using the version 0.5.2 with a date and had a different issue: if the column in my dbt model is a DATE, it returns the following error: No matching signature for operator <= for argument types: DATE, TIMESTAMP. Supported signature: ANY <= ANY.

I think I'll rollback to version 0.5.1 while this issue is not closed (or maybe change my column to a TIMESTAMP in case this macro won't be compatible with dates).

@clausherther
Copy link
Contributor

@bascheibler curious, does the dbt_utils.recency test work for you? This is really just a version of that, with some more options. So, if dbt_utils.recency works as a work around, I'd recommend that for now.
Dealing with date types, particularly datetime related ones is always really tricky across platforms.

@bascheibler
Copy link

It works! Thank you for the hint, @clausherther.
Unfortunately, dbt_utils.recency is a schema test. So the documentation doesn't list this test under the column description. It just shows that the model depends on this test. But it's a good workaround for the moment and I can keep using the latest version of dbt_expectations. I'll keep watching this issue just in case the fix also works for dates. Thanks!

@clausherther
Copy link
Contributor

@bascheibler good to hear! One gotcha with recency (and motivation behind expect_row_values_to_have_recent_data) is that if you have events in the future (e.g. bad dates from devices etc) in your test data, recency will pass even though you might have stale data prior to today.

@clausherther
Copy link
Contributor

clausherther commented Mar 9, 2022

@AdeelK93 tagging you here in the issue instead of the closed PR. Can you describe your error a bit more including the error message and the data type of column in your case? This has been a pretty elusive bug and very much depends on the data type of the column you're testing on.
The issue, I think, is that the tests checks to make sure we're only testing past data, in the event that your data contains future dates, which would give you a false negative. The dbt_utils.recency test has this potential issue. So, we use dbt_date.now to filter the incoming data. dbt_date.now in BigQuery returns a timestamp as a result of timestamp(datetime({{ column }}, '{{ target_tz}}')). So, depending on what's on the left side of {{ column_name }} <= {{ dbt_date.now() }} this can work fine or fail.

@clausherther
Copy link
Contributor

One option is to make the test more configurable to either let users specify a data type (date or timestamp etc) and/or whether or not to check for future dates at all.

@AdeelK93
Copy link

AdeelK93 commented Mar 9, 2022

Sure, my source data has a date column named date, which worked fine in the old version of the test. In the new version, I must cast the date as a timestamp to get the upper part of the query to work, but then as soon as I do that, the lower part of the query fails as a timestamp is being compared to a datetime.

image

@clausherther
Copy link
Contributor

@AdeelK93 Blerg. That is super helpful, thanks! Will try to get to the bottom of it this week. Sorry about the hiccup.

@clausherther
Copy link
Contributor

(Looks like this is also an issue...

coalesce(max_date, '{{ default_start_date }}') < ...

)

@pan-kadams
Copy link

pan-kadams commented Jun 3, 2022

hi all, I am getting the same error when trying to run dbt_expectations.expect_grouped_row_values_to_have_recent_data (but notably not on dbt_expectations.test_expect_row_values_to_have_recent_data) on BigQuery.

The specific error I am getting is:
No matching signature for operator < for argument types: TIMESTAMP, DATETIME. Supported signature: ANY < ANY at [38:9]

I am currently using dbt_expectations 0.5.6. Let me know if there is more information I need to provide.

@clausherther
Copy link
Contributor

@pan-kadams thanks for pointing this out! Would you mind opening a new issue for expect_grouped_row_values_to_have_recent_data along with your test config if possible?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants