Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enablement due to dbt-utils' "dispatchifiction" #17

Merged
merged 30 commits into from Jan 14, 2021
Merged

Conversation

dataders
Copy link
Contributor

@dataders dataders commented Jan 7, 2021

With the two PRs below, we can now do the following two things:

support as many macros as we can.

as of now, all 39 of dbt-utils's macros are fully working except for the 15 listed below:

additionally the following macros work on Azure SQL & SQL Server, but due to a synapse bug are disabled, because the ITs don't work.

  • dateadd()
  • datediff()
  • split_part()
  • last_day()
  • test-hash()

dbt-utils's integration testing

The goal of getting this repo working with dbt-utils' integration testing was to make it very easy to:

  • test that our shimmed macros, and
  • keep said macros up-to-date as dbt-utils changes.

One challenge with using dbt-utils' integration testing was that it is a bit of a RepRap in that it includes tests (schema and data) that are needed to evaluate whether the other macros are working. So in some cases, a macro works in TSQL with only a small change, but we had to disable the model corresponding to the macro in the dbt_project.yml anyway because there are tests used on the macro that weren't compatible with TSQL.

dependent, now-merged PRs:

@dataders
Copy link
Contributor Author

dataders commented Jan 7, 2021

@jtcohen6 any idea as to how we can tackle these tests? Maybe wrap the majority of logic into standalone macros that we can write our own versions of?
https://github.com/swanderz/dbt-utils/pull/1/files

@jtcohen6
Copy link
Collaborator

jtcohen6 commented Jan 7, 2021

@jtcohen6 any idea as to how we can tackle these tests? Maybe wrap the majority of logic into standalone macros that we can write our own versions of?
https://github.com/swanderz/dbt-utils/pull/1/files

@swanderz We could abstract those in {{ select_one() }} {{ limit_zero() }} macros, but your proposed changes also seem like reasonable rewrites to me that wouldn't break any mainline behavior. Want to add them to dbt-labs/dbt-utils#310?

@dataders
Copy link
Contributor Author

dataders commented Jan 7, 2021

Want to add them to fishtown-analytics/dbt-utils#310?

sure, though i'm leery of the size of that PR, so maybe in a follow-on 🤷 ?

@jtcohen6
Copy link
Collaborator

jtcohen6 commented Jan 7, 2021

Want to add them to fishtown-analytics/dbt-utils#310?

sure, though i'm leery of the size of that PR, so maybe in a follow-on 🤷 ?

Feels closely related, and that PR isn't too big (talking about "t-sql hacks that shouldn't break mainline behavior," not "dispatch-ify ALL THE MACROS!")

@dataders dataders marked this pull request as ready for review January 11, 2021 16:56
@alittlesliceoftom
Copy link
Contributor

Could I be really boring and request:

  1. Slightly more description of the PR - it seems we're (a) enabling a bunch more things due to the other PRs and (b) also adding some new functionality e.g. time datatypes. Presumably these are fixes in response to tests we can now run due to (a)?
  2. resolve the merge conflict

@dataders
Copy link
Contributor Author

Could I be really boring and request:

1. Slightly more description of the PR - it seems we're (a) enabling a bunch more things due to the other PRs and (b) also adding some new functionality e.g. time datatypes. Presumably these are fixes in response to tests we can now run due to (a)?

2. resolve the merge conflict

done. and done.

@dataders
Copy link
Contributor Author

@mikaelene what do you think?

Copy link
Contributor

@alittlesliceoftom alittlesliceoftom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @swanderz ,

This all looks really good. I haven't marked last 4 (test files) as 'viewed' as I didn't fully understand them as I haven't worked with those macros. But I definitely think this is all good to pull in.

@@ -1,5 +1,5 @@
{% macro sqlserver__hash(field) %}
hashbytes('md5', {{field}})
convert(varchar(50), hashbytes('md5', {{field}}), 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@swanderz just noting that this will limit hashable field size to 50 (bytes not chars I think). Is this intended behaviour? Could use string?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at this as well. I had the same in the old sqlserver-utils package. Maybe we can have varchar(max)? Most cases will be hashing one column, but if you want to look for diffs, like watching for changes, you may include a lot of columns making the hashes longer then 50

@@ -1,3 +1,4 @@
[submodule "dbt-utils"]
path = dbt-utils
url = https://github.com/fishtown-analytics/dbt-utils
branch = master
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting this may result in changes in dbt-utils breaking things on our side without an explicit commit as not pegged to a certain version.

@alittlesliceoftom
Copy link
Contributor

Hey @swanderz ,

This all looks really good. I haven't marked last 4 (test files) as 'viewed' as I didn't fully understand them as I haven't worked with those macros. But I definitely think this is all good to pull in.

You might consider adding some docstrings to the tests to help others (and yourself when you forget what you did) to understand them as I think they're quite complex to reason about if looking at from fresh

@mikaelene
Copy link

@mikaelene what do you think?

I think this looks great on an overall level. Don't have time to go through all but really great work!

@dataders
Copy link
Contributor Author

Hey @swanderz ,
This all looks really good. I haven't marked last 4 (test files) as 'viewed' as I didn't fully understand them as I haven't worked with those macros. But I definitely think this is all good to pull in.

You might consider adding some docstrings to the tests to help others (and yourself when you forget what you did) to understand them as I think they're quite complex to reason about if looking at from fresh

Some context about the bottom 3schema_test/ macros you refer to. one of the coolest things about dbt-utils are their integration tests, which are really unit tests. For many macros, there's two seed models: an input and an output. The testing process is that the macro is run on the input model and then, the output is compared to output seed.

The challenge is that some arguments defined in schema tests for some testing models in dbt-utils/integration_tests/models/schema_tests/schema.yml use things that are not supported by TSQL. Things like <> for not equal, or something = false (TSQL doesn't have true or false values. So rather than making a PR to dbt-utils and changing the args, it's better to just capture them inside of our dispatched macros and swap them for the TQL equivalent.

TL;DR the changes are just to make the integration testing work, and shouldn't ever affect end users unless I'm missing something

@dataders dataders merged commit cede755 into master Jan 14, 2021
@dataders dataders deleted the post-dispatchify-PR branch November 12, 2021 06:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants