Added tsql_utils.surrogate_key() macro #32
Merged
Here is a suggested attempt at addressing the two issues I have raised about the default surrogate_key() implementation: #26 and #25.
This creates a separate tsql_utils.surrogate_key() macro, in addition to the default dbt_utils.surrogate_key() macro, that allows for more T-SQL-specific customization. This macro adds three additional features:
1. Field column type
It allows you to customize the column type into which the fields are cast. One use case for this is when you have nvarchar columns that you want to generate a surrogate key for. Since dbt_utils.surrogate_key() casts them into varchar by default, characters that the varchar code page cannot represent are replaced during the conversion, so it's possible to get duplicate surrogate keys. With this macro you can use
tsql_utils.surrogate_key(["col"], col_type="nvarchar(4000)")
to solve the problem.

2. Generate binary hash
The current tsql_utils adapters cause dbt_utils.surrogate_key() to generate a hash that is stored as a varchar string, which uses 32 bytes of data.
This macro allows you to use
tsql_utils.surrogate_key(["col"], use_binary_hash=True)
instead. This keeps the key as varbinary, which uses only 16 bytes of data. This reduces space in the database and can potentially improve join performance, but the column has to be converted into varchar before it can be used for relationships in Power BI.
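To illustrate where the 16 vs. 32 bytes come from, here is a small Python sketch. It is not the macro's implementation; it assumes, as in dbt_utils.surrogate_key(), that the key is an MD5 hash of the fields joined with a "-" separator and that nulls become empty strings, and it encodes the text as UTF-8 (SQL Server's varchar would use its collation's code page). The helper names surrogate_key_digest() and digest_to_hex() are hypothetical.

```python
import hashlib


def surrogate_key_digest(fields, sep="-"):
    # Hypothetical helper: join the key fields with an assumed "-" separator,
    # map None to "" and hash the UTF-8 bytes with MD5 (raw 16-byte digest).
    joined = sep.join("" if f is None else str(f) for f in fields)
    return hashlib.md5(joined.encode("utf-8")).digest()


def digest_to_hex(digest):
    # Hypothetical helper: the hexadecimal text form of the same digest,
    # i.e. what a varchar surrogate key column would hold.
    return digest.hex()


key_bin = surrogate_key_digest(["order-42", None, "2021-03-01"])
key_str = digest_to_hex(key_bin)

print(len(key_bin))  # 16 (bytes, as stored in a varbinary column)
print(len(key_str))  # 32 (characters, as stored in a varchar column)
```

Converting the hex string back with bytes.fromhex() yields the same 16 bytes, so both columns carry the same key; only the storage format, and therefore the size, differs.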
To work around the Power BI limitation, this PR also provides a second macro,
cast_hash_to_str(),
that allows you to convert the varbinary surrogate keys to varchar inside your report views before importing them into Power BI, so that you can define relationships on your surrogate key columns.

3. Adjust default values through dbt_project vars
You can also customize both settings through variables in your dbt_project.yml:
Let me know what you think of this and if you would like me to make any adjustments.