Added tsql_utils.surrogate_key() macro #32

Merged (4 commits) on Apr 29, 2021

infused-kim
Contributor

Here is a suggested attempt at addressing the two issues I have raised related to the default surrogate_key() implementation: #26 and #25

This adds a separate tsql_utils.surrogate_key() macro alongside the default dbt_utils.surrogate_key() macro, which allows for more T-SQL-specific customization.

This macro adds three additional features:

1. Field column type

It lets you choose the column type that the fields are cast to. One use case is generating a surrogate key from nvarchar columns: since dbt_utils.surrogate_key() casts them to varchar by default, distinct values can end up with duplicate surrogate keys.

With this macro you can use tsql_utils.surrogate_key(["col"], col_type="nvarchar(4000)") to solve the problem.
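
To illustrate how the duplicates can arise, here is a minimal Python sketch. It is not the macro itself and runs outside SQL Server. It assumes the varchar cast behaves like a single-byte code page (cp1252 is used here as a stand-in) in which unrepresentable characters are replaced by "?". The function names are hypothetical.

```python
import hashlib


def key_via_varchar(value: str) -> str:
    # Assumption: the varchar cast narrows text to a single-byte code page
    # (cp1252 as a stand-in) and replaces unrepresentable characters with "?".
    narrowed = value.encode("cp1252", errors="replace")
    return hashlib.md5(narrowed).hexdigest()


def key_via_nvarchar(value: str) -> str:
    # nvarchar keeps the full Unicode text; UTF-16LE is used here to mirror that.
    return hashlib.md5(value.encode("utf-16-le")).hexdigest()


a, b = "東京", "大阪"

# Both values narrow to "??", so the varchar-based keys are identical.
collides = key_via_varchar(a) == key_via_varchar(b)

# With the wider type the two values stay distinct, and so do their keys.
distinct = key_via_nvarchar(a) != key_via_nvarchar(b)

print(collides, distinct)
```

Under these assumptions the two different inputs share one key when narrowed, which is the situation col_type="nvarchar(4000)" is meant to avoid.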

2. Generate binary hash

The current tsql_utils adapters cause dbt_utils.surrogate_key() to generate a hash that is stored as a varchar string that uses 32 bytes of data.

This macro allows you to use tsql_utils.surrogate_key(["col"], use_binary_hash=True).

This keeps the key as a varbinary value that uses only 16 bytes of data. That reduces space in the database and can potentially improve join performance, but the column has to be converted to varchar before it can be used for relationships in Power BI.
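
The size difference can be checked with a short Python sketch, assuming an MD5 hash (16 bytes raw, 32 characters as hex text). The input value is made up, and the hex form is only one possible string representation; it is not meant to describe what the macros emit.

```python
import hashlib

# Hypothetical input value, for illustration only.
digest = hashlib.md5("customer_42".encode("utf-8"))

hex_key = digest.hexdigest()  # text form: two hex characters per byte
bin_key = digest.digest()     # raw form: the hash bytes themselves

print(len(hex_key), len(bin_key))

# The raw bytes can be turned back into the same text form when a
# string key is needed, for example in a reporting layer.
round_trip = bin_key.hex()
```

The raw form carries the same information in half the space, which is the trade-off behind use_binary_hash=True.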

To help with that issue, this PR provides a second macro cast_hash_to_str() that allows you to convert the varbinary surrogate keys to varchar inside your report views before importing them into Power BI to allow relationships on your surrogate key columns.

3. Adjust default values through dbt_project vars:

You can also customize both settings through variables in your dbt_project.yml:

vars:
  dbt_utils_dispatch_list: ['tsql_utils']
  tsql_utils_surrogate_key_col_type: 'nvarchar(1234)'
  tsql_utils_surrogate_key_use_binary_hash: True

Let me know what you think of this and if you would like me to make any adjustments.

Contributor

@dataders dataders left a comment

LGTM! @alittlesliceoftom what do you think?

@dataders
Contributor

@infused-kim can you add a 0.6.6 section to the CHANGELOG.md?

@dataders dataders merged commit b916816 into dbt-msft:main Apr 29, 2021