Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Allow hashed_columns to see derived columns to allow concatenated business keys #9

Closed
balmasi opened this issue Jun 29, 2020 · 3 comments
Assignees
Labels
feature This is is requesting a new feature
Milestone

Comments

@balmasi
Copy link

balmasi commented Jun 29, 2020

I've come across a few instances where I need to hash a derived column in my staging area. Sometimes this is because I simply want to append a manual string to my BK, other times I want to rename something in my staging for use down the line, but I don't have access to it in the hashed_columns section so I have to write SQL or use a field name I'm renaming (or be forced to create an extra model to feed the stage macro)

for example, I have to do the following:

          hstg_sfdc__lead_core:
            vars:
              source_model:
                salesforce: Lead
              hashed_columns:
                HUB_FUNNEL_HK:
                  - "(SELECT 'sfdc.Lead' AS SCOPE_BK)" # SCOPE_BK
                  - Website # DOMAIN_BK        
              derived_columns:
                SCOPE_BK: '!sfdc.Lead'
                DOMAIN_BK: Website

What I'd like to do:

          hstg_sfdc__lead_core:
            vars:
              source_model:
                salesforce: Lead
              hashed_columns:
                HUB_FUNNEL_HK:
                  - SCOPE_BK
                  - DOMAIN_BK
              derived_columns:
                SCOPE_BK: '!sfdc.Lead'
                DOMAIN_BK: Website

Additionally, I'm finding an enterprise-wide unique natural key is hilariously difficult. For example imagine I have an HR system, Contractor system, and a partner company system and I want to represent a single concept of Employee.

image

As you can see the unique keys here are different in each case, so my Business Key has to be a concatenation of multiple keys in the source system. This is what is referred to as src_nk in your hub macro.

In the above example, if I had a 2-part key, passing both keys as src_nk to the hub macro would result in 2 columns SCOPE_BK and DOMAIN_BK, however, I'm only after a single concatenated column.

It would be amazing if dbtvault could support this concatenation as a first-class concept (using the same transformation built into the hash function)

In this case, I might simply have, in each of the staging configs that feed into the hub, something like:

          hstg_sfdc__lead_core:
            vars:
              source_model:
                salesforce: Lead
              hashed_columns:
                HUB_FUNNEL_HK: FUNNEL_BK
              derived_columns:
                HUB_FUNNEL_BK: 
                  - '!sfdc.Lead'
                  -  Website

This would generate a table that looks like this:

HUB_FUNNEL_HK HUB_FUNNEL_BK
98a7sd8a9s7da9fgag7a98a7sda sfdc.Lead||abc.com
98a7sd8a9gsdf3ggag7a98a7sda sfdc.Lead||def.com
98a7sd8a9s7da9fgag7a98a123a sfdc.Lead||efg.com

Currently, I don't even have access to the hash concat macro by itself cause it's a part of the hash macro.

Anyway, I know this is kinda multiple issues, but I wanted to put it out there to see if others felt similar pains.

Of course, as with everything else dbt, you can work around this by adding a separate model before your dbtvault one, but I felt I was doing this all the time and something that can be easily factored out.

@DVAlexHiggs
Copy link
Member

Great ideas. Both of these features are quite and straight forward to implement. Derived columns bring accessible by hashed columns is something we already have in the backlog, and the derived column concatenation syntax is certainly very useful!

@DVAlexHiggs DVAlexHiggs self-assigned this Sep 23, 2020
@DVAlexHiggs DVAlexHiggs added the feature This is is requesting a new feature label Sep 23, 2020
@DVAlexHiggs DVAlexHiggs added this to the 0.8 milestone Sep 23, 2020
@DVAlexHiggs DVAlexHiggs changed the title Allow hashed_columns to see derived columns to allow concatenated business keys [FEATURE] Allow hashed_columns to see derived columns to allow concatenated business keys Sep 23, 2020
@DVAlexHiggs DVAlexHiggs modified the milestones: v0.8, 0.7.1 Nov 24, 2020
@DVAlexHiggs
Copy link
Member

Added in v0.7.1 Close if working as expected

@DVAlexHiggs
Copy link
Member

DVAlexHiggs commented Jan 13, 2021

Closing as we have implemented the primary suggested feature. The concatenation of columns in the derived_columns configuration has been separated into a new issue (#20) and will be added to the next release (surprise: it's already been developed!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature This is is requesting a new feature
Projects
None yet
Development

No branches or pull requests

2 participants