New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Link record is NULL and not loaded when some FKs are NULL #65
Comments
Thanks for this report, I will investigate this and get back to you shortly P.S I edited your report but only for formatting and such to make it easier to read. |
I edited it also. Because problem was not the amount of hat signs. It seems to be how the ifnull and nullif functions are used. Almost like hashkey is wanted to be null, when the actual foreign key value is null. Is this intended? How you should model those links then, if you cannot add those nullable foreign keys in there and the link row is completely ignored (even if there is some other foreign keys that should be active)? |
Hello. I've talked to our internal Data Vault 2.0 expert on this. Let me explain. For a Link, the business will likely have an idea of what keys (foreign or primary) are necessary for business meaning, and which are not. This is the concept of mandatory and optional keys, i.e. values the business requires and values the business doesn't mind being We don't want to load any records into a link which are truly We provide these tokens as part of the staging process, and not as part of the Link logic itself. This allows us to still handle the base or 'simple' case of not loading a Link record if any keys are The above explanation is Data Vault 2.0 Best practise, and dbtvault does support this approach, however our documentation/guidance on this is missing, and we would like to add helper macros, syntactic sugar and make this approach easier for the user. Whilst not a bug, this will be an issue for many users, and is something we want to improve. I will leave this issue here to help other users, and as a way of tracking changes aimed at improving this functionality. Thank you for the report and hope this helps! |
Thanks, I think this is adequate explanation for me at least. I checked the datavault material. In my material the values were -1 and -2: Dealing with Null Business Keys Null Business Key Arrival • Hash = BK Arrives NULL but is Optional • ZERO KEY is the concept, but it should never be "ZERO" • Pre-Populate all Tables in the Data Vault with two zero |
I have to get back little bit in this one. I have an issue because the dbtvault hashing is converting empty strings as null. Because for example in SAP, empty string could actually mean something. At the moment I'm not converting empty strings as zero keys, but dbtvault hashing is interpreting it as null and not showing those values in link table. I can get round of this, if I implement the same empty string conversion as you have. But then the auditability is lost. Because the empty string key might have in source system some actual meaning and description. Now it is converted as zero key, which meaning is totally different... |
In Data Vault 2.0, this is the standard, An empty string generally means absolutely nothing, and they often imply data quality issues. If they do mean something, you should convert them to tokens/codes prior to staging. This would be a layer (tables or views) between your raw stage and the prepared stage (dbtvault staging). This would mean that by the time these values get to the dbtvault staging layer for hashing, they are no longer empty strings. To clarify, you should give the empty strings the same business meaning in Data Vault, ensuring they have codes for these different business meanings. This also solves the issue of empty strings meaning different things in different places; in this world, you would just have different codes instead. |
Ok. Probably I have to add -3 for empty string as a token. Then I can preserve the auditability. Thanks for clarification. |
Use -1 as a token for missing expected (mandatory), -2 for missing optional.
If you don't know if it was mandatory or not, use -1
|
It is not missing value. It is empty string which has actual key attach to it. It could be something like id='' and description='Description for empty string'. If I use the same values as for null, auditability is broken. Because null means that value is not given at all and the empty string has real world values attached to it. That's why I need a third token as Alex suggested. |
Is it a reference (lookup) value?
|
The token approach should only be used for cases where you have genuine missing values. For actual values that are empty strings you should use reference data/lookups which Neil is getting at, or apply business rules downstream. |
Closing due to age and resolution. |
Describe the bug
Link record is NULL and not loaded when some FKs are NULL
Versions
dbt: 0.20.2
dbtvault: 0.7.7
snowflake: 5.34.1
To Reproduce
Steps to reproduce the behavior:
The text was updated successfully, but these errors were encountered: