-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] stage macro does not support columns with reserved keywords as names #23
Comments
Workaround for the time being: for the
If you need the column in your staging layer, you can set |
Thank you for reporting this issue. This is not possible in snowflake (using reserved words as column names), even if quotes are used to force it. i.e. This seems to be a BQ only issue. I think this issue highlights a data quality issue in the dataset; Playing devil's advocate, data should not have reserved words for column names in the first place, and therefore it's not our responsibility to fix it. Saying this, we cannot always help source data quality issues and we should have the option to fix the problem in the staging layer. Is it adequate that you create a derived column which aliases the reserved word column, and this is used as the column in place of the reserved word column in the hashed column definition? If not, we can look at a more permanent workaround/fix for when dbtvault supports BigQuery |
The source data can have whatever column names it wants. The question is, do you want dbtvault to natively support these special column names in the stage macro or not? The bigger issue is that It's not just hashing, this "bug" forces you not to be able to use |
SELECT "TEST_COL" AS `ORDER` does work in Snowflake. I've been able to re-produce the issue in our test bed now, and you're right about this being an issue. I still feel that this is a source data issue, and if users have column names which are reserved words then they have a bigger data quality issue. We've never seen this before in the wild and I feel like it's quite a niche issue, however, it is nonetheless an issue with a relatively simple fix, so we will add it to the backlog as a quality of life improvement. |
I'm genuinely surprised to hear this cause there's no rule that says the source system uses a SQL datastore. Perhaps I'm missing a layer in my DV implementation for "hard business rules" like this as Dan calls them. I was under the impression that in most cases I could just shove it through dbtvault without a previous layer 😆 |
Apologies, I haven't made myself clear. dbtvault should handle this case, I'm simply suggesting that it might imply other issues with your data if you have reserved word column names in the source system, and it may be something developers would be inclined to fix prior to generating the raw vault (dbtvault or no dbtvault). In saying this, developers should be able to fix it OR not need to fix it because dbtvault should understand that this is possible and handle it on their behalf, by qualifying column names in the stage. |
Currently working on this and #28 in this branch: https://github.com/Datavault-UK/dbtvault/tree/feat/escape-col-name |
We've got a fix for this coming in 0.7.10. |
Hi all. This is now supported in v0.8.0! Read the docs here |
Describe the bug
Let's say your source contains a column called "order", if you try to run the stage macro with that as a hashed column (or even simply with include_source_columns = true) it will fail
Versions
dbt: 0.18.1
dbtvault: (BQ fork a.k.a 0.7.0), however looking at the code it should be present in the latest version as well
Expected behavior
I would expect all queried columns to have be qualified with the table name
Screenshots
The text was updated successfully, but these errors were encountered: