Bigquery uniqueness test fails on column name matching table name #33
Comments
@abuckenheimer You're totally right. This is tougher than it should be today because a test doesn't actually have a good link between itself and the model it's defined on. It's going to be even trickier after the introduction of the At the time, I was thinking that it was important for the subquery to be aliased the same as the model identifier, in case the generic test definition depended on using the model identifier as a column alias. But it wouldn't really be a generic test if it did, now would it? Here are the options I can think of:
{% macro default__test_unique(model, column_name) %}
select
    model.{{ column_name }},
    count(*) as n_records
from (select * from {{ model }}) model
where model.{{ column_name }} is not null
group by model.{{ column_name }}
having count(*) > 1
{% endmacro %}

Option 3 is something you can do right now, in your own project, to get this working in the meantime. But I'm leaning toward option 2, which would require adjusting this (gnarly) code: In order to always alias, and always use the same alias (just the word

def build_model_str(self):
    targ = self.target
    cfg_where = "config.get('where')"
    alias = "model"
    if isinstance(self.target, UnparsedNodeUpdate):
        identifier = self.target.name
        target_str = f"{{{{ ref('{targ.name}') }}}}"
    elif isinstance(self.target, UnpatchedSourceDefinition):
        target_str = f"{{{{ source('{targ.source.name}', '{targ.table.name}') }}}}"
    unfiltered = f"{target_str} {alias}"
    filtered = f"(select * from {target_str} where {{{{{cfg_where}}}}}) {alias}"
    return f"{{% if {cfg_where} %}}{filtered}{{% else %}}{unfiltered}{{% endif %}}"

From there, it's as simple as adding I don't love using What do you think? If you're on board with the proposal, we may be able to sneak it in for v0.20.0rc2, since it's a relevant tweak to code that's changing in v0.20.
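To make the proposed aliasing concrete, here is a minimal standalone sketch of what the string built above would look like for a model target. It is not dbt's actual code path: `FakeModelTarget` and the free function `build_model_str` are hypothetical stand-ins for illustration, and only the `ref(...)` branch is covered.

```python
from dataclasses import dataclass


@dataclass
class FakeModelTarget:
    # Hypothetical stand-in for the real target object; only `name` is needed here.
    name: str


def build_model_str(target, alias="model"):
    # Mirrors the string-building logic quoted above, for a ref() target only.
    cfg_where = "config.get('where')"
    target_str = f"{{{{ ref('{target.name}') }}}}"
    # Both branches end with the same fixed alias, so the test query
    # can always refer to the relation by that alias.
    unfiltered = f"{target_str} {alias}"
    filtered = f"(select * from {target_str} where {{{{{cfg_where}}}}}) {alias}"
    return f"{{% if {cfg_where} %}}{filtered}{{% else %}}{unfiltered}{{% endif %}}"


rendered = build_model_str(FakeModelTarget("orders"))
print(rendered)
```

The printed Jinja string aliases the relation as `model` whether or not a `where` config is set, which is the property the generic test macro would rely on.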
I think you're onto something with the
ah actually I may be conflating your approaches 2 and 3. I guess what you're saying for 2 is that users should always qualify model columns with the
Yes, I think you're right, we should pick an alias that's unlikely to clash. It would be a shame to fix this issue (name + column clash) and create another! There's another option, which now feels so obvious that I'm embarrassed I didn't think of it earlier. We could rewrite generic tests to use an "import" CTE, similar to our style guide:

with dbt_test__target as (
    select * from {{ model }}
)
select
    {{ column_name }},
    count(*) as n_records
from dbt_test__target
where {{ column_name }} is not null
group by {{ column_name }}
having count(*) > 1

I don't think we'd need to qualify the column name at all, assuming there's no chance the column is also named
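As a rough sketch of the import-CTE shape, the query above can be rendered by hand and run against an in-memory SQLite database. SQLite is only a stand-in here and does not reproduce BigQuery's ambiguity error; the table `orders`, its same-named column, and the sample rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table whose only column has the same name as the table itself.
conn.execute("create table orders (orders integer)")
conn.executemany(
    "insert into orders (orders) values (?)",
    [(1,), (1,), (2,), (None,), (None,)],
)

# Hand-rendered version of the proposed test: {{ model }} -> orders,
# {{ column_name }} -> orders. The outer query only sees the CTE name.
model = "orders"
column_name = "orders"
test_sql = f"""
with dbt_test__target as (
    select * from {model}
)
select
    {column_name},
    count(*) as n_records
from dbt_test__target
where {column_name} is not null
group by {column_name}
having count(*) > 1
"""

duplicates = conn.execute(test_sql).fetchall()
print(duplicates)  # [(1, 2)]: the value 1 appears twice; nulls are ignored
```

Because the outer query selects from `dbt_test__target` rather than from the model directly, the unqualified column name cannot be confused with the table name there.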
love it
@abuckenheimer Any interest in a contribution? :)
Resolved by #10
Describe the bug
The schema test unique cannot be used on columns in BigQuery where the column name is the same as the table name. BigQuery yields the following error:
Steps To Reproduce
Expected behavior
Using an alias for the table name disambiguates the reference for BigQuery and allows the test to run successfully.
Screenshots and log output
System information
Which database are you using dbt with?
The output of dbt --version:
The operating system you're using:
on wsl
The output of python --version:
3.9.2
Additional context
Technically I found this issue in dbt_utils and will have to file an issue there as well, but it also holds for a classic uniqueness test.