Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undefined behaviour with ambiguous column names #895

Closed
mklopets opened this issue Jul 31, 2022 · 3 comments
Closed

Undefined behaviour with ambiguous column names #895

mklopets opened this issue Jul 31, 2022 · 3 comments

Comments

@mklopets
Copy link
Collaborator

In some cases, PRQL does nicely warn us whenever we're referencing a field name that could have multiple meanings:

Ambiguous reference. Could be from either of `option_a`, `option_b`

However, in others (and I haven't quite understood the difference yet), it doesn't. In this example, common_fieldname exists both in CTEs x and y. At the end of the query, we select it without prefixing it with the table name, and it gets compiled to y.common_fieldname. This runs perfectly fine in the DB, but can cause unintended behaviour – we had a query like this return data from the wrong table, as we didn't realise there was a common column name between them.

table x = (
  from orig_x
  select [x_id, common_fieldname]
)

table y = (
  from orig_y
  select [y_id, x_id, common_fieldname]
)

from x
join y [x_id]

# common_fieldname could be coming from either table now
# it just randomly happens to come from `y`
select [x_id, common_fieldname] # same issue when filtering

the compiled (partial) output is

SELECT
  x,
  y.common_fieldname

...but it could just as well be x.common_fieldname. IMO, this should instead be a case where the compiler complains and asks you to be more specific.

@aljazerzen
Copy link
Member

I believe that this is caused by the same bug as #820 - common_fieldname is deemed to be the same column in both table definitions.

Part of the codebase that causes this will be redone in ~a week, so I won't be fixing this now.

@max-sixty
Copy link
Member

Now with #908 — a stopgap while we redo some of the Semantic part of the compiler — this now compiles to:

        WITH x AS (
          SELECT
            x_id,
            common_fieldname
          FROM
            orig_x
        ),
        y AS (
          SELECT
            y_id,
            x_id,
            common_fieldname
          FROM
            orig_y
        )
        SELECT
          x.*,
          y.*,
          x_id
        FROM
          x
          JOIN y USING(x_id)

...which is not as bad I think — likely the SQL engine is going to complain that there are conflicting columns.

It's not great — we can & should warn ourselves — but I hope it's OK until the Semantic rewrite @mklopets ?

@mklopets
Copy link
Collaborator Author

mklopets commented Dec 4, 2022

This was fixed with the semantic rewrite / 0.3.0 🎉

@mklopets mklopets closed this as completed Dec 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants