Skip to content

test: Test aliases with BigQuery CTEs. #11439

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

DiegoAlbertoTorres
Copy link
Contributor

@DiegoAlbertoTorres DiegoAlbertoTorres commented Jul 5, 2025

Description of changes

Introduces a test to validate use of expression aliasing with CTEs in BigQuery. This functionality is currently broken (and I hope I can fix it, creating this PR first to show the test in the issue).

Issues closed

Resolves #11440

@github-actions github-actions bot added the tests Issues or PRs related to tests label Jul 5, 2025
@github-actions github-actions bot added the sql Backends that generate SQL label Jul 6, 2025
@DiegoAlbertoTorres DiegoAlbertoTorres force-pushed the fix-bq-cte-aliasing branch 2 times, most recently from eb1524f to 57dd49a Compare July 6, 2025 22:45
@cpcloud cpcloud force-pushed the fix-bq-cte-aliasing branch from 57dd49a to 5f28a20 Compare July 7, 2025 16:11
Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, couple of small changes.

Comment on lines 1610 to 1613
sg.exp.CTE(
alias=sg.to_identifier(name, quoted=self.quoted),
this=compiled_ibis_expr,
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is the main change in this part of the code.

Is the issue that in the previous implementation compiled_query might reference compiled_ibis_expr or its CTEs, which would break topological ordering of CTE definition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this change and the one in translate are about the same, just one happens during compilation and this one is during schema validation.

Really the main source of the issue is that sqlglot's with_ produces code with different semantics in some backends (including BQ) in these two cases: (1) with_select_statement.with_(alias, query) and (2) select_statement.with_(alias, query). In (1), sqlglot appends query as the last CTE in with_select_statement, which means that in backends with top-to-bottom CTE evaluation the alias cannot be used by any of the other CTEs. In (2), sqlglot will turn the vanilla select statement into a with-select, and make query into a CTE that is always evaluated before the select's body.

My change ensures case (2) is always used by popping the top-level with arg from all the sqlglot expressions, then re-assembling them into a single with-select statement with all relevant CTEs in the right order and the body of the user query as the select part of the statement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, not sure I'm following why (1) is a problem.

It seems like (1) is a scenario that is tough to get into. I realize that your tests are probably covering this, so just bear with me.

It sounds like maybe you're suggesting that it's possible for with_select_statement to already be referencing alias before its with_ method is called, which seems super weird.

Do you have an example of that scenario, or can you point me to which covers this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the var names I picked above are a bit confusing, so just to make sure we have the right ordering in our head let's use test_embedded_cte_with_alias_simple as the example: expr is the user Ibis expression with an alias, and the user calls .sql(<with statement>) on it, where the with statement contains a reference to the alias.

Without this PR, Ibis will call parsed_user_query.with_(parsed_ibis_expr), which is how you end up with the weird case where with_select_statement already references the alias.

Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small nits to replace sg.exp with sge, since we use that everywhere else (or try to anyway!)

this = results[cte]
if "alias" in this.args:
this = this.this
modified_cte = sg.exp.CTE(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
modified_cte = sg.exp.CTE(
modified_cte = sge.CTE(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!


ctes = [
*compiled_ibis_expr.ctes,
sg.exp.CTE(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sg.exp.CTE(
sge.CTE(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sql Backends that generate SQL tests Issues or PRs related to tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: Table aliasing broken with BQ CTEs
2 participants