Skip to content

Conversation

JanJakes
Copy link
Member

@JanJakes JanJakes commented Sep 1, 2025

Resolves #228.
Closes #231.

@adamziel
Copy link
Collaborator

adamziel commented Sep 2, 2025

That's much simpler than the alternative, thank you @JanJakes

@JanJakes JanJakes force-pushed the order-by-disambiguation branch from 5093169 to cefd5ba Compare September 3, 2025 09:25
@JanJakes JanJakes force-pushed the order-by-disambiguation branch from b6cc64d to 0903628 Compare September 3, 2025 09:33
@adamziel
Copy link
Collaborator

adamziel commented Sep 3, 2025

@JanJakes is this one ready for a review?

@JanJakes JanJakes marked this pull request as ready for review September 3, 2025 12:32
@JanJakes JanJakes requested a review from adamziel September 3, 2025 12:32
@JanJakes
Copy link
Member Author

JanJakes commented Sep 3, 2025

@adamziel It should be ready now.

I was also suspecting that we may need to reuse this for UPDATE and DELETE statements, but it looks like we're good:

You can also perform UPDATE operations covering multiple tables. However, you cannot use ORDER BY or LIMIT with a multiple-table UPDATE.

From https://dev.mysql.com/doc/refman/8.0/en/update.html.

You cannot use ORDER BY or LIMIT in a multiple-table DELETE.

From https://dev.mysql.com/doc/refman/9.1/en/delete.html.

* SELECT t1.name AS col, t2.name AS col ... ORDER BY col
*/
$this->assertQuery( 'SELECT t1.name AS col, t2.name AS col FROM t1 JOIN t2 ON t2.id = t1.id ORDER BY col' );
}
Copy link
Collaborator

@adamziel adamziel Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also test syntactically tricky cases such as:

WITH temp(name) AS (
  VALUES
    ('a'),
    ('b')
)
SELECT t1.name, (SELECT name from temp WHERE id = 5) FROM t1 ORDER BY name;

WITH temp(name, b) AS (SELECT * FROM t2)
SELECT t1.name, (SELECT name from temp WHERE id = 5) FROM t1 ORDER BY name;

SELECT * FROM (
	SELECT t1.name, t2.name FROM t1 JOIN t2 ON t2.id = t1.id ORDER BY name
) ORDER BY col

It's fine if they fail, we need to set the line somewhere. Let's just make sure we account for them in the test suite.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamziel The first one doesn't seem to be a valid MySQL syntax (something about the VALUES part). The last one, we already have very similar. I'll add a test using the WITH syntax.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, the first one is SQLite

);

/*
* The following query fails with "ambiguous column name" in MySQL, but
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So sometimes it's MySQL that fails with "ambiguous column name" and sometimes it's SQLite? I've noticed there's a different comment that talks about SQLite failing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, both MySQL and SQLite throw that error, and they slightly differ in edge cases:

  1. SQLite doesn't use the SELECT list to try to disambiguate. That's what we're emulating.
  2. Both, interestingly, allow multiple identical explicit aliases (1 AS col, 2 AS col, 3 AS col) — but here they may slightly differ in what they would sort by with ORDER BY col.
  3. MySQL doesn't allow a special case when multiple columns share the same explicit alias. Even if these are entirely different columns (id and name, for example). SQLite is fine with that and handles it the same as in point 2.

We're not emulating anything for 2 and 3 — just passing it to SQLite "as is," because it can handle it, just with these edge case variations.

* SELECT t1.name FROM t1 JOIN t2 ON t2.t1_id = t1.id ORDER BY name
*
* This is because MySQL first considers the "name" column that was used
* in the SELECT list. If it is unambiguous, it will be used in ORDER BY.
Copy link
Collaborator

@adamziel adamziel Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also do this for all SELECT clauses such as JOIN, HAVING, GROUP BY, etc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamziel Oh, good catch! I tried it in JOIN and WHERE and MySQL didn't disambiguate there. But in HAVING and GROUP BY it seems to work. Let me check if it's easy to reuse the disambiguation map for these cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it working. It's nicely reusable for both of these using a private method. I'll polish the code a bit and will push the changes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WINDOW might be another (unlikely) one, but I'm not sure

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this one also disambiguates in MySQL:

SELECT
  t1.name,
  SUM(t1.name) OVER (PARTITION BY name) FROM t1 JOIN t2 ON t2.id = t1.id;

There are quite a few nuances with ORDER BY and HAVING already, so for now, I'd leave this out, I guess.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, makes sense. We don't have to solve everything at once. The queries we don't fix will fail so we'll get clear feedback if and when that comes up

// Support also parenthesized column references (e.g. "(t.id)").
$select_item_expr = $this->unnest_parenthesized_expression( $select_item_expr );

// Consider only simple and parenthesized column references.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be fine for now. Let's just leave a clear TODO to revisit this once a use-case reveals itself. It may take some months or years and a clear callout will be a huge gift once it happens.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, what I mean is here "Consider only simple and parenthesized column references, because that's what MySQL does." As soon as you do any other expression, e.g. SELECT col + 1 ... ORDER BY col, it won't recognize the column and disambiguate. It will only recognize it as SELECT id, SELECT (id), etc. The same for the ORDER BY clause.

I suspect MySQL may be just normalizing these extra no-op parentheses at some point and that's why it works.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooooh, gotcha! Let's document that, then

Copy link
Collaborator

@adamziel adamziel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the simplicity of it and that it only takes computing cycles when the query needs disambiguation and not pre-emptively like the other PR did. I left a few, rather small notes. This looks great overall – thank you!

@@ -37,14 +37,16 @@ services:
- ../:/var/www/src/wp-content/plugins/sqlite-database-integration

php:
image: wordpressdevelop/php:8.3-fpm
# PHP temporarily pinned to 8.3.10, see: https://github.com/WordPress/wordpress-develop/pull/9602
Copy link
Collaborator

@adamziel adamziel Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, good catch!

}

/**
* Unnest parenthesized MySQL expression node.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really glad this came up. Nice!

@JanJakes
Copy link
Member Author

JanJakes commented Sep 5, 2025

@adamziel I pushed the GROUP BY and HAVING support and some more complex cases like CTEs, UNIONs, etc.

$having_clause = null;
if ( $group_by || $having ) {
$select_item_list = $node->get_first_child_node( 'selectItemList' );
$disambiguation_map = $this->create_select_item_disambiguation_map( $select_item_list );
Copy link
Collaborator

@adamziel adamziel Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add an inline comment here to say something like "Disambiguate column references. See the disambiguate_item method below for more context."

if ( $child instanceof WP_Parser_Node && 'groupByClause' === $child->rule_name ) {
$parts[] = $group_by_clause;
} elseif ( $child instanceof WP_Parser_Node && 'havingClause' === $child->rule_name ) {
// Translate "HAVING ..." without "GROUP BY ..." to "GROUP BY 1 HAVING ...".
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's explain why we're doing that

$having_clause = null;
if ( $group_by || $having ) {
$select_item_list = $node->get_first_child_node( 'selectItemList' );
$disambiguation_map = $this->create_select_item_disambiguation_map( $select_item_list );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A local variable is much simpler than the stack-based approach proposed by GPT, nice!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamziel The HAVING behavior made me think about how we could pass translation logic customizations down the recursion (as GPT did) for cases when we want to disambiguate more deeply nested nodes (e.g., column references in HAVING). I have an idea for an API that could maybe solve it a bit more universally, without per-case stacks, etc. Something like this:

$having_clause = $this->translate_with_customization(
	$having_expr,
	function ( $node ) use ( $disambiguation_map ) {
		if ( $node instanceof WP_Parser_Node && 'columnRef' === $node->rule_name ) {
			return $this->disambiguate_item( $disambiguation_map, $node );
		}
		return $this->translate( $node );
	}
);

This way, we could pass the $disambiguation_map deep down the tree while keeping the modification collocated with the code that handles it at the higher level.

$parts = array();
foreach ( $node->get_children() as $child ) {
if ( $child instanceof WP_Parser_Node && 'orderClause' === $child->rule_name ) {
$parts[] = 'ORDER BY ' . implode( ', ', $disambiguated_order_list );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure, complex expressions that don't need disambiguation continue to work here, right? As in ORDER BY ABS((SELECT 4)) etc.? I assume yes since all the tests pass, but I still want to ask.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return $unnested === $children[0] ? $node : $unnested;
}

// Unnest "OPEN_PAR_SYMBOL exprList CLOSE_PAR_SYMBOL" to "exprList".
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we unnest things like SELECT ABS((SELECT MIN(balance) FROM accounts)) AS smallest_debt;? Or leave them alone?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamziel The unnest logic now only descends some basic expression nodes (expr, boolPri, predicate, 'bitExpr', simpleExpr), so in this case, it wouldn't do anything, as it contains some function call node on some level before reaching the OPEN_PAR_SYMBOL exprList CLOSE_PAR_SYMBOL.

I'm suspecting that maybe the MySQL parser removes all extra parentheses in some preprocessing, but who knows. We could try to find it in their codebase and then improve accordingly. For example, when you do SELECT (((name))) in MySQL, PDO will report that the select column name is just name. We don't do this part yet; I will create a ticket.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I were to pass here directly the function call arguments node, then something like that could happen, but in the current usage, it should not be possible—a function call will always be under an expression, and the arguments list will always be under a function call node, etc.

'SELECT t1.name FROM t1 INTERSECT SELECT t2.name FROM t2 ORDER BY name'
);

// Test a complex query with CTEs.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, thank you

// With INTERSECT, the column should not be disambiguated.
$this->assertQuery(
'SELECT `t1`.`name` FROM `t1` INTERSECT SELECT `t2`.`name` FROM `t2` ORDER BY `name`',
'SELECT t1.name FROM t1 INTERSECT SELECT t2.name FROM t2 ORDER BY name'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL about INTERSECT

Copy link
Collaborator

@adamziel adamziel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only left minor questions and nitpicks. This PR seems good to go and it's a quality work that covers a lot of bases. It's weird how such a nuance required so much research! Thank you for all the great work on this @JanJakes

@JanJakes
Copy link
Member Author

JanJakes commented Sep 5, 2025

@adamziel Added some more docs, and it should be ready now. Thanks!

@adamziel adamziel merged commit f3b4a75 into develop Sep 5, 2025
14 checks passed
@adamziel adamziel deleted the order-by-disambiguation branch September 5, 2025 11:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Query Error: ambiguous column name in ORDER BY clause
3 participants