Implement disambiguation for unqualified columns in `ORDER BY` #232

JanJakes · 2025-09-01T15:35:01Z

Resolves #228.
Closes #231.

adamziel · 2025-09-02T14:02:20Z

That's much simpler than the alternative, thank you @JanJakes

…n names

See: WordPress/wordpress-develop#9602

adamziel · 2025-09-03T12:01:27Z

@JanJakes is this one ready for a review?

JanJakes · 2025-09-03T12:34:21Z

@adamziel It should be ready now.

I was also suspecting that we may need to reuse this for UPDATE and DELETE statements, but it looks like we're good:

You can also perform UPDATE operations covering multiple tables. However, you cannot use ORDER BY or LIMIT with a multiple-table UPDATE.

From https://dev.mysql.com/doc/refman/8.0/en/update.html.

You cannot use ORDER BY or LIMIT in a multiple-table DELETE.

From https://dev.mysql.com/doc/refman/9.1/en/delete.html.

adamziel · 2025-09-03T12:46:39Z

tests/WP_SQLite_Driver_Tests.php

+		 *        SELECT t1.name AS col, t2.name AS col ... ORDER BY col
+		 */
+		$this->assertQuery( 'SELECT t1.name AS col, t2.name AS col FROM t1 JOIN t2 ON t2.id = t1.id ORDER BY col' );
+	}


Let's also test syntactically tricky cases such as:

WITH temp(name) AS ( VALUES ('a'), ('b') ) SELECT t1.name, (SELECT name from temp WHERE id = 5) FROM t1 ORDER BY name; WITH temp(name, b) AS (SELECT * FROM t2) SELECT t1.name, (SELECT name from temp WHERE id = 5) FROM t1 ORDER BY name; SELECT * FROM ( SELECT t1.name, t2.name FROM t1 JOIN t2 ON t2.id = t1.id ORDER BY name ) ORDER BY col

It's fine if they fail, we need to set the line somewhere. Let's just make sure we account for them in the test suite.

@adamziel The first one doesn't seem to be a valid MySQL syntax (something about the VALUES part). The last one, we already have very similar. I'll add a test using the WITH syntax.

Oh, sorry, the first one is SQLite

adamziel · 2025-09-03T12:47:50Z

tests/WP_SQLite_Driver_Tests.php

+		);
+
+		/*
+		 * The following query fails with "ambiguous column name" in MySQL, but


So sometimes it's MySQL that fails with "ambiguous column name" and sometimes it's SQLite? I've noticed there's a different comment that talks about SQLite failing.

Yes, both MySQL and SQLite throw that error, and they slightly differ in edge cases:

SQLite doesn't use the SELECT list to try to disambiguate. That's what we're emulating.

Both, interestingly, allow multiple identical explicit aliases (1 AS col, 2 AS col, 3 AS col) — but here they may slightly differ in what they would sort by with ORDER BY col.

MySQL doesn't allow a special case when multiple columns share the same explicit alias. Even if these are entirely different columns (id and name, for example). SQLite is fine with that and handles it the same as in point 2.

We're not emulating anything for 2 and 3 — just passing it to SQLite "as is," because it can handle it, just with these edge case variations.

adamziel · 2025-09-03T12:51:18Z

wp-includes/sqlite-ast/class-wp-sqlite-driver.php

+		 *   SELECT t1.name FROM t1 JOIN t2 ON t2.t1_id = t1.id ORDER BY name
+		 *
+		 * This is because MySQL first considers the "name" column that was used
+		 * in the SELECT list. If it is unambiguous, it will be used in ORDER BY.


Should we also do this for all SELECT clauses such as JOIN, HAVING, GROUP BY, etc?

@adamziel Oh, good catch! I tried it in JOIN and WHERE and MySQL didn't disambiguate there. But in HAVING and GROUP BY it seems to work. Let me check if it's easy to reuse the disambiguation map for these cases.

Got it working. It's nicely reusable for both of these using a private method. I'll polish the code a bit and will push the changes.

WINDOW might be another (unlikely) one, but I'm not sure

Yeah, this one also disambiguates in MySQL:

SELECT t1.name, SUM(t1.name) OVER (PARTITION BY name) FROM t1 JOIN t2 ON t2.id = t1.id;

There are quite a few nuances with ORDER BY and HAVING already, so for now, I'd leave this out, I guess.

Yup, makes sense. We don't have to solve everything at once. The queries we don't fix will fail so we'll get clear feedback if and when that comes up

adamziel · 2025-09-03T12:53:22Z

wp-includes/sqlite-ast/class-wp-sqlite-driver.php

+				// Support also parenthesized column references (e.g. "(t.id)").
+				$select_item_expr = $this->unnest_parenthesized_expression( $select_item_expr );
+
+				// Consider only simple and parenthesized column references.


It should be fine for now. Let's just leave a clear TODO to revisit this once a use-case reveals itself. It may take some months or years and a clear callout will be a huge gift once it happens.

Oh, what I mean is here "Consider only simple and parenthesized column references, because that's what MySQL does." As soon as you do any other expression, e.g. SELECT col + 1 ... ORDER BY col, it won't recognize the column and disambiguate. It will only recognize it as SELECT id, SELECT (id), etc. The same for the ORDER BY clause.

I suspect MySQL may be just normalizing these extra no-op parentheses at some point and that's why it works.

Ooooh, gotcha! Let's document that, then

adamziel

I like the simplicity of it and that it only takes computing cycles when the query needs disambiguation and not pre-emptively like the other PR did. I left a few, rather small notes. This looks great overall – thank you!

adamziel · 2025-09-03T12:55:18Z

wp-setup.sh

@@ -37,14 +37,16 @@ services:
      - ../:/var/www/src/wp-content/plugins/sqlite-database-integration

  php:
-    image: wordpressdevelop/php:8.3-fpm
+    # PHP temporarily pinned to 8.3.10, see: https://github.com/WordPress/wordpress-develop/pull/9602


Aha, good catch!

adamziel · 2025-09-03T12:55:50Z

wp-includes/sqlite-ast/class-wp-sqlite-driver.php

+	}
+
+	/**
+	 * Unnest parenthesized MySQL expression node.


I'm really glad this came up. Nice!

JanJakes · 2025-09-05T09:33:35Z

@adamziel I pushed the GROUP BY and HAVING support and some more complex cases like CTEs, UNIONs, etc.

adamziel · 2025-09-05T10:29:57Z

wp-includes/sqlite-ast/class-wp-sqlite-driver.php

+		$having_clause   = null;
+		if ( $group_by || $having ) {
+			$select_item_list   = $node->get_first_child_node( 'selectItemList' );
+			$disambiguation_map = $this->create_select_item_disambiguation_map( $select_item_list );


Let's add an inline comment here to say something like "Disambiguate column references. See the disambiguate_item method below for more context."

adamziel · 2025-09-05T10:30:32Z

wp-includes/sqlite-ast/class-wp-sqlite-driver.php

+				if ( $child instanceof WP_Parser_Node && 'groupByClause' === $child->rule_name ) {
+					$parts[] = $group_by_clause;
+				} elseif ( $child instanceof WP_Parser_Node && 'havingClause' === $child->rule_name ) {
+					// Translate "HAVING ..." without "GROUP BY ..." to "GROUP BY 1 HAVING ...".


Let's explain why we're doing that

adamziel · 2025-09-05T10:32:16Z

wp-includes/sqlite-ast/class-wp-sqlite-driver.php

+		$having_clause   = null;
+		if ( $group_by || $having ) {
+			$select_item_list   = $node->get_first_child_node( 'selectItemList' );
+			$disambiguation_map = $this->create_select_item_disambiguation_map( $select_item_list );


A local variable is much simpler than the stack-based approach proposed by GPT, nice!

@adamziel The HAVING behavior made me think about how we could pass translation logic customizations down the recursion (as GPT did) for cases when we want to disambiguate more deeply nested nodes (e.g., column references in HAVING). I have an idea for an API that could maybe solve it a bit more universally, without per-case stacks, etc. Something like this:

$having_clause = $this->translate_with_customization( $having_expr, function ( $node ) use ( $disambiguation_map ) { if ( $node instanceof WP_Parser_Node && 'columnRef' === $node->rule_name ) { return $this->disambiguate_item( $disambiguation_map, $node ); } return $this->translate( $node ); } );

This way, we could pass the $disambiguation_map deep down the tree while keeping the modification collocated with the code that handles it at the higher level.

adamziel · 2025-09-05T10:33:27Z

wp-includes/sqlite-ast/class-wp-sqlite-driver.php

+			$parts = array();
+			foreach ( $node->get_children() as $child ) {
+				if ( $child instanceof WP_Parser_Node && 'orderClause' === $child->rule_name ) {
+					$parts[] = 'ORDER BY ' . implode( ', ', $disambiguated_order_list );


Just to make sure, complex expressions that don't need disambiguation continue to work here, right? As in ORDER BY ABS((SELECT 4)) etc.? I assume yes since all the tests pass, but I still want to ask.

Yeah, it's the ?? in https://github.com/WordPress/sqlite-database-integration/pull/232/files/9d2c53660a9b5407828dd4f875ff98c01a113d83#diff-f0696e42750cef2f20fda5ea479b1ac09836a9c72d76ca0722f0a7af7879a011R3008 that handles that. (I also tried that query just to be sure.)

adamziel · 2025-09-05T10:35:50Z

wp-includes/sqlite-ast/class-wp-sqlite-driver.php

+			return $unnested === $children[0] ? $node : $unnested;
+		}
+
+		// Unnest "OPEN_PAR_SYMBOL exprList CLOSE_PAR_SYMBOL" to "exprList".


Do we unnest things like SELECT ABS((SELECT MIN(balance) FROM accounts)) AS smallest_debt;? Or leave them alone?

@adamziel The unnest logic now only descends some basic expression nodes (expr, boolPri, predicate, 'bitExpr', simpleExpr), so in this case, it wouldn't do anything, as it contains some function call node on some level before reaching the OPEN_PAR_SYMBOL exprList CLOSE_PAR_SYMBOL.

I'm suspecting that maybe the MySQL parser removes all extra parentheses in some preprocessing, but who knows. We could try to find it in their codebase and then improve accordingly. For example, when you do SELECT (((name))) in MySQL, PDO will report that the select column name is just name. We don't do this part yet; I will create a ticket.

If I were to pass here directly the function call arguments node, then something like that could happen, but in the current usage, it should not be possible—a function call will always be under an expression, and the arguments list will always be under a function call node, etc.

adamziel · 2025-09-05T10:36:59Z

tests/WP_SQLite_Driver_Translation_Tests.php

+			'SELECT t1.name FROM t1 INTERSECT SELECT t2.name FROM t2 ORDER BY name'
+		);
+
+		// Test a complex query with CTEs.


nice, thank you

adamziel · 2025-09-05T10:37:11Z

tests/WP_SQLite_Driver_Translation_Tests.php

+		// With INTERSECT, the column should not be disambiguated.
+		$this->assertQuery(
+			'SELECT `t1`.`name` FROM `t1` INTERSECT SELECT `t2`.`name` FROM `t2` ORDER BY `name`',
+			'SELECT t1.name FROM t1 INTERSECT SELECT t2.name FROM t2 ORDER BY name'


TIL about INTERSECT

adamziel

I only left minor questions and nitpicks. This PR seems good to go and it's a quality work that covers a lot of bases. It's weird how such a nuance required so much research! Thank you for all the great work on this @JanJakes

JanJakes · 2025-09-05T11:19:58Z

@adamziel Added some more docs, and it should be ready now. Thanks!

JanJakes added 2 commits September 1, 2025 17:30

Implement disambiguation for unqualified columns in ORDER BY

f872535

Support ORDER BY disambiguation also for parenthesised column names

4268dea

JanJakes mentioned this pull request Sep 1, 2025

Resolve ambiguous column names in ORDER BY #231

Closed

JanJakes force-pushed the order-by-disambiguation branch from 5093169 to cefd5ba Compare September 3, 2025 09:25

JanJakes added 5 commits September 3, 2025 11:32

Improve readability and docs, add tests

3d05c34

Support ORDER BY disambiguation also for parenthesised ORDER BY colum…

d5add40

…n names

Add a TODO for database-qualified identifiers

9b35d57

Add a fix for SSL issues on the CI

b19dd1f

See: WordPress/wordpress-develop#9602

Add PHPCS exclusion for e2e tests node_modules

0903628

JanJakes force-pushed the order-by-disambiguation branch from b6cc64d to 0903628 Compare September 3, 2025 09:33

Add more tests and document edge cases

e716007

JanJakes marked this pull request as ready for review September 3, 2025 12:32

JanJakes requested a review from adamziel September 3, 2025 12:32

adamziel reviewed Sep 3, 2025

View reviewed changes

adamziel approved these changes Sep 3, 2025

View reviewed changes

adamziel reviewed Sep 3, 2025

View reviewed changes

JanJakes added 2 commits September 4, 2025 11:22

Extract and split disambiguation logic for reuse in GROUP BY and HAVING

9c3afe6

Implement disambiguation for unqualified columns in GROUP BY and HAVING

946320a

aristath approved these changes Sep 5, 2025

View reviewed changes

Fix disambiguation behavior with CTEs, UNION, EXCEPT, and INTERSECT

9d2c536

adamziel reviewed Sep 5, 2025

View reviewed changes

adamziel approved these changes Sep 5, 2025

View reviewed changes

Add more docs, reorder translate methods as per the grammar

f98f1fd

adamziel merged commit f3b4a75 into develop Sep 5, 2025
14 checks passed

adamziel deleted the order-by-disambiguation branch September 5, 2025 11:41

Implement disambiguation for unqualified columns in ORDER BY #232

Implement disambiguation for unqualified columns in ORDER BY #232

Conversation

JanJakes commented Sep 1, 2025

Uh oh!

adamziel commented Sep 2, 2025

Uh oh!

adamziel commented Sep 3, 2025

Uh oh!

JanJakes commented Sep 3, 2025

Uh oh!

adamziel Sep 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

adamziel Sep 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

adamziel left a comment

Choose a reason for hiding this comment

Uh oh!

adamziel Sep 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JanJakes commented Sep 5, 2025

Uh oh!

adamziel Sep 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

adamziel left a comment

Choose a reason for hiding this comment

Implement disambiguation for unqualified columns in `ORDER BY` #232

Implement disambiguation for unqualified columns in `ORDER BY` #232

adamziel Sep 3, 2025 •

edited

Loading

adamziel Sep 3, 2025 •

edited

Loading

adamziel Sep 3, 2025 •

edited

Loading

adamziel Sep 5, 2025 •

edited

Loading