Skip to content

UNION ALL with ORDER BY results are inconsistent #5970

@stuartcarnie

Description

@stuartcarnie

Describe the bug

The results for a UNION ALL that specifies an ORDER BY is sometimes sorted incorrectly.

Note

I bisected to the first bad commit being 5fc91cc via #5661

To Reproduce

Using datafusion-cli (version 22.0) execute;

WITH
  m0(time, tag0, f64) AS (VALUES (1667181600000000000, 'val00', 10.1), (1667181610000000000, 'val00', 21.2), (1667181620000000000, 'val00', 11.2), (1667181630000000000, 'val00', 19.2), (1667181600000000000, 'val01', 11.3), (1667181600000000000, 'val02', 10.4), (1667181610000000000, 'val00', 18.9)),

  m1(time, tag0, f64) AS (VALUES (1667181600000000000, 'val00', 100.5), (1667181610000000000, 'val00', 200.6), (1667181600000000000, 'val01', 101.7)),

  t AS (
    SELECT 'm0' as "iox::measurement", tag0, 0::timestamp as time, COUNT(f64), SUM(f64), stddev(f64) FROM m0 GROUP BY 1, 2, 3), 

  u AS (
    SELECT 'm1' as "iox::measurement", tag0, 0::timestamp as time, COUNT(f64), SUM(f64), stddev(f64) FROM m1 GROUP BY 1, 2, 3)
SELECT * FROM t
UNION ALL
SELECT * FROM u
ORDER BY 1, 2, 3;

Expected behavior

Output should be sorted on the first 3 columns:

+------------------+-------+---------------------+------------+----------+-------------------+
| iox::measurement | tag0  | time                | COUNT(f64) | SUM(f64) | STDDEV(f64)       |
+------------------+-------+---------------------+------------+----------+-------------------+
| m0               | val00 | 1970-01-01T00:00:00 | 5          | 80.6     | 5.085961069453836 |
| m0               | val01 | 1970-01-01T00:00:00 | 1          | 11.3     |                   |
| m0               | val02 | 1970-01-01T00:00:00 | 1          | 10.4     |                   |
| m1               | val00 | 1970-01-01T00:00:00 | 2          | 301.1    | 70.7813887967734  |
| m1               | val01 | 1970-01-01T00:00:00 | 1          | 101.7    |                   |
+------------------+-------+---------------------+------------+----------+-------------------+

Actual behavior

Output is sometimes inconsistent, noting that the sort order is incorrect:

+------------------+-------+---------------------+------------+----------+-------------------+
| iox::measurement | tag0  | time                | COUNT(f64) | SUM(f64) | STDDEV(f64)       |
+------------------+-------+---------------------+------------+----------+-------------------+
| m0               | val00 | 1970-01-01T00:00:00 | 5          | 80.6     | 5.085961069453836 |
| m0               | val01 | 1970-01-01T00:00:00 | 1          | 11.3     |                   |
| m1               | val00 | 1970-01-01T00:00:00 | 2          | 301.1    | 70.7813887967734  |
| m0               | val02 | 1970-01-01T00:00:00 | 1          | 10.4     |                   |
| m1               | val01 | 1970-01-01T00:00:00 | 1          | 101.7    |                   |
+------------------+-------+---------------------+------------+----------+-------------------+

Additional context

DataFusion versions 18, 19, 20 and 21 work consistently after numerous executions of the same query

The following also works on 22

WITH
  m0(time, tag0, f64) AS (VALUES (1667181600000000000, 'val00', 10.1), (1667181610000000000, 'val00', 21.2), (1667181620000000000, 'val00', 11.2), (1667181630000000000, 'val00', 19.2), (1667181600000000000, 'val01', 11.3), (1667181600000000000, 'val02', 10.4), (1667181610000000000, 'val00', 18.9)),
  m1(time, tag0, f64) AS (VALUES (1667181600000000000, 'val00', 100.5), (1667181610000000000, 'val00', 200.6), (1667181600000000000, 'val01', 101.7)),
  t AS (SELECT 'm0' as "iox::measurement", tag0, 0::timestamp as time, COUNT(f64), SUM(f64), stddev(f64) FROM m0 GROUP BY 1, 2, 3), 
  u AS (SELECT 'm1' as "iox::measurement", tag0, 0::timestamp as time, COUNT(f64), SUM(f64), stddev(f64) FROM m1 GROUP BY 1, 2, 3), 
-- add another CTE
  v AS (
  SELECT * FROM t
  UNION ALL
  SELECT * FROM u)
SELECT * FROM v
ORDER BY 1, 2, 3;

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions