ORDER BY picks wrong column of deduplication #11855

soerenwolfers · 2024-04-26T21:44:24Z

What happens?

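When two columns in the select list share the same alias, DuckDB deduplicates the names (the second one becomes r2_1 below), but ORDER BY then sorts by the renamed column rather than by the one that keeps the original name.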

Ideally, I'd want DuckDB to throw an error if I accidentally use the same column identifier twice, rather than silently continuing (#11520). In the absence of that, however, I'd expect ORDER BY x to order by the column that ends up being called x.

To Reproduce

CREATE OR REPLACE TABLE df AS (SELECT random() a FROM range(10));
SELECT unnest(range(10)) AS r2, unnest(ARRAY(FROM df)) AS r2 ORDER BY r2;

┌───────┬─────────────────────┐
│  r2   │        r2_1         │
│ int64 │       double        │
├───────┼─────────────────────┤
│     9 │ 0.06163644092157483 │
│     5 │ 0.22715973923914135 │
│     6 │  0.2555760988034308 │
│     1 │  0.4000373675953597 │
│     8 │  0.5027215098962188 │
│     0 │  0.5886263893917203 │
│     2 │  0.7522072196006775 │
│     3 │  0.8781230258755386 │
│     7 │  0.9168853838928044 │
│     4 │  0.9957989219110459 │
├───────┴─────────────────────┤
│ 10 rows           2 columns │
└─────────────────────────────┘

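I would have expected the result to be ordered by the column that is actually called r2 after deduplication (the int64 column). Instead, it is ordered by the double column, which ends up being called r2_1.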
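
In the meantime, a possible workaround is to avoid the duplicate alias altogether, or to sort by select-list position instead of by name. This is only a sketch: the alias r3 is hypothetical and not part of the report above, and it assumes that ORDER BY 1 refers to the first select-list column regardless of how the output names are deduplicated.

```sql
-- Same setup as in the reproduction above.
CREATE OR REPLACE TABLE df AS (SELECT random() a FROM range(10));

-- Option 1: give the second column a distinct alias (r3 is hypothetical),
-- so that ORDER BY r2 can only refer to the first column.
SELECT unnest(range(10)) AS r2, unnest(ARRAY(FROM df)) AS r3 ORDER BY r2;

-- Option 2: keep the duplicate aliases but order by select-list position
-- (assumption: position 1 is the first select-list column).
SELECT unnest(range(10)) AS r2, unnest(ARRAY(FROM df)) AS r2 ORDER BY 1;
```

Neither option addresses the underlying binding behavior; they only sidestep the ambiguous name.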

OS:

Linux

DuckDB Version:

0.10.2

DuckDB Client:

Python

Full Name:

Soeren Wolfers

Affiliation:

G-Research

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

Yes, I have

soerenwolfers added the needs triage label Apr 26, 2024

szarnyasg added the reproduced label Apr 28, 2024

duckdblabs-bot removed the needs triage label Apr 28, 2024

soerenwolfers commented Apr 26, 2024 (edited by szarnyasg)