ORDER BY picks wrong column of deduplication #11855

Open
2 tasks done
soerenwolfers opened this issue Apr 26, 2024 · 0 comments

soerenwolfers commented Apr 26, 2024

What happens?

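ORDER BY resolves a duplicated column alias to the wrong column. When two SELECT columns share an alias, the second one is renamed during deduplication (r2 becomes r2_1), but ORDER BY r2 sorts by the renamed column instead of the one that keeps the name r2.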

Ideally, I'd want DuckDB to throw an error if I accidentally use the same column identifier twice, rather than silently continuing (#11520). In the absence of that, however, I'd expect ORDER BY x to order by the column that ends up being called x.

To Reproduce

CREATE OR REPLACE TABLE df AS (SELECT random() a FROM range(10));
SELECT unnest(range(10)) AS r2, unnest(ARRAY(FROM df)) AS r2 ORDER BY r2;
┌───────┬─────────────────────┐
│  r2   │        r2_1         │
│ int64 │       double        │
├───────┼─────────────────────┤
│     9 │ 0.06163644092157483 │
│     5 │ 0.22715973923914135 │
│     6 │  0.2555760988034308 │
│     1 │  0.4000373675953597 │
│     8 │  0.5027215098962188 │
│     0 │  0.5886263893917203 │
│     2 │  0.7522072196006775 │
│     3 │  0.8781230258755386 │
│     7 │  0.9168853838928044 │
│     4 │  0.9957989219110459 │
├───────┴─────────────────────┤
│ 10 rows           2 columns │
└─────────────────────────────┘

I would have expected the result to be ordered by the column that is actually called r2 after deduplication (the int64 column); instead it is ordered by r2_1.
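To make the expectation concrete, here is a small pure-Python model of the behavior I have in mind. It is only a sketch, not DuckDB's actual binder: `deduplicate` and `order_by` are hypothetical helper names, the row values are made up, and the suffix scheme is assumed from the r2 / r2_1 header in the output above.

```python
from collections import Counter


def deduplicate(names):
    """Rename repeated column names: the first keeps its name,
    later ones get _1, _2, ... (assumed from the r2 / r2_1 output above).
    Simplification: does not handle a clash with an existing 'r2_1'."""
    seen = Counter()
    out = []
    for name in names:
        count = seen[name]
        out.append(name if count == 0 else f"{name}_{count}")
        seen[name] += 1
    return out


def order_by(rows, names, key):
    """Sort rows by the column whose name AFTER deduplication equals `key`."""
    final_names = deduplicate(names)
    idx = final_names.index(key)
    return final_names, sorted(rows, key=lambda row: row[idx])


# Two columns that were both aliased r2, as in the query above.
rows = [(2, 0.75), (0, 0.59), (1, 0.40)]
final_names, ordered = order_by(rows, ["r2", "r2"], "r2")
print(final_names)  # ['r2', 'r2_1']
print(ordered)      # [(0, 0.59), (1, 0.4), (2, 0.75)]
```

In this model the rows come back sorted by the first (integer) column, because that is the one still named r2 once the names are deduplicated. The output in the report is sorted by the second column instead.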

OS:

Linux

DuckDB Version:

0.10.2

DuckDB Client:

Python

Full Name:

Soeren Wolfers

Affiliation:

G-Research

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have