Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BREAKING] Switch to NULLS LAST as default null sorting order, instead of NULLS FIRST #7174

Merged
merged 6 commits into from Apr 21, 2023

Conversation

Mytherin
Copy link
Collaborator

This PR changes the default null sorting order from NULLS FIRST to NULLS LAST. That means that when ordering values without a specified null ordering clause, NULL values always end up after all other values:

CREATE TABLE integers(i integer);
INSERT INTO integers VALUES (1), (2), (3), (NULL);
SELECT * FROM integers ORDER BY i;
┌───────┐
│   i   │
│ int32 │
├───────┤
│     1 │
│     2 │
│     3 │
│  NULL │
└───────┘
SELECT * FROM integers ORDER BY i DESC;
┌───────┐
│   i   │
│ int32 │
├───────┤
│     3 │
│     2 │
│     1 │
│  NULL │
└───────┘

NULLS LAST is more intuitive when using ORDER BY in conjunction with LIMIT. With NULLS FIRST, the following queries return NULL values instead of the desired top-n results:

SELECT * FROM integers ORDER BY i NULLS FIRST LIMIT 1;
┌───────┐
│   i   │
│ int32 │
├───────┤
│  NULL │
└───────┘
SELECT * FROM integers ORDER BY i DESC NULLS FIRST LIMIT 1;
┌───────┐
│   i   │
│ int32 │
├───────┤
│  NULL │
└───────┘

With NULLS LAST, the desired values are returned instead:

SELECT * FROM integers ORDER BY i NULLS LAST LIMIT 1;
┌───────┐
│   i   │
│ int32 │
├───────┤
│     1 │
└───────┘
SELECT * FROM integers ORDER BY i DESC NULLS LAST LIMIT 1;
┌───────┐
│   i   │
│ int32 │
├───────┤
│     3 │
└───────┘

This fixes a common footgun/source of confusion for users.

SQL Standard & Other Systems

Unfortunately null ordering is extremely inconsistent across systems, and is not defined in the SQL standard if NULLS FIRST or NULLS LAST are not explicitly provided.

There are four different options for NULL ordering:

  1. NULLS FIRST when ASC, NULLS LAST when DESC
  2. NULLS LAST when ASC, NULLS FIRST when DESC
  3. Always NULLS FIRST
  4. Always NULLS LAST

Previously we implemented option (3) - this PR switches us to option (4). This option seems the least problematic since it avoids the top-n pitfall described above for both ASC/DESC ordering - and is consistent with more other systems, particularly systems in the analytics space.

Other Systems
(1) NULLS FIRST when ASC, NULLS LAST when DESC

MySQL, SQLite, BigQuery

(2) NULLS LAST when ASC, NULLS FIRST when DESC

PostgreSQL, Snowflake

(3) Always NULLS FIRST

DuckDB (current)

(4) Always NULLS LAST

DuckDB (new), Presto/Trino, Pandas, ClickHouse

default_null_order

This PR also expands the default_null_order option so that it can correctly model all of these settings:

SET default_null_order='nulls_first_on_asc_last_on_desc';
-- or
SET default_null_order='sqlite';

SET default_null_order='nulls_last_on_asc_first_on_desc';
--or
SET default_null_order='postgres';

In addition to the currently available settings:

SET default_null_order='nulls_first';
SET default_null_order='nulls_last';

In order to revert to the previous behavior of the system, the following setting can be used:

SET default_null_order='nulls_first';

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant