Skip to content

Conversation

@watford-ep
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

ORDER BY with a CASE statement didn't always work, raising a sanity check error in SortPreservingMergeExec. The plan showed that the partitions all had the same ordering, but for whatever reason they were not detected as being equal. Using a single partition succeeded always.

What changes are included in this PR?

The changes are non-obvious and I spent a lot of time bisecting/debug printing and landed on a failure in bounds checking with boolean interval arithmetic. Returning UNCERTAIN if either leg of the interval is NULL resolves the upstream issue where CASE statements end up being deemed Unordered.

My rust-fu is hobbyist at best, so while this appears to resolve my issue I cannot for-certain exclaim that I've solved it all (Claude 4.5 agrees with my fix, but that's not an indication its any good). I'm also reasonably certain my unit tests are more ham fisted than necessary.

Are these changes tested?

  1. Yes, unit tests have been added.

Are there any user-facing changes?

This does not change any behavior beyond resolving a bug with a valid SQL statement.

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates core Core DataFusion crate labels Oct 28, 2025
@watford-ep
Copy link
Contributor Author

watford-ep commented Oct 28, 2025

I've also confirmed this fixes my issue with the following query issued by npgsql (nb: there is a correlated subquery which I fixup to use MAX(rngtypid) in the projection):

SELECT ns.nspname, t.oid, t.typname, t.typtype, t.typnotnull, t.elemtypoid
    FROM (
        -- Arrays have typtype=b - this subquery identifies them by their typreceive and converts their typtype to a
        -- We first do this for the type (innerest-most subquery), and then for its element type
        -- This also returns the array element, range subtype and domain base type as elemtypoid
        SELECT
            typ.oid, typ.typnamespace, typ.typname, typ.typtype, typ.typrelid, typ.typnotnull, typ.relkind,
            elemtyp.oid AS elemtypoid, elemtyp.typname AS elemtypname, elemcls.relkind AS elemrelkind,
            CASE WHEN elemproc.proname='array_recv' THEN 'a' ELSE elemtyp.typtype END AS elemtyptype
        FROM (
            SELECT typ.oid, typnamespace, typname, typrelid, typnotnull, relkind, typelem AS elemoid,
                CASE WHEN proc.proname='array_recv' THEN 'a' ELSE typ.typtype END AS typtype,
                CASE
                    WHEN proc.proname='array_recv' THEN typ.typelem
                    WHEN typ.typtype='r' THEN rngsubtype
                    WHEN typ.typtype='m' THEN (SELECT rngtypid FROM pg_range WHERE rngmultitypid = typ.oid)
                    WHEN typ.typtype='d' THEN typ.typbasetype
                END AS elemtypoid
            FROM pg_type AS typ
            LEFT JOIN pg_class AS cls ON (cls.oid = typ.typrelid)
            LEFT JOIN pg_proc AS proc ON proc.oid = typ.typreceive
            LEFT JOIN pg_range ON (pg_range.rngtypid = typ.oid)
        ) AS typ
        LEFT JOIN pg_type AS elemtyp ON elemtyp.oid = elemtypoid
        LEFT JOIN pg_class AS elemcls ON (elemcls.oid = elemtyp.typrelid)
        LEFT JOIN pg_proc AS elemproc ON elemproc.oid = elemtyp.typreceive
    ) AS t
    JOIN pg_namespace AS ns ON (ns.oid = typnamespace)
    WHERE
        typtype IN ('b', 'r', 'm', 'e', 'd') OR -- Base, range, multirange, enum, domain
        (typtype = 'c' AND relkind='c') OR -- User-defined free-standing composites (not table composites) by default
        (typtype = 'p' AND typname IN ('record', 'void')) OR -- Some special supported pseudo-types
        (typtype = 'a' AND (  -- Array of...
            elemtyptype IN ('b', 'r', 'm', 'e', 'd') OR -- Array of base, range, multirange, enum, domain
            (elemtyptype = 'p' AND elemtypname IN ('record', 'void')) OR -- Arrays of special supported pseudo-types
            (elemtyptype = 'c' AND elemrelkind='c') -- Array of user-defined free-standing composites (not table composites) by default
        ))
    ORDER BY CASE
           WHEN typtype IN ('b', 'e', 'p') THEN 0           -- First base types, enums, pseudo-types
           WHEN typtype = 'r' THEN 1                        -- Ranges after
           WHEN typtype = 'm' THEN 2                        -- Multiranges after
           WHEN typtype = 'c' THEN 3                        -- Composites after
           WHEN typtype = 'd' AND elemtyptype <> 'a' THEN 4 -- Domains over non-arrays after
           WHEN typtype = 'a' THEN 5                        -- Arrays after
           WHEN typtype = 'd' AND elemtyptype = 'a' THEN 6  -- Domains over arrays last
    END

Copy link
Member

@asolimando asolimando left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I have left a few comments, but consider that I am new to Datafusion so they might be off-mark due to missing context

WITH typ(oid, typnamespace, typname, typtype) AS (
SELECT * FROM (VALUES (1, 10, 't1', 'b'))
UNION ALL SELECT * FROM (VALUES (2, NULL, 't2', 'b'))
UNION ALL SELECT * FROM (VALUES (3, 12, 't3', NULL))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe adding a non-NULL value which is retained by the filter but not matching any WHEN branch like m? We are exercising the implicit ELSE null with rows having typtype as NULL value, but it might be worth checking with explicit values too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added in dffb0c2

// TODO: bounds MAY be certain if either LHS or RHS is certainly false

// Return UNCERTAIN when intervals don't have concrete boolean bounds
_ => Ok(Self::UNCERTAIN),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This make sense to me, under the classic 3-valued logic, in such cases it's fine to return unknown.

This method should be used with care as it would probably be unsafe for simplifications, as the TODO suggests we are not handling cases like FALSE AND UNKNOWN which is FALSE, or TRUE OR UNKNOWN which is TRUE.

Depending on where we invoke this method, we need to be aware of the unknown-as-false semantics (filters) and unknown-as-null semantics (order by, projections, etc.), and decide if it is safe to use this method or not.

Since we were previously erroring out, I think it's in any case an improvement but just wanted to point that out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to find places that would be confused by this change, but there is quite a bit of logic around Interval and tracing the implications are a bit beyond me as a first time datafusion contributor. In a good way it seems Interval::UNCERTAIN is handled uniformly from what I found. There also seems to be test coverage in the optimizer if those TODOs were executed on (test_simplify_with_guarantee, test_inequalities_maybe_null, etc).

@alamb @asolimando I'm happy to explore adding those bounds here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very interesting...I started by adding tests for and/or:

#[test]
fn test_and_null_boolean_intervals() -> Result<()> {
    let null_interval =
        Interval::try_new(ScalarValue::Boolean(None), ScalarValue::Boolean(None))?;

    let and_result = null_interval.and(&Interval::CERTAINLY_FALSE)?;
    assert_eq!(and_result, Interval::CERTAINLY_FALSE);

    let and_result = null_interval.and(&Interval::CERTAINLY_TRUE)?;
    assert_eq!(and_result, Interval::UNCERTAIN);

    let and_result = null_interval.and(&null_interval)?;
    assert_eq!(and_result, Interval::UNCERTAIN);

    Ok(())
}

#[test]
fn test_or_null_boolean_intervals() -> Result<()> {
    let null_interval =
        Interval::try_new(ScalarValue::Boolean(None), ScalarValue::Boolean(None))?;

    let or_result = null_interval.or(&Interval::CERTAINLY_FALSE)?;
    assert_eq!(or_result, Interval::UNCERTAIN);

    let or_result = null_interval.or(&Interval::CERTAINLY_TRUE)?;
    assert_eq!(or_result, Interval::CERTAINLY_TRUE);

    let or_result = null_interval.or(&null_interval)?;
    assert_eq!(or_result, Interval::UNCERTAIN);

    Ok(())
}

And they already worked without any code changes, so I think my TODO statements were premature and unnecessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@watford-ep sounds good! Can you also add the reverse (true or null where you now test null or true) just to make sure that we short-circuit the right way independently from the operands' order?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) and removed core Core DataFusion crate labels Oct 29, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @watford-ep and @asolimando -- I agree this looks like a good fix.

I had one question about one of the tests which seems unrelated to the rest of this PR, but otherwise 👍

I also took the liberty of pushing a commit that moves the test to sqllogictests as I think that is easier to understand in the context of this project

let df = ctx.sql(sql).await?;
let plan = df.create_physical_plan().await;

assert!(plan.is_ok(), "planning failed: {:?}", plan.err());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified this test fails with an assert without the code in this PR

---- physical_optimizer::sanity_checker::test_order_by_case_requirement stdout ----

thread 'physical_optimizer::sanity_checker::test_order_by_case_requirement' panicked at datafusion/core/tests/physical_optimizer/sanity_checker.rs:404:5:
planning failed: Some(Context("SanityCheckPlan", Plan("Plan: [\"SortPreservingMergeExec: [CASE WHEN typtype@3 = b OR typtype@3 = e OR typtype@3 = p THEN 0 WHEN typtype@3 = r THEN 1 END ASC NULLS LAST]\", \"  SortExec: expr=[CASE WHEN typtype@3 = b OR typtype@3 = e OR typtype@3 = p THEN 0 WHEN typtype@3 = r THEN 1 END ASC NULLS LAST], preserve_partitioning=[true]\", \"    CoalesceBatchesExec: target_batch_size=8192\", \"      HashJoinExec: mode=CollectLeft, join_type=Inner, on=[(oid@0, typnamespace@1)], projection=[nspname@1, oid@2, typname@4, typtype@5]\", \"        ProjectionExec: expr=[column1@0 as oid, column2@1 as nspname]\", \"          DataSourceExec: partitions=1, partition_sizes=[1]\", \"        ProjectionExec: expr=[column1@0 as oid, column2@1 as typnamespace, column3@2 as typname, column4@3 as typtype]\", \"          UnionExec\", \"            CoalesceBatchesExec: target_batch_size=8192\", \"              FilterExec: column4@3 IN (SET) ([b, r, m, e, d])\", \"                DataSourceExec: partitions=1, partition_sizes=[1]\", \"            ProjectionExec: expr=[column1@0 as column1, CAST(column2@1 AS Int64) as column2, column3@2 as column3, column4@3 as column4]\", \"              RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=1\", \"                CoalesceBatchesExec: target_batch_size=8192\", \"                  FilterExec: column4@3 IN (SET) ([b, r, m, e, d])\", \"                    DataSourceExec: partitions=1, partition_sizes=[1]\", \"            ProjectionExec: expr=[column1@0 as column1, column2@1 as column2, column3@2 as column3, CAST(column4@3 AS Utf8) as column4]\", \"              RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=1\", \"                CoalesceBatchesExec: target_batch_size=8192\", \"                  FilterExec: CAST(column4@3 AS Utf8) IN (SET) ([b, r, m, e, d])\", \"                    DataSourceExec: partitions=1, partition_sizes=[1]\"] does not satisfy order requirements: [CASE WHEN typtype@3 = b OR typtype@3 = e OR typtype@3 = p THEN 0 WHEN typtype@3 = r THEN 1 END ASC NULLS LAST]. Child-0 order: [[CASE WHEN typtype@3 = b OR typtype@3 = e OR typtype@3 = p THEN 0 WHEN typtype@3 = r THEN 1 END ASC NULLS LAST]]")))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


WITH typ(oid, typnamespace, typname, typtype) AS (
SELECT * FROM (VALUES (1, 10, 't1', 'b'))
UNION ALL SELECT * FROM (VALUES (2, NULL, 't2', 'b'))
UNION ALL SELECT * FROM (VALUES (3, 12, 't3', NULL))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added in dffb0c2

@github-actions github-actions bot removed the physical-expr Changes to the physical-expr crates label Oct 29, 2025
Copy link
Member

@asolimando asolimando left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@alamb
Copy link
Contributor

alamb commented Oct 30, 2025

Thanks again @watford-ep and for reviews @asolimando !

@alamb alamb added this pull request to the merge queue Oct 30, 2025
Merged via the queue into apache:main with commit d2b93f5 Oct 30, 2025
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SortPreservingMerge sanity check rejects valid ORDER BY with CASE expression

3 participants