Describe the bug
While comparing simple SQL statements between PostgreSQL and DataFusion, I found several PostgreSQL compatibility mismatches where DataFusion returns a different answer or returns a value where PostgreSQL raises a domain error.
I excluded cases that only reflect PostgreSQL functions DataFusion does not support, as well as minor type or display-format differences.
Environment
PostgreSQL was built locally from source:
PostgreSQL 19devel
commit aa1f93a3387ad619c14cea2b8ed01e6f49cb6600
DataFusion:
datafusion-cli 53.1.0
commit 1ab146ad6cc119c7656ae1def75fd40697e5f94a
Mismatches
1. ^ evaluates as bitwise XOR instead of PostgreSQL exponentiation
PostgreSQL:
DataFusion:
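The two readings of ^ diverge even for small operands. The Python sketch below uses the hypothetical operands 2 and 3, since the query for this item is not shown. Python's ** is exponentiation and its ^ is bitwise XOR, so the two helpers (hypothetical names) mirror the PostgreSQL reading and the XOR reading reported for DataFusion. PostgreSQL also documents that chained ^ associates left to right, which the reduce-based helper models.

```python
from functools import reduce


def pg_caret(*operands: float) -> float:
    """PostgreSQL-style ^: exponentiation on doubles, associating left to right."""
    return reduce(lambda acc, x: acc ** x, map(float, operands))


def xor_caret(a: int, b: int) -> int:
    """Bitwise XOR, the interpretation reported for DataFusion."""
    return a ^ b


print(pg_caret(2, 3))     # 8.0
print(xor_caret(2, 3))    # 1
print(pg_caret(2, 3, 3))  # 512.0, i.e. (2 ^ 3) ^ 3
```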
2. SIMILAR TO should treat % as a wildcard
SELECT 'abc' SIMILAR TO 'a%';
PostgreSQL:
DataFusion:
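PostgreSQL documents that SIMILAR TO, like LIKE, must match the entire string, with % standing for any sequence of characters and _ for any single character. The Python sketch below models only those two wildcards by translating them to a regular expression; the helper name similar_to is hypothetical, and it deliberately ignores the other SIMILAR TO metacharacters (|, *, +, ?, {}, (), []) and the ESCAPE clause.

```python
import re


def similar_to(text: str, pattern: str) -> bool:
    """Simplified SIMILAR TO: only % and _ are treated as wildcards."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # any sequence of zero or more characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    # The pattern must cover the whole string, not just a prefix or substring.
    return re.fullmatch("".join(parts), text, flags=re.DOTALL) is not None


print(similar_to("abc", "a%"))   # True
print(similar_to("abc", "b%"))   # False
```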
3. replace with an empty search string should be a no-op
SELECT replace('abc', '', 'x');
PostgreSQL:
DataFusion:
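The empty-search case is easy to get wrong because general-purpose string libraries disagree on it: Python's str.replace, for example, inserts the replacement at every position when the search string is empty. A minimal sketch of the no-op rule described above, with the hypothetical helper name pg_replace:

```python
def pg_replace(s: str, search: str, repl: str) -> str:
    """replace() with the empty-search case treated as a no-op."""
    if search == "":
        return s  # nothing to search for: return the input unchanged
    return s.replace(search, repl)


print("abc".replace("", "x"))       # xaxbxcx (Python's own behavior)
print(pg_replace("abc", "", "x"))   # abc
```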
4. array_length of an empty array dimension should be NULL
SELECT array_length(array[]::int[], 1);
PostgreSQL:
DataFusion:
5. Negative array subscripts should not index from the end
SELECT (array[10,20,30])[-1];
PostgreSQL:
DataFusion:
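Items 4 and 5 both follow from PostgreSQL's array model: subscripts are 1-based by default, a subscript outside the array's bounds yields NULL rather than counting from the end, and array_length returns NULL for a dimension that does not exist (an empty array has no dimensions). A minimal Python model, using None for NULL; the helper names are hypothetical and the sketch covers only one-dimensional arrays with the default lower bound of 1.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def pg_array_length(arr: Sequence[T], dim: int) -> Optional[int]:
    """array_length for a 1-D array; an empty array has no dimensions."""
    if dim != 1 or len(arr) == 0:
        return None
    return len(arr)


def pg_subscript(arr: Sequence[T], i: int) -> Optional[T]:
    """1-based subscript; anything outside [1, len(arr)] yields NULL."""
    if 1 <= i <= len(arr):
        return arr[i - 1]
    return None  # no negative indexing from the end


print(pg_array_length([], 1))          # None
print(pg_subscript([10, 20, 30], -1))  # None
print(pg_subscript([10, 20, 30], 1))   # 10
```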
6. time + interval should wrap within the 24-hour time domain
SELECT time '23:30' + interval '2 hours';
PostgreSQL:
DataFusion:
7. time - interval should wrap within the 24-hour time domain
SELECT time '01:30' - interval '2 hours';
PostgreSQL:
DataFusion:
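Because time holds only a time of day, adding or subtracting an interval amounts to arithmetic modulo 24 hours. A minimal Python sketch on microseconds since midnight, assuming the interval can be represented as a datetime.timedelta (which has no month component); the helper name time_plus_interval is hypothetical.

```python
from datetime import time, timedelta

USECS_PER_DAY = 24 * 60 * 60 * 1_000_000
ONE_USEC = timedelta(microseconds=1)


def time_plus_interval(t: time, delta: timedelta) -> time:
    """time + interval with the result wrapped into [00:00, 24:00)."""
    us = ((t.hour * 60 + t.minute) * 60 + t.second) * 1_000_000 + t.microsecond
    # Python's % returns a non-negative result for a positive modulus,
    # so going back past midnight wraps to the previous evening.
    us = (us + delta // ONE_USEC) % USECS_PER_DAY
    secs, micros = divmod(us, 1_000_000)
    return time(secs // 3600, secs // 60 % 60, secs % 60, micros)


print(time_plus_interval(time(23, 30), timedelta(hours=2)))  # 01:30:00
print(time_plus_interval(time(1, 30), -timedelta(hours=2)))  # 23:30:00
```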
8. extract(second ...) should preserve fractional seconds
SELECT extract(second from timestamp '2020-01-01 00:00:12.345678');
PostgreSQL:
DataFusion:
9. extract(milliseconds ...) should preserve fractional milliseconds
SELECT extract(milliseconds from timestamp '2020-01-01 00:00:12.345678');
PostgreSQL:
DataFusion:
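PostgreSQL documents the second field as the seconds field including any fractional seconds, and milliseconds as the seconds field, including fractional parts, multiplied by 1000. Since PostgreSQL 14, extract returns numeric, so a decimal type is a natural model. A minimal Python sketch with hypothetical helper names, limited to microsecond precision:

```python
from datetime import datetime
from decimal import Decimal


def extract_second(ts: datetime) -> Decimal:
    """Seconds field including fractional seconds."""
    return Decimal(ts.second) + Decimal(ts.microsecond) / Decimal(1_000_000)


def extract_milliseconds(ts: datetime) -> Decimal:
    """Seconds field, including fractional parts, multiplied by 1000."""
    return extract_second(ts) * 1000


ts = datetime(2020, 1, 1, 0, 0, 12, 345678)
print(extract_second(ts))        # 12.345678
print(extract_milliseconds(ts))  # 12345.678000
```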
10. regexp_count should count empty-pattern matches
SELECT regexp_count('abc', '');
PostgreSQL:
DataFusion:
11. regexp_instr with an empty pattern should return 1
SELECT regexp_instr('abc', '');
PostgreSQL:
DataFusion:
12. regexp_like should honor the PostgreSQL multiline flag 'm'
SELECT regexp_like(E'a\nb', '^b', 'm');
PostgreSQL:
DataFusion:
13. regexp_replace should honor the PostgreSQL multiline flag 'm'
SELECT regexp_replace(E'a\nb', '^b', 'x', 'm');
PostgreSQL:
DataFusion:
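As a reference point for items 10 to 13, Python's re module counts an empty-pattern match at every position (including the end of the string), reports the first empty match at offset 0 (position 1 in SQL's 1-based numbering), and lets ^ match right after a newline when re.MULTILINE is set. This is only an approximation of PostgreSQL: its 'm' flag is a historical synonym for 'n' (newline-sensitive matching), which additionally stops . and bracket expressions from matching a newline.

```python
import re

text = "abc"
# Item 10: an empty pattern matches at every position, including the end.
print(len(re.findall("", text)))        # 4
# Item 11: the first empty match starts at offset 0, i.e. position 1 in SQL.
print(re.search("", text).start() + 1)  # 1

multi = "a\nb"
# Items 12 and 13: with MULTILINE, ^ also matches right after a newline.
print(re.search("^b", multi, flags=re.MULTILINE) is not None)  # True
print(repr(re.sub("^b", "x", multi, flags=re.MULTILINE)))      # 'a\nx'
print(re.search("^b", multi) is not None)                      # False (no flag)
```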
14. round(float8) should match PostgreSQL's tie-breaking behavior
SELECT round(2.5::float8);
PostgreSQL:
DataFusion:
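The two common tie-breaking rules give different answers for 2.5. PostgreSQL documents that round(double precision) breaks ties in a platform-dependent way, with round-half-to-even being the most common rule, while round(numeric) rounds ties away from zero. A minimal Python sketch of both rules (hypothetical helper names), using the decimal module because Decimal(x) holds the exact binary value of a float, which avoids the classic floor(x + 0.5) pitfalls:

```python
import math
from decimal import ROUND_HALF_EVEN, ROUND_HALF_UP, Decimal


def _round_with(x: float, mode: str) -> float:
    # Non-finite values and magnitudes >= 2**52 are already integral doubles;
    # returning them as-is also keeps quantize() within decimal's precision.
    if not math.isfinite(x) or abs(x) >= 2.0**52:
        return x
    return float(Decimal(x).quantize(Decimal(1), rounding=mode))


def round_half_even(x: float) -> float:
    """Ties go to the nearest even integer (the common rint() behavior)."""
    return _round_with(x, ROUND_HALF_EVEN)


def round_half_away(x: float) -> float:
    """Ties go away from zero (decimal's ROUND_HALF_UP)."""
    return _round_with(x, ROUND_HALF_UP)


for x in (2.5, 3.5, -2.5):
    print(x, round_half_even(x), round_half_away(x))
# 2.5 2.0 3.0
# 3.5 4.0 4.0
# -2.5 -2.0 -3.0
```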
15. factorial(21) should not overflow; PostgreSQL returns a numeric result
SELECT factorial(21);
PostgreSQL:
DataFusion:
Overflow happened on FACTORIAL(21)
16. factorial of a negative value should error
PostgreSQL:
ERROR: factorial of a negative number is undefined
DataFusion:
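21! is the smallest factorial that exceeds the signed 64-bit maximum (2^63 - 1), which is presumably why a 64-bit result type overflows in item 15, while PostgreSQL's factorial returns numeric. The Python sketch below checks that boundary with arbitrary-precision integers and models the negative-input error from item 16 using the message quoted above; pg_factorial is a hypothetical name.

```python
import math

I64_MAX = 2**63 - 1


def pg_factorial(n: int) -> int:
    """factorial() with PostgreSQL's error for negative input."""
    if n < 0:
        raise ValueError("factorial of a negative number is undefined")
    return math.factorial(n)  # arbitrary precision, like numeric


print(pg_factorial(20) <= I64_MAX)  # True
print(pg_factorial(21) <= I64_MAX)  # False
print(pg_factorial(21))             # 51090942171709440000
try:
    pg_factorial(-1)
except ValueError as err:
    print(err)                      # factorial of a negative number is undefined
```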
17. sqrt(-1.0::float8) should error, not return NaN
SELECT sqrt((-1.0)::float8);
PostgreSQL:
ERROR: cannot take square root of a negative number
DataFusion:
18. ln(-1.0::float8) should error, not return NaN
SELECT ln((-1.0)::float8);
PostgreSQL:
ERROR: cannot take logarithm of a negative number
DataFusion:
19. log(0.0::float8) should error, not return -inf
SELECT log(0.0::float8);
PostgreSQL:
ERROR: cannot take logarithm of zero
DataFusion:
20. power(0.0::float8, -1.0::float8) should error, not return infinity
SELECT power(0.0::float8, -1.0::float8);
PostgreSQL:
ERROR: zero raised to a negative power is undefined
DataFusion:
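Items 17 to 20 share one pattern: IEEE 754 arithmetic yields NaN or an infinity for these inputs, whereas PostgreSQL checks the domain first and raises an error. A minimal Python sketch of that check-before-compute approach, reusing the error messages quoted above; the function names and the DomainError class are hypothetical, log is modeled as base 10 like PostgreSQL's one-argument log, and other domain checks (such as a negative base with a non-integer exponent) are omitted.

```python
import math


class DomainError(ValueError):
    """Hypothetical stand-in for a SQL domain error."""


def _check_log_arg(x: float) -> None:
    if x == 0:
        raise DomainError("cannot take logarithm of zero")
    if x < 0:
        raise DomainError("cannot take logarithm of a negative number")


def pg_sqrt(x: float) -> float:
    if x < 0:
        raise DomainError("cannot take square root of a negative number")
    return math.sqrt(x)


def pg_ln(x: float) -> float:
    _check_log_arg(x)
    return math.log(x)


def pg_log10(x: float) -> float:
    _check_log_arg(x)
    return math.log10(x)


def pg_power(base: float, exponent: float) -> float:
    # Only the zero-base check from item 20 is modeled here.
    if base == 0 and exponent < 0:
        raise DomainError("zero raised to a negative power is undefined")
    return math.pow(base, exponent)


# One call per item; each raises instead of returning NaN or an infinity.
for fn, args in [(pg_sqrt, (-1.0,)), (pg_ln, (-1.0,)),
                 (pg_log10, (0.0,)), (pg_power, (0.0, -1.0))]:
    try:
        fn(*args)
    except DomainError as err:
        print(f"{fn.__name__}: ERROR: {err}")
```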
Expected behavior
For PostgreSQL-compatible SQL semantics, DataFusion should either match PostgreSQL's result or raise the same class of domain/semantic error for these simple expressions.