feat: support Spark-compatible abs math function part 1 - non-ANSI mode
#18205
Conversation
cc @comphead for code review, thank you.
```
# abs: signed int minimal values
query IIII
select abs(c1), abs(c2), abs(c3), abs(c4) from test_nullable_integer where dataset = 'mins'
```
Wondering, would it be easier to test like:

```
query II
select abs(1), abs(-1)
----
1 1
```

instead of creating/dropping tables?
Doing abs(-128), abs(-32768) and abs(-2147483648) doesn't work because of type widening.
Doing abs(-128::TINYINT), abs(-32768::SMALLINT), abs(-2147483648::INT), abs(-9223372036854775808::BIGINT) throws a casting error. For example: DataFusion error: Arrow error: Cast error: Can't cast value 128 to type Int8
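For context on why the minimum values matter here: the true absolute value of a signed minimum (e.g. 128 for Int8) does not fit back into the same type, which is exactly the overflow case this PR's non-ANSI mode must tolerate. A minimal standard-library Rust sketch of the two behaviors being distinguished (checked vs. wrapping abs; the mapping to ANSI/non-ANSI follows this PR's description):

```rust
fn main() {
    // i8::MIN is -128; its absolute value (128) does not fit in i8.
    assert_eq!(i8::MIN.checked_abs(), None);       // overflow detected (ANSI-like)
    assert_eq!(i8::MIN.wrapping_abs(), i8::MIN);   // wraps back to -128 (non-ANSI-like)
    assert_eq!(i64::MIN.wrapping_abs(), i64::MIN); // same story for BIGINT
}
```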
I think this is a bug in SQL parsing:

```
> select -128::tinyint;
Arrow error: Cast error: Can't cast value 128 to type Int8

> select (-128)::tinyint;
+-------------+
| Int64(-128) |
+-------------+
| -128        |
+-------------+
1 row(s) fetched.
Elapsed 0.003 seconds.
```

It casts the 128 value without accounting for the negative sign; might need to raise an issue for this? Not sure if this is intended behaviour or not.
So you can wrap it in parentheses to ensure the correct precedence, or alternatively use arrow_cast:

```
> select arrow_cast(-128, 'Int8');
+--------------------------------------+
| arrow_cast(Int64(-128),Utf8("Int8")) |
+--------------------------------------+
| -128                                 |
+--------------------------------------+
1 row(s) fetched.
Elapsed 0.007 seconds.
```

```
0 0
1 1
1 1
NULL NULL
```
It's better to use an inline query; in this example the answers and input data are out of order, which makes it more difficult to read.
```
## PySpark 3.5.5 Result: {"abs(INTERVAL '-1-1' YEAR TO MONTH)": 13, "typeof(abs(INTERVAL '-1-1' YEAR TO MONTH))": 'interval year to month', "typeof(INTERVAL '-1-1' YEAR TO MONTH)": 'interval year to month'}
#query
#SELECT abs(INTERVAL '-1-1' YEAR TO MONTH::interval year to month);
query error DataFusion error: This feature is not implemented: Unsupported SQL type INTERVAL YEAR TO MONTH
```
Let's create a GitHub ticket to fix this and refer to it in the comments, in addition to the error. Looks like abs works with intervals for Spark only.
Jefffrey left a comment:
I've raised a question on the epic about how we plan to support ANSI mode. From what I see in this PR, it is done via an extra argument to abs (though I'm not sure it's actually being passed through coerce_types correctly 🤔).
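For readers following along: the extra argument in question is the fail_on_error flag, passed as a trailing boolean scalar (see the test quoted below). A hypothetical sketch of what a coercion rule would have to preserve for that convention to survive coerce_types; the names here are illustrative, not the PR's actual code:

```rust
use arrow::datatypes::DataType;
use datafusion_common::{plan_err, Result};

// Hypothetical coercion for a two-argument abs(value, fail_on_error):
// pass the numeric type through and keep the trailing ANSI flag Boolean.
fn coerce_abs_types(arg_types: &[DataType]) -> Result<Vec<DataType>> {
    match arg_types {
        [value_type] => Ok(vec![value_type.clone()]),
        [value_type, DataType::Boolean] => {
            Ok(vec![value_type.clone(), DataType::Boolean])
        }
        _ => plan_err!("abs expects (value) or (value, boolean), got {arg_types:?}"),
    }
}
```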
```rust
#[test]
fn test_abs_u8_scalar() {
    with_fail_on_error(|fail_on_error| {
        let args = ColumnarValue::Scalar(ScalarValue::UInt8(Some(u8::MAX)));
        let fail_on_error_arg =
            ColumnarValue::Scalar(ScalarValue::Boolean(Some(fail_on_error)));
        match spark_abs(&[args, fail_on_error_arg]) {
            Ok(ColumnarValue::Scalar(ScalarValue::UInt8(Some(result)))) => {
                assert_eq!(result, u8::MAX);
                Ok(())
            }
            Err(e) => {
                if fail_on_error {
                    assert!(
                        e.to_string().contains("ARITHMETIC_OVERFLOW"),
                        "Error message did not match. Actual message: {e}"
                    );
                    Ok(())
                } else {
                    panic!("Didn't expect error, but got: {e:?}")
                }
            }
            _ => unreachable!(),
        }
    });
}
```
This test design is very confusing; we can't tell whether a test case is meant to return Ok or Err, since it automatically does the "correct" verification for each case. This automatic way of passing the test on Err should be changed: if a test case is meant to return Err, that should be the only thing we check for.
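A hedged sketch of the split being requested, with one test per expected outcome instead of a closure that accepts either; the spark_abs signature and the ARITHMETIC_OVERFLOW message are taken from the test quoted above, while the i8 case is illustrative:

```rust
#[test]
fn test_abs_i8_min_non_ansi_wraps() {
    // Non-ANSI mode (fail_on_error = false): overflow is tolerated,
    // so this test only ever expects Ok.
    let args = [
        ColumnarValue::Scalar(ScalarValue::Int8(Some(i8::MIN))),
        ColumnarValue::Scalar(ScalarValue::Boolean(Some(false))),
    ];
    match spark_abs(&args) {
        Ok(ColumnarValue::Scalar(ScalarValue::Int8(Some(v)))) => assert_eq!(v, i8::MIN),
        other => panic!("expected Ok(i8::MIN), got: {other:?}"),
    }
}

#[test]
fn test_abs_i8_min_ansi_overflows() {
    // ANSI mode (fail_on_error = true): the same input must return Err,
    // and an error is the only outcome this test accepts.
    let args = [
        ColumnarValue::Scalar(ScalarValue::Int8(Some(i8::MIN))),
        ColumnarValue::Scalar(ScalarValue::Boolean(Some(true))),
    ];
    let e = spark_abs(&args).expect_err("ANSI abs(i8::MIN) must overflow");
    assert!(
        e.to_string().contains("ARITHMETIC_OVERFLOW"),
        "Error message did not match. Actual message: {e}"
    );
}
```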
@Jefffrey You're right, thanks for the feedback.
I've refactored the test cases; please take a look. Thank you.
```rust
fn arithmetic_overflow_error(from_type: &str) -> DataFusionError {
    ArrowError(
        Box::from(arrow::error::ArrowError::ComputeError(format!(
            "arithmetic overflow from {from_type}",
        ))),
        None,
    )
}
```
I feel we should return a DataFusionError::Execution here instead of creating an Arrow error and wrapping it in a DataFusion error, given that the error occurs in our DataFusion code.
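A minimal sketch of the suggested alternative, using the DataFusionError::Execution variant that datafusion_common already provides:

```rust
use datafusion_common::DataFusionError;

fn arithmetic_overflow_error(from_type: &str) -> DataFusionError {
    // The overflow is detected in our own code, so an Execution error
    // is returned directly instead of wrapping an Arrow ComputeError.
    DataFusionError::Execution(format!("arithmetic overflow from {from_type}"))
}
```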
This method is removed. I reused macros from DataFusion's own abs implementation, and the arithmetic overflow is thrown from those macros.
```rust
let n = $ARRAY.as_any().downcast_ref::<$TYPE>();
match n {
    Some(array) => {
```
I would prefer we unwrap n directly instead of matching on it, as we are guaranteed it will be of the correct array type; the same goes for ansi_compute_op below.
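A sketch of the direct unwrap being suggested, assuming the caller has already dispatched on the data type so the downcast cannot fail (Int32Array is used here purely for illustration):

```rust
use arrow::array::{ArrayRef, Int32Array};

fn as_int32(array: &ArrayRef) -> &Int32Array {
    // The macro is only invoked for DataType::Int32, so this downcast is
    // infallible; expect() documents that invariant instead of a match.
    array
        .as_any()
        .downcast_ref::<Int32Array>()
        .expect("abs: array must be an Int32Array")
}
```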
Force-pushed from e152413 to 832a6ed.
Thanks @Jefffrey
```
## PySpark 3.5.5 Result: {"abs(INTERVAL '-1-1' YEAR TO MONTH)": 13, "typeof(abs(INTERVAL '-1-1' YEAR TO MONTH))": 'interval year to month', "typeof(INTERVAL '-1-1' YEAR TO MONTH)": 'interval year to month'}
#query
#SELECT abs(INTERVAL '-1-1' YEAR TO MONTH::interval year to month);
# See GitHub issue for ANSI interval support: https://github.com/apache/datafusion/issues/18793
```
Fyi, you can cast to a specific interval type like so (datafusion/sqllogictest/test_files/aggregate.slt, lines 2345-2347 at 0304cda):

```
(arrow_cast('-1 year', 'Interval(YearMonth)')),
(arrow_cast('13 months', 'Interval(YearMonth)')),
(arrow_cast('1 year', 'Interval(YearMonth)'));
```
```
----
-128 -32768 -2147483648 -9223372036854775808

# abs: floats, NULL and NaN
```
Thanks @hsiang-c. Can we also add -Inf and Inf for float/double, and -0.0?
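For reference, the expected semantics for these new cases are the IEEE-754 ones, which Rust's standard float abs already follows; a quick sketch:

```rust
fn main() {
    assert_eq!(f64::NEG_INFINITY.abs(), f64::INFINITY); // abs(-Inf) = Inf
    assert_eq!(f64::INFINITY.abs(), f64::INFINITY);     // abs(Inf)  = Inf
    assert!((-0.0f64).abs().is_sign_positive());        // abs(-0.0) = 0.0
    assert!(f64::NAN.abs().is_nan());                   // abs(NaN)  = NaN
}
```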
Good catch, added.
comphead left a comment:
Which issue does this PR close?

Part of datafusion-spark: Spark Compatible Functions #15914.

Rationale for this change

Spark's abs() behaves differently than DataFusion's, and its behavior depends on spark.sql.ansi.enabled. When ANSI mode is off, arithmetic overflow doesn't throw an exception the way DataFusion does. See also abs datafusion-comet#2595.

What changes are included in this PR?

A Spark-compatible abs math function (per Spark v4.0.1): the abs expression for numeric types only and non-ANSI mode, i.e. spark.sql.ansi.enabled=false.

Tasks breakdown

Are these changes tested?

Yes, in test_files/spark/math/abs.slt.

Are there any user-facing changes?

Yes, the abs function can be specified in SQL.
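To make that last point concrete, a hedged sketch of invoking the new function directly, reusing the calling convention from the tests quoted earlier in this review; the import path is an assumption, and only the spark_abs(&[ColumnarValue]) shape comes from this PR's tests:

```rust
use datafusion_common::ScalarValue;
use datafusion_expr::ColumnarValue;
// Assumed import path for the function added in this PR.
use datafusion_spark::function::math::abs::spark_abs;

fn main() -> datafusion_common::Result<()> {
    // Non-ANSI mode (fail_on_error = false): abs(-1) is simply 1, and per
    // the PR description, overflow on i32::MIN would not raise an error.
    let args = [
        ColumnarValue::Scalar(ScalarValue::Int32(Some(-1))),
        ColumnarValue::Scalar(ScalarValue::Boolean(Some(false))),
    ];
    if let ColumnarValue::Scalar(ScalarValue::Int32(Some(v))) = spark_abs(&args)? {
        assert_eq!(v, 1);
    }
    Ok(())
}
```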