Skip to content

Conversation

@comphead
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

Adding a ansi flag to expand coverage for Spark built in functions, Spark 4.0 sets ansi mode as true by default.
Currently the flag is planned to be used to datafusion-spark crate via ScalarConfigArgs, however it can also be used for DF if ansi mode is in the roadmap

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the common Related to common crate label Nov 11, 2025
@comphead comphead requested a review from Jefffrey November 11, 2025 23:16
@comphead
Copy link
Contributor Author

@Omega359 FYI

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Nov 11, 2025
/// semantics for expressions, casting, and error handling. This includes:
/// - Strict type coercion rules: implicit casts between incompatible types are disallowed.
/// - Standard SQL arithmetic behavior: operations like division by zero or overflow
/// result in runtime errors instead of returning `NULL` or silently truncating values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) In abs(), the value that results in overflow is returned as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hsiang-c let me think how to rephrase it better

Copy link
Contributor

@hsiang-c hsiang-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Nov 12, 2025
Copy link
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we leave a note saying it has experimental support, considering it will need to be gradually introduced to functions?

Especially as I think default behaviour for some DataFusion functions lines up more with true than false (e.g. abs overflow will return compute error which is the true behaviour described here)

@comphead
Copy link
Contributor Author

Should we leave a note saying it has experimental support, considering it will need to be gradually introduced to functions?

Especially as I think default behaviour for some DataFusion functions lines up more with true than false (e.g. abs overflow will return compute error which is the true behaviour described here)

Thanks @Jefffrey I think I overwrote those changes saying this is experimental and relevant for Spark crate only, redoing it

Copy link
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next steps would be to create an issue to retrofit this config onto existing Spark functions 👍

@comphead
Copy link
Contributor Author

Next steps would be to create an issue to retrofit this config onto existing Spark functions 👍

Thats what @hsiang-c is working on with abs

@comphead comphead added this pull request to the merge queue Nov 12, 2025
Merged via the queue into apache:main with commit becc71b Nov 12, 2025
29 checks passed
@Omega359
Copy link
Contributor

Just saw this. I was thinking of a 'spark' extension option to handle this myself.

@comphead
Copy link
Contributor Author

Just saw this. I was thinking of a 'spark' extension option to handle this myself.

Could you please elaborate? Do you mean to have a set of spark related config options?
If so, I was thinking the same thing one day, but ansi may relate to both Spark and DataFusion IMO

@Omega359
Copy link
Contributor

Omega359 commented Nov 12, 2025

Just saw this. I was thinking of a 'spark' extension option to handle this myself.

Could you please elaborate? Do you mean to have a set of spark related config options? If so, I was thinking the same thing one day, but ansi may relate to both Spark and DataFusion IMO

Essentially yes. Spark seems to have a - let's put this politely - abundance of configuration options. I'm sure more than just the ansi one will apply to DF spark functions at some point.

The only real concern I have with having an ansi config option in the main df execution config is that it might start to be applied piecemeal outside of the spark crate. Think implicit casting, normal scalar udfs, etc.

I'm not against that at all but it's a concern for me because I have my doubts DF itself would (or perhaps even should?) support anything but what spark deems strict ansi mode (ansi = true). It's a topic that likely should get much more community involvement if this was to be used outside of the spark crate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate documentation Improvements or additions to documentation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Introduce ansi enable parameter for execution config

4 participants