Skip to content

Conversation

@codetyri0n
Copy link
Contributor

Which issue does this PR close?

Part of #18613

Rationale for this change

What changes are included in this PR?

refactoring conditional checks with assert_or_internal_err!() or assert_eq_or_internal_err!()

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the spark label Nov 13, 2025
@codetyri0n
Copy link
Contributor Author

CC : @2010YOUY01

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @codetyri0n I'm thinking we need to use exec_err or plan_err instead of internal_err. The last one used for something which is not really expected and classifiable.

assert_eq_or_internal_err!(
args.len(),
2,
"Spark `date_add` function requires 2 arguments"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Spark `date_add` function requires 2 arguments"
"Spark `date_add` function requires 2 arguments, got {}"

This is useful while debugging. Please keep it!

return internal_err!(
"Spark `date_add` function requires 2 arguments, got {}",
args.len()
assert_eq_or_internal_err!(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this change does not improve it. We are already sure that the args length is not 2, so there is no need to check it again.
Also now this requires using unreachable!() at line 98

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be using take_function_args for patterns like this (unpacking expected no. of arguments), for reference:

fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
let [key_type, value_type] = take_function_args("map_from_arrays", arg_types)?;
Ok(map_type_from_key_value_types(
get_element_type(key_type)?,
get_element_type(value_type)?,
))
}

assert_or_internal_err!(
matches!(
other,
ColumnarValue::Array(_)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matches!(
other,
ColumnarValue::Array(_)
| ColumnarValue::Scalar(ScalarValue::IntervalMonthDayNano(_))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| ColumnarValue::Scalar(ScalarValue::IntervalMonthDayNano(_))
| ColumnarValue::Scalar(ScalarValue::IntervalMonthDayNano(None))

?!
Because Some(iv) is also already handled at https://github.com/codetyri0n/datafusion/blob/fcd703b784d3ec4d9976c47a55b52d22c0c08757/datafusion/spark/src/function/datetime/make_interval.rs#L557

return internal_err!(
"Spark `sha1` function requires 1 argument, got {}",
args.len()
assert_eq_or_internal_err!(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above - not really an improvement

if args.len() != 1 {
return internal_err!("`factorial` expects exactly one argument");
}
assert_eq_or_internal_err!(args.len(), 1, "`factorial` expects exactly one argument");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use take_function_args here

return internal_err!(
"Spark `date_add` function requires 2 arguments, got {}",
args.len()
assert_eq_or_internal_err!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be using take_function_args for patterns like this (unpacking expected no. of arguments), for reference:

fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
let [key_type, value_type] = take_function_args("map_from_arrays", arg_types)?;
Ok(map_type_from_key_value_types(
get_element_type(key_type)?,
get_element_type(value_type)?,
))
}

Comment on lines 133 to 138
assert_or_internal_err!(
matches!(days_arg.data_type(), DataType::Int8 | DataType::Int16 | DataType::Int32),
"Spark `date_sub` function: argument must be int8, int16, int32, got {:?}",
days_arg.data_type()
);
unreachable!()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep the original instead of introducing unreachable!()

assert_or_internal_err should be replacing asserts, not existing internal_errs

@codetyri0n
Copy link
Contributor Author

thanks @Jefffrey @martin-g
I would like to make sure I get my understanding right (for this and future PRs)- please let me know if otherwise :
We would need the refactoring only for the function signature for improvements for the errors with better detailing user for example -
SELECT date_add('2025-01-10');
and it should be returning something like -
Spark date_add function requires 2 arguments, got 1

realised that using this macro is not an improvement for return types and etc.

args.len()
);
};
let [date_arg, days_arg] = take_function_args("date_add", args)?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled this with take_function_args as suggested. Noticed a few functions that have not used this, thinking ill include those in a separate PR.

Copy link
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these refactors should use take_function_args as it is much more ergonomic and handles the error for us anyway

Comment on lines 83 to 84
internal_err!("if should have been simplified to case")
assert_or_internal_err!(false, "if should have been simplified to case");
unreachable!()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is little benefit to this refactor if we introduce an unreachable

Copy link
Contributor Author

@codetyri0n codetyri0n Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, my apologies I forgot to revert this change (and the one below as well 🤦🏽 ). : )

Comment on lines 71 to 76
assert_eq_or_internal_err!(
args.len(),
1,
"Spark `last_day` function requires 1 argument"
);
unreachable!()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fixed to use take_function_args as well

if args.len() != 1 {
return internal_err!("`factorial` expects exactly one argument");
}
assert_eq_or_internal_err!(args.len(), 1, "`factorial` expects exactly one argument");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use take_function_args here

return internal_err!(
"Spark `crc32` function requires 1 argument, got {}",
args.len()
assert_eq_or_internal_err!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll refactor this in #18662 anyway

if args.len() != 1 {
return internal_err!("hex expects exactly one argument");
}
assert_eq_or_internal_err!(args.len(), 1, "hex expects exactly one argument");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use take_function_args

@codetyri0n
Copy link
Contributor Author

codetyri0n commented Nov 14, 2025

this is good to go (CI failure due to No space left on device)

@Jefffrey Jefffrey added this pull request to the merge queue Nov 15, 2025
Merged via the queue into apache:main with commit 17050a9 Nov 15, 2025
28 checks passed
@codetyri0n codetyri0n deleted the refactor_spark_err branch November 19, 2025 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants