chore: remove panics in datafusion-common::scalar by making more operations return Result #7901

Merged

Conversation

junjunjd
Contributor

@junjunjd junjunjd commented Oct 22, 2023

Which issue does this PR close?

It removes the majority of panics in datafusion-common::scalar (see #3313).

Rationale for this change

Important move towards closing #3313
Closes #3313

What changes are included in this PR?

Replace most of the panics in datafusion-common::scalar with internal_err!, not_impl_err!, or other DataFusionError variants.

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added the sql (SQL Planner), logical-expr (Logical plan and expressions), physical-expr (Physical Expressions), optimizer (Optimizer rules), and core (Core DataFusion crate) labels Oct 22, 2023
@alamb
Contributor

alamb commented Oct 23, 2023

The CI appears to be failing. Marking as draft until the checks are passing. If you have specific questions about this PR, please let us know.

@alamb alamb marked this pull request as draft October 23, 2023 21:22
@alamb
Contributor

alamb commented Oct 23, 2023

Thank you for the work @junjunjd

@junjunjd junjunjd force-pushed the chore/remove-panics-in-datafusion-common branch 2 times, most recently from 863ed9a to 4b61807 Compare October 25, 2023 07:05
@junjunjd junjunjd marked this pull request as ready for review October 25, 2023 07:55
@junjunjd
Contributor Author

@alamb CI is fixed. This MR is ready for final review.

Contributor

@Weijun-H Weijun-H left a comment

Thanks @junjunjd 👍

Contributor

@comphead comphead left a comment

Epic work @junjunjd thanks
I'm wondering whether we can get rid of .expect? Or at least provide more details in the expect message?

@junjunjd
Contributor Author

junjunjd commented Oct 26, 2023

> Epic work @junjunjd thanks. I'm wondering whether we can get rid of .expect? Or at least provide more details in the expect message?

I removed the .expect calls in the get_min_max_values and get_null_count_values macros.
The rest of the .expect calls are in tests or examples, where it makes sense to use .expect and panic. The backtrace from a panic, together with the message passed to .expect, gives more information about the failure than a returned Result. The Rust book suggests calling unwrap or expect in tests and examples: https://doc.rust-lang.org/book/ch09-03-to-panic-or-not-to-panic.html

@junjunjd junjunjd force-pushed the chore/remove-panics-in-datafusion-common branch from 4b61807 to 60d1543 Compare October 26, 2023 05:21
@junjunjd
Contributor Author

@alamb @Weijun-H @comphead I addressed the comments. This is ready for another review.

Contributor

@Weijun-H Weijun-H left a comment

Could we get rid of .expect in tests 🤔?

@houqp
Member

houqp commented Oct 26, 2023

@Weijun-H @comphead using unwrap and expect in tests is actually the preferred practice, see https://github.com/influxdata/influxdb/blob/main/docs/style_guide.md#dont-return-result-from-test-functions. It makes the test failure easier to parse for a human and the test framework will already provide all the necessary context on failure.
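
To illustrate the two test styles being compared here, the following is a minimal sketch. The function `parse_precision` is hypothetical and is not part of DataFusion; it only gives the tests something fallible to call. The first test panics at the failing line via `.expect`, while the second returns a `Result` and uses `?`, in which case a failure is typically reported as the returned error without the location of the `?`.

```rust
use std::num::ParseIntError;

/// Hypothetical function under test (not a DataFusion API):
/// parses a decimal precision from text.
pub fn parse_precision(text: &str) -> Result<u8, ParseIntError> {
    text.trim().parse::<u8>()
}

#[cfg(test)]
mod tests {
    use super::*;

    // Style recommended in the comments above: panic right where things go
    // wrong, with a message describing what was expected.
    #[test]
    fn parses_valid_precision() {
        let precision = parse_precision(" 38 ")
            .expect("\" 38 \" should parse as a u8 precision");
        assert_eq!(precision, 38);
    }

    // Alternative style: the test itself returns Result and uses `?`.
    // On failure the harness prints the error value rather than a panic
    // message tied to a specific source line.
    #[test]
    fn parses_valid_precision_with_question_mark() -> Result<(), ParseIntError> {
        let precision = parse_precision("38")?;
        assert_eq!(precision, 38);
        Ok(())
    }
}
```

Both tests pass as written; the difference only shows up in how a failure is reported.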

Contributor

@Weijun-H Weijun-H left a comment

LGTM! Thanks @junjunjd

Contributor

@alamb alamb left a comment

Thank you for this PR @junjunjd. We very much appreciate the effort -- I am sorry that the ticket description may have been misleading.

Buried in #3313, it says:

The goal is not to remove all panics but review and make sure we are using them appropriately. Bonus points for adding documentation for invariants.

Can you explain why you removed the panics that you did? I think most of them are "unreachable", so forcing client code to check for errors that will never happen makes the API harder to work with (and is why this PR adds around 500 new lines of code).

@@ -330,9 +330,9 @@ impl PartialOrd for ScalarValue {
     let arr2 = list_arr2.value(i);

     let lt_res =
-        arrow::compute::kernels::cmp::lt(&arr1, &arr2).unwrap();
+        arrow::compute::kernels::cmp::lt(&arr1, &arr2).ok()?;
Contributor Author

@junjunjd junjunjd Nov 1, 2023

This panic is reachable. For example, if arr1 and arr2 have different data types, arrow::compute::kernels::cmp::lt returns an error and the unwrap panics.
I think it makes sense to return None here instead of panicking and exiting, since the user is only performing a partial order comparison.
This does not require any code change on the client side.

Contributor

The potential downside of returning None rather than panicking is that it may mask a real bug and make it harder to track down -- comparing scalars of different types likely means they should have been coerced beforehand.
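
To make the trade-off in this thread concrete, here is a minimal sketch using a hypothetical two-variant `Scalar` enum as a stand-in for ScalarValue (it does not use arrow or DataFusion). `partial_cmp` returns `None` for mismatched types, as the PR does, and the hypothetical helper `strict_cmp` shows how a caller that considers a mismatch a bug can still fail loudly.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for ScalarValue with only two variants.
#[derive(Debug, PartialEq)]
pub enum Scalar {
    Int(i64),
    Text(String),
}

impl PartialOrd for Scalar {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        match (self, other) {
            (Scalar::Int(a), Scalar::Int(b)) => a.partial_cmp(b),
            (Scalar::Text(a), Scalar::Text(b)) => a.partial_cmp(b),
            // Mismatched types: report "not comparable" instead of panicking.
            _ => None,
        }
    }
}

/// Hypothetical stricter helper: treats a type mismatch as a bug (for
/// example a missing coercion) and panics with both operands in the message.
pub fn strict_cmp(a: &Scalar, b: &Scalar) -> Ordering {
    a.partial_cmp(b)
        .unwrap_or_else(|| panic!("cannot compare {:?} with {:?}", a, b))
}
```

With this shape, comparing `Scalar::Int(1)` with `Scalar::Text(..)` silently yields `None`, which is the masking risk raised above; callers that want the old behavior opt in through something like `strict_cmp`.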

@@ -1970,13 +2020,14 @@ impl ScalarValue {
),
},
ScalarValue::Fixedsizelist(..) => {
unimplemented!("FixedSizeList is not supported yet")
Contributor Author

@junjunjd junjunjd Nov 3, 2023

@alamb What would be the preferred way to handle unimplemented errors in DataFusion? There are many places where a NotImplemented error is returned instead of using unimplemented! and panicking. IMO returning an error makes more sense, as the user can choose to ignore unimplemented errors rather than having the process panic and exit.

Contributor

I agree returning NotYetImplemented is a better choice
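
A minimal sketch of the pattern agreed on here, under stated assumptions: `SketchError`, `Scalar`, `scalar_to_vec`, and `to_vec_or_empty` are hypothetical names invented for illustration, not DataFusion types or functions. The unsupported variant returns a "not implemented" error instead of calling `unimplemented!`, so the caller decides whether to recover.

```rust
use std::fmt;

/// Hypothetical, trimmed-down error type standing in for DataFusionError.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    NotImplemented(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::NotImplemented(msg) => {
                write!(f, "This feature is not implemented: {}", msg)
            }
        }
    }
}

impl std::error::Error for SketchError {}

/// Hypothetical stand-in for ScalarValue.
pub enum Scalar {
    Int(i64),
    FixedSizeList(Vec<i64>),
}

/// Repeats a scalar `size` times. The unsupported variant yields an error
/// rather than panicking via `unimplemented!`.
pub fn scalar_to_vec(value: &Scalar, size: usize) -> Result<Vec<i64>, SketchError> {
    match value {
        Scalar::Int(v) => Ok(vec![*v; size]),
        Scalar::FixedSizeList(_) => Err(SketchError::NotImplemented(
            "FixedSizeList is not supported yet".to_string(),
        )),
    }
}

/// A caller that chooses to tolerate unsupported inputs.
pub fn to_vec_or_empty(value: &Scalar, size: usize) -> Vec<i64> {
    match scalar_to_vec(value, size) {
        Ok(values) => values,
        Err(SketchError::NotImplemented(_)) => Vec::new(),
    }
}
```

A caller that prefers the old behavior can still call `.unwrap()` on the result of `scalar_to_vec`.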

@junjunjd
Contributor Author

junjunjd commented Nov 3, 2023

@alamb Thanks for the review. Most of the added lines should come from line-wrap reformatting in tests. I have added comments on the panics I removed in scalar.rs. To summarize, these panics fall into five general categories:

  • Panics generated in iter_to_array when the ScalarValues in the iterator are not all the same type. I believe these errors are reachable. Most of the build_* macros defined in the function already return an internal error instead of panicking, so IMO it makes sense to remove these panics to align with the other internal errors returned.
  • typed_cast_* macros called in try_from_array. Since try_from_array is a public function, downcasting the array to a given type can fail depending on what array the user passes in.
    I think it makes sense to return an internal error, as this error is reachable and recoverable. This aligns with how a downcast error is handled in the downcast_value macro. try_from_array already returns a Result, so this does not require any change in client code.
  • Panics generated when with_precision_and_scale is called on decimal arrays. I think these errors are reachable because ScalarValue::try_new_decimal128 allows decimals with precision 0, while arrow-array does not support that. We could update try_new_decimal128 to disallow decimals with precision 0 and establish some other invariants for new_list, eq_array and to_array_of_size, so that these errors become unreachable and these functions can panic. This should remove the impact on client code.
  • The unimplemented errors. In many other places in the DataFusion code, a NotImplemented error is returned instead of using unimplemented!. IMHO returning an error makes more sense, as the user can choose whether to ignore the unimplemented errors instead of panicking and exiting. I would appreciate your thoughts on this.
  • All the ArrowErrors and the "Invalid dictionary keys type" errors should be unreachable. I will change these back to panics.
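
The third bullet's idea of moving the check into the constructor can be sketched as follows. This is a simplified illustration, not the DataFusion implementation: `Decimal128Scalar`, `SketchError`, and `to_vec_of_size` are hypothetical, and the limits used (precision in 1..=38, |scale| not exceeding precision) are assumptions about what a Decimal128 accepts.

```rust
/// Hypothetical error type standing in for DataFusionError::Internal.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    Internal(String),
}

/// Assumed maximum precision for a 128-bit decimal.
pub const MAX_PRECISION: u8 = 38;

/// Hypothetical decimal scalar whose fields are private, so the only way
/// to build one is through the validating constructor.
#[derive(Debug, PartialEq)]
pub struct Decimal128Scalar {
    value: i128,
    precision: u8,
    scale: i8,
}

impl Decimal128Scalar {
    /// Rejects precision 0 (and other invalid combinations) up front.
    pub fn try_new(value: i128, precision: u8, scale: i8) -> Result<Self, SketchError> {
        if precision == 0 || precision > MAX_PRECISION || scale.unsigned_abs() > precision {
            return Err(SketchError::Internal(format!(
                "invalid precision {} and scale {} for Decimal128",
                precision, scale
            )));
        }
        Ok(Self { value, precision, scale })
    }

    /// Stand-in for to_array_of_size: it needs no Result because the
    /// invariant was already established by `try_new`.
    pub fn to_vec_of_size(&self, size: usize) -> Vec<i128> {
        vec![self.value; size]
    }
}
```

Validating once at construction keeps the error on the one fallible entry point and lets downstream helpers stay infallible, which is the "remove the impact on client code" outcome described in the bullet.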

@alamb alamb added the api change (Changes the API exposed to users of the crate) label Nov 3, 2023
@alamb alamb changed the title from "chore: remove panics in datafusion-common::scalar" to "chore: remove panics in datafusion-common::scalar by making more operations return Result" Nov 3, 2023
Contributor

@alamb alamb left a comment

First of all, thank you to @junjunjd for investing so much time, not just in the code but also evaluating the implications of the changes.

In general I think there are tradeoffs between panic and returning Errs -- specifically:

  1. Panics are not as user friendly, but they stop computation immediately when something "unexpected" happens and thus often make it easier to locate and debug the problem.
  2. Errs are more user friendly, and can return messages that may help users work around or fix whatever is wrong.

I realize there is a judgement call required to decide whether something is "expected" or not and how much information users can get from error messages vs panics, weighing a better user experience against code that is more efficient to debug.

I also realize the existing DataFusion codebase is not consistent in its handling of panics and errors.

On balance I think this PR is an improvement to the code, and therefore I think it could be merged. Users of the affected APIs can simply unwrap the Result and get the same panic behavior as before.
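
As a small sketch of that last point, assuming a hypothetical function `first_value` that stands in for any API changed from panicking to returning a Result (none of these names come from DataFusion): one caller restores the old behavior with `.expect`, another propagates the error with `?`.

```rust
/// Hypothetical error type standing in for DataFusionError::Internal.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    Internal(String),
}

/// Hypothetical API that previously panicked on empty input and now
/// returns a Result instead.
pub fn first_value(values: &[i64]) -> Result<i64, SketchError> {
    values
        .first()
        .copied()
        .ok_or_else(|| SketchError::Internal("empty input".to_string()))
}

/// Caller that wants the pre-change behavior: panic on failure.
pub fn first_value_or_panic(values: &[i64]) -> i64 {
    first_value(values).expect("caller guarantees a non-empty slice")
}

/// Caller that prefers to propagate the error to its own caller.
pub fn first_value_doubled(values: &[i64]) -> Result<i64, SketchError> {
    Ok(first_value(values)? * 2)
}
```

The cost to existing callers is one `.unwrap()` or `.expect(..)` per call site, which is the API change flagged by the label above.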

I think it would be ok not to merge it too if other reviewers feel strongly in the other direction.

@alamb
Contributor

alamb commented Nov 7, 2023

@junjunjd can you please merge main and resolve the conflicts in this PR? Then we can merge it in.

@alamb alamb marked this pull request as draft November 9, 2023 17:46
@alamb
Contributor

alamb commented Nov 9, 2023

Marking as draft as it needs conflicts resolved prior to being mergeable.

@junjunjd junjunjd force-pushed the chore/remove-panics-in-datafusion-common branch from 60d1543 to 691ebf4 Compare November 11, 2023 07:04
@github-actions github-actions bot removed the sql (SQL Planner) label Nov 11, 2023
@junjunjd junjunjd force-pushed the chore/remove-panics-in-datafusion-common branch from 691ebf4 to 2026514 Compare November 11, 2023 08:13
@junjunjd
Contributor Author

@alamb Thank you for the review! I rebased the MR. It is ready for final review/merge.

@junjunjd junjunjd marked this pull request as ready for review November 11, 2023 08:17
Contributor

@alamb alamb left a comment

Thank you @junjunjd

@alamb alamb merged commit e642cc2 into apache:main Nov 11, 2023
22 checks passed
alamb added a commit to alamb/datafusion that referenced this pull request Nov 12, 2023
Labels
api change (Changes the API exposed to users of the crate), core (Core DataFusion crate), logical-expr (Logical plan and expressions), optimizer (Optimizer rules), physical-expr (Physical Expressions)
Successfully merging this pull request may close these issues.

Review use of panic in datafusion-common crate
5 participants