Skip to content

Conversation

@retikulum
Copy link
Contributor

Which issue does this PR close?

Part of #3152.

Rationale for this change

It is very clear in the issue but there are different schemas while downcasting an ArrayRef to a related arrow array type. This is the first PR of improving downcasting to arrow array types. As it is seen, this PR is small but I am hoping to see what I am doing right and wrong.

What changes are included in this PR?

  • datafusion\common\src\cast.rs is created. Moreover, I think future downcasting functions will be written in it.
  • Refactor parts where following line is used:
as_any().downcast_ref::<Date32Array>().unwrap()
  • This commit passed related tests.

Are there any user-facing changes?

I am not sure about it.

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Oct 28, 2022
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @retikulum -- looks neat. I left some suggestions but basically this is the right track I think

what do you think @avantgardnerio ?

Comment on lines +381 to +384
DataType::Date32 => match as_date32_array($ARRAY) {
Ok(array) => Ok($FN(array)?),
Err(e) => Err(e),
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be able to make this even more concise like

Suggested change
DataType::Date32 => match as_date32_array($ARRAY) {
Ok(array) => Ok($FN(array)?),
Err(e) => Err(e),
},
DataType::Date32 => $FN(match as_date32_array($ARRAY)?),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find an appropriate way to implement this. It seems that it has to be wrapped around Ok or Err to return Result.

@avantgardnerio
Copy link
Contributor

what do you think @avantgardnerio ?

I 100% agree with the intent of the PR: even if they "never happen at run time" panics (and unwraps, etc) create cognitive load on the programmer to mentally statically prove that to themselves every time they look at the code, so we should just eliminate them to the extent possible.

As far as implementation, as I become more proficient at Rust, I increasingly see the importance of like this being discoverable... is it possible to make this change in arrow::ArrayRef? Or to do a Date32::TryFrom trait? In the sea of code I wade through every day, anything that autocompletes is a lot more likely to get used.

Copy link
Contributor

@avantgardnerio avantgardnerio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @alamb 's feedback, but the real blocker for me is the println! error swallowing.

to.set_date32(col_idx, from.value(row_idx));
match as_date32_array(from) {
Ok(from) => to.set_date32(col_idx, from.value(row_idx)),
Err(e) => println!("{}", e),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO logging here is worse than unwrapping. The function is called write_field_date32() and if a date32 wasn't written to something, abending is preferable to carrying on in an in-determinant state. (of course, returning an Err Result would be even more preferable).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree I prefer "unwrap()" rather than println -- stdout may or may not be connected to something which would be noticed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems reasonable to me. I will fix it and be more careful in future PRs.

@retikulum
Copy link
Contributor Author

Thanks @retikulum -- looks neat. I left some suggestions but basically this is the right track I think

what do you think @avantgardnerio ?

Thanks for great feedback @alamb . Your recommendations generally improve my style to write Rust. I will make necessary changes.

what do you think @avantgardnerio ?

I 100% agree with the intent of the PR: even if they "never happen at run time" panics (and unwraps, etc) create cognitive load on the programmer to mentally statically prove that to themselves every time they look at the code, so we should just eliminate them to the extent possible.

As far as implementation, as I become more proficient at Rust, I increasingly see the importance of like this being discoverable... is it possible to make this change in arrow::ArrayRef? Or to do a Date32::TryFrom trait? In the sea of code I wade through every day, anything that autocompletes is a lot more likely to get used.

Hi @avantgardnerio. Thanks for the feedback and for joining the conversation. As far as I understand, We are on right track but we need to decide how to implement those changes. Also, this is discussed in #3152. @alamb what do you think about implementation? Should we continue creating functions in datafusion\common\src\cast.rs or create a different schema? If we will follow a different path, could you also help me out with how it is implemented?

@alamb
Copy link
Contributor

alamb commented Oct 30, 2022

Thanks for great feedback @alamb . Your recommendations generally improve my style to write Rust. I will make necessary changes.

We are all learning this together! Thank you for all your contributions so far

@retikulum
Copy link
Contributor Author

Since there were no more comments, I fixed the suggested parts.

Copy link
Contributor

@avantgardnerio avantgardnerio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see several less panics in our code!

@alamb alamb merged commit eb86b07 into apache:master Oct 31, 2022
@alamb
Copy link
Contributor

alamb commented Oct 31, 2022

I agree -- we are making progress -- I figured I would merge this PR in and we can continue to improve the code as follow on PRs

@ursabot
Copy link

ursabot commented Oct 31, 2022

Benchmark runs are scheduled for baseline = bec3856 and contender = eb86b07. eb86b07 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@retikulum retikulum deleted the 3152_improve_downcast_date32array branch November 1, 2022 14:16
Dandandan pushed a commit to yuuch/arrow-datafusion that referenced this pull request Nov 5, 2022
…pache#4004)

* improve error messages for date32array

* fix clippy errors - change ok_or to ok_or_else

* changes after review
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants