-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Implement ScalarValue::Dictionary and preserve type through conversion back/forth to Array
#2891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1b0f169 to
5e08360
Compare
5e08360 to
d133fb1
Compare
701aca9 to
6044b91
Compare
| let month = match ScalarValue::try_from_array(results[0].column(1), 0)? { | ||
| ScalarValue::Utf8(Some(month)) => month, | ||
| s => panic!("Expected count as Int64 found {}", s.get_datatype()), | ||
| let s = ScalarValue::try_from_array(results[0].column(1), 0)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is because the dictionary encoding is preserved in the ScalarValue now
| } | ||
| } | ||
|
|
||
| /// return the type to use to accumulate state for the specified input type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this workaround is no longer needed 🎉
Codecov Report
@@ Coverage Diff @@
## master #2891 +/- ##
==========================================
+ Coverage 85.32% 85.33% +0.01%
==========================================
Files 273 273
Lines 49351 49472 +121
==========================================
+ Hits 42108 42217 +109
- Misses 7243 7255 +12
Continue to review full report at Codecov.
|
…ion back/forth to Array
16ecff6 to
5b96042
Compare
| /// struct of nested ScalarValue | ||
| Struct(Option<Vec<ScalarValue>>, Box<Vec<Field>>), | ||
| /// Dictionary type: index type and value | ||
| Dictionary(Box<DataType>, Box<ScalarValue>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The core change is to introduce this variant and the rest of the PR is implementing it as well as removing the partial support for Dictionaries that existed previously
| _ => unreachable!("Invalid dictionary keys type: {:?}", key_type), | ||
| } | ||
| } | ||
| // explicitly enumerate unsupported types so newly added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a while to find that ScalarValue::Dictionary was missing here due to the default _ match.
|
|
||
| check_scalar_cast(ScalarValue::Float64(None), DataType::Int16); | ||
|
|
||
| check_scalar_cast( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test would fail before this change
| } | ||
|
|
||
| #[tokio::test] | ||
| async fn project_cast_dictionary() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reproducer for #2873
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
|
Benchmark runs are scheduled for baseline = b66dda5 and contender = 60c57cc. 60c57cc is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Draft until:ScalarValuetests alongside implementation, movefrom_slicetodatafusion_core#2914 is mergedWhich issue does this PR close?
Closes #2874
Closes #2875
Closes #2873
Rationale for this change
See #2873 and #2874
TLDR is that on master,
ScalarValueloses the fact that it came from aDictionarytype so casting aScalarto aDictionaryand then converting into an array does not produce aDictionaryArray, causing #2873 and other issuesSupporting
ScalarValue::Dictionarywill also ensure the rest of the DataFusion machinery works as expected and without special case (e.g. specialized scalar value kernels for Dictionaries)What changes are included in this PR?
Implement
ScalarValue::DictionaryAre there any user-facing changes?
New type of
ScalarValue