Support GetIndexedFieldExpr for ScalarValue #2196
Conversation
)),
ColumnarValue::Scalar(scalar) => match (scalar.get_datatype(), &self.key) {
    (DataType::List(v), ScalarValue::Int64(Some(i))) => {
        let wrapper = scalar.to_array();
Should I implement a new method that converts the Vec that ScalarValue holds for List directly, instead of going through the current API and a temporary ArrayRef? For example: as_array_list()?
Thanks
cc @alamb
I wonder if initially you might be able to use ScalarValue::to_array_of_size to convert the scalar argument into an ArrayRef and then use the same code as above:
Another alternative might be to use the take kernel with something like
to_array uses to_array_of_size under the hood, but I cannot reuse the code above: it returns a ColumnarValue::Array, whereas in the ColumnarValue::Scalar case we need to return a ColumnarValue::Scalar.
I did another draft in af8c77f. WDYT?
Thanks
@alamb Sorry for the ping, just a friendly reminder that I am still waiting for advice on how to solve it. Thank you!
sorry @ovr -- taking a look
#[tokio::test]
async fn test_array_index() -> Result<()> {
    test_expression!("([5,4,3,2,1])[1]", "5");
https://www.postgresql.org/docs/current/arrays.html
I verified that the subscripts are 1-based 👍
The array subscript numbers are written within square brackets. By default PostgreSQL uses a one-based numbering convention for arrays, that is, an array of n elements starts with array[1] and ends with array[n].
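To make the one-based convention concrete, here is a minimal sketch of the lookup semantics the tests in this PR expect: subscript 1 is the first element, and a subscript outside the list yields NULL. The helper name `get_one_based` is hypothetical and is not part of DataFusion or Arrow; `None` stands in for SQL NULL.

```rust
use std::convert::TryFrom;

/// Hypothetical helper (not DataFusion's API): look up an element using a
/// one-based subscript. `None` stands in for SQL NULL.
fn get_one_based<T: Clone>(items: &[T], subscript: i64) -> Option<T> {
    // Zero and negative subscripts have no element in this model.
    if subscript < 1 {
        return None;
    }
    // Convert to a zero-based offset; `get` returns None past the end.
    let offset = usize::try_from(subscript - 1).ok()?;
    items.get(offset).cloned()
}
```

With the list `[5, 4, 3, 2, 1]`, `get_one_based(&list, 1)` gives `Some(5)` and `get_one_based(&list, 5)` gives `Some(1)`, while subscripts `100` and `-1` both give `None`. Because the helper is generic, the same logic applies to a list of lists.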
datafusion/core/tests/sql/expr.rs
Outdated
test_expression!("([5,4,3,2,1])[1]", "5");
test_expression!("([5,4,3,2,1])[5]", "1");
test_expression!("([5,4,3,2,1])[100]", "NULL");
test_expression!("([5,4,3,2,1])[-1]", "NULL");
I wonder if you want to potentially try nested lists. Something like
test_expression!("([5,4,3,2,1])[-1]", "NULL");
test_expression!("([[123],[4,5,6]])[2]", "[4,5,6]");
Right now, it's not possible to define multi-dimensional arrays via SQL.
Oh, nope, I was wrong.
ColumnarValue::Scalar(_) => Err(DataFusionError::NotImplemented(
    "field access is not yet implemented for scalar values".to_string(),
)),
ColumnarValue::Scalar(scalar) => match (scalar.get_datatype(), &self.key) {
I am not sure I fully follow this code -- since it is creating an ArrayRef from the ColumnarValue::Scalar, I wonder why it can't use the same code as the ColumnarValue::Array case and call to_arrow()?
So for example, rather than
fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
    let arg = self.arg.evaluate(batch)?;
    match arg {
        ColumnarValue::Array(array) => match (array.data_type(), &self.key) {
        ...
It could look like:
fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
    let array = self.arg.evaluate(batch)?
        // convert to ArrayRef
        .to_arrow();
    match (array.data_type(), &self.key) {
    ...
That way the same code would be used for the array and scalar cases of ColumnarValue.
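As a rough illustration of that suggestion, the sketch below uses toy stand-in types rather than DataFusion's real ones: `ToyColumnar`, `into_rows`, and `index_rows` are all hypothetical names. It shows how expanding a scalar into one value per row lets a single code path serve both variants.

```rust
/// Toy stand-in for a columnar value; purely illustrative, not DataFusion's type.
#[derive(Debug)]
enum ToyColumnar {
    /// One list per row.
    Array(Vec<Vec<i64>>),
    /// A single list that logically applies to every row.
    Scalar(Vec<i64>),
}

impl ToyColumnar {
    /// Expand to one list per row (hypothetical analogue of turning a
    /// scalar into an array of the batch's length).
    fn into_rows(self, num_rows: usize) -> Vec<Vec<i64>> {
        match self {
            ToyColumnar::Array(rows) => rows,
            ToyColumnar::Scalar(list) => vec![list; num_rows],
        }
    }
}

/// Single code path: index every row with a one-based key; None models NULL.
fn index_rows(arg: ToyColumnar, num_rows: usize, key: i64) -> Vec<Option<i64>> {
    arg.into_rows(num_rows)
        .iter()
        .map(|list| {
            if key < 1 {
                return None; // zero and negative keys yield NULL in this model
            }
            list.get((key - 1) as usize).copied()
        })
        .collect()
}
```

Note that this toy version always returns per-row results, so it does not preserve the scalar-in, scalar-out shape raised earlier in the thread; that trade-off is what the normalisation buys in exchange for the shared match arm.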
Thanks, Done.
} else {
    Ok(Field::new(&i.to_string(), lt.data_type().clone(), false))
}
Ok(Field::new(&i.to_string(), lt.data_type().clone(), true))
Rebased ✅ Changed to simplify/reuse logic across Scalar/ArrayRef. cc @alamb
andygrove
left a comment
LGTM. Thanks @ovr.
I'll go ahead and merge this later today unless @alamb has any additional feedback
alamb
left a comment
Looks good to me -- thanks @ovr

Closes: #2207
Hello!
I've opened a PR as a Draft to indicate that I am working on the task of supporting the array index operator in DF.
Breaking changes
Thanks