Proposal for comments: https://docs.google.com/document/d/1d6rV1WmvIH6uW-bcHKrYBSyPddrpXH8Q4CtVfFHtI04/edit?usp=sharing
(dump of the document above)
Rust Arrow supports two main computational models:
- Batch operations, which leverage some form of vectorization
- Element-by-element operations, which arise in more complex operations
This document concerns element-by-element operations, which are common outside of the library (and sometimes inside it).
These operations are typically written as:
1. Downcast the array to its specific type
2. Initialize buffers
3. Iterate over indices and perform the operation, appending to the buffers accordingly
4. Create `ArrayData` with the required null bitmap, buffers, children, etc.
5. Return an `ArrayRef` built from the `ArrayData`
We can split this process into three parts:
- Initialization (steps 1 and 2)
- Iteration (step 3)
- Finalization (steps 4 and 5)
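The three phases can be illustrated with a std-only sketch. Everything in it (`ToyArray`, `ToyInt32Array`, `ToyArrayRef`, `add_one`) is a hypothetical stand-in invented for illustration and is not the arrow crate's API; a plain `Vec<bool>` plays the role of the null bitmap.

```rust
use std::any::Any;
use std::sync::Arc;

/// Hypothetical stand-in for an array trait; NOT the arrow crate's `Array`.
trait ToyArray {
    fn as_any(&self) -> &dyn Any;
    fn len(&self) -> usize;
}

/// Toy nullable i32 array: a values buffer plus a validity buffer.
#[derive(Debug, PartialEq)]
struct ToyInt32Array {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl ToyArray for ToyInt32Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.values.len()
    }
}

/// Hypothetical stand-in for `ArrayRef`.
type ToyArrayRef = Arc<dyn ToyArray>;

/// Adds 1 to every valid slot; returns `None` if the input has another type.
fn add_one(array: &ToyArrayRef) -> Option<ToyArrayRef> {
    // Initialization: downcast (step 1) and allocate the buffers (step 2).
    let array = array.as_any().downcast_ref::<ToyInt32Array>()?;
    let mut values = Vec::with_capacity(array.len());
    let mut validity = Vec::with_capacity(array.len());

    // Iteration (step 3): visit each index and append to the buffers.
    for i in 0..array.len() {
        if array.validity[i] {
            values.push(array.values[i] + 1);
            validity.push(true);
        } else {
            values.push(0); // placeholder value for a null slot
            validity.push(false);
        }
    }

    // Finalization (steps 4 and 5): assemble the array and wrap it in an Arc.
    let result: ToyArrayRef = Arc::new(ToyInt32Array { values, validity });
    Some(result)
}
```

Even in this reduced form, the caller has to keep the values and validity buffers in sync by hand, which is the kind of bookkeeping an iterator-based API could hide.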
Currently, the API that we offer to our users is:
- `as_any()` to downcast the array based on its `DataType`
- Builders for all types, which users can initialize to match the downcast array
- Iteration over indices using `Array::value(i)` and `Array::is_valid(i)`/`is_null(i)`
- Finishing the builder and wrapping the result in an `Arc`
This API has some issues:
- `value(i)` is unsafe, even though it is not marked as such
- Builders are usually slow due to the checks that they need to perform
- The API is not intuitive
This proposal aims to improve this API in two specific ways:
- Implement `IntoIterator` for arrays, yielding `Iterator<Item=T>` and `Iterator<Item=Option<T>>`
- Implement `FromIterator<T>` and `FromIterator<Option<T>>`
so that users can write:

    use std::sync::Arc;
    use arrow::array::{Array, ArrayRef, Int32Array};

    // incoming array
    let array = Int32Array::from(vec![Some(0), None, Some(2), None, Some(4)]);
    let array = Arc::new(array) as ArrayRef;
    let array = array.as_any().downcast_ref::<Int32Array>().unwrap();

    // to and from iter, with a +1 applied to every non-null element
    let result: Int32Array = array
        .iter()
        .map(|e| e.map(|v| v + 1))
        .collect();

    let expected = Int32Array::from(vec![Some(1), None, Some(3), None, Some(5)]);
    assert_eq!(result, expected);
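To make the two trait implementations more concrete, here is a std-only sketch of how they might look on a toy type. `NullableInt32` and `NullableIter` are hypothetical names invented for this illustration; the layout (a `Vec<i32>` of values plus a `Vec<bool>` validity buffer) is an assumption and does not reflect how the arrow crate stores data.

```rust
/// Hypothetical stand-in for a nullable primitive array; NOT the arrow crate's type.
#[derive(Debug, PartialEq)]
struct NullableInt32 {
    values: Vec<i32>,
    validity: Vec<bool>,
}

/// Iterator over the slots of a `NullableInt32`, yielding `None` for nulls.
struct NullableIter<'a> {
    array: &'a NullableInt32,
    index: usize,
}

impl<'a> Iterator for NullableIter<'a> {
    type Item = Option<i32>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.index >= self.array.values.len() {
            return None; // end of the array
        }
        let i = self.index;
        self.index += 1;
        // The validity check happens here, once, instead of in user code.
        Some(if self.array.validity[i] {
            Some(self.array.values[i])
        } else {
            None
        })
    }
}

impl<'a> IntoIterator for &'a NullableInt32 {
    type Item = Option<i32>;
    type IntoIter = NullableIter<'a>;

    fn into_iter(self) -> Self::IntoIter {
        NullableIter { array: self, index: 0 }
    }
}

impl NullableInt32 {
    /// Convenience wrapper around `IntoIterator` for `&NullableInt32`.
    fn iter(&self) -> NullableIter<'_> {
        self.into_iter()
    }
}

/// Build from optional items: `None` becomes a null slot.
impl FromIterator<Option<i32>> for NullableInt32 {
    fn from_iter<I: IntoIterator<Item = Option<i32>>>(iter: I) -> Self {
        let iter = iter.into_iter();
        let (lower, _) = iter.size_hint();
        let mut values = Vec::with_capacity(lower);
        let mut validity = Vec::with_capacity(lower);
        for item in iter {
            validity.push(item.is_some());
            values.push(item.unwrap_or_default()); // 0 as placeholder for nulls
        }
        NullableInt32 { values, validity }
    }
}

/// Build from plain items: every slot is valid.
impl FromIterator<i32> for NullableInt32 {
    fn from_iter<I: IntoIterator<Item = i32>>(iter: I) -> Self {
        let values: Vec<i32> = iter.into_iter().collect();
        let validity = vec![true; values.len()];
        NullableInt32 { values, validity }
    }
}
```

With these in place, `array.iter().map(|e| e.map(|v| v + 1)).collect::<NullableInt32>()` covers initialization, iteration and finalization in a single expression, and user code never indexes into the buffers directly.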
This results in an API based on `FromIterator` implementations that are efficient in populating the buffers, child data, etc. from an iterator.
Reporter: Jorge Leitão / @jorgecarleitao
Assignee: Jorge Leitão / @jorgecarleitao
Note: This issue was originally created as ARROW-10030. Please see the migration documentation for further details.
Jorge Leitão / @jorgecarleitao: Issue resolved by pull request #8211