[Rust] Support fromIter and toIter #26052

Closed
asfimport opened this issue Sep 17, 2020 · 1 comment

Comments

@asfimport
Collaborator

Proposal for comments: https://docs.google.com/document/d/1d6rV1WmvIH6uW-bcHKrYBSyPddrpXH8Q4CtVfFHtI04/edit?usp=sharing

(dump of the document above)

Rust Arrow supports two main computational models:

  1. Batch operations, which leverage some form of vectorization

  2. Element-by-element operations, which arise in more complex computations

    This document concerns element-by-element operations, which are common outside the library (and sometimes inside it).

    Element-by-element operations

    These operations are programmatically written as:

  1. Downcast the array to its specific type

  2. Initialize buffers

  3. Iterate over indices and perform the operation, appending to the buffers accordingly

  4. Create ArrayData with the required null bitmap, buffers, children, etc.

  5. Return an ArrayRef from the ArrayData

     

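    We can split this process into three parts:

  1. Initialization (steps 1 and 2: downcasting and initializing the buffers)

  2. Iteration (step 3: the loop over indices)

  3. Finalization (steps 4 and 5: creating the ArrayData and returning the ArrayRef)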
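
    The three phases can be sketched with standard-library types only. Everything named here (ToyArray, ToyInt32Array, is_set, add_one) is a hypothetical stand-in invented for illustration, not the arrow crate's API; the packed validity bitmap is an assumption modelled loosely on how Arrow tracks nulls.

```rust
use std::any::Any;

// Hypothetical stand-in for an array trait; not the arrow crate's API.
trait ToyArray {
    fn as_any(&self) -> &dyn Any;
}

// Toy nullable i32 array: one value slot per element plus a packed validity
// bitmap (bit i set means element i is non-null).
#[derive(Debug, PartialEq)]
struct ToyInt32Array {
    values: Vec<i32>,
    validity: Vec<u8>,
    len: usize,
}

impl ToyArray for ToyInt32Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Reads bit i of a packed bitmap.
fn is_set(bitmap: &[u8], i: usize) -> bool {
    bitmap[i / 8] & (1u8 << (i % 8)) != 0
}

// Adds one to every non-null element; returns None if the downcast fails.
fn add_one(array: &dyn ToyArray) -> Option<ToyInt32Array> {
    // Initialization: downcast to the concrete type and allocate the buffers.
    let array = array.as_any().downcast_ref::<ToyInt32Array>()?;
    let mut values = Vec::with_capacity(array.len);
    let mut validity = vec![0u8; (array.len + 7) / 8];

    // Iteration: visit every index and append to the buffers.
    for i in 0..array.len {
        if is_set(&array.validity, i) {
            values.push(array.values[i] + 1);
            validity[i / 8] |= 1u8 << (i % 8);
        } else {
            values.push(0); // placeholder slot for a null
        }
    }

    // Finalization: assemble the resulting array from its parts.
    Some(ToyInt32Array { values, validity, len: array.len })
}
```

    Even in this toy form, the caller has to keep the values buffer and the bitmap in sync by hand, which is the bookkeeping the rest of the document tries to hide.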

    Currently, the API that we offer to our users is:

  1. as_any() to downcast the array based on its DataType

  2. Builders for all types, which users can initialize to match the downcast array

  3. Iterate

    1. Use for i in 0..array.len()
    2. Use Array::value(i) and Array::is_valid(i)/is_null(i)
    3. Use builder.append_value(new_value) or builder.append_null()
  4. Finish the builder and wrap the result in an Arc
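
    As a rough illustration of the shape of this workflow, here is a minimal sketch using standard-library types only. ToyArray, ToyInt32Array and ToyInt32Builder are hypothetical toy types that borrow the method names listed above; they are not the arrow crate's types, and the real signatures may differ between versions.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical toy trait mirroring the downcast entry point described above.
trait ToyArray {
    fn as_any(&self) -> &dyn Any;
    fn len(&self) -> usize;
}

#[derive(Debug, PartialEq)]
struct ToyInt32Array {
    slots: Vec<Option<i32>>,
}

impl ToyInt32Array {
    fn is_valid(&self, i: usize) -> bool {
        self.slots[i].is_some()
    }
    // Returns a default value for null slots, so callers must check
    // is_valid(i) before trusting the result.
    fn value(&self, i: usize) -> i32 {
        self.slots[i].unwrap_or_default()
    }
}

impl ToyArray for ToyInt32Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.slots.len()
    }
}

// Hypothetical toy builder with the append/finish shape described above.
#[derive(Default)]
struct ToyInt32Builder {
    slots: Vec<Option<i32>>,
}

impl ToyInt32Builder {
    fn append_value(&mut self, v: i32) {
        self.slots.push(Some(v));
    }
    fn append_null(&mut self) {
        self.slots.push(None);
    }
    fn finish(self) -> ToyInt32Array {
        ToyInt32Array { slots: self.slots }
    }
}

fn add_one(array: &Arc<dyn ToyArray>) -> Option<Arc<dyn ToyArray>> {
    // 1. Downcast to the concrete type.
    let array = array.as_any().downcast_ref::<ToyInt32Array>()?;
    // 2. Initialize a builder matching the downcast type.
    let mut builder = ToyInt32Builder::default();
    // 3. Iterate over indices, checking validity before reading each value.
    for i in 0..array.len() {
        if array.is_valid(i) {
            builder.append_value(array.value(i) + 1);
        } else {
            builder.append_null();
        }
    }
    // 4. Finish the builder and wrap the result in an Arc.
    let result: Arc<dyn ToyArray> = Arc::new(builder.finish());
    Some(result)
}
```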

    This API has some issues:

  1. value(i) is unsafe, even though it is not marked as such

  2. Builders are usually slow due to the checks that they need to perform

  3. The API is not intuitive

    Proposal

    This proposal aims to improve this API in two specific ways:

  • Implement IntoIterator, yielding Iterator<Item=T> and Iterator<Item=Option<T>>

  • Implement FromIterator for Item=T and Item=Option<T>

    so that users can write:

    use std::sync::Arc;
    use arrow::array::{Array, ArrayRef, Int32Array};

    // incoming array, type-erased and then downcast back to its concrete type
    let array = Int32Array::from(vec![Some(0), None, Some(2), None, Some(4)]);
    let array = Arc::new(array) as ArrayRef;
    let array = array.as_any().downcast_ref::<Int32Array>().unwrap();

    // to and from iter, with a +1; nulls pass through unchanged
    let result: Int32Array = array
        .iter()
        .map(|e| e.map(|r| r + 1))
        .collect();

    let expected = Int32Array::from(vec![Some(1), None, Some(3), None, Some(5)]);

    assert_eq!(result, expected);

     

    This results in an API that is:

  1. Efficient, as it is our responsibility to write FromIterator implementations that efficiently populate the buffers, child data, etc. from an iterator
  2. Safe, as it does not allow segfaults
  3. Simple, as users do not need to worry about builders, buffers, etc., only native Rust
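
To make the idea concrete, the following is a minimal sketch of what a FromIterator implementation and an iter() method could look like, using standard-library types only. ToyInt32Array and its layout (a values buffer plus one validity flag per slot) are assumptions made for illustration; this is not the implementation in the arrow crate or in the linked pull request.

```rust
use std::iter::FromIterator;

// Hypothetical toy array (not the arrow crate's Int32Array): a values buffer
// plus one validity flag per slot.
#[derive(Debug, PartialEq)]
struct ToyInt32Array {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl ToyInt32Array {
    // Yields Some(value) for valid slots and None for nulls.
    fn iter(&self) -> impl Iterator<Item = Option<i32>> + '_ {
        self.values
            .iter()
            .zip(self.validity.iter())
            .map(|(value, valid)| if *valid { Some(*value) } else { None })
    }
}

impl FromIterator<Option<i32>> for ToyInt32Array {
    fn from_iter<I: IntoIterator<Item = Option<i32>>>(iter: I) -> Self {
        let iter = iter.into_iter();
        // Reserve from the lower size bound so that iterators with a known
        // length fill the buffers without reallocating.
        let (lower, _) = iter.size_hint();
        let mut values = Vec::with_capacity(lower);
        let mut validity = Vec::with_capacity(lower);
        for item in iter {
            validity.push(item.is_some());
            // Null slots store a default value so both buffers stay aligned.
            values.push(item.unwrap_or_default());
        }
        ToyInt32Array { values, validity }
    }
}

// The element-by-element operation reduces to ordinary iterator code.
fn add_one(array: &ToyInt32Array) -> ToyInt32Array {
    array.iter().map(|e| e.map(|v| v + 1)).collect()
}
```

In this sketch the buffer handling lives entirely inside from_iter, so the caller never indexes into the array or touches a builder.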

Reporter: Jorge Leitão / @jorgecarleitao
Assignee: Jorge Leitão / @jorgecarleitao

PRs and other links:

Note: This issue was originally created as ARROW-10030. Please see the migration documentation for further details.

@asfimport
Collaborator Author

Jorge Leitão / @jorgecarleitao:
Issue resolved by pull request 8211
#8211

@asfimport asfimport added this to the 2.0.0 milestone Jan 11, 2023