
[Rust] String and Binary arrays created with incorrect length from unbound iterator #27650

Closed
asfimport opened this issue Feb 26, 2021 · 1 comment


@asfimport
Collaborator

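While looking for a way to make loading array data from Parquet files faster, I stumbled on an edge case: string and binary arrays are created with an incorrect length when collected from an iterator whose size hint does not match the number of items it actually yields (here, an iterator with no upper bound of its own that is then capped with take()).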

Here is a simple example:

    use arrow::array::{Array, StringArray};

    #[test]
    fn string_array_len_from_inexact_size_hint() {
        // iterator that doesn't declare an (upper) size bound of its own
        let string_iter = (0..)
            .scan(0usize, |pos, i| {
                if *pos < 10 {
                    *pos += 1;
                    Some(Some(format!("value {}", i)))
                } else {
                    // stops after yielding 10 values
                    None
                }
            })
            // limited using take()
            .take(100);

        let (lower_size_bound, upper_size_bound) = string_iter.size_hint();
        assert_eq!(lower_size_bound, 0);
        // the upper bound, defined by take() above, is 100
        assert_eq!(upper_size_bound, Some(100));
        let string_array: StringArray = string_iter.collect();
        // but the actual number of items in the array is 10
        assert_eq!(string_array.len(), 10);
    }

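Fortunately, this is easy to fix by taking the array length from the child offset array (which holds one more entry than there are values) rather than from the iterator's size hint. I will be creating a PR for this shortly.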
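The idea behind the fix can be illustrated outside the arrow crate. The following is a minimal, std-only sketch assuming a simplified offsets-based layout; it is not the code from the PR, and MiniStringArray and capped_string_iter are hypothetical names, not arrow APIs. It uses the size hint only as an allocation hint and derives the length from the offsets actually written. The same reasoning applies to binary arrays, with byte slices in place of strings.

```rust
use std::iter::FromIterator;

/// Hypothetical, simplified stand-in for an Arrow string array (not the
/// arrow crate's type): a values buffer, an offsets buffer with one more
/// entry than there are slots, and a validity flag per slot.
pub struct MiniStringArray {
    values: Vec<u8>,
    offsets: Vec<i32>,
    validity: Vec<bool>,
}

impl MiniStringArray {
    /// The length comes from the offsets actually written, not from a size hint.
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    pub fn is_empty(&self) -> bool {
        self.len() == 0
    }

    /// Returns the string in slot `i`, or `None` if the slot is null.
    pub fn value(&self, i: usize) -> Option<&str> {
        if !self.validity[i] {
            return None;
        }
        let (start, end) = (self.offsets[i] as usize, self.offsets[i + 1] as usize);
        // Only whole &str values were appended, so each slice is valid UTF-8.
        Some(std::str::from_utf8(&self.values[start..end]).unwrap())
    }
}

impl<Ptr: AsRef<str>> FromIterator<Option<Ptr>> for MiniStringArray {
    fn from_iter<I: IntoIterator<Item = Option<Ptr>>>(iter: I) -> Self {
        let iter = iter.into_iter();
        // The upper bound of the size hint may overestimate the item count
        // (or be absent), so only the lower bound is used, and only to pre-allocate.
        let (lower, _) = iter.size_hint();
        let mut offsets = Vec::with_capacity(lower.saturating_add(1));
        let mut validity = Vec::with_capacity(lower);
        let mut values = Vec::new();
        offsets.push(0i32);
        for item in iter {
            match item {
                Some(s) => {
                    values.extend_from_slice(s.as_ref().as_bytes());
                    validity.push(true);
                }
                None => validity.push(false),
            }
            // Exactly one new offset per slot, null or not. i32 offsets as in
            // Arrow's Utf8 type; overflow past i32::MAX bytes is not checked here.
            offsets.push(values.len() as i32);
        }
        MiniStringArray { values, offsets, validity }
    }
}

/// Reproduces the iterator from the report: size_hint() is (0, Some(100)),
/// but only 10 items are yielded.
pub fn capped_string_iter() -> impl Iterator<Item = Option<String>> {
    (0..)
        .scan(0usize, |pos, i| {
            if *pos < 10 {
                *pos += 1;
                Some(Some(format!("value {}", i)))
            } else {
                None
            }
        })
        .take(100)
}
```

Collecting capped_string_iter() into MiniStringArray gives len() == 10, the count the report's assertion expects, even though the size hint advertises an upper bound of 100.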

Reporter: Yordan Pavlov / @yordan-pavlov
Assignee: Yordan Pavlov / @yordan-pavlov

PRs and other links:

Note: This issue was originally created as ARROW-11799. Please see the migration documentation for further details.

@asfimport
Collaborator Author

Andrew Lamb / @alamb:
Issue resolved by pull request #9588.

asfimport added this to the 4.0.0 milestone on Jan 11, 2023.