While looking for a way to make loading array data from parquet files faster, I stumbled on an edge case where string and binary arrays are created with an incorrect length when they are built from an iterator that does not report an exact upper size bound.
Here is a simple example:
use arrow::array::{Array, StringArray};

// iterator that doesn't declare (upper) size bound
let string_iter = (0..).scan(0usize, |pos, i| {
    if *pos < 10 {
        *pos += 1;
        Some(Some(format!("value {}", i)))
    } else {
        // actually returns up to 10 values
        None
    }
})
// limited using take()
.take(100);

let (lower_size_bound, upper_size_bound) = string_iter.size_hint();
assert_eq!(lower_size_bound, 0);
// the upper bound, defined by take above, is 100
assert_eq!(upper_size_bound, Some(100));

let string_array: StringArray = string_iter.collect();
// but the actual number of items in the array is 10
assert_eq!(string_array.len(), 10);
Fortunately, this is easy to fix by using the length of the child offset array, and I will create a PR for this shortly.
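To illustrate the idea behind the fix, here is a minimal standalone sketch that uses only the standard library. It is not the arrow crate's implementation: `build_offsets_and_values` and `short_iter` are hypothetical names introduced for this example, and the validity (null) bitmap is left out for brevity. The point is that the length is derived from the number of offsets actually written, while `size_hint()` is used only to reserve capacity.

```rust
/// Hypothetical helper (not part of the arrow crate): collects an iterator of
/// optional strings into an offsets vector and a flat byte buffer, and
/// returns the item count derived from the offsets.
fn build_offsets_and_values<I>(iter: I) -> (Vec<i32>, Vec<u8>, usize)
where
    I: Iterator<Item = Option<String>>,
{
    // size_hint() is only a hint: use it to reserve capacity, never as the length.
    let (lower, _upper) = iter.size_hint();
    let mut offsets: Vec<i32> = Vec::with_capacity(lower + 1);
    let mut values: Vec<u8> = Vec::new();

    // Offsets always start at 0; each item appends one end offset.
    offsets.push(0);
    for item in iter {
        if let Some(s) = item {
            values.extend_from_slice(s.as_bytes());
        }
        // A null item repeats the previous offset (zero-length slot).
        offsets.push(values.len() as i32);
    }

    // The real length is the number of offsets minus the leading 0.
    let len = offsets.len() - 1;
    (offsets, values, len)
}

/// Same shape as the iterator in the report: the size hint is (0, Some(100)),
/// but only 10 items are ever yielded.
fn short_iter() -> impl Iterator<Item = Option<String>> {
    (0..)
        .scan(0usize, |pos, i| {
            if *pos < 10 {
                *pos += 1;
                Some(Some(format!("value {}", i)))
            } else {
                None
            }
        })
        .take(100)
}
```

Calling `build_offsets_and_values(short_iter())` yields a length of 10 regardless of the upper bound that `take(100)` reports, because the count comes from what was actually consumed.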
Reporter: Yordan Pavlov / @yordan-pavlov
Assignee: Yordan Pavlov / @yordan-pavlov
PRs and other links:
Note: This issue was originally created as ARROW-11799. Please see the migration documentation for further details.