Skip to content

Conversation

de-vri-es
Copy link
Contributor

This PR changes the way the initial capacity is calculated for PyArray::from_iterator.

If size_hint gives an upper bound, that upper bound is used as the initial capacity. Else, either the minimum bound from the size hint or the original 512 / size_of::<T>() is used, whichever is larger.

Additionally, the PR fixes an off-by-one in the resize condition. The old check would resize the array one element too soon. That meant doubling the array size unnecessarily, even if the size hint gave a correct upper bound .

With these changes, from_iterator is now pretty much equally performant as from_exact_iter. The only remaining difference is that the loop still has to check if the capacity is exceeded, since we can't rely on the size hint for memory safety. Otherwise I would have considered deprecating that function entirely.

Even though the size hint is not guaranteed to be correct,
it is a better than a fixed guess.
kngwyu
kngwyu previously approved these changes Nov 15, 2019
Copy link
Member

@kngwyu kngwyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@kngwyu kngwyu dismissed their stale review November 15, 2019 07:44

Noticed what to be fixed

@kngwyu
Copy link
Member

kngwyu commented Nov 15, 2019

Thank you for your PR.
I don't know much about how size_hint is used, but is there no case where max_len is really large(e.g., (usize::MAX, Some(10)), which is problematic?

Additionally, the PR fixes an off-by-one in the resize condition. The old check would resize the array one element too soon. That meant doubling the array size unnecessarily, even if the size hint gave a correct upper bound .

Hoops, thank you for the fix.

@de-vri-es
Copy link
Contributor Author

I don't know much about how size_hint is used, but is there no case where max_len is really large(e.g., (usize::MAX, Some(10)), which is problematic?

Theoretically, size_hint can return anything. However, an incorrect large upper or lower bound would be a buggy implementation. size_hint is intended exactly for reserving capacity. From the standard library documentation:

size_hint() is primarily intended to be used for optimizations such as reserving space for the elements of the iterator, but must not be trusted to e.g., omit bounds checks in unsafe code. An incorrect implementation of size_hint() should not lead to memory safety violations.

The default implementation of Iterator::size_hint returns (0, None) which still results in the old initial capacity. If an implementation does return incorrect large bounds, I would consider that a bug for that Iterator, not something we should protect against here.

@kngwyu
Copy link
Member

kngwyu commented Nov 16, 2019

That makes sense, thank you for explanation 👍

@kngwyu kngwyu merged commit 15e15aa into PyO3:master Nov 16, 2019
@de-vri-es
Copy link
Contributor Author

Thanks for the review :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants