Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix length of Iterators.Stateful with underlying mutable iterator #45924

Merged
merged 13 commits into from
Sep 2, 2022

Conversation

jakobnissen
Copy link
Contributor

Previously, Iterators.Stateful kept track of how many elements have been taken,
and reported its length as the length of the underlying iterator minus the
number of taken elements.
However, this assumes the underlying iterator is immutable. If it is mutable,
then the reported length is the same as the number of remaining elements,
meaning the wrapper Stateful should not subtract the number of taken elements.

After this PR, Stateful instead stores the length of the underlying iterator,
if that can be known, and subtracts one every time an element is taken. Hence,
the number of remaining elements is being kept track of directly.

Fixes #43245

Copy link
Sponsor Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think it might remain useful to keep the prior behavior also, by using the signbit instead of a Union. Commented my suggestions on how to do that.

base/iterators.jl Outdated Show resolved Hide resolved
base/iterators.jl Outdated Show resolved Hide resolved
Comment on lines 1403 to 1406
rem = s.remaining
if rem !== nothing
s.remaining = rem - 1
end
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rem = s.remaining
if rem !== nothing
s.remaining = rem - 1
end
rem = s.remaining
@assert !iszero(rem)
s.remaining = rem - 1

base/iterators.jl Outdated Show resolved Hide resolved
@jakobnissen
Copy link
Contributor Author

jakobnissen commented Jul 5, 2022

I can make these changes, no problem. But I don't quite understand:

  • What is the advantage of having remaining be a type param instead of just an Int? Given that it will only be used to store an integer? Is it to prevent overflow for iterators who return their length as e.g. BigInt?
  • What is the purpose of keeping the old behaviour? As far as I can tell, all it does it to defer the call to length when the wrapper iterator has no length from instatiation of Stateful until length(::Stateful) is called. It will error either way.
    Edit: Ah, I guess it's to not require that the underlying iterator actually implements length, like the CharStr in the tests I had to modify to make it work. For backwards compatibility?

@vtjnash
Copy link
Sponsor Member

vtjnash commented Jul 5, 2022

  • What is the advantage of having remaining be a type param instead of just an Int? Given that it will only be used to store an integer? Is it to prevent overflow for iterators who return their length as e.g. BigInt?

Mostly just to avoid any possible regression here. Though length is typically expected to be an Int, it isn't strictly required. Someone might theoretically pair it with a truncating iterator (such as via zip), making the length the shorter of this and the other.

  • What is the purpose of keeping the old behaviour? As far as I can tell, all it does it to defer the call to length when the wrapper iterator has no length from instatiation of Stateful until length(::Stateful) is called. It will error either way.
    Edit: Ah, I guess it's to not require that the underlying iterator actually implements length, like the CharStr in the tests I had to modify to make it work. For backwards compatibility?

Yeah, that was my hope

base/iterators.jl Outdated Show resolved Hide resolved
base/iterators.jl Outdated Show resolved Hide resolved
jakobnissen and others added 9 commits July 17, 2022 11:45
Previously, Iterators.Stateful kept track of how many elements have been taken,
and reported its length as the length of the underlying iterator minus the
number of taken elements.
However, this assumes the underlying iterator is immutable. If it is mutable,
then the reported length is the same as the number of remaining elements,
meaning the wrapper Stateful should not subtract the number of taken elements.

After this PR, Stateful instead stores the length of the underlying iterator,
if that can be known, and subtracts one every time an element is taken. Hence,
the number of remaining elements is being kept track of directly.

Fixes JuliaLang#43245
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
@jakobnissen
Copy link
Contributor Author

@vtjnash I think this should be good to go, and would be nice to get in v1.9 :)

@vtjnash vtjnash merged commit 6d246df into JuliaLang:master Sep 2, 2022
@vtjnash
Copy link
Sponsor Member

vtjnash commented Sep 2, 2022

Thanks for the bump

@jakobnissen jakobnissen deleted the statefullen branch September 2, 2022 20:52
@aviatesk
Copy link
Sponsor Member

Bisected that this introduced a regression reported at #46719.

jakobnissen added a commit to jakobnissen/julia that referenced this pull request Oct 11, 2022
PR JuliaLang#45924 fixed length of Stateful, but this change requires Stateful to call
length on its wrapped iterator on instantiation.
The current implementation of cmp(::AbstractString, ::AbstractString) wraps the
strings in Stateful to efficiently check if one string is longer than the other
without needing an O(N) call to length.
However, with JuliaLang#45924 this length call is done anyway, which led to a performance
regression.

In this PR, the cmp method is changed so it no longer relies on Stateful to do
the same thing.

Fix JuliaLang#46719
aviatesk pushed a commit that referenced this pull request Oct 12, 2022
PR #45924 fixed length of `Stateful`, but this change requires `Stateful` to call
`length` on its wrapped iterator on instantiation.
The current implementation of `cmp(::AbstractString, ::AbstractString)` wraps the
strings in `Stateful` to efficiently check if one string is longer than the other
without needing an O(N) call to `length`.
However, with #45924 this length call is done anyway, which led to a performance
regression.

In this PR, the `cmp` method is changed so it no longer relies on `Stateful` to do
the same thing.

Fix #46719
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: Length of iterators.Stateful can be wrong
3 participants