Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterator: remove `rewind`. Implement `cycle` by storing elements in an array #7440

Merged

Conversation

@asterite
Copy link
Member

commented Feb 15, 2019

This is a controversial PR. It removes the #rewind method from all existing Iterators.

The reasons for this PR are explained in this comment but I'll explain it again here.

The reason why I originally added Iterator#rewind was:

  • I saw that Ruby has Enumerable#cycle
  • I thought of a way to implement it and the first idea I had was to be able to rewind iterators

Later on I found our that cycle was actually implemented in Ruby by building an internal Array of the elements in the Enumerable.

I think the advantages of implementing cycle like this, and removing rewind are:

  • some iterators can't be rewinded (like Iterator.of)
  • some iterators can be rewinded, but they actually don't end up rewinding anything (this is case of STDIN.each_line, which doesn't give an error, we can't detect the error and it simply works wrong)
  • implementing Iterator is simple: just implement next, no need to think about rewinding

This is a breaking change. However, I seriously doubt anyone out there is using rewind on Iterator (but maybe they are using cycle). I'm not sure, of course, but in any case I'd prefer to remove it.

@Sija

This comment has been minimized.

Copy link
Contributor

commented Feb 16, 2019

This is a breaking change. However, I seriously doubt anyone out there is using rewind on Iterator (but maybe they are using cycle). I'm not sure, of course, but in any case I'd prefer to remove it.

https://github.com/search?q=rewind+-filename%3Aiterator_spec.cr&l=Crystal&type=Code

There's a bit of 'em, although it's debatable how much of the results are meaningful.

@asterite

This comment has been minimized.

Copy link
Member Author

commented Feb 16, 2019

Thanks Sija. All the result I saw seem to be IO#rewind or implementations of rewind (sometimes called in initialize, but that's easy to fix), but none of them seem to be actual usages of the rewind method. This further confirms my belief that rewind is there just because it needs to be there, not because it's actually useful (other than for implementing cycle in a different way).

@asterite

This comment has been minimized.

Copy link
Member Author

commented Feb 16, 2019

Let's wait Brian before merging this.

@asterite asterite added this to the 0.28.0 milestone Feb 16, 2019

@jhass jhass requested a review from bcardiff Feb 18, 2019

@bcardiff
Copy link
Member

left a comment

I like this change. The implementation is more composable, there is less code and the results are more consistent despite the nature of the iterator. 💯

@asterite asterite merged commit c3f676d into crystal-lang:master Feb 26, 2019

5 checks passed

ci/circleci: check_format Your tests passed on CircleCI!
Details
ci/circleci: test_darwin Your tests passed on CircleCI!
Details
ci/circleci: test_linux Your tests passed on CircleCI!
Details
ci/circleci: test_linux32 Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@bcardiff

This comment has been minimized.

Copy link
Member

commented Feb 26, 2019

A final note :-) Buffered#rewind still exists and is fine to be used as long as unbuffered_rewind is implemented. So some cases of IO based rewing can still work.

@asterite asterite deleted the asterite:feature/iterator-no-more-rewind branch Mar 30, 2019

@baelter

This comment has been minimized.

Copy link

commented May 6, 2019

I was using this to get the size of an iterator without having to allocate an array for the elements.
E.g.

size = itr.size
itr.rewind
...

Can this be done in any other way?

@bcardiff

This comment has been minimized.

Copy link
Member

commented May 7, 2019

@baelter You could see if the specific iterator support #rewind (or #dup maybe) and but have a fallback when you don't/can't know the size.

Something like:

if itr.responds_to?(:rewind)
  # acurate size hint 
  size = itr.size
  itr.rewind
  a = Array(T).new(size)
else
  # no size hint
  a = Array(T).new
end

# use `a` and and `itr`

But depending on the use case just collection the items in the array might be enough. If you definitely needs the whole set of data there is probably no improvement by using an iterator.

@baelter

This comment has been minimized.

Copy link

commented May 8, 2019

I don't need all the items, so I would prefer to not have to allocate memory for new array for them only to get the size of the collection. I suppose I could iterate everything and count and only emit objects where needed.

#duped iterators have the same state as the original it seems like.

@asterite

This comment has been minimized.

Copy link
Member Author

commented May 8, 2019

Maybe you could describe your entire use case? It seems pretty weird to me.

@Sija

This comment has been minimized.

Copy link
Contributor

commented May 8, 2019

@asterite btw, I had a similar issue recently, could you please take a look at my approach in Sija/retriable.cr@8cd9986 - does it make sense / is there a room for improvement?

@asterite

This comment has been minimized.

Copy link
Member Author

commented May 8, 2019

Are the items there like thousands or millions? Is it an infinite list? Because if not, will the code work equally well with an Array? I think all of this might be a case of premature optimization.

@baelter

This comment has been minimized.

Copy link

commented May 9, 2019

Use case is a huge collection, potentially millions, of largeish objects that are being exposed in a paged json api. I need the have the totalt count but only give a small subset to the json builder.

I want to only use an iterator all the way to avoid allocating an array for the entire collection.

I've found a working solution in iterating the entire collection and counting the total while only building json for the page window. Code was nicer with rewind though.

@asterite

This comment has been minimized.

Copy link
Member Author

commented May 9, 2019

One way to do it is:

def results_and_total(input)
  iter = input.each
  results = iter.first(10).to_a
  total = results.size + iter.size
  p! input, results, total
end

results_and_total(1..100)
results_and_total(1..8)

That is, you take what you need, and then you get the size of the remaining stuff.

This isn't longer code than using rewind, and it's also faster because you consume the iterator just once.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.