New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add yielding line number to {String,IO}#each_line #7275

Open
wants to merge 3 commits into
base: master
from

Conversation

Projects
None yet
7 participants
@straight-shoota
Copy link
Member

straight-shoota commented Jan 5, 2019

When iterating through lines you often need a line number. This PR adds them as a second (optional) argument to the block provided to String#each_line, IO#each_line and File.each_line.
In the updated API docs I removed unrelated features upcase and reverse from the examples.

Alternatively, it could be a separate method called #each_line_with_number but I think the proposed option is better.

A second commit in this PR refactors related specs in io_spec.cr which I noticed could be simplified while adding the specs for yielding a line number.

Show resolved Hide resolved src/io.cr Outdated
@@ -908,19 +908,30 @@ abstract class IO
# ```
# io = IO::Memory.new("hello\nworld")
# io.each_line do |line|
# puts line.chomp.reverse
# puts line

This comment has been minimized.

@bcardiff

bcardiff Jan 5, 2019

Member

The chomp maybe was there from times when the each_line didn't have a chomp option.

I think it's unclear what are the available options. We should add see IO#gets for available options.

This comment has been minimized.

@straight-shoota

straight-shoota Jan 5, 2019

Member

I'd like to change that in a different PR.

Show resolved Hide resolved src/string.cr Outdated

@straight-shoota straight-shoota force-pushed the straight-shoota:jm/feature/each-line-lino branch from c2c1a53 to 76e77ef Jan 5, 2019

Show resolved Hide resolved src/io.cr
Show resolved Hide resolved src/io.cr Outdated
Show resolved Hide resolved src/io.cr Outdated

@straight-shoota straight-shoota force-pushed the straight-shoota:jm/feature/each-line-lino branch from 76e77ef to eb6aa57 Jan 5, 2019

# haiku = "the first cold shower
# even the monkey seems to want
# a little coat of straw"
# haiku.each_line do |stanza, lino|

This comment has been minimized.

@r00ster91

r00ster91 Jan 5, 2019

Contributor
Suggested change Beta
# haiku.each_line do |stanza, lino|
# haiku.each_line do |stanza, line_number|

There's also still some lino left in the specs and in other examples.

This comment has been minimized.

@Sija

Sija Jan 5, 2019

Contributor

Btw, conventially this shortcut is written as lineno which I'd suggest using instead of lino.

This comment has been minimized.

@r00ster91

r00ster91 Jan 5, 2019

Contributor

index is also an option. Let's please use fully-written names.

@ysbaddaden

This comment has been minimized.

Copy link
Member

ysbaddaden commented Jan 5, 2019

What happens if you have a file with more than Int32::MAX lines? Or connect a TCP socket to a server that will eventually stream more than this number of lines?

I guess overflow exception, whatever if we actually use the line_number variable.

@Sija

This comment has been minimized.

Copy link
Contributor

Sija commented Jan 5, 2019

Having unified Integer type would be a bliss...

@asterite

This comment has been minimized.

Copy link
Contributor

asterite commented Jan 5, 2019

I played a bit with implementing a unified Integer type, it's super slow, or at least like 1000 times slower than primitives. Maybe it could be further optimized, I don't know, but I think it's very unlikely we'll have something like that unless someone starts to really play with the idea and finds a way to make it fast (the gist is a type that holds either an int or a big int under the hood)

@RX14

This comment has been minimized.

Copy link
Member

RX14 commented Jan 5, 2019

you can always use offset: 1_i64 to stop any overflows.

@ysbaddaden

This comment has been minimized.

Copy link
Member

ysbaddaden commented Jan 6, 2019

@RX14 It makes it less likely, but can still happen, if I create a connection and write messages for some weeks. I can specify a u128, but then io.read_line(offset: 0_u128) becomes more complex/confusing than while line = io.gets; end which is simpler.

I'm not saying it's a bad idea to pass the iteration line number, but... aren't we repeating the loop yields the iteration count idea (not bad) but that we eventually dropped because of this very reason (overflow)?

@straight-shoota

This comment has been minimized.

Copy link
Member

straight-shoota commented Jan 6, 2019

We could also make this a separate method #each_line_with_number.

Another option is to simply count on the offending counter being optimized out by LLVM when the line number block arg is not used 😏

@asterite

This comment has been minimized.

Copy link
Contributor

asterite commented Jan 6, 2019

I don't think overflowing is a problem. I'd probably use the line number when iterating a file's lines, and file lines are super unlikely to overflow.

That said, iterating and using an extra local variable for this not-that-common use case is fine too. Otherwise, I'd name it each_line_with_index like we have in other places.

@straight-shoota

This comment has been minimized.

Copy link
Member

straight-shoota commented Jan 6, 2019

Yes, when using files you're probably safe, but @ysbaddaden's example of a long running server connection doesn't sound too far-fetched. And it's definitely a use case to count the number of received lines.

@RX14

This comment has been minimized.

Copy link
Member

RX14 commented Jan 6, 2019

@ysbaddaden if you managed to overflow a 64-bit signed integer of lines, you've received more than 2^63 bytes, or 9.22 exabytes.

@bcardiff

This comment has been minimized.

Copy link
Member

bcardiff commented Jan 7, 2019

Adding #each_line_with_number sounds good enough for me.

straight-shoota added some commits Jan 5, 2019

Add #each_line_with_number
This method accompanies String#each_line, IO#each_line and
File.each_line and yields a second argument to the block containing
the line number.
It is implemented as a separate method instead of an optional block
argument to each_line to avoid potential (although unlikely) overflow
when reading large amounts of lines from an IO.

@straight-shoota straight-shoota force-pushed the straight-shoota:jm/feature/each-line-lino branch from eb6aa57 to caa0466 Jan 7, 2019

@straight-shoota

This comment has been minimized.

Copy link
Member

straight-shoota commented Jan 7, 2019

Separated into each_line_with_number

@RX14

This comment has been minimized.

Copy link
Member

RX14 commented Jan 7, 2019

Then I don't see the point. Is this really used commonly enough to justify a new method? At least each_with_index, however stupid I think it is, is commonly used.

@straight-shoota

This comment has been minimized.

Copy link
Member

straight-shoota commented Jan 7, 2019

I know I have used this quite a few times, that's why I'd welcome it as a helpful addition.
There is one use case in stdlib, #7252 shows another one.

If that's enough to justify adding another method (or actually three), I don't know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment