read/read_fully/read_partial/unbuffered_read behavior inconsistent and design unclear #1276

technorama · 2015-08-25T11:58:57Z

Do IO#read methods wait for all requested bytes or return with any available data? The current implementation varies in behavior depending on the number of bytes requested. What is the intended behavior?

read(slice, count) # loops until count bytes are read ONLY if slice < BUFFER_SIZE, otherwise returns BUFFER_SIZE bytes
read(slice) # loops until slice.count bytes are read ONLY if slice < BUFFER_SIZE, otherwise returns BUFFER_SIZE bytes
read() # loops until EOF
read(count) # blocks until count bytes are read (not sure how)
read_fully(slice) loops until read returns slice.count bytes. (double loop)

Completely missing is read_partial functionality - a method that returns any available bytes <= the requested size.

asterite · 2015-08-25T13:10:34Z

We'll probably remove read(slice, count) and use read(slice) instead, and it will read at most slice.length bytes and return the amount read. That would be the same as read_partial, right?

For the others:

read: reads everything
read(count): reads at most count bytes, looping until the count is reached or until EOF
read_fully(slice): tries to read slice.length bytes and raises EOFError if it can't

But it's true that we need to better design these methods.

technorama · 2015-08-25T13:38:37Z

There is an inconsistency with the above proposal which would be confusing for anyone trying to learn the API.

read: everything
read(count): everything
read(slice): NOT everything

I propose that read always reads the full amount requested

read: everything
read(count): everything
read(slice): everything

And read_partial returns whatever data is available

read_partial: first available data
read_partial(count): first available data
read_partial(slice): first available data

read_fully can be retired.

Kilobyte22 · 2015-08-30T19:06:59Z

Let me add that read_partial should block if there are 0 bytes available.

asterite · 2015-09-04T21:53:55Z

We were thinking something along these lines. These two methods, the ones that every IO must implement, read partially:

- read(slice : Slice(UInt8)) : Int32
- read(slice : Slice(Char)) : Int32

Every other read-related method reads fully, and this is the consistent part:

- read_fully(slice : Slice(UInt8)) : Nil
- read_fully(slice : Slice(Char)) : Nil
- read_to_end : Slice(UInt8)
- read_chars_to_end : Slice(Char)
- read_string_to_end : String # the current read()
- read(count) : Slice(UInt8)
- read_chars(count) : Slice(Char)
- read_string(count) : String # the current read(count)

The convention is that read, without suffix, means bytes. Then if there's an argument for overloading, like Slice(UInt8) or Slice(Char), we can use that. Otherwise, we use chars for Slice(Char) and string for String.

We will also remove read_nonblock.

gets will still be around, as reading a line only makes sense for strings.

asterite · 2015-09-08T15:58:51Z

@technorama Any opinion on the above methods, types and names?

technorama · 2015-09-09T03:32:42Z

I am confused about the need for Slice(Char) vs Slice(UInt8) vs String. There seem to be too many types but maybe I am ignorant on the intended usage.

Please correct me where I'm wrong.

String is for utf8 strings
Slice(Char) is for ?
Slice(UInt8) is for binary/ASCII data, or for passing to C functions.

technorama · 2015-09-09T03:53:22Z

read_nonblock can be useful with multiprocessing.

One example is nginx and forking. The loop is similar to:

# in each process
loop do
  multiprocess_lock do
    # The same process accepts multiple sockets until accept fails, to avoid context-switch time.
    while sock = accept_nonblock
      add_to_event_queue sock
    end
  end
end

Currently it can only be done with forking. Similar things can be done in the future with websocket or other servers/services when multithreading is available.

technorama · 2015-09-09T03:54:40Z

That's a case for accept_nonblock, perhaps not read_nonblock. Maybe there isn't one.

asterite · 2015-09-09T13:28:16Z

The methods that accept Slice(Char) would work similarly to Slice(UInt8): they would fill the slice with characters. We based this functionality on C#'s TextReader. Although now that I think of it, I don't know how to implement that, because read(slice : Slice(Char)) will probably invoke read(slice : Slice(UInt8)), and if you don't read the bytes necessary to "finish" a char, I don't know how to handle that.

But read_chars : Slice(Char) can be implemented fine, and you'd get a slice, instead of a String.
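
The difficulty described above can be reproduced in a few lines: a partial byte read can stop in the middle of a multi-byte UTF-8 character. This is only an illustration, assuming a recent Crystal release (IO::Memory, Bytes, String#valid_encoding? and IO#read_char); it is not the approach the thread settled on.

```crystal
# "é" is encoded as two bytes in UTF-8 (0xC3 0xA9).
io = IO::Memory.new("é")

# A one-byte partial read stops halfway through the character.
buffer = Bytes.new(1)
n = io.read(buffer)
half_valid = String.new(buffer[0, n]).valid_encoding? # false: the char is unfinished

# Reading at the character level consumes as many bytes as the character needs.
char = IO::Memory.new("é").read_char
```

A Slice(Char) reader built on top of read(slice : Slice(UInt8)) would have to carry such leftover bytes over to the next call, which is the open question raised in the comment.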

@waj noted that in Go there's no read_nonblock so we might want to try removing it and later see if it's needed at all.

asterite · 2015-09-14T19:52:12Z

We finally decided to do it like this:

# partial read
- read(slice : Slice(UInt8)) : Int32
- read(slice : Slice(Char)) : Int32

# full read
- read_fully(slice : Slice(UInt8)) : Nil
- read_fully(slice : Slice(Char)) : Nil
- read_to_end : String
- read(count) : Slice(UInt8)
- read_chars(count) : Slice(Char)
- read_string(count) : String
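
To make the partial/full split above concrete, here is a minimal sketch. It assumes a recent Crystal release, where Slice(UInt8) is aliased as Bytes and the end-of-file error is IO::EOFError. ChunkedIO is a hypothetical class invented for the illustration; it stands in for a socket that delivers data in small pieces.

```crystal
# Hypothetical IO that returns at most `chunk` bytes per read call,
# the way a socket may deliver a message in several pieces.
class ChunkedIO < IO
  def initialize(@data : Bytes, @chunk : Int32)
    @pos = 0
  end

  # Partial read: copies what is available (up to `@chunk` bytes)
  # and returns the number of bytes copied; 0 means EOF.
  def read(slice : Bytes) : Int32
    count = Math.min(slice.size, Math.min(@chunk, @data.size - @pos))
    slice.copy_from(@data[@pos, count])
    @pos += count
    count
  end

  def write(slice : Bytes) : Nil
    raise IO::Error.new("ChunkedIO is read-only")
  end
end

message = "hello world".to_slice # 11 bytes

# read(slice) may return fewer bytes than the slice can hold.
buffer = Bytes.new(11)
partial = ChunkedIO.new(message, 4).read(buffer) # => 4

# read_fully(slice) keeps calling read until the slice is full...
ChunkedIO.new(message, 4).read_fully(buffer)
full = String.new(buffer) # => "hello world"

# ...and raises when EOF arrives first.
hit_eof = begin
  ChunkedIO.new(message, 4).read_fully(Bytes.new(12))
  false
rescue IO::EOFError
  true
end
```

Only read(slice) has to be implemented per IO; the full-read variant can be layered on top of it in the base class, which is what keeps the two behaviors easy to tell apart.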

technorama · 2015-09-15T01:46:13Z

There's still a discrepancy in behavior and naming. read should be partial or full for all types of arguments.

# partial read
- read(slice : Slice(UInt8)) : Int32
- read(slice : Slice(Char)) : Int32

# full read
- read(count) : Slice(UInt8)

Here's an example of what can go wrong. Expect lots of confusion even when not refactoring.

# first iteration
while slice = io.read(FRAME_SIZE)
end

# optimized version reusing the same buffer.
slice = Bytes.new(FRAME_SIZE)
while io.read(slice) > 0
  # BREAKS HORRIBLY
end
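
For contrast, here is a sketch of a buffer-reusing loop that stays correct when read(slice) is a partial read: it only touches the first n bytes that each call reports. It assumes a recent Crystal release; IO::Memory stands in for the real stream, and FRAME_SIZE is set to 4 purely for the illustration.

```crystal
FRAME_SIZE = 4
io = IO::Memory.new("abcdefghij") # 10 bytes: two full frames and a short tail

buffer = Bytes.new(FRAME_SIZE)
frames = [] of String

# read returns the number of bytes written into the buffer (0 at EOF),
# so only buffer[0, n] holds fresh data on each iteration.
while (n = io.read(buffer)) > 0
  frames << String.new(buffer[0, n])
end

frames # => ["abcd", "efgh", "ij"]
```

When every frame must be complete, read_fully(buffer) would replace the bare read, at the cost of raising if the stream ends in the middle of a frame.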

technorama · 2015-09-15T01:59:57Z

You could make read(count) a partial read and add read_fully(count).

read_to_end as a Slice(UInt8) is useful in a variety of protocols.

# HTTP read without chunking or a known content-length
read_headers(io)
bytes = io.read_to_end # contains file data
io.close

# FTP
TCPSocket.open(host, data_port) do |io|
  bytes = io.read_to_end
end

# Unix programs
Process.run("mysqlbinlog $filename") do |pr|
  process_binlog(pr.output.read_to_end)
end

waj · 2015-09-15T02:44:23Z

I think I agree with the read overload that gets a fixed amount of bytes. We can name it read_bytes(count : Int32) : Slice(UInt8) instead.

Regarding methods to get the full binary content into memory, we'd like to discourage that and that's why we removed those overloads from the list. Reading as string is also questionable but sometimes convenient.

asterite · 2015-10-18T13:48:25Z

I'm closing this. The only partial behaviour is that of read(slice); read_fully(slice) reads fully, and we don't have any other overloads for read.

jhass · 2016-07-23T15:30:06Z

This is basically still an issue: the behavior is still inconsistent, and read_nonblock is still there, broken, with no replacement.

Because of https://github.com/crystal-lang/crystal/blob/master/src/io/file_descriptor.cr#L225 it's not possible to request up to n bytes and return immediately. In other words there's no way to check whether there's data to read. Not even IO.select is usable to check for it, since the data may have been consumed by IO::Buffered already.

So we do lack an interface that:

- returns immediately in any case.
- returns immediately if there's no pending data in any buffer, kernel or userspace.
- tries to return as much data as possible but chooses to return less than requested instead of blocking the caller in any way.
- does this independently of the type of the IO or its underlying file descriptor, and independently of any configuration/flags set on said file descriptor.

Alternatively we lack an interface that returns the number of bytes that the next read call can request without being blocked in any way (again independently of the type of the IO or its underlying file descriptor, and independently of any configuration/flags set on said file descriptor).

asterite · 2016-07-23T16:12:40Z

I honestly don't know a lot about these subjects, but if all our IOs are non-blocking by default, why do we need a read_nonblock? If the IO is not available, give control to another fiber until this fiber is done. In what case do you need to read what's available, without switching fibers?

asterite · 2016-07-23T16:12:48Z

Maybe @waj could comment more on this...

jhass · 2016-07-23T16:31:20Z

It quickly becomes important in TUI applications when reading keypresses, you may want to redraw continuously while no key is pressed, so not block on waiting for the next keypress, but also have precise control over when to read the next keypress.
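
One possible workaround for this use case, not something proposed in the thread, is to let a dedicated fiber block on the input and hand bytes over through a channel, which the drawing loop can poll without waiting. The sketch below assumes a recent Crystal release and a single-threaded scheduler. A pipe stands in for the terminal, and keys and poll_key are hypothetical names made up for the example.

```crystal
# A pipe stands in for the terminal; `writer` plays the keyboard.
reader, writer = IO.pipe

# Hypothetical key queue: a dedicated fiber waits on the pipe
# so that the drawing loop never has to.
keys = Channel(UInt8).new(16)
spawn do
  while byte = reader.read_byte
    keys.send(byte)
  end
end

# Hypothetical helper: returns a pending key, or nil right away if none has arrived.
def poll_key(keys : Channel(UInt8)) : UInt8?
  select
  when key = keys.receive
    key
  else
    nil
  end
end

before = poll_key(keys) # nothing has been written yet, so this is nil

writer.write_byte('a'.ord.to_u8)
writer.flush
sleep 50.milliseconds # let the reader fiber pick the byte up
after = poll_key(keys) # expected to be 97 ('a') once the reader fiber has run
```

This does not give the interface asked for here, since data may still sit unread in the IO's own buffer, but it does keep the caller from being suspended on a read.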

asterite · 2016-07-23T16:40:20Z

Maybe it's about configuring something in IO::FileDescriptor to return the read bytes instead of switching to another fiber? The check should be done here:

https://github.com/crystal-lang/crystal/blob/master/src/io/file_descriptor.cr#L216-L234

Instead of calling wait_readable we would return.

How can we test this, if we want to support such a feature?

jhass · 2016-07-23T17:03:48Z

Reading from an empty pipe should work I guess.

jhass added RFC topic:stdlib labels Aug 31, 2015

jhass mentioned this issue Aug 31, 2015

Socket.read and Socket.read_nonblock doesn't behaves the same #759

Closed

asterite mentioned this issue Sep 14, 2015

IO#read* refactoring for read_partial/read_nonblock #1353

Closed

jhass added the status:draft label Sep 14, 2015

waj mentioned this issue Sep 15, 2015

Add IO#write(Enumerable(UInt8)) #1489

Closed

jhass mentioned this issue Sep 29, 2015

discussion: should there be IO#write_fully and IO.copy_fully #1652

Closed

asterite closed this as completed Oct 18, 2015
