Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added docs for IO interface #41291

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

ExpandingMan
Copy link
Contributor

@ExpandingMan ExpandingMan commented Jun 20, 2021

There is currently no documented interface for an IO object. Figuring this out was a little bit tricky. Unfortunately, it requires users to define the low level methods unsafe_read and unsafe_write. Ideally, users would only have to define read, write and eof but making this possible would likely be quite breaking (currently it leads to method ambiguities).

Anyway, I'm not super confident in the below. I keep expecting it to mysteriously break, but I've been messing around with these a fair amount and so far this seems to work. Comments form somebody more intimately familiar with the base IO code would be appreciated.

@fredrikekre fredrikekre requested a review from vtjnash June 20, 2021 19:05
@fredrikekre fredrikekre added domain:docs This change adds or pertains to documentation domain:io Involving the I/O subsystem: libuv, read, write, etc. labels Jun 20, 2021
@oscardssmith oscardssmith added the status:forget me not PRs that one wants to make sure aren't forgotten label Jun 23, 2021

An `IO` that only defines `unsafe_read`, `unsafe_write` and `eof` can have most `isbits`
types written or read to or from it, but the `read(io, ::Type{UInt8})` and `write(io,
::UInt8)` methods must be defined to enable reading and writing individual bytes.
Copy link
Contributor

@goretkin goretkin Jun 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the ordering of thoughts here is a bit is a bit confusing, but also I am not very familiar with the IO interface (and am grateful that you're documenting it!)

My expectation from reading this is that write(io, x::MyType) should ultimately be (and in Base, is) written in terms of write(io, x::UInt8), so this wording might be clearer:
"An isbits type T (for which isbits(T)) typically defines write(::IO, ::T) in terms of "simpler" write methods, ultimately ending in write(io, ::UInt8) (and equivalently for read). The read(io, ::Type{UInt8}) and write(io, ::UInt8) methods must be defined to enable reading and writing individual bytes of these isbits types.

But perhaps more importantly, this doesn't exactly match the exploration I just did, which tl;dr suggests that it could be okay to just define write(s::MyIO, x::UInt8) without defining unsafe_write.

To probe the system:

julia> struct MyIO <: IO
       end

julia> write(MyIO(), 1 + 2*im)
ERROR: MyIO does not support byte I/O
Stacktrace:
  [1] error(::Type, ::String)
    @ Base ./error.jl:42
  [2] write(s::MyIO, x::UInt8)
    @ Base ./io.jl:224
  [3] unsafe_write
    @ ./io.jl:238 [inlined]
  [4] unsafe_write
    @ ./io.jl:646 [inlined]
  [5] unsafe_write(s::MyIO, p::Base.RefValue{Int64}, n::Int64)
    @ Base ./io.jl:644
  [6] write
    @ ./io.jl:647 [inlined]
  [7] write
    @ ./io.jl:650 [inlined]
  [8] write
    @ ./io.jl:637 [inlined]
  [9] write(s::MyIO, z::Complex{Int64})
    @ Base ./complex.jl:221
 [10] top-level scope
    @ REPL[11]:1

The stacktrace is a bit uninformative, but here is the sequence that I can tell.

  • [9] the complex number is written as its components

  • [8] multiple arguments are written

    julia/base/io.jl

    Lines 638 to 644 in e469a1e

    function write(io::IO, x1, xs...)
    written::Int = write(io, x1)
    for x in xs
    written += write(io, x)
    end
    return written
    end

  • [7] each component is written

    julia/base/io.jl

    Lines 651 to 653 in e469a1e

    function write(s::IO, x::Union{Int16,UInt16,Int32,UInt32,Int64,UInt64,Int128,UInt128,Float16,Float32,Float64})
    return write(s, Ref(x))
    end

  • [6] e.g. Int is written by making a Ref{Int} and writing that

    write(s::IO, x::Ref{T}) where {T} = unsafe_write(s, x, Core.sizeof(T))

  • [5] now we make our first call to unsafe_write:

    julia/base/io.jl

    Lines 646 to 647 in e469a1e

    @noinline unsafe_write(s::IO, p::Ref{T}, n::Integer) where {T} =
    unsafe_write(s, unsafe_convert(Ref{T}, p)::Ptr, n) # mark noinline to ensure ref is gc-rooted somewhere (by the caller)

  • [4] which calls this:

    unsafe_write(s::IO, p::Ptr, n::Integer) = unsafe_write(s, convert(Ptr{UInt8}, p), convert(UInt, n))

  • [3] which calls this:

    julia/base/io.jl

    Lines 235 to 241 in e469a1e

    function unsafe_write(s::IO, p::Ptr{UInt8}, n::UInt)
    written::Int = 0
    for i = 1:n
    written += write(s, unsafe_load(p, i))
    end
    return written
    end

    And this is the one of the required methods for the IO interface according to the PR at hand (https://github.com/JuliaLang/julia/pull/41291/files#diff-f7164005c91058bbacdf095d8de0f02dd6194daf17d58979114eb505d292e35bR748)

  • [2] as just seen, there is a fallback for unsafe_write written in terms of write(s::IO, x::UInt8), so ultimately this call sequence calls an erroring "fallback" (tangentially: instead of throwing a ERROR: MethodError: no method matching write(::MyIO, ::UInt8)and using e.g.Experimental.show_error_hints` to deliver a better error message.)

    write(s::IO, x::UInt8) = error(typeof(s)," does not support byte I/O")

So in this case it would be sufficient to just define Base.write(::MyIO, ::UInt8), and I'm not sure how to reconcile that with this line in the PR.

I thought I might get a bit more understanding by looking to see some of those definitions of Base.write, but I'm not sure what I was looking for. In any case, here they are:

julia> _fieldtype(x, i) = try
       fieldtypes(x)[i]
       catch
       missing
       end
_fieldtype (generic function with 1 method)

julia> filter(m -> _fieldtype(m.sig, 3) === UInt8, methods(write).ms)
[1] write(io::Union{Core.CoreSTDERR, Core.CoreSTDOUT}, x::UInt8) in Base at coreio.jl:26
[2] write(to::Base.GenericIOBuffer, a::UInt8) in Base at iobuffer.jl:445
[3] write(io::Base.AbstractPipe, byte::UInt8) in Base at io.jl:360
[4] write(pipe::Base64.Base64EncodePipe, x::UInt8) in Base64 at /Applications/Julia-1.6.app/Contents/Resources/julia/share/julia/stdlib/v1.6/Base64/src/encode.jl:101
[5] write(io::Base.SecretBuffer, b::UInt8) in Base at secretbuffer.jl:112
[6] write(s::Base.BufferStream, b::UInt8) in Base at stream.jl:1351
[7] write(s::Base.LibuvStream, b::UInt8) in Base at stream.jl:1091
[8] write(s::IOStream, b::UInt8) in Base at iostream.jl:366
[9] write(::Base.DevNull, ::UInt8) in Base at coreio.jl:16
[10] write(f::Base.Filesystem.File, c::UInt8) in Base.Filesystem at filesystem.jl:136
[11] write(s::IO, x::UInt8) in Base at io.jl:224

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I find your wording much more confusing than mine, maybe somebody else wants to come along and suggest a 3rd alternative? (I'm not attached to my original wording, I'm just unsure what to change.)

As far as defining only write(io, ::UInt8) goes... yes, I don't find that particularly surprising. Like I had implied, I find the current situation rather confusing. I think writing some types if only write(io, ::UInt8) is defined will crash? Regardless, I think the only way to resolve this is to get help from someone more intimately familiar with IO. So far as I can tell, what I have written in this PR cannot break.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Jun 23, 2021

The closest we have right now to API demonstrations for the bare minimum are simple types like Base.DevNull and Core.CoreSTDOUT, and Base.BufferStream. Or a probably complete example is Base.AbstractPipe:

julia> methodswith(Base.AbstractPipe)
[1] bytesavailable(io::Base.AbstractPipe) in Base at io.jl:404
[2] close(io::Base.AbstractPipe) in Base at io.jl:387
[3] eof(io::Base.AbstractPipe) in Base at io.jl:416
[4] flush(io::Base.AbstractPipe) in Base at io.jl:364
[5] getproperty(pipe::Base.AbstractPipe, name::Symbol) in Base at io.jl:349
[6] ismarked(io::Base.AbstractPipe) in Base at io.jl:381
[7] isopen(io::Base.AbstractPipe) in Base at io.jl:386
[8] isreadable(io::Base.AbstractPipe) in Base at io.jl:381
[9] iswritable(io::Base.AbstractPipe) in Base at io.jl:385
[10] mark(io::Base.AbstractPipe) in Base at io.jl:381
[11] peek(io::Base.AbstractPipe, ::Type{T}) where T in Base at io.jl:383
[12] read(io::Base.AbstractPipe) in Base at io.jl:368
[13] read(io::Base.AbstractPipe, byte::Type{UInt8}) in Base at io.jl:366
[14] readavailable(io::Base.AbstractPipe) in Base at io.jl:381
[15] readbytes!(io::Base.AbstractPipe, target::AbstractVector{UInt8}) in Base at io.jl:374
[16] readbytes!(io::Base.AbstractPipe, target::AbstractVector{UInt8}, n) in Base at io.jl:374
[17] readuntil(io::Base.AbstractPipe, arg::UInt8; kw...) in Base at io.jl:369
[18] readuntil(io::Base.AbstractPipe, arg::AbstractChar; kw...) in Base at io.jl:370
[19] readuntil(io::Base.AbstractPipe, arg::AbstractString; kw...) in Base at io.jl:371
[20] readuntil(io::Base.AbstractPipe, arg::AbstractVector; kw...) in Base at io.jl:372
[21] (f::Base.RedirectStdStream)(io::Base.AbstractPipe) in Base at stream.jl:1175
[22] reset(io::Base.AbstractPipe) in Base at io.jl:381
[23] unmark(io::Base.AbstractPipe) in Base at io.jl:381
[24] unsafe_read(io::Base.AbstractPipe, p::Ptr{UInt8}, nb::UInt64) in Base at io.jl:367
[25] unsafe_write(io::Base.AbstractPipe, p::Ptr{UInt8}, nb::UInt64) in Base at io.jl:362
[26] write(io::Base.AbstractPipe, byte::UInt8) in Base at io.jl:360
[27] write(to::IO, from::Base.AbstractPipe) in Base at io.jl:361

@Marlin-Na
Copy link

Marlin-Na commented Jun 24, 2021

Hi, I think there are more optional methods that may be implemented for IO interface. E.g. seekend, position.

BTW, I wish we could have a method like seekable as an optional trait for IO type. If seekable(T) is true, then the type should implement all three of seek, seekend and position. Currently I have to use Core.applicable(seek, xxx) to do that. For example in a callback pass to C, I have something like

if !Core.applicable(seekend, io)
    Base.Libc.errno(Base.Libc.ESPIPE)
    return -1
end

Having a seekable trait would be better than the approach above.

@ExpandingMan
Copy link
Contributor Author

ExpandingMan commented Jun 24, 2021

Ok, I have added a few more of the optional methods. (I think this is all we want to consider as part of the interface...)

@notinaboat
Copy link
Contributor

I've just been considering this in the process of building UnixIO.jl
(motivated by libuv not supporting device IO libuv/libuv#484, and by unreliable sub-process io).

readavailable must be implemented as it is needed by (at least) write(io::IO, from::IO).

julia/base/io.jl

Lines 715 to 718 in 9687260

function write(to::IO, from::IO)
n = 0
while !eof(from)
n += write(to, readavailable(from))

bytesavailable could be considered to be part of the IO interface, but it has no default implementation, so it should be implemented for new IOs too.

I think it is also worth mentioning that new IOs should consider implementing readbytes!.
The default method reads one byte at a time:

julia/base/io.jl

Lines 937 to 942 in 5584620

function readbytes!(s::IO, b::AbstractArray{UInt8}, nb=length(b))
require_one_based_indexing(b)
olb = lb = length(b)
nr = 0
while nr < nb && !eof(s)
a = read(s, UInt8)

In many cases reading n bytes into a buffer can be done far more efficiently:
e.g.
https://github.com/notinaboat/UnixIO.jl/blob/78398fddb5abc2efa71fab1920cda4a9487c3e51/src/UnixIO.jl#L363-L381
https://github.com/JuliaLang/MbedTLS.jl/blob/d4261df616c0a8194c9a473e12fb663c92e9b914/src/ssl.jl#L445-L468
https://github.com/JuliaWeb/HTTP.jl/blob/1ed5fde98eab42ab1f6967a1e2de6be1332e1ba3/src/Streams.jl#L289-L293

@notinaboat
Copy link
Contributor

Related: #24526
The documentation for the abstract IO interface should define the blocking behaviour and the rules about how the different states interact. e.g. must bytesavailable() be > 0 if eof() is false? Must isreadable() be false if isopen() is false? Must isopen() be false if close() has been called?

Another consideration that this documentation should address is potential performance pitfalls. e.g. In some cases it might be better not to implement read(io, ::Type{UInt8}). If every read has high overhead, it might be better to suggest the user wraps your IO in BufferedInputStream like this:

https://github.com/JuliaWeb/HTTP.jl/blob/master/src/Streams.jl#L257-L264

@codecov
Copy link

codecov bot commented Jul 4, 2021

Codecov Report

Merging #41291 (5584620) into master (9d5f31e) will decrease coverage by 0.39%.
The diff coverage is 82.93%.

❗ Current head 5584620 differs from pull request most recent head 3526ea6. Consider uploading reports for the commit 3526ea6 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master   #41291      +/-   ##
==========================================
- Coverage   79.41%   79.01%   -0.40%     
==========================================
  Files         350      350              
  Lines       73565    73697     +132     
==========================================
- Hits        58418    58232     -186     
- Misses      15147    15465     +318     
Impacted Files Coverage Δ
base/compiler/stmtinfo.jl 58.33% <ø> (ø)
base/deprecated.jl 64.63% <ø> (ø)
base/loading.jl 23.42% <0.00%> (-33.60%) ⬇️
base/promotion.jl 38.88% <ø> (ø)
base/reduce.jl 94.97% <ø> (ø)
base/shell.jl 82.79% <ø> (ø)
base/show.jl 65.94% <0.00%> (+0.74%) ⬆️
stdlib/InteractiveUtils/src/clipboard.jl 0.00% <0.00%> (ø)
stdlib/InteractiveUtils/src/editless.jl 9.78% <0.00%> (-0.11%) ⬇️
stdlib/Random/src/Random.jl 94.93% <ø> (ø)
... and 64 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 153f908...3526ea6. Read the comment docs.

@ExpandingMan
Copy link
Contributor Author

I can keep plugging ahead with changes based on feedback here, but, if anyone is willing to take it on, it might be a lot quicker if somebody more experienced than me in this stuff were willing to finish this.

At the very least, please feel free to make or suggest changes to this PR.

Copy link
Sponsor Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for starting this. I have filled in some points with my knowledge.

Comment on lines +745 to +749
| Required methods | Brief description |
|:----------------------------------------- |:----------------------------------------------------------------------------------------------- |
| `unsafe_read(io, ::Ptr{UInt8}, ::UInt)` | Copy bytes from the `IO` to a pointer. |
| `unsafe_write(io, ::Ptr{UInt8}, ::UInt)` | Copy bytes from a pointer to the `IO`. |
| `eof(io)` | Whether the IO stream is at the end. |
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| Required methods | Brief description |
|:----------------------------------------- |:----------------------------------------------------------------------------------------------- |
| `unsafe_read(io, ::Ptr{UInt8}, ::UInt)` | Copy bytes from the `IO` to a pointer. |
| `unsafe_write(io, ::Ptr{UInt8}, ::UInt)` | Copy bytes from a pointer to the `IO`. |
| `eof(io)` | Whether the IO stream is at the end. |
| Required methods | Brief description |
|:----------------------------------------- |:----------------------------------------------|
| `read(io, ::Type{UInt8})` | Read a byte from the stream. |
| `write(io, ::UInt8)` | Write a byte to the stream. |
| `eof(io)` | Whether the IO stream is at the end. |

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait... I'm very confused, I tested with only read and write and it didn't work, that's why I have the unsafe_ methods (obviously being able to use read and write is vastly preferable. Am I missing something here? If you are convinced this should work, I can probably figure out a MWE and show you.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably doesn't get much testing currently, but your IORecorder type seems like a candidate to also run as a test (or doctest)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that sounds good, but what about unsafe_read and unsafe_write? Like I said, it didnt' work for me, but all your comments suggest that it should.

Btw, if I'm right and it doesn't work, it really should, but it seems unlikely we'd be able to fix this until 2.0.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested with IORecorder(stdin), and it worked

Comment on lines +751 to +759
| Optional methods | Brief description |
|:----------------------------------------- |:----------------------------------------------------------------------------------------------- |
| `read(io, ::Type{UInt8})` | Read a byte from the stream. |
| `write(io, ::UInt8)` | Write a byte to the stream. |
| `close(io)` | Close the stream. |
| `seek(io, ::Integer)` | Seek to a specific position. |
| `position(io)` | Return the current position of the stream. |
| `seekstart(io)` | Seek to beginning of stream. |
| `seekend(io)` | Seek to the end of the stream. |
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| Optional methods | Brief description |
|:----------------------------------------- |:----------------------------------------------------------------------------------------------- |
| `read(io, ::Type{UInt8})` | Read a byte from the stream. |
| `write(io, ::UInt8)` | Write a byte to the stream. |
| `close(io)` | Close the stream. |
| `seek(io, ::Integer)` | Seek to a specific position. |
| `position(io)` | Return the current position of the stream. |
| `seekstart(io)` | Seek to beginning of stream. |
| `seekend(io)` | Seek to the end of the stream. |
| Optional methods | Brief description |
|:------------------------------|:----------------------------------------------------------|
| `unsafe_read(io, ::Ptr{UInt8}, ::UInt)` | Copy bytes from the `IO` to a pointer. |
| `unsafe_write(io, ::Ptr{UInt8}, ::UInt)` | Copy bytes from a pointer to the `IO`. |
| `close(io)` | Close the stream. |
| `skip(io, offset)` | Seek by the given offset (signed). |
| `seek(io, ::Integer)` | Seek to a specific position. |
| `position(io)` | Return the current position of the stream. |
| `seekstart(io)` | Seek to beginning of stream. |
| `seekend(io)` | Seek to the end of the stream. |
| `isopen(io)` | Whether the IO stream is usable. |
| `isreadable(io)` | Whether the IO stream supports reading. |
| `iswritable(io)` | Whether the IO stream supports writing. |
| `shutdown(io)` | Close the stream for writing. |
| `flush(io)` | Hint to write out any buffers to consumers. |
| `reseteof(io)` | Reset the EOF flag on the read side. |
| `mark(io)` | Add a mark at the current position of stream. |
| `unmark(io)` | Remove the current mark. |
| `reset(io)` | Reset stream to the current mark. |
| `ismarked(io)` | Return true if stream s is marked. |
| `lock(io)`/`unlock(io)` | Set an advisory lock on IO for print. |

Comment on lines +762 to +765
To implement an `IO` object it is required to define the low-level methods `unsafe_read`
and `unsafe_write` which enable copying of data between the `IO` and a location given by a
pointer. Additionally, `eof` is required and should return `true` if reading from the
`IO` is valid, else `false`.
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To implement an `IO` object it is required to define the low-level methods `unsafe_read`
and `unsafe_write` which enable copying of data between the `IO` and a location given by a
pointer. Additionally, `eof` is required and should return `true` if reading from the
`IO` is valid, else `false`.
To implement an `IO` object it is required to define the low-level methods for
byte `read` and `write` which enable reading and writing from the `IO`.
Additionally, `eof` is required and should return `true` if a `read` from the
`IO` will return at least one byte, else `false` if it will not return any more bytes.

Comment on lines +767 to +769
An `IO` that only defines `unsafe_read`, `unsafe_write` and `eof` can have most `isbits`
types written or read to or from it, but the `read(io, ::Type{UInt8})` and `write(io,
::UInt8)` methods must be defined to enable reading and writing individual bytes.
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
An `IO` that only defines `unsafe_read`, `unsafe_write` and `eof` can have most `isbits`
types written or read to or from it, but the `read(io, ::Type{UInt8})` and `write(io,
::UInt8)` methods must be defined to enable reading and writing individual bytes.
For high-performance, most `IO` objects need to implement `unsafe_read` and
`unsafe_write` also. These methods take a pointer and a number of bytes, and
will use an unsafe copy to move data from the input to the output.

Comment on lines +771 to +772
Methods which affect the mutable state of the `IO` stream such as `close` and `seek` are
optional.
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Methods which affect the mutable state of the `IO` stream such as `close` and `seek` are
optional.
Methods which affect or query the mutable state of the `IO` stream such as
`close` and `seek` are optional.

Comment on lines +791 to +797
function Base.unsafe_read(io::IORecorder, p::Ptr{UInt8}, n::UInt)
for i ∈ 1:n
x = read(io, UInt8)
unsafe_store!(p, x, i)
end
nothing
end
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this is necessary

Suggested change
function Base.unsafe_read(io::IORecorder, p::Ptr{UInt8}, n::UInt)
for i ∈ 1:n
x = read(io, UInt8)
unsafe_store!(p, x, i)
end
nothing
end

@mkitti
Copy link
Contributor

mkitti commented Oct 27, 2023

Should this pull request be revived?

@vtjnash
Copy link
Sponsor Member

vtjnash commented Oct 30, 2023

Sure. Would you be willing to do so?

@IanButterworth
Copy link
Sponsor Member

Just cross-referencing @mkitti 's discourse post on this https://discourse.julialang.org/t/what-is-the-io-interface/107350

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:docs This change adds or pertains to documentation domain:io Involving the I/O subsystem: libuv, read, write, etc. status:forget me not PRs that one wants to make sure aren't forgotten
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants