Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

discontinue readavailable() #7478

Closed
wants to merge 1 commit into from
Closed

discontinue readavailable() #7478

wants to merge 1 commit into from

Conversation

vtjnash
Copy link
Sponsor Member

@vtjnash vtjnash commented Jul 1, 2014

This is really hard function to use correctly, since the only correct usage of this function is in a stream reader:

buf = Uint8[]
while !eof(s)
  bytes = readavailable(s)
  append!(buf, s)
  if enough(buf)
     f(buf)
     empty!(buf)
  end
end
f(buf)

However, essentially all of the code I see posted to blogs and mailing lists to use this function are incorrectly defined, and subject to race conditions and the badly-defined behavior of this function (that and probably will stop working as soon as you try to bundle it into your application and output a decent amount of information, or forget that stdout is buffered, or run into a bit about latency)

I have seen users resort to it in the following situation, when users can't understand why readall hangs:

(outRead, outWrite) = redirect_stdout()

print("Test")
print("ing")

close(outWrite)

data = readavailable(outRead)

close(outRead)

(copied from: http://thenewphalls.wordpress.com/2014/03/21/capturing-output-in-julia/)

however, the answer to their quandary is simple: outWrite is only one of the handles to the stdout. until you call redirect_stdout again, julia still has a second handle open (on fd 2) for use by c functions

related: #7022

@vtjnash vtjnash added this to the 0.3 milestone Jul 1, 2014
@ihnorton
Copy link
Member

ihnorton commented Jul 1, 2014

(@vtjnash did you intend to pull in helpdb changes?)

@vtjnash
Copy link
Sponsor Member Author

vtjnash commented Jul 1, 2014

(yes, i'm not sure why it is so big though, apparently it wanted to insert a lot of blank lines?)

@@ -391,10 +391,10 @@ function _uv_hook_readcb(stream::AsyncStream, nread::Int, base::Ptr{Void}, len::
else
if isa(stream,TTY)
stream.status = StatusEOF
notify(stream.closenotify)
Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a separate bugfix for the readall function, which needs to be committed separately if Jeff refuses to merge this pull request

@JeffBezanson
Copy link
Sponsor Member

+1
I greatly dislike readavailable; it has essentially no meaning at all.

If you have a separate bugfix, it belongs on master, ideally with a test case, not bundled with whatever PR you are working on. That is simply sound development practice, and has nothing to do with my willingness to merge PRs. Coupling the decision to fix a bug to the decision to deprecate some other function is insane. Separating them also allows others to selectively revert or run bisect more effectively.

You seem to have one of the unlucky versions of sphinx that inserts extra unwanted newlines. In that case, please don't regenerate the helpdb. Somebody else can do it.

@@ -473,3 +473,12 @@ scale!{T<:Base.LinAlg.BlasReal}(X::Array{T}, s::Complex) = error("scale!: Cannot

@deprecate which(f::Callable, args...) @which f(args...)
@deprecate rmdir rm

function readavailable(this::AsyncStream)
depwarn("readavailable() is discontinued, because users tend to mistakenly assume it will read everything that has been written. better defined alteratives include: readall(), readbytes(), nb_available(), and !eof()", :readavailable)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This warning is a bit too passive-aggressive. Needs a rewrite.

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about:
"readavailable is discontinued. please choose from the alternative I/O functions, such as readall(), readbytes(), nb_available(), and !eof(), depending on your intended usage"

@vtjnash
Copy link
Sponsor Member Author

vtjnash commented Jul 1, 2014

ideally with a test case

the fixes to the test files revealed the bug in readall, so it does test it. i'm not working from master right now, but I can repeat the fix on master if you aren't going to just merge this anyways

@JeffBezanson
Copy link
Sponsor Member

Let me repeat: whether the bug fix is a separate change is unrelated to whether I merge this PR.

@JeffBezanson
Copy link
Sponsor Member

The modified repl test passes for me on master, without the readcb change.

@vtjnash
Copy link
Sponsor Member Author

vtjnash commented Jul 1, 2014

The repl test might pass, but it is dependent upon order-of-execution. Since we want to consider EOF on TTY equivalent to close, it needs to emit the close notify event to be certain any waiting conditions get notified.

@JeffBezanson
Copy link
Sponsor Member

Isn't the notify needed for streams other than TTYs as well though?

@vtjnash
Copy link
Sponsor Member Author

vtjnash commented Jul 1, 2014

what is there to notify? other streams don't have a EOF event, and they will eventually get a close event after the stream is closed

@JeffBezanson
Copy link
Sponsor Member

Ah yes, I see there is a try notify(uv.readnotify) end in _uv_hook_close.

@vtjnash
Copy link
Sponsor Member Author

vtjnash commented Jul 1, 2014

yes. that's a bit of a hack, because wait can't wait on multiple Conditions (hint, hint)

@astrieanna
Copy link
Contributor

Are you aware of any stream readers/parsers other than HttpParser, in Julia? I guess this is probably not a common enough task to warrant a (confusing) function in Base.

Considering that people are apparently using readavailable as a work around when readall doesn't work, is there documentation anywhere explaining why that happens? (Or are you hoping that they'll post issues or email the mailing list when they get stuck, once you remove readavailable?)

@JeffBezanson
Copy link
Sponsor Member

Does it work to read fixed-size chunks and then look for short reads to detect end of stream?

@stevengj
Copy link
Member

stevengj commented Jul 3, 2014

We use readavailable as a stream reader in IJulia. I'm not sure what @vtjnash is proposing as an alternative here.

I suppose we could block on reading one byte, and then call nb_available to see if anything more is available and do a readbytes with that many bytes? But readavailable is a lot more convenient.

@vtjnash
Copy link
Sponsor Member Author

vtjnash commented Jul 3, 2014

I would use:
while !eof(s)
readbytes(s,nb_available(s))
end

Or maybe, takebuf_array?

This formulation also avoids the busy-wait lockup if s is ever closed that exists in the current code.

@stevengj
Copy link
Member

stevengj commented Jul 3, 2014

@vtjnash, that seems like it will be a spinloop (since nb_available will mostly return zero, readbytes will return immediately), which is not acceptable. We need something that will block until there is actually something available to read (but doesn't wait for the stream to close, unlike readall).

(The stream is never closed in IJulia until the process completes, so that's not an issue.)

@JeffBezanson
Copy link
Sponsor Member

I'll repeat my question:

Does it work to read fixed-size chunks and then look for short reads to detect end of stream?

@stevengj
Copy link
Member

stevengj commented Jul 3, 2014

@JeffBezanson, no, it doesn't work to read fixed-size chunks, unless the chunk size is 1 byte (which is too small to be efficient). Say I am reading chunks of 100 bytes, and the user prints 99 bytes to stdout. They will be waiting a long time for their output to show up in IJulia, because the I/O thread will be blocked until another byte is written.

@vtjnash
Copy link
Sponsor Member Author

vtjnash commented Jul 3, 2014

eof blocks until it can determine if you have data, or eof. If you had wanted to do fixed size reads, you could combine the eof test with wait_nb

@JeffBezanson
Copy link
Sponsor Member

How do people usually do this? I'm guessing maybe readline, effectively using line buffering?

Anyway, since a couple packages seem pretty happy with this function I lean towards moving this change out of 0.3.

@astrieanna
Copy link
Contributor

HttpParser really shouldn't use line buffering. Besides potential performance problems, some requests don't have new lines at the end.

@JeffBezanson
Copy link
Sponsor Member

Yes, line buffering would only work for stdout in IJulia.

libuv works by invoking a callback and telling you how many bytes it has. readavailable fits this kind of interface. However you can get the same effect by checking !eof and using nb_available.

@JeffBezanson JeffBezanson modified the milestones: 0.4, 0.3 Jul 3, 2014
@astrieanna
Copy link
Contributor

You can use print to print things without newlines. Would that cause a problem for IJulia? (i.e. I print something with out a final new line, and am left waiting for my output until I decide to run something else?)

@JeffBezanson
Copy link
Sponsor Member

Yes, but for printing text to stdout people are somewhat used to line buffering. Anyway removing this function is not a huge advantage so I'm moving this issue to 0.4.

@stevengj
Copy link
Member

stevengj commented Jul 3, 2014

I don't think it's acceptable to delay output until a newline occurs in IJulia.

However, if eof(s) blocks until something is available to read as @vtjnash suggests, then that approach sounds like it should be fine.

Still, I tend to agree that this should be a 0.4 change.

@JeffBezanson
Copy link
Sponsor Member

eof does currently block until data is available.

@vtjnash
Copy link
Sponsor Member Author

vtjnash commented Jul 3, 2014

I should also probably point out that since readavailable is unicode-unaware, it may sometimes throw invalid string exceptions on valid strings, if a utf8 characters happened to overlap with the data discretization boundaries of the internal implementation.

@JeffBezanson
Copy link
Sponsor Member

It should probably return a byte array anyway.
On Jul 3, 2014 7:11 PM, "Jameson Nash" notifications@github.com wrote:

I should also probably point out that since readavailable is
unicode-unaware, it may sometimes throw invalid string exceptions on valid
strings, if a utf8 characters happened to overlap with the data
discretization boundaries of the internal implementation.


Reply to this email directly or view it on GitHub
#7478 (comment).

@JeffBezanson
Copy link
Sponsor Member

We will keep this function but it should return a byte array instead of a string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:io Involving the I/O subsystem: libuv, read, write, etc. needs decision A decision on this change is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants