truncate excess output per execution request #463

Merged
merged 2 commits into from Sep 21, 2016

4 participants

@tlnagy
Contributor
tlnagy commented Sep 15, 2016

Not sure why Appveyor is failing, looks unrelated to this PR. @stevengj what do you think?

src/stdio.jl
@@ -32,20 +32,41 @@ end
const bufs = Dict{String,IOBuffer}()
const stream_interval = 0.1
const max_bytes = 10*1024
+# max output per code cell is 512 kb
+const max_output_per_request = 1 << 19
@stevengj
stevengj Sep 16, 2016 Member

This shouldn't be a constant, it should be a Ref(1 << 19) so that we can provide an API to change it in a running notebook.

@tlnagy
tlnagy Sep 17, 2016 edited Contributor

Why use Ref here? Is it faster than having a global?

@yuyichao
yuyichao Sep 17, 2016 Member

Because:

> so that we can provide an API to change it in a running notebook.

@tlnagy
tlnagy Sep 17, 2016 Contributor

Either way we need a nice function for changing it so it shouldn't matter too much whether it's a global or a Ref. I ended up going with the latter.

@yuyichao
yuyichao Sep 18, 2016 Member

You can't change it if it's a const.

@stevengj
stevengj Sep 18, 2016 Member

Julia knows the type of a const foo = Ref(bar); this is the advantage of Ref over a non-const global.

@tlnagy
tlnagy Sep 18, 2016 Contributor

Ah. That makes sense. I ended up using Ref anyway, but good to know.
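The pattern under discussion can be shown in a minimal sketch (the name and `1 << 19` default follow the PR; everything else is illustrative). A non-const global can be rebound to any type, so the compiler can assume nothing about it; a `const Ref` fixes the binding's type (`Base.RefValue{Int}`) while the value inside stays mutable:

```julia
# const binding of known type; the wrapped value is still mutable
const max_output_per_request = Ref(1 << 19)  # 512 KiB default

# the limit can be changed at runtime even though the binding is const:
max_output_per_request[] = 1 << 20
```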

src/stdio.jl
while !eof(rd) # blocks until something is available
nb = nb_available(rd)
+ parent_id_exists = "msg_id" in keys(execute_msg.header)
@stevengj
stevengj Sep 16, 2016 Member

I would just put the counter as a global Ref(0) and reset it to zero in the execute_request handler. Then all this logic about checking message ids can be removed, and we have a single counter for all streams.

@tlnagy
tlnagy Sep 16, 2016 Contributor

Your global counter is likely a better solution.
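A minimal sketch of that suggestion follows; the names `stdio_bytes`, `reset_stdio_counter`, and `over_limit` are illustrative, not the PR's final identifiers:

```julia
# one counter shared by all streams, kept in a const Ref so its type is known
const stdio_bytes = Ref(0)

# called at the start of each execute_request, before any output is produced
reset_stdio_counter() = (stdio_bytes[] = 0)

# called from the stream watcher for every chunk of nb bytes read;
# returns true once the cell's cumulative output exceeds `limit`
function over_limit(nb::Integer, limit::Integer)
    stdio_bytes[] += nb
    return stdio_bytes[] > limit
end
```

Because the counter is reset per execute request rather than per stream, the message-id bookkeeping in the watcher loop is no longer needed.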

src/stdio.jl
+ counter += nb
+ # if this stream as surpassed the maximum output limit then ignore future bytes
+ if parent_id_exists && continuation && counter > max_output_per_request
+ (counter - nb <= max_output_per_request) && println(orig_STDOUT[], "Excessive output truncated.")
@stevengj
stevengj Sep 16, 2016 Member

it is weird to have the println as part of the conditional, especially since println returns nothing ... won't this give an error? `if true && println("foo"); end` gives an error: `TypeError: non-boolean (Void) used in boolean context`

I don't think we want to print a message on every read; I think we just want one warning per cell, and I think we want to print it to the notebook. If you switch to the global counter as I suggested above, we can just do:

if stdio_counter[] > stdio_max[]
   stdio_counter[] = 0
   warn("Output > ", stdio_max[], " bytes was truncated.")
end

when the execute completes.

@tlnagy
tlnagy Sep 16, 2016 edited Contributor

That's pretty standard Julia syntax; Stefan even recommends it here: https://stackoverflow.com/questions/22959499/julia-can-conditional-statements-evaluate-code-on-the-same-line and I see it all over the place.

The problem is that the best place to evaluate the number of bytes written is in the primary event loop. Your solution wouldn't work because it does not actually truncate future bytes which are being streamed to us. Your suggestion of resetting the counter in the execute_handler would likely work. My opposing conditionals ensure that we only print a warning message right when we overstep max_output_per_request, but keep reading from the libuv/os buffer so that the bytes actually get truncated, i.e. read and discarded.

I don't write it to the notebook because I'm truncating stdout and the truncation includes my warning message too. Maybe there is a more clever way of handling this.

@stevengj
stevengj Sep 16, 2016 Member

@tlnagy, nevermind, I was reading the println as part of the same conditional as the if on the previous line. I agree that `condition && println()` is a standard idiom. I misread it as `if condition && println()`, however, which would have been wrong.

@stevengj
stevengj Sep 16, 2016 edited Member

Right, you only want to reset the execution counter at the beginning of the execution handler, not at the end.

However, I would still like a warning to be printed in the notebook itself, not in the terminal where the jupyter server is running.

@tlnagy
tlnagy Sep 16, 2016 Contributor

One way to get that to work is to make sure that we only truncate when name == "stdout" while allowing stderr to pass through. Then we could dump the error to stderr and the user should be able to see it.

@stevengj
stevengj Sep 18, 2016 Member

No, I think we should throttle both stdio streams. Either one could have runaway output.

@tlnagy
tlnagy Sep 18, 2016 Contributor

I'm not sure how to throttle stderr while still allowing a truncation warning through. Currently, I just call warn once stdout has passed the max limit and then read and discard future output. If I don't have the check for stdout then the warn message also gets truncated and you can't see it.

@JobJob
JobJob Sep 19, 2016 Contributor

I agree that it would be good to truncate both streams. Instead of the warn maybe you could just directly send the message to the frontend like so:

send_ipython(publish,
         msg_pub(execute_msg, "stream",
                 @compat Dict("name" => "stderr", "text" => "Excessive output truncated after $(stdout_bytes[]) bytes.")))
@tlnagy
tlnagy Sep 20, 2016 Contributor

That doesn't seem to work in my hands. The output doesn't get displayed and the kernel hangs.

@JobJob
JobJob Sep 20, 2016 Contributor

Sorry, I was looking at an old version of the code, does this work:

send_ipython(publish[],
     msg_pub(execute_msg, "stream",
            Dict("name" => "stderr", "text" => "Excessive output truncated after $(stdout_bytes[]) bytes.")))

Am just copying from here: https://github.com/JuliaLang/IJulia.jl/blob/master/src/stdio.jl#L100-L102

Btw, I'm very happy you're implementing this feature, it often causes me problems.

@tlnagy tlnagy truncate excess output per execution request
138f6a8
@tlnagy
Contributor
tlnagy commented Sep 17, 2016

I made all the suggested changes, added a function for changing the limit, and added docs to the README.

src/stdio.jl
- write(buf, read(rd, nb))
+ if name == "stdout"
+ stdout_bytes[] += nb
+ # if this stream as surpassed the maximum output limit then ignore future bytes
@JobJob
JobJob Sep 19, 2016 Contributor

typo: "if this stream as surpassed" should be "if this stream has surpassed"

+
+### Preventing truncation of output
+
+The new default behavior of IJulia is to truncate stdout (via `show` or `println`)
@stevengj
stevengj Sep 19, 2016 Member

stdout and stderr

README.md
+results. This limit can be increased to a custom value, like 1MB, as follows
+
+```
+IJulia.set_max_excessive_output(1 << 20)
@stevengj
stevengj Sep 19, 2016 Member

This name is excessive. 😉 Just set_max_stdio should be enough.

src/IJulia.jl
+before getting truncated. A large value here allows a lot of output to be
+displayed in the notebook, potentially bogging down the browser.
+"""
+function set_max_excessive_output(max_output::Int)
@stevengj
stevengj Sep 19, 2016 Member

::Integer

src/IJulia.jl
+"""
+function set_max_excessive_output(max_output::Int)
+ max_output_per_request[] = max_output
+ println("Stdout will now be truncated after $max_output bytes")
@stevengj
stevengj Sep 19, 2016 Member

No println.
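Folding the review comments together (the `set_max_stdio` rename, the `::Integer` signature, no println), the accessor presumably reduces to something like this sketch:

```julia
const max_output_per_request = Ref(1 << 19)

"""
    set_max_stdio(max_output::Integer)

Set the maximum number of bytes of stdout/stderr output allowed per
execution request before the remainder is truncated.
"""
function set_max_stdio(max_output::Integer)
    max_output_per_request[] = max_output
end
```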

src/execute_request.jl
@@ -111,6 +111,9 @@ end
# global variable so that display can be done in the correct Msg context
execute_msg = Msg(["julia"], Dict("username"=>"julia", "session"=>"????"), Dict())
+# global variable tracking the number of bytes written in the current execution
+# request
+const stdout_bytes = Ref(0)
@stevengj
stevengj Sep 19, 2016 Member

stdio_bytes, since we should track both stdout and stderr.

src/stdio.jl
@@ -45,7 +51,20 @@ function watch_stream(rd::IO, name::AbstractString)
while !eof(rd) # blocks until something is available
nb = nb_available(rd)
if nb > 0
- write(buf, read(rd, nb))
+ if name == "stdout"
@stevengj
stevengj Sep 19, 2016 Member

Do this regardless of stream name, i.e. limit both stdout and stderr

src/stdio.jl
+ # if this stream as surpassed the maximum output limit then ignore future bytes
+ if stdout_bytes[] > max_output_per_request[]
+ if stdout_bytes[] - nb <= max_output_per_request[]
+ warn("Excessive output truncated after $(stdout_bytes[]) bytes.")
@stevengj
stevengj Sep 19, 2016 edited Member

See @JobJob's suggestion of just calling send_ipython directly
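Putting the pieces together, the loop's truncation logic amounts to something like the following self-contained sketch; the real code publishes the warning via send_ipython rather than a callback, and the names here are illustrative:

```julia
const stdio_bytes = Ref(0)
const max_output_per_request = Ref(1 << 19)

# Decide what to forward for a chunk of output: everything while under the
# limit, nothing afterwards. The warning callback fires exactly once, on
# the first chunk that crosses the limit; later bytes are still read from
# the buffer but discarded.
function throttle(chunk::Vector{UInt8}, warn_fn::Function)
    nb = length(chunk)
    stdio_bytes[] += nb
    if stdio_bytes[] > max_output_per_request[]
        if stdio_bytes[] - nb <= max_output_per_request[]
            warn_fn("Excessive output truncated after $(stdio_bytes[]) bytes.")
        end
        return UInt8[]           # read and discarded, not forwarded
    end
    return chunk
end
```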

@tlnagy tlnagy apply to both stderr and stdout; rename accessor function
1700e23
@tlnagy
Contributor
tlnagy commented Sep 21, 2016

Thanks @JobJob and @stevengj for the review, I've made the requested changes.

@stevengj stevengj merged commit e9699c4 into JuliaLang:master Sep 21, 2016

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed