
Major socket io performance regression in 0.4 and 0.5-dev #14467

Closed
ViralBShah opened this issue Dec 22, 2015 · 10 comments · Fixed by #14667
Labels: kind:potential benchmark (Could make a good benchmark in BaseBenchmarks), kind:regression (Regression in behavior compared to a previous version), performance (Must go faster)

Comments

@ViralBShah
Member

The following basic socket io code shows a major performance regression in 0.4 and 0.5-dev compared to 0.3.

using Compat

const n = 16*10^6   # 16 MB payload

# Echo server: for each accepted client, read n bytes and write them straight back.
function server()
    s = listen(8000)
    @async while true
        a = accept(s)
        @async while true
            try
                b = readbytes(a, n)
                println(length(b))
                write(a, b)
            catch e
                println("client closed connection")
                break
            end
        end
    end
end

# Client: time one round trip of n bytes out and n bytes back.
function client(c)
    arr = Array(UInt8, n)
    @time begin
        write(c, arr)
        b = readbytes(c, n)
        println(length(b))
    end
end

server()

const c = connect("localhost", 8000)
client(c)

On 0.3.12: 1 second
On 0.4.2: 10 seconds
On 0.5-dev: 12 seconds
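(The timed block writes roughly 16 MB and reads the same amount back, so this works out to about 16 MB/s each way on 0.3.12 versus roughly 1.6 MB/s on 0.4.2.)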

@ViralBShah ViralBShah added the performance (Must go faster) and kind:regression (Regression in behavior compared to a previous version) labels Dec 22, 2015
@ViralBShah ViralBShah modified the milestones: 0.5.0, 0.4.x Dec 22, 2015
@ViralBShah
Member Author

This was discovered by @amitmurthy and @shashi when they saw poor speedup while doing some parallel benchmarking.

@amitmurthy
Contributor

A few days back I ran a simple benchmark comparing C vs 0MQ (C API) vs Julia TCPSocket vs Julia remotecall_fetch. The test basically echoes back 1K of data in a loop.

The results were:

C code (10^6 request-responses of 1K each)
------
8.95035 seconds
9.11186 seconds

0MQ C code (10^6 request-responses of 1K each)
----------
36.91122 seconds
36.92812 seconds

Julia TCP sockets 10^4 requests-responses of 1K each
-----------------
7.927375 seconds (260.33 k allocations: 28.702 MB, 0.30% gc time)
7.583553 seconds (260.33 k allocations: 28.702 MB, 0.32% gc time)

Julia remotecall_fetch 10^5 requests-responses of 1K each
-----------------
elapsed time: 8.995498505 seconds
elapsed time: 9.335054655 seconds

code here - https://gist.github.com/amitmurthy/945843c9f0c61bc7d9e0

The loop counts are 10^6 for C and 0MQ, 10^5 for remotecall_fetch and 10^4 for TCPSocket.

There is obviously something wrong with the TCPSocket test code, since it is slower than remotecall_fetch, but I couldn't locate any obvious mistake. The C code is 10 times faster than remotecall_fetch, while 0MQ is about three times as fast.
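For reference, a minimal sketch of what the remotecall_fetch echo benchmark might look like (this is an illustration, not the code from the gist; it assumes one added worker and the 0.4-era remotecall_fetch(pid, f, args...) argument order):

addprocs(1)

@everywhere echo(data) = data   # worker-side echo: return the payload unchanged

function bench_remotecall(n)
    payload = zeros(UInt8, 1024)            # 1K request payload
    @time for i in 1:n
        remotecall_fetch(2, echo, payload)  # round trip to worker 2 and back
    end
end

bench_remotecall(10^5)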

@JeffBezanson
Sponsor Member

Maybe related to #14106

@jrevels jrevels added the kind:potential benchmark (Could make a good benchmark in BaseBenchmarks) label Dec 23, 2015
@amitmurthy
Contributor

Do we have all the nightly builds still available somewhere? It may be faster to spin up an AWS instance and bisect against the nightlies rather than running git bisect locally.

@amitmurthy
Contributor

git bisect points to:

5863b486435b86d9db25973a684ef4428f563a73 is the first bad commit
commit 5863b486435b86d9db25973a684ef4428f563a73
Author: Jameson Nash <vtjnash@gmail.com>
Date:   Thu Jun 4 10:15:06 2015 -0400

    make start_reading/stop_reading automatic exactly when needed. fix #1925, fix #10655 (closes #11530)

    this also makes the read throttle more intelligent so that it doesn't
    get tripped up by wait_nb requests for more than READ_BUFFER_SZ bytes

:040000 040000 7bda1043b8ea05590ad2008e0a827dbfc83746ac a1070338b67f8f88ddad6ac16352852f8f1436f3 M  base
:040000 040000 a1d20ff70afcf6555a7ab1e24ca99b76639b657a 9d1da9cca99c713a949e02616cf805c0cbebedab M  test

commit - 5863b48

The test code was

# Echo handler: read 1K from the client and write it straight back,
# with Nagle's algorithm disabled to avoid batching of small writes.
function serve(sock)
  Base.disable_nagle(sock)
  while true
    try
        a = readbytes(sock, 1024)
        write(sock, a)
    catch e
        println("client closed connection")
        break
    end
  end
end

function srvr()
  srv = listen(8502)
  while true
    sock = accept(srv)
    @async serve(sock)
  end
end

@schedule srvr()

const n = 10^4

# Client: time n request-responses of 1K each.
# Called twice; the first run includes JIT compilation.
function client()
  sock = connect("localhost", 8502)
  Base.disable_nagle(sock)
  @time for i in 1:n
    write(sock, UInt8[0 for x in 1:1024])
    readbytes(sock, 1024)
  end
end

client()
client()

Prior to this commit, 10^4 1K request-responses take approximately 1.0 second; after it, about 7.0 seconds (roughly 100 µs versus 700 µs per request-response).

cc: @vtjnash

@amitmurthy
Contributor

Bump @vtjnash

Running dtruss on the above test code and comparing the problematic patch with its immediate predecessor, the number of kevent system calls jumped from 3000 to 12500, and the number of read system calls jumped from 1100 to 3500.

@amitmurthy
Contributor

Having spent quite a few hours on this, I still haven't identified the root cause of the slowdown. This branch - https://github.com/JuliaLang/julia/tree/amitm/fixregr - reverts the change. It breaks tests in other ways, but the performance regression improves substantially, down to 1.2x (from the current 7x).

Approximately 1.5 seconds of the slowdown is due to the try-catch-finally blocks introduced.
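A rough, purely illustrative way to gauge the per-iteration cost of entering a try-catch-finally frame (the actual blocks in question live inside Base's stream code, so this only approximates the effect):

# Compare a bare loop against one that sets up an exception frame every iteration.
function loop_bare(n)
    s = 0
    for i in 1:n
        s += i
    end
    return s
end

function loop_guarded(n)
    s = 0
    for i in 1:n
        try
            s += i
        catch err
            rethrow(err)
        finally
            nothing
        end
    end
    return s
end

loop_bare(1); loop_guarded(1)   # warm up the JIT
@time loop_bare(10^7)
@time loop_guarded(10^7)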

@vtjnash
Sponsor Member

vtjnash commented Jan 14, 2016

the benchmarks show very different improvements, but the one from the OP was definitely the most substantial:
before:
14.564401 seconds (1.36 k allocations: 30.555 MB, 0.05% gc time)
after:
0.066761 seconds (1.22 k allocations: 30.545 MB, 60.54% gc time)
(over 200x faster :o)

@amitmurthy
Contributor

Awesome!

@tkelman
Contributor

tkelman commented Jan 14, 2016

moving backport pending label to the PR #14667

tkelman pushed a commit that referenced this issue Mar 7, 2016
fixes #14467 performance regression (and some)

(cherry picked from commit 9dbc3ab)
ref #14667