TCP comms could be faster #1477

pitrou · 2017-10-18T18:34:15Z

This issue is the distributed-specific side of tornadoweb/tornado#2147. We are using IOStream for the framed Comm protocol, which incurs additional copies.

A benchmark script is available at https://gist.github.com/pitrou/245e24de52dec34e03cfc4148c001466

pitrou · 2017-10-19T11:23:13Z

Some results for the above benchmark:

with Tornado master: 370 MB/s (same for raw IOStream)
with tornadoweb/tornado#2166 ([WIP] Fix #2147: mostly zero-copy writes and reads in IOStream): 680 MB/s (same for raw IOStream)
with an experimental limited IOStream: more than 900 MB/s (raw IOStream is a few % faster)

All this on Ubuntu 16.04 with Python 3.6.

TomAugspurger · 2019-01-24T21:22:58Z

Was this issue essentially fixed by the use of read_into, or are there additional steps we should take?

distributed/distributed/comm/tcp.py, lines 190 to 193 in 90758dc:

    if PY3 and self._iostream_has_read_into:
        frame = bytearray(length)
        n = yield stream.read_into(frame)
        assert n == length, (n, length)
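For readers unfamiliar with the pattern, here is a minimal sketch of why reading into a preallocated buffer saves copies. It is not the code from distributed or Tornado: it uses the standard library's `io.BytesIO` as a stand-in for the stream, and the 8-byte length prefix, `HEADER`, `read_frame_copying`, and `read_frame_into` are all assumptions made for illustration.

```python
import io
import struct

# Hypothetical framing: an 8-byte little-endian length, then the payload.
HEADER = struct.Struct("<Q")


def read_frame_copying(stream, chunk_size=4):
    """Accumulate the payload by concatenation.

    Each `buf += chunk` can allocate a new bytes object and copy
    everything received so far into it.
    """
    (length,) = HEADER.unpack(stream.read(HEADER.size))
    buf = b""
    while len(buf) < length:
        chunk = stream.read(min(chunk_size, length - len(buf)))
        if not chunk:
            raise EOFError("stream ended mid-frame")
        buf += chunk
    return buf


def read_frame_into(stream, chunk_size=4):
    """Preallocate the frame once and let the stream fill it in place."""
    (length,) = HEADER.unpack(stream.read(HEADER.size))
    frame = bytearray(length)
    view = memoryview(frame)  # slicing a memoryview does not copy
    pos = 0
    while pos < length:
        n = stream.readinto(view[pos:pos + chunk_size])
        if not n:
            raise EOFError("stream ended mid-frame")
        pos += n
    return frame
```

The small `chunk_size` only emulates partial reads from a socket. The second function writes each chunk directly at its final position, which is roughly the idea behind `frame = bytearray(length)` followed by `read_into(frame)` in the snippet above.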

pitrou · 2019-01-24T21:24:26Z

Both read_into and the write improvements. The issue is probably fixed (out of curiosity, did you run the benchmark script, and which numbers did you get?).
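As a rough illustration of the write side, the sketch below contrasts building one concatenated message with sending the header and payload separately. It uses plain standard-library sockets rather than Tornado's IOStream, so it only shows the general idea of avoiding an intermediate copy, not what the Tornado changes actually do internally. The 8-byte length prefix and all function names are hypothetical.

```python
import socket
import struct

# Hypothetical framing: an 8-byte little-endian length, then the payload.
HEADER = struct.Struct("<Q")


def send_frame_copying(sock, payload):
    # header + payload builds a new bytes object, copying the whole payload.
    sock.sendall(HEADER.pack(len(payload)) + payload)


def send_frame_separate(sock, payload):
    # The payload is handed to the socket through a memoryview,
    # so no concatenated copy is created in Python.
    sock.sendall(HEADER.pack(len(payload)))
    sock.sendall(memoryview(payload))


def recv_exact(sock, n):
    """Receive exactly n bytes into a preallocated buffer."""
    buf = bytearray(n)
    view = memoryview(buf)
    pos = 0
    while pos < n:
        got = sock.recv_into(view[pos:])
        if got == 0:
            raise EOFError("connection closed mid-frame")
        pos += got
    return bytes(buf)


def recv_frame(sock):
    """Read one length-prefixed frame and return its payload."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

Sending the header on its own costs an extra system call, so whether this is a net win depends on the payload size; for large frames the avoided copy usually matters more.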

TomAugspurger · 2019-01-24T21:28:49Z

633 MB/s with tornado 5.1.1

I'm working on UCX-backed comms (#2344) and will post the performance results once I get things working.

pitrou · 2019-01-24T21:29:56Z

Which CPU and system, out of curiosity?

TomAugspurger · 2019-01-24T21:34:18Z

That's on an Nvidia DGX system, and I believe the CPU is

model name      : Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz

pitrou · 2019-01-24T21:35:32Z

I see, thanks. My benchmark numbers above had been obtained with a Core i5-2500K.

jakirkham · 2021-02-01T19:52:24Z

Should we close this issue? Sounds like this was already addressed (though please correct me if not)

GenevieveBuckley · 2021-10-18T02:00:34Z

Was this issue essentially fixed by the use of read_into, or are there additional steps we should take?

distributed/distributed/comm/tcp.py, lines 190 to 193 in 90758dc:

    if PY3 and self._iostream_has_read_into:
        frame = bytearray(length)
        n = yield stream.read_into(frame)
        assert n == length, (n, length)

@pitrou did this sufficiently fix your issues?

jakirkham · 2021-10-19T16:38:01Z

Yeah I think this was fixed a while ago. Let’s go ahead and close this. There are already newer issues on improving communication. We can follow up on additional tasks in those issues or in a new issue

TomAugspurger mentioned this issue on Jan 24, 2019, in rapidsai/ucx-py#19 (Basic echo perf test, closed)

GenevieveBuckley added the performance and needs info labels on Oct 18, 2021

jakirkham closed this as completed Oct 19, 2021
