
[cheroot==8.1.0 regression] Significant performance drop #305

Closed
mar10 opened this issue Jul 23, 2020 · 11 comments · Fixed by #308 or #311
Labels: bug (Something is broken) · regression (Something that worked earlier got broken in a new release) · reproducer: missing (This PR or issue lacks code which reproduces the problem described, or a clearly understandable STR)

Comments

@mar10 (Contributor) commented on Jul 23, 2020:

❓ I'm submitting a ...

  • [x] 🐞 bug report
  • [ ] 🐣 feature request
  • [ ] ❓ question about the decisions made in the repository

🐞 Describe the bug. What is the current behavior?

This is more a note than a bug report:
I ran some benchmarks against my project (WsgiDAV) and found that there is a drastic drop in performance between cheroot versions 8.0.0 and 8.1.0.

The test uses a homegrown tool (stressor) that runs a test script (see here); a rough sketch of the pattern follows the list below:

  • 10 users in parallel
    • GET a common file
    • PUT another file (one per user)
    • GET that file
  • Running for 30 seconds
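
A rough, hypothetical sketch of that load pattern in plain requests/threading (the real script is the one linked above; the base URL and file names here are placeholders):

    # Hypothetical sketch of the workload, not the actual stressor script.
    # BASE_URL and the file names are placeholders for illustration only.
    import threading
    import time
    import requests

    BASE_URL = "http://127.0.0.1:8080"   # assumed WsgiDAV endpoint
    DURATION = 30                        # seconds
    USERS = 10

    def run_user(user_id, counts):
        session = requests.Session()     # one keep-alive session per virtual user
        deadline = time.monotonic() + DURATION
        while time.monotonic() < deadline:
            session.get(f"{BASE_URL}/common.txt")                        # GET a common file
            session.put(f"{BASE_URL}/user_{user_id}.txt", data=b"data")  # PUT a per-user file
            session.get(f"{BASE_URL}/user_{user_id}.txt")                # GET that file
            counts[user_id] += 1

    counts = [0] * USERS
    threads = [threading.Thread(target=run_user, args=(i, counts)) for i in range(USERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"{sum(counts)} sequences in {DURATION}s across {USERS} users")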

Benchmarks show 11k activities per 30 seconds up to v8.0.
Starting with cheroot 8.1 this drops to 2.9k activities per 30 seconds.

I also tried gevent as the backend, and it delivered 12.4k activities.

Please note that these results may be due to a poor implementation in my tools, but all I changed was the WSGI server behind WsgiDAV, so I thought I'd let you know.

πŸ’‘ To Reproduce

See https://github.com/mar10/wsgidav/blob/master/tests/stressor/results.md

If you are interested, I can provide more details.

Cheers
Martin

mar10 added the bug (Something is broken) and triage labels on Jul 23, 2020
@webknjaz (Member) commented:

Hey @mar10, thanks for the heads-up! v8.1.0 introduced a number of functional regressions that got some fixes in v8.3.0 and v8.4.0, so please also test with those. It was a result of refactoring the connection management to decouple keep-alive TCP connections from HTTP requests. So I'd say you should also check whether your tool reuses the connections or creates new ones for each HTTP request.

webknjaz added the regression and reproducer: missing labels on Jul 24, 2020
@mar10 (Contributor, Author) commented on Jul 25, 2020:

Hi Sviatoslav,

I already tested with 8.3.1 and 8.4.0, with the same results.

stressor uses the requests library with sessions, which should support keep-alive, but I can double check.

However, even if it did not reuse connections, would the refactoring explain cheroot slowing down to 25% of its previous throughput (for my specific benchmark)?
I did not change anything in the scenario (test tool, WSGI app) except for swapping out the WSGI server.

@webknjaz (Member) commented:

Hm... I think you use the same session for all the requests, meaning that they are all processed sequentially because they come one after another over the wire in the same TCP connection. HTTP/1.1 doesn't have built-in parallelism, unlike HTTP/2, which uses streams within the same connection.

Try checking whether it really uses just one connection (using tcpdump/Wireshark, I guess). Also try making stressor create a number of sessions and load-balance the queries across them. Plus, try stress-testing with some other tool like ab/wrk2/locust.
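
One cheap way to check connection reuse (a hedged sketch, assuming requests/urllib3 and a placeholder URL): urllib3 logs every newly opened connection at DEBUG level, so repeated "Starting new HTTP connection" lines would indicate that keep-alive is not in effect.

    # Sketch: count how many TCP connections requests/urllib3 actually opens.
    # The URL is a placeholder; the exact log wording depends on the urllib3 version.
    import logging
    import requests

    logging.basicConfig(level=logging.DEBUG)   # urllib3 logs new connections at DEBUG

    session = requests.Session()
    for _ in range(5):
        # With working keep-alive, only the first request should log a new connection.
        session.get("http://127.0.0.1:8080/some/static/file.txt")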

@mar10 (Contributor, Author) commented on Jul 25, 2020:

Stressor spawns one thread for every virtual user.
Each 'user' holds its own requests/urllib3 session, so every user should re-use its connection.

I modified the test. This time a sequence is only one GET request to a single static file:

Cheroot 8.0

Executed 10,163 activities in 10,163 sequences, using 1 parallel sessions. => 338.8 requests per sec/user
Executed 11,584 activities in 11,584 sequences, using 2 parallel sessions. => 193.1 requests per sec/user
Executed 14,545 activities in 14,545 sequences, using 10 parallel sessions. => 48.5 requests per sec/user
=> 0.003 .. 0.02 seconds per request

Cheroot 8.4

Executed 292 activities in 292 sequences, using 1 parallel sessions. => 9.7 requests per sec/user
Executed 582 activities in 582 sequences, using 2 parallel sessions. => 9.7 requests per sec/user
Executed 2,943 activities in 2,933 sequences, using 10 parallel sessions. => 9.8 requests per sec/user
=> 0.1 seconds per request

It may sound silly, but it looks like someone introduced a sleep(0.1) in v8.1+ ;-)

@webknjaz (Member) commented:

Weird. This clearly needs profiling. What are the thread count settings in your setup?

@webknjaz (Member) commented:

@mar10, mind recording a flame graph using https://github.com/benfred/py-spy?

@mar10 (Contributor, Author) commented on Jul 26, 2020:

numthreads is set to 50.

py-spy looks interesting, I will try it out if I find time.

However, I searched for "0.1" and found that connections.py contains this code:

        try:
            for fno in socket_dict:
                self._selector.register(fno, selectors.EVENT_READ)
            rlist = [
                key.fd for key, _event
                in self._selector.select(timeout=0.1)
            ]
        except OSError:
            ...

I patched the timeout value and ran the benchmarks again:

timeout=0.1 (original)
Executed 2,907 activities in 2,907 sequences, using 10 parallel sessions.

timeout=0.05
Executed 5,766 activities in 5,766 sequences, using 10 parallel sessions.

timeout=0.02
Executed 12,199 activities in 12,199 sequences, using 10 parallel sessions.

timeout=0.01
Executed 13,933 activities in 13,933 sequences, using 10 parallel sessions.

Smaller values than that don't seem to improve the performance further.

webknjaz changed the title from "Performance drop with v8.1?" to "[cheroot==8.1.0 regression] Significant performance drop" on Jul 26, 2020
webknjaz added a commit that referenced this issue on Jul 27, 2020:

This change addresses a significant performance regression that was introduced in v8.1.0 (see #305), short-term.

It makes the socket select timeout shorter to reduce the duration of each tick in the new connection-processing loop.

The proper fix is yet to be created per
#308 (comment)
@webknjaz (Member) commented:

Extracting what @liamstask posted in the PR (a simplified sketch of the effect follows the quote):

it looks to me that this could be related to the fact that, currently in server.tick(), only one request can be added per tick even if several are available to be processed in ConnectionManager.connections.

in this case, since ConnectionManager does not register connections previously marked as read-ready with the selector, it's possible that none of the remaining connections in the selector are active, meaning the server is forced to wait for the full timeout duration each time tick() is called.

fundamentally, the value of this timeout should not really be particularly sensitive, since the selector should wake up once connections are ready to be processed, regardless of the timeout.

(Originally posted by @liamstask in #308 (comment))
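
A simplified, hypothetical illustration of that failure mode (this is not cheroot's code): if the connection that actually has a request pending is not registered with the selector, and only idle keep-alive connections are, each tick blocks for the full select() timeout before the pending request can be picked up, which roughly matches the observed rate of about one request per 0.1 s per session.

    # Toy illustration of the described tick behaviour; not cheroot's actual code.
    import selectors
    import socket
    import time

    SELECT_TIMEOUT = 0.1  # the value mar10 patched above

    # An idle keep-alive connection is registered with the selector...
    idle_client, idle_server = socket.socketpair()
    selector = selectors.DefaultSelector()
    selector.register(idle_server, selectors.EVENT_READ)

    # ...while the connection that already has a request waiting is not
    # registered, so nothing wakes select() up early.
    start = time.monotonic()
    for _ in range(5):  # five server "ticks"
        selector.select(timeout=SELECT_TIMEOUT)
        # In the quoted scenario, the pending request would only be processed
        # here, i.e. after waiting out the full timeout on every tick.
    elapsed = time.monotonic() - start
    print(f"5 ticks took {elapsed:.2f}s (~{5 * SELECT_TIMEOUT:.1f}s, one request per ~{SELECT_TIMEOUT}s)")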

webknjaz linked a pull request on Jul 27, 2020 that will close this issue
@mar10 (Contributor, Author) commented on Aug 1, 2020:

8.4.2 fixed it for me, thanks

8.4.1:
Run time 0:00:30.109496, net: 4:59.91 min.
Executed 2,920 activities in 2,920 sequences, using 10 parallel sessions.

8.4.2:
Run time 0:00:30.099942, net: 4:57.40 min.
Executed 14,131 activities in 14,131 sequences, using 10 parallel sessions.

mar10 closed this as completed on Aug 1, 2020
@webknjaz (Member) commented:

@liamstask does #309 fix this?

@liamstask (Contributor) commented:

#309 does not, but #311 should improve this, though I could use some assistance with performance testing
