
[cheroot==8.1.0 regression] Significant performance drop #305

Closed
mar10 opened this issue Jul 23, 2020 · 11 comments · Fixed by #308 or #311
Labels: bug (Something is broken) · regression (Something that worked earlier got broken in a new release) · reproducer: missing (This PR or issue lacks code which reproduces the problem described, or a clearly understandable STR)

Comments

@mar10 (Contributor) commented on Jul 23, 2020:

❓ I'm submitting a ...

  • [x] 🐞 bug report
  • [ ] 🐣 feature request
  • [ ] ❓ question about the decisions made in the repository

🐞 Describe the bug. What is the current behavior?

This is more a note than a bug report:
I ran some benchmarks against my project (WsgiDAV) and found that there is a drastic drop in performance between cheroot versions 8.0.0 and 8.1.0.

The test uses a homegrown tool (stressor) that runs a test script (see here); a rough sketch of the pattern follows the list below:

  • 10 users in parallel
    • GET a common file
    • PUT another file (one per user)
    • GET that file
  • Running for 30 seconds
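
A rough, hypothetical sketch of that load pattern in plain requests/threading (the real script is the one linked above; the base URL and file names here are placeholders):

    # Hypothetical sketch of the workload, not the actual stressor script.
    # BASE_URL and the file names are placeholders for illustration only.
    import threading
    import time
    import requests

    BASE_URL = "http://127.0.0.1:8080"   # assumed WsgiDAV endpoint
    DURATION = 30                        # seconds
    USERS = 10

    def run_user(user_id, counts):
        session = requests.Session()     # one keep-alive session per virtual user
        deadline = time.monotonic() + DURATION
        while time.monotonic() < deadline:
            session.get(f"{BASE_URL}/common.txt")                        # GET a common file
            session.put(f"{BASE_URL}/user_{user_id}.txt", data=b"data")  # PUT a per-user file
            session.get(f"{BASE_URL}/user_{user_id}.txt")                # GET that file
            counts[user_id] += 1

    counts = [0] * USERS
    threads = [threading.Thread(target=run_user, args=(i, counts)) for i in range(USERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"{sum(counts)} sequences in {DURATION}s across {USERS} users")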

Benchmarks show 11k activities per 30 seconds up to v8.0.
Starting with cheroot 8.1 this drops to 2.9k activities per 30 seconds.

I also tried gevent as the backend, and it delivered 12.4k activities.

Please note that these results may be due to a poor implementation in my tools, but all I changed was the WSGI server behind WsgiDAV, so I thought I'd let you know.

πŸ’‘ To Reproduce

See https://github.com/mar10/wsgidav/blob/master/tests/stressor/results.md

If you are interested, I can provide more details.

Cheers
Martin

mar10 added the bug (Something is broken) and triage labels on Jul 23, 2020
@webknjaz (Member) commented:

Hey @mar10, thanks for the heads-up! v8.1.0 introduced a number of functional regressions that got some fixes in v8.3.0 and v8.4.0, so please also test with those. It was a result of refactoring the connection management to decouple keep-alive TCP connections from HTTP requests. So I'd say you should also check whether your tool reuses the connections or creates new ones for each HTTP request.

webknjaz added the regression and reproducer: missing labels on Jul 24, 2020
@mar10 (Contributor, Author) commented on Jul 25, 2020:

Hi Sviatoslav,

I already tested with 8.3.1 and 8.4.0, with the same results.

stressor uses the requests library with sessions, which should support keep-alive, but I can double check.

However, even if it did not reuse connections, would the refactoring explain cheroot slowing down to 25% of its previous throughput (for my specific benchmark)?
I did not change anything in the scenario (test tool, WSGI app) except for swapping out the WSGI server.

@webknjaz (Member) commented:

Hm... I think you use the same session for all the requests, meaning that they are all processed sequentially because they come one after another over the wire in the same TCP connection. HTTP/1.1 doesn't have built-in parallelism, unlike HTTP/2, which uses streams within the same connection.

Try checking whether it really uses just one connection (using tcpdump/Wireshark, I guess). Also try making stressor create a number of sessions and load-balance the queries across them. Plus, try stress-testing with some other tool like ab/wrk2/locust.
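
One cheap way to check connection reuse (a hedged sketch, assuming requests/urllib3 and a placeholder URL): urllib3 logs every newly opened connection at DEBUG level, so repeated "Starting new HTTP connection" lines would indicate that keep-alive is not in effect.

    # Sketch: count how many TCP connections requests/urllib3 actually opens.
    # The URL is a placeholder; the exact log wording depends on the urllib3 version.
    import logging
    import requests

    logging.basicConfig(level=logging.DEBUG)   # urllib3 logs new connections at DEBUG

    session = requests.Session()
    for _ in range(5):
        # With working keep-alive, only the first request should log a new connection.
        session.get("http://127.0.0.1:8080/some/static/file.txt")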

@mar10 (Contributor, Author) commented on Jul 25, 2020:

Stressor spawns one thread for every virtual user.
Each 'user' holds its own requests/urllib3 session, so every user should re-use its connection.

I modified the test. This time a sequence is only one GET request to a single static file:

Cheroot 8.0

Executed 10,163 activities in 10,163 sequences, using 1 parallel sessions. => 338.8 requests per sec/user
Executed 11,584 activities in 11,584 sequences, using 2 parallel sessions. => 193.1 requests per sec/user
Executed 14,545 activities in 14,545 sequences, using 10 parallel sessions. => 48.5 requests per sec/user
=> 0.003 .. 0.02 seconds per request

Cheroot 8.4

Executed 292 activities in 292 sequences, using 1 parallel sessions. => 9.7 requests per sec/user
Executed 582 activities in 582 sequences, using 2 parallel sessions. => 9.7 requests per sec/user
Executed 2,943 activities in 2,933 sequences, using 10 parallel sessions. => 9.8 requests per sec/user
=> 0.1 seconds per request

It may sound silly, but it looks like someone introduced a sleep(0.1) in v8.1+ ;-)

@webknjaz (Member) commented:

Weird. This clearly needs profiling. What are the thread count settings in your setup?

@webknjaz (Member) commented:

@mar10, mind recording a flame graph using https://github.com/benfred/py-spy?

@mar10 (Contributor, Author) commented on Jul 26, 2020:

numthreads is set to 50.

py-spy looks interesting, I will try it out if I find time.

However, I searched for "0.1" and found that connections.py contains this code:

        try:
            for fno in socket_dict:
                self._selector.register(fno, selectors.EVENT_READ)
            rlist = [
                key.fd for key, _event
                in self._selector.select(timeout=0.1)
            ]
        except OSError:
            ...

I patched the timeout value and ran the benchmarks again:

timeout=0.1 (original)
Executed 2,907 activities in 2,907 sequences, using 10 parallel sessions.

timeout=0.05
Executed 5,766 activities in 5,766 sequences, using 10 parallel sessions.

timeout=0.02
Executed 12,199 activities in 12,199 sequences, using 10 parallel sessions.

timeout=0.01
Executed 13,933 activities in 13,933 sequences, using 10 parallel sessions.

Smaller values than that don't seem to improve the performance further.

webknjaz changed the title from "Performance drop with v8.1?" to "[cheroot==8.1.0 regression] Significant performance drop" on Jul 26, 2020
webknjaz added a commit that referenced this issue on Jul 27, 2020:

This change addresses a significant performance regression that was introduced in v8.1.0 (see #305), short-term.

It makes the socket select timeout shorter to reduce the duration of each tick in the new connection-processing loop.

The proper fix is yet to be created per
#308 (comment)
@webknjaz (Member) commented:

Extracting what @liamstask posted in the PR (a simplified sketch of the effect follows the quote):

it looks to me that this could be related to the fact that, currently in server.tick(), only one request can be added per tick even if several are available to be processed in ConnectionManager.connections.

in this case, since ConnectionManager does not register connections previously marked as read-ready with the selector, it's possible that none of the remaining connections in the selector are active, meaning the server is forced to wait for the full timeout duration each time tick() is called.

fundamentally, the value of this timeout should not really be particularly sensitive, since the selector should wake up once connections are ready to be processed, regardless of the timeout.

(Originally posted by @liamstask in #308 (comment))
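
A simplified, hypothetical illustration of that failure mode (this is not cheroot's code): if the connection that actually has a request pending is not registered with the selector, and only idle keep-alive connections are, each tick blocks for the full select() timeout before the pending request can be picked up, which roughly matches the observed rate of about one request per 0.1 s per session.

    # Toy illustration of the described tick behaviour; not cheroot's actual code.
    import selectors
    import socket
    import time

    SELECT_TIMEOUT = 0.1  # the value mar10 patched above

    # An idle keep-alive connection is registered with the selector...
    idle_client, idle_server = socket.socketpair()
    selector = selectors.DefaultSelector()
    selector.register(idle_server, selectors.EVENT_READ)

    # ...while the connection that already has a request waiting is not
    # registered, so nothing wakes select() up early.
    start = time.monotonic()
    for _ in range(5):  # five server "ticks"
        selector.select(timeout=SELECT_TIMEOUT)
        # In the quoted scenario, the pending request would only be processed
        # here, i.e. after waiting out the full timeout on every tick.
    elapsed = time.monotonic() - start
    print(f"5 ticks took {elapsed:.2f}s (~{5 * SELECT_TIMEOUT:.1f}s, one request per ~{SELECT_TIMEOUT}s)")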

webknjaz linked a pull request on Jul 27, 2020 that will close this issue
@mar10 (Contributor, Author) commented on Aug 1, 2020:

8.4.2 fixed it for me, thanks

8.4.1:
Run time 0:00:30.109496, net: 4:59.91 min.
Executed 2,920 activities in 2,920 sequences, using 10 parallel sessions.

8.4.2:
Run time 0:00:30.099942, net: 4:57.40 min.
Executed 14,131 activities in 14,131 sequences, using 10 parallel sessions.

mar10 closed this as completed on Aug 1, 2020
@webknjaz (Member) commented:

@liamstask does #309 fix this?

@liamstask (Contributor) commented:

#309 does not, but #311 should improve this, though I could use some assistance with performance testing
