
Fix race condition in threadpool shrink code. #204

Merged: 5 commits merged into cherrypy:master on Sep 26, 2019

Conversation

the-allanc (Contributor) commented May 22, 2019

❓ What kind of change does this PR introduce?

  • 🐞 bug fix
  • 🐣 feature
  • 📋 docs update
  • 📋 tests/coverage improvement
  • 📋 refactoring
  • 💥 other

❓ What is the current behavior? (You can also link to an open issue here)

This was previously reported as a CherryPy bug.

📋 Other information:

There was a previously submitted fix in the original ticket. I've transplanted the idea behind it into cheroot, and changed the counter to a deque object instead (as I suspect modifying a counter on an object is not an atomic operation).
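
As a rough standalone sketch of the race I suspect (illustrative only, not cheroot code): incrementing an integer attribute compiles to separate load/add/store steps, so concurrent increments can be lost, whereas deque.append() is a single operation that CPython documents as thread-safe:

import threading
from collections import deque

counter = 0
tally = deque()

def bump_int():
    global counter
    for _ in range(100_000):
        counter += 1        # load, add, store: a thread switch in between loses updates

def bump_deque():
    for _ in range(100_000):
        tally.append(None)  # one atomic append; the stored value is irrelevant

threads = [threading.Thread(target=f) for f in [bump_int] * 4 + [bump_deque] * 4]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)     # may be less than 400000: some increments can be lost
print(len(tally))  # always 400000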

📋 Checklist:

  • I think the code is well written
  • I wrote good commit messages
  • I have squashed related commits together after the changes have been approved
  • Unit tests for the changes exist
  • Integration tests for the changes exist (if applicable)
  • I used the same coding conventions as the rest of the project
  • The new code doesn't generate linter offenses
  • Documentation reflects the changes
  • The PR relates to only one subject with a clear title
    and description in grammatically correct, complete sentences


the-allanc (Contributor, Author) commented May 22, 2019

To test this, I've been using a basic CherryPy app (with cherrypy_dynpool installed):

import cherrypy
import cherrypy_dynpool
import time

class HelloWorld(object):
    @cherrypy.expose
    def index(self):
        time.sleep(0.5)
        return "Yo World!"


def main():
    cherrypy.engine.threadpool_monitor = cherrypy_dynpool.ThreadPoolMonitor(cherrypy.engine)
    cherrypy.engine.threadpool_monitor.subscribe()

    cherrypy.config.update({
        'server.thread_pool': 5,
        'server.thread_pool_max': 600,
        'server.thread_pool_minspare': 2,
        'server.thread_pool_maxspare': 40,
        'server.thread_pool_frequency': 4,
        'server.thread_pool_shrink_frequency': 10,
        'server.thread_pool_log_frequency': 3,
    })

    app = cherrypy.tree.mount(HelloWorld(), '/', config={})
    cherrypy.engine.start()
    cherrypy.engine.threadpool_monitor.configure(
        thread_pool=cherrypy.server.httpserver.requests,
        logger=cherrypy.engine.log
    )
    cherrypy.engine.block()


if __name__ == '__main__':
    main()

I then use ApacheBench to fire multiple requests at the server:

ab -n 6000 -c 300 http://localhost:8080/

On my machine, both versions of cheroot (before and after my changes) fail to fulfill all 6000 requests (they normally manage around 5800 requests). But in the code before my changes, the logs look like this at the end:

[22/May/2019:22:53:34] ENGINE Thread pool: [current=64/idle=64/queue=0]
[22/May/2019:22:53:41] ENGINE Thread pool: [current=64/idle=64/queue=0] Shrinking by 59
[22/May/2019:22:53:43] ENGINE Thread pool: [current=33/idle=33/queue=23]
[22/May/2019:22:55:10] ENGINE Thread pool: [current=33/idle=33/queue=112]

This is the state after ApacheBench finishes its run: cheroot reports its threads as idle, but they are no longer handling any requests. The last line is me attempting to re-run ApacheBench against the same server; no requests are processed at all.

Repeating that against cheroot with my changes in place:

[22/May/2019:22:59:31] ENGINE Thread pool: [current=41/idle=41/queue=0] Shrinking by 36
[22/May/2019:22:59:32] ENGINE Thread pool: [current=41/idle=5/queue=0]
[22/May/2019:22:59:41] ENGINE Thread pool: [current=41/idle=5/queue=0] Shrinking by 2
[22/May/2019:22:59:41] ENGINE Thread pool: [current=21/idle=5/queue=0]
[22/May/2019:22:59:51] ENGINE Thread pool: [current=21/idle=5/queue=0] Shrinking by 2
[22/May/2019:22:59:53] ENGINE Thread pool: [current=12/idle=5/queue=0]
[22/May/2019:23:00:01] ENGINE Thread pool: [current=12/idle=5/queue=0] Shrinking by 2
[22/May/2019:23:00:02] ENGINE Thread pool: [current=7/idle=5/queue=0]
[22/May/2019:23:00:11] ENGINE Thread pool: [current=7/idle=5/queue=0] Shrinking by 2
[22/May/2019:23:00:11] ENGINE Thread pool: [current=5/idle=5/queue=0]

The thread pool successfully shrinks down to the desired size, and you can re-run ApacheBench against the server too. Note that there's nothing stuck in the queue, either.

@@ -223,6 +234,7 @@ def shrink(self, amount):
        # to remove. As each request is processed by a worker, that worker
        # will terminate and be culled from the list.
        for n in range(n_to_remove):
            self._pending_shutdowns.append(None)

Member:

You probably wanted to use the constant here:

Suggested change:
-            self._pending_shutdowns.append(None)
+            self._pending_shutdowns.append(_SHUTDOWNREQUEST)

the-allanc (Contributor, Author):

No, the object we store in _pending_shutdowns is irrelevant. That constant is important in the requests queue.
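
To make the two roles concrete, a simplified sketch (abridged from the idea, not the exact cheroot code):

def shrink(self, n_to_remove):
    for _ in range(n_to_remove):
        # Tally one outstanding shutdown. Only len(self._pending_shutdowns)
        # is ever inspected, so the appended object doesn't matter.
        self._pending_shutdowns.append(None)
        # The sentinel on the requests queue is what a worker compares
        # against by identity, so it must be the shared constant.
        self._queue.put(_SHUTDOWNREQUEST)

def worker_run(self):
    while True:
        conn = self._queue.get()
        if conn is _SHUTDOWNREQUEST:
            return        # the worker exits and is later culled from the pool
        handle(conn)      # process a normal request (hypothetical helper)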

Member:

Maybe have another named constant for this? I don't like it being a magic value with no meaning.

Member:

@the-allanc WDYT?

the-allanc (Contributor, Author):

It's not a magic value; it's a meaningless value. Naming a constant for it and then not referring to that named constant anywhere else would be more confusing.

I needed something to act as a thread-safe counter, and using a deque seemed to be the most straightforward option.
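
In other words, the deque stands in for a counter like this (sketch; only its length carries meaning):

from collections import deque

pending = deque()

pending.append(None)        # "increment": one more shutdown outstanding
outstanding = len(pending)  # "read": how many are still pending
if pending:
    pending.popleft()       # "decrement": one shutdown has completed

Each of those is a single thread-safe deque operation, which is what a plain integer attribute can't give you.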

webknjaz (Member):

@the-allanc please sync this branch with master: I've fixed the linter errors there.

webknjaz (Member):

I've manually re-triggered the canceled stages in Travis.

codecov bot commented Jun 25, 2019

Codecov Report

Merging #204 into master will decrease coverage by 2.06%.
The diff coverage is 22.22%.

@@            Coverage Diff             @@
##           master     #204      +/-   ##
==========================================
- Coverage   74.91%   72.84%   -2.07%     
==========================================
  Files          23       23              
  Lines        3580     4346     +766     
==========================================
+ Hits         2682     3166     +484     
- Misses        898     1180     +282

the-allanc (Contributor, Author) commented Jun 25, 2019

> @the-allanc please sync this branch with master: I've fixed the linter errors there.

Done.

webknjaz (Member):

Apparently, this PR somehow breaks the tests under Python 2.7 on Windows.

webknjaz (Member):

@the-allanc In general, I have no objections, except that we need AppVeyor CI to be green first.

webknjaz requested a review from jaraco on June 25, 2019 at 18:23
webknjaz added the bug (Something is broken) and enhancement (Improvement) labels on Jun 25, 2019
jaraco (Member) commented Sep 2, 2019

This looks good to me. I don't know when I'll have time to give it proper attention, so I'm happy to move forward once the tests are stable/addressed.

jaraco (Member) left a review:

LGTM

        # Remove any dead threads from our list
        for t in [t for t in self._threads if not t.is_alive()]:
            self._threads.remove(t)
            try:

Member:

This is a perfect case for contextlib usage

Suggested change:
-            try:
+            with contextlib.suppress(IndexError):
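
For reference, contextlib.suppress (in the stdlib since Python 3.4) is exactly equivalent to try/except/pass; a minimal example:

import contextlib
from collections import deque

pending = deque()

# These two blocks behave identically; suppress() states the intent up front.
try:
    pending.popleft()
except IndexError:
    pass

with contextlib.suppress(IndexError):
    pending.popleft()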

the-allanc (Contributor, Author):

Fine with me.

webknjaz (Member):

I've fixed AppVeyor on master, so it's probably worth rebasing this PR.

jaraco merged commit a6f716a into cherrypy:master on Sep 26, 2019
Labels: bug (Something is broken), enhancement (Improvement)
Projects: none yet
Linked issues: none yet
3 participants