New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race condition in threadpool shrink code. #204
Fix race condition in threadpool shrink code. #204
Conversation
To test this, I've been using a basic CherryPy app (with cherrypy_dynpool installed): import cherrypy
import cherrypy_dynpool
import time
class HelloWorld(object):
@cherrypy.expose
def index(self):
time.sleep(0.5)
return "Yo World!"
def main():
cherrypy.engine.threadpool_monitor = cherrypy_dynpool.ThreadPoolMonitor(cherrypy.engine)
cherrypy.engine.threadpool_monitor.subscribe()
cherrypy.config.update({
'server.thread_pool': 5,
'server.thread_pool_max': 600,
'server.thread_pool_minspare': 2,
'server.thread_pool_maxspare': 40,
'server.thread_pool_frequency': 4,
'server.thread_pool_shrink_frequency': 10,
'server.thread_pool_log_frequency': 3,
})
app = cherrypy.tree.mount(HelloWorld(), '/', config={})
cherrypy.engine.start()
cherrypy.engine.threadpool_monitor.configure(
thread_pool=cherrypy.server.httpserver.requests,
logger=cherrypy.engine.log
)
cherrypy.engine.block()
if __name__ == '__main__':
main() I then use ApacheBench to fire multiple requests at the server: On my machine, both versions of cheroot (before and after my changes) fail to fulfill all 6000 requests (they normally manage around 5800 requests). But in the code before my changes, the logs look like this at the end: [22/May/2019:22:53:34] ENGINE Thread pool: [current=64/idle=64/queue=0]
[22/May/2019:22:53:41] ENGINE Thread pool: [current=64/idle=64/queue=0] Shrinking by 59
[22/May/2019:22:53:43] ENGINE Thread pool: [current=33/idle=33/queue=23]
[22/May/2019:22:55:10] ENGINE Thread pool: [current=33/idle=33/queue=112] This is the result of ApacheBench finishing running, and then cheroot being in a state where threads are apparently idle, but they are no longer handling any requests. The last line is me attempting to re-run ApacheBench against the same server, but no requests are being processed at all. Repeating that against cheroot with my changes in place: [22/May/2019:22:59:31] ENGINE Thread pool: [current=41/idle=41/queue=0] Shrinking by 36
[22/May/2019:22:59:32] ENGINE Thread pool: [current=41/idle=5/queue=0]
[22/May/2019:22:59:41] ENGINE Thread pool: [current=41/idle=5/queue=0] Shrinking by 2
[22/May/2019:22:59:41] ENGINE Thread pool: [current=21/idle=5/queue=0]
[22/May/2019:22:59:51] ENGINE Thread pool: [current=21/idle=5/queue=0] Shrinking by 2
[22/May/2019:22:59:53] ENGINE Thread pool: [current=12/idle=5/queue=0]
[22/May/2019:23:00:01] ENGINE Thread pool: [current=12/idle=5/queue=0] Shrinking by 2
[22/May/2019:23:00:02] ENGINE Thread pool: [current=7/idle=5/queue=0]
[22/May/2019:23:00:11] ENGINE Thread pool: [current=7/idle=5/queue=0] Shrinking by 2
[22/May/2019:23:00:11] ENGINE Thread pool: [current=5/idle=5/queue=0] The thread pool successfully shrinks down to the desired size, and you can re-run ApacheBench against the server too. Note that there's nothing stuck in the queue, either. |
@@ -223,6 +234,7 @@ def shrink(self, amount): | |||
# to remove. As each request is processed by a worker, that worker | |||
# will terminate and be culled from the list. | |||
for n in range(n_to_remove): | |||
self._pending_shutdowns.append(None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably wanted to use the constant here:
self._pending_shutdowns.append(None) | |
self._pending_shutdowns.append(_SHUTDOWNREQUEST) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the object we store in _pending_shutdowns
is irrelevant. That constant is important in the requests queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe have another named constant for this? I don't like it being a magic value with no meaning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@the-allanc WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a magic value, it's a meaningless value. Naming a constant for it and then not referring to that named constant anywhere else would be more confusing.
I needed something to act as a thread-safe counter, and using a deque seemed to be the most straight-forward option.
@the-allanc plz sync this branch with master: i've fixed linter errors there |
I've manually triggered canceled stages in travis |
Codecov Report
@@ Coverage Diff @@
## master #204 +/- ##
==========================================
- Coverage 74.91% 72.84% -2.07%
==========================================
Files 23 23
Lines 3580 4346 +766
==========================================
+ Hits 2682 3166 +484
- Misses 898 1180 +282 |
Done. |
Apparently, this PR somehow breaks tests under Python 2.7 under Windows. |
@the-allanc In general, I have no objections except we need AppVeyor CI to be green first. |
This looks good to me. I don't know when I'll have time to give it proper attention, so happy to move forward once the tests are stable/addressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Co-Authored-By: Sviatoslav Sydorenko <wk.cvs.github@sydorenko.org.ua>
# Remove any dead threads from our list | ||
for t in [t for t in self._threads if not t.is_alive()]: | ||
self._threads.remove(t) | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a perfect case for contextlib usage
try: | |
with contextlib.suppress(IndexError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine with me.
I've fixed AppVeyor on master so it's probably worth rebasing this PR. |
β What kind of change does this PR introduce?
β What is the current behavior? (You can also link to an open issue here)
This was previously reported as a CherryPy bug.
π Other information:
There was a previously submitted fix in the original ticket. I've transplanted the idea behind it into cheroot, and changed the counter to a deque object instead (as I suspect modifying a counter on an object is not an atomic operation).
π Checklist:
and description in grammatically correct, complete sentences
This change isβ