
Commit 22de618

Author: epriestley
Committed: Jun 26, 2020
When acquiring a GlobalLock, put good connections that just got unlucky back in the pool
Summary: See PHI1794, which describes a connection exhaustion issue with a large number of webhook tasks in the queue.

The "GlobalLock" mechanism manages a separate connection pool from the main pool, and webhook workers immediately try to grab a webhook lock with a 0-second wait when they start. So far, this is fine.

Prior to this change, good connections which failed to acquire a lock were discarded. This can lead to connection exhaustion as the worker rapidly cycles through lock attempts: the connections remain open for at least 60 seconds (since D16389) in an effort to avoid outbound port exhaustion, but they're effectively orphaned because they aren't part of the main pool and aren't part of the lock pool. We're basically leaking a connection every time we fail to lock.

Failing to lock doesn't mean we need to discard the connection: it's a completely suitable connection for reuse. Instead of dropping it on the floor, put it into the lock pool.

Test Plan:

  - Used "bin/webhook call ... --count 10000 --background" to queue a large number of webhook calls against a slow ("sleep(15);") webhook.
  - Used "bin/phd launch 32 taskmaster" to start taskmasters.
  - Observed MySQL connection behavior:
    - Before change: 2048 configured connections immediately exhausted.
    - After change: connections stable at ~160ish.
  - Ran the queue for a while and saw the expected single-threaded calls to the webhook.

Differential Revision: https://secure.phabricator.com/D21369
Parent: d91abf5

1 file changed (+12, -0 lines)

src/infrastructure/util/PhabricatorGlobalLock.php (+12)
```diff
@@ -144,6 +144,18 @@ protected function doLock($wait) {
 
     $ok = head($result);
     if (!$ok) {
+
+      // See PHI1794. We failed to acquire the lock, but the connection itself
+      // is still good. We're done with it, so add it to the pool, just as we
+      // would if we were releasing the lock.
+
+      // If we don't do this, we may establish a huge number of connections
+      // very rapidly if many workers try to acquire a lock at once. For
+      // example, this can happen if there are a large number of webhook tasks
+      // in the queue.
+
+      self::$pool[] = $conn;
+
       throw id(new PhutilLockException($lock_name))
         ->setHint($this->newHint($lock_name, $wait));
     }
```
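The pooling pattern in this change can be illustrated in isolation. The snippet below is a minimal, self-contained sketch, not Phabricator's implementation: `FakeConnection` and `LockPool` are hypothetical stand-ins, and `tryLock()` simulates MySQL `GET_LOCK(name, 0)` with an in-memory array rather than a real server. It shows how returning a connection to the pool after a failed lock attempt keeps the number of opened connections bounded while one worker holds the lock and others keep retrying.

```php
<?php
// Hypothetical stand-in for a database connection; no real MySQL is involved.
final class FakeConnection {
  /** @var array<string, FakeConnection> lock name => holder (simulated server-side lock state) */
  public static $heldLocks = [];
  /** @var int total number of connections ever opened */
  public static $opened = 0;

  public function __construct() {
    self::$opened++;
  }

  // Simulates GET_LOCK(name, 0): succeeds only if nobody holds the lock.
  public function tryLock(string $name): bool {
    if (isset(self::$heldLocks[$name])) {
      return false;
    }
    self::$heldLocks[$name] = $this;
    return true;
  }

  public function unlock(string $name): void {
    unset(self::$heldLocks[$name]);
  }
}

// Hypothetical lock pool, separate from any "main" connection pool.
final class LockPool {
  /** @var FakeConnection[] idle connections available for reuse */
  private static $pool = [];

  private static function getConnection(): FakeConnection {
    // Reuse an idle connection if one exists; otherwise open a new one.
    return array_pop(self::$pool) ?? new FakeConnection();
  }

  public static function lock(string $name): ?FakeConnection {
    $conn = self::getConnection();
    if (!$conn->tryLock($name)) {
      // The connection is still healthy, so return it to the pool
      // instead of discarding it (the idea behind this commit).
      self::$pool[] = $conn;
      return null;
    }
    return $conn;
  }

  public static function unlock(FakeConnection $conn, string $name): void {
    $conn->unlock($name);
    self::$pool[] = $conn;
  }
}

// One worker holds the lock; 1,000 further attempts all fail.
$holder = LockPool::lock('webhook');
for ($i = 0; $i < 1000; $i++) {
  LockPool::lock('webhook');
}
echo FakeConnection::$opened, "\n"; // prints 2
```

Without the `self::$pool[] = $conn;` line in the failure branch, each of the 1,000 failed attempts would open a fresh connection, so `$opened` would reach 1,001 instead of staying at 2. In the real system those discarded connections also linger for at least 60 seconds, which is how the leak exhausts the server's connection limit.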
