Skip to content

Commit

Permalink
pool: Allow concurrent replica updates in migration modules
Browse files Browse the repository at this point in the history
Motivation:

A migration job has a master lock for all operations that affect the tasks
queue in any way. This master lock is also acquired after a task has completed,
at which point the local replica state is updated according to the job
definition and the task is removed from the various internal maps and queues of
the job. This means that even with a high concurrency setting and several
worker threads, at most a single replica's state can be updated at a time by a
given job. On hardware in which syncing the meta db to disk is slow, this can
significantly limit migration job throughput.

Modification:

The method that updates the replica state doesn't actually touch any of the
internal state of a migration job. The reason it obtained the lock anyway is
that updating the replica state would generate repository change notifications
and these would be delivered to the job on a concurrent thread. The lock
prevented interleaving processing of those notifications with the state change.

The state change handler however only does one of two things: It either tries
to add or remove a replica from the job. If the job is currently modifying the
replica state, then the task is in the running map and in the Done state. In
this case neither add nor remove have any effect (the task is already
considered to be part of the job, yet removing it has no effect as cancelling a
task in the Done state is a noop). Thus it is my conclusion that it isn't really
necessary to obtain the lock while updating the replica state.

This patch moves the replica state update out of the locked region. This will
allow multiple tasks to update a replica concurrently.

Result:

Less lock contention in migration module, in particular when meta db updates are
slow.

Target: master
Require-notes: yes
Require-book: no
Acked-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>

Reviewed at https://rb.dcache.org/r/9911/
  • Loading branch information
gbehrmann committed Feb 15, 2017
1 parent 0ee24c6 commit a9c6538
Showing 1 changed file with 2 additions and 3 deletions.
Expand Up @@ -804,10 +804,10 @@ public void taskFailedPermanently(Task task, int rc, String msg)
@Override
public void taskCompleted(Task task)
{
PnfsId pnfsId = task.getPnfsId();
applySourceMode(pnfsId);
_lock.lock();
try {
PnfsId pnfsId = task.getPnfsId();
applySourceMode(pnfsId);
_running.remove(pnfsId);
_context.unlock(pnfsId);
_statistics.addCompleted(_sizes.remove(pnfsId));
Expand Down Expand Up @@ -895,7 +895,6 @@ private boolean isPinned(CacheEntry entry)
}

/** Apply source mode update to replica. */
@GuardedBy("_lock")
private void applySourceMode(PnfsId pnfsId)
{
try {
Expand Down

0 comments on commit a9c6538

Please sign in to comment.