resilience: handle storage unit NoSuchElement failure
Motivation:

During a pool scan, the storage unit for each file must be
recovered from its Chimera attributes.  The code looks up, in an
internal map, the index of the storage unit as it is known from
the PoolMonitor.

However, if the pool selection unit configuration changes such that
a storage unit is eliminated, that mapping will also be deleted
from resilience.  In the case that the attributes stored in
Chimera still have the older storage class information, there
will be a NoSuchElementException thrown.

The code, however, does not handle this case.  The exception
seems never to be caught, not even as a RuntimeException, and
this prevents the scan from completing.

Modification:

Catch the NoSuchElementException and return false so that
the namespace scan can skip the affected file and move on.
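
The failure mode and the fix can be illustrated with a minimal,
self-contained sketch.  It is not the dCache code: the class
StorageUnitScanSketch, the nested UnitIndexMap, the scan method and
the unit names are hypothetical stand-ins for poolInfoMap and the
namespace scan, and only the catch-and-return-false pattern is
taken from this commit.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for the scan path; none of these names are dCache APIs. */
public class StorageUnitScanSketch {

    /** Maps storage unit names to indices, like the internal map described above. */
    static final class UnitIndexMap {
        private final Map<String, Integer> indices = new HashMap<>();

        void register(String unit, int index) {
            indices.put(unit, index);
        }

        void remove(String unit) {
            indices.remove(unit);
        }

        /** Throws NoSuchElementException when the unit is no longer configured. */
        int getStorageUnitIndex(String unit) {
            Integer index = indices.get(unit);
            if (index == null) {
                throw new NoSuchElementException("no storage unit " + unit);
            }
            return index;
        }
    }

    /** Returns false instead of propagating, so the caller can skip the file. */
    static boolean validateForAction(UnitIndexMap map, String unitFromAttributes) {
        try {
            map.getStorageUnitIndex(unitFromAttributes);
        } catch (NoSuchElementException e) {
            System.err.println("cannot handle " + unitFromAttributes
                    + ": " + e.getMessage());
            return false;
        }
        return true;
    }

    /** Counts the files that pass validation; files with stale units are skipped. */
    static int scan(UnitIndexMap map, List<String> unitsOfFiles) {
        int accepted = 0;
        for (String unit : unitsOfFiles) {
            if (validateForAction(map, unit)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        UnitIndexMap map = new UnitIndexMap();
        map.register("test:resilient@osm", 0);
        map.register("test:old@osm", 1);
        map.remove("test:old@osm"); // unit eliminated from the configuration

        // The second file still carries the removed unit in its attributes.
        System.out.println(scan(map,
                Arrays.asList("test:resilient@osm", "test:old@osm")));
    }
}
```

In this sketch the scan visits both files and accepts only the
first; without the catch block the NoSuchElementException would
propagate out of the loop and end the scan early.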

Result:

Pool scans that encounter this situation do not get stuck
forever in the queue.

Target: master
Request: 3.2
Request: 3.1
Request: 3.0
Request: 2.16
Require-notes: yes
Require-book: no
Acked-by: Tigran
alrossi committed Sep 7, 2017
1 parent 91ebad3 commit 5a7bdfe
Showing 1 changed file with 9 additions and 1 deletion.
@@ -66,6 +66,7 @@ LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
 import java.sql.Connection;
 import java.util.ArrayList;
 import java.util.Collection;
+import java.util.NoSuchElementException;

 import diskCacheV111.util.AccessLatency;
 import diskCacheV111.util.CacheException;
@@ -76,6 +77,7 @@ LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
 import org.dcache.resilience.db.ScanSummary;
 import org.dcache.resilience.handlers.FileOperationHandler;
 import org.dcache.resilience.handlers.ResilienceMessageHandler;
+import org.dcache.resilience.util.ExceptionMessage;
 import org.dcache.resilience.util.LocationSelector;
 import org.dcache.resilience.util.PoolSelectionUnitDecorator.SelectionAction;
 import org.dcache.vehicles.FileAttributes;
@@ -294,7 +296,13 @@ public boolean validateForAction(Integer storageUnit,
  * Storage unit is not recorded in checkpoint, so it should
  * be set here.
  */
-unitIndex = poolInfoMap.getStorageUnitIndex(attributes);
+try {
+    unitIndex = poolInfoMap.getStorageUnitIndex(attributes);
+} catch (NoSuchElementException e) {
+    LOGGER.error("validateForAction, cannot handle {}: {}.",
+                 pnfsId, new ExceptionMessage(e));
+    return false;
+}

 LOGGER.trace("validateForAction {} got unit from attributes {}.",
              pnfsId, unitIndex);