Skip to content

Commit

Permalink
dcache-resilience: ignore broken cached files
Browse files Browse the repository at this point in the history
Motivation:

master@c367e9fad850e4ff83560cf11edf38fc7ded313f
https://rb.dcache.org/r/11643/

changed the way resilience handles file "removal"
(no longer setting the repository entry to 'removed'
but simply by caching the replica).

This change, however, did not take into account
the handling of broken files.  Since its inception,
resilience has always tried to handle all but the
last sticky replica that was broken by removal
and recopy.

However, when removal was changed to simple
removal of the sticky bit, the logic governing
broken files was no longer valid.  Encountering
a broken file, it would indiscriminately attempt
to remove it, whether it was cached or not;
this now leads to an infinite loop, with the
file operation continuously iterating without
doing any further work.

This situation can potentially hang the pool scans
(if there are as many broken files as there are
scan threads), and even the file operation
queue.

Modification:

The current patch repairs this situation,
maintaining the intended semantics:
upon discovery of a broken sticky replica,
it is cached, and if another replica is
required, it is made.

NOTE:  it seems to be the consensus of
opinion that resilience should no longer
try to handle broken replicas at all.

This will be addressed by a subsequent
patch.

Result:

No longer any potential for stalled
operations when encountering broken replicas.

Target: master
Request: 6.2
Request: 6.1
Request: 6.0
Request: 5.2
Requires-notes: yes
Requires-book: no
Patch: https://rb.dcache.org/r/12513/
Acked-by: Tigran
  • Loading branch information
alrossi authored and mksahakyan committed Aug 17, 2020
1 parent f0fa8ac commit 53f0401
Show file tree
Hide file tree
Showing 2 changed files with 16 additions and 7 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -404,8 +404,12 @@ public boolean validateForAction(Integer storageUnit,
locations, valid, count);

if (count == 0) {
LOGGER.debug("{}, requirements are already met.", pnfsId);
return false;
Set<String> broken = verifier.getBroken(verified);
if (broken.isEmpty()) {
LOGGER.debug("{}, requirements are already met.", pnfsId);
return false;
}
count = broken.size();
}

/*
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -262,7 +262,7 @@ public void handleBrokenFileLocation(PnfsId pnfsId, String pool) {

locations.remove(pool);

if (poolInfoMap.getCountableLocations(locations) > 1) {
if (poolInfoMap.getCountableLocations(locations) > 0) {
FileUpdate update = new FileUpdate(pnfsId, pool,
MessageType.CLEAR_CACHE_LOCATION,
false);
Expand Down Expand Up @@ -889,12 +889,17 @@ public Type handleVerification(FileAttributes attributes) {
* First, if there are broken replicas, remove the first one
* and iterate. As with the broken file handler routine, do
* not remove the last sticky replica, whatever it is.
*
* Since remove actually means removal of the sticky bit,
* we ignore broken cached files (their repository state would not be
* changed and we would get caught in an infinite loop of always trying
* to remove this replica before making a new one).
*/
Set<String> broken = verifier.getBroken(responsesFromPools);
if (!broken.isEmpty()) {
String target = broken.iterator().next();
if (!verifier.isSticky(target, responsesFromPools)
|| verifier.getSticky(responsesFromPools).size() > 1) {
if (verifier.isSticky(target, responsesFromPools)
&& verifier.getSticky(responsesFromPools).size() > 1) {
fileOpMap.updateOperation(pnfsId, null, target);
operation.incrementCount();
return Type.REMOVE;
Expand Down Expand Up @@ -981,10 +986,10 @@ public Type handleVerification(FileAttributes attributes) {
pnfsId, sticky);

/*
* Find the non-sticky locations.
* Find the non-sticky locations (exclude broken files).
* Partition the sticky locations between usable and excluded.
*/
Set<String> nonSticky = Sets.difference(exist, sticky);
Set<String> nonSticky = Sets.difference(Sets.difference(exist, sticky), broken);
Set<String> excluded
= verifier.areSticky(poolInfoMap.getExcludedLocationNames(members),
responsesFromPools);
Expand Down

0 comments on commit 53f0401

Please sign in to comment.