dcache-resilience: repair over-aggressive handling of broken file messages

Motivation:

While https://rb.dcache.org/r/10734 (commit a352dfc)
made some important repairs to the handling of broken files, it
missed a bug in the code path: the handler neglects to check
that the file is actually ONLINE and that the pool is resilient.

As a result, nearly twice as many alarms are reported (since
the message received was probably already associated with a
BROKEN_FILE alarm).

Modification:

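Do the necessary checks: handleBrokenFileLocation now returns early
when no attributes are returned for the file (it is not ONLINE) or
when the pool is not in a resilient group.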
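
The two checks can be sketched in isolation as follows. This is a minimal, self-contained illustration and not the dCache code: `BrokenFileGuardSketch`, `Outcome`, and `check` are hypothetical names, a null location list stands in for `FileUpdate.getAttributes` returning null, and a plain set of pool names stands in for `poolInfoMap.isResilientPool`.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified stand-in for the handler's guard clauses. */
final class BrokenFileGuardSketch {

    /** What the sketch decides to do with a broken-file message. */
    enum Outcome { SKIPPED_NOT_ONLINE, SKIPPED_NOT_RESILIENT, HANDLED }

    /**
     * @param locations      pools holding the file, or null when the
     *                       attribute lookup yields nothing (assumed here
     *                       to mean the file is not ONLINE)
     * @param pool           pool that reported the broken file
     * @param resilientPools assumed set of pools in resilient groups
     */
    static Outcome check(List<String> locations,
                         String pool,
                         Set<String> resilientPools) {
        // First guard: nothing to do if the file is not ONLINE.
        if (locations == null) {
            return Outcome.SKIPPED_NOT_ONLINE;
        }

        // Second guard: ignore pools outside any resilient group.
        if (!resilientPools.contains(pool)) {
            return Outcome.SKIPPED_NOT_RESILIENT;
        }

        // Only now would the location counts be computed and the
        // broken copy processed.
        return Outcome.HANDLED;
    }
}
```

Returning early from each guard keeps the location counting out of reach for files that resilience does not manage, which is the property the commit is after.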

Result:

Resilience no longer reports on corrupted files which are not
resilient.

Target: master
Request: 4.1
Request: 4.0
Request: 3.2
Request: 3.1
Request: 3.0
Request: 2.16
Acked-by: Dmitry
Require-notes: yes
Require-book: no
alrossi committed Apr 9, 2018
1 parent 39558c9 commit 1656cb6
Showing 1 changed file with 10 additions and 5 deletions.
@@ -167,14 +167,19 @@ public void handleBrokenFileLocation(PnfsId pnfsId, String pool) {
                     = FileUpdate.getAttributes(pnfsId, pool,
                                                MessageType.CORRUPT_FILE,
                                                namespace);
-        int actual = 0;
-        int countable = 0;
+        if (attributes == null) {
+            LOGGER.trace("{} not ONLINE.", pnfsId);
+            return;
+        }

-        if (attributes != null) {
-            actual = attributes.getLocations().size();
-            countable = poolInfoMap.getCountableLocations(attributes.getLocations());
+        if (!poolInfoMap.isResilientPool(pool)) {
+            LOGGER.trace("{} not in resilient group.", pool);
+            return;
         }

+        int actual = attributes.getLocations().size();
+        int countable = poolInfoMap.getCountableLocations(attributes.getLocations());
+
         if (actual <= 1) {
             /*
              * This is the only copy, or it is not/no longer in the