replica manager (old): fix countable logic when rescanning pool repository

Motivation:

Users of the old replica manager have reported inconsistent behavior regarding source files
which, depending on the dCache configuration, can cause thrashing or unbounded copy-retries.

The problem has its origin in the fact that the replica manager was designed to handle only
precious files, but since then there has been a change such that the RetentionPolicy
and AccessLatency tags are mapped to the attributes precious, cached and sticky.  The
manager was apparently not adjusted to take this change into account.  Inconsistency
can thus arise if the resilient pools are linked to storage units which are in turn linked to
directories where the RetentionPolicy is not CUSTODIAL [=> precious], because the
original/source copy will land on the pool as "cached+sticky", regardless of whether the
value of the pool.lfs property is 'precious' (i.e., no-flush) for that pool.

The initial replication proceeds as normal in this case, because the entry method simply
takes all files from a resilient pool and marks them as 'countable', the prerequisite for
being included in the hard replica count.  However, if for any reason (such as a pool state change,
reboot of the replica manager, etc.) a rescan of the pool occurs, and the pool contains such
originals, the replicas database entry for each such file is now marked countable=false,
because the file is cached, not precious (the rescan uses a different method from the
initial one).  Even though the required number of copies may still exist on readable pools,
the replica manager nevertheless thinks one is missing and attempts a p2p copy.  This may
lead, as in the reported ticket, to repeated failures.  In any case, it leads to more work than is
needed: a cached+sticky (i.e., system sticky) file is in no danger of being removed and thus
should count as a hard replica.

Modification:

The method used to process pool repository entries into the replicas table now considers
both precious and cached+sticky copies as 'countable'.
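
The change in the rule can be sketched in isolation as follows.  This is only an
illustration, not the replica manager code: EntryFlags and CountableSketch are
hypothetical stand-ins, and only the flag names mirror the accessors used on
CacheRepositoryEntryInfo in the diff.

```java
// Hypothetical stand-in for a pool repository entry; not a dCache class.
final class EntryFlags {
    boolean precious, cached, sticky;
    boolean receivingFromClient, receivingFromStore, bad, removed, destroyed;

    EntryFlags(boolean precious, boolean cached, boolean sticky) {
        this.precious = precious;
        this.cached = cached;
        this.sticky = sticky;
    }

    // True if the copy is still being written or is otherwise unusable.
    boolean inTransferOrInvalid() {
        return receivingFromClient || receivingFromStore
                || bad || removed || destroyed;
    }
}

// Hypothetical helper comparing the rescan rule before and after the fix.
final class CountableSketch {
    // Before: only precious copies were counted as hard replicas.
    static boolean countableBefore(EntryFlags e) {
        return e.precious && !e.inTransferOrInvalid();
    }

    // After: precious or cached+sticky copies are counted.
    static boolean countableAfter(EntryFlags e) {
        boolean notRemovable = e.precious || (e.cached && e.sticky);
        return notRemovable && !e.inTransferOrInvalid();
    }
}
```

Under this sketch a cached+sticky original is rejected by countableBefore but
accepted by countableAfter, while a cached-only (soft) copy is rejected by both.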

Result:

Inconsistent counting of hard replicas is eliminated via a minimal code intervention.

Note 1:  Soft replicas (cached but not sticky) are still countable='f', since they are subject to removal by the sweeper.
Note 2:  The alternative solution to this issue is simply to inform our users that RetentionPolicy should always be CUSTODIAL
         for the directories linked to resilient pools.   This, however, would still require the site to (a) propagate
         the tag change throughout the directory tree where needed; (b) change the tags for the files in these directories
         in the namespace; (c) modify those files to countable='t' in the replicas table.   This could be a complicated and
         lengthy process for a big installation, especially one like BNL with 6 separate replication managers.

Target: master
RT: 8871  (replication problem: bitmask=258 and countable=false for a file copy in Replication Manager DB)
Request: 2.14
Request: 2.13
Request: 2.12
Request: 2.11
Request: 2.10
Require-notes: yes
Require-book: yes
Acked-by: Dmitry

RELEASE NOTES:  Fixes a bug where source files which are written to directories with REPLICA ONLINE tags are
                at first marked countable but then on rescan of the pool are marked not countable.  The fix
                makes 'cached' + (system-owned) 'sticky' files the equivalent of precious files on pools
                with pool.lfs='precious'.
alrossi committed Jan 20, 2016
1 parent e44543c commit 4106cb3
Showing 1 changed file with 4 additions and 6 deletions.
@@ -13,14 +13,13 @@
 import java.sql.SQLException;
 import java.sql.Statement;
 import java.text.MessageFormat;
+import java.util.Collections;
 import java.util.Iterator;
 import java.util.List;

 import diskCacheV111.repository.CacheRepositoryEntryInfo;
 import diskCacheV111.util.PnfsId;

 import dmg.cells.nucleus.CellAdapter;
-import java.util.Collections;

 import static org.dcache.commons.util.SqlHelper.tryToClose;

@@ -143,16 +142,15 @@ public synchronized void addPnfsToPool(List<CacheRepositoryEntryInfo> fileList,
                 // table
                 String pnfsId = info.getPnfsId().toString();
                 int bitmask = info.getBitMask();
+                boolean notRemovable = info.isPrecious() ||
+                        (info.isCached() && info.isSticky());
                 boolean countable =
-                        info.isPrecious() &&
-                        // info.isCached() &&
+                        notRemovable &&
                         !info.isReceivingFromClient() &&
                         !info.isReceivingFromStore() &&
-                        // info.isSendingToStore() &&
                         !info.isBad() &&
                         !info.isRemoved() &&
                         !info.isDestroyed();
-                        // info.isSticky();

                 try {
                     pstmt.setString(1, pnfsId);
