pool: Fix pool size health check in case of asynchronous release of space

We have observed that operating systems sometimes don't report freed
space right away when deleting files. This was observed on Linux (using
ext4) and Mac OS X, but the behaviour can likely be observed on many
other systems.

The pool periodically checks the free space on the pool partition and
compares it with its own internal account of free space. If the partition
does not have enough free space, the pool size is reduced accordingly.

If the operating system does not report free space right after a file
has been deleted, it may happen that the pool's internal account object
has registered free space that isn't yet reported as free by the operating
system. In that case, the periodic health check may falsely reduce the
pool size.
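
The adjustment and the false reduction described above can be sketched as follows. This is a minimal illustration rather than the dCache implementation: the class PoolSizeCheck, the method adjustedTotal and all byte figures are hypothetical, and only the formula (total minus the shortfall in free space) is taken from the health check itself.

```java
public class PoolSizeCheck {
    /**
     * Returns the pool size after the health check. If the file system
     * reports less free space than the account expects, the total is
     * reduced by the shortfall; otherwise it is left unchanged.
     */
    public static long adjustedTotal(long accountTotal, long accountFree, long fsFree) {
        if (fsFree < accountFree) {
            return accountTotal - (accountFree - fsFree);
        }
        return accountTotal;
    }

    public static void main(String[] args) {
        long total = 1_000L;   // configured pool size (hypothetical figure)
        long used = 600L;      // space used before the delete

        // The pool deletes a 100 byte file; the account sees this at once.
        used -= 100L;
        long accountFree = total - used;   // 500

        // The operating system still reports the pre-delete value.
        long staleFsFree = 400L;
        System.out.println(adjustedTotal(total, accountFree, staleFsFree)); // prints 900

        // Once the operating system catches up, no reduction takes place.
        long currentFsFree = 500L;
        System.out.println(adjustedTotal(total, accountFree, currentFsFree)); // prints 1000
    }
}
```

In the first call the pool loses 100 bytes of capacity even though the partition is not actually short of space; that is the false reduction.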

The patch fixes this by introducing a 60-second grace period after
the pool deletes a file. During this period, the pool size health
check is suppressed.
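
A minimal sketch of the grace period test, assuming the semantics stated in this message: the check runs only when the last free is older than the grace period. The class GracePeriod and the method mayCheck are hypothetical names; the current time is passed in as a parameter so that the behaviour can be exercised without waiting.

```java
import java.util.concurrent.TimeUnit;

public class GracePeriod {
    /** Grace period after the last free, in milliseconds (60 seconds). */
    public static final long GRACE_PERIOD_ON_FREE = TimeUnit.SECONDS.toMillis(60);

    /**
     * Returns true when the pool size check may run, that is, when the
     * last free happened more than the grace period before 'now'.
     * Both arguments are milliseconds since the epoch.
     */
    public static boolean mayCheck(long timeOfLastFree, long now) {
        return timeOfLastFree < now - GRACE_PERIOD_ON_FREE;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();

        // A file was freed 10 seconds ago: still inside the grace period.
        System.out.println(mayCheck(now - 10_000L, now));   // prints false

        // The last free was 2 minutes ago: the check is allowed again.
        System.out.println(mayCheck(now - 120_000L, now));  // prints true
    }
}
```

Note the direction of the comparison: a recent free yields false, so the health check is skipped until the file system has had time to report the released space.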

Target: trunk
Request: 2.10
Request: 2.9
Request: 2.8
Request: 2.7
Request: 2.6
Require-notes: yes
Require-book: no
Acked-by: Paul Millar <paul.millar@desy.de>
Patch: https://rb.dcache.org/r/7366/
(cherry picked from commit b721145)

Conflicts:
	modules/dcache/src/main/java/org/dcache/pool/repository/v5/CheckHealthTask.java

(cherry picked from commit 97fa65a)

Conflicts:
	modules/dcache/src/main/java/org/dcache/pool/repository/v5/CheckHealthTask.java
gbehrmann committed Oct 20, 2014
1 parent fd72c38 commit 6595426
Showing 2 changed files with 44 additions and 19 deletions.
@@ -17,6 +17,7 @@ public class Account
     private long _precious;
     private long _removable;
     private long _requested;
+    private long _timeOfLastFree;

     public synchronized long getTotal()
     {
@@ -48,6 +49,11 @@ public synchronized long getRequested()
         return _requested;
     }

+    public synchronized long getTimeOfLastFree()
+    {
+        return _timeOfLastFree;
+    }
+
     public synchronized void setTotal(long total)
     {
         if (total < _used) {
@@ -71,6 +77,7 @@ public synchronized void free(long space)

         notifyAll();
         _used -= space;
+        _timeOfLastFree = System.currentTimeMillis();
     }

     /**

@@ -11,6 +11,7 @@
 class CheckHealthTask implements Runnable
 {
     private final static Logger _log = LoggerFactory.getLogger(CheckHealthTask.class);
+    public static final int GRACE_PERIOD_ON_FREE = 60_000;

     private final CacheRepositoryV5 _repository;

@@ -114,25 +115,42 @@ private void adjustFreeSpace()
          */
         Account account = _account;
         synchronized (account) {
-            long free = _metaDataStore.getFreeSpace();
-            long total = _metaDataStore.getTotalSpace();
-
-            if (total == 0) {
-                _log.debug("Java reported file system size as 0. Skipping file system size check.");
-                return;
-            }
-
-            if (total < account.getTotal()) {
-                _log.warn(String.format("The file system containing the data files appears to be smaller (%,d bytes) than the configured pool size (%,d bytes).", total, _account.getTotal()));
-            }
-
-            if (free < account.getFree()) {
-                long newSize =
-                    account.getTotal() - (account.getFree() - free);
-
-                _log.warn(String.format("The file system containing the data files appears to have less free space (%,d bytes) than expected (%,d bytes); reducing the pool size to %,d bytes to compensate. Notice that this does not leave any space for the meta data. If such data is stored on the same file system, then it is paramount that the pool size is reconfigured to leave enough space for the meta data.", free, _account.getFree(), newSize));
-
-                account.setTotal(newSize);
+            /* It is not uncommon that file system free space asynchronously from
+             * file deletion. Thus after we delete a file, it may take a while
+             * before the free space is reported as such by the operating system.
+             * To compensate, we suppress this check for a grace period after the
+             * last delete.
+             */
+            if (account.getTimeOfLastFree() > System.currentTimeMillis() - GRACE_PERIOD_ON_FREE) {
+                long free = _metaDataStore.getFreeSpace();
+                long total = _metaDataStore.getTotalSpace();
+
+                if (total == 0) {
+                    _log.debug("Java reported file system size as 0. Skipping file system size check.");
+                    return;
+                }
+
+                if (total < account.getTotal()) {
+                    _log.warn(String.format("The file system containing the data files appears to be smaller " +
+                                            "(%,d bytes) than the configured pool size (%,d bytes).",
+                                            total, _account.getTotal()));
+                }
+
+                if (free < account.getFree()) {
+                    long newSize =
+                        account.getTotal() - (account.getFree() - free);
+
+                    _log.warn(String.format("The file system containing the data files appears to have less free " +
+                                            "space (%,d bytes) than expected (%,d bytes); reducing the " +
+                                            "pool size to %,d bytes to compensate. Notice that this does " +
+                                            "not leave any space for the meta data. If such data is " +
+                                            "stored on the same file system, then it is paramount that " +
+                                            "the pool size is reconfigured to leave enough space for the " +
+                                            "meta data.",
+                                            free, _account.getFree(), newSize));
+
+                    account.setTotal(newSize);
+                }
             }
         }
     }
