LockCurrentlyNotAvailable with shouldSkipBlockingWait #44

fejk0 · 2019-11-28T08:27:34Z

Greetings gentlemen,

We are facing a problem acquiring a lock with 'shouldSkipBlockingWait' set to true.
A reproducible test is below.
To reproduce this manually, kill a process while it owns the lock, start another one, and try to acquire the lock with the same options as in the JUnit test.
Is this a bug or just incorrect usage?

Version being used is 1.1.0
Thanks

  @Test
  void notExpectedBehaviour() throws Exception{

    final long leaseDuration = 1_000;
    final long heartBeatPeriod = 200;
    final String partition = "super_key_LOCK";

    AcquireLockOptions lockOptions = AcquireLockOptions
        .builder(partition)
        .withAcquireReleasedLocksConsistently(true)
        .withShouldSkipBlockingWait(true)
        .build();

    AmazonDynamoDBLockClient lockClientOne = new AmazonDynamoDBLockClient(
        AmazonDynamoDBLockClientOptions
            .builder(amazonDynamoDB, "table")
            .withPartitionKeyName(Constants.HASH_KEY)
            .withTimeUnit(TimeUnit.MILLISECONDS)
            .withLeaseDuration(leaseDuration)
            .withHeartbeatPeriod(heartBeatPeriod)
            .withCreateHeartbeatBackgroundThread(true)
            .build());

    LockItem lockItem = lockClientOne.acquireLock(lockOptions);
    Assertions.assertNotNull(lockItem, "lock acquired");

    // create new lock client which should not be able to acquire lock
    AmazonDynamoDBLockClient lockClientTwo = new AmazonDynamoDBLockClient(
        AmazonDynamoDBLockClientOptions
            .builder(amazonDynamoDB, "table")
            .withPartitionKeyName(Constants.HASH_KEY)
            .withTimeUnit(TimeUnit.MILLISECONDS)
            .withLeaseDuration(leaseDuration)
            .withHeartbeatPeriod(heartBeatPeriod)
            .withCreateHeartbeatBackgroundThread(true)
            .build());

    Thread.sleep(leaseDuration);
    boolean wasThrown = false;
    try {
      lockClientTwo.acquireLock(lockOptions);
    } catch (LockCurrentlyUnavailableException e) {
      wasThrown = true;
    }
    Assertions.assertTrue(wasThrown, "exception - expected behavior");

    // force shutdown
    Field shuttingDown = lockClientOne.getClass().getDeclaredField("shuttingDown");
    shuttingDown.setAccessible(true);
    shuttingDown.set(lockClientOne, true);

    // wait so item gets old
    Thread.sleep(leaseDuration * 3L);

    // Expected: the lock can be acquired because nobody is sending heartbeats.
    // Actual: this throws LockCurrentlyUnavailableException.
    lockItem = lockClientTwo.acquireLock(lockOptions);

  }

schen42 · 2020-04-22T18:42:49Z

It looks like this problem occurs because retrieving the existing lock updates the timestamp that isExpired() uses to evaluate expiry. The test waits leaseDuration * 3L, which on its own would correctly cause isExpired() to return true, since T - (T - leaseDuration * 3) > leaseDuration. But when the second acquireLock is called, that timestamp is updated to T, and T - T is not greater than leaseDuration [ref], so isExpired() returns false. In the normal case this is not an issue, because the lock will eventually be released. However, in the corner case where a lock expires due to, say, a dying process, it appears that the lock can never be retrieved again so long as shouldSkipBlockingWait is set to true.

Long story short, this seems like a major bug in the library. There appears to be no clear workaround on the client side either.
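
To make the timestamp reset concrete, here is a minimal model of the behaviour described above. It is a sketch, not the library's code: `ObservedLock`, `lookupTimeMillis`, and `tryAcquireWithoutWaiting` are hypothetical names, and time is passed in explicitly instead of being read from a clock.

```java
// Simplified model of the reported behaviour; NOT the library's actual implementation.
final class LookupTimeResetDemo {

    /** Hypothetical stand-in for a lock record as observed by one client. */
    static final class ObservedLock {
        final long leaseDurationMillis;
        long lookupTimeMillis; // when this client last read the record

        ObservedLock(long leaseDurationMillis, long lookupTimeMillis) {
            this.leaseDurationMillis = leaseDurationMillis;
            this.lookupTimeMillis = lookupTimeMillis;
        }

        /** Expired only if more than one lease duration has passed since the last read. */
        boolean isExpired(long nowMillis) {
            return nowMillis - lookupTimeMillis > leaseDurationMillis;
        }
    }

    /** One non-blocking attempt: re-read the record, then check expiry once. */
    static boolean tryAcquireWithoutWaiting(ObservedLock lock, long nowMillis) {
        lock.lookupTimeMillis = nowMillis; // the fresh read stamps the record with "now"
        return lock.isExpired(nowMillis);  // now - now is 0, so this is never true
    }

    public static void main(String[] args) {
        long lease = 1_000;
        ObservedLock lock = new ObservedLock(lease, 0);
        long now = 3 * lease; // the owner died; three lease durations have passed

        System.out.println("expired before re-read: " + lock.isExpired(now));
        System.out.println("acquired without waiting: " + tryAcquireWithoutWaiting(lock, now));
    }
}
```

In this model, no amount of elapsed time helps a caller that never blocks, because every attempt starts its own measurement from zero.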

murbot · 2021-01-11T22:13:16Z

I've encountered this issue as well. It seems that the "shouldSkipBlockingWait" support conflicts with how lease expiration works, which requires a process to wait for the lock for the lease duration before the lock is considered expired.

It seems we simply can't use "shouldSkipBlockingWait" until this is resolved, unfortunately.

thbr03 · 2021-02-04T13:01:05Z

@schen42 I would not call this a corner case; rather, it is a case that I assumed was supported.

> However, in the corner case where a lock expires due to, say, dying process, then it appears that the lock will never be able to be retrieved again so long as shouldSkipBlockingWait is set to true.

I've encountered this issue as well, on version 1.2.0.

This commit addresses issue awslabs#44 by providing a new option that uses wall clock time, along with a provided error bound, to determine whether a lock is expired. Previously, it was impossible to determine whether a lease was expired without blocking for at least the lease duration and checking whether the version had changed. Thus, if you never blocked and the lease was not explicitly marked as released, the lock could never be acquired again. By providing a correct upper bound on clock skew error, clients can correctly take over locks that have expired but were not explicitly released, without blocking. In most distributed systems, relying on wall clock time is generally not correct, but in this case we can provide an upper clock skew error bound on the scale of minutes without negative consequences for most clients. The tradeoff of a lease being unacquirable for several minutes is possibly better than being forced to block in many use cases.
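
The wall-clock check described in this commit message could look roughly like the sketch below. It assumes the lock record carries the owner's last heartbeat time in wall-clock milliseconds; `isExpiredByWallClock` and its parameter names are hypothetical and are not taken from the library.

```java
// Sketch of a wall-clock expiry check with a clock skew allowance.
// All names are hypothetical; this is not the library's API.
final class WallClockExpiryDemo {

    /**
     * A lock counts as expired only once the lease duration plus the assumed
     * worst-case clock skew has elapsed since the owner's last heartbeat.
     */
    static boolean isExpiredByWallClock(long lastHeartbeatMillis,
                                        long leaseDurationMillis,
                                        long clockSkewUpperBoundMillis,
                                        long nowMillis) {
        return nowMillis - lastHeartbeatMillis > leaseDurationMillis + clockSkewUpperBoundMillis;
    }

    public static void main(String[] args) {
        long lease = 1_000;        // 1 second lease
        long skewBound = 120_000;  // assume clocks differ by at most 2 minutes
        long lastHeartbeat = 0;    // owner's final heartbeat before it died

        // Shortly after the lease ends, the lock is still treated as held.
        System.out.println(isExpiredByWallClock(lastHeartbeat, lease, skewBound, 3_000));
        // Once lease + skew bound has passed, a single non-blocking check succeeds.
        System.out.println(isExpiredByWallClock(lastHeartbeat, lease, skewBound, 121_001));
    }
}
```

A larger skew bound makes a wrongful takeover less likely but lengthens the window in which a dead owner's lock stays unavailable, which is the tradeoff the commit message describes.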

shetsa-amzn · 2024-10-10T17:04:04Z

We have merged the latest commit from @moshegood, and this change has been released in version 1.3.

chris-ryan-square added a commit to chris-ryan-square/amazon-dynamodb-lock-client that referenced this issue May 26, 2022

Fix for issue: awslabs#44

79c8d7c

chris-ryan-square mentioned this issue May 26, 2022

Fix: Lock can be acquired after lease duration if the current owner has been terminated unexpectedly when using skipBlockingWait=true #79

Closed

artembilan mentioned this issue Feb 13, 2023

consumer stops consuming events and throws KinesisMessageDrivenChannelAdapter : The lock for key 'xxxxxxxx:shardId-00000000000X' was not renewed in time spring-cloud/spring-cloud-stream-binder-aws-kinesis#186

Closed

tso mentioned this issue Mar 10, 2023

Add a withClockSkewUpperBound option when acquiring a lock #88

Open

shetsa-amzn pushed a commit that referenced this issue May 22, 2023

Fix for issue: #44

a68b3a2

shetsa-amzn mentioned this issue May 22, 2023

Fix: Lock can be acquired after lease duration if the current owner has been terminated unexpectedly when using skipBlockingWait=true #91

Closed

shetsa-amzn linked a pull request May 22, 2023 that will close this issue

Fix: Lock can be acquired after lease duration if the current owner has been terminated unexpectedly when using skipBlockingWait=true #91

Closed

moshegood mentioned this issue Feb 13, 2024

Fix: ShouldSkipBlockingWait should still acquire a dead lock if tried for longer than TTL #99

Merged
