Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LockCurrentlyNotAvailable with shouldSkipBlockingWait #44

Open
fejk0 opened this issue Nov 28, 2019 · 4 comments
Open

LockCurrentlyNotAvailable with shouldSkipBlockingWait #44

fejk0 opened this issue Nov 28, 2019 · 4 comments

Comments

@fejk0
Copy link

fejk0 commented Nov 28, 2019

Greetings gentlemen,

we are facing problem with acquiring lock with 'shouldSkipBlockingWait' set to true;
Find reproducible test bellow.
To reproduce this manually, just kill process while owning lock, start another one and try to acquire it with options as in junit...
Is this bug or just bad usage ?

Version being used is 1.1.0
Thanks

  @Test
  void notExpectedBehaviour() throws Exception{

    final long leaseDuration = 1_000;
    final long heartBeatPeriod = 200;
    final String partition = "super_key_LOCK";

    AcquireLockOptions lockOptions = AcquireLockOptions
        .builder(partition)
        .withAcquireReleasedLocksConsistently(true)
        .withShouldSkipBlockingWait(true)
        .build();

    AmazonDynamoDBLockClient lockClientOne = new AmazonDynamoDBLockClient(
        AmazonDynamoDBLockClientOptions
            .builder(amazonDynamoDB, "table")
            .withPartitionKeyName(Constants.HASH_KEY)
            .withTimeUnit(TimeUnit.MILLISECONDS)
            .withLeaseDuration(leaseDuration)
            .withHeartbeatPeriod(heartBeatPeriod)
            .withCreateHeartbeatBackgroundThread(true)
            .build());

    LockItem lockItem = lockClientOne.acquireLock(lockOptions);
    Assertions.assertNotNull(lockItem, "lock acquired");

    // create new lock client which should not be able to acquire lock
    AmazonDynamoDBLockClient lockClientTwo = new AmazonDynamoDBLockClient(
        AmazonDynamoDBLockClientOptions
            .builder(amazonDynamoDB, "table")
            .withPartitionKeyName(Constants.HASH_KEY)
            .withTimeUnit(TimeUnit.MILLISECONDS)
            .withLeaseDuration(leaseDuration)
            .withHeartbeatPeriod(heartBeatPeriod)
            .withCreateHeartbeatBackgroundThread(true)
            .build());

    Thread.sleep(leaseDuration);
    boolean wasThrown = false;
    try {
      lockClientTwo.acquireLock(lockOptions);
    } catch (LockCurrentlyUnavailableException e) {
      wasThrown = true;
    }
    Assertions.assertTrue(wasThrown, "exception - expected behavior");

    // force shutdown
    Field shuttingDown = lockClientOne.getClass().getDeclaredField("shuttingDown");
    shuttingDown.setAccessible(true);
    shuttingDown.set(lockClientOne, true);

    // wait so item gets old
    Thread.sleep(leaseDuration * 3L);

    //  we would expect that lock can be acquired as nobody is sending heartbeats - but this throws exception LockCurrentlyNotAvailable 
    lockItem = lockClientTwo.acquireLock(lockOptions);

  }
@schen42
Copy link

schen42 commented Apr 22, 2020

Looks like this problem occurs because when the existing lock is retrieved, the timestamp that is used to evaluate expiry in isExpired() is updated. So the test waits leaseDuration * 3L, which would correctly cause isExpired() to return true because the timestamp is T - (T - leaseDuration * 3) > leaseDuration, but when the second acquireLock is called, that timestamp is updated to T (and T - T is not greater than leaseDuration) [ref]. This causes isExpired() to return false. In the normal case, this won't be an issue because the lock will eventually be released. However, in the corner case where a lock expires due to, say, dying process, then it appears that the lock will never be able to be retrieved again so long as shouldSkipBlockingWait is set to true.

Long story short, seems like a major bug in the library. There appears to be no clear workaround from client-side either.

@murbot
Copy link

murbot commented Jan 11, 2021

I've encountered this issue as well. It seems like the "shouldSkipBlockingWait" support is in conflict with how lease expiration works (requires a process to wait for the lock for the lease duration before the lock is considered expired).

It seems we simply can't use "shouldSkipBlockingWait" until this is resolved, unfortunately.

@thbr03
Copy link

thbr03 commented Feb 4, 2021

@schen42 I would not agree in calling this a corner case, rather a case which I assumed was supported.

However, in the corner case where a lock expires due to, say, dying process, then it appears that the lock will never be able to be retrieved again so long as shouldSkipBlockingWait is set to true.

I've encountered this issue as well version, 1.2.0.

chris-ryan-square added a commit to chris-ryan-square/amazon-dynamodb-lock-client that referenced this issue May 26, 2022
tso pushed a commit to tso/amazon-dynamodb-lock-client that referenced this issue Mar 9, 2023
This commit addresses issue awslabs#44 by providing a new option to utilize
wall clock time (within the provided error bound) to determine if a
lock is expired.

Previously it was impossible to determine if a lease was expired without
blocking for at least lease duration and seeing if the version had changed.
Thus, if you never block and the lease wasn't explicitly marked as released
then the lock was unable to ever be acquired again. By providing a correct
upper clock skew error bound clients can correctly take over locks which
have expired but not been explicitly released without blocking.

In most distributed systems relying on wall clock time is generally not
correct but in this case we can provide an upper clock skew error bound
on the scale of minutes without facing any negative consequences for most
clients.
tso pushed a commit to tso/amazon-dynamodb-lock-client that referenced this issue Mar 9, 2023
This commit addresses issue awslabs#44 by providing a new option to utilize
wall clock time along with a provided error bound to determine if a
lock is expired.

Previously it was impossible to determine if a lease was expired without
blocking for at least lease duration and seeing if the version had changed.
Thus, if you never block and the lease wasn't explicitly marked as released
then the lock was unable to ever be acquired again. By providing a correct
upper clock skew error bound clients can correctly take over locks which
have expired but not been explicitly released without blocking.

In most distributed systems relying on wall clock time is generally not
correct but in this case we can provide an upper clock skew error bound
on the scale of minutes without facing any negative consequences for most
clients.
tso pushed a commit to tso/amazon-dynamodb-lock-client that referenced this issue Mar 10, 2023
This commit addresses issue awslabs#44 by providing a new option to utilize
wall clock time along with a provided error bound to determine if a
lock is expired.

Previously it was impossible to determine if a lease was expired without
blocking for at least lease duration and seeing if the version had changed.
Thus, if you never block and the lease wasn't explicitly marked as released
then the lock was unable to ever be acquired again. By providing a correct
upper clock skew error bound clients can correctly take over locks which
have expired but not been explicitly released without blocking.

In most distributed systems relying on wall clock time is generally not
correct but in this case we can provide an upper clock skew error bound
on the scale of minutes without facing any negative consequences for most
clients.
tso pushed a commit to tso/amazon-dynamodb-lock-client that referenced this issue Mar 10, 2023
This commit addresses issue awslabs#44 by providing a new option to utilize
wall clock time along with a provided error bound to determine if a
lock is expired.

Previously it was impossible to determine if a lease was expired without
blocking for at least lease duration and seeing if the version had changed.
Thus, if you never block and the lease wasn't explicitly marked as released
then the lock was unable to ever be acquired again. By providing a correct
upper clock skew error bound clients can correctly take over locks which
have expired but not been explicitly released without blocking.

In most distributed systems relying on wall clock time is generally not
correct but in this case we can provide an upper clock skew error bound
on the scale of minutes without facing any negative consequences for most
clients.
tso pushed a commit to tso/amazon-dynamodb-lock-client that referenced this issue Mar 10, 2023
This commit addresses issue awslabs#44 by providing a new option to utilize
wall clock time along with a provided error bound to determine if a
lock is expired.

Previously it was impossible to determine if a lease was expired without
blocking for at least lease duration and seeing if the version had changed.
Thus, if you never block and the lease wasn't explicitly marked as released
then the lock was unable to ever be acquired again. By providing a correct
upper clock skew error bound clients can correctly take over locks which
have expired but not been explicitly released without blocking.

In most distributed systems relying on wall clock time is generally not
correct but in this case we can provide an upper clock skew error bound
on the scale of minutes without facing any negative consequences for most
clients.
tso pushed a commit to tso/amazon-dynamodb-lock-client that referenced this issue Mar 10, 2023
This commit addresses issue awslabs#44 by providing a new option to utilize
wall clock time along with a provided error bound to determine if a
lock is expired.

Previously it was impossible to determine if a lease was expired without
blocking for at least lease duration and seeing if the version had changed.
Thus, if you never block and the lease wasn't explicitly marked as released
then the lock was unable to ever be acquired again. By providing a correct
upper clock skew error bound clients can correctly take over locks which
have expired but not been explicitly released without blocking.

In most distributed systems relying on wall clock time is generally not
correct but in this case we can provide an upper clock skew error bound
on the scale of minutes without facing any negative consequences for most
clients.
tso pushed a commit to tso/amazon-dynamodb-lock-client that referenced this issue Mar 10, 2023
This commit addresses issue awslabs#44 by providing a new option to utilize
wall clock time along with a provided error bound to determine if a
lock is expired.

Previously it was impossible to determine if a lease was expired without
blocking for at least lease duration and seeing if the version had changed.
Thus, if you never block and the lease wasn't explicitly marked as released
then the lock was unable to ever be acquired again. By providing a correct
upper clock skew error bound clients can correctly take over locks which
have expired but not been explicitly released without blocking.

In most distributed systems relying on wall clock time is generally not
correct but in this case we can provide an upper clock skew error bound
on the scale of minutes without facing any negative consequences for most
clients. This tradeoff of having a lease be unacquireable for several
minutes is possibly better than being forced to block in many use cases.
tso pushed a commit to tso/amazon-dynamodb-lock-client that referenced this issue Mar 10, 2023
This commit addresses issue awslabs#44 by providing a new option to utilize
wall clock time along with a provided error bound to determine if a
lock is expired.

Previously it was impossible to determine if a lease was expired without
blocking for at least lease duration and seeing if the version had changed.
Thus, if you never block and the lease wasn't explicitly marked as released
then the lock was unable to ever be acquired again. By providing a correct
upper clock skew error bound clients can correctly take over locks which
have expired but not been explicitly released without blocking.

In most distributed systems relying on wall clock time is generally not
correct but in this case we can provide an upper clock skew error bound
on the scale of minutes without facing any negative consequences for most
clients. This tradeoff of having a lease be unacquireable for several
minutes is possibly better than being forced to block in many use cases.
tso pushed a commit to tso/amazon-dynamodb-lock-client that referenced this issue Mar 10, 2023
This commit addresses issue awslabs#44 by providing a new option to utilize
wall clock time along with a provided error bound to determine if a
lock is expired.

Previously it was impossible to determine if a lease was expired without
blocking for at least lease duration and seeing if the version had changed.
Thus, if you never block and the lease wasn't explicitly marked as released
then the lock was unable to ever be acquired again. By providing a correct
upper clock skew error bound clients can correctly take over locks which
have expired but not been explicitly released without blocking.

In most distributed systems relying on wall clock time is generally not
correct but in this case we can provide an upper clock skew error bound
on the scale of minutes without facing any negative consequences for most
clients. This tradeoff of having a lease be unacquireable for several
minutes is possibly better than being forced to block in many use cases.
shetsa-amzn pushed a commit that referenced this issue May 22, 2023
@shetsa-amzn
Copy link
Collaborator

We have merged the latest commit from @moshegood and this change has been release in version 1.3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants