OAK-10006 Prevent writes to Azure storage if lock is not held #760
base: trunk
Conversation
void doRenewLease() throws StorageException {
    BlobRequestOptions blobRequestOptions = new BlobRequestOptions();
    blobRequestOptions.setMaximumExecutionTimeInMs(Math.max(INTERVAL - renewalInterval - 1, INTERVAL / 2 - 1));
Could you add a comment here that explains this?
Ok, I will add a comment.
In summary, renewLease() was previously called with the default BlobRequestOptions. However, those defaults can be set globally for the client instance and may not be a good fit for calling renewLease(), as the call could block for too long. Since the lease is acquired for 60 seconds and renewLease() is called every 30 seconds by default, I think it should time out after at most 29 seconds (leaving a 1-second margin), so that we can call notifyLockStatusChange(LockStatus.RENEWAL_FAILED) and suspend writes before the lease expires and becomes available to be acquired by another instance.
BTW, I have just noticed that the values should be in millis and thus multiplied by 1000...
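A hedged sketch of the corrected computation, converting seconds to milliseconds. The class name, the concrete values of INTERVAL and renewalInterval (60 s lease, 30 s renewal frequency), and the helper method are illustrative assumptions based on the discussion above, not the actual Oak code:

```java
import java.util.concurrent.TimeUnit;

public class LeaseTimeoutSketch {
    // Illustrative values (assumptions): lease duration 60 s, renewal every 30 s.
    static final int INTERVAL = 60;          // lease duration, in seconds
    static final int renewalInterval = 30;   // renewal frequency, in seconds

    // Maximum time a single renewLease() call may block, in milliseconds.
    // Leaves a 1-second safety margin before the lease could expire, so the
    // renewal failure can be reported and writes suspended in time.
    static int maxExecutionTimeMs() {
        return (int) TimeUnit.SECONDS.toMillis(
                Math.max(INTERVAL - renewalInterval - 1, INTERVAL / 2 - 1));
    }

    public static void main(String[] args) {
        System.out.println(maxExecutionTimeMs()); // 29000 with the values above
    }
}
```

With the illustrative values, both arms of the Math.max evaluate to 29 seconds, i.e. 29000 ms, which matches the "at most 29 seconds" timeout argued for above.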
case LOST:
    try {
        if (tarFiles != null) {
            tarFiles.close();
This will write the graph and references to the tar file (archive.close() in TarWriter); shouldn't we avoid that? Also, it will try to flush the write queue (q.flush() in AbstractRemoteSegmentArchiveWriter), so if the queue is not empty, it will wait there forever, right?
Right. Thanks for spotting this, @ahanikel !
I suppose we should add an unsafeShutdown() method to TarFiles, TarWriter and AbstractRemoteSegmentArchiveWriter, so that we can shut them down without attempting any finalisation writes to the remote storage, which would fail anyway or corrupt the data.
What do you think?
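A minimal sketch of the idea behind the proposed unsafeShutdown(): a normal close() flushes the write queue and writes the finalisation data, while unsafeShutdown() drops pending writes and skips finalisation entirely. The class, its fields, and the helper methods are hypothetical, modelled on the discussion above, not on the actual TarFiles/TarWriter implementation:

```java
import java.util.ArrayDeque;
import java.util.Queue;

class ArchiveWriterSketch {
    private final Queue<byte[]> writeQueue = new ArrayDeque<>();
    private boolean closed = false;

    void write(byte[] segment) {
        if (closed) throw new IllegalStateException("writer is shut down");
        writeQueue.add(segment);
    }

    /** Normal close: flushes queued writes, then writes graph/references. */
    void close() {
        flushQueue();            // blocks until the queue is drained
        writeFinalisationData(); // graph + references, a remote write
        closed = true;
    }

    /** Unsafe shutdown: abandon pending writes, skip finalisation entirely. */
    void unsafeShutdown() {
        writeQueue.clear(); // nothing further is written to remote storage
        closed = true;
    }

    boolean isClosed() { return closed; }

    private void flushQueue() { writeQueue.clear(); /* would upload here */ }
    private void writeFinalisationData() { /* would write graph + refs here */ }
}
```

The point of the sketch: once the lease is lost, close() would either fail or corrupt remote data (and could block forever on a non-empty queue), whereas unsafeShutdown() touches only local state.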
If the lock is lost to another instance, all write attempts should fail with an exception. If the lock fails to be renewed, all writes are suspended (blocked) until the renewal succeeds.
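The two behaviours described above (block on renewal failure, fail on lock loss) can be sketched as a simple write gate. The LockStatus values follow the names used in the PR discussion; WriteGate and its methods are assumptions for illustration, not the actual Oak API:

```java
// Hedged sketch of the write-gating semantics: writes block while renewal
// is failing and throw once the lock is lost to another instance.
enum LockStatus { ACQUIRED, RENEWAL_FAILED, LOST }

class WriteGate {
    private final Object monitor = new Object();
    private LockStatus status = LockStatus.ACQUIRED;

    void notifyLockStatusChange(LockStatus newStatus) {
        synchronized (monitor) {
            status = newStatus;
            monitor.notifyAll(); // wake any writers waiting on a renewal
        }
    }

    /** Called before each write: blocks while renewal is failing,
     *  throws once the lock is lost. */
    void awaitWritePermission() throws InterruptedException {
        synchronized (monitor) {
            while (status == LockStatus.RENEWAL_FAILED) {
                monitor.wait(); // suspended until renewal succeeds or lock is lost
            }
            if (status == LockStatus.LOST) {
                throw new IllegalStateException("lock lost to another instance");
            }
        }
    }
}
```

A writer would call awaitWritePermission() before each remote write, while the lease-renewal thread drives notifyLockStatusChange() as renewals succeed, fail, or the lease is lost.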
067b62e to d434cde