New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent building duplicate immediate uploaders #1426

Merged
merged 4 commits into from Feb 24, 2017

Conversation

Projects
None yet
2 participants
@PtrTeixeira
Contributor

PtrTeixeira commented Feb 15, 2017

There appears to be a threading issue where if the file system poller
happens to run at the same time as an immediate uploaders is running, a
second immediate uploader can be created for the same metadata.
Typically this would be prevented by the metadata-to-uploader map, but
immediate uploaders are removed from or never added to the map, so it
was never checked if there was an existing uploader for a given metadata
file.

This polls the list of immediate uploaders to check their metadata, and
will refuse to create another uploader of there is an existing uploader
with the same metadata file.

Prevent building duplicate immediate uploaders
There _appears_ to be a threading issue where if the file system poller
happens to run at the same time as an immediate uploaders is running, a
second immediate uploader can be created for the same metadata.
Typically this would be prevented by the metadata-to-uploader map, but
immediate uploaders are removed from or never added to the map, so it
was never checked if there was an existing uploader for a given metadata
file.

This polls the list of immediate uploaders to check their metadata, and
will refuse to create another uploader of there is an existing uploader
with the same metadata file.
@ssalinas

This comment has been minimized.

Show comment
Hide comment
@ssalinas

ssalinas Feb 16, 2017

Member

Overall, this makes the window smaller, but would still allow duplicates in some cases (i.e. if it runs after the check, but before the uploader is added to the map. I think this is on the right track, but we need to see if there is a solution that covers all cases

Member

ssalinas commented Feb 16, 2017

Overall, this makes the window smaller, but would still allow duplicates in some cases (i.e. if it runs after the check, but before the uploader is added to the map. I think this is on the right track, but we need to see if there is a solution that covers all cases

Add seperate map to track immediate uploaders
Duplicates the same kind of structure that we are currently using to
keep track of the normal loaders with the immediate uploaders.  That is,
I added another map from metadata to immediate uploaders.  This is
checked before creating a new uploader and only cleared when the
existing uploader is definitely done and has been removed.  Hopefully
this should prevent the creation of duplicate immediate uploaders in a
more robust way than the prior attempt, which might not maintain the
block for the lifetime of the first immediate uploader.
@ssalinas

This comment has been minimized.

Show comment
Hide comment
@ssalinas

ssalinas Feb 21, 2017

Member

This again makes our window of time for running a duplicate uploader smaller, but doesn't completely eliminate it. I think we can accomplish what we need with a simple locking mechanism and a List of the s3 metadata.

We don't ever actually access the value of the immediate uploader map, so all we really need to store is a list of the metadata that we already have an immediate uploader for.

So, let's try something like a ReentrantLock to avoid multiple things updating our list at once.

We can create our list and the single lock when the uploader driver is started like

private final List<S3UploadMetadata> immediateUploaders = new ArrayList();
private final ReentrantLock lock = new ReentrantLock();

and then have some method to try and add an immediate uploader like:

public boolean addImmediateUploader(S3UploadMetadata metadata) {
  try {
    if (lock.tryLock({some timeout}, {Some timeout TimeUnit}) {
      // If we are here, we successfully acquired the lock, no other thread can be here at the same time
      if (! immediateUploaders.contains(metadata)) {
        // we don't have one yet
        immediateUploaders.add(metadata);
        return true;
      } else {
        // we already have one
        return false;
      }
    } else {
      // couldn't get the lock in time (something probably wrong if we are getting here)
    }
  } finally {
    lock.unlock() // make sure we give it back
  }
}

That way, right at the start of things we can have an if (addImmediateUploader(metadata)) -> do the other stuff

java 8 has some nicer things for working with situations like this, but if I remember correctly they are mostly based around maps, not lists. So, this might be slightly more verbose, but will give a guarantee we aren't doing anything in duplicate.

Member

ssalinas commented Feb 21, 2017

This again makes our window of time for running a duplicate uploader smaller, but doesn't completely eliminate it. I think we can accomplish what we need with a simple locking mechanism and a List of the s3 metadata.

We don't ever actually access the value of the immediate uploader map, so all we really need to store is a list of the metadata that we already have an immediate uploader for.

So, let's try something like a ReentrantLock to avoid multiple things updating our list at once.

We can create our list and the single lock when the uploader driver is started like

private final List<S3UploadMetadata> immediateUploaders = new ArrayList();
private final ReentrantLock lock = new ReentrantLock();

and then have some method to try and add an immediate uploader like:

public boolean addImmediateUploader(S3UploadMetadata metadata) {
  try {
    if (lock.tryLock({some timeout}, {Some timeout TimeUnit}) {
      // If we are here, we successfully acquired the lock, no other thread can be here at the same time
      if (! immediateUploaders.contains(metadata)) {
        // we don't have one yet
        immediateUploaders.add(metadata);
        return true;
      } else {
        // we already have one
        return false;
      }
    } else {
      // couldn't get the lock in time (something probably wrong if we are getting here)
    }
  } finally {
    lock.unlock() // make sure we give it back
  }
}

That way, right at the start of things we can have an if (addImmediateUploader(metadata)) -> do the other stuff

java 8 has some nicer things for working with situations like this, but if I remember correctly they are mostly based around maps, not lists. So, this might be slightly more verbose, but will give a guarantee we aren't doing anything in duplicate.

Lock test for creating immediate uploader
The way that we are blocking attempts to create duplicate immediate
uploaders is by creating a list of metadata objects that we have
immediate uploaders for, and refusing to create a new uploader if that
list already contains the metadata file associated with the new
uploader. We need to perform the check-and-add operation atomically, to
prevent the creation of two duplicate immediate uploaders to be
interspersed in such a way that they are both created. In order to do
so, this moves the check-and-add operation into a locked block of code
in order to prevent that kind of interspersal.
Show outdated Hide outdated ...java/com/hubspot/singularity/s3uploader/SingularityS3UploaderDriver.java
try {
if (lock.tryLock(400, TimeUnit.MILLISECONDS)) {
if (this.immediateUploadMetadata.contains(metadata)) {
LOG.debug("Stopped from creating a duplicate immediate uploader.");

This comment has been minimized.

@ssalinas

ssalinas Feb 22, 2017

Member

Let's make these statements slightly more useful by adding the actual metadata to them. Like:

LOG.debug("Already have an immediate uploader for {}", metadata);
@ssalinas

ssalinas Feb 22, 2017

Member

Let's make these statements slightly more useful by adding the actual metadata to them. Like:

LOG.debug("Already have an immediate uploader for {}", metadata);
@ssalinas

This comment has been minimized.

Show comment
Hide comment
@ssalinas

ssalinas Feb 22, 2017

Member

just the small comment on logging, other than that LGTM

Member

ssalinas commented Feb 22, 2017

just the small comment on logging, other than that LGTM

Add clarifying information to logging
Log the metadata involved when attempting to get a lock on creating a
new immediate SingularityS3Uploader.
@ssalinas

This comment has been minimized.

Show comment
Hide comment
@ssalinas

ssalinas Feb 24, 2017

Member

👍 LGTM

Member

ssalinas commented Feb 24, 2017

👍 LGTM

@ssalinas ssalinas merged commit 2605887 into s3uploader-upload-immediately Feb 24, 2017

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@ssalinas ssalinas deleted the prevent-duplicate-immediate-uploaders branch Feb 24, 2017

@ssalinas ssalinas modified the milestone: 0.14.0 Mar 13, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment