Add ability to upload files immediately to S3 #1399

Merged
merged 19 commits into from Feb 27, 2017

Conversation

Projects
None yet
3 participants
@PtrTeixeira
Contributor

PtrTeixeira commented Jan 17, 2017

Change has two parts: Adds an optional uploadImmediately field to the uploader
metadata, that when present and set will cause the uploader to upload it
immediately on change, and code within the on-detection function
handleNewOrModifiedS3Metadata to perform the upload if that field is
detected.

This change in particular involved moving the vast majority of the code in
checkUploads into a seperate function. I wanted to avoid duplicating the
logic in checkUploads in handleNewOrModifiedS3Metadata. If I directly
called checkUploads, however, that would upload all the files that were
queued to upload, rather than just the files marked to upload immediately. So I
made a new function that would just take a map of files to upload and upload
only those files. So when it needed to upload the all queued files, as in
checkUploads, it could directly work on the metadataToUploader field, and
when it only needs to upload a single block of files, it can only be passed
that single uploader/metadata.

Add ability to upload files immediately to S3
Change has two parts: Adds an optional uploadImmediately field to the uploader
metadata, that when present and set will cause the uploader to upload it
immediately on change, and code within the on-detection function
`handleNewOrModifiedS3Metadata` to perform the upload if that field is
detected.

This change in particular involved refactoring the vast majority of the code in
`checkUploads` into a seperate function.  I wanted to avoid duplicating the
logic in `checkUploads` in `handleNewOrModifiedS3Metadata`. If I directly
called `checkUploads`, however, that would upload all the files that were
queued to upload, rather than just the files marked to upload immediately. So I
made a new function that would just take a map of files to upload and upload
only those files. So when it needed to upload the all queued files, as in
`checkUploads`, it could directly work on the `metadataToUploader` field, and
when it only needs to upload a single block of files, it can only be passed
that single uploader/metadata.
@ssalinas

This comment has been minimized.

Show comment
Hide comment
@ssalinas

ssalinas Jan 17, 2017

Member

Commenting here to document a chat we had in-person. Decided to go a slightly different route and have the immediate uploads submit their own future to the executorService, rather than trying to hook something immediate into the runs of the scheduledExecutorService. @PtrTeixeira is making updates and we'll review again once those are up

Member

ssalinas commented Jan 17, 2017

Commenting here to document a chat we had in-person. Decided to go a slightly different route and have the immediate uploads submit their own future to the executorService, rather than trying to hook something immediate into the runs of the scheduledExecutorService. @PtrTeixeira is making updates and we'll review again once those are up

PtrTeixeira added some commits Jan 17, 2017

Move uploading checks off of fs polling thread
The way that I had written the code for this, the main thread, which was
used to poll the filesystem for metadata changes, was being blocked by
awaiting the files to finish uploading. This moves the code that depends
on the the uploads finishing (namely, logging that it finished and that
closing the uploaders) into callbacks off of the futures.  This change
also no longer places any uploaders into the class-scoped expiredUpload
queue.
Fix constructor call on modified resource
I modified the S2UploadMetadata resource to add a new optional field,
then failed to modify any constructor calls to guarantee that it
actually worked.  This _should_ make it so that the program actually
compiles.
@PtrTeixeira

This comment has been minimized.

Show comment
Hide comment
@PtrTeixeira

PtrTeixeira Jan 17, 2017

Contributor

cc @ssalinas Could you take a look?

Contributor

PtrTeixeira commented Jan 17, 2017

cc @ssalinas Could you take a look?

@ssalinas

This comment has been minimized.

Show comment
Hide comment
@ssalinas

ssalinas Jan 18, 2017

Member

@PtrTeixeira looks good overall. For ease of debugging when you try this out, I'd add some info or debug logging near the beginning of performImmediateUpload. Would give us a better idea that everything is being triggered appropriately rather than just waiting to see if things uploaded or waiting for the callback to trigger.

Member

ssalinas commented Jan 18, 2017

@PtrTeixeira looks good overall. For ease of debugging when you try this out, I'd add some info or debug logging near the beginning of performImmediateUpload. Would give us a better idea that everything is being triggered appropriately rather than just waiting to see if things uploaded or waiting for the callback to trigger.

Add logging statements for immediate uploader
Add some additional logging statements to help track the progress of the
program before and through the callbacks on the immediate uploader.
Hopefully this should make it easier to debug the program.

@ssalinas ssalinas added the hs_staging label Jan 18, 2017

@ssalinas ssalinas modified the milestone: 0.14.0 Jan 18, 2017

PtrTeixeira added some commits Jan 18, 2017

Guarantee uploader removed from expiring list
Depending on the settings, the immediate uploader may still get added to
the expiring list. The change guarantees that it is always removed from
the expiring list, in order to prevent a memory leak in the application.
Correctly handle existing metadata -> immediate
In general, the upload driver wants to avoid re-adding modified existing
metadata, because it doesn't want to create a new uploader for it and it
doesn't actually (for the most part) look at the contents of the
metadata file, delegating that to the uploader itself. So on the whole
it ignores existing metadata files on change. In this case, in
particular, it needs to pick up changes in the metadata file because it
should still perform the upload asap.

This change just forces the upload if there is an existing metadata file
that has been changed so that it requires immediate upload, it will
upload the indicated files. As a heads-up, if you do this (modify an
existing metadata file), it will still delete the metadata file after it
performs the upload, so if you want to keep that around you will need to
re-create the metadata file.

@ssalinas ssalinas added the hs_qa label Jan 23, 2017

@ssalinas

This comment has been minimized.

Show comment
Hide comment
@ssalinas

ssalinas Jan 31, 2017

Member

FYI @PtrTeixeira I merged #1391 ,should be able to resolve the merge conflicts on this one now

Member

ssalinas commented Jan 31, 2017

FYI @PtrTeixeira I merged #1391 ,should be able to resolve the merge conflicts on this one now

@ssalinas ssalinas added the hs_stable label Feb 6, 2017

@darcatron

This comment has been minimized.

Show comment
Hide comment
@darcatron

darcatron Feb 7, 2017

Contributor

🚢

Contributor

darcatron commented on 442da52 Feb 7, 2017

🚢

@darcatron

This comment has been minimized.

Show comment
Hide comment
@darcatron

darcatron Feb 7, 2017

Contributor

🚢

Contributor

darcatron commented on b750c7e Feb 7, 2017

🚢

PtrTeixeira and others added some commits Feb 8, 2017

Merge pull request #1424 from HubSpot/rename-metric
Rename metric for immediate uploaders
Prevent building duplicate immediate uploaders
There _appears_ to be a threading issue where if the file system poller
happens to run at the same time as an immediate uploaders is running, a
second immediate uploader can be created for the same metadata.
Typically this would be prevented by the metadata-to-uploader map, but
immediate uploaders are removed from or never added to the map, so it
was never checked if there was an existing uploader for a given metadata
file.

This polls the list of immediate uploaders to check their metadata, and
will refuse to create another uploader of there is an existing uploader
with the same metadata file.
Add seperate map to track immediate uploaders
Duplicates the same kind of structure that we are currently using to
keep track of the normal loaders with the immediate uploaders.  That is,
I added another map from metadata to immediate uploaders.  This is
checked before creating a new uploader and only cleared when the
existing uploader is definitely done and has been removed.  Hopefully
this should prevent the creation of duplicate immediate uploaders in a
more robust way than the prior attempt, which might not maintain the
block for the lifetime of the first immediate uploader.
Lock test for creating immediate uploader
The way that we are blocking attempts to create duplicate immediate
uploaders is by creating a list of metadata objects that we have
immediate uploaders for, and refusing to create a new uploader if that
list already contains the metadata file associated with the new
uploader. We need to perform the check-and-add operation atomically, to
prevent the creation of two duplicate immediate uploaders to be
interspersed in such a way that they are both created. In order to do
so, this moves the check-and-add operation into a locked block of code
in order to prevent that kind of interspersal.
Add clarifying information to logging
Log the metadata involved when attempting to get a lock on creating a
new immediate SingularityS3Uploader.
Merge pull request #1426 from HubSpot/prevent-duplicate-immediate-upl…
…oaders

Prevent building duplicate immediate uploaders

@ssalinas ssalinas merged commit 1a7b7c4 into master Feb 27, 2017

1 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build failed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@ssalinas ssalinas deleted the s3uploader-upload-immediately branch Feb 27, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment