Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.Sign up
GitHub is where the world builds software
Millions of developers and companies build, ship, and maintain their software on GitHub — the largest and most advanced development platform in the world.
Add ability to upload files immediately to S3 #1399
Change has two parts: Adds an optional uploadImmediately field to the uploader
This change in particular involved moving the vast majority of the code in
Change has two parts: Adds an optional uploadImmediately field to the uploader metadata, that when present and set will cause the uploader to upload it immediately on change, and code within the on-detection function `handleNewOrModifiedS3Metadata` to perform the upload if that field is detected. This change in particular involved refactoring the vast majority of the code in `checkUploads` into a seperate function. I wanted to avoid duplicating the logic in `checkUploads` in `handleNewOrModifiedS3Metadata`. If I directly called `checkUploads`, however, that would upload all the files that were queued to upload, rather than just the files marked to upload immediately. So I made a new function that would just take a map of files to upload and upload only those files. So when it needed to upload the all queued files, as in `checkUploads`, it could directly work on the `metadataToUploader` field, and when it only needs to upload a single block of files, it can only be passed that single uploader/metadata.
Commenting here to document a chat we had in-person. Decided to go a slightly different route and have the immediate uploads submit their own future to the executorService, rather than trying to hook something immediate into the runs of the scheduledExecutorService. @PtrTeixeira is making updates and we'll review again once those are up
The way that I had written the code for this, the main thread, which was used to poll the filesystem for metadata changes, was being blocked by awaiting the files to finish uploading. This moves the code that depends on the the uploads finishing (namely, logging that it finished and that closing the uploaders) into callbacks off of the futures. This change also no longer places any uploaders into the class-scoped expiredUpload queue.
@PtrTeixeira looks good overall. For ease of debugging when you try this out, I'd add some info or debug logging near the beginning of
In general, the upload driver wants to avoid re-adding modified existing metadata, because it doesn't want to create a new uploader for it and it doesn't actually (for the most part) look at the contents of the metadata file, delegating that to the uploader itself. So on the whole it ignores existing metadata files on change. In this case, in particular, it needs to pick up changes in the metadata file because it should still perform the upload asap. This change just forces the upload if there is an existing metadata file that has been changed so that it requires immediate upload, it will upload the indicated files. As a heads-up, if you do this (modify an existing metadata file), it will still delete the metadata file after it performs the upload, so if you want to keep that around you will need to re-create the metadata file.
There _appears_ to be a threading issue where if the file system poller happens to run at the same time as an immediate uploaders is running, a second immediate uploader can be created for the same metadata. Typically this would be prevented by the metadata-to-uploader map, but immediate uploaders are removed from or never added to the map, so it was never checked if there was an existing uploader for a given metadata file. This polls the list of immediate uploaders to check their metadata, and will refuse to create another uploader of there is an existing uploader with the same metadata file.
Duplicates the same kind of structure that we are currently using to keep track of the normal loaders with the immediate uploaders. That is, I added another map from metadata to immediate uploaders. This is checked before creating a new uploader and only cleared when the existing uploader is definitely done and has been removed. Hopefully this should prevent the creation of duplicate immediate uploaders in a more robust way than the prior attempt, which might not maintain the block for the lifetime of the first immediate uploader.
The way that we are blocking attempts to create duplicate immediate uploaders is by creating a list of metadata objects that we have immediate uploaders for, and refusing to create a new uploader if that list already contains the metadata file associated with the new uploader. We need to perform the check-and-add operation atomically, to prevent the creation of two duplicate immediate uploaders to be interspersed in such a way that they are both created. In order to do so, this moves the check-and-add operation into a locked block of code in order to prevent that kind of interspersal.