Very small number of S3 multipart upload objects got truncated #564
Comments
Thank you for bringing this to our attention. We will investigate to see if we can reproduce the problem or find a cause for this behavior. In the meantime, the uploader object has an event dispatcher that you can hook into to create your own "safety net". 😄 Here is an example:

```php
<?php
use Aws\S3\Model\MultipartUpload\UploadBuilder;

// Configure your builder (client, bucket, key, source, concurrency, ...).
$uploader = UploadBuilder::newInstance()->build();

// Attach some event listeners to keep track of the parts uploaded.
$dispatcher = $uploader->getEventDispatcher();
$numUploads = 0;

// Count every part the uploader is about to send.
$dispatcher->addListener($uploader::BEFORE_PART_UPLOAD, function ($event) use (&$numUploads) {
    $numUploads++;
});

$dispatcher->addListener($uploader::AFTER_UPLOAD, function ($event) use (&$numUploads) {
    $countedParts = count($event['state']);
    // Compare the number of parts recorded in the upload's state to the ones you counted.
    if ($countedParts !== $numUploads) {
        throw new \RuntimeException("Multipart upload is missing parts for completion. "
            . "Found {$countedParts}, but expected {$numUploads}.");
    }
});

// Trigger the upload.
$uploader->upload();
```

Let us know if you are able to provide any additional information.
Thank you very much for the code snippet. Wow, this is more than I expected! I'll give it a try and share it with my customer. I'll also keep trying to reproduce the problem and will let you know if I find anything.
Is there any information you can share about the 7 files that you found truncated? What file size was uploaded, and what was the expected size? Is there a common pattern in the sizes (e.g., they are all X in size but were expected to be Y)?
Regarding `setConcurrency(300)`: that's a very large number. I think you'll get better throughput by reducing it significantly. At 300 concurrent requests, you're probably at 100% CPU and possibly saturating your network connection.
Thanks for looking into this issue. Here is the information. All files (except 6) seem to be missing the last part. I don't see any strong relationship between the sizes.
Understood. I don't think the high concurrency setting is related to this issue, because about 99% of the files they have uploaded are smaller than 20 MB, so the actual concurrency will be at most 4.
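The "at most 4" figure follows from the part size. Assuming each file is split into 5 MB parts (S3's minimum size for every part except the last; the thread does not state the configured part size), a file of size $S$ yields $\lceil S / 5\,\text{MB} \rceil$ parts, and a single upload cannot have more parts in flight than it has parts:

$$
n_{\text{parts}} = \left\lceil \frac{S}{5\,\text{MB}} \right\rceil \le \left\lceil \frac{20\,\text{MB}}{5\,\text{MB}} \right\rceil = 4 \quad \text{for } S \le 20\,\text{MB},
\qquad
\text{effective concurrency} = \min(300,\ n_{\text{parts}}) \le 4.
$$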
Unfortunately, I couldn't reproduce the problem, and no further information is available to continue the investigation. I'll close this issue for now because:
If the problem occurs again and I get more information, I'll reopen this issue. Thanks!
Hi,
One of my customers is using your S3 client and has uploaded over 1 million objects using the high-level abstraction for multipart uploads (`UploadBuilder::newInstance()->build()` followed by `$uploader->upload()`). However, they found that a very small number of files on S3 are truncated: so far, 7 objects out of ~1 million uploads. All of them are missing the last upload part, but `upload()` didn't throw any exception.

I think all parts were uploaded without error, but for some reason the last part was omitted from the Complete Multipart Upload request. However, they have no evidence to support this, and I haven't reproduced the problem.
Here are some details: they call `setConcurrency(300)` and use the default values for the other parameters.

I suspect this is a race condition in Guzzle, but I posted this issue here because other users could have the same problem. I tried to debug the client, but I couldn't find anything that would explain the cause. (I have no PHP experience.)
Also, if we can't fix this now (because I can't provide steps to reproduce it), could you add a "safety net" to the AWS SDK for PHP so that my customer can detect the problem if it happens again and record some information to help us analyze the root cause?
For instance, before sending the Complete Multipart Upload request, count the number of UploadPart commands sent by the client and compare it against the number of responses received. If they don't match, throw an exception with information about the missing part.
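The check proposed above could look roughly like the following. This is a minimal sketch in plain PHP, not an SDK feature: `findMissingParts()`, `expectedPartCount()`, and `assertAllPartsAccountedFor()` are hypothetical helpers, and how the part numbers get collected (for example, from listeners on the uploader's event dispatcher) is left as an assumption.

```php
<?php
/**
 * Hypothetical helper (not part of the AWS SDK for PHP): return the part
 * numbers that were sent as UploadPart commands but never got a response.
 *
 * @param int[] $sentPartNumbers      part numbers of every UploadPart command issued
 * @param int[] $respondedPartNumbers part numbers that received a successful response
 * @return int[] sorted list of unanswered part numbers
 */
function findMissingParts(array $sentPartNumbers, array $respondedPartNumbers): array
{
    $missing = array_values(array_diff(array_unique($sentPartNumbers), $respondedPartNumbers));
    sort($missing);
    return $missing;
}

/**
 * Hypothetical helper: how many parts a file of $fileSize bytes should be
 * split into when every part except the last is $partSize bytes (ceiling division).
 */
function expectedPartCount(int $fileSize, int $partSize): int
{
    if ($partSize <= 0) {
        throw new \InvalidArgumentException('Part size must be positive.');
    }
    return intdiv($fileSize + $partSize - 1, $partSize);
}

/**
 * Hypothetical guard to run before CompleteMultipartUpload: throw if any sent
 * part is unconfirmed, listing the part numbers so the failure can be logged.
 */
function assertAllPartsAccountedFor(array $sentPartNumbers, array $respondedPartNumbers): void
{
    $missing = findMissingParts($sentPartNumbers, $respondedPartNumbers);
    if ($missing !== []) {
        throw new \RuntimeException(sprintf(
            'Refusing to complete multipart upload: %d part(s) sent but not confirmed (missing part numbers: %s).',
            count($missing),
            implode(', ', $missing)
        ));
    }
}
```

Comparing sent commands against responses only catches a part that was sent but never confirmed. If the last part were never issued at all, both lists would agree, so also comparing the number of confirmed parts against `expectedPartCount()` for the source file's size gives a second, independent check for exactly the "missing last part" symptom described here.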
Thanks,
Tatsuya