Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ManagedUpload never finishes when stream size == part size #1382

Merged
merged 1 commit into from
Mar 3, 2017

Conversation

johanneswuerbach
Copy link
Contributor

@johanneswuerbach johanneswuerbach commented Mar 1, 2017

When uploading a stream with an unknown initial length and a total length
exactly equal to a multiple of the part size and a queueSize of 1, the upload callback is
never finished.

This is caused by only checking whether the upload finished after uploading
the last part, but not when the end event is received.

In our case the end event is received after the last part has been uploaded,
but there is also no more data to be uploaded, resulting in a hanging upload.

I would love to add a test, but couldn't figure out how to trigger receiving the end event after the stream was read, any hints?

@coveralls
Copy link

coveralls commented Mar 1, 2017

Coverage Status

Coverage decreased (-0.2%) to 91.473% when pulling b5351e0 on johanneswuerbach:fix-buf-size-eq-part-size into 975b315 on aws:master.

@johanneswuerbach johanneswuerbach force-pushed the fix-buf-size-eq-part-size branch 2 times, most recently from 022f16e to d585eed Compare March 2, 2017 14:09
@coveralls
Copy link

coveralls commented Mar 2, 2017

Coverage Status

Coverage decreased (-0.02%) to 91.619% when pulling d585eed on johanneswuerbach:fix-buf-size-eq-part-size into 3fc5ce6 on aws:master.

@coveralls
Copy link

coveralls commented Mar 2, 2017

Coverage Status

Coverage decreased (-0.02%) to 91.619% when pulling d585eed on johanneswuerbach:fix-buf-size-eq-part-size into 3fc5ce6 on aws:master.

Copy link
Contributor

@jeskew jeskew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change definitely needs a test, as it was not caught by the current test suite. Maybe something like the following?

        it 'can send a stream that is exactly equal to part size', (done) ->
          partSize = 5 * 1024 * 1024
          require('crypto').randomBytes partSize, (err, buf) ->
            throw err if err
            stream = AWS.util.buffer.toStream buf
            reqs = helpers.mockResponses [data: ETag: 'ETAG']
            upload = new AWS.S3.ManagedUpload({
              partSize: partSize
              params: { Body: stream }
            })
            upload.send ->
              expect(helpers.operationsForRequests(reqs)).to.eql ['s3.putObject']
              done()

        it 'can send a stream that is exactly divisible by part size', (done) ->
          partSize = 2 * 5 * 1024 * 1024
          require('crypto').randomBytes partSize, (err, buf) ->
            throw err if err
            stream = AWS.util.buffer.toStream buf
            reqs = helpers.mockResponses [
              { data: UploadId: 'uploadId' }
              { data: ETag: 'ETAG1' }
              { data: ETag: 'ETAG2' }
              { data: ETag: 'FINAL_ETAG', Location: 'FINAL_LOCATION' }
            ]
            upload = new AWS.S3.ManagedUpload({
              partSize: partSize
              params: { Body: stream }
            })
            upload.send ->
              expect(helpers.operationsForRequests(reqs)).to.eql [
                's3.createMultipartUpload'
                's3.uploadPart'
                's3.uploadPart'
                's3.completeMultipartUpload'
              ]
              done()

When uploading a stream with an unknown initial length and a total length
exactly equal to a multiple of the part size and a queueSize of 0, the
upload callback is never finished.

This is caused by only checking whether the upload finished after uploading
the last part, but not when the end event is received.

In our case the end event is received after the last part has been uploaded,
but there is also no more data to be uploaded, resulting in a hanging upload.
@johanneswuerbach
Copy link
Contributor Author

johanneswuerbach commented Mar 3, 2017

Thanks for the hint @jeskew. Tests added and confirmed that they were hanging before the fix and pass after it.

@coveralls
Copy link

coveralls commented Mar 3, 2017

Coverage Status

Coverage increased (+0.09%) to 91.729% when pulling b982ac8 on johanneswuerbach:fix-buf-size-eq-part-size into 3fc5ce6 on aws:master.

@jeskew jeskew merged commit 440ed1e into aws:master Mar 3, 2017
@johanneswuerbach johanneswuerbach deleted the fix-buf-size-eq-part-size branch March 4, 2017 01:12
@mramato
Copy link

mramato commented Mar 8, 2017

@jeskew, I apologize if I should create a new issue instead of commenting here, but I believe this change is causing my uploads to s3 to become corrupted. While I'm still debugging the problem, in my case sometimes (but not all of the time) self.finishMultiPart(); is being called too early and the upload completes "successfully" but before all parts are actually complete. The end result is that the file on s3 is truncated and missing data. If I roll back to a version of the sdk before this change, everything works 100% of the time (like it has for months).

My server code that performs the upload is really simple:

S3DataStore.prototype.uploadForProcessing = function (id, stream) {
    var size = 0;
    var uploadStream = new PassThrough();

    stream.on('data', function (data) {
        size += data.length;
        var canContinue = uploadStream.write(data);
        if (!canContinue) {
            stream.pause();
            uploadStream.once('drain', function () {
                stream.resume();
            });
        }
    });

    stream.on('error', function (error) {
        uploadStream.emit('error', error);
    });

    stream.on('end', function () {
        uploadStream.end();
    });

    return s3.upload({
        Bucket: this._bucket,
        Key: this._prefix + id,
        Body: uploadStream
    }).promise().then(function () {
        return {
            path: id,
            size: size
        };
    });
};

Should I create a new issue with the above information? I can try and open a PR if I can figure out the problem in more detail, but this is a major regression for me in the meantime.

@johanneswuerbach
Copy link
Contributor Author

I also just hit the problem and are working on a fix as we speak.

@mramato
Copy link

mramato commented Mar 8, 2017

Awesome, thanks!

@johanneswuerbach
Copy link
Contributor Author

#1395

@lock
Copy link

lock bot commented Sep 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.

@lock lock bot locked as resolved and limited conversation to collaborators Sep 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants