
Assertion parser->current_buffer_.IsEmpty() failed in Node when using Upload in @aws-sdk/lib-storage in cloud environment #2843

Open
mingchuno opened this issue Sep 23, 2021 · 15 comments
Labels
bug This issue is a bug. p3 This is a minor priority issue

Comments

@mingchuno

Describe the bug
This is actually an issue reported in nodejs/node#39671.

When using @aws-sdk/lib-storage to perform a multipart upload to S3, Node.js aborts with the following assertion failure:

node[1]: ../src/node_http_parser.cc:510:static void node::{anonymous}::Parser::Execute(const v8::FunctionCallbackInfo<v8::Value>&): Assertion `parser->current_buffer_.IsEmpty()' failed.

The assertion comes from Node's http parser. Interestingly, this only seems to happen in the cloud environment; the local dev environment works fine (further investigation is needed to verify). It runs fine almost every time on my local machine (MacBook Pro 16" Intel running Node v14.17.5); I have seen the exception there only once. But in the production environment, running EKS on EC2, it fails consistently at the same point regardless of the data size.

This MAY NOT be an issue in @aws-sdk/lib-storage itself, but it only seems to happen when using the AWS SDK in a cloud environment.

I have tried changing my application logic to perform only one upload at a time, and it did not crash even in the cloud environment (AWS EKS running on EC2); a sketch of that serialized approach is included under Additional context below. So I suspect a concurrency issue of some sort at either the aws-sdk or the Node.js level. It would be great if we (users, aws-sdk, nodejs) could work together and find a reproducer first.

Is the issue in the browser/Node.js?
Node.js, and it possibly happens only in cloud (AWS?) environments.

If on Node.js, are you running this on AWS Lambda?
No

Details of the browser/Node.js version

  • machine type: m5.xlarge
  • container base image tag: node:14.17.5-alpine3.14 and 16.9.1-alpine3.14 have both been tried (same Node version as my local machine, but Alpine-based; I am not sure whether that matters)
  • resources on k8s pod
resources:
  limits:
    cpu: 1000m
    memory: 1024Mi
  requests:
    cpu: 1000m
    memory: 1024Mi

SDK version number

@aws-sdk/lib-storage@3.33.0 for me; I am not sure about other versions.

To Reproduce (observed behavior)

No reproducer at the moment; looking for someone to reproduce it.

Expected behavior

No exception thrown

Screenshots

No

Additional context

Code snippet for reference

import { S3Client } from '@aws-sdk/client-s3'
import { Upload } from '@aws-sdk/lib-storage'
import { PassThrough } from 'stream'

const s3Client = new S3Client({}) // region and credentials come from the environment
// inputStream is a Readable produced elsewhere in the application

const multipartUpload = new Upload({
  client: s3Client,
  params: {
    Bucket: 'some-s3-bucket',
    Key: 'path/to/some.json',
    Body: inputStream.pipe(new PassThrough()),
    ContentType: 'application/octet-stream',
  },
})
await multipartUpload.done()
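
For reference, the "one upload at a time" workaround mentioned above looked roughly like the following. This is a minimal sketch with a made-up helper and input list; the only point is that each Upload is awaited before the next one starts, rather than running them concurrently.

import { S3Client } from '@aws-sdk/client-s3'
import { Upload } from '@aws-sdk/lib-storage'
import { PassThrough, Readable } from 'stream'

const s3Client = new S3Client({})

// Hypothetical helper: uploads a list of (key, stream) pairs strictly one at a time.
async function uploadSequentially(items: Array<{ key: string; stream: Readable }>) {
  for (const { key, stream } of items) {
    const upload = new Upload({
      client: s3Client,
      params: {
        Bucket: 'some-s3-bucket',
        Key: key,
        Body: stream.pipe(new PassThrough()),
        ContentType: 'application/octet-stream',
      },
    })
    // Awaiting here serializes the uploads instead of starting them in parallel.
    await upload.done()
  }
}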
@ajredniwja ajredniwja transferred this issue from aws/aws-sdk-js Sep 28, 2021
@ajredniwja ajredniwja added guidance General information and guidance, answers to FAQs, or recommended best practices/resources. needs-triage This issue or PR still needs to be triaged. labels Sep 28, 2021
@rcausey
Copy link

rcausey commented Sep 29, 2021

We've also encountered this crash on AWS Lambda, even when queueSize is set to 1. We've found it's much easier to reproduce with larger values of queueSize, though.

@mingchuno

@rcausey Can you provide more details on your use case? Are you using Lambda to do multipart uploads to S3 too? What does queueSize mean? Are you connecting SQS / Amazon MQ to Lambda?

@rcausey

rcausey commented Sep 29, 2021

> @rcausey Can you provide more details on your use case? Are you using Lambda to do multipart uploads to S3 too? What does queueSize mean? Are you connecting SQS / Amazon MQ to Lambda?

Yes, we use @aws-sdk/lib-storage to do multipart uploads to S3 on Lambda.

queueSize is a parameter that controls concurrency in @aws-sdk/lib-storage uploads; i.e.

const parallelUploads3 = new Upload({
  client: new S3Client({}), // or new S3({})
  tags: [], // optional tags
  queueSize: 4, // optional concurrency configuration
  partSize: 1024 * 1024 * 5, // optional size of each part (5 MB is the S3 minimum)
  leavePartsOnError: false, // optional manually handle dropped parts
  params: target,
});

We first ran into this issue when we switched from v2 to the v3 SDK. We couldn't reproduce it when we set queueSize to 1, so this is how we "fixed" it, since this was sufficient for our needs. But we recently encountered the issue in our production environment even with queueSize set to 1, so it seems like it can still happen (albeit much less frequently) without concurrent uploads.
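
For completeness, the queueSize workaround described above is just the same Upload construction with concurrency turned off; a minimal sketch, reusing the target params from the snippet above:

const serializedUpload = new Upload({
  client: new S3Client({}),
  queueSize: 1, // upload parts one at a time instead of concurrently
  partSize: 1024 * 1024 * 5, // 5 MB parts (the minimum for all but the last part)
  params: target,
});
await serializedUpload.done();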

@mingchuno

mingchuno commented Sep 30, 2021

@rcausey Can we isolate a simple reproducer in a cloud environment that reproduces the issue with high confidence?

@vudh1 vudh1 self-assigned this Nov 8, 2021
@evgeny-roubinchtein

We are seeing this also. In our case, we have a Lambda handling a message from an SQS queue, and sometimes the handler dies with the error the OP described. I am pasting a stack trace from the CloudWatch log below, but it looks like the node binary our code runs under is stripped (not that it shouldn't be, but it doesn't help debug the issue).

How can we assist you in getting to the bottom of this? For example, can you guide us in building a custom Node.js runtime in which the node binary isn't stripped but which is otherwise as close as we can get to the runtime you are using? It might be helpful to have a better stack trace, I am guessing...

Here's the stack, such as it is, from a CloudWatch log for a run of the Lambda where this issue hit:

/var/lang/bin/node[8]: ../src/node_http_parser.cc:510:static void node::{anonymous}::Parser::Execute(const v8::FunctionCallbackInfo<v8::Value>&): Assertion `parser->current_buffer_.IsEmpty()' failed.
1: 0x55913ada7673 node::Abort() [/var/lang/bin/node]
2: 0x55913ada7702  [/var/lang/bin/node]
3: 0x55913adc4750  [/var/lang/bin/node]
4: 0x55913afe3619  [/var/lang/bin/node]
5: 0x55913afe53a1 v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [/var/lang/bin/node]
6: 0x55913b91a7d9  [/var/lang/bin/node]
[...]
RequestId: <id_elided> Error: Runtime exited with error: signal: aborted (core dumped)
Runtime.ExitError

@upMKuhn

upMKuhn commented Jan 12, 2022

I encountered the error as well. I found a good hint that it can be avoided by setting the chunk size; it said the error happens when each uploaded part is exactly 5 MB.
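
If that hint is right and the failure correlates with parts of exactly 5 MB, the obvious experiment is to pass an explicit partSize slightly above the minimum. A minimal sketch, assuming s3Client and inputStream are defined as in the earlier snippets (the 6 MiB value is just an example, not a verified fix):

const upload = new Upload({
  client: s3Client,
  partSize: 6 * 1024 * 1024, // anything >= 5 MiB is accepted; avoids exactly-5-MB parts
  params: {
    Bucket: 'some-s3-bucket',
    Key: 'path/to/object',
    Body: inputStream,
  },
})
await upload.done()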

@upMKuhn

upMKuhn commented Jan 12, 2022

@alexzuo2013

alexzuo2013 commented Jan 12, 2022 via email

@upMKuhn

upMKuhn commented Jan 12, 2022

I think the assert is triggered when a multipart upload is sent with at least 2 chunks and the body happens to be empty for some unknown reason. The scenario below reproduces the error every time.
Weirdly enough, if I change the chunk size to 5.5 * 1024 * 1024 there is no error, but the log message indicates it is uploading the file twice: the file to upload is 7 MB, and progress.loaded is 14 MB after it finishes.

// fs here is the promise-based API from 'node:fs/promises'
const file = await fs.open("myFile.json", 'r'); // file is 7 MB
const stream = file.createReadStream({ encoding: null }); // get byte stream
await this.waitForReadable(stream);

await new Upload({
  client: this.s3Client,
  leavePartsOnError: false,
  tags: args.tags,
  params: {
    Key: this.fullPath(args.fileName),
    Bucket: this.bucketName,
    Body: stream,
  },
}).done();
/Users/martinkuhn/.nvm/versions/node/v16.13.1/bin/node[76192]: ../src/node_http_parser.cc:517:static void node::(anonymous namespace)::Parser::Execute(const FunctionCallbackInfo<v8::Value> &): Assertion `parser->current_buffer_.IsEmpty()' failed.
 1: 0x107274815 node::Abort() (.cold.1) [/Users/martinkuhn/.nvm/versions/node/v16.13.1/bin/node]
 2: 0x105f73aa9 node::Abort() [/Users/martinkuhn/.nvm/versions/node/v16.13.1/bin/node]
 3: 0x105f738e1 node::Assert(node::AssertionInfo const&) [/Users/martinkuhn/.nvm/versions/node/v16.13.1/bin/node]
 4: 0x105f901ff node::(anonymous namespace)::Parser::Execute(v8::FunctionCallbackInfo<v8::Value> const&) [/Users/martinkuhn/.nvm/versions/node/v16.13.1/bin/node]
 5: 0x10615d239 v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo) [/Users/martinkuhn/.nvm/versions/node/v16.13.1/bin/node]
 6: 0x10615cd06 v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) [/Users/martinkuhn/.nvm/versions/node/v16.13.1/bin/node]
 7: 0x10615c47f v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [/Users/martinkuhn/.nvm/versions/node/v16.13.1/bin/node]
 8: 0x1069cd399 Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit [/Users/martinkuhn/.nvm/versions/node/v16.13.1/bin/node]
^CWaiting for the debugger to disconnect...
Waiting for the debugger to disconnect...

@upMKuhn

upMKuhn commented Jan 12, 2022

My bad, the error still happens; it still seems to be random :O What I noticed while debugging the library is that an empty PUT request is sent when the stream has ended, although it should never be triggered? I will debug a bit more tonight and double-check.

https://github.com/aws/aws-sdk-js-v3/blob/main/lib/lib-storage/src/Upload.ts line 136

        // Use put instead of multi-part for one chunk uploads.
        if (dataPart.partNumber === 1 && dataPart.lastPart) {
          return await this.__uploadUsingPut(dataPart);
        }
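
To make the suspicion above concrete: if the source stream ends without ever emitting data, the chunker presumably yields a single empty part (partNumber 1, lastPart true), so the branch quoted above sends a zero-byte PUT. A minimal way to provoke that situation deliberately, assuming an s3Client as in the earlier snippets, is to hand Upload a PassThrough that is ended immediately:

import { PassThrough } from 'stream'

const emptyBody = new PassThrough()
emptyBody.end() // the stream ends without emitting any data

await new Upload({
  client: s3Client,
  params: { Bucket: 'some-s3-bucket', Key: 'empty-object', Body: emptyBody },
}).done() // takes the single-part PUT path with a zero-byte body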

@upMKuhn

upMKuhn commented Jan 12, 2022

Maybe it's related. I had called the done function twice, as in `new Upload().done().done()`. This caused an empty stream to be sent, which effectively deleted the uploaded file again and is perhaps a potential trigger for the assert error.

@upMKuhn

upMKuhn commented Jan 13, 2022

Yep, I haven't encountered any error since I made sure .done() is never called twice for the same upload object.
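
A simple guard against accidentally calling .done() twice is to cache the promise returned by the first call, so any later call just reuses it instead of driving the upload machinery again. A minimal sketch; the wrapper name is made up for illustration:

import { Upload } from '@aws-sdk/lib-storage'

class SingleUpload {
  private donePromise?: ReturnType<Upload['done']>

  constructor(private readonly upload: Upload) {}

  // Subsequent calls return the same in-flight promise instead of re-running the upload.
  done() {
    if (!this.donePromise) {
      this.donePromise = this.upload.done()
    }
    return this.donePromise
  }
}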

@vudh1 vudh1 added bug This issue is a bug. and removed guidance General information and guidance, answers to FAQs, or recommended best practices/resources. needs-triage This issue or PR still needs to be triaged. labels Feb 21, 2022
@vudh1 vudh1 removed their assignment Feb 22, 2022
@libre-man

I'm still encountering this issue. We've worked around it for now by creating a custom HttpHandler that uses undici. Would AWS be interested in such a handler as a workaround for this issue, or is there still active research going into why this issue is happening?
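
For anyone curious what that kind of workaround looks like, here is a rough sketch of an undici-based request handler, not the exact handler mentioned above. It assumes the HttpRequest/HttpResponse shapes from @aws-sdk/protocol-http and the buildQueryString helper from @aws-sdk/querystring-builder; depending on the SDK version, the request handler contract may require additional methods, and multi-value headers are not handled here.

import { HttpRequest, HttpResponse } from '@aws-sdk/protocol-http'
import { buildQueryString } from '@aws-sdk/querystring-builder'
import { request } from 'undici'

class UndiciHttpHandler {
  async handle(req: HttpRequest): Promise<{ response: HttpResponse }> {
    const query = req.query ? `?${buildQueryString(req.query)}` : ''
    const url = `${req.protocol}//${req.hostname}${req.port ? `:${req.port}` : ''}${req.path}${query}`

    // Hand the request off to undici instead of node:http's parser.
    const { statusCode, headers, body } = await request(url, {
      method: req.method as any,
      headers: req.headers,
      body: req.body,
    })

    return {
      response: new HttpResponse({
        statusCode,
        headers: headers as Record<string, string>, // simplification: array header values not merged
        body, // undici returns a readable stream the SDK can consume
      }),
    }
  }

  destroy(): void {
    // nothing persistent to clean up in this sketch
  }
}

// Usage (hypothetical): new S3Client({ requestHandler: new UndiciHttpHandler() })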

@vsosevic

I can still reproduce this bug using the '@aws-sdk/lib-storage' library, but it does not reproduce with version 2 of the AWS SDK.
Steps to reproduce: create and upload ~1500 zero-length files.
In that case it crashes randomly: on the 50th file, the 250th file, the 853rd file, etc.
So this bug is not stable.

If you put some data in each file (8 KB-5 MB), the bug isn't reproducible.

Surprisingly enough, it works stably without any issues on AWS SDK v2.
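
A loop along these lines should be enough to chase that reproduction; this is a sketch of the kind of run described above, not a verified reproducer, and it assumes an s3Client as in the earlier snippets (bucket name and key prefix are placeholders):

import { Upload } from '@aws-sdk/lib-storage'

// Upload ~1500 zero-length objects one after another; per the report above,
// with @aws-sdk/lib-storage this aborts randomly partway through.
for (let i = 0; i < 1500; i++) {
  await new Upload({
    client: s3Client,
    params: {
      Bucket: 'some-s3-bucket',
      Key: `repro/empty-${i}.bin`,
      Body: Buffer.alloc(0), // zero-length body
    },
  }).done()
}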

@pmiszczak-cksource

Node.js v18.6.0 has a fix for nodejs/node#39671 (nodejs/node@491c7619c4).

@RanVaknin RanVaknin added the p3 This is a minor priority issue label Feb 9, 2023
georgeblahblah added a commit to guardian/frontend that referenced this issue Apr 28, 2023
A bug in Node 16 means that `@guardian/actions-riff-raff` can fail to upload assets to S3. See aws/aws-sdk-js-v3#2843 for more information
georgeblahblah added a commit to guardian/frontend that referenced this issue Jul 21, 2023
A bug in Node 16 means that `@guardian/actions-riff-raff` can fail to upload assets to S3. See aws/aws-sdk-js-v3#2843 for more information