Exceeding 4MB limit #49

Open
jonchase opened this issue Jun 23, 2017 · 7 comments
@jonchase

We're running 1.4.5.

We're frequently getting errors for exceeding the 4MB PutRecordBatch write limit to Firehose. This causes the Lambda to retry the same data indefinitely, so processing on the Kinesis shard backs up until the data ages out.

We solved this by editing constants.js:

// firehose max PutRecordBatch size 4MB
FIREHOSE_MAX_BATCH_BYTES = 4 * 1024 * 1024;

Is now:

// firehose max PutRecordBatch size 2MB
FIREHOSE_MAX_BATCH_BYTES = 2 * 1024 * 1024;

And for good measure we also did the following:

FIREHOSE_MAX_BATCH_COUNT = 500;

Is now:

FIREHOSE_MAX_BATCH_COUNT = 250;

The above changes immediately solved our issue. Looking through the code, I'm not sure how a batch of > 4MB is able to get through, but it appears that was the case for us.
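
One way a batch can slip past the limit (an illustrative sketch only, not the project's actual batching code) is a loop that checks the accumulated size only after a record has already been appended, so the final record pushes the request over 4MB:

// Illustrative sketch only - not the actual lambda-streams-to-firehose code.
// The limit is checked only after a record has been appended, so the batch
// that gets flushed (and sent) can already be larger than 4MB.
var FIREHOSE_MAX_BATCH_BYTES = 4 * 1024 * 1024;

function batchRecords(records) {
    var batches = [];
    var current = [];
    var currentBytes = 0;

    records.forEach(function(record) {
        current.push(record);
        currentBytes += Buffer.byteLength(record.Data);

        if (currentBytes >= FIREHOSE_MAX_BATCH_BYTES) {
            // Too late: currentBytes may already exceed the limit here.
            batches.push(current);
            current = [];
            currentBytes = 0;
        }
    });

    if (current.length > 0) {
        batches.push(current);
    }
    return batches;
}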

@benoittgt
Contributor

Thanks for catching this. Do you think it would make sense to make that change the default in the Lambda?

@jonchase
Author

Changing the value from 4MB to 2MB resolved our issue. I'm not sure lowering the value is the right way to fix it, though - it seems that some condition was allowing a >4MB payload to be submitted, which sounds like the root cause to identify and fix.

IanMeyers self-assigned this Aug 2, 2017
@ysamlan

ysamlan commented Feb 4, 2019

I stumbled on this during some Google spelunking. The AWS docs are a bit vague here, but while the 1MB per-record limit on Firehose applies to the raw record data before encoding, I think the 4MB total request size limit might apply to the full (encoded) payload. The SDK base64-encodes and JSON-wraps the data, so you probably need to pad this limit by about 33% for the base64, plus whatever the surrounding JSON works out to - someone at Amazon would have to confirm that, though.
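
If that theory holds, a rough estimate of the encoded request size would look something like the following sketch. It is written under that (unconfirmed) assumption, and the per-record JSON overhead value is a guess rather than an AWS figure:

// Sketch: estimate the base64/JSON-encoded size of a PutRecordBatch request,
// assuming the 4MB limit applies to the encoded payload (unconfirmed above).
var PER_RECORD_JSON_OVERHEAD = 16; // assumed bytes for the {"Data":"..."} wrapper

function estimateEncodedRequestBytes(records) {
    return records.reduce(function(total, record) {
        var rawBytes = Buffer.byteLength(record.Data);
        var base64Bytes = Math.ceil(rawBytes / 3) * 4; // base64 expands 3 bytes to 4 chars
        return total + base64Bytes + PER_RECORD_JSON_OVERHEAD;
    }, 0);
}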

@IanMeyers
Contributor

IanMeyers commented Feb 4, 2019

Looking into this - are you using any transformers in your installation?

@IanMeyers
Contributor

I expect the issue is that the request limit is actually 4MB, not the 4MiB (4 * 1024 * 1024) configured in the system. 1.5.2 and 374158a address this.

@ysamlan

ysamlan commented Feb 6, 2019

Yup - I asked AWS support and they actually said the base64/JSON overhead shouldn't count. That said, the docs are a bit fuzzy on MiB vs MB depending on which part you look at, so 4 * 1000 * 1000 is probably a safer bet than 4 * 1024 * 1024.
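
Following that suggestion, a conservative constants.js configuration might look like this (values are only an example based on this thread, not the project's shipped defaults):

// firehose max PutRecordBatch size, treated as 4 decimal MB rather than 4 MiB
FIREHOSE_MAX_BATCH_BYTES = 4 * 1000 * 1000;

// firehose max PutRecordBatch record count
FIREHOSE_MAX_BATCH_COUNT = 500;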

mbaran90 added a commit to mbaran90/lambda-streams-to-firehose that referenced this issue Jun 12, 2020
* Changed the FIREHOSE_MAX_BATCH_BYTES check to use the size the batch would reach with the next record, instead of only the current record size.
* The current code only works if all records are the same size; otherwise you hit the "Exceeding 4MB limit" issue. awslabs#49
@mbaran90
Contributor

Noticed that the FIREHOSE_MAX_BATCH_BYTES check is based on the current record size rather than on the size the batch would reach once the next record is added.
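
A sketch of that approach (not the exact code in the referenced commit): flush the current batch before adding any record that would push it past either limit.

// Sketch of the fix described above - flush before a record would overflow
// the batch, using the constants from constants.js.
function batchRecords(records) {
    var batches = [];
    var current = [];
    var currentBytes = 0;

    records.forEach(function(record) {
        var nextBytes = Buffer.byteLength(record.Data);

        // If adding this record would exceed the byte or count limit,
        // flush the current batch first and start a new one.
        if (current.length > 0 &&
            (currentBytes + nextBytes > FIREHOSE_MAX_BATCH_BYTES ||
             current.length + 1 > FIREHOSE_MAX_BATCH_COUNT)) {
            batches.push(current);
            current = [];
            currentBytes = 0;
        }

        current.push(record);
        currentBytes += nextBytes;
    });

    if (current.length > 0) {
        batches.push(current);
    }
    return batches;
}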
