
Small amount of event not streamed without errors #37

Closed
benoittgt opened this issue Feb 14, 2017 · 5 comments
@benoittgt
Contributor

Hello

In my pipeline: the data comes in through a Kinesis stream, I don't use the DynamoDB mapping (only the default), and Firehose writes the data into an S3 file.

Yesterday two events were not recorded into S3. I dug into the Kinesis stream and found my two events:

#<struct Aws::Kinesis::Types::Record
  sequence_number="49*3138",
  approximate_arrival_timestamp=2017-02-13 17:18:39 +0100,
  data="3444946|2017-02-13 16:18:39.323|***",
  partition_key="partitionkey">,
#<struct Aws::Kinesis::Types::Record
  sequence_number="49*28130",
  approximate_arrival_timestamp=2017-02-13 17:18:39 +0100,
  data="3444947|2017-02-13 16:18:39.364|***",
  partition_key="partitionkey">,

In S3 I have a file with lots of events. I have the one before and the one after, but not these two:

3444945|2017-02-13 16:18:30.100|***
3444948|2017-02-13 16:18:41.832|**

I dug into CloudWatch to find errors in the Lambda, but I don't see any issues.

Where should I look to understand why the records were not inserted into S3? 😞

@benoittgt benoittgt changed the title Missing records but no errors. Small amount of event to streamed without errors Feb 14, 2017
@benoittgt benoittgt changed the title Small amount of event to streamed without errors Small amount of event not streamed without errors Feb 20, 2017
@benoittgt
Contributor Author

Closing for now because it seems related to Firehose itself.

@benoittgt
Contributor Author

There is a critical issue with putRecordBatch. We don't handle the response correctly...

As mentioned in the docs: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Firehose.html#putRecordBatch-property

The PutRecordBatch response includes a count of failed records, FailedPutCount, and an array of responses, RequestResponses. Each entry in the RequestResponses array provides additional information about the processed record, and directly correlates with a record in the request array using the same ordering, from the top to the bottom. The response array always includes the same number of records as the request array. RequestResponses includes both successfully and unsuccessfully processed records. Firehose attempts to process all records in each PutRecordBatch request. A single record failure does not stop the processing of subsequent records.
A successfully processed record includes a RecordId value, which is unique for the record. An unsuccessfully processed record includes ErrorCode and ErrorMessage values. ErrorCode reflects the type of error, and is one of the following values: ServiceUnavailable or InternalFailure. ErrorMessage provides more detailed information about the error.

If there is an internal server error or a timeout, the write might have completed or it might have failed. If FailedPutCount is greater than 0, retry the request, resending only those records that might have failed processing. This minimizes the possible duplicate records and also reduces the total bytes sent (and corresponding charges). We recommend that you handle any duplicates at the destination.

If PutRecordBatch throws ServiceUnavailableException, back off and retry. If the exception persists, it is possible that the throughput limits have been exceeded for the delivery stream.

Data records sent to Firehose are stored for 24 hours from the time they are added to a delivery stream as it attempts to send the records to the destination. If the destination is unreachable for more than 24 hours, the data is no longer available.

This happens to me every week on fewer than 10 of my records. We have to retry the records that come back with an error code.
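
For illustration, here is a rough sketch (not the project's actual code) of examining the PutRecordBatch response and resending only the failed records with the AWS SDK for JavaScript; the delivery stream name, record list and retry budget are placeholders:

```js
// Rough sketch: resend only the records that PutRecordBatch reports as failed.
// deliveryStreamName / records / attemptsLeft are illustrative placeholders.
var AWS = require('aws-sdk');
var firehose = new AWS.Firehose();

function putWithRetry(deliveryStreamName, records, attemptsLeft, callback) {
    firehose.putRecordBatch({
        DeliveryStreamName: deliveryStreamName,
        Records: records
    }, function(err, data) {
        if (err) {
            return callback(err); // e.g. ServiceUnavailableException: back off and retry
        }
        if (data.FailedPutCount === 0) {
            return callback(null, data);
        }
        // RequestResponses has the same ordering as the request: entries with an
        // ErrorCode mark records that must be resent.
        var failed = records.filter(function(record, i) {
            return data.RequestResponses[i].ErrorCode;
        });
        if (attemptsLeft <= 0) {
            return callback(new Error(failed.length + ' records still failed after retries'));
        }
        putWithRetry(deliveryStreamName, failed, attemptsLeft - 1, callback);
    });
}
```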

@benoittgt benoittgt reopened this Feb 23, 2017
@benoittgt
Contributor Author

Also, from AWS support:

Please note that PutRecordBatch is a batch record call: some records may succeed, some may fail, and the call does not throw exceptions if individual records within the batch fail.
So the client code must examine the returned result/response to determine if records have failed and retry them. The response/result returned by this method will contain ErrorCode and ErrorMessage, and based on this you have to apply your retry logic in the code.

Also the sample code link ( https://github.com/awslabs/lambda-streams-to-firehose/blob/master/index.js#L491 ) you have mentioned earlier doesn't have all this logic implemented. You can get an Exception or Error in case of ServiceUnavailableException due to the throughput limit being exceeded, and in that case you should also implement back-off jitter retry logic.

[+] More Details PutRecordBatch: http://docs.aws.amazon.com/firehose/latest/APIReference/API_PutRecordBatch.html
[+] Back-off jitter Retry Logic: http://docs.aws.amazon.com/general/latest/gr/api-retries.html
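
For reference, a minimal sketch of the exponential back-off with full jitter described in that retry guide (the base delay and cap values below are just examples, not from the project):

```js
// Minimal sketch of exponential back-off with full jitter: wait a random
// delay between 0 and an exponentially growing cap before retrying.
function backoffDelay(attempt, baseMs, capMs) {
    var ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
    return Math.floor(Math.random() * ceiling);
}

// Example: schedule a hypothetical resend() after the computed delay.
function retryWithJitter(attempt, resend) {
    setTimeout(resend, backoffDelay(attempt, 100, 20000));
}
```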

@benoittgt
Contributor Author

Fixed for me with #43 . Thanks a lot @DaichiUeura. Waiting for the PR to be merged to close the issue.
