
Small amount of event not streamed without errors #37

Closed
benoittgt opened this issue Feb 14, 2017 · 5 comments
@benoittgt
Contributor

Hello

In my pipeline: the data comes in through a Kinesis stream, I don't use the DynamoDB mapping (only the default), and Firehose writes the data into an S3 file.

Yesterday two events were not recorded into S3. I dug into the Kinesis stream and found my two events:

#<struct Aws::Kinesis::Types::Record
  sequence_number="49*3138",
  approximate_arrival_timestamp=2017-02-13 17:18:39 +0100,
  data="3444946|2017-02-13 16:18:39.323|***",
  partition_key="partitionkey">,
#<struct Aws::Kinesis::Types::Record
  sequence_number="49*28130",
  approximate_arrival_timestamp=2017-02-13 17:18:39 +0100,
  data="3444947|2017-02-13 16:18:39.364|***",
  partition_key="partitionkey">,

In S3 I have a file with lots of events. I have the one before and the one after, but not these two:

3444945|2017-02-13 16:18:30.100|***
3444948|2017-02-13 16:18:41.832|**

I dug into CloudWatch to find errors in the Lambda, but I don't see any issues.

Where should I look to understand why the records were not inserted into S3? 😞

@benoittgt benoittgt changed the title Missing records but no errors. Small amount of event to streamed without errors Feb 14, 2017
@benoittgt benoittgt changed the title Small amount of event to streamed without errors Small amount of event not streamed without errors Feb 20, 2017
@benoittgt
Contributor Author

Closing for now because it seems related to Firehose itself.

@benoittgt
Contributor Author

There is a critical issue with putRecordBatch. We don't handle the response correctly...

As mentioned in the docs: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Firehose.html#putRecordBatch-property

The PutRecordBatch response includes a count of failed records, FailedPutCount, and an array of responses, RequestResponses. Each entry in the RequestResponses array provides additional information about the processed record, and directly correlates with a record in the request array using the same ordering, from the top to the bottom. The response array always includes the same number of records as the request array. RequestResponses includes both successfully and unsuccessfully processed records. Firehose attempts to process all records in each PutRecordBatch request. A single record failure does not stop the processing of subsequent records.
A successfully processed record includes a RecordId value, which is unique for the record. An unsuccessfully processed record includes ErrorCode and ErrorMessage values. ErrorCode reflects the type of error, and is one of the following values: ServiceUnavailable or InternalFailure. ErrorMessage provides more detailed information about the error.

If there is an internal server error or a timeout, the write might have completed or it might have failed. If FailedPutCount is greater than 0, retry the request, resending only those records that might have failed processing. This minimizes the possible duplicate records and also reduces the total bytes sent (and corresponding charges). We recommend that you handle any duplicates at the destination.

If PutRecordBatch throws ServiceUnavailableException, back off and retry. If the exception persists, it is possible that the throughput limits have been exceeded for the delivery stream.

Data records sent to Firehose are stored for 24 hours from the time they are added to a delivery stream as it attempts to send the records to the destination. If the destination is unreachable for more than 24 hours, the data is no longer available.

This happens to me every week on fewer than 10 of my records. We have to retry the records that come back with an error code.
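
For illustration, here is a rough sketch (not the project's actual code) of examining the PutRecordBatch response and resending only the failed records with the AWS SDK for JavaScript; the delivery stream name, record list and retry budget are placeholders:

```js
// Rough sketch: resend only the records that PutRecordBatch reports as failed.
// deliveryStreamName / records / attemptsLeft are illustrative placeholders.
var AWS = require('aws-sdk');
var firehose = new AWS.Firehose();

function putWithRetry(deliveryStreamName, records, attemptsLeft, callback) {
    firehose.putRecordBatch({
        DeliveryStreamName: deliveryStreamName,
        Records: records
    }, function(err, data) {
        if (err) {
            return callback(err); // e.g. ServiceUnavailableException: back off and retry
        }
        if (data.FailedPutCount === 0) {
            return callback(null, data);
        }
        // RequestResponses has the same ordering as the request: entries with an
        // ErrorCode mark records that must be resent.
        var failed = records.filter(function(record, i) {
            return data.RequestResponses[i].ErrorCode;
        });
        if (attemptsLeft <= 0) {
            return callback(new Error(failed.length + ' records still failed after retries'));
        }
        putWithRetry(deliveryStreamName, failed, attemptsLeft - 1, callback);
    });
}
```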

@benoittgt benoittgt reopened this Feb 23, 2017
@benoittgt
Contributor Author

Also, from AWS support:

Please note that PutRecordBatch is a batch record call: some records may succeed, some may fail, and the call does not throw exceptions if individual records within the batch fail.
So the client code must examine the returned result/response to determine if records have failed and retry them. The response/result returned by this method will contain ErrorCode and ErrorMessage, and based on this you have to apply your retry logic in the code.

Also the sample code link ( https://github.com/awslabs/lambda-streams-to-firehose/blob/master/index.js#L491 ) you have mentioned earlier doesn't have all this logic implemented. You can get an Exception or Error in case of ServiceUnavailableException due to the throughput limit being exceeded, and in that case you should also implement back-off jitter retry logic.

[+] More Details PutRecordBatch: http://docs.aws.amazon.com/firehose/latest/APIReference/API_PutRecordBatch.html
[+] Back-off jitter Retry Logic: http://docs.aws.amazon.com/general/latest/gr/api-retries.html
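
For reference, a minimal sketch of the exponential back-off with full jitter described in that retry guide (the base delay and cap values below are just examples, not from the project):

```js
// Minimal sketch of exponential back-off with full jitter: wait a random
// delay between 0 and an exponentially growing cap before retrying.
function backoffDelay(attempt, baseMs, capMs) {
    var ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
    return Math.floor(Math.random() * ceiling);
}

// Example: schedule a hypothetical resend() after the computed delay.
function retryWithJitter(attempt, resend) {
    setTimeout(resend, backoffDelay(attempt, 100, 20000));
}
```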

@benoittgt
Contributor Author

Fixed for me with #43 . Thanks a lot @DaichiUeura. Waiting for the PR to be merged to close the issue.
