
process s3 event messages from SQS #215

Open
wants to merge 3 commits into base: master
Conversation

@cjxkb210 commented Aug 7, 2020

Issue #, if available:

Description of changes:

Added the ability to consume S3 event messages from an SQS queue. This facilitates having multiple consumers of S3 event notifications for the same prefix. You could, for example, subscribe an SNS topic to the S3 event for a given prefix and then have multiple SQS queues subscribed to that topic. This expands the usage options for multiple, varying consumers of the same S3 prefix event.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
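
Not the PR's actual diff, but a minimal sketch of the kind of event unwrapping this change implies, assuming raw message delivery so each SQS record body is the original S3 notification JSON; the handler shape and logging are illustrative only:

```js
// Illustrative sketch only, not the code in this PR. Assumes SNS raw message
// delivery, so each SQS record body is the original S3 notification JSON.
exports.handler = async function (event, context) {
    for (const record of event.Records || []) {
        if (record.eventSource === 'aws:sqs') {
            // S3 event arriving indirectly: unwrap it from the SQS message body.
            const body = JSON.parse(record.body);
            for (const s3Record of body.Records || []) {
                console.log('S3 object via SQS:', s3Record.s3.bucket.name, s3Record.s3.object.key);
                // hand off to the existing S3 event processing path here
            }
        } else if (record.eventSource === 'aws:s3') {
            // S3 event arriving directly, as before.
            console.log('S3 object direct:', record.s3.bucket.name, record.s3.object.key);
        }
    }
};
```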
@cjxkb210 (Author) commented Aug 7, 2020

Hi,

We'd like to expose the option of having multiple consumers of S3 events for a given prefix. AWS only allows one consumer of S3 events for a given prefix; this consumer can be a Lambda, an SQS queue, or an SNS topic.

If the RedshiftLambdaLoader could consume S3 events indirectly via an SQS queue, this would open up the option of using an SNS topic as the subscriber of S3 events. In this case we could subscribe multiple queues; for example, if you wanted to load the data into Redshift via the RedshiftLambdaLoader but independently consume from a separate queue with a process that ingests the data into an index.

I've created the attached PR for review. Please note that in my example I've also allowed the processing of multiple SQS messages in the same request. In my testing I enabled raw message delivery on an SNS topic, to which I subscribed an SQS queue that the RedshiftLambdaLoader could consume from. This minimised the impact of the changes.
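
For reference, a sketch of that fan-out wiring using the AWS SDK for JavaScript (v2); the topic and queue ARNs are placeholders, and this is not code from the PR:

```js
// Sketch of the SNS -> SQS fan-out wiring with raw message delivery enabled.
// Placeholder ARNs; not part of this PR.
const AWS = require('aws-sdk');
const sns = new AWS.SNS();

async function subscribeQueueToTopic(topicArn, queueArn) {
    // Subscribe the queue to the topic that receives the S3 event notifications...
    const sub = await sns.subscribe({
        TopicArn: topicArn,
        Protocol: 'sqs',
        Endpoint: queueArn
    }).promise();

    // ...and enable raw message delivery so the SQS body is the original S3
    // event JSON rather than an SNS envelope.
    await sns.setSubscriptionAttributes({
        SubscriptionArn: sub.SubscriptionArn,
        AttributeName: 'RawMessageDelivery',
        AttributeValue: 'true'
    }).promise();
}
```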

Please note, I'm not fully convinced that processing multiple SQS messages in the same request is safe with the batch locking logic. I believe it may handle this, although perhaps it was not intended to do so?

Could you review and let me know your thoughts on exposing such a feature? Would this approach cause you any concerns?

Thanks,
Chris.

@cjxkb210 (Author) commented Aug 7, 2020

Hi @IanMeyers, would you mind reviewing my suggestion above?

Much appreciated,
Chris.

@IanMeyers self-assigned this Aug 7, 2020
@IanMeyers (Contributor):

Sorry, I have been on holiday. Will review Monday the 10th.

@cjxkb210 (Author) commented Aug 7, 2020

Thank you, have a great weekend.

@cjxkb210 (Author) commented Aug 11, 2020

After testing, I confirmed my initial concern that processing the SQS messages in a tight loop will cause problems.

Failing further processing to Batch 7a2afa1f-7dc5-45e8-b0ad-677ac59738d1 which may be stuck in 'locked' state. If so, unlock the back using node unlockBatch.js <batch ID>, delete the processed file marker with node processedFiles.js -d <filename>, and then re-store the file in S3"

I'm still digging into the cause; I think a Redshift maintenance window might be in play. Perhaps the batch was left in a locked state because the Lambda timed out and did not mark the batch as failed.

@IanMeyers (Contributor):

Lambda timeouts can indeed result in locked batches.

@cjxkb210 (Author):

Yeah, I've seen logs that look like this:
Remaining duration of -90597 insufficient to load cluster

It's possible I simply got unlucky and the function did not manage to fail the batch, leaving it in a locked state.

I've had my changes running via SQS, loading batches, for 12 hours now. The data I'm testing with gets placed onto S3 every 15 minutes.

@cjxkb210 (Author):

I'll be testing this over a longer time period, but this is ready for review now. I've increased my timeouts a little and added failed-batch re-processing with SQS delayed delivery for the Redshift maintenance windows.
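
For illustration, delayed re-delivery along these lines is the general idea; the queue URL, payload, and delay value are placeholders rather than the exact values used in this PR:

```js
// Minimal sketch of re-queueing a failed batch with a delivery delay so the
// loader retries after a Redshift maintenance window. Placeholder values only.
const AWS = require('aws-sdk');
const sqs = new AWS.SQS();

async function requeueFailedBatch(queueUrl, s3Event) {
    await sqs.sendMessage({
        QueueUrl: queueUrl,
        MessageBody: JSON.stringify(s3Event),
        DelaySeconds: 900 // SQS caps per-message delay at 15 minutes
    }).promise();
}
```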

@cjxkb210 (Author) commented Aug 12, 2020

Interestingly, I noticed this in the logs (I don't think it is in any way related to this PR):

2020-08-12T13:48:12.102+01:00
START RequestId: 1a5aa337-9f84-5f13-9486-f49a78bcbb46 Version: $LATEST

2020-08-12T13:48:12.115+01:00
info: Failing Batch 48e2e469-2fbd-4e98-99c6-9b028831a612 due to "Remaining duration of -89145 insufficient to load cluster"

2020-08-12T13:48:12.458+01:00
END RequestId: 1a5aa337-9f84-5f13-9486-f49a78bcbb46
REPORT RequestId: 1a5aa337-9f84-5f13-9486-f49a78bcbb46 Duration: 355.01 ms Billed Duration: 400 ms Memory Size: 256 MB Max Memory Used: 103 MB

I confirmed that the "Remaining duration of -89145 insufficient to load cluster" message did pertain to the same request, 1a5aa337-9f84-5f13-9486-f49a78bcbb46, by checking the batches table.

This means the context.getRemainingTimeInMillis() function returned -89145 just 13 ms after request invocation, when my Lambda had been configured with a whopping 2 min 30 s timeout. The number makes no sense, and I'm not convinced getRemainingTimeInMillis is behaving nicely. I've upgraded my Node.js runtime from 10.x to 12.x in case there was an AWS defect; I couldn't find a record of one, but I'll see if the upgrade eliminates the strange behavior around context.getRemainingTimeInMillis().
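
Not from the PR, but a sketch of the kind of logging and guard that could help capture the anomaly for the support ticket; the 60-second threshold and handler shape are illustrative assumptions:

```js
// Illustrative only: record the raw remaining-time value at entry so the
// anomaly can be evidenced, and treat negative readings as suspect.
exports.handler = async function (event, context) {
    const remainingMs = context.getRemainingTimeInMillis();
    console.log('remaining ms at entry:', remainingMs, 'request:', context.awsRequestId);

    if (remainingMs < 0) {
        // A negative value 13 ms into a 2 min 30 s timeout should be impossible;
        // log it rather than silently failing the batch.
        console.warn('getRemainingTimeInMillis returned a negative value');
    } else if (remainingMs < 60 * 1000) {
        console.warn('insufficient time remaining to load the cluster');
    }
    // ...continue with normal processing
};
```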

@IanMeyers (Contributor):

Hmm, I can't reproduce this. Also, if this consistently happened, then no one's loads would ever succeed. I think there's something else going on here, but I don't see anything obvious in the code that could cause it: this is a single instance of the function being used, and while it accesses context via a parent reference, that shouldn't cause any consistency issues. Let me know what happens with 12.x?
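
For context, a purely hypothetical illustration of the stale-context capture being ruled out here; it is not a claim about the loader's code, just the one pattern that would make getRemainingTimeInMillis() go negative on a warm container:

```js
// Hypothetical failure mode, not the loader's code: a context object captured
// at module scope survives across warm invocations, so its deadline is stale.
let savedContext;

exports.handler = async function (event, context) {
    if (!savedContext) {
        savedContext = context; // captured on the first invocation only
    }
    // On a later warm invocation, the first invocation's deadline has long
    // passed, so this can return a large negative number (e.g. -89145).
    console.log('remaining:', savedContext.getRemainingTimeInMillis());
};
```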

@cjxkb210 (Author):

Yeah, I've just seen the same issue with 12.x. It's infrequent; I've seen it 3 times in 48 batches.

@IanMeyers (Contributor):

Can you please raise a support ticket?

@cjxkb210 (Author):

Yes, I'll raise a ticket for it. I'll spend a little more time this afternoon to see if I can spot anything that might explain it.

@cjxkb210 (Author):

@IanMeyers this has gone through rounds of testing and is looking good. What are your thoughts on rolling this in?

Kind Regards, Chris.

@IanMeyers (Contributor):

Thanks for the info. In general I'm supportive, but I need to do a code review. I will book out some time tomorrow (18/9) to do so.

@cjxkb210 (Author):

Thank you @IanMeyers

[REQ-2126] - consume the message for bad events, log, and move on.
[REQ-2126] - safely consume and ignore AWS test events, which, despite being test events, do not honour raw message delivery settings.
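
The test-event handling referenced in the commits above could look something like this sketch; the exact checks in the PR may differ:

```js
// Sketch of filtering out S3 test notifications; the commit's actual checks may differ.
function isS3TestEvent(messageBody) {
    try {
        const parsed = JSON.parse(messageBody);
        // S3 emits {"Service":"Amazon S3","Event":"s3:TestEvent",...} when a
        // notification configuration is created; it carries no Records array.
        return parsed && parsed.Event === 's3:TestEvent';
    } catch (err) {
        return false; // not JSON, so not a test event
    }
}
```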