
Self-reflexive Lambda function execution by SQS queue trigger stops after 16 (sixteen) calls #3899

Closed
hannehomuth opened this issue Oct 16, 2023 · 5 comments
Labels
closing-soon This issue will automatically close in 4 days unless further comments are made. service-api This issue is caused by the service API, not the SDK implementation. sqs

Comments

@hannehomuth

hannehomuth commented Oct 16, 2023

Describe the bug

Short description:

When an SQS queue acts as a trigger for a Lambda function, and the Lambda function itself sends messages to the same SQS queue, the Lambda function will no longer be triggered after the sixteenth (16th) message with the same message body.

Use-Case where it was detected

Our use case is that we have a Lambda function that is started by a message from an SQS delay queue. The message contains an ID, which we check against a DynamoDB table. If the DynamoDB query returns that the entry with the ID from the SQS event is in state "OK", we do some work and quit the Lambda function. If the query returns that the entry with the provided ID is not in state "OK", we send a new message with the same ID in the message body to the same SQS queue. So basically the function re-triggers itself in a self-reflexive way.
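
For illustration only, a minimal sketch of that pattern could look like the following; the table name, key and attribute names, and the queue URL are placeholder assumptions, not our actual code.

import uuid
import boto3

dynamodb = boto3.resource('dynamodb')
sqs = boto3.client('sqs')

# Placeholder names: adjust the table, key/attribute names and queue URL to your environment.
TABLE_NAME = 'state-table'
QUEUE_URL = 'https://sqs.eu-central-1.amazonaws.com/<account-id>/sqsbugreport.fifo'


def lambda_handler(event, context):
    table = dynamodb.Table(TABLE_NAME)
    for record in event['Records']:      # the SQS trigger delivers a batch of records
        entry_id = record['body']        # the ID travels in the message body
        item = table.get_item(Key={'id': entry_id}).get('Item', {})
        if item.get('state') == 'OK':
            print(f'{entry_id} is OK, doing the actual work')
        else:
            # Not ready yet: re-enqueue the same ID on the same delay queue,
            # i.e. the function effectively re-triggers itself.
            sqs.send_message(
                QueueUrl=QUEUE_URL,
                MessageBody=entry_id,
                MessageGroupId='Test',
                MessageDeduplicationId=str(uuid.uuid4()),
            )
    return 'OK'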

Environment

We detected this issue in a Lambda running on Java 8 with SDK 1.12.566.
However, the behavior can also be reproduced in the Python runtime, so I chose Python for this bug ticket.

Expected Behavior

No matter how often we send a message to the SQS queue, the Lambda function should be triggered.

Current Behavior

After sixteen (16) messages, the Lambda function is no longer triggered. We can see one message "in flight" in the queue overview, but no Lambda is processing it.

Reproduction Steps

  • Create a SQS Queue

    • Type: FIFO
    • Name: sqsbugreport.fifo
    • Visibility timeout: 12 hours
    • Delivery delays: 1 second
    • Receive message wait time: 0 seconds
    • Message retention period: 4 days
    • Maximum message size: 256 KB
    • Content based duplication: off
    • High throughput FIFO queue: off
    • Encryption: disabled
    • Access policy basic (leave defaults)
    • Redrive policy: disabled
    • Dead letter queue: none
  • Create Lambda function

    • Use a blueprint (Hello World function, Python 3.10)
    • FunctionName: sqsbugreport-function
    • Execution role: Create a new role with basic Lambda permissions
    • Click "Create function"
    • Change Lambda function code to the code attached
      • (Please change the queue_url parameter to your environment)
    • Change permissions of the Lambda function (Configuration -> Permissions -> Edit)
      • Scroll all the way down to "Existing role" and click on "View the role" on IAM
      • In the IAM window, click on the policy name.
      • In the opening Policy window, click "Edit"
        • Click "Add new statement"
          • Service: SQS
          • Actions: all Actions
          • Resource: All resources
        • Click "Next"
        • Click "Save changes"
    • Add trigger to lambda function
      • Click "Add trigger"
      • Select Source: SQS
      • Select SQS Queue created earlier
      • Batch size: 1
      • Leave other settings on default
      • Click "Add" to create the trigger
  • Send initial message to the SQS queue (an equivalent boto3 setup sketch follows these steps)

    • Go to the SQS queue created earlier
    • Click "send and receive messages"
      • Message Body: Something
      • Message Group ID: Test
      • Message deduplication ID:
      • Click "Send message"
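
For reference, a roughly equivalent setup with boto3 might look like the sketch below; the queue name, function name, region, and attribute values mirror the console steps above and are assumptions, not a prescription.

import uuid
import boto3

sqs = boto3.client('sqs', region_name='eu-central-1')
lam = boto3.client('lambda', region_name='eu-central-1')

# 1. Create the FIFO queue with the settings from the repro steps.
queue_url = sqs.create_queue(
    QueueName='sqsbugreport.fifo',
    Attributes={
        'FifoQueue': 'true',
        'VisibilityTimeout': '43200',        # 12 hours
        'DelaySeconds': '1',                 # delivery delay: 1 second
        'MessageRetentionPeriod': '345600',  # 4 days
    },
)['QueueUrl']

# 2. Wire the queue to the Lambda function with a batch size of 1.
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=['QueueArn']
)['Attributes']['QueueArn']
lam.create_event_source_mapping(
    FunctionName='sqsbugreport-function',
    EventSourceArn=queue_arn,
    BatchSize=1,
)

# 3. Send the initial message.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody='Something',
    MessageGroupId='Test',
    MessageDeduplicationId=str(uuid.uuid4()),
)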

Lambda Code

import json
import boto3
import uuid


print('Loading function')


def lambda_handler(event, context):
    sqs = boto3.client('sqs')
    # Adjust <account-id> and the queue name to your environment.
    queue_url = 'https://sqs.eu-central-1.amazonaws.com/<account-id>/sqs-error-test.fifo'

    # Re-send the same body to the queue that triggered this invocation.
    # Because MessageDeduplicationId is a fresh UUID on every send, FIFO
    # deduplication never suppresses the re-sent message.
    response = sqs.send_message(
        QueueUrl=queue_url,
        MessageBody='Something',
        MessageDeduplicationId=str(uuid.uuid4()),
        MessageGroupId='TEST'
    )

    print(response['MessageId'])
    return 'OK'

Possible Solution

No response

Additional Information/Context

Lambda function overview

[screenshot]

Start execution

[screenshot]

FIFO-Queue after execution

[screenshot]

  • As you can see here, there is one "Message in flight" but no lambda function is triggered.

Log of execution

log-results.csv

  • As you can see in the logs, the Lambda function gets called 16 times and then stops, even though the SQS message was also sent during the 16th attempt.

SDK version used

1.12.566

Environment details (OS name and version, etc.)

Java Runtime, Python runtime

@hannehomuth hannehomuth added bug This issue is a confirmed bug. needs-triage This issue or PR still needs to be triaged. labels Oct 16, 2023
@hannehomuth hannehomuth reopened this Oct 16, 2023

@hannehomuth
Author

It also happens on standard SQS queues (non-FIFO).

@tim-finnigan tim-finnigan self-assigned this Oct 17, 2023
@tim-finnigan
Contributor

Hi @hannehomuth thanks for reaching out. I saw a recent announcement (https://aws.amazon.com/about-aws/whats-new/2023/07/aws-lambda-detects-recursive-loops-lambda-functions/) noting that Lambda will now stop recursive invocations between Amazon SQS, AWS Lambda, and Amazon SNS after 16 recursive calls. Therefore this appears to be the expected behavior.

The announcement also notes:

If a function is invoked by the same triggering event more than 16 times, Lambda will stop the next invocation and send the event to a Dead-Letter Queue or on-failure destination, if configured.

So you could configure a dead-letter queue for this scenario. Please let us know if you have any follow up questions, otherwise I will set this issue to auto-close.
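
For anyone who hits this, attaching a dead-letter queue to the source queue could look roughly like the sketch below; the DLQ name and maxReceiveCount are illustrative assumptions (note that a FIFO source queue requires a FIFO DLQ).

import json
import boto3

sqs = boto3.client('sqs', region_name='eu-central-1')

# Create the DLQ (assumed name).
dlq_url = sqs.create_queue(
    QueueName='sqsbugreport-dlq.fifo',
    Attributes={'FifoQueue': 'true'},
)['QueueUrl']
dlq_arn = sqs.get_queue_attributes(
    QueueUrl=dlq_url, AttributeNames=['QueueArn']
)['Attributes']['QueueArn']

# Point the source queue's redrive policy at the DLQ.
source_url = sqs.get_queue_url(QueueName='sqsbugreport.fifo')['QueueUrl']
sqs.set_queue_attributes(
    QueueUrl=source_url,
    Attributes={
        'RedrivePolicy': json.dumps({
            'deadLetterTargetArn': dlq_arn,
            'maxReceiveCount': '5',   # illustrative value
        })
    },
)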

@tim-finnigan tim-finnigan added closing-soon This issue will automatically close in 4 days unless further comments are made. service-api This issue is caused by the service API, not the SDK implementation. sqs and removed bug This issue is a confirmed bug. needs-triage This issue or PR still needs to be triaged. labels Oct 17, 2023
@hannehomuth
Author

Hi @tim-finnigan

yes, you're totally right. I searched all the AWS documentation I could find, but did not see any hint of this announcement.
Thank you!

