Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Lambda Extension logs shutdown #8063

Merged
merged 6 commits into from
May 6, 2021

Conversation

nhinsch
Copy link
Contributor

@nhinsch nhinsch commented May 3, 2021

What does this PR do?

There is a race condition that causes panics and prevents us from receiving logs when shutting down the Lambda Extension. When the extension receives a shutdown event from the API, it stops the logs agent. However, the extension may continue to receive log messages from the logs API after receiving the shutdown event. Currently, when these messages are received, we try to send them to the logs agent. Since the logs agent has been stopped, this results in a panic.

To fix this issue:

  • Consolidate the shutdown logic in a new Daemon.Stop() function.
  • Add a 1-second delay after we receive the shutdown signal before shutting down. This should allow for final log messages to be delivered from the logs API.
  • After the 1-second delay, shut down the HTTP server first before shutting down the agents. This will prevent us from sending messages to the agents after they have already shut down.
  • Do some refactoring so that we also call Daemon.Stop() after receiving a SIGINT signal. This shouldn't really happen in practice because during a shutdown, we receive a shutdown message over the Lambda API, and then SIGKILL two seconds later.

Motivation

The current behavior causes panics and lost log messages.

Describe how to test your changes

I have been manually testing these changes with functions running periodically in the sandbox account that frequently encounter the race condition. I used debug log messages to confirm the behavior is as expected.

@nhinsch nhinsch added this to the 7.29.0 milestone May 3, 2021
@nhinsch nhinsch requested a review from a team as a code owner May 3, 2021 17:22
pkg/serverless/protocol.go Outdated Show resolved Hide resolved
cmd/serverless/main.go Outdated Show resolved Hide resolved
@nhinsch
Copy link
Contributor Author

nhinsch commented May 3, 2021

@DarcyRaynerDD I have refactored this and incorporated your suggestions. I updated the initial PR comment to reflect the new approach. Let me know your thoughts.

@nhinsch nhinsch changed the base branch from master to release/lambda-extension-v8 May 3, 2021 21:41
Copy link
Contributor

@remeh remeh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment on the usage of the lock in the shutdown which I think needs one more commit, but except from that, solid!

pkg/serverless/protocol.go Show resolved Hide resolved
pkg/serverless/protocol.go Show resolved Hide resolved
Copy link
Contributor

@remeh remeh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@nhinsch nhinsch merged commit c8a3eb8 into release/lambda-extension-v8 May 6, 2021
@nhinsch nhinsch deleted the ngh/closed-ch-fix-2 branch May 6, 2021 13:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants