
Add crash and server restart alert #2849

Closed
finestructure opened this issue Jan 18, 2024 · 6 comments

@finestructure · Member

We should add an alert for server restarts, which show up in the logs as:

`[ NOTICE ] Server starting on http://0.0.0.0:80 [component: server]`

We've seen crashes today that led to swift-backtrace locking up the nodes for a while:

[Screenshot: CleanShot 2024-01-18 at 17.55.45]

We don't have good visibility into when restarts happen. That's really a testament to how stable the processes have been: we've never had issues with crashes in the server process.

finestructure self-assigned this on Jan 18, 2024
@finestructure · Member Author

I don't think we want swift-backtrace running by default in prod with those performance characteristics. On staging, yes, but in prod I'd want to enable it explicitly as a last resort to track down issues. We ended up shooting down nodes left, right, and centre due to it taking so long to wrap up.

@finestructure · Member Author

https://github.com/apple/swift/blob/main/docs/Backtracing.rst

Setting `SWIFT_BACKTRACE="enable=no"` should disable it, I think.
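A minimal Swift sketch of a start-up check that logs whether the backtracer was switched off, so a prod deploy can confirm the setting actually took effect. It assumes, per Backtracing.rst, that `SWIFT_BACKTRACE` holds comma-separated `key=value` pairs. The function names are hypothetical, and treating `no`, `false`, `off` and `0` as "disabled" is an assumption, not a restatement of the runtime's own parsing rules.

```swift
import Foundation

/// Parses a SWIFT_BACKTRACE-style value ("key=value,key=value") into a dictionary.
/// Hypothetical helper: malformed entries (no "=") are skipped, and stray
/// whitespace or quotes around keys/values are trimmed.
func parseBacktraceSettings(_ raw: String) -> [String: String] {
    let junk = CharacterSet.whitespaces.union(CharacterSet(charactersIn: "\"'"))
    var settings: [String: String] = [:]
    for pair in raw.split(separator: ",") {
        let parts = pair.split(separator: "=", maxSplits: 1)
        guard parts.count == 2 else { continue }
        let key = parts[0].trimmingCharacters(in: junk).lowercased()
        let value = parts[1].trimmingCharacters(in: junk).lowercased()
        settings[key] = value
    }
    return settings
}

/// Returns true only if the variable explicitly sets enable to an "off" value.
/// Assumption: "no", "false", "off" and "0" all mean disabled; an unset
/// variable or missing "enable" key is treated as "may be enabled".
func backtracingExplicitlyDisabled(_ raw: String?) -> Bool {
    guard let raw = raw, let enable = parseBacktraceSettings(raw)["enable"] else {
        return false
    }
    return ["no", "false", "off", "0"].contains(enable)
}

// Hypothetical startup check: report what the process was launched with.
let current = ProcessInfo.processInfo.environment["SWIFT_BACKTRACE"]
if backtracingExplicitlyDisabled(current) {
    print("swift-backtrace disabled via SWIFT_BACKTRACE")
} else {
    print("warning: swift-backtrace may be active (SWIFT_BACKTRACE=\(current ?? "<unset>"))")
}
```

As far as we understand, the runtime reads `SWIFT_BACKTRACE` when the process launches, so the variable has to be set in the container or service environment; this check only reports what the process was started with and cannot change it.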

@finestructure · Member Author · commented Jan 20, 2024

We had another instance this morning where a node was completely locked up for hours, running swift-backtrace at 100% CPU:

[Screenshot: CleanShot 2024-01-20 at 10.49.25]

What's also concerning is that we're no longer seeing any traces in our logs, even when swift-backtrace doesn't lock up. We used to get at least unsymbolicated ones, but now they're gone.

finestructure changed the title from "Add server restart alert" to "Add crash and server restart alert" on Jan 22, 2024

@finestructure · Member Author

Also treat `Backtrace took` in the logs as a crash alert.
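A sketch of how both alert conditions in this issue could be detected from log output, written in Swift for illustration. It assumes the log pipeline hands us individual lines; `LogAlert`, `classify` and `alertCounts` are hypothetical names, and the `Backtrace took 12.34s` sample line is a made-up example of the format, not a captured log.

```swift
import Foundation

/// The two alert kinds discussed in this issue (hypothetical type).
enum LogAlert: Hashable {
    case serverRestart  // the process came back up
    case crash          // swift-backtrace ran after a crash
}

/// Classifies a single log line; returns nil for ordinary output.
/// The crash pattern is checked first, so a line matching both counts as a crash.
func classify(_ line: String) -> LogAlert? {
    if line.contains("Backtrace took") { return .crash }
    if line.contains("Server starting on") { return .serverRestart }
    return nil
}

/// Tallies alerts over a batch of lines, e.g. the last few minutes of logs.
func alertCounts(in lines: [String]) -> [LogAlert: Int] {
    var counts: [LogAlert: Int] = [:]
    for line in lines {
        if let alert = classify(line) {
            counts[alert, default: 0] += 1
        }
    }
    return counts
}

// Usage with sample lines; the "Backtrace took" line is a hypothetical example.
let sample = [
    "[ NOTICE ] Server starting on http://0.0.0.0:80 [component: server]",
    "[ INFO ] GET /",
    "Backtrace took 12.34s",
]
let counts = alertCounts(in: sample)
print("restarts: \(counts[.serverRestart, default: 0]), crashes: \(counts[.crash, default: 0])")
// prints "restarts: 1, crashes: 1"
```

The same two substring matches could just as well be set up as saved-search alerts in whatever log tool the nodes ship to; the Swift version only makes the matching rules explicit.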

@tayloraswift

> We had another instance this morning where a node was completely locked up for hours, running swift-backtrace at 100% CPU:

I have sporadically seen something similar ever since upgrading to Swift 5.9. Did you see swift-backtrace in `top` while this was occurring? As far as I understand, swift-backtrace should not take more than a minute to complete.

@finestructure · Member Author

Yes, exactly, like in the screenshot at the very top. My understanding is that the hang was perhaps due to a misconfiguration: we're statically linking but hadn't bundled and referenced the `swift-backtrace` binary. But it's possible that this happens regardless; it's hard to tell at this point.
