Downtime Post-Mortem - 8/24/2012 - 9:40AM PDT - 11:15AM PDT #41

Closed
mahmoudimus opened this issue Aug 25, 2012 · 1 comment

@mahmoudimus
Contributor

At approximately 9:40AM PDT, Balanced's centralized logging machines began to slow down as their free disk space approached critically low levels. We are investigating why our alerting systems did not notify us at a lower threshold. Balanced's servers resumed normal operation at 11:15AM PDT.
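The missed alert comes down to where the free-space threshold sits. Below is a minimal sketch of a two-tier disk check, assuming a "warning" tier that fires well before the "critical" one. The thresholds, the function names and the example path are hypothetical and do not describe Balanced's actual alerting setup.

```python
import shutil

# Hypothetical thresholds; the post-mortem does not state the values in use.
WARN_FREE_FRACTION = 0.20  # warn when less than 20% of the disk is free
CRIT_FREE_FRACTION = 0.10  # page when less than 10% of the disk is free


def classify_free_space(free_bytes: int, total_bytes: int,
                        warn: float = WARN_FREE_FRACTION,
                        crit: float = CRIT_FREE_FRACTION) -> str:
    """Return 'ok', 'warning' or 'critical' for an amount of free space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    free_fraction = free_bytes / total_bytes
    if free_fraction < crit:
        return "critical"
    if free_fraction < warn:
        return "warning"
    return "ok"


def check_path(path: str) -> str:
    """Classify free space on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return classify_free_space(usage.free, usage.total)
```

For example, `check_path("/var/log")` on a log server would return `"warning"` once free space drops below 20%, leaving room to react before the disk fills.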

Internally, we centralize our logs with rsyslog, the industry-standard syslog daemon, which is supposed to spool logs locally and deliver them asynchronously to our central logging server. However, a mistake in our rsyslog configuration caused log messages to be delivered synchronously instead of asynchronously, so when the logging server slowed down, our API applications slowed down with it.
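The difference between the two delivery modes can be illustrated outside rsyslog. The sketch below uses Python's standard `logging` module, not rsyslog itself. `SlowHandler` is a hypothetical stand-in for a sluggish central log server, and `QueueHandler`/`QueueListener` play the role of a local spool drained by a background thread. In synchronous mode the caller pays the server's latency on every log call; in asynchronous mode it only pays for a queue insert.

```python
import logging
import logging.handlers
import queue
import time


class SlowHandler(logging.Handler):
    """Hypothetical remote log server that takes `delay` seconds per record."""

    def __init__(self, delay: float):
        super().__init__()
        self.delay = delay
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        time.sleep(self.delay)  # simulate a slow network write
        self.records.append(record.getMessage())


def time_logging(asynchronous: bool, n: int = 20,
                 delay: float = 0.01) -> tuple[float, int]:
    """Return (seconds the caller spent logging, records delivered)."""
    slow = SlowHandler(delay)
    logger = logging.getLogger("demo-async" if asynchronous else "demo-sync")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    listener = None
    if asynchronous:
        spool = queue.Queue()  # unbounded in-memory spool
        logger.addHandler(logging.handlers.QueueHandler(spool))
        listener = logging.handlers.QueueListener(spool, slow)
        listener.start()  # background thread drains the spool
    else:
        logger.addHandler(slow)  # caller blocks on every write

    start = time.perf_counter()
    for i in range(n):
        logger.info("request %d handled", i)
    elapsed = time.perf_counter() - start

    if listener is not None:
        listener.stop()  # waits until queued records are delivered
    logger.handlers.clear()
    return elapsed, len(slow.records)
```

Both modes deliver every record; only the caller's waiting time differs. A real spool also needs a bound or disk backing, otherwise a server that stays slow just moves the problem into memory.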

Yesterday, we added two new API machines to handle the increased load on Balanced, but we did not anticipate the resulting surge in log volume.

We're going to write sanity checks for our rsyslog configurations and install a redundant alerting system so that we can catch these issues before they cause an outage. We're also now attributing log volume to individual servers, so we can track exactly how much disk usage each new API machine adds.
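One possible shape for an rsyslog configuration sanity check is sketched below. It is hypothetical, not Balanced's actual tooling, and it only understands rsyslog's legacy `$`-directive syntax. In that syntax `$ActionQueue*` directives apply to the next action only, and an action without them uses the default `Direct` (synchronous) queue. The check flags every `@`/`@@` forwarding action that is not preceded by a non-`Direct` `$ActionQueueType`. RainerScript `action(...)` blocks are not checked.

```python
import re

# Legacy forwarding action, e.g. "*.* @@logs.example.com:514" (@@ = TCP, @ = UDP).
FORWARD_RE = re.compile(r"^\S+\s+@{1,2}\S+")
QUEUE_TYPE_RE = re.compile(r"^\$ActionQueueType\s+(\S+)", re.IGNORECASE)


def find_synchronous_forwards(config_text: str) -> list[int]:
    """Return 1-based line numbers of forwarding actions with no async queue."""
    problems = []
    pending_queue_type = "Direct"  # default queue type for the next action
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        match = QUEUE_TYPE_RE.match(line)
        if match:
            pending_queue_type = match.group(1)
            continue
        if line.startswith("$"):
            continue  # other directives do not consume the queue setting
        # Any remaining line is treated as an action (selector + target).
        if FORWARD_RE.match(line) and pending_queue_type.lower() == "direct":
            problems.append(lineno)
        pending_queue_type = "Direct"  # queue settings reset after each action
    return problems
```

With the hypothetical config below, only line 5 is reported: the first forward has its own `LinkedList` queue, but the second falls back to `Direct`.

```
$WorkDirectory /var/spool/rsyslog
$ActionQueueType LinkedList
$ActionQueueFileName fwd1
*.* @@logs.example.com:514
*.* @@backup.example.com:514
```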

We understand that this outage was unacceptable, and we're working tirelessly to keep Balanced running efficiently and smoothly.

@ghost ghost assigned mahmoudimus Aug 25, 2012
@mahmoudimus
Contributor Author

#12 and #39 are relevant here.
