Submission: solve a Kubernetes alert before time runs out #6

Closed
aantn opened this issue Jul 25, 2022 · 0 comments
Labels
hackdays-august-2022 Hackdays August 2022 submission

Topic: Window of opportunity

You're stuck in a time loop of finding a hideous bug. Some form of software can help you find the root cause. What software might that be?

The situation: a Kubernetes cluster has firing alerts. You find out about those alerts in Slack, and because the time loop is very short, you have to investigate and resolve each alert without ever leaving Slack.

Technology

I work on an open source project called Robusta that we can use to do this. Using Python and YAML, we can define what data to show in Slack when each alert occurs. We can also trigger remediation actions by clicking on buttons in Slack.

Here is a silly example that adds a button which lets you search Stack Overflow for the current alert's name without leaving Slack. The results are then sent as a follow-up message in the same channel.

[Screenshot: the Slack alert message with the Stack Overflow search button]

You can see the code that powers this here.
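For readers without access to the linked code, here is a rough, standalone sketch of the search step only. It is not the Robusta implementation: the function names `build_search_url` and `format_followup` are made up for illustration, and the sketch assumes the public Stack Exchange search endpoint, whose result items carry `title` and `link` fields. It builds the query URL and formats results into follow-up text without making any network call.

```python
from typing import Dict, List
from urllib.parse import urlencode

# Public Stack Exchange API search endpoint (assumed here; the linked
# Robusta code may query Stack Overflow differently).
STACK_EXCHANGE_SEARCH = "https://api.stackexchange.com/2.3/search/advanced"


def build_search_url(alert_name: str, site: str = "stackoverflow") -> str:
    """Return a search URL for questions matching the alert name."""
    query = urlencode(
        {"order": "desc", "sort": "relevance", "q": alert_name, "site": site}
    )
    return f"{STACK_EXCHANGE_SEARCH}?{query}"


def format_followup(alert_name: str, items: List[Dict[str, str]], limit: int = 5) -> str:
    """Render search results as the text of a follow-up message."""
    if not items:
        return f"No Stack Overflow results for {alert_name}"
    lines = [f"Stack Overflow results for {alert_name}:"]
    # Each item is expected to have 'title' and 'link' keys.
    lines += [f"- {item['title']}: {item['link']}" for item in items[:limit]]
    return "\n".join(lines)
```

In a real playbook, the button click would trigger the HTTP request and post the formatted text back to the channel; both of those parts are left out here.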

Plan

  1. Choose an alert or family of alerts that we want to enrich
  2. Decide what data we need to solve that alert
  3. Write a Robusta playbook that gathers that data
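
One way to picture the three steps above, as a plain-Python sketch rather than actual Robusta playbook code: map each chosen alert name to the functions that gather the data needed to solve it. Everything here is hypothetical, including the `ENRICHMENTS` table, the gatherer functions, and the use of `KubePodCrashLooping` as the example alert; a real playbook would use Robusta's own triggers and actions instead.

```python
from typing import Callable, Dict, List

Alert = Dict[str, str]  # simplified stand-in for an alert's labels


# Hypothetical gatherers: each returns one piece of context as text.
# Real versions would query the cluster; these only return placeholders.
def recent_logs(alert: Alert) -> str:
    return f"logs for pod {alert.get('pod', '<unknown>')} would go here"


def recent_events(alert: Alert) -> str:
    return f"events in namespace {alert.get('namespace', '<unknown>')} would go here"


# Steps 1 and 2: choose the alerts to enrich and the data each one needs.
ENRICHMENTS: Dict[str, List[Callable[[Alert], str]]] = {
    "KubePodCrashLooping": [recent_logs, recent_events],
}


# Step 3: gather the data for an incoming alert.
def enrich(alert: Alert) -> List[str]:
    gatherers = ENRICHMENTS.get(alert.get("alertname", ""), [])
    return [gather(alert) for gather in gatherers]
```

Alerts that are not in the table simply get no extra data, so the table can grow one alert family at a time.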