First stab at an incident response document

henrik committed Sep 28, 2016
1 parent a7c2f97 commit 488ec1d3e51dbe0dea45e149d668b529cb86d0d0
Showing with 42 additions and 0 deletions.
  1. +1 −0
  2. +41 −0 incidents/
@@ -26,3 +26,4 @@ For reference and discussion.
* [Attending a conference](/attending_a_conference)
* [Holding a demo](/holding_a_demo)
* [Starting a project](/starting_a_project)
+* [Incident response](/incidents)
@@ -0,0 +1,41 @@
+# Incident response
+A checklist for what to do for incidents such as site downtime.
+## Checklist
+### Until the problem is resolved
+- [ ] Assign an **incident lead** – a single person who is responsible for this checklist. They should delegate tasks explicitly.
+- [ ] If there are remote workers, **"get everyone in the same room"** by setting up a video and audio link, e.g. [Zoom](
+- [ ] The incident lead should assign a **communicator**. The communicator ensures that we inform support, auction houses, customers and other affected parties. This may be in person, by chat, by phone, via Auctionet system messages, etc.
+ - [ ] Communicate when the problem starts.
+ - [ ] Communicate when there is some workaround.
+ - [ ] Communicate when the problem is resolved (from the affected party's standpoint).
+- [ ] The incident lead should assign a team of **deep delvers** to dig into the underlying issue.
+- [ ] The incident lead should assign a team of **quickfixers** to see what we can do right now to **minimise the impact** and **unblock affected parties**.
+- [ ] The incident lead may want to create a Trello card to keep track of things for this incident.
+Anyone not tapped by the incident lead is free to keep working on other things. It is the lead's responsibility to call for all hands if necessary.
+### Post mortem
+Not too long after the problem is resolved, we want a "post mortem" meeting.
+The goal of the meeting is to come up with any learnings and actions that let us do better work in future.
+- [ ] CTO and product owner should attend so we can decide what resources to allocate.
+- [ ] The discussion should be facilitated (have someone managing it) to keep us on track.
+#### Meeting agenda
+- [ ] Timeline: What happened? What did we do? What happened then? Where did we leave things?
+- [ ] How did this affect end users? Consider auction houses, buyers, sellers, support, finance, …
+- [ ] Reflect on the post mortem. Can we do post mortems better? Update this document with any learnings.
+## Inspiration
+* [SVTi incident routines]( (if link is broken, try [Google's cache](
