Typo (availabillity ==> availability)

ConfluxHQ · Aug 3, 2021 · 4ee0f72
1 parent c81141f
Showing 1 changed file with 2 additions and 2 deletions.
diff --git a/reliability.md b/reliability.md
@@ -25,8 +25,8 @@ Method: Use the [*Spotify Squad Health Check*](https://labs.spotify.com/2014/0
 | 2\. **User Goals and SLIs** - What should your service/application do from the viewpoint of the user?                                                                | We do not have a clear definition of what our application or service does from the user perspective.                           | We have clear, user-centric definitions of the application/service capabilities and outcomes from a user perspective.                                                                                                                                                 |
 | 3\. **Understanding users and behavior** - Who are the users of the software and how do they interact with the software? How do you know?                                                                                   | We don't really know how our users interact with our application/service --OR-- We don't really know who our users are.                                         | We have **user personas** validated through user research and we measure and track usage of the applications/services using digital **telemetry**.                                                                                                                                                                                  |
 | 4\. **SLIs/SLOs** - How do you **know when users have experienced an outage** or unexpected behaviour in the software?                                                    | We know there is an outage or problem when users complain via chat or the help desk.                                      | We proactively monitor the user experience using synthetic transactions across the key user journeys.                                                                                                                                                                                         |
-| 5\. **Service Health** - What is the single most important **indicator or metric** you use to determine the **health and availability of your software** in production/live?                                                          | We don't have a single key metric for the health and availabillity of the application/service.                   | We have a clear, agreed key metric for each application/service and we display this figure on a team-visible dashboard. The dashboard data is updated at least every 10 minutes.                                                                                                                                                                |
-| 6\. **SLIs** - What combination of three or four **indicators or metrics** do you use (or could/would you use) to provide a **comprehensive picture of the health and availability** of your software in production/live?                                                       | We don't have a set of key metrics for the health and availabillity of the application/service.                                | We have a clear, agreed set of key metrics for each application/service and we display this figure on a team-visible dashboard. The dashboard data is updated at least every 10 minutes.                                                                                                                               |
+| 5\. **Service Health** - What is the single most important **indicator or metric** you use to determine the **health and availability of your software** in production/live?                                                          | We don't have a single key metric for the health and availability of the application/service.                   | We have a clear, agreed key metric for each application/service and we display this figure on a team-visible dashboard. The dashboard data is updated at least every 10 minutes.                                                                                                                                                                |
+| 6\. **SLIs** - What combination of three or four **indicators or metrics** do you use (or could/would you use) to provide a **comprehensive picture of the health and availability** of your software in production/live?                                                       | We don't have a set of key metrics for the health and availability of the application/service.                                | We have a clear, agreed set of key metrics for each application/service and we display this figure on a team-visible dashboard. The dashboard data is updated at least every 10 minutes.                                                                                                                               |
 | 7\. **Error Budget and similar mechanisms** - How does the team know when to **spend time on operational aspects** of the software (logging, metrics, performance, reliability, security, etc.)? Does that time actually get spent?                                                    | We spend time on operational aspects only when there is a problem that needs fixing.                                     | We allocate between 20% and 30% of our time for working on operational aspects and we check this each week. We alert if we have not spent time on operational aspects --OR-- We use SRE Error Budgets to plan our time spent on operational aspects.                                                                              |
 | 8\. **Alerting** - What proportion (approximately) of your time and effort as a team do you spend on **making alerts and operational messages more reliable and more relevant**?                                                                                       | We spend as little time as possible on alerts and operational messages - we need to focus on user-visible features.                                | We regularly spend time reviewing and improving alerts and operational messages.                                                                                                                                                      |
 | 9\. **Toil and fixing problems** - What proportion (approx) of your time gets taken up with incidents from live systems and how predictable is the time needed to fix problems?                                    | We do not deal with issues from live systems at all - we focus on new features --OR-- live issues can really affect our delivery cadence and are very disruptive.                     | We allocate a consistent amount of time for dealing with live issues --OR-- one team member is responsible for triage of live issues each week OR we rarely have problems with live issues because the software works well. |