# What you'll learn

After watching this video, you will be able to: 
* Define alerting and alerting concepts. 
* Explain the different types of alerts.
* Review popular, open-source alerting tools.
* Summarize the benefits of alerting.

# What is alerting?

![image.png](attachment:304d6d45-66ac-47fb-b664-483d3e7a0ae8.png)

* Alerting is a critical part of monitoring.
* Working together, monitoring and alerting provide insight into how your applications and infrastructure are performing.
* Alerting is the responsive element of a monitoring system.
* It helps you quickly detect and address issues before they impact users.
* Additionally, alerting proactively notifies you when monitored data indicates potential problems within your infrastructure or applications.
* It can trigger user-defined actions based on changes detected within an organization's business systems.

# Alert outputs

The most common reason for alerting is to notify those responsible for that system, service, or application.

There are two standard alert outputs: 
* **Notifications** tell you that some condition was detected.
* **Automated actions** execute a scripted or programmatic action to mitigate an issue that's been detected.
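
The two outputs can be illustrated with a minimal sketch. The `Alert` and `AlertRouter` names are hypothetical and not taken from any particular tool; the lists stand in for real delivery channels and remediation scripts.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Alert:
    source: str
    message: str


@dataclass
class AlertRouter:
    """Sends each alert to every notifier, then runs every automated action."""
    notifiers: List[Callable[[Alert], None]] = field(default_factory=list)
    actions: List[Callable[[Alert], None]] = field(default_factory=list)

    def dispatch(self, alert: Alert) -> None:
        for notify in self.notifiers:
            notify(alert)  # notification: tell someone a condition was detected
        for act in self.actions:
            act(alert)     # automated action: try to mitigate the issue


sent = []       # stands in for email or chat delivery
restarted = []  # stands in for a scripted remediation, such as a restart

router = AlertRouter(
    notifiers=[lambda a: sent.append(f"[{a.source}] {a.message}")],
    actions=[lambda a: restarted.append(a.source)],
)
router.dispatch(Alert(source="web-01", message="health check failed"))
```

After the call, `sent` holds the formatted notification and `restarted` records that the action ran for `web-01`.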

# How alerting works

![image.png](attachment:099a801c-2850-44f8-bab2-1be0bb5f712d.png)

Alerting works in the following way: 
* The monitoring system collects data and metrics from systems, apps, and processes.
* The monitoring system analyzes the data collected.
* If failures or anomalies are detected, an alarm is raised, and an alert is sent.
* The appropriate staff investigate and mitigate the issue until it is resolved.
* Then the monitoring process continues.
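
The cycle above can be sketched as a single function that collects, analyzes, and alerts. This is a simplified illustration, assuming a plain average compared against a fixed threshold; `analyze`, `run_cycle`, and the sample values are made up for the example.

```python
from statistics import mean
from typing import Callable, List, Optional


def analyze(samples: List[float], threshold: float) -> Optional[str]:
    """Return an alert message if the average sample exceeds the threshold."""
    if not samples:
        return None
    avg = mean(samples)
    if avg > threshold:
        return f"average {avg:.1f} exceeds threshold {threshold:.1f}"
    return None


def run_cycle(
    collect: Callable[[], List[float]],
    threshold: float,
    send_alert: Callable[[str], None],
) -> bool:
    """Run one collect -> analyze -> alert pass; return True if an alert was sent."""
    samples = collect()                     # step 1: collect data
    message = analyze(samples, threshold)   # step 2: analyze it
    if message is not None:
        send_alert(message)                 # step 3: raise the alarm and alert
        return True
    return False


alerts: List[str] = []
fired = run_cycle(lambda: [91.0, 95.0, 99.0], 90.0, alerts.append)
quiet = run_cycle(lambda: [40.0, 45.0], 90.0, alerts.append)
```

A real monitoring system would call something like `run_cycle` on a schedule, which corresponds to the last step: the monitoring process continues.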

# How does the alerting process work? 

![image.png](attachment:31295612-4ce2-411a-9a44-d8a065207e02.png)

Here is an overview: 
* When an alarm is triggered, an alert is sent.
* An admin then investigates the issue, beginning with the metric that caused the alert to trigger.
* The admin reasons backward from that metric in search of a cause.
* The investigation is successful when a satisfactory explanation is found.
* Mitigation puts the system back in balance.
* The metrics will reflect the recovery, and the alarm transitions back into a 'clear' state.
* Mitigation fails when the metrics don't reflect improvement.
* Then the efficacy of the mitigation strategy should be questioned, and an alternative solution might be needed to resolve the issue fully.
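
The alarm's state changes can be pictured as a small state machine. The sketch below is an assumption-laden illustration: the `Alarm` class, the two state names, and the threshold are hypothetical, and real tools often add more states, such as pending or acknowledged.

```python
from enum import Enum


class AlarmState(Enum):
    CLEAR = "clear"
    FIRING = "firing"


class Alarm:
    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.state = AlarmState.CLEAR
        self.alerts_sent = 0

    def observe(self, value: float) -> AlarmState:
        """Update the alarm from one metric reading and return the new state."""
        breached = value > self.threshold
        if breached and self.state is AlarmState.CLEAR:
            self.state = AlarmState.FIRING
            self.alerts_sent += 1  # alert only on the clear -> firing transition
        elif not breached and self.state is AlarmState.FIRING:
            self.state = AlarmState.CLEAR  # metrics recovered after mitigation
        return self.state


alarm = Alarm(threshold=80.0)
history = [alarm.observe(v) for v in (70.0, 85.0, 90.0, 60.0)]
```

If the readings had stayed above the threshold after mitigation, the alarm would have remained in the firing state, which is the signal that the mitigation did not work.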

# Types of alerts

There are four types of alerts: 
* Metric alerts
* Log alerts
* Activity log alerts
* Smart detection

Let's take a closer look.

![image.png](attachment:52b81a6f-05ab-41da-a591-94a206fc09fb.png)

**Metric alerts** are based on raw data collected by your monitoring system.
* They provide information about the availability of resources on systems, applications, databases, and web servers.
* Metrics help you comprehend the current health of your whole infrastructure and applications.
* They can uncover trends in usage and behavior to help you understand the effects of any changes made to your applications or infrastructure.

Another type of alert is a **log alert**.
* It is different from a metric alert.
* Log alerts use log analytics queries to evaluate resource logs at predefined intervals to see how your applications or services are performing and have performed in the past.
* Log alerts provide a trail of events that shows what happened and when, which makes them extremely important for troubleshooting.

**Activity log alerts** automate the log review process, which frees you from that task and saves you time.
* You set up rules and conditions so that your monitoring system alerts you when those conditions occur.
* They are triggered when a new event matches the defined rules or conditions.
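
A log-based rule of the kind described in this section can be sketched as a query over a recent time window. The example assumes log entries are already parsed into `(timestamp, level, message)` tuples; the function names, the five-minute window, and the error limit are illustrative choices, not any product's query language.

```python
from datetime import datetime, timedelta
from typing import Sequence, Tuple

LogEntry = Tuple[datetime, str, str]  # (timestamp, level, message)


def count_errors(entries: Sequence[LogEntry], now: datetime, window: timedelta) -> int:
    """Count ERROR entries whose timestamp falls within the last `window`."""
    start = now - window
    return sum(
        1 for ts, level, _ in entries
        if start <= ts <= now and level == "ERROR"
    )


def log_alert(entries: Sequence[LogEntry], now: datetime,
              window: timedelta, max_errors: int) -> bool:
    """Rule: alert when the window contains more than `max_errors` errors."""
    return count_errors(entries, now, window) > max_errors


now = datetime(2024, 1, 1, 12, 0, 0)
entries = [
    (now - timedelta(minutes=20), "ERROR", "old failure"),  # outside the window
    (now - timedelta(minutes=4), "ERROR", "db timeout"),
    (now - timedelta(minutes=3), "INFO", "retrying"),
    (now - timedelta(minutes=1), "ERROR", "db timeout"),
]
errors = count_errors(entries, now, timedelta(minutes=5))
should_alert = log_alert(entries, now, timedelta(minutes=5), max_errors=1)
```

Running the rule at a fixed interval, with `now` advancing each time, gives the "evaluate at predefined intervals" behavior.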

**Smart detection** works with Application Insights, a service that your cloud provider might offer.
* It proactively analyzes telemetry sent by your app to warn of potential performance and failure anomalies detected in your web application.
* Alerts are sent automatically if sudden changes are detected.
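
To make "sudden changes" concrete, here is a deliberately simple sketch that flags a reading when it lies far from the recent average. It is not how any cloud provider implements smart detection; the z-score approach, the function name, and the limit of 3.0 are all assumptions chosen for illustration.

```python
from statistics import mean, stdev
from typing import Sequence


def is_sudden_change(history: Sequence[float], latest: float, z_limit: float = 3.0) -> bool:
    """Flag `latest` if it is more than `z_limit` standard deviations from the mean."""
    if len(history) < 2:
        return False  # not enough data to establish a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # flat baseline: any deviation is a change
    return abs(latest - mu) / sigma > z_limit


response_ms = [100.0, 102.0, 98.0, 101.0, 99.0]
normal = is_sudden_change(response_ms, 103.0)
spike = is_sudden_change(response_ms, 250.0)
```

Here a response time of 103 ms stays within the normal range, while 250 ms is flagged.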



# Alert threshold best practices

![image.png](attachment:652bd66b-1d7b-4b79-a2ab-b83ece41a0c3.png)

Alerts are very helpful, but be careful with thresholds that are set too broadly or too sensitively.

**When thresholds are set up too broadly**: 
* The monitoring system might not detect real problems quickly enough, and the affected system or application might experience a higher degree of performance degradation, which could lead to downtime.
* Once problems are finally identified and mitigated, the alerting configuration should be adjusted to prevent a repeat of expensive outages.

**When alarm monitors are created with unnecessarily sensitive thresholds**: 
* It is highly likely that normal system operations will trigger an alarm.
* In such scenarios, the alarms will generate alerts when no harm is done.
* To remedy this issue, the baseline should be reevaluated, and respective monitors adjusted to improve the detectability of real issues.
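
The trade-off can be seen by counting how often different thresholds are breached on the same data. The CPU readings and the three threshold values below are made up for illustration.

```python
from typing import Dict, Sequence, Tuple


def breaches(samples: Sequence[float], threshold: float) -> int:
    """Count the samples that exceed the threshold."""
    return sum(1 for s in samples if s > threshold)


normal_cpu = [35.0, 42.0, 55.0, 61.0, 48.0, 58.0]  # healthy day-to-day load
incident_cpu = [88.0, 93.0, 97.0, 91.0]            # load during a real problem

thresholds = {"too sensitive": 50.0, "too broad": 95.0, "tuned": 75.0}

# For each threshold: (breaches during normal load, breaches during the incident)
report: Dict[str, Tuple[int, int]] = {
    name: (breaches(normal_cpu, t), breaches(incident_cpu, t))
    for name, t in thresholds.items()
}
```

In this data, the sensitive threshold fires three times during normal operation, the broad one catches only one of four incident readings, and the tuned one stays quiet under normal load while catching all four.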

Most alarms, however, do go off for a valid reason, and these alarms usually identify issues that can be mitigated.

# Popular open-source alerting tools

![image.png](attachment:27b4859f-6128-410e-bd4c-6dd0e074a80d.png)

It helps to be familiar with some popular open-source alerting tools.

**Bosun** can display simple graphs and create alerts using a powerful expression language for alert rules and conditions.
* It's limited to email and HTTP notification configurations, which means connecting to Slack and other tools requires additional customization.
* Bosun can use templates for notifications, which means they can look as fabulous as you would like to make them.

Another open-source alerting tool is **Cabot**.
* It was created by a company called Arachnys.
* It doesn't collect any data itself. Instead, it accesses data by hooking into the APIs of the tools it alerts for.
* It uses a pull (rather than a push) model for the data it requires to make alerting decisions.
* Cabot stores its alerting data in a Postgres database and uses a Redis cache.
* It can integrate with Google Calendar for on-call rotations using a feature called Rota.
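
The pull model can be sketched in a few lines: the checker asks each source for its current value instead of waiting for data to be pushed to it. This is a generic illustration and not Cabot's actual code or API; `poll_once` and the lambda sources are hypothetical stand-ins for calls to another tool's HTTP API.

```python
from typing import Callable, Dict


def poll_once(sources: Dict[str, Callable[[], float]],
              limits: Dict[str, float]) -> Dict[str, str]:
    """Pull the current value from each source and grade it against its limit."""
    statuses = {}
    for name, fetch in sources.items():
        value = fetch()  # the checker requests the data; nothing is pushed to it
        statuses[name] = "failing" if value > limits[name] else "passing"
    return statuses


# Hypothetical stand-ins for requests to a metrics tool's API.
sources = {
    "queue_depth": lambda: 1200.0,
    "error_rate": lambda: 0.01,
}
limits = {"queue_depth": 1000.0, "error_rate": 0.05}
statuses = poll_once(sources, limits)
```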

Finally, **StatsAgg** is another popular open-source alerting tool.
* It is an alerting and metrics aggregation platform that can act as a proxy for other systems.
* It supports Graphite, StatsD, InfluxDB, and OpenTSDB as inputs.
* It can also send alerts based on regular expression matching and is focused on alerting by service rather than host or instance.
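
Alerting by service with regular expressions can be sketched as follows. This is not StatsAgg's configuration format or API; the metric naming scheme (`service.host.metric`), the function name, and the threshold are assumptions made for the example.

```python
import re
from typing import Dict, List


def match_service_metrics(metrics: Dict[str, float], pattern: str,
                          threshold: float) -> List[str]:
    """Return the metric names that match `pattern` and exceed `threshold`."""
    regex = re.compile(pattern)
    return sorted(
        name for name, value in metrics.items()
        if regex.fullmatch(name) and value > threshold
    )


metrics = {
    "checkout.host1.error_rate": 0.02,
    "checkout.host2.error_rate": 0.12,
    "search.host1.error_rate": 0.30,
    "checkout.host2.latency_ms": 900.0,
}
# One rule covers every host of the checkout service.
offenders = match_service_metrics(metrics, r"checkout\..*\.error_rate", 0.05)
```

Because the pattern matches on the service name rather than a specific host, newly added hosts are covered by the same rule without any changes.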

# What are some of the benefits of alerting? 

![image.png](attachment:1be0472b-8ab4-4edd-9541-90b52dbd7312.png)

* Alerting enables you to spot problems anywhere in your infrastructure and applications.
* Alerts draw your attention to the devices, applications, or systems that require observation, investigation, or intervention.
* Monitoring and automated alerts allow you to disengage from manually inspecting logs, system events, and other metrics, thus freeing up valuable time that you could use elsewhere.
* You can define situations that make sense to actively manage while relying on passive monitoring to watch for changing conditions.
* Implementing automated alerts throughout your infrastructure allows you to respond quickly to issues, minimize downtime, and provide better service.

# Summary

![image.png](attachment:7992e436-75b7-4d35-aca6-12b06695a027.png)

In this video, you learned that:
* Alerting is the responsive part of a monitoring system that notifies you when issues are detected.
* You can configure thresholds and conditions for the monitoring system to watch, which frees up your time.
* Alerts can be passive notifications or can trigger automated actions to mitigate certain issues.
* There are four types of alerts: metric, log, activity log, and smart detection.
* Configuring alert thresholds too broadly or too sensitively can cause additional problems.
* Bosun, Cabot, and StatsAgg are three popular open-source alerting systems.
