# What Is Prometheus?

- Open source, metrics-based monitoring system
- It has a simple yet powerful data model and a query language that lets you analyze how applications and infrastructure are performing
- It does not try to solve problems outside of the metrics space
- It is primarily written in Go
- Prometheus' data model identifies each time series not just with a name, but also with an unordered set of key-value pairs called labels.
- The PromQL query language allows aggregation across any of these labels, so you can analyse not just per process but also per datacenter, per service, or by any other labels that you have defined. **These can be graphed in dashboard systems such as Grafana**.
- Alerts can be defined using the exact same PromQL query language that you use for graphing.
  - If you can graph it, you can alert on it. 
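To make the label-based data model concrete, here is a minimal Python sketch (not Prometheus's actual storage or query engine) that identifies each series by its metric name plus an unordered set of labels, then sums all series that share a `datacenter` label. That is roughly what the PromQL expression `sum by (datacenter) (http_requests_total)` does for instant values. The metric name, label names, and sample values are made up for illustration.

```python
from collections import defaultdict

def series_key(name, labels):
    """Identify a series by its name plus an unordered set of label pairs."""
    # frozenset makes label order irrelevant and keeps the key hashable
    return (name, frozenset(labels.items()))

def sum_by(samples, name, by):
    """Sum the latest values of all series called `name`, grouped by label `by`.

    A rough analogue of PromQL's `sum by (<by>) (<name>)` on instant values.
    """
    totals = defaultdict(float)
    for (series_name, labels), value in samples.items():
        if series_name != name:
            continue
        group = dict(labels).get(by, "")  # missing label groups under ""
        totals[group] += value
    return dict(totals)

# Hypothetical latest sample per series
samples = {
    series_key("http_requests_total", {"instance": "a:8080", "datacenter": "eu"}): 10.0,
    series_key("http_requests_total", {"datacenter": "eu", "instance": "b:8080"}): 5.0,
    series_key("http_requests_total", {"instance": "c:8080", "datacenter": "us"}): 7.0,
}

print(sum_by(samples, "http_requests_total", "datacenter"))  # {'eu': 15.0, 'us': 7.0}
```

Because the labels are stored as a set, `{"a": "1", "b": "2"}` and `{"b": "2", "a": "1"}` name the same series, which mirrors the "unordered set of key-value pairs" described above.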

## What Is Monitoring?

Most monitoring is about the same thing: events. Events can be almost anything including:

- Receiving an HTTP request 
- Sending an HTTP 400 response 
- Entering a function 
- Reaching the else of an if statement 
- Leaving a function 
- A user logging in 
- Writing data to disk 
- Reading data from the network 
- Requesting more memory from the kernel

As a metric-based monitoring system, Prometheus is designed to track overall system health, behaviour, and performance rather than individual events. Put another way, Prometheus cares that there were 15 requests in the last minute that took 4 seconds to handle in total, resulted in 40 database calls, 17 cache hits, and 2 purchases by customers. The cost and code paths of the individual calls would be the concern of profiling or logging.
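The difference between tracking events and tracking aggregates can be sketched in a few lines of Python. Instead of keeping every request record, a metrics-based approach keeps only running totals, so memory use stays constant no matter how much traffic arrives. The event fields and counter names below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RequestEvent:
    """One hypothetical event, i.e. what a log line might record."""
    duration_seconds: float
    db_calls: int
    cache_hits: int
    purchased: bool

class RequestMetrics:
    """Keeps only aggregates, never the individual events."""

    def __init__(self):
        self.requests = 0
        self.duration_seconds_total = 0.0
        self.db_calls_total = 0
        self.cache_hits_total = 0
        self.purchases_total = 0

    def observe(self, event):
        # Each event bumps a handful of totals and is then discarded
        self.requests += 1
        self.duration_seconds_total += event.duration_seconds
        self.db_calls_total += event.db_calls
        self.cache_hits_total += event.cache_hits
        self.purchases_total += int(event.purchased)

metrics = RequestMetrics()
for e in [RequestEvent(0.25, 3, 1, False), RequestEvent(0.5, 2, 0, True)]:
    metrics.observe(e)

print(metrics.requests, metrics.duration_seconds_total, metrics.purchases_total)  # 2 0.75 1
```

Which request was slow, or why, is lost here by design; that per-event detail is what logging or profiling would capture.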


## Prometheus Architecture

![prometheus-architecture](assets/prometheus-arch.png)

### Client Libraries

- Metrics do not typically magically spring forth from applications
- Someone has to add the instrumentation that produces them
- With usually only two or three lines of code, you can both define a metric and add your desired instrumentation inline in code you control. This is referred to as direct instrumentation. 
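The "two or three lines" of direct instrumentation look roughly like this. The official Python client (`prometheus_client`) provides a `Counter` class with an `inc()` method; to keep the sketch self-contained, the `Counter` below is a hypothetical stdlib-only stand-in, not that library, and the metric name is illustrative.

```python
import threading

class Counter:
    """Hypothetical minimal counter standing in for a client library's Counter."""

    def __init__(self, name, documentation):
        self.name = name
        self.documentation = documentation
        self._value = 0.0
        self._lock = threading.Lock()  # request handlers may run concurrently

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only go up")
        with self._lock:
            self._value += amount

    def get(self):
        with self._lock:
            return self._value

# Line 1: define the metric once, at module level
REQUESTS = Counter("hello_worlds_total", "Hello Worlds requested.")

def handle_hello():
    # Line 2: the inline instrumentation, in code you control
    REQUESTS.inc()
    return "Hello World"

handle_hello()
handle_hello()
print(REQUESTS.get())  # 2.0
```

Defining the metric at module level and incrementing it inline is the essence of direct instrumentation; a real client library additionally exposes the value over HTTP for Prometheus to scrape.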

### Exporters

- Not all code you run is code that you can control or even have access to, and thus adding direct instrumentation isn't really an option.
- E.g., it's unlikely that operating system kernels will start outputting Prometheus-formatted metrics over HTTP anytime soon.
- Such software often has some interface through which you can access metrics.
- An exporter is a piece of software that you deploy right beside the application you want to obtain metrics from.
- It takes requests from Prometheus, gathers the required data from the application, transforms it into the correct format, and finally returns it in a response to Prometheus. In other words, an exporter is like a small one-to-one proxy, converting data between the metrics interface of an application and the Prometheus exposition format. 
- Unlike direct instrumentation, exporters use a different style of instrumentation known as **custom collectors** or **ConstMetrics**.
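The exporter flow can be sketched as: on each scrape, read the application's current numbers through whatever interface it offers, build fresh metrics from them, and render them in the Prometheus text exposition format. Nothing is stored between scrapes, which is the idea behind custom collectors and ConstMetrics (in the Go client this is a custom `Collector` emitting `prometheus.MustNewConstMetric` values). `fetch_app_stats` and the `myapp_*` names are hypothetical, and serving the text on `/metrics` is left out.

```python
def fetch_app_stats():
    """Hypothetical stand-in for querying the application's own stats interface."""
    return {"connections": 12, "bytes_sent": 123456}

def format_metric(name, help_text, metric_type, value, labels=None):
    """Render one metric family in the Prometheus text exposition format."""
    label_str = ""
    if labels:
        # Values are not escaped here; real exporters must escape \, " and newlines
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} {metric_type}\n"
        f"{name}{label_str} {value}\n"
    )

def collect(fetch=fetch_app_stats):
    """Called once per scrape: read current stats and convert them.

    Metrics are rebuilt from the application's values every time,
    in the spirit of a custom collector emitting ConstMetrics.
    """
    stats = fetch()
    return "".join([
        format_metric("myapp_connections", "Current open connections.",
                      "gauge", stats["connections"]),
        format_metric("myapp_sent_bytes_total", "Bytes sent since start.",
                      "counter", stats["bytes_sent"]),
    ])

print(collect(), end="")
```

A real exporter would return the output of `collect()` as the body of its `/metrics` HTTP response.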

### Service Discovery

- Once you have all your applications instrumented and your exporters running, Prometheus needs to know where they are.
- Prometheus has integrations with many common service discovery mechanisms, such as Kubernetes, EC2, and Consul.
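One of the simplest mechanisms is file-based discovery (`file_sd_configs`), where another tool writes target groups to a JSON or YAML file in the shape `[{"targets": [...], "labels": {...}}]`. The sketch below expands such a JSON file into one label set per target, storing each address in the special `__address__` label as Prometheus does. It only illustrates the data shape, it is not Prometheus's implementation, and the hosts and labels are made up.

```python
import json

def parse_file_sd(text):
    """Expand file_sd-style target groups into one label set per target.

    Each group looks like {"targets": [...], "labels": {...}}; "labels" is optional.
    """
    targets = []
    for group in json.loads(text):
        common = group.get("labels", {})
        for address in group["targets"]:
            labels = dict(common)          # copy so groups don't share dicts
            labels["__address__"] = address
            targets.append(labels)
    return targets

# Hypothetical discovery file contents
sd_file = """
[
  {"targets": ["10.0.0.1:9100", "10.0.0.2:9100"], "labels": {"env": "prod"}},
  {"targets": ["10.0.1.5:9100"]}
]
"""

for t in parse_file_sd(sd_file):
    print(t)
```

Relabelling then operates on label sets like these, filtering targets or rewriting labels before scraping starts.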

### Scraping

- Service discovery and relabelling give us a list of targets to be monitored.
- Now Prometheus needs to fetch the metrics.
- Prometheus does this by sending an HTTP request called **a scrape**
- The response to the scrape is parsed and ingested into storage
- Several useful metrics are also added in, such as whether the scrape succeeded and how long it took.
- Scrapes happen regularly; usually you would configure it to happen every 10 to 60 seconds for each target.


**NOTE**: Prometheus is a pull-based system. It decides when and what to scrape, based on its configuration. There are also push-based systems, where the monitoring target decides if it is going to be monitored and how often.
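A single scrape can be sketched as: fetch `/metrics` from the target, parse the samples, then attach the synthetic `up` and `scrape_duration_seconds` values that Prometheus records for every scrape. The parser below handles only unlabelled `name value` lines, the fetcher is injectable so the example runs without a live target, and the addresses are hypothetical.

```python
import time
import urllib.request

def http_fetch(url, timeout=10.0):
    """Default fetcher: a plain HTTP GET, as a real scrape does."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

def parse_simple_exposition(text):
    """Parse unlabelled 'name value [timestamp]' lines.

    Comments and labelled lines are skipped; a deliberately tiny subset of the format.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "{" in line:
            continue
        name, value = line.split()[:2]  # raises ValueError on malformed lines
        samples[name] = float(value)
    return samples

def scrape(address, fetch=http_fetch):
    """Scrape one target, adding `up` and `scrape_duration_seconds`."""
    start = time.monotonic()
    try:
        samples = parse_simple_exposition(fetch(f"http://{address}/metrics"))
        up = 1.0
    except Exception:
        samples, up = {}, 0.0  # a failed fetch or parse still yields up=0
    samples["up"] = up
    samples["scrape_duration_seconds"] = time.monotonic() - start
    return samples

# Usage with a fake fetcher, so no real target is needed
fake = lambda url: "# TYPE myapp_connections gauge\nmyapp_connections 12\n"
print(scrape("localhost:8000", fetch=fake))
```

In a pull-based system, it is Prometheus that calls something like `scrape()` for every target on its own schedule (the `scrape_interval` setting); the target only answers when asked.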

### Storage

- Prometheus stores data locally in a custom database.
- Distributed systems are challenging to make reliable, so Prometheus does not attempt to do any form of clustering
- The storage system can handle ingesting millions of samples per second, making it possible to monitor thousands of machines with a single Prometheus server.
- The compression algorithms used can achieve 1.3 bytes per sample on real-world data.
- **An SSD is recommended, but not strictly required**
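The 1.3 bytes-per-sample figure makes rough capacity planning simple arithmetic: ingestion rate × retention × bytes per sample. The sketch assumes a hypothetical load of 100,000 samples per second kept for 15 days, and ignores index and write-ahead-log overhead, so treat the result as a lower-bound estimate.

```python
BYTES_PER_SAMPLE = 1.3  # compression figure quoted above for real-world data

def disk_bytes(samples_per_second, retention_days, bytes_per_sample=BYTES_PER_SAMPLE):
    """Rough on-disk size: ingestion rate x retention x bytes per sample."""
    seconds = retention_days * 24 * 60 * 60
    return samples_per_second * seconds * bytes_per_sample

# Hypothetical workload: 100,000 samples/s kept for 15 days
size = disk_bytes(100_000, 15)
print(f"{size / 1e9:.1f} GB")  # 168.5 GB
```

That is 1.296 × 10¹¹ samples, about 168 GB, which a single server's local disk can hold.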

### Dashboards

- Prometheus has a number of HTTP APIs that allow you to both request raw data and evaluate PromQL queries.
- These can be used to produce graphs and dashboards.
- Out of the box, Prometheus provides the **expression browser**.
- It uses these APIs and is suitable for ad hoc querying and data exploration, but it is not a general dashboard system.
- It is recommended that you use Grafana for dashboards. 
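Dashboards such as Grafana talk to the same HTTP API as the expression browser. The sketch below builds an instant-query URL for the `/api/v1/query` endpoint and parses the JSON it returns (`status`, then `data.result`, where each entry has a `metric` label set and a `[timestamp, "value"]` pair). It assumes a server at `localhost:9090`, Prometheus's default port, and the sample response is hand-written rather than captured from a real server.

```python
import json
import urllib.parse
import urllib.request

def query_url(base, promql):
    """Build an instant-query URL for the /api/v1/query endpoint."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})

def parse_vector(body):
    """Turn an instant-vector response into (labels, float value) pairs."""
    payload = json.loads(body)
    if payload["status"] != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    # Each result carries its label set and a [timestamp, "value-as-string"] pair
    return [(r["metric"], float(r["value"][1])) for r in payload["data"]["result"]]

def instant_query(base, promql):
    """Run a query against a live server (not exercised below)."""
    with urllib.request.urlopen(query_url(base, promql)) as resp:
        return parse_vector(resp.read().decode("utf-8"))

print(query_url("http://localhost:9090", "up"))
# http://localhost:9090/api/v1/query?query=up

sample = ('{"status":"success","data":{"resultType":"vector","result":'
          '[{"metric":{"__name__":"up","job":"node"},"value":[1700000000.0,"1"]}]}}')
print(parse_vector(sample))
```

Sample values arrive as strings in the JSON, which is why the parser converts them with `float()`.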

### Alerts

- Aggregating metrics from thousands of machines on the fly every time you render a graph can get a little laggy.
- Recording rules allow PromQL expressions to be evaluated on a regular basis and their results ingested into the storage engine.
- Alerting rules are another form of rule: they also evaluate PromQL expressions regularly, and any results from those expressions become alerts.
- Alerts are sent to the Alertmanager.
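Both kinds of rule run on a fixed evaluation interval. The toy evaluator below mimics one recording rule (pre-computing a sum and storing it under a new series name) and one alerting rule with a `for`-style delay: the alert goes from inactive to pending when the condition first holds and only fires once it has held for several consecutive evaluations. Real rules are PromQL expressions in rule files; the series name, threshold, and inputs here are hypothetical.

```python
class RuleEvaluator:
    """Toy evaluator for one recording rule and one alerting rule (hypothetical)."""

    def __init__(self, threshold, for_evaluations):
        self.threshold = threshold
        self.for_evaluations = for_evaluations  # stands in for an alert's `for` duration
        self.recorded = []   # recording-rule results "ingested" back into storage
        self.pending_count = 0

    def evaluate(self, per_instance_errors):
        # Recording rule: compute the aggregate once per interval, not per graph render
        total = sum(per_instance_errors.values())
        self.recorded.append(("job:errors:sum", total))

        # Alerting rule: the condition must hold on consecutive evaluations to fire
        if total > self.threshold:
            self.pending_count += 1
        else:
            self.pending_count = 0
        if self.pending_count == 0:
            return "inactive"
        return "firing" if self.pending_count >= self.for_evaluations else "pending"

ev = RuleEvaluator(threshold=10, for_evaluations=2)
inputs = [{"a": 3, "b": 4}, {"a": 8, "b": 5}, {"a": 9, "b": 6}]
states = [ev.evaluate(s) for s in inputs]
print(states)  # ['inactive', 'pending', 'firing']
```

In Prometheus, it is alerts in the firing state that get sent on to the Alertmanager.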

### Alert management

- Alertmanager receives alerts from Prometheus servers and turns them into notifications. 
- Notifications can include email, chat applications such as Slack and services such as PagerDuty.
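- The Alertmanager does more than blindly turn alerts into notifications on a one-to-one basis: related alerts can be grouped into a single notification, notifications can be throttled and routed to different teams, and alerts can be silenced.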
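Grouping is the easiest of these behaviours to illustrate. Loosely modelled on Alertmanager's `group_by` setting, the sketch batches alerts that share the same values for the chosen labels, so three alerts become two notifications. The alert contents and message format are hypothetical.

```python
from collections import defaultdict

def group_alerts(alerts, group_by):
    """Batch alerts that share the same values for the `group_by` labels."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple((label, alert.get(label, "")) for label in group_by)
        groups[key].append(alert)
    return dict(groups)

def notifications(groups):
    """Render one hypothetical notification line per group."""
    lines = []
    for key, members in groups.items():
        labels = ", ".join(f"{k}={v}" for k, v in key)
        lines.append(f"{labels}: {len(members)} alert(s)")
    return lines

alerts = [
    {"alertname": "InstanceDown", "cluster": "eu", "instance": "a:9100"},
    {"alertname": "InstanceDown", "cluster": "eu", "instance": "b:9100"},
    {"alertname": "InstanceDown", "cluster": "us", "instance": "c:9100"},
]
for line in notifications(group_alerts(alerts, ["alertname", "cluster"])):
    print(line)
# alertname=InstanceDown, cluster=eu: 2 alert(s)
# alertname=InstanceDown, cluster=us: 1 alert(s)
```

Grouping by `cluster` rather than `instance` means a whole-cluster outage pages once, not once per machine.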

## What Prometheus Is Not

- Prometheus is not suitable for storing event logs or individual events.
- Nor is it the best choice for high-cardinality data, such as email addresses or usernames.
- Prometheus is designed for operational monitoring, where small inaccuracies and race conditions due to factors like kernel scheduling and failed scrapes are a fact of life.
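High cardinality hurts because every distinct combination of label values is a separate time series. A worst-case count is the product of the number of distinct values per label, as the sketch below computes; the metric and its label counts are hypothetical, and in practice not every combination occurs, so this is an upper bound.

```python
from math import prod

def series_count(label_value_counts):
    """Worst-case series for one metric: product of distinct values per label."""
    return prod(label_value_counts.values())

# Hypothetical metric with modest labels
print(series_count({"instance": 100, "method": 5, "status": 10}))  # 5000
# Adding a per-user label multiplies that by the number of users
print(series_count({"instance": 100, "method": 5, "status": 10, "username": 100_000}))  # 500000000
```

A single `username` label turns a few thousand series into hundreds of millions, which is why such data belongs in logs or an event store instead.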