# Monitoring and Logging in Kubernetes

## Metrics Versus Logs

#### You first need to understand the difference between log collection and metrics collection. They are complementary but serve different purposes:

#### Metrics
- A series of numbers measured over a period of time.

#### Logs
- Logs keep track of what happens while a program is running, including any errors, warnings, or notable events that occur.



#### An example of where you need both metrics and logs is an application that is performing poorly. The first indication of the problem might be an alert for high latency on the pods hosting the application, but the metrics alone might not reveal the cause. You can then look at the logs to investigate the errors the application is emitting.
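#### The workflow above can be sketched in a few lines of Python. This is a minimal illustration, not a real alerting pipeline: the latency samples, the log entries, the 500 ms threshold, and the 30-second correlation window are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    timestamp: int  # seconds since an arbitrary start time
    level: str
    message: str


def high_latency_timestamps(samples, threshold_ms):
    """Return the timestamps of latency samples above the threshold (the metric side)."""
    return [ts for ts, latency_ms in samples if latency_ms > threshold_ms]


def errors_near(logs, timestamps, window_s=30):
    """Return ERROR log entries within window_s seconds of any flagged timestamp (the log side)."""
    return [
        entry
        for entry in logs
        if entry.level == "ERROR"
        and any(abs(entry.timestamp - ts) <= window_s for ts in timestamps)
    ]


# Hypothetical metric series: (timestamp in seconds, request latency in ms).
latency_samples = [(0, 120), (60, 135), (120, 980), (180, 1040)]

# Hypothetical application logs.
logs = [
    LogEntry(10, "INFO", "request served"),
    LogEntry(115, "ERROR", "database connection timed out"),
    LogEntry(170, "ERROR", "database connection timed out"),
    LogEntry(400, "ERROR", "unrelated failure"),
]

flagged = high_latency_timestamps(latency_samples, threshold_ms=500)
suspects = errors_near(logs, flagged)
for entry in suspects:
    print(entry.timestamp, entry.message)
```

#### The metric tells you when something went wrong; the log entries close to those timestamps suggest what went wrong.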

## Monitoring Techniques

#### Closed-box monitoring focuses on monitoring from the outside of an application and is what’s been used traditionally when monitoring systems for components like CPU, memory, storage, and so on. Closed-box monitoring can still be useful for monitoring at the infrastructure level, but it lacks insights and context into how the application is operating. For example, to test whether a cluster is healthy, we might schedule a pod, and if it’s successful, we know that the scheduler and service discovery are healthy within our cluster, so we can assume the cluster components are healthy.
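#### A closed-box check like the one described above could look roughly like the following sketch. It assumes a hypothetical `get_phase` callable that returns the phase of a canary pod (`Pending`, `Running`, or `Failed`); in a real setup that callable would query the Kubernetes API, which is left out here so the example stays self-contained.

```python
import time


def closed_box_probe(get_phase, timeout_s=30.0, interval_s=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll a canary pod's phase from the outside until it is Running or the timeout expires.

    get_phase is a hypothetical callable supplied by the caller; the probe knows
    nothing about the internals of the cluster, only the observed outcome.
    """
    deadline = clock() + timeout_s
    while True:
        phase = get_phase()
        if phase == "Running":
            return True
        if phase == "Failed" or clock() >= deadline:
            return False
        sleep(interval_s)


# Simulated phases for a canary pod that is scheduled after two polls.
simulated = iter(["Pending", "Pending", "Running"])
healthy = closed_box_probe(lambda: next(simulated), sleep=lambda _: None)
print("cluster can schedule pods:", healthy)
```

#### Note what the probe cannot tell you: if it fails, you know the cluster is unhealthy, but not which component is responsible.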

#### Open-box monitoring focuses on the details in the context of the application state, such as total HTTP requests, number of 500 errors, latency of requests, and so on. With open-box monitoring, we can begin to understand the why of our system state. It allows us to ask, “Why did the disk fill up?” and not just state, “The disk filled up.”
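#### To make open-box monitoring concrete, here is a minimal sketch of an application instrumenting itself. The `RequestMetrics` class and the `instrumented` wrapper are hypothetical names for this example; a production service would normally use a metrics client library rather than hand-rolled counters.

```python
import time
from collections import Counter


class RequestMetrics:
    """In-process counters for total requests, responses by status, and latencies."""

    def __init__(self):
        self.requests_total = 0
        self.responses_by_status = Counter()
        self.latencies_s = []

    def observe(self, status_code, latency_s):
        self.requests_total += 1
        self.responses_by_status[status_code] += 1
        self.latencies_s.append(latency_s)

    def error_ratio(self):
        """Fraction of requests that ended in a 5xx response."""
        if self.requests_total == 0:
            return 0.0
        server_errors = sum(
            count for code, count in self.responses_by_status.items()
            if 500 <= code < 600
        )
        return server_errors / self.requests_total


def instrumented(metrics, handler):
    """Wrap a handler (request -> status code) so every call is recorded."""
    def wrapper(request):
        start = time.perf_counter()
        status = 500  # recorded as a server error if the handler raises
        try:
            status = handler(request)
            return status
        finally:
            metrics.observe(status, time.perf_counter() - start)
    return wrapper


metrics = RequestMetrics()
handle = instrumented(metrics, lambda path: 500 if path == "/boom" else 200)
for path in ["/", "/", "/boom", "/"]:
    handle(path)
print(metrics.requests_total, metrics.error_ratio())
```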

## Monitoring Patterns
#### You might look at monitoring and say, “How difficult can this be? We’ve always monitored our systems.” The concept of monitoring isn’t new, and we have many tools at our disposal to help us understand how our systems are performing. But platforms like Kubernetes are much more dynamic and transient, so you’ll need to change your thinking about how to monitor these environments. For example, when monitoring a virtual machine (VM) you expect that VM to be up 24/7 and all its state preserved. In Kubernetes, pods can be very dynamic and short-lived, so you need to have monitoring in place that can handle this dynamic and transient nature.

#### There are two monitoring patterns to focus on when monitoring distributed systems. The USE method, popularized by Brendan Gregg, focuses on the following:

- U—Utilization
- S—Saturation
- E—Errors

#### This method is focused on infrastructure monitoring because there are limitations on using it for application-level monitoring. The USE method is described as “For every resource, check utilization, saturation, and error rates.” This method lets you quickly identify resource constraints and error rates of your systems. For example, to check the health of the network for your nodes in the cluster, you will want to monitor the utilization, saturation, and error rate to be able to easily identify any network bottlenecks or errors in the network stack. The USE method is a tool in a larger toolbox and is not the only method you will utilize to monitor your systems.
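#### As a rough sketch of the network example, the following code derives the three USE signals from two readings of a node's interface counters. The `InterfaceSample` fields and the sample numbers are hypothetical, and using dropped packets as the saturation signal is an assumption; other proxies, such as queue length, are equally valid.

```python
from dataclasses import dataclass


@dataclass
class InterfaceSample:
    bytes_total: int    # cumulative bytes sent and received
    dropped_total: int  # cumulative dropped packets (used here as a saturation proxy)
    errors_total: int   # cumulative packet errors


def use_report(prev, curr, interval_s, link_capacity_bytes_per_s):
    """Compute utilization, saturation, and errors between two counter readings."""
    throughput = (curr.bytes_total - prev.bytes_total) / interval_s
    return {
        "utilization": throughput / link_capacity_bytes_per_s,
        "saturation_drops_per_s": (curr.dropped_total - prev.dropped_total) / interval_s,
        "errors_per_s": (curr.errors_total - prev.errors_total) / interval_s,
    }


# Two hypothetical readings taken 60 seconds apart on a 1 Gbit/s link
# (1 Gbit/s is 125,000,000 bytes per second).
previous = InterfaceSample(bytes_total=0, dropped_total=0, errors_total=0)
current = InterfaceSample(bytes_total=750_000_000, dropped_total=120, errors_total=6)

report = use_report(previous, current, interval_s=60,
                    link_capacity_bytes_per_s=125_000_000)
print(report)
```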

#### Another monitoring approach, called the RED method, was popularized by Tom Wilkie. The RED method focuses on the following:

- R—Rate
- E—Errors
- D—Duration

#### The philosophy behind the RED method was taken from Google’s Four Golden Signals:

#### Latency
- How long it takes to serve a request

#### Traffic
- How much demand is placed on your system

#### Errors
- The rate of requests that are failing

#### Saturation
- How utilized your service is

#### As an example, you could use the RED method to monitor a frontend service running in Kubernetes and answer the following questions:

- How many requests is my frontend service processing?
- How many 500 errors are users of the service receiving?
- Is the service overutilized by requests?
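#### A minimal sketch of how the RED signals could be computed for such a frontend service is shown below. The `Request` records and the 5-second window are hypothetical, and the 95th percentile uses a simple nearest-rank calculation; a monitoring system would typically derive these values from counters and histograms instead of raw request lists.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status_code: int
    duration_s: float


def red_summary(requests, window_s):
    """Summarize Rate, Errors, and Duration for requests seen in one time window."""
    count = len(requests)
    errors = sum(1 for r in requests if r.status_code >= 500)
    durations = sorted(r.duration_s for r in requests)
    if count:
        # Nearest-rank 95th percentile, computed with integer arithmetic.
        rank = (95 * count + 99) // 100
        p95 = durations[rank - 1]
    else:
        p95 = 0.0
    return {
        "rate_per_s": count / window_s,
        "error_ratio": errors / count if count else 0.0,
        "p95_duration_s": p95,
    }


# Hypothetical traffic: nine fast successes and one slow server error in 5 seconds.
observed = [Request(200, 0.1) for _ in range(9)] + [Request(500, 2.0)]
summary = red_summary(observed, window_s=5)
print(summary)
```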

#### The USE and RED methods are complementary given that the USE method focuses on the infrastructure components and the RED method focuses on monitoring the end-user experience for the application.


## Kubernetes Metrics Overview
#### Now that we know the different monitoring techniques and patterns, let’s look at what components you should be monitoring in your Kubernetes cluster. A Kubernetes cluster consists of control-plane components and node components. The control-plane components consist of the API server, etcd, scheduler, and controller manager. The nodes consist of the kubelet, container runtime, kube-proxy, kube-dns, and pods. You need to monitor all these components to ensure a healthy cluster and application.

## cAdvisor

#### Container Advisor, or cAdvisor, is an open source project that collects resource usage and performance metrics for containers running on a node. cAdvisor is built into the Kubernetes kubelet, which runs on every node in the cluster. It collects memory and CPU metrics through the Linux control group (cgroup) tree. If you are not familiar with cgroups, they are a Linux kernel feature that allows resources such as CPU, disk I/O, and network I/O to be isolated. cAdvisor also collects disk metrics through statfs, which is built into the Linux kernel. These are implementation details you don’t really need to worry about, but you should understand how these metrics are exposed and the type of information you can collect. You should consider cAdvisor the source of truth for all container metrics.
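#### To give a feel for how these metrics are exposed, the sketch below parses a small, hand-written sample in the Prometheus text format and sums CPU time per pod. The metric names `container_cpu_usage_seconds_total` and `container_memory_working_set_bytes` are ones cAdvisor exposes, but the sample values, the label set, and the simplified parser are assumptions made for illustration; the parser does not cover the full exposition format.

```python
import re
from collections import defaultdict

# Simplified patterns: metric name, optional {labels}, then a numeric value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# Hand-written sample data, not captured from a real node.
EXPOSITION = """
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{container="web",namespace="default",pod="web-1"} 12.5
container_cpu_usage_seconds_total{container="sidecar",namespace="default",pod="web-1"} 1.5
container_cpu_usage_seconds_total{container="",namespace="default",pod="web-1"} 14.0
# TYPE container_memory_working_set_bytes gauge
container_memory_working_set_bytes{container="web",namespace="default",pod="web-1"} 104857600
"""


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def cpu_seconds_by_pod(samples):
    """Sum cumulative CPU seconds per (namespace, pod) across named containers."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        # Entries with an empty container label are treated as aggregates here
        # and skipped so that usage is not counted twice.
        if name == "container_cpu_usage_seconds_total" and labels.get("container"):
            totals[(labels["namespace"], labels["pod"])] += value
    return dict(totals)


samples = parse_samples(EXPOSITION)
print(cpu_seconds_by_pod(samples))
```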

## Metrics Server

#### Kubernetes exposes metrics to consumers through two APIs. First, the canonical implementation of the Resource Metrics API is the Metrics Server. The Metrics Server gathers resource metrics such as CPU and memory from the kubelet’s API and stores them in memory. Kubernetes uses these resource metrics in the scheduler, Horizontal Pod Autoscaler (HPA), and Vertical Pod Autoscaler (VPA).
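#### The Resource Metrics API is served under the `metrics.k8s.io` API group and returns usage as Kubernetes quantity strings such as `250m` or `64Mi`. The sketch below totals the usage per pod from a hand-written response shaped like a `PodMetricsList`. The pod names and values are made up, and the quantity helpers handle only a small subset of suffixes (`n`, `u`, `m` for CPU; `Ki`, `Mi`, `Gi` for memory), which is an assumption made to keep the example short.

```python
import json

CPU_DIVISORS = {"n": 1_000_000_000, "u": 1_000_000, "m": 1_000}
MEMORY_MULTIPLIERS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}

# Hand-written example response, not captured from a real cluster.
RESPONSE = """
{
  "kind": "PodMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "items": [
    {
      "metadata": {"name": "web-1", "namespace": "default"},
      "containers": [
        {"name": "web", "usage": {"cpu": "250m", "memory": "64Mi"}},
        {"name": "sidecar", "usage": {"cpu": "500m", "memory": "32Mi"}}
      ]
    }
  ]
}
"""


def cpu_cores(quantity):
    """Convert a CPU quantity string to cores (subset of suffixes only)."""
    suffix = quantity[-1]
    if suffix in CPU_DIVISORS:
        return float(quantity[:-1]) / CPU_DIVISORS[suffix]
    return float(quantity)


def memory_bytes(quantity):
    """Convert a memory quantity string to bytes (subset of suffixes only)."""
    suffix = quantity[-2:]
    if suffix in MEMORY_MULTIPLIERS:
        return int(quantity[:-2]) * MEMORY_MULTIPLIERS[suffix]
    return int(quantity)


def pod_usage(pod_metrics_list):
    """Return {pod name: (total cores, total bytes)} summed over containers."""
    usage = {}
    for item in pod_metrics_list["items"]:
        cores = sum(cpu_cores(c["usage"]["cpu"]) for c in item["containers"])
        memory = sum(memory_bytes(c["usage"]["memory"]) for c in item["containers"])
        usage[item["metadata"]["name"]] = (cores, memory)
    return usage


usage = pod_usage(json.loads(RESPONSE))
print(usage)
```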

#### Second, the Custom Metrics API allows monitoring systems to collect arbitrary metrics. This allows monitoring solutions to build custom adapters that will allow for extending outside the core resource metrics. For example, Prometheus built one of the first custom metrics adapters, which allows you to use the HPA based on a custom metric. This opens up better scaling based on your use case because now you can bring in metrics like queue size and scale based on a metric that might be external to Kubernetes.
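#### To see why a custom metric such as queue size is useful for scaling, it helps to look at the core calculation the HPA documentation describes: the desired replica count is the current replica count multiplied by the ratio of the current metric value to the target value, rounded up. The sketch below applies that formula to a hypothetical queue-length metric. It deliberately ignores the tolerance, stabilization windows, and readiness handling that the real controller applies, and the `min_replicas` and `max_replicas` defaults are made-up values.

```python
import math


def desired_replicas(current_replicas, current_metric_value, target_metric_value,
                     min_replicas=1, max_replicas=10):
    """Apply the basic HPA scaling formula, then clamp to the allowed range."""
    desired = math.ceil(current_replicas * current_metric_value / target_metric_value)
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical scenario: 4 workers, target of 100 queued messages per worker.
print(desired_replicas(4, current_metric_value=150, target_metric_value=100))  # scale up
print(desired_replicas(4, current_metric_value=30, target_metric_value=100))   # scale down
```

#### With 150 queued messages per worker against a target of 100, the formula asks for 6 replicas; with 30 per worker it asks for 2.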

## kube-state-metrics
#### kube-state-metrics is a Kubernetes add-on that monitors the objects stored in Kubernetes. Where cAdvisor and Metrics Server provide detailed metrics on resource usage, kube-state-metrics focuses on identifying conditions on the Kubernetes objects deployed to your cluster. It can answer questions such as the following:

#### Pods
- How many pods are deployed to the cluster?
- How many pods are in a pending state?
- Are there enough resources to serve a pod’s request?

#### Deployments
- How many pods are in a running state versus a desired state?
- How many replicas are available?
- What deployments have been updated?

#### Nodes
- What’s the status of my nodes?
- What are the allocatable CPU cores in my cluster?
- Are there any nodes that are unschedulable?

#### Jobs
- When did a job start?
- When did a job complete?
- How many jobs failed?
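#### A few of the questions above can be answered directly from kube-state-metrics series. The sketch below works on samples that are assumed to be already parsed into `(name, labels, value)` tuples. The metric names (`kube_pod_status_phase`, `kube_deployment_spec_replicas`, `kube_deployment_status_replicas_available`, `kube_node_spec_unschedulable`) are ones kube-state-metrics exposes, while the object names and values are hypothetical.

```python
# Hypothetical, pre-parsed samples: (metric name, labels, value).
SAMPLES = [
    ("kube_pod_status_phase", {"namespace": "default", "pod": "web-1", "phase": "Running"}, 1),
    ("kube_pod_status_phase", {"namespace": "default", "pod": "web-2", "phase": "Pending"}, 1),
    ("kube_pod_status_phase", {"namespace": "default", "pod": "web-1", "phase": "Pending"}, 0),
    ("kube_deployment_spec_replicas", {"namespace": "default", "deployment": "web"}, 3),
    ("kube_deployment_status_replicas_available", {"namespace": "default", "deployment": "web"}, 1),
    ("kube_deployment_spec_replicas", {"namespace": "default", "deployment": "api"}, 2),
    ("kube_deployment_status_replicas_available", {"namespace": "default", "deployment": "api"}, 2),
    ("kube_node_spec_unschedulable", {"node": "node-a"}, 0),
    ("kube_node_spec_unschedulable", {"node": "node-b"}, 1),
]


def pending_pods(samples):
    """How many pods are in a pending state? Returns their names."""
    return sorted(
        labels["pod"]
        for name, labels, value in samples
        if name == "kube_pod_status_phase"
        and labels.get("phase") == "Pending"
        and value == 1
    )


def under_replicated_deployments(samples):
    """Which deployments have fewer available replicas than desired?"""
    desired, available = {}, {}
    for name, labels, value in samples:
        key = (labels.get("namespace"), labels.get("deployment"))
        if name == "kube_deployment_spec_replicas":
            desired[key] = value
        elif name == "kube_deployment_status_replicas_available":
            available[key] = value
    return {
        key: (available.get(key, 0), want)
        for key, want in desired.items()
        if available.get(key, 0) < want
    }


def unschedulable_nodes(samples):
    """Are there any nodes that are unschedulable? Returns their names."""
    return sorted(
        labels["node"]
        for name, labels, value in samples
        if name == "kube_node_spec_unschedulable" and value == 1
    )


print(pending_pods(SAMPLES))
print(under_replicated_deployments(SAMPLES))
print(unschedulable_nodes(SAMPLES))
```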