# What you'll learn

After watching this video, you'll be able to **explain the concept of sampling in logging**.

# Need for Sampling

![image.png](attachment:bf692872-eb55-4e34-8872-1ac704a16c4e.png)

* In Cloud environments and software systems, you may have large-scale applications deployed across multiple instances.
* These applications may generate a high volume of logs, which can become challenging to handle efficiently.
* Sampling in the logging infrastructure can help optimize log processing and reduce costs.

![image.png](attachment:3c82658c-39cd-43ba-9a58-d686e7876e02.png)

* Sampling in logging is the practice of collecting only a subset of log events for analysis or storage.
* Instead of logging every single event or piece of data, a subset is selected for recording, either at random or by some other criterion.
* This can help reduce the amount of storage required for log data and make it easier to manage and analyze.
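
As a minimal sketch of the idea, the snippet below attaches a filter to a standard `logging` handler so that only a random fraction of records is kept. The names `RandomSamplingFilter` and `ListHandler`, the 10% rate, and the seed are assumptions made for illustration and do not come from the video.

```python
import logging
import random


class RandomSamplingFilter(logging.Filter):
    """Keep roughly `rate` of the records that pass through (hypothetical helper)."""

    def __init__(self, rate, seed=None):
        super().__init__()
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be between 0 and 1")
        self.rate = rate
        self._rng = random.Random(seed)  # private RNG so a run can be reproduced

    def filter(self, record):
        # False drops the record; True lets it continue to the handler.
        return self._rng.random() < self.rate


class ListHandler(logging.Handler):
    """Collects records in memory so the effect of sampling can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("sampling_demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger
handler = ListHandler()
handler.addFilter(RandomSamplingFilter(rate=0.1, seed=42))
logger.addHandler(handler)

for i in range(10_000):
    logger.info("request %d handled", i)

kept = len(handler.records)
print(f"kept {kept} of 10000 records")
```

With a rate of 0.1, roughly one record in ten should reach the handler, so storage and processing shrink by about the same factor.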

# Sampling strategies

![image.png](attachment:d1e30457-3de3-49c5-a2b7-e743f2bdc230.png)

Sampling strategies refer to the techniques used to select a subset of log records for analysis and storage.

There are several sampling strategies commonly used in logging.

* **Time-based sampling** selects log records at fixed time intervals, such as every minute or every hour.
* **Size-based sampling** selects log records based on their size, such as selecting only records that exceed a certain size threshold.
* **Random sampling** randomly selects log records from a larger set for analysis.
* **Event-based sampling** selects log records based on specific events, such as errors or warnings.
* Finally, **weighted sampling** assigns weights to log records based on their importance or relevance, and then samples accordingly.

It is important to carefully consider which strategy is most appropriate for your needs before implementing it in your logging system.
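
The five strategies above can be sketched as small Python functions. This is one possible interpretation under stated assumptions, not a reference implementation: the record layout (`ts`, `level`, `msg`), the function names, the thresholds, and the weights in `LEVEL_WEIGHTS` are all hypothetical.

```python
import random

# Assumed keep-probabilities per level for weighted sampling (illustrative values).
LEVEL_WEIGHTS = {"ERROR": 1.0, "WARNING": 0.5, "INFO": 0.05}


def time_based(records, interval_s):
    """Keep the first record seen in each `interval_s`-second window."""
    kept, last_window = [], None
    for rec in records:  # records are assumed to be ordered by timestamp
        window = int(rec["ts"] // interval_s)
        if window != last_window:
            kept.append(rec)
            last_window = window
    return kept


def size_based(records, min_bytes):
    """Keep records whose encoded message is larger than `min_bytes`."""
    return [r for r in records if len(r["msg"].encode("utf-8")) > min_bytes]


def random_sample(records, rate, rng):
    """Keep each record independently with probability `rate`."""
    return [r for r in records if rng.random() < rate]


def event_based(records, levels=("ERROR", "WARNING")):
    """Keep only records whose level marks an event of interest."""
    return [r for r in records if r["level"] in levels]


def weighted(records, weights, rng):
    """Keep each record with a probability that depends on its level."""
    return [r for r in records if rng.random() < weights.get(r["level"], 0.0)]


# Synthetic data: 1000 records, one every 0.5 s, mostly INFO.
records = [
    {
        "ts": i * 0.5,
        "level": "ERROR" if i % 100 == 0 else "WARNING" if i % 20 == 0 else "INFO",
        "msg": f"event {i}" + " payload" * (i % 5),
    }
    for i in range(1000)
]

rng = random.Random(0)
print("time-based :", len(time_based(records, interval_s=60)))
print("size-based :", len(size_based(records, min_bytes=30)))
print("random     :", len(random_sample(records, rate=0.1, rng=rng)))
print("event-based:", len(event_based(records)))
print("weighted   :", len(weighted(records, LEVEL_WEIGHTS, rng)))
```

Note the trade-off visible in the sketch: event-based and weighted sampling keep every error record here, whereas purely random or time-based sampling may drop them.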


# Examples of sampling

![image.png](attachment:48ea8115-4a6a-4be0-b65e-dda6132c5bc7.png)

Let's explore a few examples of how sampling can be used in observability to gain insights into system performance and identify potential issues.
* **CPU sampling** is about sampling the CPU usage of an application at regular intervals to determine its performance characteristics.
* **Network packet sampling** is about collecting a sample of network packets flowing through the network to identify issues with the network traffic.
* **Sampling tracing data** from distributed systems helps in identifying bottlenecks and other issues.
* **Log sampling** is about collecting logs from various sources across the system, sampling them for analysis and identifying unusual trends or patterns.
* **Error rate sampling** is about identifying and sampling errors generated by an application to track down critical issues that need immediate attention.
* **User behavior sampling** is about analyzing user behavior through sampling data such as click streams, mouse movements, and other user interactions to improve the user experience.
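
To make the tracing example more concrete, here is a hedged sketch of one common way to sample traces: deriving the keep/drop decision from a hash of the trace ID, so that every service handling the same trace reaches the same decision. The video does not prescribe this method; the function name `keep_trace`, the ID format, and the 5% rate are assumptions.

```python
import hashlib


def keep_trace(trace_id: str, rate: float) -> bool:
    """Deterministically keep about `rate` of traces, based on a hash of the ID."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


# Hypothetical trace IDs; real systems typically use random 128-bit IDs.
trace_ids = [f"trace-{i:06d}" for i in range(20_000)]
kept = [t for t in trace_ids if keep_trace(t, 0.05)]
print(f"kept {len(kept)} of {len(trace_ids)} traces")
```

Because the decision depends only on the trace ID, a trace is either kept in full or dropped in full, which avoids storing partial traces.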


# Advantages of sampling in observability

![image.png](attachment:14c4541b-53d9-4696-8f3c-a0f047795940.png)

Next, let's identify the advantages of using sampling in observability.
* **Reduced overhead**: Sampling reduces the amount of data collected, which in turn reduces the computational overhead and storage requirements.
* **Enhanced performance**: With less data to process, analysis and visualization can be performed more quickly, resulting in faster response times.
* **Cost-effectiveness**: By using sampling, organizations can reduce their storage costs while still getting valuable insights into system behavior.
* **Scalability**: Sampling can help with scalability challenges by allowing organizations to analyze a small portion of the data, making it easier to scale up monitoring capabilities as needed.
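
A back-of-the-envelope calculation shows how the storage saving scales with the sampling rate. The event rate and average event size below are made-up numbers chosen only to illustrate the arithmetic, and `daily_storage_gb` is a hypothetical helper.

```python
def daily_storage_gb(events_per_second, avg_event_bytes, sample_rate):
    """Estimate log storage per day, in gigabytes (1 GB = 1e9 bytes)."""
    seconds_per_day = 24 * 60 * 60
    return events_per_second * avg_event_bytes * seconds_per_day * sample_rate / 1e9


# Assumed workload: 5,000 events per second, 400 bytes per event.
full = daily_storage_gb(5_000, 400, sample_rate=1.0)
sampled = daily_storage_gb(5_000, 400, sample_rate=0.1)
print(f"full logging: {full:.2f} GB/day")
print(f"10% sampling: {sampled:.2f} GB/day")
```

Under these assumptions, storage drops linearly with the sampling rate: keeping 10% of events needs about a tenth of the space.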

# Disadvantages of sampling in observability

![image.png](attachment:288d686f-57ca-46a2-abb4-5738aad6cf11.png)

It's also important to explore the disadvantages of sampling in observability.
* **Missing details**:
    * You should understand that sampling may not capture all the details.
    * Analyzing only a portion of the data can result in important information being overlooked, potentially impacting the system.
* **Inaccurate**:
    * Sampling may not be accurate and does not always represent the actual data.
    * The accuracy of sample data depends on how well the sample represents the overall population.
* **Limited resolution**:
    * Sampling in monitoring systems limits resolution.
    * With only a portion of the events captured, obtaining a detailed view of the system activity is challenging.
* **Masked outliers**:
    * Sampling may also mask outliers.
    * Rare outliers missed by sampling can contain vital insights into system issues, so problems may go unnoticed.
* Finally, sampling can make it **challenging to diagnose complex performance issues** that arise from interactions between multiple variables and dependencies within a system.
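
The risk of masking rare events can be quantified with a short calculation. The sketch below assumes each event is kept independently with probability `rate` (simple random sampling); the function names and the numbers used are illustrative assumptions.

```python
import random


def prob_all_missed(occurrences: int, rate: float) -> float:
    """Chance that none of `occurrences` independent events lands in the sample."""
    return (1.0 - rate) ** occurrences


def estimate_total(sampled_count: int, rate: float) -> float:
    """Scale a sampled count back up to an estimate for the full population."""
    return sampled_count / rate


true_errors = 30  # rare errors hidden among many events (made-up number)
rate = 0.01       # keep 1% of all events

# Probability that the sample contains no trace of the error at all.
print(f"P(all {true_errors} errors missed) = {prob_all_missed(true_errors, rate):.2f}")

# Simulate one sampling run and scale the count back up.
rng = random.Random(7)
sampled_errors = sum(rng.random() < rate for _ in range(true_errors))
print(f"sampled errors: {sampled_errors}, "
      f"estimated total: {estimate_total(sampled_errors, rate):.0f}")
```

With 30 occurrences and a 1% rate, the chance of missing every one of them is about 0.74, and any estimate scaled up from so few sampled events can only take coarse values (0, 100, 200, and so on), which illustrates both the masked-outlier and the accuracy concerns.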

# Summary

![image.png](attachment:7ac1341d-c998-4efa-8477-1c869836da26.png)

In this video, you learned that:
* Sampling in logging is the practice of collecting only a subset of log events for analysis and storage.
* Sampling strategies commonly used in logging are time-based sampling, size-based sampling, random sampling, event-based sampling, and weighted sampling.
* Some of the advantages of using sampling are reduced overhead, enhanced performance, cost-effectiveness, and scalability.
* Some of the disadvantages are missing details, inaccurate data, limited resolution, and masked outliers.
