# Sampling Traces using OpenTelemetry

In this section, we will use the Trace example from Section 4 to create 100 traces and demonstrate how Probabilistic and Tail-based Sampling work in OpenTelemetry.

## Probabilistic Sampling

We'll start with Probabilistic Sampling, a form of Head sampling. Head sampling makes the sampling decision as early as possible: the decision to sample or drop a span or trace is made without inspecting the trace as a whole.

For example, the most common form of head sampling is Consistent Probability Sampling, also referred to as **Deterministic Sampling**. In this case, **a sampling decision is made based on the trace ID and the desired percentage of traces to sample**. This ensures that whole traces are sampled, with no missing spans, at a consistent rate, such as 5% of all traces.
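
To make the idea concrete, here is a minimal sketch of a deterministic, trace-ID-based decision. It is an illustration only and not the processor's actual algorithm: the names `should_sample`, `sampled_fraction`, and `RANDOM_BITS` are hypothetical, and using the low 56 bits of the trace ID as the source of randomness is an assumption made for this example.

```python
import random

# Assumption for this sketch: treat the low 56 bits of the 128-bit trace ID
# as a uniformly distributed random number.
RANDOM_BITS = 56


def should_sample(trace_id: int, sampling_percentage: float) -> bool:
    """Return the same keep/drop decision for every span sharing a trace ID."""
    randomness = trace_id & ((1 << RANDOM_BITS) - 1)
    # Keep the trace when its randomness falls below the configured share
    # of the value space.
    threshold = int((sampling_percentage / 100.0) * (1 << RANDOM_BITS))
    return randomness < threshold


def sampled_fraction(num_traces: int, sampling_percentage: float, seed: int = 42) -> float:
    """Generate random trace IDs and report the fraction that would be kept."""
    rng = random.Random(seed)
    kept = sum(
        should_sample(rng.getrandbits(128), sampling_percentage)
        for _ in range(num_traces)
    )
    return kept / num_traces
```

Because the decision depends only on the trace ID and the percentage, every component that sees a span from the same trace reaches the same answer, which is why no spans go missing from a sampled trace.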

The upsides to head sampling are:

* Easy to understand
* Easy to configure
* Efficient
* Can be done at any point in the trace collection pipeline

**The primary downside to head sampling is that it is not possible to make a sampling decision based on data in the entire trace**. For example, you cannot ensure that all traces with an error within them are sampled with head sampling alone. For this situation and many others, you need tail sampling.
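
For contrast, a tail-based decision can only be made once all spans of a trace are available. The following sketch assumes a hypothetical span representation (plain dictionaries with `trace_id` and `status` keys) and a hypothetical helper `keep_error_traces`; it is not how any Collector processor is implemented, only an illustration of why the whole trace is needed.

```python
from collections import defaultdict


def keep_error_traces(spans):
    """Group finished spans by trace ID and keep every trace containing an error.

    `spans` is a list of dicts with (hypothetical) keys "trace_id" and "status".
    """
    traces = defaultdict(list)
    for span in spans:
        traces[span["trace_id"]].append(span)

    # The decision needs every span of the trace, which is exactly the
    # information a head sampler does not have.
    return {
        trace_id: trace_spans
        for trace_id, trace_spans in traces.items()
        if any(s["status"] == "ERROR" for s in trace_spans)
    }
```

A head sampler looking at the `frontend` span alone could not know that a later `payment` span in the same trace will fail.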

## Modify the OpenTelemetry Collector config

Here we will reconfigure our Collector to use the [Probabilistic Sampling Processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/probabilisticsamplerprocessor). This processor supports several modes of sampling for **spans** and **log records**. We will only focus on trace spans in this lab.

For trace spans, this sampler supports probabilistic sampling based on **a configured sampling percentage applied to the TraceID**.

1. Edit `config.yaml` and add `probabilistic_sampler` under the `processors` section:

    ```yaml
    processors:
      probabilistic_sampler:
        sampling_percentage: 15
    ```

2. Include `probabilistic_sampler` in the `service.pipelines.traces.processors` section:

    ```yaml
        traces:
          receivers: [otlp]
          processors: [probabilistic_sampler, batch]
          exporters: [debug/basic, datadog/connector, datadog]
    ```

3. Save the config and restart the Collector.

   

## Import OpenTelemetry Modules for Traces

```python
from opentelemetry import baggage, trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    ConsoleSpanExporter,
    BatchSpanProcessor
)
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.trace import Status, StatusCode
import datetime, random, socket, time, uuid
from tqdm.notebook import tqdm
```

## Send Traces to the Collector

We'll reuse the same example from Section 4 for sending traces, this time sending 100 traces instead of 10.

```python
# Cache one tracer per service so that we don't create a new TracerProvider
# (and a new BatchSpanProcessor) for every span.
_tracers = {}

def getTracer(service_name):
    if service_name not in _tracers:
        provider = TracerProvider(resource=Resource.create({
            "service.name": service_name,
            "service.instance.id": str(uuid.uuid4()),
            "deployment.environment.name": "otel-adventure",
            "host.name": socket.gethostname(),
        }))
        provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True)))
        _tracers[service_name] = trace.get_tracer("python", tracer_provider=provider)
    return _tracers[service_name]

def frontend():
    frontend_tracer = getTracer("frontend")
    with frontend_tracer.start_as_current_span("frontend") as frontend_span:
        handle_checkout()
        frontend_span.set_status(Status(StatusCode.OK))

def handle_checkout():
    checkout_tracer = getTracer("checkout")
    with checkout_tracer.start_as_current_span("checkout") as checkout_span:
        checkout_span.set_attribute("order_num", int(datetime.datetime.timestamp(datetime.datetime.now())*1000) % 100000)
        handle_payment()
        handle_shipping()
        checkout_span.set_status(Status(StatusCode.OK))

def handle_payment():
    payment_tracer = getTracer("payment")
    with payment_tracer.start_as_current_span("payment") as payment_span:
        payment_span.set_attribute("payment_id", str(uuid.uuid4()))
        # Roughly 10% of payments fail; the rest take up to one second.
        if random.random() < 0.1:
            payment_span.set_status(Status(StatusCode.ERROR))
        else:
            time.sleep(random.random())
            payment_span.set_status(Status(StatusCode.OK))

def handle_shipping():
    shipping_tracer = getTracer("shipping")
    with shipping_tracer.start_as_current_span("shipping") as shipping_span:
        shipping_span.set_attribute("tracking_num", str(uuid.uuid4()))

# Each call produces one trace with four spans:
# frontend -> checkout -> (payment, shipping).
for n in tqdm(range(100)):
    frontend()
```

## Verify Results

<div class="alert alert-block alert-warning"><b>DID IT WORK???</b></div>
    
Recall that we configured the **Probabilistic Sampler Processor** to sample 15%. As such, we'd expect roughly 15 of our 100 traces to be sampled and sent to Datadog, while the other 85% should have been dropped.
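
With only 100 traces, the sampled count will rarely be exactly 15. As a rough guide, and assuming each trace's keep/drop outcome behaves like an independent coin flip with probability 15% (an assumption that holds when trace IDs are uniformly random), the expected count and its spread can be estimated as follows. The helper name `expected_sampled` is hypothetical.

```python
import math


def expected_sampled(num_traces, sampling_percentage):
    """Return the mean and standard deviation of the sampled-trace count.

    Models the count as a binomial distribution with n = num_traces and
    p = sampling_percentage / 100.
    """
    p = sampling_percentage / 100.0
    mean = num_traces * p
    std_dev = math.sqrt(num_traces * p * (1 - p))
    return mean, std_dev


mean, std_dev = expected_sampled(100, 15)
print(f"expected sampled traces: {mean:.1f} +/- {std_dev:.1f}")
```

Under this model, seeing anywhere from about 11 to 19 sampled traces in a single run would not be surprising.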

### How can we verify results?

#### probabilistic_sampler metrics

Open the [documentation](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/probabilisticsamplerprocessor/documentation.md) for the `probabilistic_sampler` processor.

There are two metrics emitted by this processor, one for **logs** and one for **traces**:

* `otelcol_processor_probabilistic_sampler_count_logs_sampled`
* `otelcol_processor_probabilistic_sampler_count_traces_sampled`

We'll focus on the second one for **traces**.

To access these metrics, we already have the `telemetry` service configured in our Collector:

```yaml
service:
  telemetry:
    metrics:
      readers:
        - pull:
            exporter:
              prometheus:
                host: "localhost"
                port: 8888
```

Note that it's listening on `localhost:8888`. This service will provide **tons** of insight related to the current health of the Collector's runtime.

### Review the Collector metrics

1. Open the Collector's metrics endpoint: [http://localhost:8888/metrics](http://localhost:8888/metrics) 

2. Search the web page for the metric name: `otelcol_processor_probabilistic_sampler_count_traces_sampled`.

   There should be two instances of the same metric: one with a **label** named `sampled="false"` and one with a **label** named `sampled="true"`:

    ```
    # HELP otelcol_processor_probabilistic_sampler_count_traces_sampled Count of traces that were sampled or not
    # TYPE otelcol_processor_probabilistic_sampler_count_traces_sampled counter
    otelcol_processor_probabilistic_sampler_count_traces_sampled{policy="trace_id_w3c",sampled="false",service_instance_id="47f5a485-cd3a-4626-8c69-530946b01729",service_name="otelcol-contrib",service_version="0.112.0"} 344
    otelcol_processor_probabilistic_sampler_count_traces_sampled{policy="trace_id_w3c",sampled="true",service_instance_id="47f5a485-cd3a-4626-8c69-530946b01729",service_name="otelcol-contrib",service_version="0.112.0"} 56
    ```

    The instance where `sampled="false"` shows the number of trace spans that were *dropped*. The instance where `sampled="true"` shows the spans that were allowed to pass to the output of the processor.

    In the above example, if we add `344` and `56` we get the `400` spans we initially sent (100 traces with 4 spans each). `(56 / 400) * 100%` gives us 14%, which is close to our configured value of `15`.

    <div class="alert alert-block alert-info">NOTE: As more spans are sent, the sampled percentage converges on the configured value.</div>
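
The same check can be scripted instead of searching the page by hand. The sketch below assumes the metric name and the `sampled` label exactly as shown in the output above (other Collector versions may expose a different name), and the helper names `fetch_metrics`, `parse_sampled_counts`, and `sampled_percentage` are hypothetical.

```python
import re
import urllib.request

# Metric name as it appears in the output above; treat it as an assumption
# for other Collector versions.
METRIC = "otelcol_processor_probabilistic_sampler_count_traces_sampled"


def fetch_metrics(url="http://localhost:8888/metrics"):
    """Download the Prometheus text exposition from the Collector."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def parse_sampled_counts(metrics_text, metric=METRIC):
    """Sum the counter values per `sampled` label ("true" / "false")."""
    counts = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip "# HELP" / "# TYPE" comments and unrelated metrics.
        if not line.startswith(metric + "{"):
            continue
        labels, _, value = line.rpartition("} ")
        match = re.search(r'sampled="(true|false)"', labels)
        if match:
            key = match.group(1)
            counts[key] = counts.get(key, 0.0) + float(value)
    return counts


def sampled_percentage(counts):
    """Return the share of spans that passed the sampler, in percent."""
    total = counts.get("true", 0.0) + counts.get("false", 0.0)
    return 100.0 * counts.get("true", 0.0) / total if total else 0.0
```

With the Collector running, `sampled_percentage(parse_sampled_counts(fetch_metrics()))` returns the observed percentage, which can be compared against the configured `sampling_percentage`.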

#### End of Section