
chore: update BuilderQuery struct and add PrepareTimeseriesFilterQuery #4165

Merged — srikanthccv merged 3 commits into develop from 4016-1-filter-sub-query on Dec 12, 2023

Conversation
Conversation

@srikanthccv (Member) commented Dec 6, 2023

Summary

Part 1 of #4016

Overview

Metric types

Primary metric types supported are:

  • Counter: A counter is a (cumulative/delta) metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors.

  • Gauge: A gauge is a metric that represents a single numerical value that can arbitrarily change. It can go up and down. Gauges are typically used for measured values like temperatures or current memory usage, but also "counts" that can go up and down, like the number of concurrent requests.

  • Histogram: A histogram samples observations (usually things like request durations) and (cumulative/delta) counts them in configurable buckets. This allows for aggregatable calculation of quantiles.

Temporality

Temporality is the state of existing within or having some relationship with time. In the context of metrics, it describes how the reported metric value relates to time. There are two types of temporality:

  • Cumulative: A cumulative metric reports the total value accumulated since a fixed start time (typically process start). The reported values are non-negative floating-point numbers, increase monotonically, and are reset only when the process restarts.

  • Delta: A delta metric reports the difference between the current value and the previously reported value. Delta values are also non-negative floating-point numbers.

Both cumulative and delta metrics are supported by the metrics service. We strongly recommend using delta temporality whenever possible.

Cumulative Counter

A cumulative counter represents a monotonically increasing count over time, reset only on restart.

Example: Total number of requests served.

| Time  | Value |
|-------|-------|
| 00:00 | 0     |
| 00:10 | 5     |
| 00:20 | 12    |
| 00:30 | 20    |
| 00:40 | 28    |
| 00:50 | 35    |
| 01:00 | 45    |
| 01:10 | 55    |
| 01:20 | 65    |
| 01:30 | 72    |
| 01:40 | 80    |
| 01:50 | 90    |
| 02:00 | 100   |

In this table, each row after 00:00 shows the cumulative count of requests served since the 00:00 report. For instance, at 00:20, there were 12 requests served since the 00:00 report.

Delta Counter

A delta counter shows the difference in count since the last report.

Example: Number of new requests served since last report.

| Time  | Value |
|-------|-------|
| 00:00 | 0     |
| 00:10 | 5     |
| 00:20 | 7     |
| 00:30 | 8     |
| 00:40 | 8     |
| 00:50 | 7     |
| 01:00 | 10    |
| 01:10 | 10    |
| 01:20 | 10    |
| 01:30 | 7     |
| 01:40 | 8     |
| 01:50 | 10    |
| 02:00 | 10    |

In this table, each row after 00:00 shows the count of new requests served since the last report. For instance, at 00:20, there were 7 new requests served since the 00:10 report.
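To make the relationship between the two tables concrete, here is a minimal Go sketch (illustrative only, not SigNoz code) that derives the delta series from the cumulative counter series above, including the usual handling for counter resets:

```go
package main

import "fmt"

func main() {
	// Cumulative counter readings from the table above (one per 10s report).
	cumulative := []float64{0, 5, 12, 20, 28, 35, 45, 55, 65, 72, 80, 90, 100}

	// The delta at each report is the difference from the previous report.
	// A drop in the cumulative value indicates a process restart (reset),
	// in which case the new cumulative value itself is the delta.
	deltas := make([]float64, 0, len(cumulative)-1)
	for i := 1; i < len(cumulative); i++ {
		d := cumulative[i] - cumulative[i-1]
		if d < 0 { // reset detected
			d = cumulative[i]
		}
		deltas = append(deltas, d)
	}
	fmt.Println(deltas) // [5 7 8 8 7 10 10 10 7 8 10 10]
}
```

The output matches the delta counter table above row for row.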

Gauge

A gauge represents a value that can increase or decrease over time.

Example: Current number of active sessions.

| Time  | Value |
|-------|-------|
| 00:00 | 0     |
| 00:10 | 3     |
| 00:20 | 5     |
| 00:30 | 4     |
| 00:40 | 6     |
| 00:50 | 7     |
| 01:00 | 5     |
| 01:10 | 6     |
| 01:20 | 7     |
| 01:30 | 6     |
| 01:40 | 8     |
| 01:50 | 7     |
| 02:00 | 5     |

In this table, each row after 00:00 shows the current number of active sessions. For instance, at 00:20, there were 5 active sessions.

Cumulative Histogram

A cumulative histogram represents a monotonically increasing count of observations over time, reset only on restart.

Example: Response times categorized in buckets (e.g., <100ms, 100-200ms, 200-300ms, >300ms).

| Time  | <100ms | 100-200ms | 200-300ms | >300ms |
|-------|--------|-----------|-----------|--------|
| 00:00 | 0      | 0         | 0         | 0      |
| 00:10 | 5      | 0         | 0         | 0      |
| 00:20 | 10     | 2         | 0         | 0      |
| 00:30 | 15     | 5         | 1         | 0      |
| 00:40 | 20     | 8         | 2         | 1      |
| 00:50 | 25     | 12        | 3         | 1      |
| 01:00 | 30     | 15        | 5         | 2      |
| 01:10 | 35     | 18        | 7         | 3      |
| 01:20 | 40     | 21        | 10        | 3      |
| 01:30 | 45     | 24        | 12        | 4      |
| 01:40 | 50     | 28        | 14        | 5      |
| 01:50 | 55     | 32        | 15        | 5      |
| 02:00 | 60     | 35        | 17        | 6      |

In this table, each row after 00:00 shows the cumulative count of observations in each response time bucket. For instance, at 00:20, there were 10 observations with response times under 100ms, 2 observations with response times between 100-200ms, and 0 observations with response times between 200-300ms since the 00:00 report.

Delta Histogram

A delta histogram also counts observations in buckets, but the counts are the difference since the last report.

Example: New response times in the same buckets.

| Time  | <100ms | 100-200ms | 200-300ms | >300ms |
|-------|--------|-----------|-----------|--------|
| 00:00 | 0      | 0         | 0         | 0      |
| 00:10 | 5      | 0         | 0         | 0      |
| 00:20 | 5      | 2         | 0         | 0      |
| 00:30 | 5      | 3         | 1         | 0      |
| 00:40 | 5      | 3         | 1         | 1      |
| 00:50 | 5      | 4         | 1         | 0      |
| 01:00 | 5      | 3         | 2         | 1      |
| 01:10 | 5      | 3         | 2         | 1      |
| 01:20 | 5      | 3         | 3         | 0      |
| 01:30 | 5      | 3         | 2         | 1      |
| 01:40 | 5      | 4         | 2         | 1      |
| 01:50 | 5      | 4         | 1         | 0      |
| 02:00 | 5      | 3         | 2         | 1      |

In this table, each row after 00:00 shows the count of new observations in each response time bucket since the last report. For instance, at 00:20, there were 5 new observations with response times under 100ms and 2 new observations with response times between 100-200ms since the 00:10 report.
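The same subtraction as for counters, applied per bucket, recovers the delta histogram from the cumulative one. A minimal Go sketch (illustrative only, shown for the first few rows of the tables above):

```go
package main

import "fmt"

func main() {
	// Cumulative bucket counts from the table above (00:00 through 01:00).
	// Columns are the buckets <100ms, 100-200ms, 200-300ms, >300ms.
	cumulative := [][]int{
		{0, 0, 0, 0}, {5, 0, 0, 0}, {10, 2, 0, 0}, {15, 5, 1, 0},
		{20, 8, 2, 1}, {25, 12, 3, 1}, {30, 15, 5, 2},
	}
	for i := 1; i < len(cumulative); i++ {
		row := make([]int, len(cumulative[i]))
		for j := range row {
			row[j] = cumulative[i][j] - cumulative[i-1][j] // per-bucket delta
		}
		fmt.Println(row)
	}
	// Output matches the delta histogram rows at 00:10..01:00:
	// [5 0 0 0] [5 2 0 0] [5 3 1 0] [5 3 1 1] [5 4 1 0] [5 3 2 1]
}
```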

Time and Spatial Aggregation Explained for Metrics Data

This document clarifies the concepts of time and spatial aggregation in the context of metrics data analysis.

Time Aggregation

  • Time aggregation is the aggregation of all the measurement values for a time series over a specified aggregation interval.
  • Aggregation interval is dynamically adjusted based on the selected time range.
  • Various aggregation operators are available, including avg, sum, min, max, count, etc.
  • The result is a single value for each aggregation interval.

Spatial Aggregation

  • Spatial aggregation combines data points from multiple time series across one or more spatial dimensions. A spatial dimension could be a host, a region, a cluster, etc.
  • Various aggregation operators are available, including avg, sum, min, max, count, etc.
  • The result is a single value representing the aggregated data for the chosen spatial dimension(s).

The following table shows metrics data from five hosts (h1, h2, h3, h4, h5) spread across three regions (r1, r2, r3). Assume the reported value is memory usage in MB for each host. Timestamps are in mm:ss (minute:second) format and range from 10:00 to 12:30 with a collection interval of 10 seconds, so the data covers 150 seconds. Region r1 has two hosts (h1 and h2), region r2 has one host (h3), and region r3 has two hosts (h4 and h5).

| Time  | (h1, r1) | (h2, r1) | (h3, r2) | (h4, r3) | (h5, r3) |
|-------|----------|----------|----------|----------|----------|
| 10:00 | 45       | 63       | 58       | 32       | 56       |
| 10:10 | 72       | 90       | 87       | 35       | 81       |
| 10:20 | 56       | 85       | 95       | 74       | 72       |
| 10:30 | 73       | 98       | 71       | 63       | 85       |
| 10:40 | 97       | 88       | 56       | 91       | 36       |
| 10:50 | 67       | 48       | 42       | 31       | 76       |
| 11:00 | 65       | 95       | 30       | 35       | 96       |
| 11:10 | 81       | 39       | 68       | 69       | 77       |
| 11:20 | 57       | 75       | 50       | 40       | 43       |
| 11:30 | 54       | 45       | 68       | 48       | 53       |
| 11:40 | 85       | 77       | 39       | 63       | 31       |
| 11:50 | 77       | 52       | 71       | 32       | 88       |
| 12:00 | 30       | 97       | 90       | 51       | 55       |
| 12:10 | 82       | 92       | 83       | 41       | 32       |
| 12:20 | 95       | 37       | 56       | 65       | 91       |
| 12:30 | 53       | 95       | 37       | 94       | 66       |

Raw data at this resolution is too large to display, so we first aggregate on the time axis for each unique series. There are 5 time series in the table above. Using the avg aggregation operator over a 30-second interval gives a representative value for each interval. The result is shown in the following table.

| ts    | (h1, r1) | (h2, r1) | (h3, r2) | (h4, r3) | (h5, r3) |
|-------|----------|----------|----------|----------|----------|
| 10:00 | 57.6667  | 79.3333  | 80       | 47       | 69.6667  |
| 10:30 | 79       | 78       | 56.3333  | 61.6667  | 65.6667  |
| 11:00 | 67.6667  | 69.6667  | 49.3333  | 48       | 72       |
| 11:30 | 72       | 58       | 59.3333  | 47.6667  | 57.3333  |
| 12:00 | 69       | 75.3333  | 76.3333  | 52.3333  | 59.3333  |
| 12:30 | 53       | 95       | 37       | 94       | 66       |

Even this table could be too large to display if there were hundreds of hosts. Spatial aggregation is therefore performed on the result of the time aggregation; here we use the sum aggregation operator to get the total memory usage.

Total Memory Usage for Each Region

| ts    | r1      | r2      | r3      |
|-------|---------|---------|---------|
| 10:00 | 137     | 80      | 116.667 |
| 10:30 | 157     | 56.3333 | 127.333 |
| 11:00 | 137.333 | 49.3333 | 120     |
| 11:30 | 130     | 59.3333 | 105     |
| 12:00 | 144.333 | 76.3333 | 111.667 |
| 12:30 | 148     | 37      | 160     |

Total Memory Usage for Each Host

| ts    | (h1, r1) | (h2, r1) | (h3, r2) | (h4, r3) | (h5, r3) |
|-------|----------|----------|----------|----------|----------|
| 10:00 | 57.6667  | 79.3333  | 80       | 47       | 69.6667  |
| 10:30 | 79       | 78       | 56.3333  | 61.6667  | 65.6667  |
| 11:00 | 67.6667  | 69.6667  | 49.3333  | 48       | 72       |
| 11:30 | 72       | 58       | 59.3333  | 47.6667  | 57.3333  |
| 12:00 | 69       | 75.3333  | 76.3333  | 52.3333  | 59.3333  |
| 12:30 | 53       | 95       | 37       | 94       | 66       |

Note: this table is the same as the time aggregation result because each host is unique, so there are no sub-series per host. Other metrics, such as disk usage, could have sub-series for each host (usage from each partition); in that case, the spatial aggregation result would differ from the time aggregation result.

Total Memory Usage from All Hosts

| ts    | All     |
|-------|---------|
| 10:00 | 333.667 |
| 10:30 | 340.667 |
| 11:00 | 306.667 |
| 11:30 | 294.333 |
| 12:00 | 332.333 |
| 12:30 | 345     |
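To tie the two steps together, here is a minimal Go sketch that reproduces the per-region table above from the raw data: time aggregation first (avg over 30-second intervals, i.e. 3 raw points per interval), then spatial aggregation (sum per region). This is illustrative only; the SigNoz implementation expresses the same two steps in generated ClickHouse SQL.

```go
package main

import "fmt"

func main() {
	// Raw memory usage (MB) per host, one reading every 10s, from the table above.
	hosts := []string{"h1", "h2", "h3", "h4", "h5"}
	region := map[string]string{"h1": "r1", "h2": "r1", "h3": "r2", "h4": "r3", "h5": "r3"}
	raw := map[string][]float64{
		"h1": {45, 72, 56, 73, 97, 67, 65, 81, 57, 54, 85, 77, 30, 82, 95, 53},
		"h2": {63, 90, 85, 98, 88, 48, 95, 39, 75, 45, 77, 52, 97, 92, 37, 95},
		"h3": {58, 87, 95, 71, 56, 42, 30, 68, 50, 68, 39, 71, 90, 83, 56, 37},
		"h4": {32, 35, 74, 63, 91, 31, 35, 69, 40, 48, 63, 32, 51, 41, 65, 94},
		"h5": {56, 81, 72, 85, 36, 76, 96, 77, 43, 53, 31, 88, 55, 32, 91, 66},
	}

	// Step 1: time aggregation — avg over each 30s interval (3 raw points).
	const pointsPerInterval = 3
	timeAgg := map[string][]float64{}
	for _, h := range hosts {
		series := raw[h]
		for i := 0; i < len(series); i += pointsPerInterval {
			end := i + pointsPerInterval
			if end > len(series) {
				end = len(series) // the last interval may be partial
			}
			sum := 0.0
			for _, v := range series[i:end] {
				sum += v
			}
			timeAgg[h] = append(timeAgg[h], sum/float64(end-i))
		}
	}

	// Step 2: spatial aggregation — sum the per-host averages within each region.
	regionAgg := map[string][]float64{}
	for _, h := range hosts {
		r := region[h]
		if regionAgg[r] == nil {
			regionAgg[r] = make([]float64, len(timeAgg[h]))
		}
		for i, v := range timeAgg[h] {
			regionAgg[r][i] += v
		}
	}
	fmt.Println(regionAgg["r1"]) // ≈ [137 157 137.333 130 144.333 148]
}
```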

Default aggregation operators

The default time and space aggregation operators are chosen based on the metric type. The following table shows the defaults for each metric type.

| Metric Type | Default Time Aggregation Operator | Default Space Aggregation Operator |
|-------------|-----------------------------------|------------------------------------|
| Counter     | rate                              | sum                                |
| Gauge       | avg                               | sum                                |
| Histogram   | rate                              | sum                                |

Histograms are a special case because the value isn't a single number but a distribution of observations across buckets. The most common use case is calculating quantiles; the current implementation supports 0.5, 0.9, 0.95, and 0.99. The time and space aggregations produce the distribution of observations in each bucket, and the quantiles are calculated from that distribution.
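As an illustration of how a quantile can be estimated from bucketed counts, here is a Go sketch of interpolation-based bucket quantile estimation, the same idea behind PromQL's histogram_quantile. This is not the SigNoz implementation, and the 400ms upper bound assumed for the open-ended >300ms bucket is invented for the example:

```go
package main

import "fmt"

// quantile estimates the q-th quantile from bucket upper bounds and
// cumulative bucket counts, using linear interpolation within the bucket
// where the target rank falls.
func quantile(q float64, bounds, counts []float64) float64 {
	total := counts[len(counts)-1]
	rank := q * total
	for i, c := range counts {
		if rank <= c {
			lower, prevCount := 0.0, 0.0
			if i > 0 {
				lower, prevCount = bounds[i-1], counts[i-1]
			}
			// Interpolate between the bucket's lower and upper bound.
			width := bounds[i] - lower
			return lower + width*(rank-prevCount)/(c-prevCount)
		}
	}
	return bounds[len(bounds)-1]
}

func main() {
	// Cumulative counts 60, 95, 112, 118 taken from the 02:00 row of the
	// cumulative histogram above (60, 35, 17, 6 per bucket).
	bounds := []float64{100, 200, 300, 400} // 400 is an assumed cap for >300ms
	counts := []float64{60, 95, 112, 118}
	fmt.Printf("p50 ≈ %.1f ms\n", quantile(0.50, bounds, counts)) // ≈ 98.3
	fmt.Printf("p90 ≈ %.1f ms\n", quantile(0.90, bounds, counts)) // ≈ 265.9
}
```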

Implementation Details

The schema of the metrics database tables is as follows:

```sql
CREATE TABLE signoz_metrics.time_series_v2
(
    `metric_name` LowCardinality(String),
    `fingerprint` UInt64 CODEC(DoubleDelta, LZ4),
    `timestamp_ms` Int64 CODEC(DoubleDelta, LZ4),
    `labels` String CODEC(ZSTD(5)),
    `temporality` LowCardinality(String) DEFAULT 'Unspecified' CODEC(ZSTD(5)),
    INDEX temporality_index temporality TYPE SET(3) GRANULARITY 1
)
ENGINE = ReplacingMergeTree
PARTITION BY toDate(timestamp_ms / 1000)
ORDER BY (metric_name, fingerprint)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
```

  • metric_name: Name of the metric
  • fingerprint: Fingerprint of the metric. This is used to identify the metric uniquely. Currently,
    we are using the hash of the labels to generate the fingerprint.
  • timestamp_ms: Timestamp, in milliseconds, of when the metric was first observed
  • labels: Labels of the metric; Stored as a JSON string
  • temporality: Temporality of the metric. This is used to identify the type of the metric. It can
    be one of the following values:
    • Unspecified: This is the default value.
    • Cumulative: This is used for monotonic counters.
    • Delta: This is used for non-monotonic counters.
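For illustration, one deterministic way to derive such a fingerprint is to hash the sorted key=value pairs. This is a sketch of the idea only; the actual SigNoz hashing scheme may differ:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// fingerprint hashes a label set deterministically: sort the keys, then
// feed key/value pairs into an FNV-1a hash. Illustrative only.
func fingerprint(labels map[string]string) uint64 {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := fnv.New64a()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0xff}) // separator avoids ambiguous concatenations
		h.Write([]byte(labels[k]))
		h.Write([]byte{0xff})
	}
	return h.Sum64()
}

func main() {
	fmt.Println(fingerprint(map[string]string{
		"__name__": "system_memory_usage",
		"state":    "used",
		"host":     "h1",
	}))
}
```
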
```sql
CREATE TABLE signoz_metrics.samples_v2
(
    `metric_name` LowCardinality(String),
    `fingerprint` UInt64 CODEC(DoubleDelta, LZ4),
    `timestamp_ms` Int64 CODEC(DoubleDelta, LZ4),
    `value` Float64 CODEC(Gorilla, LZ4)
)
ENGINE = MergeTree
PARTITION BY toDate(timestamp_ms / 1000)
ORDER BY (metric_name, fingerprint, timestamp_ms)
TTL toDateTime(timestamp_ms / 1000) + toIntervalSecond(2592000)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
```

  • metric_name: Name of the metric
  • fingerprint: Fingerprint of the metric. This is used to identify the metric uniquely. Currently,
    we are using the hash of the labels to generate the fingerprint.
  • timestamp_ms: Timestamp of the metric in milliseconds
  • value: Value of the metric

Query preparation

As there are two tables for metrics, any query on metrics needs to join them. First, we get the fingerprints of the series that match the query criteria from the time_series_v2 table. Then, we join the samples_v2 table with the time_series_v2 table to get the actual metric values. There are three to four steps in the query preparation:

  1. Get the fingerprints of the metrics that match the query criteria from the time_series_v2 table.
  2. Join the samples_v2 table with the time_series_v2 table to get the actual metric values.
  3. Apply the time aggregation operator on the metric values.
  4. Apply the space aggregation operator on the metric values.

A typical query looks like the following:

```sql
SELECT
    ts,
    sum(per_series_value) AS value
FROM
(
    SELECT
        fingerprint,
        toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,
        avg(value) AS per_series_value
    FROM signoz_metrics.distributed_samples_v2
    INNER JOIN
    (
        SELECT DISTINCT fingerprint
        FROM signoz_metrics.time_series_v2
        WHERE (metric_name = 'system_memory_usage') AND (temporality = 'Unspecified') AND (JSONExtractString(labels, 'state') != 'idle')
    ) AS filtered_time_series USING (fingerprint)
    WHERE (metric_name = 'system_memory_usage') AND (timestamp_ms >= 1701794980000) AND (timestamp_ms <= 1701796780000)
    GROUP BY
        fingerprint,
        ts
    ORDER BY
        fingerprint ASC,
        ts ASC
)
WHERE isNaN(per_series_value) = 0
GROUP BY
    GROUPING SETS (
        (ts),
        ())
ORDER BY ts ASC
```

The query can be broken down into the following steps:

  1. Get the fingerprints of the metrics that match the query criteria from the time_series_v2 table.

```sql
SELECT DISTINCT fingerprint
FROM signoz_metrics.time_series_v2
WHERE (metric_name = 'system_memory_usage') AND (temporality = 'Unspecified') AND (JSONExtractString(labels, 'state') != 'idle')
```
  2. Join the tables and apply the time aggregation operator on the metric values.

```sql
SELECT
    fingerprint,
    toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,
    avg(value) AS per_series_value
FROM signoz_metrics.distributed_samples_v2
INNER JOIN
(
    SELECT DISTINCT fingerprint
    FROM signoz_metrics.time_series_v2
    WHERE (metric_name = 'system_memory_usage') AND (temporality = 'Unspecified') AND (JSONExtractString(labels, 'state') != 'idle')
) AS filtered_time_series USING (fingerprint)
WHERE (metric_name = 'system_memory_usage') AND (timestamp_ms >= 1701794980000) AND (timestamp_ms <= 1701796780000)
GROUP BY
    fingerprint,
    ts
ORDER BY
    fingerprint ASC,
    ts ASC
```
  3. Apply the space aggregation operator on the metric values.

```sql
SELECT
    ts,
    sum(per_series_value) AS value
FROM
(
    SELECT
        fingerprint,
        toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,
        avg(value) AS per_series_value
    FROM signoz_metrics.distributed_samples_v2
    INNER JOIN
    (
        SELECT DISTINCT fingerprint
        FROM signoz_metrics.time_series_v2
        WHERE (metric_name = 'system_memory_usage') AND (temporality = 'Unspecified') AND (JSONExtractString(labels, 'state') != 'idle')
    ) AS filtered_time_series USING (fingerprint)
    WHERE (metric_name = 'system_memory_usage') AND (timestamp_ms >= 1701794980000) AND (timestamp_ms <= 1701796780000)
    GROUP BY
        fingerprint,
        ts
    ORDER BY
        fingerprint ASC,
        ts ASC
)
WHERE isNaN(per_series_value) = 0
GROUP BY
    GROUPING SETS (
        (ts),
        ())
ORDER BY ts ASC
```

This is a simple example; things get a little more complicated when we need to compute rates and percentiles.
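For instance, computing a rate over a cumulative counter has to account for counter resets. Here is a minimal Go sketch of the usual per-series approach; in production this logic is expressed in the generated ClickHouse SQL rather than in Go:

```go
package main

import "fmt"

// ratePerSecond computes the per-second rate of increase of a cumulative
// counter between consecutive samples, treating any drop in value as a
// counter reset. Illustrative sketch only.
func ratePerSecond(timestampsMs []int64, values []float64) []float64 {
	rates := make([]float64, 0, len(values)-1)
	for i := 1; i < len(values); i++ {
		increase := values[i] - values[i-1]
		if increase < 0 { // reset: counter restarted from zero
			increase = values[i]
		}
		seconds := float64(timestampsMs[i]-timestampsMs[i-1]) / 1000
		rates = append(rates, increase/seconds)
	}
	return rates
}

func main() {
	ts := []int64{0, 10_000, 20_000, 30_000}
	vals := []float64{0, 5, 12, 3}           // the drop to 3 indicates a restart
	fmt.Println(ratePerSecond(ts, vals))     // [0.5 0.7 0.3]
}
```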


The major changes in the metrics builder improvements are:

  1. temporal and spatial aggregation
  2. functions support

I will be sending a series of PRs to implement these changes. This PR is the first in the series.

@github-actions bot added the chore label Dec 6, 2023
@srikanthccv (Member Author) commented:

I added a comprehensive description that sets the stage for the dozen PRs I have in the pipeline for the new metrics builder changes. The examples illustrate what happens in the background for a small set of raw data. This should help you understand how metrics work. Please go through it, and let me know if there are any questions about anything, not just the changes in this PR. Part of this description will also go into the docs. My goal is to help you understand first, since you are one of the end users.

@srikanthccv marked this pull request as ready for review on December 6, 2023, 19:11
@ankitnayan (Collaborator) commented:

> toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,

toIntervalSecond(60) should be configurable, at least at the API level, even if we automatically decide things in the frontend for now. Someday we can enable users to choose their own aggregation interval.

> WHERE (metric_name = 'system_memory_usage') AND (temporality = 'Unspecified') AND (JSONExtractString(labels, 'state') != 'idle')

Can there be different temporalities for the same metric_name? If yes, should we move temporality to be the first sorting key?

Regarding the INNER JOIN: say we do A (1000 rows) INNER JOIN B (100 rows); the intersection would be 100 rows. If we are interested only in those 100 rows, I think A will have a lot of extra fingerprints, since label filtering is not available on the samples table. Does it affect performance? cc @dhawal1248

For the above query, can we use:

```sql
SELECT
    fingerprint,
    toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,
    avg(value) AS per_series_value
FROM signoz_metrics.distributed_samples_v2
WHERE fingerprint IN
(
    SELECT DISTINCT fingerprint
    FROM signoz_metrics.time_series_v2
    WHERE (metric_name = 'system_memory_usage') AND (temporality = 'Unspecified') AND (JSONExtractString(labels, 'state') != 'idle')
)
AND (metric_name = 'system_memory_usage') AND (timestamp_ms >= 1701794980000) AND (timestamp_ms <= 1701796780000)
GROUP BY
    fingerprint,
    ts
ORDER BY
    fingerprint ASC,
    ts ASC
```

instead of below

```sql
SELECT
    fingerprint,
    toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,
    avg(value) AS per_series_value
FROM signoz_metrics.distributed_samples_v2
INNER JOIN
(
    SELECT DISTINCT fingerprint
    FROM signoz_metrics.time_series_v2
    WHERE (metric_name = 'system_memory_usage') AND (temporality = 'Unspecified') AND (JSONExtractString(labels, 'state') != 'idle')
) AS filtered_time_series USING (fingerprint)
WHERE (metric_name = 'system_memory_usage') AND (timestamp_ms >= 1701794980000) AND (timestamp_ms <= 1701796780000)
GROUP BY
    fingerprint,
    ts
ORDER BY
    fingerprint ASC,
    ts ASC
```

@srikanthccv (Member Author) commented:

> toIntervalSecond(60) should be configurable, at least at the API level, even if we automatically decide things in the frontend for now. Someday we can enable users to choose their own aggregation interval.

It is configurable today. It was planned to be configurable from the frontend as well, but due to a frontend bug we disabled it in the UI.

> Can there be different temporalities for the same metric_name? If yes, should we move temporality to be the first sorting key?

Usually no. The exception is when someone is transitioning from one temporality to the other: they could send the same metrics with both temporalities while backfilling and then eventually send only one. We are going to do the same for span metrics; see SigNoz/charts#355.

> Regarding the INNER JOIN ... Does it affect performance?

It does affect performance; ClickHouse doesn't shine at JOINs. The "ClickHouse way" of doing things is to use wide tables. The temporality is part of the ORDER BY for the v3 table (https://github.com/SigNoz/signoz-otel-collector/blob/1fe5faae2cfef2e32ee0f5021a532c10436f7a5b/migrationmanager/migrators/metrics/migrations/000001_init_db.up.sql#L43-L53). We are going to move to this table soon.

> can we use: ... instead of below

No; when there is a GROUP BY, the result should include the group-by labels, which isn't possible with IN because there are no label columns on the samples table.

This, for example, is an invalid query (service_name does not exist on the samples table):

```sql
SELECT
    fingerprint,
    service_name,
    toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,
    avg(value) AS per_series_value
FROM signoz_metrics.distributed_samples_v2
WHERE fingerprint IN
(
    SELECT DISTINCT fingerprint
    FROM signoz_metrics.time_series_v2
    WHERE (metric_name = 'system_memory_usage') AND (temporality = 'Unspecified') AND (JSONExtractString(labels, 'state') != 'idle')
)
AND (metric_name = 'system_memory_usage') AND (timestamp_ms >= 1701794980000) AND (timestamp_ms <= 1701796780000)
GROUP BY
    fingerprint,
    service_name,
    ts
ORDER BY
    fingerprint ASC,
    service_name,
    ts ASC
```

@dhawal1248 (Contributor) commented:

@srikanthccv can you share the code link of where we start this query prep?

@srikanthccv (Member Author) commented:

> can you share the code link of where we start this query prep?

By this I assume you are asking what exists in production today; this is the entry point:

```go
func PrepareMetricQuery(start, end int64, queryType v3.QueryType, panelType v3.PanelType, mq *v3.BuilderQuery, options Options) (string, error) {
```

@srikanthccv (Member Author) commented:

I am going to merge this, but you can review and ask any questions.

@srikanthccv merged commit 9360c61 into develop on Dec 12, 2023
11 checks passed
@srikanthccv deleted the 4016-1-filter-sub-query branch on December 12, 2023, 01:54