
chore: update BuilderQuery struct and add PrepareTimeseriesFilterQuery #4165

Merged — srikanthccv merged 3 commits into develop from 4016-1-filter-sub-query on Dec 12, 2023

Conversation
Conversation

@srikanthccv (Member) commented Dec 6, 2023

Summary

Part 1 of #4016

Overview

Metric types

Primary metric types supported are:

  • Counter: A counter is a (cumulative/delta) metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors.

  • Gauge: A gauge is a metric that represents a single numerical value that can arbitrarily change. It can go up and down. Gauges are typically used for measured values like temperatures or current memory usage, but also "counts" that can go up and down, like the number of concurrent requests.

  • Histogram: A histogram samples observations (usually things like request durations) and (cumulative/delta) counts them in configurable buckets. This allows for aggregatable calculation of quantiles.

Temporality

Temporality is the state of existing within or having some relationship with time. In the context of metrics, it describes how the reported metric value relates to time. There are two types of temporality:

  • Cumulative: A cumulative metric reports the total value accumulated since a fixed start time (typically process start). The reported values are non-negative floating-point numbers, increase monotonically, and are reset only when the process restarts.

  • Delta: A delta metric reports the difference between the current value and the previously reported value. Delta values are also non-negative floating-point numbers.

Both cumulative and delta metrics are supported by the metrics service. We strongly recommend using delta temporality whenever possible.

Cumulative Counter

A cumulative counter represents a monotonically increasing count over time, reset only on restart.

Example: Total number of requests served.

| Time  | Value |
|-------|-------|
| 00:00 | 0     |
| 00:10 | 5     |
| 00:20 | 12    |
| 00:30 | 20    |
| 00:40 | 28    |
| 00:50 | 35    |
| 01:00 | 45    |
| 01:10 | 55    |
| 01:20 | 65    |
| 01:30 | 72    |
| 01:40 | 80    |
| 01:50 | 90    |
| 02:00 | 100   |

In this table, each row after 00:00 shows the cumulative count of requests served since the 00:00 report. For instance, at 00:20, there were 12 requests served since the 00:00 report.

Delta Counter

A delta counter shows the difference in count since the last report.

Example: Number of new requests served since last report.

| Time  | Value |
|-------|-------|
| 00:00 | 0     |
| 00:10 | 5     |
| 00:20 | 7     |
| 00:30 | 8     |
| 00:40 | 8     |
| 00:50 | 7     |
| 01:00 | 10    |
| 01:10 | 10    |
| 01:20 | 10    |
| 01:30 | 7     |
| 01:40 | 8     |
| 01:50 | 10    |
| 02:00 | 10    |

In this table, each row after 00:00 shows the count of new requests served since the last report. For instance, at 00:20, there were 7 new requests served since the 00:10 report.
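To make the relationship between the two tables concrete, here is a minimal Go sketch (illustrative only, not SigNoz code) that derives the delta series from the cumulative counter series above, including the usual handling for counter resets:

```go
package main

import "fmt"

func main() {
	// Cumulative counter readings from the table above (one per 10s report).
	cumulative := []float64{0, 5, 12, 20, 28, 35, 45, 55, 65, 72, 80, 90, 100}

	// The delta at each report is the difference from the previous report.
	// A drop in the cumulative value indicates a process restart (reset),
	// in which case the new cumulative value itself is the delta.
	deltas := make([]float64, 0, len(cumulative)-1)
	for i := 1; i < len(cumulative); i++ {
		d := cumulative[i] - cumulative[i-1]
		if d < 0 { // reset detected
			d = cumulative[i]
		}
		deltas = append(deltas, d)
	}
	fmt.Println(deltas) // [5 7 8 8 7 10 10 10 7 8 10 10]
}
```

The output matches the delta counter table above row for row.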

Gauge

A gauge represents a value that can increase or decrease over time.

Example: Current number of active sessions.

| Time  | Value |
|-------|-------|
| 00:00 | 0     |
| 00:10 | 3     |
| 00:20 | 5     |
| 00:30 | 4     |
| 00:40 | 6     |
| 00:50 | 7     |
| 01:00 | 5     |
| 01:10 | 6     |
| 01:20 | 7     |
| 01:30 | 6     |
| 01:40 | 8     |
| 01:50 | 7     |
| 02:00 | 5     |

In this table, each row after 00:00 shows the current number of active sessions. For instance, at 00:20, there were 5 active sessions.

Cumulative Histogram

A cumulative histogram represents a monotonically increasing count of observations over time, reset only on restart.

Example: Response times categorized in buckets (e.g., <100ms, 100-200ms, 200-300ms, >300ms).

| Time  | <100ms | 100-200ms | 200-300ms | >300ms |
|-------|--------|-----------|-----------|--------|
| 00:00 | 0      | 0         | 0         | 0      |
| 00:10 | 5      | 0         | 0         | 0      |
| 00:20 | 10     | 2         | 0         | 0      |
| 00:30 | 15     | 5         | 1         | 0      |
| 00:40 | 20     | 8         | 2         | 1      |
| 00:50 | 25     | 12        | 3         | 1      |
| 01:00 | 30     | 15        | 5         | 2      |
| 01:10 | 35     | 18        | 7         | 3      |
| 01:20 | 40     | 21        | 10        | 3      |
| 01:30 | 45     | 24        | 12        | 4      |
| 01:40 | 50     | 28        | 14        | 5      |
| 01:50 | 55     | 32        | 15        | 5      |
| 02:00 | 60     | 35        | 17        | 6      |

In this table, each row after 00:00 shows the cumulative count of observations in each response time bucket. For instance, at 00:20, there were 10 observations with response times under 100ms, 2 observations with response times between 100-200ms, and 0 observations with response times between 200-300ms since the 00:00 report.

Delta Histogram

A delta histogram also counts observations in buckets, but the counts are the difference since the last report.

Example: New response times in the same buckets.

| Time  | <100ms | 100-200ms | 200-300ms | >300ms |
|-------|--------|-----------|-----------|--------|
| 00:00 | 0      | 0         | 0         | 0      |
| 00:10 | 5      | 0         | 0         | 0      |
| 00:20 | 5      | 2         | 0         | 0      |
| 00:30 | 5      | 3         | 1         | 0      |
| 00:40 | 5      | 3         | 1         | 1      |
| 00:50 | 5      | 4         | 1         | 0      |
| 01:00 | 5      | 3         | 2         | 1      |
| 01:10 | 5      | 3         | 2         | 1      |
| 01:20 | 5      | 3         | 3         | 0      |
| 01:30 | 5      | 3         | 2         | 1      |
| 01:40 | 5      | 4         | 2         | 1      |
| 01:50 | 5      | 4         | 1         | 0      |
| 02:00 | 5      | 3         | 2         | 1      |

In this table, each row after 00:00 shows the count of new observations in each response time bucket since the last report. For instance, at 00:20, there were 5 new observations with response times under 100ms and 2 new observations with response times between 100-200ms since the 00:10 report.
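The same subtraction as for counters, applied per bucket, recovers the delta histogram from the cumulative one. A minimal Go sketch (illustrative only, shown for the first few rows of the tables above):

```go
package main

import "fmt"

func main() {
	// Cumulative bucket counts from the table above (00:00 through 01:00).
	// Columns are the buckets <100ms, 100-200ms, 200-300ms, >300ms.
	cumulative := [][]int{
		{0, 0, 0, 0}, {5, 0, 0, 0}, {10, 2, 0, 0}, {15, 5, 1, 0},
		{20, 8, 2, 1}, {25, 12, 3, 1}, {30, 15, 5, 2},
	}
	for i := 1; i < len(cumulative); i++ {
		row := make([]int, len(cumulative[i]))
		for j := range row {
			row[j] = cumulative[i][j] - cumulative[i-1][j] // per-bucket delta
		}
		fmt.Println(row)
	}
	// Output matches the delta histogram rows at 00:10..01:00:
	// [5 0 0 0] [5 2 0 0] [5 3 1 0] [5 3 1 1] [5 4 1 0] [5 3 2 1]
}
```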

Time and Spatial Aggregation Explained for Metrics Data

This document clarifies the concepts of time and spatial aggregation in the context of metrics data analysis.

Time Aggregation

  • Time aggregation is the aggregation of all the measurement values for a time series over a specified aggregation interval.
  • Aggregation interval is dynamically adjusted based on the selected time range.
  • Various aggregation operators are available, including avg, sum, min, max, count, etc.
  • The result is a single value for each aggregation interval.

Spatial Aggregation

  • Spatial aggregation combines data points from multiple time series across one or more spatial dimensions. A spatial dimension could be a host, a region, a cluster, etc.
  • Various aggregation operators are available, including avg, sum, min, max, count, etc.
  • The result is a single value representing the aggregated data for the chosen spatial dimension(s).

The following table shows metrics data from five hosts (h1, h2, h3, h4, h5) spread across three regions (r1, r2, r3). Assume the reported value is memory usage in MB for each host. Timestamps are in mm:ss (minute:second) format and range from 10:00 to 12:30 with a collection interval of 10 seconds, so the data covers 150 seconds. Region r1 has two hosts (h1 and h2), region r2 has one host (h3), and region r3 has two hosts (h4 and h5).

| Time  | (h1, r1) | (h2, r1) | (h3, r2) | (h4, r3) | (h5, r3) |
|-------|----------|----------|----------|----------|----------|
| 10:00 | 45       | 63       | 58       | 32       | 56       |
| 10:10 | 72       | 90       | 87       | 35       | 81       |
| 10:20 | 56       | 85       | 95       | 74       | 72       |
| 10:30 | 73       | 98       | 71       | 63       | 85       |
| 10:40 | 97       | 88       | 56       | 91       | 36       |
| 10:50 | 67       | 48       | 42       | 31       | 76       |
| 11:00 | 65       | 95       | 30       | 35       | 96       |
| 11:10 | 81       | 39       | 68       | 69       | 77       |
| 11:20 | 57       | 75       | 50       | 40       | 43       |
| 11:30 | 54       | 45       | 68       | 48       | 53       |
| 11:40 | 85       | 77       | 39       | 63       | 31       |
| 11:50 | 77       | 52       | 71       | 32       | 88       |
| 12:00 | 30       | 97       | 90       | 51       | 55       |
| 12:10 | 82       | 92       | 83       | 41       | 32       |
| 12:20 | 95       | 37       | 56       | 65       | 91       |
| 12:30 | 53       | 95       | 37       | 94       | 66       |

Raw data at this resolution is too large to display, so we first aggregate on the time axis for each unique series. There are 5 time series in the table above. Using the avg aggregation operator over a 30-second interval gives a representative value for each interval. The result is shown in the following table.

| ts    | (h1, r1) | (h2, r1) | (h3, r2) | (h4, r3) | (h5, r3) |
|-------|----------|----------|----------|----------|----------|
| 10:00 | 57.6667  | 79.3333  | 80       | 47       | 69.6667  |
| 10:30 | 79       | 78       | 56.3333  | 61.6667  | 65.6667  |
| 11:00 | 67.6667  | 69.6667  | 49.3333  | 48       | 72       |
| 11:30 | 72       | 58       | 59.3333  | 47.6667  | 57.3333  |
| 12:00 | 69       | 75.3333  | 76.3333  | 52.3333  | 59.3333  |
| 12:30 | 53       | 95       | 37       | 94       | 66       |

Even this table could be too large to display if there were hundreds of hosts. Spatial aggregation is therefore performed on the result of the time aggregation; here we use the sum aggregation operator to get the total memory usage.

Total Memory Usage for Each Region

| ts    | r1      | r2      | r3      |
|-------|---------|---------|---------|
| 10:00 | 137     | 80      | 116.667 |
| 10:30 | 157     | 56.3333 | 127.333 |
| 11:00 | 137.333 | 49.3333 | 120     |
| 11:30 | 130     | 59.3333 | 105     |
| 12:00 | 144.333 | 76.3333 | 111.667 |
| 12:30 | 148     | 37      | 160     |

Total Memory Usage for Each Host

| ts    | (h1, r1) | (h2, r1) | (h3, r2) | (h4, r3) | (h5, r3) |
|-------|----------|----------|----------|----------|----------|
| 10:00 | 57.6667  | 79.3333  | 80       | 47       | 69.6667  |
| 10:30 | 79       | 78       | 56.3333  | 61.6667  | 65.6667  |
| 11:00 | 67.6667  | 69.6667  | 49.3333  | 48       | 72       |
| 11:30 | 72       | 58       | 59.3333  | 47.6667  | 57.3333  |
| 12:00 | 69       | 75.3333  | 76.3333  | 52.3333  | 59.3333  |
| 12:30 | 53       | 95       | 37       | 94       | 66       |

Note: this table is the same as the time aggregation result because each host is unique, so there are no sub-series per host. Other metrics, such as disk usage, could have sub-series for each host (usage from each partition); in that case, the spatial aggregation result would differ from the time aggregation result.

Total Memory Usage from All Hosts

| ts    | All     |
|-------|---------|
| 10:00 | 333.667 |
| 10:30 | 340.667 |
| 11:00 | 306.667 |
| 11:30 | 294.333 |
| 12:00 | 332.333 |
| 12:30 | 345     |
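To tie the two steps together, here is a minimal Go sketch that reproduces the per-region table above from the raw data: time aggregation first (avg over 30-second intervals, i.e. 3 raw points per interval), then spatial aggregation (sum per region). This is illustrative only; the SigNoz implementation expresses the same two steps in generated ClickHouse SQL.

```go
package main

import "fmt"

func main() {
	// Raw memory usage (MB) per host, one reading every 10s, from the table above.
	hosts := []string{"h1", "h2", "h3", "h4", "h5"}
	region := map[string]string{"h1": "r1", "h2": "r1", "h3": "r2", "h4": "r3", "h5": "r3"}
	raw := map[string][]float64{
		"h1": {45, 72, 56, 73, 97, 67, 65, 81, 57, 54, 85, 77, 30, 82, 95, 53},
		"h2": {63, 90, 85, 98, 88, 48, 95, 39, 75, 45, 77, 52, 97, 92, 37, 95},
		"h3": {58, 87, 95, 71, 56, 42, 30, 68, 50, 68, 39, 71, 90, 83, 56, 37},
		"h4": {32, 35, 74, 63, 91, 31, 35, 69, 40, 48, 63, 32, 51, 41, 65, 94},
		"h5": {56, 81, 72, 85, 36, 76, 96, 77, 43, 53, 31, 88, 55, 32, 91, 66},
	}

	// Step 1: time aggregation — avg over each 30s interval (3 raw points).
	const pointsPerInterval = 3
	timeAgg := map[string][]float64{}
	for _, h := range hosts {
		series := raw[h]
		for i := 0; i < len(series); i += pointsPerInterval {
			end := i + pointsPerInterval
			if end > len(series) {
				end = len(series) // the last interval may be partial
			}
			sum := 0.0
			for _, v := range series[i:end] {
				sum += v
			}
			timeAgg[h] = append(timeAgg[h], sum/float64(end-i))
		}
	}

	// Step 2: spatial aggregation — sum the per-host averages within each region.
	regionAgg := map[string][]float64{}
	for _, h := range hosts {
		r := region[h]
		if regionAgg[r] == nil {
			regionAgg[r] = make([]float64, len(timeAgg[h]))
		}
		for i, v := range timeAgg[h] {
			regionAgg[r][i] += v
		}
	}
	fmt.Println(regionAgg["r1"]) // ≈ [137 157 137.333 130 144.333 148]
}
```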

Default aggregation operators

The default time and space aggregation operators are chosen based on the metric type. The following table shows the defaults for each metric type.

| Metric Type | Default Time Aggregation Operator | Default Space Aggregation Operator |
|-------------|-----------------------------------|------------------------------------|
| Counter     | rate                              | sum                                |
| Gauge       | avg                               | sum                                |
| Histogram   | rate                              | sum                                |

Histograms are a special case because the value isn't a single number but a distribution of observations across buckets. The most common use case is calculating quantiles; the current implementation supports 0.5, 0.9, 0.95, and 0.99. The time and space aggregations produce the distribution of observations in each bucket, and the quantiles are calculated from that distribution.
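As an illustration of how a quantile can be estimated from bucketed counts, here is a Go sketch of interpolation-based bucket quantile estimation, the same idea behind PromQL's histogram_quantile. This is not the SigNoz implementation, and the 400ms upper bound assumed for the open-ended >300ms bucket is invented for the example:

```go
package main

import "fmt"

// quantile estimates the q-th quantile from bucket upper bounds and
// cumulative bucket counts, using linear interpolation within the bucket
// where the target rank falls.
func quantile(q float64, bounds, counts []float64) float64 {
	total := counts[len(counts)-1]
	rank := q * total
	for i, c := range counts {
		if rank <= c {
			lower, prevCount := 0.0, 0.0
			if i > 0 {
				lower, prevCount = bounds[i-1], counts[i-1]
			}
			// Interpolate between the bucket's lower and upper bound.
			width := bounds[i] - lower
			return lower + width*(rank-prevCount)/(c-prevCount)
		}
	}
	return bounds[len(bounds)-1]
}

func main() {
	// Cumulative counts 60, 95, 112, 118 taken from the 02:00 row of the
	// cumulative histogram above (60, 35, 17, 6 per bucket).
	bounds := []float64{100, 200, 300, 400} // 400 is an assumed cap for >300ms
	counts := []float64{60, 95, 112, 118}
	fmt.Printf("p50 ≈ %.1f ms\n", quantile(0.50, bounds, counts)) // ≈ 98.3
	fmt.Printf("p90 ≈ %.1f ms\n", quantile(0.90, bounds, counts)) // ≈ 265.9
}
```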

Implementation Details

The schema of the metrics database tables is as follows:

```sql
CREATE TABLE signoz_metrics.time_series_v2
(
    `metric_name` LowCardinality(String),
    `fingerprint` UInt64 CODEC(DoubleDelta, LZ4),
    `timestamp_ms` Int64 CODEC(DoubleDelta, LZ4),
    `labels` String CODEC(ZSTD(5)),
    `temporality` LowCardinality(String) DEFAULT 'Unspecified' CODEC(ZSTD(5)),
    INDEX temporality_index temporality TYPE SET(3) GRANULARITY 1
)
ENGINE = ReplacingMergeTree
PARTITION BY toDate(timestamp_ms / 1000)
ORDER BY (metric_name, fingerprint)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
```

  • metric_name: Name of the metric
  • fingerprint: Fingerprint of the metric. This is used to identify the metric uniquely. Currently,
    we are using the hash of the labels to generate the fingerprint.
  • timestamp_ms: Timestamp, in milliseconds, of when the metric was first observed
  • labels: Labels of the metric; Stored as a JSON string
  • temporality: Temporality of the metric. This is used to identify the type of the metric. It can
    be one of the following values:
    • Unspecified: This is the default value.
    • Cumulative: This is used for monotonic counters.
    • Delta: This is used for non-monotonic counters.
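For illustration, one deterministic way to derive such a fingerprint is to hash the sorted key=value pairs. This is a sketch of the idea only; the actual SigNoz hashing scheme may differ:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// fingerprint hashes a label set deterministically: sort the keys, then
// feed key/value pairs into an FNV-1a hash. Illustrative only.
func fingerprint(labels map[string]string) uint64 {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := fnv.New64a()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0xff}) // separator avoids ambiguous concatenations
		h.Write([]byte(labels[k]))
		h.Write([]byte{0xff})
	}
	return h.Sum64()
}

func main() {
	fmt.Println(fingerprint(map[string]string{
		"__name__": "system_memory_usage",
		"state":    "used",
		"host":     "h1",
	}))
}
```
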
```sql
CREATE TABLE signoz_metrics.samples_v2
(
    `metric_name` LowCardinality(String),
    `fingerprint` UInt64 CODEC(DoubleDelta, LZ4),
    `timestamp_ms` Int64 CODEC(DoubleDelta, LZ4),
    `value` Float64 CODEC(Gorilla, LZ4)
)
ENGINE = MergeTree
PARTITION BY toDate(timestamp_ms / 1000)
ORDER BY (metric_name, fingerprint, timestamp_ms)
TTL toDateTime(timestamp_ms / 1000) + toIntervalSecond(2592000)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
```

  • metric_name: Name of the metric
  • fingerprint: Fingerprint of the metric. This is used to identify the metric uniquely. Currently,
    we are using the hash of the labels to generate the fingerprint.
  • timestamp_ms: Timestamp of the metric in milliseconds
  • value: Value of the metric

Query preparation

As there are two tables for metrics, any query on metrics needs to join them. First, we get the fingerprints of the series that match the query criteria from the time_series_v2 table. Then, we join the samples_v2 table with the time_series_v2 table to get the actual metric values. There are three to four steps in the query preparation:

  1. Get the fingerprints of the metrics that match the query criteria from the time_series_v2 table.
  2. Join the samples_v2 table with the time_series_v2 table to get the actual metric values.
  3. Apply the time aggregation operator on the metric values.
  4. Apply the space aggregation operator on the metric values.

A typical query looks like the following:

```sql
SELECT
    ts,
    sum(per_series_value) AS value
FROM
(
    SELECT
        fingerprint,
        toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,
        avg(value) AS per_series_value
    FROM signoz_metrics.distributed_samples_v2
    INNER JOIN
    (
        SELECT DISTINCT fingerprint
        FROM signoz_metrics.time_series_v2
        WHERE (metric_name = 'system_memory_usage') AND (temporality = 'Unspecified') AND (JSONExtractString(labels, 'state') != 'idle')
    ) AS filtered_time_series USING (fingerprint)
    WHERE (metric_name = 'system_memory_usage') AND (timestamp_ms >= 1701794980000) AND (timestamp_ms <= 1701796780000)
    GROUP BY
        fingerprint,
        ts
    ORDER BY
        fingerprint ASC,
        ts ASC
)
WHERE isNaN(per_series_value) = 0
GROUP BY
    GROUPING SETS (
        (ts),
        ())
ORDER BY ts ASC
```

The query can be broken down into the following steps:

  1. Get the fingerprints of the metrics that match the query criteria from the time_series_v2 table.

```sql
SELECT DISTINCT fingerprint
FROM signoz_metrics.time_series_v2
WHERE (metric_name = 'system_memory_usage') AND (temporality = 'Unspecified') AND (JSONExtractString(labels, 'state') != 'idle')
```
  2. Join the tables and apply the time aggregation operator on the metric values.

```sql
SELECT
    fingerprint,
    toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,
    avg(value) AS per_series_value
FROM signoz_metrics.distributed_samples_v2
INNER JOIN
(
    SELECT DISTINCT fingerprint
    FROM signoz_metrics.time_series_v2
    WHERE (metric_name = 'system_memory_usage') AND (temporality = 'Unspecified') AND (JSONExtractString(labels, 'state') != 'idle')
) AS filtered_time_series USING (fingerprint)
WHERE (metric_name = 'system_memory_usage') AND (timestamp_ms >= 1701794980000) AND (timestamp_ms <= 1701796780000)
GROUP BY
    fingerprint,
    ts
ORDER BY
    fingerprint ASC,
    ts ASC
```
  3. Apply the space aggregation operator on the metric values.

```sql
SELECT
    ts,
    sum(per_series_value) AS value
FROM
(
    SELECT
        fingerprint,
        toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,
        avg(value) AS per_series_value
    FROM signoz_metrics.distributed_samples_v2
    INNER JOIN
    (
        SELECT DISTINCT fingerprint
        FROM signoz_metrics.time_series_v2
        WHERE (metric_name = 'system_memory_usage') AND (temporality = 'Unspecified') AND (JSONExtractString(labels, 'state') != 'idle')
    ) AS filtered_time_series USING (fingerprint)
    WHERE (metric_name = 'system_memory_usage') AND (timestamp_ms >= 1701794980000) AND (timestamp_ms <= 1701796780000)
    GROUP BY
        fingerprint,
        ts
    ORDER BY
        fingerprint ASC,
        ts ASC
)
WHERE isNaN(per_series_value) = 0
GROUP BY
    GROUPING SETS (
        (ts),
        ())
ORDER BY ts ASC
```

This is a simple example; things get a little more complicated when we need to compute rates and percentiles.
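For instance, computing a rate over a cumulative counter has to account for counter resets. Here is a minimal Go sketch of the usual per-series approach; in production this logic is expressed in the generated ClickHouse SQL rather than in Go:

```go
package main

import "fmt"

// ratePerSecond computes the per-second rate of increase of a cumulative
// counter between consecutive samples, treating any drop in value as a
// counter reset. Illustrative sketch only.
func ratePerSecond(timestampsMs []int64, values []float64) []float64 {
	rates := make([]float64, 0, len(values)-1)
	for i := 1; i < len(values); i++ {
		increase := values[i] - values[i-1]
		if increase < 0 { // reset: counter restarted from zero
			increase = values[i]
		}
		seconds := float64(timestampsMs[i]-timestampsMs[i-1]) / 1000
		rates = append(rates, increase/seconds)
	}
	return rates
}

func main() {
	ts := []int64{0, 10_000, 20_000, 30_000}
	vals := []float64{0, 5, 12, 3}           // the drop to 3 indicates a restart
	fmt.Println(ratePerSecond(ts, vals))     // [0.5 0.7 0.3]
}
```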


The major changes in the metrics builder improvements are:

  1. temporal and spatial aggregation
  2. functions support

I will be sending a series of PRs to implement these changes. This PR is the first in the series.

@github-actions bot added the chore label Dec 6, 2023
@srikanthccv (Member Author) commented:

I added a comprehensive description that sets the stage for the dozen PRs I have in the pipeline for the new metrics builder changes. The examples illustrate what happens in the background for a small set of raw data. This should help you understand how metrics work. Please go through it, and let me know if there are any questions about anything, not just the changes in this PR. Part of this description will also go into the docs. My goal is to help you understand first, since you are one of the end users.

@srikanthccv marked this pull request as ready for review on December 6, 2023, 19:11
@ankitnayan (Collaborator) commented:

> toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,

toIntervalSecond(60) should be configurable, at least at the API level, even if we automatically decide things in the frontend for now. Someday we can enable users to choose their own aggregation interval.

> WHERE (metric_name = 'system_memory_usage') AND (temporality = 'Unspecified') AND (JSONExtractString(labels, 'state') != 'idle')

Can there be different temporalities for the same metric_name? If yes, should we move temporality to be the first sorting key?

Regarding the INNER JOIN: say we do A (1000 rows) INNER JOIN B (100 rows); the intersection would be 100 rows. If we are interested only in those 100 rows, I think A will have a lot of extra fingerprints, since label filtering is not available on the samples table. Does it affect performance? cc @dhawal1248

For the above query, can we use:

```sql
SELECT
    fingerprint,
    toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,
    avg(value) AS per_series_value
FROM signoz_metrics.distributed_samples_v2
WHERE fingerprint IN
(
    SELECT DISTINCT fingerprint
    FROM signoz_metrics.time_series_v2
    WHERE (metric_name = 'system_memory_usage') AND (temporality = 'Unspecified') AND (JSONExtractString(labels, 'state') != 'idle')
)
AND (metric_name = 'system_memory_usage') AND (timestamp_ms >= 1701794980000) AND (timestamp_ms <= 1701796780000)
GROUP BY
    fingerprint,
    ts
ORDER BY
    fingerprint ASC,
    ts ASC
```

instead of below

```sql
SELECT
    fingerprint,
    toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,
    avg(value) AS per_series_value
FROM signoz_metrics.distributed_samples_v2
INNER JOIN
(
    SELECT DISTINCT fingerprint
    FROM signoz_metrics.time_series_v2
    WHERE (metric_name = 'system_memory_usage') AND (temporality = 'Unspecified') AND (JSONExtractString(labels, 'state') != 'idle')
) AS filtered_time_series USING (fingerprint)
WHERE (metric_name = 'system_memory_usage') AND (timestamp_ms >= 1701794980000) AND (timestamp_ms <= 1701796780000)
GROUP BY
    fingerprint,
    ts
ORDER BY
    fingerprint ASC,
    ts ASC
```

@srikanthccv (Member Author) commented:

> toIntervalSecond(60) should be configurable, at least at the API level, even if we automatically decide things in the frontend for now. Someday we can enable users to choose their own aggregation interval.

It is configurable today. It was planned to be configurable from the frontend as well, but due to a frontend bug we disabled it in the UI.

> Can there be different temporalities for the same metric_name? If yes, should we move temporality to be the first sorting key?

Usually no. The exception is when someone is transitioning from one temporality to the other: they could send the same metrics with both temporalities while backfilling and then eventually send only one. We are going to do the same for span metrics; see SigNoz/charts#355.

> Regarding the INNER JOIN ... Does it affect performance?

It does affect performance; ClickHouse doesn't shine at JOINs. The "ClickHouse way" of doing things is to use wide tables. The temporality is part of the ORDER BY for the v3 table (https://github.com/SigNoz/signoz-otel-collector/blob/1fe5faae2cfef2e32ee0f5021a532c10436f7a5b/migrationmanager/migrators/metrics/migrations/000001_init_db.up.sql#L43-L53). We are going to move to this table soon.

> can we use: ... instead of below

No; when there is a GROUP BY, the result should include the group-by labels, which isn't possible with IN because there are no label columns on the samples table.

This, for example, is an invalid query (service_name does not exist on the samples table):

```sql
SELECT
    fingerprint,
    service_name,
    toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,
    avg(value) AS per_series_value
FROM signoz_metrics.distributed_samples_v2
WHERE fingerprint IN
(
    SELECT DISTINCT fingerprint
    FROM signoz_metrics.time_series_v2
    WHERE (metric_name = 'system_memory_usage') AND (temporality = 'Unspecified') AND (JSONExtractString(labels, 'state') != 'idle')
)
AND (metric_name = 'system_memory_usage') AND (timestamp_ms >= 1701794980000) AND (timestamp_ms <= 1701796780000)
GROUP BY
    fingerprint,
    service_name,
    ts
ORDER BY
    fingerprint ASC,
    service_name,
    ts ASC
```

@dhawal1248 (Contributor) commented:

@srikanthccv can you share the code link of where we start this query prep?

@srikanthccv (Member Author) commented:

> can you share the code link of where we start this query prep?

By this I assume you are asking what exists in production today; this is the entry point:

```go
func PrepareMetricQuery(start, end int64, queryType v3.QueryType, panelType v3.PanelType, mq *v3.BuilderQuery, options Options) (string, error) {
```

@srikanthccv (Member Author) commented:

I am going to merge this, but you can review and ask any questions.

@srikanthccv merged commit 9360c61 into develop on Dec 12, 2023
11 checks passed
@srikanthccv deleted the 4016-1-filter-sub-query branch on December 12, 2023, 01:54