Different result same query to Prometheus and VictoriaMetrics #237

Closed
a-illiushchenia opened this issue Nov 15, 2019 · 11 comments
Labels
bug Something isn't working

Comments

@a-illiushchenia

Describe the bug
I run the same query against Prometheus and VictoriaMetrics in Grafana, but get different graphs (see screenshots):

  • the graph in Prometheus is higher (value = 2), but shorter (two spikes: 15:50:45 - 15:51:30 and 15:51:45 - 15:52:30);
  • the graph in VictoriaMetrics is lower (value = 1), but longer (one spike: 15:50:45 - 15:53:00).

To Reproduce
Run the same query against both Prometheus and VictoriaMetrics.

Expected behavior
The graphs should be the same.

Screenshots
Graph for Prometheus:

[screenshot: prometheus]

Graph for VictoriaMetrics:

[screenshot: victoriaMetric]

Version
victoria-metrics-20190822-120009-tags-v1.26.0-0-g1272e407

Additional context

@valyala
Collaborator

valyala commented Nov 15, 2019

This issue may be related to the fact that Prometheus and VictoriaMetrics calculate increase differently.

IMHO, VictoriaMetrics' calculations are better than Prometheus' in this case, since they never return floating-point values for integer counter increase.

To verify this assumption, could you post graphs for the following query on both Prometheus and VictoriaMetrics, over the same time range as the graphs above?

ssw_log:counter:by_instance_level_source{job="ssw-log", level="E", instance="ss5-ss-prod-3:3903"}

This will make it possible to calculate the increase values manually, the way VictoriaMetrics and Prometheus each compute them.

@valyala valyala added the bug Something isn't working label Nov 15, 2019
@valyala
Collaborator

valyala commented Nov 21, 2019

@a-illiushchenia , are there any updates?

@a-illiushchenia
Author

Yes, and it is very interesting.
I took another interval:

  1. sum(ssw_log:counter:by_instance_level_source{job="ssw-log", level="E", instance="ss5-ss-prod-1:3903"})

The counter increases by 1, and this is displayed correctly in both Prometheus and VictoriaMetrics:

Prometheus:

[screenshot: Prom_1]

[screenshot: Prom_2]

VictoriaMetrics:

[screenshot: VictMetr_1]

[screenshot: VictMetr_2]

  2. sum by (instance)(increase(ssw_log:counter:by_instance_level_source{job="ssw-log", level="E"}[1m]))

Prometheus:

[screenshot: Prom_3]

VictoriaMetrics:

[screenshot: VictMetr_3]

@hagen1778
Collaborator

Hi @a-illiushchenia !
Which instance shows the increase of 2 in the Prometheus screenshot? Could you pls compare sum and increase for that particular instance in both Prom and VM?

@lammel

lammel commented Nov 25, 2019

We found the Prometheus implementations of increase and delta pretty useless for our data, as the extrapolation performed in these functions does not deliver correct results. Actual increases in the metric are usually ignored.

This is why we replaced our queries

sum(idelta(my_metrics{instance=~"$instance"}[$__interval])) by (something)

with:

sum(my_metrics{instance=~"$instance"} - my_metrics{instance=~"$instance"} offset $__interval >= 0) by (something)

This (although it requires looking up the series twice) delivers correct results and does not miss increases in a time series.
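What the replacement query computes can be sketched in Python roughly as follows. This is only an illustration of the "current value minus the value one interval ago, keep non-negative results" idea, not code from Prometheus or VictoriaMetrics; the function name `offset_increase` and the sample values are hypothetical, and the series is assumed to be sampled at evenly spaced steps.

```python
def offset_increase(values, offset_steps=1):
    """Mirror `m - m offset X >= 0` on evenly spaced values.

    Returns (index, difference) pairs; points where the difference is
    negative are dropped, just as a PromQL comparison filter drops them.
    """
    result = []
    for i in range(offset_steps, len(values)):
        diff = values[i] - values[i - offset_steps]
        if diff >= 0:
            result.append((i, diff))
    return result


# Hypothetical counter values; the counter resets between index 3 and 4.
values = [10, 10, 11, 13, 2, 4]
print(offset_increase(values))
```

In this sketch the point that spans the reset (index 4) is dropped instead of being reported as a negative value, so any increase that happened across the reset is not counted.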

@valyala
Collaborator

valyala commented Nov 25, 2019

@a-illiushchenia , the graphs show that Prometheus returns incorrect +2 increase for the actual +1 increase for the given time series. And the increase lasts for 30 seconds, while it should last for 1 minute according to [1m] time window passed to increase() function. VictoriaMetrics' graphs look correct.
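A +2 reading that lasts 30 seconds is what one would expect if the increase between two samples is extrapolated to the full window. The Python sketch below illustrates this under stated assumptions: `prometheus_style_increase` is a simplified model of extrapolating `increase()`, `exact_increase` is a simplified model of the "exact increase on the given time window" behavior, and the 30-second scrape interval and the sample values are made up for illustration. Neither function is the actual code of either project.

```python
def prometheus_style_increase(samples, start, end):
    """Simplified model of an extrapolating increase().

    samples: list of (timestamp_seconds, value), sorted by timestamp.
    """
    window = [(t, v) for t, v in samples if start <= t <= end]
    if len(window) < 2:
        return None  # not enough samples inside the window

    first_t, first_v = window[0]
    last_t, last_v = window[-1]

    # Raw difference, corrected for counter resets inside the window.
    result = last_v - first_v
    prev = first_v
    for _, v in window[1:]:
        if v < prev:
            result += prev
        prev = v

    sampled = last_t - first_t
    avg_gap = sampled / (len(window) - 1)
    to_start = first_t - start
    to_end = end - last_t

    # A counter is not extrapolated below zero.
    if result > 0 and first_v >= 0:
        to_start = min(to_start, sampled * (first_v / result))

    # Extrapolate to a window edge when it is close to the nearest sample,
    # otherwise by half the average gap between samples.
    threshold = avg_gap * 1.1
    extrapolated = sampled
    extrapolated += to_start if to_start < threshold else avg_gap / 2
    extrapolated += to_end if to_end < threshold else avg_gap / 2
    return result * extrapolated / sampled


def value_at(samples, ts):
    """Value of the last sample at or before ts, or None."""
    value = None
    for t, v in samples:
        if t > ts:
            break
        value = v
    return value


def exact_increase(samples, start, end):
    """Difference between the values seen at the window's end and start."""
    before, after = value_at(samples, start), value_at(samples, end)
    if before is None or after is None:
        return None
    return after - before


# Assumed data: 30 s scrape interval, counter goes from 5 to 6 at t=60.
samples = [(0, 5), (30, 5), (60, 6), (90, 6), (120, 6)]
for end in (80, 110, 130):
    start = end - 60  # 1m window, as in increase(...[1m])
    print(end, prometheus_style_increase(samples, start, end),
          exact_increase(samples, start, end))
```

With these assumed samples only two points fall inside each 1m window, so the extrapolating version doubles the +1 step to 2.0 at end=80 and already reports 0.0 at end=110, while the exact version reports 1 at both of those points and 0 at end=130.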

@lammel , great solution! Note that the query can be improved with WITH templates and the remove_resets() function from Extended PromQL in the following ways:

  1. Mention my_metric{instance=~"$instance"} only once.
  2. Remove possible counter resets.

The resulting query would look like:

with (
    q = remove_resets(my_metrics{instance=~"$instance"})
)
sum(q - q offset $__interval) by (something)
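The effect of removing resets before taking the difference can be sketched in Python as follows. This `remove_resets` is a simplified stand-in written for illustration (it assumes every drop in value is a reset and adds the previous value as a correction); the real function may handle edge cases differently, and the sample values are hypothetical.

```python
def remove_resets(values):
    """Turn a counter with resets into a monotonically non-decreasing series."""
    corrected = []
    correction = 0
    prev = None
    for v in values:
        if prev is not None and v < prev:
            # The counter dropped: treat it as a reset and carry the old total.
            correction += prev
        prev = v
        corrected.append(v + correction)
    return corrected


# Hypothetical counter values; the counter resets between index 3 and 4.
values = [10, 10, 11, 13, 2, 4]
q = remove_resets(values)

# Equivalent of `q - q offset <one step>` on evenly spaced values.
diffs = [cur - old for old, cur in zip(q, q[1:])]
print(q)
print(diffs)
```

After the correction no difference is negative, so the `>= 0` filter is no longer needed and the growth after the reset is counted as well.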

@lammel

lammel commented Nov 27, 2019

@valyala , the extended promql looks very polished. We will look into VM soon with the prometheus remote write setup.

Do the increase/delta functions work correctly without extrapolation in VictoriaMetrics (can I assume no increase is missed)?
From your update in Prometheus issue #3806 I assume that the VM delta function can be used safely and is efficient and correct.

@valyala
Collaborator

valyala commented Nov 27, 2019

Do the increase/delta functions work correctly without extrapolation in VictoriaMetrics (can I assume no increase is missed)?

Yes, both functions in VictoriaMetrics should return the exact increase / delta on the given time window in square brackets. If the time window is missing, then it is equal to step value - i.e. the interval between two adjacent points on the graph.

@valyala
Collaborator

valyala commented Nov 27, 2019

As for Extended PromQL, it is great, but unfortunately it cannot be used with Promxy yet, since Promxy understands only standard PromQL :( There are plans to fix this in the future - see this issue for details.

@valyala
Collaborator

valyala commented Nov 27, 2019

@a-illiushchenia , I'm going to close this issue as working as intended. Feel free to reopen it or add details if you feel that VictoriaMetrics has issues with PromQL results.

@wentfar

wentfar commented Aug 15, 2022

This does not handle an application restart that causes a metric reset.

5 participants