
help request: prometheus collection indicator interface timeout #11274

Closed

smileby opened this issue May 22, 2024 · 8 comments

Comments

@smileby (Contributor) commented May 22, 2024

Description

When I used Prometheus to collect APISIX monitoring data, I found that the /apisix/prometheus/metrics interface occasionally takes a long time to respond, causing the Grafana monitoring data to be unstable. What is the reason?

How should we optimize or fix the high latency of this APISIX metrics interface?

Apart from the following log line appearing occasionally, there are no other abnormalities.

2024/05/20 21:15:27 [warn] 31025#0: *11889570993 [lua] conf_server.lua:181: report_failure(): report failure, endpoint: xxx.xxx.xxx.xxx:2579 count: 1 while connecting to upstream, client: unix:, server: , request: "POST /v3/lease/grant HTTP/1.1", upstream: "http://xxx.xxx.xxx.xxx:2579/v3/lease/grant", host: "127.0.0.1"

Environment

  • APISIX version (run apisix version): 3.2.0
  • Operating system (run uname -a): Linux localhost.localdomain 3.10.0-327.el7.x86_64 SMP Thu Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
  • OpenResty / Nginx version (run openresty -V or nginx -V): openresty/1.21.4.1
  • etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info): 3.5.4
  • APISIX Dashboard version, if relevant:
  • Plugin runner version, for issues related to plugin runners:
  • LuaRocks version, for installation issues (run luarocks --version):
@hanqingwu (Contributor)

What is your APISIX pod resource request & limit config? Is it enough?

@smileby (Contributor, Author) commented May 28, 2024

I used the default configuration: the shared_dict used by prometheus-metrics is 10MB. On our monitoring I saw that the peak usage of prometheus-metrics reached 30MB+, and the lowest is 21MB. I don't know whether this is caused by insufficient space; my APISIX monitoring shows a lot of volatility.

[screenshot: prometheus-metrics shared_dict usage over time]
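For reference, a minimal conf/config.yaml sketch for enlarging that dict, assuming an APISIX 3.x layout where the plugin's shared memory is declared under nginx_config.http.lua_shared_dict (32m is just an example; pick a value above the observed peak):

```yaml
# conf/config.yaml -- merged over config-default.yaml at startup
nginx_config:
  http:
    lua_shared_dict:
      # default is 10m; raise it above the observed ~30m peak
      prometheus-metrics: 32m
```

Changing nginx_config alters the generated nginx.conf, so APISIX needs a reload/restart for this to take effect.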

@smileby (Contributor, Author) commented May 28, 2024

When the number of connections is 2000, this problem does not exist. Can it be solved by increasing the capacity of shared_dict?

[screenshot: connection count graph]

@smileby (Contributor, Author) commented May 28, 2024

The picture below shows my problem: at this point, the /apisix/prometheus/metrics interface is very slow.

[screenshot: monitoring fluctuation while /apisix/prometheus/metrics is slow]

@hanqingwu (Contributor)

Can you try adding some logging to check which step costs the most time?
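One low-effort way to do that from configuration alone, as a sketch: put nginx's standard $request_time variable into the access log format, so every request to /apisix/prometheus/metrics records its total duration. This assumes nginx_config.http.access_log_format is available in this APISIX version and that the exporter's server block inherits the http-level access_log:

```yaml
# conf/config.yaml -- $request_time (seconds, ms resolution) makes slow
# metric scrapes visible in the access log
nginx_config:
  http:
    access_log: logs/access.log
    access_log_format: "$remote_addr [$time_local] \"$request\" $status $request_time"
```

Then grep the access log for /apisix/prometheus/metrics and look at the last field to see how long each scrape actually took.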

@smileby (Contributor, Author) commented May 28, 2024

When I used Prometheus to collect APISIX monitoring data, I found that the /apisix/prometheus/metrics interface occasionally takes a long time to respond, causing the Grafana monitoring data to be unstable. What is the reason?

My guess is that the amount of data is too large, making the transfer time-consuming. The question I am more concerned about is why the monitoring fluctuates. Is the shared_dict eviction policy being triggered, causing data loss? Do you have any better suggestions for the fluctuation, or should I try expanding the shared_dict?

@hanqingwu (Contributor)

Yes, you can try to expand the shared_dict, or check whether metrics are lost because of a timeout.
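On the timeout theory: if collection takes longer than the scraper's timeout, Prometheus marks the whole scrape as failed and that interval's samples are simply absent, which shows up as gaps and fluctuation in Grafana. A sketch of the relevant prometheus.yml knobs (the job name and target are hypothetical; 9091 is APISIX's default export port):

```yaml
# prometheus.yml (scraper side) -- hypothetical job, adjust target to your setup
scrape_configs:
  - job_name: "apisix"
    metrics_path: /apisix/prometheus/metrics
    scrape_interval: 15s
    # if the endpoint takes longer than this, the scrape fails and
    # this interval's samples are dropped
    scrape_timeout: 10s
    static_configs:
      - targets: ["127.0.0.1:9091"]
```

Correlating the standard up metric for this job with the latency spikes would confirm or rule out the timeout explanation (up drops to 0 on failed scrapes).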

@smileby (Contributor, Author) commented May 28, 2024

Yes, you can try to expand the shared_dict, or check whether metrics are lost because of a timeout.

Thank you for your reply.

What I'd like to understand is why, when APISIX's default prometheus shared_dict configuration is 10MB, the collected monitoring data grows to 2-3 times that default.

smileby closed this as completed May 31, 2024