Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vmagent and scrape_timeout #357

Closed
v98765 opened this issue Mar 4, 2020 · 3 comments
Closed

vmagent and scrape_timeout #357

v98765 opened this issue Mar 4, 2020 · 3 comments
Labels
bug Something isn't working

Comments

@v98765
Copy link

v98765 commented Mar 4, 2020

Describe the bug
Глобальный параметр scrape_timeout: 30s не игнорируется, но не применяется

To Reproduce
Запустить snmp_exporter и обратиться к недоступным хостам

# arp
Address                  HWtype  HWaddress           Flags Mask            Iface
10.0.0.209                    (incomplete)                              ens32
10.0.0.211            ether   00:50:56:ba:1a:fe   C                     ens32

# time curl http://127.0.0.1:9116/snmp?module=cisco_l2\&target=10.0.0.209
An error has occurred while serving metrics:

error collecting metric Desc{fqName: "snmp_error", help: "Error scraping target", constLabels: {}, variableLabels: []}: error getting target 10.0.0.209: Request timeout (after 3 retries)

real    4m0.019s
user    0m0.016s
sys     0m0.016s


#time curl http://127.0.0.1:9116/snmp?module=cisco_l2\&target=10.0.0.211
An error has occurred while serving metrics:

error collecting metric Desc{fqName: "snmp_error", help: "Error scraping target", constLabels: {}, variableLabels: []}: error getting target 10.0.0.211: Error reading from socket: read udp 10.0.0.214:43613->10.0.0.211:161: read: connection refused

real    0m0.033s
user    0m0.016s
sys     0m0.008s

scrape.yml

global:
  scrape_interval: 60s
  scrape_timeout:  30s

Логи vmagent

2020-03-04T02:14:38.728Z        info    VictoriaMetrics/lib/promscrape/scraper.go:162   restarting scrapers for changed `file_sd_config` targets
2020-03-04T02:14:39.089Z        info    VictoriaMetrics/lib/promscrape/scraper.go:149   stopped all the 141 scrapers for `file_sd_config` targets
2020-03-04T02:14:39.089Z        info    VictoriaMetrics/lib/promscrape/scraper.go:145   starting 141 scrapers for `file_sd_config` targets
2020-03-04T02:14:44.931Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:15:44.932Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:16:44.929Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:17:44.935Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:18:44.933Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:19:44.931Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:20:44.931Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:21:44.928Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:22:44.931Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:23:44.930Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:24:44.932Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:25:44.928Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:26:44.934Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:27:44.930Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:28:44.930Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:29:44.930Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:30:44.930Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:31:44.935Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:32:44.931Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:33:44.930Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:34:44.931Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:35:44.928Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:36:44.930Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:37:44.929Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:38:44.934Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:39:44.930Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211": unexpected status code returned when scraping "http://
2020-03-04T02:40:38.730Z        info    VictoriaMetrics/lib/promscrape/scraper.go:162   restarting scrapers for changed `file_sd_config` targets
2020-03-04T02:40:38.906Z        info    VictoriaMetrics/lib/promscrape/scraper.go:149   stopped all the 141 scrapers for `file_sd_config` targets
2020-03-04T02:40:38.906Z        info    VictoriaMetrics/lib/promscrape/scraper.go:145   starting 141 scrapers for `file_sd_config` targets
2020-03-04T02:43:48.624Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.209": error when scraping "http://127.0.0.1:9116/snmp?module
2020-03-04T02:46:18.628Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.209": error when scraping "http://127.0.0.1:9116/snmp?module
2020-03-04T02:48:48.634Z        error   VictoriaMetrics/lib/promscrape/scrapework.go:127        error when scraping "http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.209": error when scraping "http://127.0.0.1:9116/snmp?module

Хочу обратить внимание на scrape_duration=150.007s и with timeout 30s: timeout, который совпадает с указанным в конфиге.

# curl http://localhost:8429/targets
job="cisco_l2" (67/69 up)
        state=down, endpoint=http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.209, labels={instance="10.0.0.209", job="cisco_l2", obj="twr"}, last_scrape=179.283s ago, scrape_duration=150.007s, error="error when scraping \"http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.209\" with timeout 30s: timeout"
        state=down, endpoint=http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211, labels={instance="10.0.0.211", job="cisco_l2", obj="twr"}, last_scrape=272.973s ago, scrape_duration=0.003s, error="unexpected status code returned when scraping \"http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.0.211\": 500; expecting 200; response body: \"An error has occurred while serving metrics:\\n\\nerror collecting metric Desc{fqName: \\\"snmp_error\\\", help: \\\"Error scraping target\\\", constLabels: {}, variableLabels: []}: error getting target 10.0.0.211: Error reading from socket: read udp 10.0.0.214:35534->10.0.0.211:161: read: connection refused\\n\""


#curl http://localhost:8429/targets
job="cisco_l2" (67/68 up)

Количество targets отличается какое-то время. В логах видно в обоих случаях.

starting 141 scrapers for `file_sd_config` targets

Т.е. в целом кол-во таргетов не изменилось. Фактически была замена адреса 10.0.0.211 на 10.0.0.209, но в /targets какое-то время показывается оба хоста.

Expected behavior

# arp
Address                  HWtype  HWaddress           Flags Mask            Iface
10.0.1.122                     (incomplete)                              ens160
# time curl http://127.0.0.1:9116/snmp?module=cisco_l2\&target=10.0.1.122
An error has occurred while serving metrics:

error collecting metric Desc{fqName: "snmp_error", help: "Error scraping target", constLabels: {}, variableLabels: []}: error getting target 10.0.1.122: Request timeout (after 3 retries)

real    4m0.018s
user    0m0.010s
sys     0m0.018s

Опрос так же 4 минуты.
Данные с Prometheus http://10.0.1.20:9090/targets

Endpoint 	State 	Labels 	Last Scrape 	Scrape Duration 	Error
http://127.0.0.1:9116/snmp
module="cisco_l2" target="10.0.1.122" 	down 	instance="10.0.1.122" job="cisco_l2" obj="r51" 	1m29.085s ago 	30s 	Get http://127.0.0.1:9116/snmp?module=cisco_l2&target=10.0.1.122: context deadline exceeded

Scrape Duration 30s по причине context deadline exceeded

Version

vmagent-20200303-194035-tags-v1.34.2-0-g0b1e877a

Used command-line flags

@valyala valyala added the bug Something isn't working label Mar 6, 2020
valyala added a commit that referenced this issue Mar 9, 2020
This should prevent from the following unexpected side-effects of idempotent request retries:
- increased actual timeout when scraping the target comparing to the configured scrape_timeout
- increased load on the target

Updates #357
valyala added a commit that referenced this issue Mar 9, 2020
This should prevent from the following unexpected side-effects of idempotent request retries:
- increased actual timeout when scraping the target comparing to the configured scrape_timeout
- increased load on the target

Updates #357
@valyala
Copy link
Collaborator

valyala commented Mar 9, 2020

The issue should be fixed in the commit 7c432da .

It will be available in the next release. In the mean time it is possible to build vmagent from sources according to these docs.

@v98765
Copy link
Author

v98765 commented Mar 10, 2020

спасибо. все работает.
scrape_duration=30.002s, error="error when scraping .. with timeout 30s: timeout"
vmagent-20200310-015134-v1.34.2-14-g7c432da

@valyala
Copy link
Collaborator

valyala commented Mar 11, 2020

The bugfix is available starting from v1.34.3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants