Return error code when scrape fails #48

MiguelNdeCarvalho · 2021-03-19T12:09:09Z

https://prometheus.io/docs/instrumenting/writing_exporters/#failed-scrapes

geerlingguy · 2021-04-09T02:41:22Z

Could this lead to the container going into an 'unhealthy' state? I'm using this in a Pi-based internet monitoring setup, and it seems like a couple times a day the container just goes unhealthy, and the metrics endpoint returns nothing. In the logs, there are no errors that are output:

[2021-04-08 23:52:12.008] [error] Trying to get interface information on non-initialized socket.
08/04/2021 23:52:47 - Server: 23097 | Jitter: 4.009 ms | Ping: 48.013 ms | Download: 19.63 Mb/s | Upload:5.55 Mb/s
[pid: 1|app: 0|req: 517/517] 172.18.0.6 () {32 vars in 477 bytes} [Thu Apr  8 23:52:06 2021] GET /metrics => generated 2764 bytes in 40519 msecs (HTTP/1.1 200) 1 headers in 93 bytes (1 switches on core 0)
[pid: 1|app: 0|req: 518/518] 127.0.0.1 () {28 vars in 293 bytes} [Thu Apr  8 23:52:47 2021] GET / => generated 88 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 79 bytes (1 switches on core 0)
[pid: 1|app: 0|req: 519/519] 127.0.0.1 () {28 vars in 293 bytes} [Thu Apr  8 23:52:51 2021] GET / => generated 88 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 79 bytes (1 switches on core 0)
[pid: 1|app: 0|req: 520/520] 127.0.0.1 () {28 vars in 293 bytes} [Thu Apr  8 23:53:22 2021] GET / => generated 88 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 79 bytes (1 switches on core 0)
[pid: 1|app: 0|req: 521/521] 127.0.0.1 () {28 vars in 293 bytes} [Thu Apr  8 23:53:52 2021] GET / => generated 88 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 79 bytes (1 switches on core 0)
[pid: 1|app: 0|req: 522/522] 127.0.0.1 () {28 vars in 293 bytes} [Thu Apr  8 23:54:22 2021] GET / => generated 88 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 79 bytes (1 switches on core 0)
[pid: 1|app: 0|req: 523/523] 127.0.0.1 () {28 vars in 293 bytes} [Thu Apr  8 23:54:52 2021] GET / => generated 88 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 79 bytes (1 switches on core 0)
[pid: 1|app: 0|req: 524/524] 127.0.0.1 () {28 vars in 292 bytes} [Thu Apr  8 23:55:22 2021] GET / => generated 88 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 79 bytes (1 switches on core 0)
[pid: 1|app: 0|req: 525/525] 127.0.0.1 () {28 vars in 293 bytes} [Thu Apr  8 23:55:52 2021] GET / => generated 88 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 79 bytes (1 switches on core 0)
[pid: 1|app: 0|req: 526/526] 127.0.0.1 () {28 vars in 293 bytes} [Thu Apr  8 23:56:23 2021] GET / => generated 88 bytes in 2 msecs (HTTP/1.1 200) 2 headers in 79 bytes (1 switches on core 0)
[pid: 1|app: 0|req: 527/527] 127.0.0.1 () {28 vars in 293 bytes} [Thu Apr  8 23:56:53 2021] GET / => generated 88 bytes in 2 msecs (HTTP/1.1 200) 2 headers in 79 bytes (1 switches on core 0)

And running date on the container:

$ docker exec 4088 date
Fri Apr  9 02:40:26 UTC 2021

(So you can see it has been down for a few hours.)

I'm monitoring a Starlink connection, so I'm wondering whether the network going away completely sometimes causes an unhandled exception or something (but even then... you'd think it would kill the flask app, not just keep it running doing nothing).

MiguelNdeCarvalho · 2021-04-09T17:33:34Z

Hey,

@geerlingguy I don't know what to say, eheh. I'm one of your subscribers, so it's really great to see you using my project. WOW.
Now let's talk about the exporter.
I will release v3.1 later today with:

Drop uWSGI and use waitress instead (the image will be lighter and easier to set up outside of a container)
Update prometheus-client

Basically, the health check only verifies that the Flask app is still responding. When your exporter stops scraping, what status does the exporter show in Prometheus? Are you using a specific server for the tests?
I'm willing to fix this issue, but I will need some logs from your side, please. I have had the exporter running on my side for weeks without a single problem.
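For reference, a health check of the kind described above (one that only confirms the app answers over HTTP) could look roughly like the sketch below. It is not the project's actual check: the function name `is_healthy` and the 5-second default timeout are assumptions, and port 9798 is taken from the `wget localhost:9798` command mentioned later in this thread. Note that such a probe only shows that `/` responds; it says nothing about whether a `/metrics` scrape would succeed.

```python
import http.client
import urllib.request


def is_healthy(url, timeout_s=5.0):
    """Return True only if `url` answers with HTTP 200 within `timeout_s` seconds."""
    # Bypass any proxy configured in the environment: this is a local probe.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, refused connections and socket timeouts are all
        # OSError subclasses; HTTPException covers malformed responses.
        return False


if __name__ == "__main__":
    # Exit code 0 means healthy, 1 means unhealthy (the usual convention
    # for a container health check command).
    raise SystemExit(0 if is_healthy("http://localhost:9798/") else 1)
```

A probe with an explicit timeout would report a hung app as unhealthy instead of hanging itself.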

Thanks,
MiguelNdeCarvalho

MiguelNdeCarvalho · 2021-04-09T19:32:06Z

Hey again,

@geerlingguy can you try v3.1 and see whether the problem persists?

Thanks,
MiguelNdeCarvalho

geerlingguy · 2021-04-10T16:05:20Z

@MiguelNdeCarvalho - Thanks! I just updated the Pi to 3.1 after seeing the same dropout this morning. There are no additional logs (just ... stops it seems), and if I log into the container and run wget localhost:9798 it just times out after a while (see: geerlingguy/internet-monitoring#1 (comment)).

I will keep monitoring the speedtest container and see if it stays up longer this time.

Note that I have another Pi running on my local cable Internet connection with the exact same config, and it keeps running (has been going two weeks now). So I wonder if something about the connection with Starlink (vs. Cable) is throwing the app for a loop (in some weird way that is not triggering an exception).

MiguelNdeCarvalho · 2021-04-10T16:28:44Z

Hey again @geerlingguy,

Basically, yesterday @Doacola deployed your stack on his Pi 4 (32-bit) and he didn't get the weird behaviour that you are seeing on the Pi 4 connected to Starlink. I have just run some tests:

1st - Deployed the exporter in Docker and triggered the first test
2nd - Triggered the 2nd test but disconnected the network after it started; it returned a 500, as it should
3rd - Triggered the 3rd test while still disconnected from the internet
4th - Reconnected the internet and triggered the 4th test; it worked as it should
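The behaviour seen in the 2nd test (a failed run answers with a 500) can be sketched as follows. This is a minimal illustration, not the exporter's real code: `scrape`, the `download_mbps` JSON key, and the `speedtest_up` / `speedtest_download_mbps` metric names are all hypothetical, and the command to run is passed in as a parameter so that no particular CLI flags are assumed. The explicit `timeout` is the relevant part for the hang discussed in this issue: a run that never returns is turned into a failed scrape.

```python
import json
import subprocess
import sys


def scrape(cmd, timeout_s):
    """Run `cmd` once and return (http_status, body) for a /metrics response.

    Any failure (timeout, non-zero exit, missing binary, unparsable output)
    becomes a 500 so that Prometheus records the scrape as failed.
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s, check=True
        )
        data = json.loads(proc.stdout)
        download = float(data["download_mbps"])  # hypothetical key
    except (
        subprocess.TimeoutExpired,
        subprocess.CalledProcessError,
        OSError,
        ValueError,  # includes json.JSONDecodeError
        KeyError,
        TypeError,
    ):
        return 500, "speedtest_up 0\n"
    return 200, f"speedtest_up 1\nspeedtest_download_mbps {download}\n"


if __name__ == "__main__":
    # Stand-in command that prints JSON, used here instead of a real speed test.
    fake = [sys.executable, "-c", "print('{\"download_mbps\": 19.63}')"]
    print(scrape(fake, timeout_s=10))
```

In a Flask view, the returned tuple could be mapped onto the response body and status code.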

I think this is related to the speedtest CLI itself. To debug it, you should install the speedtest CLI, run a cron job that triggers it every 30 minutes (matching your configuration), and save the output of those runs to a file. Then we can check whether the problem comes from the CLI or not.
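One way to do the logging part is a small script that cron calls every 30 minutes. The sketch below is only an illustration: `log_run`, the log file name, and the 120-second timeout are assumptions, and the command list is a parameter so that the exact speedtest CLI invocation is left to whoever runs it. Each run appends one timestamped line, including the case where the CLI hangs and has to be killed.

```python
import datetime
import subprocess


def log_run(cmd, log_path, timeout_s=120.0):
    """Run `cmd` once and append one timestamped result line to `log_path`."""
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        result = (
            f"exit={proc.returncode} "
            f"stdout={proc.stdout.strip()!r} stderr={proc.stderr.strip()!r}"
        )
    except subprocess.TimeoutExpired:
        # The run hung: record that fact instead of blocking forever.
        result = f"timeout after {timeout_s}s"
    except OSError as exc:
        # For example, the binary is not installed or not on PATH.
        result = f"could not start: {exc}"
    line = f"{stamp} {result}\n"
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(line)
    return line


if __name__ == "__main__":
    # Replace the list with the real speedtest CLI invocation.
    log_run(["speedtest"], "speedtest-runs.log")
```

A crontab entry such as `*/30 * * * * python3 /path/to/log_speedtest.py` (the path is a placeholder) would then run it every 30 minutes.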

Thanks,
MiguelNdeCarvalho

P.S. I hope that you talk about how you are monitoring it in your next video 😉

MiguelNdeCarvalho · 2021-05-19T09:48:27Z

Everything is working fine now, so I'm going to close this

MiguelNdeCarvalho added the enhancement (New feature or request) label Mar 19, 2021

MiguelNdeCarvalho self-assigned this Mar 19, 2021

MiguelNdeCarvalho assigned Doacola Mar 19, 2021

geerlingguy mentioned this issue Apr 9, 2021

Speedtest container goes unhealthy after a few hours (consistently) geerlingguy/internet-monitoring#1

Closed

MiguelNdeCarvalho closed this as completed May 19, 2021
