Return error code when scrape fails #48

Closed
MiguelNdeCarvalho opened this issue Mar 19, 2021 · 6 comments
Labels: enhancement (New feature or request)

Comments

@MiguelNdeCarvalho
Owner

https://prometheus.io/docs/instrumenting/writing_exporters/#failed-scrapes

@MiguelNdeCarvalho MiguelNdeCarvalho added the enhancement New feature or request label Mar 19, 2021
@MiguelNdeCarvalho MiguelNdeCarvalho self-assigned this Mar 19, 2021
@geerlingguy

Could this lead to the container going into an 'unhealthy' state? I'm using this in a Pi-based internet monitoring setup, and it seems like a couple times a day the container just goes unhealthy, and the metrics endpoint returns nothing. In the logs, there are no errors that are output:

[2021-04-08 23:52:12.008] [error] Trying to get interface information on non-initialized socket.
08/04/2021 23:52:47 - Server: 23097 | Jitter: 4.009 ms | Ping: 48.013 ms | Download: 19.63 Mb/s | Upload:5.55 Mb/s
[pid: 1|app: 0|req: 517/517] 172.18.0.6 () {32 vars in 477 bytes} [Thu Apr  8 23:52:06 2021] GET /metrics => generated 2764 bytes in 40519 msecs (HTTP/1.1 200) 1 headers in 93 bytes (1 switches on core 0)
[pid: 1|app: 0|req: 518/518] 127.0.0.1 () {28 vars in 293 bytes} [Thu Apr  8 23:52:47 2021] GET / => generated 88 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 79 bytes (1 switches on core 0)
[pid: 1|app: 0|req: 519/519] 127.0.0.1 () {28 vars in 293 bytes} [Thu Apr  8 23:52:51 2021] GET / => generated 88 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 79 bytes (1 switches on core 0)
[pid: 1|app: 0|req: 520/520] 127.0.0.1 () {28 vars in 293 bytes} [Thu Apr  8 23:53:22 2021] GET / => generated 88 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 79 bytes (1 switches on core 0)
[pid: 1|app: 0|req: 521/521] 127.0.0.1 () {28 vars in 293 bytes} [Thu Apr  8 23:53:52 2021] GET / => generated 88 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 79 bytes (1 switches on core 0)
[pid: 1|app: 0|req: 522/522] 127.0.0.1 () {28 vars in 293 bytes} [Thu Apr  8 23:54:22 2021] GET / => generated 88 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 79 bytes (1 switches on core 0)
[pid: 1|app: 0|req: 523/523] 127.0.0.1 () {28 vars in 293 bytes} [Thu Apr  8 23:54:52 2021] GET / => generated 88 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 79 bytes (1 switches on core 0)
[pid: 1|app: 0|req: 524/524] 127.0.0.1 () {28 vars in 292 bytes} [Thu Apr  8 23:55:22 2021] GET / => generated 88 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 79 bytes (1 switches on core 0)
[pid: 1|app: 0|req: 525/525] 127.0.0.1 () {28 vars in 293 bytes} [Thu Apr  8 23:55:52 2021] GET / => generated 88 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 79 bytes (1 switches on core 0)
[pid: 1|app: 0|req: 526/526] 127.0.0.1 () {28 vars in 293 bytes} [Thu Apr  8 23:56:23 2021] GET / => generated 88 bytes in 2 msecs (HTTP/1.1 200) 2 headers in 79 bytes (1 switches on core 0)
[pid: 1|app: 0|req: 527/527] 127.0.0.1 () {28 vars in 293 bytes} [Thu Apr  8 23:56:53 2021] GET / => generated 88 bytes in 2 msecs (HTTP/1.1 200) 2 headers in 79 bytes (1 switches on core 0)

And running date on the container:

$ docker exec 4088 date
Fri Apr  9 02:40:26 UTC 2021

(So you can see it had been down for a few hours.)

I'm monitoring a Starlink connection, so I'm wondering if the network going away completely sometimes causes an unhandled exception or something (but even then... you'd think it would kill the flask app, not just keep it running doing nothing).
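
One way a hang like this could arise without any exception is a child process that never returns. A minimal sketch of guarding against that, assuming the exporter shells out to an external command (the exporter's actual code is not shown in this thread, and `run_with_timeout` is a hypothetical helper name):

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its stdout, or None if it fails or exceeds timeout_s.

    Hypothetical helper for illustration; not taken from the exporter.
    """
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed if it runs longer than this
            check=True,         # a non-zero exit status raises CalledProcessError
        )
    except subprocess.TimeoutExpired:
        # The command hung (for example, the network vanished mid-test).
        return None
    except (subprocess.CalledProcessError, OSError):
        # The command failed, or could not be started at all.
        return None
    return result.stdout
```

With a guard like this, a stalled command turns into a `None` result that the caller can report as a failed scrape, instead of blocking the request indefinitely.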

@MiguelNdeCarvalho
Owner Author

MiguelNdeCarvalho commented Apr 9, 2021

Hey,

@geerlingguy I don't know what to say, hehe. I'm one of your subscribers, and it's really great to see that you are using my project. Wow!
Now let's talk about the exporter.
I will release v3.1 later today with:

  • Drop uWSGI and use waitress instead (a lighter image that is easier to set up outside of a container)
  • Update prometheus-client

Basically, the health check only verifies that the Flask app is still responding. When the exporter stops responding to scrapes, what status does it show in Prometheus? Are you using a specific server for the tests?
I'm willing to fix this issue, but I will need some logs from your side, please. I have had the exporter running on my side for weeks without a single problem.
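
For reference, a check of that kind can be sketched as a single HTTP request with a timeout. This assumes the healthcheck issues a GET against the app's root (the `GET /` requests from 127.0.0.1 in the log above are consistent with that, but the actual healthcheck command is not shown in this thread); `probe` is a hypothetical name:

```python
import http.client
import urllib.request


def probe(url, timeout_s=5):
    """Return True if url answers with HTTP 200 within timeout_s, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, refused connections and socket timeouts
        # are all OSError subclasses.
        return False
```

For example, `probe("http://localhost:9798/")` would report whether the app answers on the port mentioned in this thread. Such a check only shows that the web server responds; it does not show that a speed test can complete.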

Thanks,
MiguelNdeCarvalho

@MiguelNdeCarvalho
Owner Author

Hey again,

@geerlingguy can you try the v3.1 and see if you continue with the same problems?

Thanks,
MiguelNdeCarvalho

@geerlingguy

@MiguelNdeCarvalho - Thanks! I just updated the Pi to 3.1 after seeing the same dropout this morning. There are no additional logs (just ... stops it seems), and if I log into the container and run wget localhost:9798 it just times out after a while (see: geerlingguy/internet-monitoring#1 (comment)).

I will keep monitoring the speedtest container and see if it stays up longer this time.

Note that I have another Pi running on my local cable Internet connection with the exact same config, and it keeps running (has been going two weeks now). So I wonder if something about the connection with Starlink (vs. Cable) is throwing the app for a loop (in some weird way that is not triggering an exception).

@MiguelNdeCarvalho
Owner Author

MiguelNdeCarvalho commented Apr 10, 2021

Hey again @geerlingguy,

Yesterday @Doacola deployed your stack on his Pi 4 (32-bit), and he didn't get the weird behaviour that you are seeing on the Pi 4 connected to Starlink. I have now run some tests:

1st - Deployed the exporter in Docker and triggered the first test
2nd - Triggered the 2nd test but disconnected the network after it started; it returned a 500, as it should
3rd - Triggered the 3rd test, still without an internet connection
4th - Reconnected the internet and triggered the 4th test, and it worked as it should
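
The 500 in the 2nd test matches one of the two patterns in the Prometheus guidance linked at the top of this issue (return a 5xx, or expose an `_up` metric). A minimal standard-library sketch of the 5xx pattern, not the exporter's actual code: `make_metrics_app`, `run_test`, and the metric name `speedtest_download_mbps` are hypothetical names chosen for illustration.

```python
def make_metrics_app(run_test):
    """Build a WSGI app that answers 500 whenever run_test() raises."""

    def app(environ, start_response):
        try:
            download_mbps = run_test()
        except Exception as exc:
            # Failed scrape: signal it with a 5xx so Prometheus marks the target down.
            body = f"speedtest failed: {exc}\n".encode()
            start_response(
                "500 Internal Server Error",
                [("Content-Type", "text/plain; charset=utf-8")],
            )
            return [body]
        # Successful scrape: expose the value in the text exposition format.
        body = f"speedtest_download_mbps {download_mbps}\n".encode()
        start_response("200 OK", [("Content-Type", "text/plain; version=0.0.4")])
        return [body]

    return app
```

Because the failure is handled inside the request, the next scrape starts fresh, which is consistent with the 4th test succeeding once the network was back.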

I think this is related to the speedtest CLI itself. To debug it, you could install the speedtest CLI, run a cron job that triggers it every 30 minutes (matching your configuration), and save the output of each run to a file. Then we can check whether the problem comes from the CLI or not.
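
A rough sketch of such a logging script, assuming the Ookla `speedtest` binary is on the PATH and accepts the flags below (adjust them to the installed version); `log_run` and the log file name are hypothetical:

```python
import datetime
import subprocess

# Assumed invocation of the speedtest CLI; change it to match the installed version.
DEFAULT_CMD = ["speedtest", "--accept-license", "--accept-gdpr", "-f", "json"]


def log_run(log_path, cmd=DEFAULT_CMD, timeout_s=120):
    """Run cmd once and append a timestamped record of the outcome to log_path."""
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        outcome = f"exit={result.returncode}\n{result.stdout}{result.stderr}"
    except subprocess.TimeoutExpired:
        # A hang is exactly the case being investigated, so record it explicitly.
        outcome = f"timeout after {timeout_s}s\n"
    except OSError as exc:
        outcome = f"could not start: {exc}\n"
    if not outcome.endswith("\n"):
        outcome += "\n"
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(f"--- {stamp} {' '.join(cmd)}\n{outcome}")
    return outcome
```

A script that calls `log_run("speedtest.log")` could then be scheduled with a cron expression such as `*/30 * * * *`. Timeouts and non-zero exit codes end up in the log next to successful runs, so a dropout can be compared against the time the exporter stopped responding.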

Thanks,
MiguelNdeCarvalho

P.S. I hope you talk about how you are monitoring it in your next video 😉

@MiguelNdeCarvalho
Owner Author

Everything is working fine now, so I'm going to close this.

3 participants