uptime: not updated after a crash #180

Krysztophe · 2018-04-09T13:05:38Z

pg_conf_load_time() and pg_postmaster_start_time() are not updated when a crash occurs and the postmaster restarts all its children, so the uptime service does not raise an alert.

Idea: check pg_stat_activity.backend_start for some vital process like the checkpointer? (10+)
I have no idea how to track unexpected restarts before 10, though.

Krysztophe · 2018-04-11T08:40:19Z

Plan:

(10+) Use the backend_start of the checkpointer from pg_stat_activity.

Before v10:

Search for the start time of the checkpointer process? It would work only if check_pga is running on the PG server, with a risk of mixing up processes from different instances.
Searching the logs is IMHO out of scope for this service.
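
The 10+ part of the plan above could be sketched as follows. This is only an illustration under stated assumptions, not the code from any PR: `Snapshot` and `children_restarted` are hypothetical names, and the idea of comparing against the values recorded at the previous check is an assumption about how a monitoring service would store state. The query uses only pg_postmaster_start_time() and the backend_type column of pg_stat_activity (available in 10+).

```python
from datetime import datetime
from typing import NamedTuple, Optional

# Query a check could run on PostgreSQL 10+ (backend_type does not exist before 10).
QUERY_PG10 = """
SELECT pg_postmaster_start_time() AS postmaster_start,
       (SELECT backend_start
          FROM pg_stat_activity
         WHERE backend_type = 'checkpointer') AS checkpointer_start
"""


class Snapshot(NamedTuple):
    """Hypothetical record of one run of the query above."""
    postmaster_start: datetime
    checkpointer_start: Optional[datetime]  # None if the value came back NULL


def children_restarted(previous: Snapshot, current: Snapshot) -> bool:
    """Return True when the checkpointer has a newer start time
    while the postmaster start time is unchanged."""
    if previous.checkpointer_start is None or current.checkpointer_start is None:
        return False  # nothing to compare, treat as unknown
    same_postmaster = previous.postmaster_start == current.postmaster_start
    return same_postmaster and current.checkpointer_start > previous.checkpointer_start
```

A full restart changes pg_postmaster_start_time() and is already caught by the existing check, so this sketch only flags the case where the postmaster kept running but its children were restarted.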

Krysztophe · 2018-05-01T08:35:23Z

See PR #182 for 10+

For 9.1 to 9.6:

I'm wondering if pg_stat_get_db_stat_reset_time() and pg_stat_get_bgwriter_stat_reset_time() (9.1+) would be helpful.

Rule: if the oldest non-NULL of these dates is after pg_postmaster_start_time(), take it as the start of the uptime. Possibly take a given database as a reference (you never reset stats manually on template1)?
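
A minimal sketch of that rule, assuming the reset times and the postmaster start time have already been fetched as Python datetimes (NULL mapped to None). The function name `effective_start` is hypothetical. Note that a manual stats reset would move these dates as well, so this only illustrates the rule as stated.

```python
from datetime import datetime
from typing import Iterable, Optional


def effective_start(postmaster_start: datetime,
                    reset_times: Iterable[Optional[datetime]]) -> datetime:
    """Apply the rule: use the oldest non-NULL reset time if it is later
    than the postmaster start time, otherwise keep the postmaster start time."""
    known = [t for t in reset_times if t is not None]
    if not known:
        return postmaster_start  # no reset time recorded at all
    oldest = min(known)
    return oldest if oldest > postmaster_start else postmaster_start
```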

ioguix · 2018-05-02T07:05:13Z

Hi,

You cannot rely on the *_reset_time functions, as users can reset the stats whenever they want.

Moreover, terminating all backends to reset the shared_buffers is not a real restart on its own. What you seem to be looking for is a way to detect that a backend crashed and when it did (and I have no idea how to do it right now).

Krysztophe · 2018-05-02T07:30:38Z

> You cannot rely on the *_reset_time functions, as users can reset the stats whenever they want.

Right, I suggest it only because I cannot rely on backend start times before PG10 (remotely at least). If you reset the stats for all databases (even template1?), you usually know about it.

> terminating all backends to reset the shared_buffers is not a real restart on its own

From the user's point of view, it is: connections dropped, transactions canceled...

Such a thing is usually worth an investigation. And I know of no automated way to detect it with check_pga. Such a restart is not obvious on weekly charts.

ioguix · 2018-05-02T07:47:33Z

> From the user's point of view, it is: connections dropped, transactions canceled...
> Such a thing is usually worth an investigation. And I know of no automated way to detect it with check_pga.

If it is worth investigating (and it is), investigating implies you can read the logs, which are packed with WARNING/ERROR messages in such a situation :)

But I agree an alert from the monitoring might be useful... if possible.

> Such a restart is not obvious on weekly charts.

I suppose the cache hit/miss ratio should drop after the shared buffers reset.

Krysztophe · 2018-05-02T11:20:12Z

> I suppose the cache hit/miss ratio should drop after the shared buffers reset.

Not so obvious if you are not actually looking for it. Especially on a weekly OPM graph.

ioguix · 2018-05-02T11:53:03Z

Indeed. However, if you set an alert on the cache hit/miss ratio, you should catch such a restart through a very, very low ratio.

I agree this is not the best or most direct solution for this issue, but I have no other idea right now :/
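
The hit ratio alert mentioned above could look roughly like this. It is a sketch only: `low_hit_ratio` and the 0.5 threshold are hypothetical, and the inputs are assumed to be the increase in the blks_hit and blks_read counters of pg_stat_database between two checks.

```python
def low_hit_ratio(hit_delta: int, read_delta: int, threshold: float = 0.5) -> bool:
    """Return True when the share of block requests served from
    shared buffers since the last check is below the threshold."""
    total = hit_delta + read_delta
    if total == 0:
        return False  # no activity since the last check, nothing to judge
    return hit_delta / total < threshold
```

Right after the shared buffers are emptied, most blocks have to be read again, so the ratio over a short interval should fall well below its usual value.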

ioguix · 2018-05-02T11:56:25Z

Note that even for 10+, your solution relies on an indirect side effect as well :/ A much better one, but still not direct...

Krysztophe · 2019-01-29T16:56:53Z

#182 merged (thanks ioguix). I do not see a way to detect the crash and restart before v10, so I am closing this issue.

Krysztophe closed this as completed Jan 29, 2019
