Invalid Message when Unexcluded Tables Have Never Been Vacuumed #1

@theory

We're running check_postgres.pl like this:

check_postgres.pl -u nagios --action=last_vacuum --exclude=~^pg --db=csi

The output is:

No matching tables found due to exclusion/inclusion options

But that can't be right. The query that ends up running is (reformatted):

SELECT current_database(), nspname, relname,
       CASE WHEN v IS NULL THEN -1 ELSE round(extract(epoch FROM now()-v)) END,
       CASE WHEN v IS NULL THEN '?' ELSE TO_CHAR(v, 'HH24:MI FMMonth DD, YYYY') END
  FROM (
          SELECT nspname, relname, GREATEST(
              pg_stat_get_last_vacuum_time(c.oid), 
              pg_stat_get_last_autovacuum_time(c.oid)
          ) AS v
            FROM pg_class c, pg_namespace n
           WHERE relkind = 'r'
             AND n.oid = c.relnamespace
             AND n.nspname <> 'information_schema'
           ORDER BY 3
  ) AS foo;

When I run that manually, I get rows like:

csi | pg_catalog  | pg_authid       | 390524 | 23:16 March 20, 2010
csi | public      | foo             |     -1 | ?
csi | pg_catalog  | pg_auth_members |     -1 | ?

So we do have a table, "foo", that is not excluded. However, it has never been vacuumed, so its age column comes back as -1. The message should not claim that no matching tables were found because of the exclusion options: a table was found and was not excluded, it just has never been vacuumed. So it should probably say something like "no vacuumed tables found" instead.

I think that the reason it works that way is this bit of code in check_last_vacuum_analyze():

    SLURP: while ($db->{slurp} =~ /(\S+)\s+\| (\S+)\s+\| (\S+)\s+\|\s+(\-?\d+) \| (.+)\s*$/gm) {
        my ($dbname,$schema,$name,$time,$ptime) = ($1,$2,$3,$4,$5);
        $maxtime = -3 if $maxtime == -1;
        if (skip_item($name, $schema)) {
            $maxtime = -2 if $maxtime < 1;
            next SLURP;
        }
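
To make the parsing step concrete, here is a minimal sketch that applies the same regex to the three sample rows from above. The `$slurp` string and the `@rows` array are stand-ins made up for this example; the real script reads from `$db->{slurp}`.

```perl
use strict;
use warnings;

# Stand-in for $db->{slurp}: the three rows shown earlier in this report.
my $slurp = <<'END';
csi | pg_catalog  | pg_authid       | 390524 | 23:16 March 20, 2010
csi | public      | foo             |     -1 | ?
csi | pg_catalog  | pg_auth_members |     -1 | ?
END

my @rows;    # hypothetical container, used only by this sketch
while ($slurp =~ /(\S+)\s+\| (\S+)\s+\| (\S+)\s+\|\s+(\-?\d+) \| (.+)\s*$/gm) {
    my ($dbname, $schema, $name, $time, $ptime) = ($1, $2, $3, $4, $5);
    push @rows, {
        dbname => $dbname,
        schema => $schema,
        name   => $name,
        time   => $time,
        ptime  => $ptime,
    };
    # Show what each capture group picked up for this row.
    print "$dbname $schema.$name time=$time ptime=$ptime\n";
}
```

All three rows match, and a never-vacuumed table such as "foo" arrives with a time of -1 and a printable time of "?".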

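So looking at the three rows returned above, it looks like:

  • row one is excluded and $maxtime is set to -2
  • row two is not excluded; $maxtime is -2 rather than -1 on entry, so the -3 reset does not fire, and the row's time of -1 then raises $maxtime to -1 via the $time > $maxtime check later in the loop
  • row three is excluded, so $maxtime goes from -1 to -3 and is then set to -2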

Since the last row fetched set $maxtime to -2, this code then gets triggered:

    if ($maxtime == -2) {
        add_unknown msg('no-match-table');
    }

But that's wrong. The code needs to know that unexcluded rows were returned (the second row in this example) but had never been vacuumed. I'm not sure how you'd track that using $maxtime as a flag; maybe it needs a separate flag? Something like this?

--- a/check_postgres.pl
+++ b/check_postgres.pl
@@ -3469,6 +3469,7 @@ sub check_last_vacuum_analyze {
        my ($minrel,$maxrel) = ('?','?'); ## no critic
        my $mintime = 0; ## used for MRTG only
        my $count = 0;
+                my $unskipped;
        SLURP: while ($db->{slurp} =~ /(\S+)\s+\| (\S+)\s+\| (\S+)\s+\|\s+(\-?\d+) \| (.+)\s*$/gm) {
            my ($dbname,$schema,$name,$time,$ptime) = ($1,$2,$3,$4,$5);
            $maxtime = -3 if $maxtime == -1;
@@ -3476,6 +3477,7 @@ sub check_last_vacuum_analyze {
                $maxtime = -2 if $maxtime < 1;
                next SLURP;
            }
+                        $unskipped ||= 1;
            $db->{perf} .= " $dbname.$schema.$name=${time}s;$warning;$critical" if $time >= 0;
            if ($time > $maxtime) {
                $maxtime = $time;
@@ -3497,7 +3499,7 @@ sub check_last_vacuum_analyze {
        }

        if ($maxtime == -2) {
-           add_unknown msg('no-match-table');
+           add_unknown msg($unskipped ? 'no-vacuumed-table' : 'no-match-table');
        }
        elsif ($maxtime < 0) {
            add_unknown $type eq 'vacuum' ? msg('vac-nomatch-v') : msg('vac-nomatch-a');
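
The flag handling in the patch can be replayed in isolation. The following is only a rough sketch, not the real subroutine: `skip_item_sketch` and `pick_message_key` are made-up names, the exclusion rule just mimics `--exclude=~^pg`, the starting value of -1 for `$maxtime` is an assumption (the initialization is not shown in this report), and the perf/MRTG bookkeeping is left out.

```perl
use strict;
use warnings;

# Hypothetical stand-in for skip_item(): mimics --exclude=~^pg by
# skipping any relation whose name starts with "pg".
sub skip_item_sketch {
    my ($name, $schema) = @_;
    return $name =~ /^pg/ ? 1 : 0;
}

# Replays the $maxtime / $unskipped handling from the patched loop.
# Returns the message key for the ($maxtime == -2) branch, or nothing
# if that branch would not be taken.
sub pick_message_key {
    my @rows = @_;
    my $maxtime = -1;    # ASSUMPTION: initial value, not shown in the report
    my $unskipped;
    for my $row (@rows) {
        my ($schema, $name, $time) = @$row;
        $maxtime = -3 if $maxtime == -1;
        if (skip_item_sketch($name, $schema)) {
            $maxtime = -2 if $maxtime < 1;
            next;
        }
        $unskipped ||= 1;
        $maxtime = $time if $time > $maxtime;
    }
    return unless $maxtime == -2;
    return $unskipped ? 'no-vacuumed-table' : 'no-match-table';
}

# The three rows from the report, in the order listed there.
my @sample = (
    [ 'pg_catalog', 'pg_authid',       390524 ],
    [ 'public',     'foo',             -1     ],
    [ 'pg_catalog', 'pg_auth_members', -1     ],
);
my $key = pick_message_key(@sample);
print "$key\n";    # prints "no-vacuumed-table"
```

In this sketch the outcome depends on row order: if the unexcluded, never-vacuumed row comes last, $maxtime ends at -1 rather than -2, so the `elsif ($maxtime < 0)` branch would apply instead.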

Thanks.

David
