Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid services filtering restrictions when calculating hosts statistics #75

Conversation

petr-castulik
Copy link

Hi,

we are using the Icinga roles filtering monitoring/filter/objects = "service_description!=something".

It implies that the table icinga_services is joined in the inner query, even in the case of a hosts only based count, i.e, the counts are always services related.

Current query

select ...
from icinga_objects as ho
inner join icinga_hosts as h on h.host_object_id = ho.object_id and ho.is_active = 1 and ho.objecttype_id = 1
inner join icinga_hoststatus as hs on hs.host_object_id = ho.object_id
left join icinga_services as s on s.host_object_id = h.host_object_id
left join icinga_objects as so on so.object_id = s.service_object_id and so.is_active = 1 and so.objecttype_id = 2
left join icinga_customvariablestatus as "c_postgres_version"
            on "c_postgres_version".varname = 'postgres_version' and "c_postgres_version".object_id = ho.object_id
where (((so.name2 != 'something' or so.name2 is null)
group by ho.object_id, h.host_id, "c_postgres_version".varvalue

I think the inner query should skip any services related filtering in such a case, which results into a query with the correct counts

select ...
from icinga_objects as ho
inner join icinga_hosts as h on h.host_object_id = ho.object_id and ho.is_active = 1 and ho.objecttype_id = 1
inner join icinga_hoststatus as hs on hs.host_object_id = ho.object_id
left join icinga_customvariablestatus as "c_postgres_version"
            on "c_postgres_version".varname = 'postgres_version' and "c_postgres_version".object_id = ho.object_id
group by ho.object_id, h.host_id, "c_postgres_version".varvalue

@cla-bot
Copy link

cla-bot bot commented Dec 13, 2021

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Petr Castulik.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@nilmerg
Copy link
Member

nilmerg commented Dec 13, 2021

This is how this restriction works everywhere else as well. I doubt that it's a good idea to change that in this module that doesn't even provide it. There's a feature request over at Icinga Web 2 for this: Icinga/icingaweb2#3550

Icinga DB Web already supports that though (https://icinga.com/docs/icinga-db/latest/icinga-db-web/doc/04-Security/#restrictions) which is why it still probably won't be a thing for the monitoring module.

I'd rather not merge this.

@petr-castulik
Copy link
Author

Thank you for your answer. I understand your concern. I'm databasist, hence avoiding of any join of useless tables is my first approach.

It's really pitty that counts are wrong becuase it makes this module useless for our case.

There are another approaches for sure. For example using the distinct keyword in aggregate functions.

Please let me know if you have any preferable way and I could try to prepare another solution.

@nilmerg
Copy link
Member

nilmerg commented Dec 14, 2021

The join is necessary to resolve the restriction. It is supposed to apply to any and all objects. That's what other users expect and rightly do so as well when using this module.

In which way are counts wrong? If you restrict by service_description, you have to expect to see less hosts. If, on the other hand, you see more hosts than possible that would hint at a grouping issue. Though, the GROUP BY you posted above looks correct to me.

@petr-castulik
Copy link
Author

Let me try to explain it with some example. Suppose we are monitoring just one host matching "hosts based cube dimension" and we are monitoring 10 services on this host.

In the case without any restrictions the hosts counts are correct. The inner query is without any join of the icinga_services table because this table is not needed and the result of the inner query is just one line.

But in the case of some service_description restriction which for example filters out 2 of 10 services on the host. The inner query now joins icinga_services table to filter out services restriction and hence the result of the inner query are 8 lines. The hosts count is wrong because it uses count(*) as "hosts_cnt".

@nilmerg
Copy link
Member

nilmerg commented Dec 15, 2021

I see! But this is clearly a grouping issue. Please create an issue for it.

This has to be solved by building the query differently, and not by omitting the restriction entirely. The fixed query should look like this:

SELECT rollup."env",
       rollup."hosts_cnt",
       rollup."hosts_down",
       rollup."hosts_unhandled_down",
       rollup."hosts_unreachable",
       rollup."hosts_unhandled_unreachable"
FROM (SELECT sub."env",
             SUM(hosts_cnt)                   AS hosts_cnt,
             SUM(hosts_down)                  AS hosts_down,
             SUM(hosts_unhandled_down)        AS hosts_unhandled_down,
             SUM(hosts_unreachable)           AS hosts_unreachable,
             SUM(hosts_unhandled_unreachable) AS hosts_unhandled_unreachable
      FROM (SELECT "c_env"."varvalue"                                    AS "env",
                   COUNT(*)                                              AS "hosts_cnt",
                   SUM(CASE WHEN hs.current_state = 1 THEN 1 ELSE 0 END) AS "hosts_down",
                   SUM(CASE
                           WHEN hs.current_state = 1 AND hs.problem_has_been_acknowledged = 0 AND
                                hs.scheduled_downtime_depth = 0 THEN 1
                           ELSE 0 END)                                   AS "hosts_unhandled_down",
                   SUM(CASE WHEN hs.current_state = 2 THEN 1 ELSE 0 END) AS "hosts_unreachable",
                   SUM(CASE
                           WHEN hs.current_state = 2 AND hs.problem_has_been_acknowledged = 0 AND
                                hs.scheduled_downtime_depth = 0 THEN 1
                           ELSE 0 END)                                   AS "hosts_unhandled_unreachable"
            FROM icinga_objects AS ho
                     INNER JOIN icinga_hosts AS h
                                ON h.host_object_id = ho.object_id AND ho.is_active = 1 AND ho.objecttype_id = 1
                     INNER JOIN icinga_hoststatus AS hs ON hs.host_object_id = ho.object_id
                     LEFT JOIN icinga_customvariablestatus AS "c_env"
                               ON "c_env".varname = 'env' AND "c_env".object_id = ho.object_id
            WHERE ((EXISTS(SELECT 1
                           FROM icinga_objects AS sub_so
                                    INNER JOIN icinga_services AS sub_s
                                               ON sub_s.service_object_id = sub_so.object_id AND
                                                  sub_so.is_active = 1 AND sub_so.objecttype_id = 2
                           WHERE ((sub_so.name2 LIKE '%random%') AND sub_s.host_object_id = ho.object_id))))
            GROUP BY "c_env".varvalue) AS sub
      GROUP BY ROLLUP("env")) AS rollup
ORDER BY (rollup."env" IS NOT NULL) ASC, rollup."env" ASC, (rollup."hosts_cnt" IS NOT NULL) ASC, rollup."hosts_cnt" ASC,
         (rollup."hosts_down" IS NOT NULL) ASC, rollup."hosts_down" ASC,
         (rollup."hosts_unhandled_down" IS NOT NULL) ASC, rollup."hosts_unhandled_down" ASC,
         (rollup."hosts_unreachable" IS NOT NULL) ASC, rollup."hosts_unreachable" ASC,
         (rollup."hosts_unhandled_unreachable" IS NOT NULL) ASC, rollup."hosts_unhandled_unreachable" ASC;

I've produced that by adjusting a few things in the monitoring module, but since this only applies to the cube module, this isn't the proper solution. I've created #76 instead. Please try that.

@petr-castulik
Copy link
Author

Thank you for your time. Your solution gives us the correct counts.

I'm not an Icinga expert hence I don't dare to dispute whether a host without any active service or with all services restricted should be counted or not - it is not our case anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants