[dev.icinga.com #11387] IDO: historical contact notifications table column notification_id is off-by-one #4035
This issue has been migrated from Redmine: https://dev.icinga.com/issues/11387
Created by elippmann on 2016-03-15 09:41:05 +00:00
Icinga 2 does not write pending updates to contactnotifications into the IDO "immediately" after a notification has been sent. The contacts of the first notification are always written into the database, but the contacts of the second notification are not written until a third notification has been sent, and so on. This can be observed independently for every object.
2016-03-23 16:58:38 +00:00 by mfriedrich df2adb1
2016-03-29 13:12:24 +00:00 by mfriedrich 83e0bcd
2016-04-20 08:07:24 +00:00 by mfriedrich 0cbedf4
Updated by elippmann on 2016-03-15 09:41:26 +00:00
Updated by mfriedrich on 2016-03-21 10:40:01 +00:00
Hm, I think I found it. The notification_id is off by one which causes the Web2 query to fail. Once you restart the core the notification_id cache is invalid and it starts again.
Updated by mfriedrich on 2016-03-21 12:30:21 +00:00
At first glance the notification is inserted and the notification_id is properly stored for the queries into contactnotifications, so the relationship in icingaweb2 works.
This time the notification_id is incremented and stored after the contactnotifications are inserted. That way the notification does not have any related contactnotifications.
Updated by mfriedrich on 2016-03-21 19:24:41 +00:00
The culprit is that we do not have any notification object, similar to hosts or host groups, where we could easily store and fetch the insert id in a "common" way. The previous implementation in #5103 and #5265 worked fine as long as the queries were executed one after the other. With the recent changes to the IdoMysqlConnection driver, which now uses client multi statements, the notification_id is not necessarily set after the notifications table has been populated (the old insert id is left in place).
Having the old notification_id when inserting the contact notifications explains the current buggy behaviour. The fix is rather ugly - invalidate the cache for the current notification object and its insert id before the notifications insert query is run, and update it afterwards. This happens inside the whole ExecuteQuery "transaction". That way any other query for contact notifications depending on the notification_id will be added to the end of the query queue. One problem which could arise is two notifications being fired at the same time for the same notification object.
That again breaks the queries.
So in addition to fixing and invalidating the cache, we need a way to tell apart, for a single notification object, the different points in its historical timeline, i.e. multiple notifications happening within a short amount of time.
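The invalidate-and-requeue idea described above can be sketched roughly as follows. This is a simplified, single-threaded model and not the actual IdoMysqlConnection code: `InsertIdCache`, `Query`, `Kind`, `ContactRow` and `Run` are hypothetical names made up for illustration, and the auto-increment counter stands in for the database. It requires C++17.

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the insert id cache; not an Icinga 2 class.
struct InsertIdCache {
    std::map<std::string, std::optional<long>> ids; // nullopt == invalidated

    void Invalidate(const std::string& key) { ids[key] = std::nullopt; }
    void Set(const std::string& key, long id) { ids[key] = id; }

    std::optional<long> Get(const std::string& key) const {
        auto it = ids.find(key);
        if (it == ids.end()) return std::nullopt;
        return it->second;
    }
};

enum class Kind { Notification, Contact };

// Hypothetical queued query: either a notifications insert or a
// contactnotifications insert that depends on the notification's insert id.
struct Query {
    Kind kind;
    std::string key;     // identifies the notification object
    std::string contact; // only used for Kind::Contact
};

struct ContactRow {
    std::string contact;
    long notificationId;
};

// Processes the queue. A contact query whose notification id is currently
// invalid is moved to the end of the queue instead of using a stale id.
std::vector<ContactRow> Run(std::deque<Query> queue, InsertIdCache& cache, long& nextId) {
    std::vector<ContactRow> written;
    std::size_t deferredInARow = 0; // guards against spinning forever

    while (!queue.empty() && deferredInARow < queue.size()) {
        Query q = queue.front();
        queue.pop_front();

        if (q.kind == Kind::Notification) {
            // "Insert" the notification and update the cache afterwards.
            cache.Set(q.key, nextId++);
            deferredInARow = 0;
            continue;
        }

        std::optional<long> id = cache.Get(q.key);
        if (!id) {
            queue.push_back(q); // id not known yet: retry later
            ++deferredInARow;
            continue;
        }

        written.push_back({q.contact, *id});
        deferredInARow = 0;
    }
    return written;
}
```

If the cache entry is invalidated before the notifications insert is queued, a contact query that happens to be processed first is deferred rather than written with the previous notification's id. The sketch keys the cache by object only, so it still cannot distinguish two notifications fired at the same time for the same object.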
I found a way to pass the current timestamp ("now" as double) next to the notification object to the query, which is then used as a unique key pair inside the m_NotificationInsertID map. In order to resolve the value later (and to check whether it is valid or not) I've also added a new DbValue type which is then handled in FieldToEscapedString().
The catch is that FieldToEscapedString() only accepts the Value type, so we cannot pass a std::pair of notification object and timestamp here. The ugly workaround is to use a Dictionary, which is then used to select the matching notification insert id.
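A minimal sketch of that workaround, under stated assumptions: the `Value`, `Dictionary` and `Field` aliases below are simplified stand-ins built on `std::variant`, not the Icinga 2 types of the same name, and `MakeNotificationIdPlaceholder` and `ResolveNotificationId` are hypothetical helpers. The dictionary keys `"object"` and `"timestamp"` are likewise invented for illustration. It requires C++17.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <variant>

// Simplified stand-ins for a generic value type that can hold a dictionary.
using Field = std::variant<std::string, double>;
using Dictionary = std::map<std::string, Field>;
using Value = std::variant<long, std::string, Dictionary>;

// (notification object name, timestamp) -> insert id
using NotificationKey = std::pair<std::string, double>;
using NotificationInsertIdMap = std::map<NotificationKey, long>;

// Packs the key pair into a dictionary so it fits through an interface
// that only accepts a generic Value.
Value MakeNotificationIdPlaceholder(const std::string& objectName, double now) {
    Dictionary d;
    d["object"] = objectName;
    d["timestamp"] = now;
    return d;
}

// Unpacks the dictionary again and looks up the matching insert id.
// Returns nullopt if the value has the wrong shape or the id is unknown.
std::optional<long> ResolveNotificationId(const Value& value,
                                          const NotificationInsertIdMap& cache) {
    const Dictionary* dict = std::get_if<Dictionary>(&value);
    if (!dict) return std::nullopt;

    auto obj = dict->find("object");
    auto ts = dict->find("timestamp");
    if (obj == dict->end() || ts == dict->end()) return std::nullopt;

    const std::string* name = std::get_if<std::string>(&obj->second);
    const double* when = std::get_if<double>(&ts->second);
    if (!name || !when) return std::nullopt;

    auto it = cache.find(NotificationKey{*name, *when});
    if (it == cache.end()) return std::nullopt;
    return it->second;
}
```

Using a double as part of a map key relies on the exact same timestamp value being passed to both the insert and the lookup, which holds here because the value is carried along unchanged.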
One thing I need to check with that cache - we may leak memory over time as the m_NotificationInsertID list grows. Unlike the m_ObjectInsertID list, which does not grow over time, the notification list gains an entry for every historical notification event. It probably makes sense to clear the list in a timer and remove timestamps older than X seconds.
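The cleanup suggested above could look roughly like this. It is only a sketch: `PruneOlderThan` is a hypothetical helper, the map type is a plain `std::map` stand-in for m_NotificationInsertID, and the timer that would call it periodically is left out.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// (notification object name, timestamp) -> insert id
using NotificationKey = std::pair<std::string, double>;
using NotificationInsertIdMap = std::map<NotificationKey, long>;

// Removes all entries whose timestamp is more than maxAgeSeconds before
// `now` and returns how many entries were dropped.
std::size_t PruneOlderThan(NotificationInsertIdMap& cache, double now, double maxAgeSeconds) {
    std::size_t removed = 0;
    for (auto it = cache.begin(); it != cache.end();) {
        if (now - it->first.second > maxAgeSeconds) {
            it = cache.erase(it); // erase returns the next valid iterator
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```

The age limit has to be longer than the time a contact notification query can sit in the queue, otherwise an id might be pruned before the dependent query has resolved it.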
Updated by mfriedrich on 2016-03-22 15:23:30 +00:00