[dev.icinga.com #3008] insert on duplicate key with multiple unique keys marked unsafe in mysql >= 5.5.24; rewrite all queries #1085
This issue has been migrated from Redmine: https://dev.icinga.com/issues/3008
Created by tgelf on 2012-08-23 15:30:09 +00:00
The SQL statements executed by ido2db are ugly but replication safe, even with statement-based replication. Unfortunately MySQL starting with version 5.5.24 flags the INSERT... ON DUPLICATE KEY UPDATE queries executed by Icinga being "unsafe", as there is more than one UNIQUE KEY in effect. MySQL changelog says:
Details are available at http://bugs.mysql.com/bug.php?id=58637. While their "solution" is disputable, the sad side of this "enhancement" is that Icinga floods your mysql.err logs with bazillions of warnings: 200 MB of warnings during a single Icinga startup and 20 GB in less than two days on the platform I'm currently working on. This can quickly fill your disk - and a full MySQL data partition can freeze your Icinga process (not only the IDO).
There are two quick workarounds right now:
IDO could try to avoid being flagged as unsafe by adjusting it's indexes. For example the icinga_hoststatus table looks like this right now:
The second row (UNIQUE KEY hoststatus_id) is completely useless, the primary key is already unique. The unique object_id absolutely makes sense. The hoststatus_id is completely useless, but removing it would probably require changes to the IDO.
A solution for this table could look as follows:
MySQL then no longer complains about this unsafe statement. The same constraints as before remain in effect. However, there are many other tables with the same problem. We all know that INSERT... ON DUPLICATE KEY UPDATE is an awful construct, but it has the advantage of allowing developers being lazy ;-) Unfortunatly in combination with excessive unique keys as used in the IDO schema it becomes a nightmare. Cleaning them up might partly fix this issue, but I doubt this will be not that easy for some of the affected tables.
NB: Top talker is icinga_customvariablestatus. Is it really necessary to refresh them with each host/service status update?
2012-08-25 15:23:35 +00:00 by mfriedrich b317a69
2012-08-25 16:10:05 +00:00 by mfriedrich f78b5b0
2012-08-31 19:47:24 +00:00 by mfriedrich cd6e70d
2012-09-02 14:35:45 +00:00 by mfriedrich e04c554
2012-09-15 15:31:53 +00:00 by mfriedrich 544662a
2012-09-15 17:31:22 +00:00 by mfriedrich 6f50920
2012-09-16 13:52:01 +00:00 by mfriedrich 5c200e7
2012-09-16 15:24:57 +00:00 by mfriedrich 6450695
2012-09-16 15:34:04 +00:00 by mfriedrich f21bbfb
Updated by robe on 2012-08-25 14:10:35 +00:00
I just looked into this with dnsmichi - given the current situation of IDO the best way to move forward would be to remove the non-standard "ON DUPLICATE" queries and merge the MySQL and Postgres queries. The latter use an "UPDATE, if no rows changed, INSERT" which is implemented in the ido code.
Updated by mfriedrich on 2012-08-25 15:51:46 +00:00
problem is rather to be safe:
this update query will result in a warning.
and not changing anything.
which basically leads to the question, why from_unixtime(0) causes an error.
hey, cool, -1 now works. or NULL too. but NOT 0. hey mysql, you suck.
another example which fails
see, from_unixtime(0) is also there.
so, basically, once the query is correct (say the from_unixtime(0) are revoked/rewritten), the updates will finally run through.
question - write a stored procedure which wraps 0 values around from unix_time()? catch all 0s in the code (horrible). huh?
Updated by mfriedrich on 2012-08-25 16:17:28 +00:00
the function warnings can be resolved by implicitely fetching that within ido2db_db_timet_to_sql(), and making 0 the NULL string on the function.
but that does not solve the problem with affected rows == 0 ... strangely enough. that query needs a deeper analysis. probably we're getting 2 updates in a row, where one is affecting, and one is not, causing the wrong insert.
Updated by mfriedrich on 2012-08-29 17:04:09 +00:00
next round - we get 2x the same comments, one time ADD one time LOAD. this is due to the fact, that i have kept the flapping comment inside the retention.dat, as well as the core automatically detects the flapping once more, and adds that comment as well. both actions trigger a nebcallback on core startup.
previously, the ido2db insert on duplicate key update procedure would have just thrown an exception, which was handled internally, and none would have seen this. now, we have the problem that the insert should not happen - so we possibly need to add an expensive SELECT before actually trying to insert the comment. or we use the shitty replace into functionality?
Updated by mfriedrich on 2012-08-29 17:12:44 +00:00
i guess we shouldn't put much effort into REPLACE, as this is also an extension to the mysql standard, and might get deprecated (namely "flagged unsafe") too.
Updated by mfriedrich on 2012-08-31 16:23:57 +00:00
Updated by mfriedrich on 2012-08-31 17:27:14 +00:00
ok, there's a certain amount of data which hits ido2db in the same second, therefore we've got 2 exact same update queries, where the second update is not returning any affected values. mysql shell treats this right
Updated by mfriedrich on 2012-08-31 19:45:21 +00:00
hm, i guess the faulty return of "affected rows" is just plain mysql dbi driver specific. because in doing so on a mysql shell, it works - everything changed, and affected is also 1. in doing so with postgresql, it also works.
probably the driver handlers affected and matched differently within the driver layer itsself. on debian wheezy i got
need to recheck on wheezy/sid if the error is consistent.
i don't have time to debug the shitty dbd driver though.
in order to stay safe, and not to lose too much performance (because status UPDATES happen n-1 of the time, while the initial insert only happens once on initial startup - since we do not clean any *status tables at all), we'll double check this for sanity.
this will move the SELECT (which is another performance killing query) out of the standard-to-be-executed queries, and adds it as safety check before inserting anything into the table which would violate the unique constraints.
currently done for
though there might be more tables and queries affected by that, so we should possibly add safety calls to each query former known as "insert-on-duplicate-key-update-mysql-foobar".
Updated by mfriedrich on 2012-09-02 14:21:47 +00:00
the problematic queries are duplicated nebtype events to be dumped into the database at the exact timestamp (where the query does not make a difference on the nanoseconds).
which we solved for comments and commenthistory table.
pending fix for scheduleddowntime and downtimehistory
Updated by mfriedrich on 2012-09-16 15:42:01 +00:00
rewrote all mysql update/insert queries to have a safety select, as it seems that at least on debian, the libdbi mysql driver is broken, always returning 0 on affected rows.
since this is now a complete rewrite, everybody test NOW! currently in test/ido.