
[dev.icinga.com #11534] DowntimesExpireTimerHandler crashes Icinga2 with <unknown function> #4095

Closed
icinga-migration opened this issue Apr 5, 2016 · 13 comments
@icinga-migration icinga-migration commented Apr 5, 2016

This issue has been migrated from Redmine: https://dev.icinga.com/issues/11534

Created by PowellEB on 2016-04-05 19:19:12 +00:00

Assignee: gbeutner
Status: Resolved (closed on 2016-04-12 10:10:03 +00:00)
Target Version: 2.4.5
Last Update: 2016-05-02 13:28:17 +00:00 (in Redmine)

Icinga Version: 2.4.4-1
Backport?: Already backported
Include in Changelog: 0

Moved from the old Icinga 2 (2.4.4) install on CentOS to new hardware on Ubuntu 14.04 (2.4.4-1), set up as a two-node cluster.

Since this meant a new database on Ubuntu, we dumped all old downtimes (author_name, downtime_type, comment_data, scheduled_start_time, scheduled_end_time, name).

Added the old downtimes to the new database via the external command pipe (icinga2.cmd). All went in fine.

40 minutes later, icinga2 #01 crashed. Two of the crash reports are attached; one is also included below.

Could not get #01 to start. A config check on icinga2 #02 then failed, and icinga2 #02 also crashed.

It looks like icinga2 had processed (added) downtimes that had already ended; when DowntimesExpireTimerHandler then tried to expire them, it crashed and brought icinga2 down.

The only way to get the config check to pass and icinga2 running again was to remove the downtimes from /var/lib/icinga2/api/packages/_api/....../conf.d/downtimes

Both nodes are running now. However, we cannot delete the old downtimes, and after some time icinga_downtimehistory contains duplicate "internal_downtime_id" entries.

(1) Reporting the issue; crash info is below, plus attachments.

(2) How do we repair the current inconsistencies for the API and icinga_downtimehistory? Our assumed plan:

dump all downtimes from icinga_downtimehistory
clear ...../conf.d/downtimes on both nodes of icinga2
truncate icinga_downtimehistory
add downtimes back via external command (ensuring no incoming downtimes are expired)

If this will work, the remaining question is how to reset the "internal_downtime_id" counter so we do not get duplicate IDs again.

Text from Crash:

**
Application information:
Application version: r2.4.4-1
Installation root: /usr
Sysconf directory: /etc
Run directory: /run
Local state directory: /var
Package data directory: /usr/share/icinga2
State path: /var/lib/icinga2/icinga2.state
Modified attributes path: /var/lib/icinga2/modified-attributes.conf
Objects path: /var/cache/icinga2/icinga2.debug
Vars path: /var/cache/icinga2/icinga2.vars
PID path: /run/icinga2/icinga2.pid

System information:
Platform: Ubuntu
Platform version: 14.04, Trusty Tahr
Kernel: Linux
Kernel version: 3.13.0-24-generic
Architecture: x86_64
Stacktrace:

(0) libpthread.so.0: (+0x10340) [0x7f64286fe340]
(1) libc.so.6: gsignal (+0x39) [0x7f6427497f79]
(2) libc.so.6: abort (+0x148) [0x7f642749b388]
(3) libc.so.6: (+0x2fe36) [0x7f6427490e36]
(4) libc.so.6: (+0x2fee2) [0x7f6427490ee2]
(5) libicinga.so: (+0x184ec3) [0x7f6422f5dec3]
(6) libicinga.so: icinga::Downtime::RemoveDowntime(icinga::String const&, bool, bool, boost::intrusive_ptr<icinga::MessageOrigin> const&) (+0x6e7) [0x7f6422f78c17]
(7) libicinga.so: icinga::Downtime::DowntimesExpireTimerHandler() (+0x359) [0x7f6422f79a69]
(8) libbase.so: boost::signals2::detail::signal_impl<void (boost::intrusive_ptr<icinga::Timer> const&), boost::signals2::optional_last_value<void>, int, std::less<int>, boost::function<void (boost::intrusive_ptr<icinga::Timer> const&)>, boost::function<void (boost::signals2::connection const&, boost::intrusive_ptr<icinga::Timer> const&)>, boost::signals2::mutex>::operator()(boost::intrusive_ptr<icinga::Timer> const&) (+0x1cc) [0x7f6428473f6c]
(9) libbase.so: icinga::Timer::Call() (+0x29) [0x7f64284216a9]
(10) libbase.so: icinga::ThreadPool::WorkerThread::ThreadProc(icinga::ThreadPool::Queue&) (+0x326) [0x7f642841e496]
(11) libboost_thread.so.1.54.0: (+0xba4a) [0x7f6428d89a4a]
(12) libpthread.so.0: (+0x8182) [0x7f64286f6182]
(13) libc.so.6: clone (+0x6d) [0x7f642755c30d]


This would indicate a runtime problem or configuration error. If you believe this is a bug in Icinga 2, please submit a bug report at https://dev.icinga.org/ and include this stack trace as well as any other information that might be useful in order to reproduce this problem.

Failed to launch GDB: No such file or directory

**

Attachments

Changesets

2016-04-12 10:05:43 +00:00 by gbeutner 974ca9f

Fix crash in Downtime::DowntimesExpireTimerHandler

fixes #11534
fixes #11559

2016-04-20 08:09:34 +00:00 by gbeutner 159681c

Fix crash in Downtime::DowntimesExpireTimerHandler

fixes #11534
fixes #11559

Relations:


Updated by mfriedrich on 2016-04-06 15:58:03 +00:00

  • Category set to libicinga
  • Status changed from New to Assigned
  • Assigned to set to mfriedrich
  • Priority changed from Normal to High
  • Parent Id set to 11312

Can you please install gdb and generate a backtrace once this issue happens again? Thanks.


Updated by PowellEB on 2016-04-06 17:34:46 +00:00

gdb is now installed. We will get a trace when it happens again.

For the downtime inconsistency, is the resolution outlined above the correct direction, and likewise for the internal_downtime_id reset?


Updated by mfriedrich on 2016-04-06 17:38:07 +00:00

The internal_downtime_id problem is discussed in #11382 and a possible fix is available through the snapshot packages. Please test them.

Kind regards,
Michael


Updated by mfriedrich on 2016-04-07 08:25:51 +00:00

  • Target Version set to 2.4.6

Updated by gbeutner on 2016-04-12 09:40:20 +00:00

This might actually be a duplicate of #11559.


Updated by gbeutner on 2016-04-12 09:40:50 +00:00

  • Duplicates set to 11559

Updated by gbeutner on 2016-04-12 10:06:07 +00:00

  • Assigned to changed from mfriedrich to gbeutner
  • Target Version changed from 2.4.6 to 2.4.5

Updated by gbeutner on 2016-04-12 10:10:03 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

Applied in changeset 974ca9f.


Updated by gbeutner on 2016-04-20 06:35:34 +00:00

  • Include in Changelog changed from 1 to 0

Updated by gbeutner on 2016-04-20 08:15:55 +00:00

  • Backport? changed from Not yet backported to Already backported

Updated by mfriedrich on 2016-05-02 13:27:40 +00:00

  • Parent Id deleted 11312

Updated by mfriedrich on 2016-05-02 13:28:17 +00:00

I guess there is a problem with this patch for expiring downtimes.

The check for active objects looks like this:

bool ConfigObject::IsActive(void) const
{
        return GetActive();
}

but the Downtime class overrides that function with the same signature:

bool Downtime::IsActive(void) const
{
        double now = Utility::GetTime();

        if (now < GetStartTime() ||
                now > GetEndTime())
                return false;

        if (GetFixed())
                return true;

        double triggerTime = GetTriggerTime();

        if (triggerTime == 0)
                return false;

        return (triggerTime + GetDuration() < now);
}

That way the patch does not work. I'll create a follow-up issue for that.


Updated by mfriedrich on 2016-05-02 13:34:11 +00:00

  • Relates set to 11711