Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dev.icinga.com #11686] Icinga Crash with the workflow Create_Host-> Downtime for the Host -> Delete Downtime -> Remove Host #4170

Closed
icinga-migration opened this issue Apr 27, 2016 · 18 comments

Comments

Projects
None yet
1 participant
@icinga-migration
Copy link
Member

commented Apr 27, 2016

This issue has been migrated from Redmine: https://dev.icinga.com/issues/11686

Created by Christian_vlc on 2016-04-27 09:38:26 +00:00

Assignee: gbeutner
Status: Resolved (closed on 2016-05-09 12:32:38 +00:00)
Target Version: 2.4.8
Last Update: 2016-05-11 07:38:56 +00:00 (in Redmine)

Icinga Version: r2.4.7-1
Backport?: Not yet backported
Include in Changelog: 1

Hi,

Icinga Crash if I remove one host after that I created/removed one downtime for this host:

In the dmesg ist one segfault error, but in Icinga logs are any Error (debug not too).
icinga2[23538]: segfault at 38 ip 00007fb9ed09343a sp 00007fb9dbc2eaf0 error 4 in libstdc**.so.6.0.17[7fb9ed031000+e8000]

After this error, I must restart Icinga and then I can remove this host.

I tried with flexibel and fixed Downtimes.. .but every time the same result...

Some Idea??

Attachments

Changesets

2016-05-09 12:30:12 +00:00 by gbeutner b8e911b

Make sure the dependency graph is properly updated when adding and removing objects

fixes #11686
fixes #11374

2016-05-12 09:08:21 +00:00 by gbeutner d82db2a

Make sure the dependency graph is properly updated when adding and removing objects

fixes #11686
fixes #11374

Relations:

@icinga-migration

This comment has been minimized.

Copy link
Member Author

commented Apr 28, 2016

Updated by saurabh_hirani on 2016-04-28 10:19:43 +00:00

I verified the same and this is the case. Create host, downtime host, delete downtime, remove host - leads to the exact same error:

Apr 28 07:39:12 vagrant kernel: [11121.769790] traps: icinga2[22623] general protection ip:7f199c3c722f sp:7f1995c80950 error:0 in libstdc**.so.6.0.16[7f199c313000+e2000]

This is a pretty normal activity as hosts will transition through this cycle. Causing an entire icinga2 instance to fail is highly problematic in production systems.

@icinga-migration

This comment has been minimized.

Copy link
Member Author

commented Apr 28, 2016

Updated by Christian_vlc on 2016-04-28 12:35:31 +00:00

I tried with:

1- Only the downtime on the Host (without the Services)
2- Downtime for Host and Service

But it doesnt help

@icinga-migration

This comment has been minimized.

Copy link
Member Author

commented May 2, 2016

Updated by mfriedrich on 2016-05-02 07:11:15 +00:00

  • Priority changed from Normal to High
  • Parent Id set to 11415
@icinga-migration

This comment has been minimized.

Copy link
Member Author

commented May 2, 2016

Updated by mfriedrich on 2016-05-02 13:45:55 +00:00

  • Status changed from New to Assigned
  • Assigned to set to mfriedrich
@icinga-migration

This comment has been minimized.

Copy link
Member Author

commented May 2, 2016

Updated by mfriedrich on 2016-05-02 14:27:12 +00:00

[2016-05-02 16:25:22 +0200] information/ApiListener: New client connection (no client certificate)
[2016-05-02 16:25:22 +0200] information/HttpServerConnection: Request: PUT /v1/objects/hosts/11686-host (root)
[2016-05-02 16:25:22 +0200] information/ConfigCompiler: Compiling config file: /usr/local/icinga2/var/lib/icinga2/api/packages/_api/mbmif.int.netways.de-1442309540-1/conf.d/hosts/11686-host.conf
[2016-05-02 16:25:22 +0200] information/ConfigItem: Committing config items
[2016-05-02 16:25:22 +0200] warning/ApplyRule: Apply rule 'satellite-host' (in /usr/local/icinga2/etc/icinga2/conf.d/satellite.conf: 29:1-29:41) for type 'Dependency' does not match anywhere!
[2016-05-02 16:25:22 +0200] information/ConfigItem: Instantiated 1 Host.
[2016-05-02 16:25:22 +0200] information/ConfigItem: Triggering Start signal for config items
[2016-05-02 16:25:22 +0200] information/ConfigItem: Activated all objects.
[2016-05-02 16:25:24 +0200] information/Checkable: Checking for configured notifications for object 'flapping-host'
[2016-05-02 16:25:24 +0200] information/Checkable: Checkable 'flapping-host' does not have any notifications.
[2016-05-02 16:25:25 +0200] information/Checkable: Checking for configured notifications for object 'flapping-host'
[2016-05-02 16:25:25 +0200] information/Checkable: Checkable 'flapping-host' does not have any notifications.
[2016-05-02 16:25:26 +0200] information/IdoMysqlConnection: Query queue items: 0, query rate: 24.7167/s (1483/min 7469/5min 19113/15min);
[2016-05-02 16:25:37 +0200] information/ApiListener: New client connection (no client certificate)
[2016-05-02 16:25:37 +0200] information/HttpServerConnection: Request: POST /v1/actions/schedule-downtime (root)
[2016-05-02 16:25:37 +0200] information/ConfigCompiler: Compiling config file: /usr/local/icinga2/var/lib/icinga2/api/packages/_api/mbmif.int.netways.de-1442309540-1/conf.d/downtimes/11686-host!mbmif.int.netways.de-1462199137-0.conf
[2016-05-02 16:25:37 +0200] information/ConfigItem: Committing config items
[2016-05-02 16:25:37 +0200] warning/ApplyRule: Apply rule 'satellite-host' (in /usr/local/icinga2/etc/icinga2/conf.d/satellite.conf: 29:1-29:41) for type 'Dependency' does not match anywhere!
[2016-05-02 16:25:37 +0200] information/ConfigItem: Instantiated 1 Downtime.
[2016-05-02 16:25:37 +0200] information/ConfigItem: Triggering Start signal for config items
[2016-05-02 16:25:37 +0200] information/ConfigItem: Activated all objects.
[2016-05-02 16:25:37 +0200] information/Checkable: Checking for configured notifications for object '11686-host'
[2016-05-02 16:25:37 +0200] information/Checkable: Checkable '11686-host' does not have any notifications.
[2016-05-02 16:25:41 +0200] information/IdoMysqlConnection: Query queue items: 0, query rate: 24/s (1440/min 7345/5min 19408/15min);
[2016-05-02 16:25:47 +0200] warning/PluginCheckTask: Check command for object 'mbmif.int.netways.de!swap' (PID: 17786, arguments: '/usr/local/sbin/check_swap' '-c' '25%' '-w' '50%') terminated with exit code 128, output: execvpe(/usr/local/sbin/check_swap) failed: No such file or directory

[2016-05-02 16:25:50 +0200] information/ApiListener: New client connection (no client certificate)
[2016-05-02 16:25:50 +0200] information/HttpServerConnection: Request: POST /v1/actions/remove-downtime (root)
[2016-05-02 16:25:51 +0200] information/Checkable: Checking for configured notifications for object '11686-host'
[2016-05-02 16:25:51 +0200] information/Checkable: Checkable '11686-host' does not have any notifications.
[2016-05-02 16:25:56 +0200] information/IdoMysqlConnection: Query queue items: 1, query rate: 22.5833/s (1355/min 7303/5min 19771/15min);
[2016-05-02 16:25:56 +0200] information/Checkable: Checking for configured notifications for object 'flapping-host'
[2016-05-02 16:25:56 +0200] information/Checkable: Checkable 'flapping-host' does not have any notifications.
[2016-05-02 16:25:57 +0200] information/Checkable: Checking for configured notifications for object 'flapping-host'
[2016-05-02 16:25:57 +0200] information/Checkable: Checkable 'flapping-host' does not have any notifications.
[2016-05-02 16:26:02 +0200] information/ApiListener: New client connection (no client certificate)
[2016-05-02 16:26:02 +0200] information/HttpServerConnection: Request: DELETE /v1/objects/hosts/11686-host (root)
[2016-05-02 16:26:11 +0200] information/IdoMysqlConnection: Query queue items: 0, query rate: 24.05/s (1443/min 7250/5min 20157/15min);
[2016-05-02 16:26:26 +0200] information/IdoMysqlConnection: Query queue items: 0, query rate: 25.6333/s (1538/min 7369/5min 20651/15min);
[2016-05-02 16:26:27 +0200] information/ApiListener: New client connection (no client certificate)
[2016-05-02 16:26:27 +0200] information/HttpServerConnection: Request: DELETE /v1/objects/hosts/11686-host?cascade[]=1 (root)
Process 15170 stopped
* thread #27: tid = 0xa26e44, 0x00007fff996a4b21 libc++abi.dylib`__dynamic_cast + 37, stop reason = EXC_BAD_ACCESS (code=1, address=0xfffffffffffffff0)
    frame #0: 0x00007fff996a4b21 libc++abi.dylib`__dynamic_cast + 37
libc++abi.dylib`__dynamic_cast:
->  0x7fff996a4b21 <+37>: movq   -0x10(%rax), %r13
    0x7fff996a4b25 <+41>: addq   %rbx, %r13
    0x7fff996a4b28 <+44>: movq   -0x8(%rax), %r14
    0x7fff996a4b2c <+48>: movq   %r15, -0x78(%rbp)
(lldb) bt
* thread #27: tid = 0xa26e44, 0x00007fff996a4b21 libc++abi.dylib`__dynamic_cast + 37, stop reason = EXC_BAD_ACCESS (code=1, address=0xfffffffffffffff0)
  * frame #0: 0x00007fff996a4b21 libc++abi.dylib`__dynamic_cast + 37
    frame #1: 0x00000001005dd4da libremote.dylib`boost::intrusive_ptr boost::dynamic_pointer_cast(p=0x0000000107abd910) + 90 at intrusive_ptr.hpp:287
    frame #2: 0x000000010072c63a libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 890 at configobjectutility.cpp:182
    frame #3: 0x000000010072d03c libremote.dylib`icinga::ConfigObjectUtility::DeleteObject(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 300 at configobjectutility.cpp:235
    frame #4: 0x0000000100735239 libremote.dylib`icinga::DeleteObjectHandler::HandleRequest(this=0x000000010190d420, user=0x0000700000dce370, request=0x0000000108a3b468, response=0x0000700000dce178) + 4601 at deleteobjecthandler.cpp:89
    frame #5: 0x0000000100798432 libremote.dylib`icinga::HttpHandler::ProcessRequest(user=0x0000700000dce370, request=0x0000000108a3b468, response=0x0000700000dce178) + 3874 at httphandler.cpp:99
    frame #6: 0x000000010079378a libremote.dylib`icinga::HttpServerConnection::ProcessMessageAsync(this=0x0000000103003200, request=0x0000000108a3b468) + 3962 at httpserverconnection.cpp:185
    frame #7: 0x0000000100796293 libremote.dylib`void boost::_mfi::mf1::call, icinga::HttpRequest>(this=0x0000000108a3b450, u=0x0000000108a3b460, (null)=0x0000000000000000, b1=0x0000000108a3b468) const + 131 at mem_fn_template.hpp:156
    frame #8: 0x00000001007961c1 libremote.dylib`void boost::_mfi::mf1::operator(this=0x0000000108a3b450, u=0x0000000108a3b460, a1=0x0000000108a3b468) >(boost::intrusive_ptr&, icinga::HttpRequest&) const + 49 at mem_fn_template.hpp:171
    frame #9: 0x000000010079615c libremote.dylib`void boost::_bi::list2 >, boost::_bi::value >::operator(this=0x0000000108a3b460, (null)=type @ 0x0000700000dce598, f=0x0000000108a3b450, a=0x0000700000dce5c0, (null)=0), boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf1&, boost::_bi::list0&, int) + 108 at bind.hpp:319
    frame #10: 0x00000001007960ea libremote.dylib`boost::_bi::bind_t, boost::_bi::list2 >, boost::_bi::value > >::operator(this=0x0000000108a3b450)() + 58 at bind.hpp:1222
    frame #11: 0x0000000100795e30 libremote.dylib`boost::detail::function::void_function_obj_invoker0, boost::_bi::list2 >, boost::_bi::value > >, void>::invoke(function_obj_ptr=0x0000700000dce7d8) + 32 at function_template.hpp:159
    frame #12: 0x0000000100f274c8 libbase.dylib`boost::function0::operator(this=0x0000700000dce7d0)() const + 120 at function_template.hpp:770
    frame #13: 0x0000000101130bf4 libbase.dylib`icinga::WorkQueue::WorkerThreadProc(this=0x00000001030032d8) + 3364 at workqueue.cpp:212
    frame #14: 0x0000000101133999 libbase.dylib`boost::_mfi::mf0::operator(this=0x0000000108799e20, p=0x00000001030032d8)(icinga::WorkQueue*) const + 105 at mem_fn_template.hpp:49
    frame #15: 0x00000001011342e0 libbase.dylib`void boost::_bi::list1 >::operator(this=0x0000000108799e30, (null)=type @ 0x0000700000dcee48, f=0x0000000108799e20, a=0x0000700000dcee70, (null)=0), boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf0&, boost::_bi::list0&, int) + 80 at bind.hpp:259
    frame #16: 0x000000010113428a libbase.dylib`boost::_bi::bind_t, boost::_bi::list1 > >::operator(this=0x0000000108799e20)() + 58 at bind.hpp:1222
    frame #17: 0x000000010113421c libbase.dylib`boost::detail::thread_data, boost::_bi::list1 > > >::run(this=0x0000000108799c20) + 28 at thread.hpp:116
    frame #18: 0x000000010007f5e9 libboost_thread-mt.dylib`boost::(anonymous namespace)::thread_proxy(void*) + 185
    frame #19: 0x00007fff9cfca99d libsystem_pthread.dylib`_pthread_body + 131
    frame #20: 0x00007fff9cfca91a libsystem_pthread.dylib`_pthread_start + 168
    frame #21: 0x00007fff9cfc8351 libsystem_pthread.dylib`thread_start + 13
@icinga-migration

This comment has been minimized.

Copy link
Member Author

commented May 2, 2016

Updated by mfriedrich on 2016-05-02 14:27:58 +00:00

  • File added icinga2_api_11686.paw

Paw test file attached.

@icinga-migration

This comment has been minimized.

Copy link
Member Author

commented May 2, 2016

Updated by mfriedrich on 2016-05-02 14:51:29 +00:00

(lldb) up
frame #1: 0x00000001005dd4da libremote.dylib`boost::intrusive_ptr boost::dynamic_pointer_cast(p=0x0000000107abd910) + 90 at intrusive_ptr.hpp:287
   284
   285  template intrusive_ptr dynamic_pointer_cast(intrusive_ptr const & p)
   286  {
-> 287      return dynamic_cast(p.get());
   288  }
   289
   290  // operator<<
(lldb)
frame #2: 0x000000010072c63a libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 890 at configobjectutility.cpp:182
   179      }
   180
   181      BOOST_FOREACH(const Object::Ptr& pobj, parents) {
-> 182          ConfigObject::Ptr parentObj = dynamic_pointer_cast(pobj);
   183
   184          if (!parentObj)
   185              continue;
(lldb) p pobj
(Ptr) $0 = {
  px = 0x0000000104ec8ba0
}
(lldb) p *pobj.px
(icinga::Object) $1 = {
  m_References = 17293822569376304612
  m_Mutex = 4377542660
  m_LockOwner = 0x0000000000000000
}

(lldb) b configobjectutility.cpp:171
Breakpoint 1: where = libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(boost::intrusive_ptr const&, bool, boost::intrusive_ptr const&) + 47 at configobjectutility.cpp:171, address = 0x000000010072c2ef
(lldb) r
There is a running process, kill it and restart?: [Y/n] y
Process 15170 exited with status = 9 (0x00000009)
Process 18063 launched: '/usr/local/icinga2/lib/icinga2/sbin/icinga2' (x86_64)

Process 18063 stopped
* thread #19: tid = 0xa2a64c, 0x000000010072c2ef libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 47 at configobjectutility.cpp:171, stop reason = breakpoint 1.1
    frame #0: 0x000000010072c2ef libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 47 at configobjectutility.cpp:171
   168
   169  bool ConfigObjectUtility::DeleteObjectHelper(const ConfigObject::Ptr& object, bool cascade, const Array::Ptr& errors)
   170  {
-> 171      std::vector parents = DependencyGraph::GetParents(object);
   172
   173      if (!parents.empty() && !cascade) {
   174          if (errors)
(lldb) p *object.px
(icinga::ConfigObject) $3 = {
  m_Zone = {
    px = 0x0000000000000000
  }
}

Looks like a race condition to me. At this stage the host object is not fully available when enforcing a dynamic_pointer_cast. Turns out when debugging and stepping into the functions, it works as expected from inside the debugger.

(lldb) br del

(lldb) b icinga::DeleteObjectHandler::HandleRequest
Breakpoint 2: where = libremote.dylib`icinga::DeleteObjectHandler::HandleRequest(boost::intrusive_ptr const&, icinga::HttpRequest&, icinga::HttpResponse&) + 53 at deleteobjecthandler.cpp:37, address = 0x0000000100734075
(lldb) r

Process 18209 stopped
* thread #19: tid = 0xa2ca8f, 0x0000000100734075 libremote.dylib`icinga::DeleteObjectHandler::HandleRequest(this=0x0000000101d05510, user=0x0000700000dce370, request=0x00000001082ecb68, response=0x0000700000dce178) + 53 at deleteobjecthandler.cpp:37, stop reason = breakpoint 2.1
    frame #0: 0x0000000100734075 libremote.dylib`icinga::DeleteObjectHandler::HandleRequest(this=0x0000000101d05510, user=0x0000700000dce370, request=0x00000001082ecb68, response=0x0000700000dce178) + 53 at deleteobjecthandler.cpp:37
   34
   35   bool DeleteObjectHandler::HandleRequest(const ApiUser::Ptr& user, HttpRequest& request, HttpResponse& response)
   36   {
-> 37       if (request.RequestUrl->GetPath().size() < 3 || request.RequestUrl->GetPath().size() > 4)
   38           return false;
   39
   40       if (request.RequestMethod != "DELETE")
(lldb) p request
(icinga::HttpRequest) $4 = {
  Complete = true
  RequestMethod = (m_Data = "DELETE")
  RequestUrl = {
    px = 0x0000000107d7ff10
  }
  ProtocolVersion = HttpVersion11
  Headers = {
    px = 0x0000000108a82fc0
  }
  m_Stream = {
    px = 0x0000000107db1230
  }
  m_ChunkContext = {
    px = 0x0000000000000000
    pn = {
      pi_ = 0x0000000000000000
    }
  }
  m_State = HttpRequestBody
  m_Body = {
    px = 0x0000000000000000
  }
}
(lldb) p *(request.RequestUrl).px
(icinga::Url) $5 = {
  icinga::Object = {
    m_References = 1
    m_Mutex = 0
    m_LockOwner = 0x0000000000000000
  }
  m_Scheme = (m_Data = "")
  m_Username = (m_Data = "")
  m_Password = (m_Data = "")
  m_Host = (m_Data = "")
  m_Port = (m_Data = "")
  m_Path = size=4 {
    [0] = (m_Data = "v1")
    [1] = (m_Data = "objects")
    [2] = (m_Data = "hosts")
    [3] = (m_Data = "11686-host")
  }
  m_Query = size=1 {
    [0] = {
      first = (m_Data = "cascade")
      second = size=1 {
        [0] = (m_Data = "1")
      }
    }
  }
  m_Fragment = (m_Data = "")
}

* thread #19: tid = 0xa2ca8f, 0x00000001007344e7 libremote.dylib`icinga::DeleteObjectHandler::HandleRequest(this=0x0000000101d05510, user=0x0000700000dce370, request=0x00000001082ecb68, response=0x0000700000dce178) + 1191 at deleteobjecthandler.cpp:54, stop reason = step over
    frame #0: 0x00000001007344e7 libremote.dylib`icinga::DeleteObjectHandler::HandleRequest(this=0x0000000101d05510, user=0x0000700000dce370, request=0x00000001082ecb68, response=0x0000700000dce178) + 1191 at deleteobjecthandler.cpp:54
   51       qd.Types.insert(type->GetName());
   52       qd.Permission = "objects/delete/" + type->GetName();
   53
-> 54       Dictionary::Ptr params = HttpUtility::FetchRequestParameters(request);
   55
   56       params->Set("type", type->GetName());
   57
(lldb) p qd
(icinga::QueryDescription) $20 = {
  Types = size=1 {
    [0] = (m_Data = "Host")
  }
  Provider = {
    px = 0x0000000000000000
  }
  Permission = (m_Data = "objects/delete/Host")
}

Process 18629 stopped
* thread #28: tid = 0xa2f575, 0x000000010072c5f8 libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 824 at configobjectutility.cpp:181, stop reason = step over
    frame #0: 0x000000010072c5f8 libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 824 at configobjectutility.cpp:181
   178          return false;
   179      }
   180
-> 181      BOOST_FOREACH(const Object::Ptr& pobj, parents) {
   182          ConfigObject::Ptr parentObj = dynamic_pointer_cast(pobj);
   183
   184          if (!parentObj)
(lldb)
Process 18629 stopped
* thread #28: tid = 0xa2f575, 0x000000010072c627 libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 871 at configobjectutility.cpp:182, stop reason = step over
    frame #0: 0x000000010072c627 libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 871 at configobjectutility.cpp:182
   179      }
   180
   181      BOOST_FOREACH(const Object::Ptr& pobj, parents) {
-> 182          ConfigObject::Ptr parentObj = dynamic_pointer_cast(pobj);
   183
   184          if (!parentObj)
   185              continue;
(lldb) p pobj
(Ptr) $104 = {
  px = 0x0000000107fad590
}
(lldb) p *pobj.px
(icinga::Object) $105 = {
  m_References = 2
  m_Mutex = 4428846544
  m_LockOwner = 0x0000000000000000
}
(lldb) n
[2016-05-02 16:45:50 +0200] information/Application: We jumped forward in time: 18.3771 seconds
[2016-05-02 16:45:50 +0200] information/IdoMysqlConnection: Query queue items: 111, query rate: 0.116667/s (7/min 107/5min 2805/15min); empty in 8 minutes and 32 seconds
[2016-05-02 16:45:50 +0200] warning/ThreadPool: Event call took 18.3772s
Process 18629 stopped
* thread #28: tid = 0xa2f575, 0x000000010072c646 libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 902 at configobjectutility.cpp:184, stop reason = step over
    frame #0: 0x000000010072c646 libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 902 at configobjectutility.cpp:184
   181      BOOST_FOREACH(const Object::Ptr& pobj, parents) {
   182          ConfigObject::Ptr parentObj = dynamic_pointer_cast(pobj);
   183
-> 184          if (!parentObj)
   185              continue;
   186
   187          DeleteObjectHelper(parentObj, cascade, errors);
(lldb) p parentObj
(icinga::ConfigObject::Ptr) $107 = {}
(lldb) n
[2016-05-02 16:46:21 +0200] warning/ThreadPool: Event call took 31.6886s
[2016-05-02 16:46:21 +0200] warning/ThreadPool: Event call took 31.6885s
[2016-05-02 16:46:21 +0200] warning/ThreadPool: Event call took 57.4722s
[2016-05-02 16:46:21 +0200] warning/ThreadPool: Event call took 57.4719s
[2016-05-02 16:46:21 +0200] warning/ThreadPool: Event call took 31.6885s
Process 18629 stopped
* thread #28: tid = 0xa2f575, 0x000000010072c662 libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 930 at configobjectutility.cpp:185, stop reason = step over
    frame #0: 0x000000010072c662 libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 930 at configobjectutility.cpp:185
   182          ConfigObject::Ptr parentObj = dynamic_pointer_cast(pobj);
   183
   184          if (!parentObj)
-> 185              continue;
   186
   187          DeleteObjectHelper(parentObj, cascade, errors);
   188      }
(lldb) n
@icinga-migration

This comment has been minimized.

Copy link
Member Author

commented May 2, 2016

Updated by mfriedrich on 2016-05-02 15:10:39 +00:00

For some reason DependencyGraph::GetParents() returns in incomplete parent object vector which causes trouble here. If you stop the debugger and let another thread finish updating the dependency graph, everything is fine.

Process 19909 stopped
* thread #28: tid = 0xa3b3f4, 0x000000010072c2ef libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 47 at configobjectutility.cpp:171, stop reason = breakpoint 3.1
    frame #0: 0x000000010072c2ef libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 47 at configobjectutility.cpp:171
   168
   169  bool ConfigObjectUtility::DeleteObjectHelper(const ConfigObject::Ptr& object, bool cascade, const Array::Ptr& errors)
   170  {
-> 171      std::vector parents = DependencyGraph::GetParents(object);
   172
   173      if (!parents.empty() && !cascade) {
   174          if (errors)
(lldb) n
[2016-05-02 17:07:46 +0200] warning/ThreadPool: Event call took 4.66426s
[2016-05-02 17:07:46 +0200] information/IdoMysqlConnection: Query queue items: 1, query rate: 41.65/s (2499/min 2499/5min 2499/15min);
Process 19909 stopped
* thread #28: tid = 0xa3b3f4, 0x000000010072c334 libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 116 at configobjectutility.cpp:173, stop reason = step over
    frame #0: 0x000000010072c334 libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 116 at configobjectutility.cpp:173
   170  {
   171      std::vector parents = DependencyGraph::GetParents(object);
   172
-> 173      if (!parents.empty() && !cascade) {
   174          if (errors)
   175              errors->Add("Object cannot be deleted because other objects depend on it. "
   176                  "Use cascading delete to delete it anyway.");
(lldb)
Process 19909 stopped
* thread #28: tid = 0xa3b3f4, 0x000000010072c423 libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 355 at configobjectutility.cpp:181, stop reason = step over
    frame #0: 0x000000010072c423 libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 355 at configobjectutility.cpp:181
   178          return false;
   179      }
   180
-> 181      BOOST_FOREACH(const Object::Ptr& pobj, parents) {
   182          ConfigObject::Ptr parentObj = dynamic_pointer_cast(pobj);
   183
   184          if (!parentObj)
(lldb)
[2016-05-02 17:07:49 +0200] warning/ThreadPool: Event call took 2.53329s
Process 19909 stopped
* thread #28: tid = 0xa3b3f4, 0x000000010072d830 libremote.dylib`boost::foreach_detail_::auto_any, std::__1::allocator > >, mpl_::bool_ >::type> boost::foreach_detail_::begin, std::__1::allocator > >, mpl_::bool_ >(col=0x0000000100808d85, (null)=0x00000001005cd3f5, (null)=0x0000700000dccd00) at foreach.hpp:669, stop reason = step over
    frame #0: 0x000000010072d830 libremote.dylib`boost::foreach_detail_::auto_any, std::__1::allocator > >, mpl_::bool_ >::type> boost::foreach_detail_::begin, std::__1::allocator > >, mpl_::bool_ >(col=0x0000000100808d85, (null)=0x00000001005cd3f5, (null)=0x0000700000dccd00) at foreach.hpp:669
   666  template
   667  inline auto_any::type>
   668  begin(auto_any_t col, type2type *, boost::mpl::false_ *) // lvalue
-> 669  {
   670      typedef BOOST_DEDUCED_TYPENAME type2type::type type;
   671      typedef BOOST_DEDUCED_TYPENAME foreach_iterator::type iterator;
   672      return auto_any::type>(
(lldb)
Process 19909 stopped
* thread #28: tid = 0xa3b3f4, 0x000000010072d844 libremote.dylib`boost::foreach_detail_::auto_any, std::__1::allocator > >, mpl_::bool_ >::type> boost::foreach_detail_::begin, std::__1::allocator > >, mpl_::bool_ >(col=0x0000700000dccc28, (null)=0x0000000000000000, (null)=0x0000000000000000) + 20 at foreach.hpp:673, stop reason = step over
    frame #0: 0x000000010072d844 libremote.dylib`boost::foreach_detail_::auto_any, std::__1::allocator > >, mpl_::bool_ >::type> boost::foreach_detail_::begin, std::__1::allocator > >, mpl_::bool_ >(col=0x0000700000dccc28, (null)=0x0000000000000000, (null)=0x0000000000000000) + 20 at foreach.hpp:673
   670      typedef BOOST_DEDUCED_TYPENAME type2type::type type;
   671      typedef BOOST_DEDUCED_TYPENAME foreach_iterator::type iterator;
   672      return auto_any::type>(
-> 673          iterator(boost::begin(BOOST_FOREACH_DEREFOF((auto_any_cast(col))))));
   674  }
   675
   676  #ifdef BOOST_FOREACH_RUN_TIME_CONST_RVALUE_DETECTION
(lldb)
[2016-05-02 17:07:50 +0200] warning/ThreadPool: Event call took 3.66974s
[2016-05-02 17:07:50 +0200] warning/ThreadPool: Event call took 2.58695s
[2016-05-02 17:07:50 +0200] warning/ThreadPool: Event call took 3.6702s
Process 19909 stopped
* thread #28: tid = 0xa3b3f4, 0x000000010072d861 libremote.dylib`boost::foreach_detail_::auto_any, std::__1::allocator > >, mpl_::bool_ >::type> boost::foreach_detail_::begin, std::__1::allocator > >, mpl_::bool_ >(col=0x0000700000dccc28, (null)=0x0000000000000000, (null)=0x0000000000000000) + 49 at foreach.hpp:672, stop reason = step over
    frame #0: 0x000000010072d861 libremote.dylib`boost::foreach_detail_::auto_any, std::__1::allocator > >, mpl_::bool_ >::type> boost::foreach_detail_::begin, std::__1::allocator > >, mpl_::bool_ >(col=0x0000700000dccc28, (null)=0x0000000000000000, (null)=0x0000000000000000) + 49 at foreach.hpp:672
   669  {
   670      typedef BOOST_DEDUCED_TYPENAME type2type::type type;
   671      typedef BOOST_DEDUCED_TYPENAME foreach_iterator::type iterator;
-> 672      return auto_any::type>(
   673          iterator(boost::begin(BOOST_FOREACH_DEREFOF((auto_any_cast(col))))));
   674  }
   675
(lldb)
Process 19909 stopped
* thread #28: tid = 0xa3b3f4, 0x000000010072c49c libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 476 at configobjectutility.cpp:181, stop reason = step over
    frame #0: 0x000000010072c49c libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 476 at configobjectutility.cpp:181
   178          return false;
   179      }
   180
-> 181      BOOST_FOREACH(const Object::Ptr& pobj, parents) {
   182          ConfigObject::Ptr parentObj = dynamic_pointer_cast(pobj);
   183
   184          if (!parentObj)
(lldb)
Process 19909 stopped
* thread #28: tid = 0xa3b3f4, 0x000000010072d870 libremote.dylib`boost::foreach_detail_::auto_any, std::__1::allocator > >, mpl_::bool_ >::type> boost::foreach_detail_::end, std::__1::allocator > >, mpl_::bool_ >(col=0x0000000100808d85, (null)=0x00000001005cd3f5, (null)=0x0000700000dccd00) at foreach.hpp:709, stop reason = step over
    frame #0: 0x000000010072d870 libremote.dylib`boost::foreach_detail_::auto_any, std::__1::allocator > >, mpl_::bool_ >::type> boost::foreach_detail_::end, std::__1::allocator > >, mpl_::bool_ >(col=0x0000000100808d85, (null)=0x00000001005cd3f5, (null)=0x0000700000dccd00) at foreach.hpp:709
   706  template
   707  inline auto_any::type>
   708  end(auto_any_t col, type2type *, boost::mpl::false_ *) // lvalue
-> 709  {
   710      typedef BOOST_DEDUCED_TYPENAME type2type::type type;
   711      typedef BOOST_DEDUCED_TYPENAME foreach_iterator::type iterator;
   712      return auto_any::type>(
(lldb)
Process 19909 stopped
* thread #28: tid = 0xa3b3f4, 0x000000010072d884 libremote.dylib`boost::foreach_detail_::auto_any, std::__1::allocator > >, mpl_::bool_ >::type> boost::foreach_detail_::end, std::__1::allocator > >, mpl_::bool_ >(col=0x0000700000dccc28, (null)=0x0000000000000000, (null)=0x0000000000000000) + 20 at foreach.hpp:713, stop reason = step over
    frame #0: 0x000000010072d884 libremote.dylib`boost::foreach_detail_::auto_any, std::__1::allocator > >, mpl_::bool_ >::type> boost::foreach_detail_::end, std::__1::allocator > >, mpl_::bool_ >(col=0x0000700000dccc28, (null)=0x0000000000000000, (null)=0x0000000000000000) + 20 at foreach.hpp:713
   710      typedef BOOST_DEDUCED_TYPENAME type2type::type type;
   711      typedef BOOST_DEDUCED_TYPENAME foreach_iterator::type iterator;
   712      return auto_any::type>(
-> 713          iterator(boost::end(BOOST_FOREACH_DEREFOF((auto_any_cast(col))))));
   714  }
   715
   716  #ifdef BOOST_FOREACH_RUN_TIME_CONST_RVALUE_DETECTION
(lldb)
[2016-05-02 17:07:52 +0200] warning/ThreadPool: Event call took 1.94095s
[2016-05-02 17:07:52 +0200] warning/ThreadPool: Event call took 1.94038s
[2016-05-02 17:07:52 +0200] warning/ThreadPool: Event call took 1.93902s
Process 19909 stopped
* thread #28: tid = 0xa3b3f4, 0x000000010072d8a1 libremote.dylib`boost::foreach_detail_::auto_any, std::__1::allocator > >, mpl_::bool_ >::type> boost::foreach_detail_::end, std::__1::allocator > >, mpl_::bool_ >(col=0x0000700000dccc28, (null)=0x0000000000000000, (null)=0x0000000000000000) + 49 at foreach.hpp:712, stop reason = step over
    frame #0: 0x000000010072d8a1 libremote.dylib`boost::foreach_detail_::auto_any, std::__1::allocator > >, mpl_::bool_ >::type> boost::foreach_detail_::end, std::__1::allocator > >, mpl_::bool_ >(col=0x0000700000dccc28, (null)=0x0000000000000000, (null)=0x0000000000000000) + 49 at foreach.hpp:712
   709  {
   710      typedef BOOST_DEDUCED_TYPENAME type2type::type type;
   711      typedef BOOST_DEDUCED_TYPENAME foreach_iterator::type iterator;
-> 712      return auto_any::type>(
   713          iterator(boost::end(BOOST_FOREACH_DEREFOF((auto_any_cast(col))))));
   714  }
   715
(lldb)
[2016-05-02 17:07:52 +0200] warning/ThreadPool: Event call took 3.50434s
Process 19909 stopped
* thread #28: tid = 0xa3b3f4, 0x000000010072c505 libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 581 at configobjectutility.cpp:181, stop reason = step over
    frame #0: 0x000000010072c505 libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 581 at configobjectutility.cpp:181
   178          return false;
   179      }
   180
-> 181      BOOST_FOREACH(const Object::Ptr& pobj, parents) {
   182          ConfigObject::Ptr parentObj = dynamic_pointer_cast(pobj);
   183
   184          if (!parentObj)
(lldb)
Process 19909 stopped
* thread #28: tid = 0xa3b3f4, 0x000000010072d8b0 libremote.dylib`bool boost::foreach_detail_::done, std::__1::allocator > >, mpl_::bool_ >(cur=0x0000700000dccd50, end=0x0000000100808d85, (null)=0x00000001005ce875) at foreach.hpp:740, stop reason = step over
    frame #0: 0x000000010072d8b0 libremote.dylib`bool boost::foreach_detail_::done, std::__1::allocator > >, mpl_::bool_ >(cur=0x0000700000dccd50, end=0x0000000100808d85, (null)=0x00000001005ce875) at foreach.hpp:740
   737  //
   738  template
   739  inline bool done(auto_any_t cur, auto_any_t end, type2type *)
-> 740  {
   741      typedef BOOST_DEDUCED_TYPENAME foreach_iterator::type iter_t;
   742      return auto_any_cast(cur) == auto_any_cast(end);
   743  }
(lldb)
Process 19909 stopped
* thread #28: tid = 0xa3b3f4, 0x000000010072d8c4 libremote.dylib`bool boost::foreach_detail_::done, std::__1::allocator > >, mpl_::bool_ >(cur=0x0000700000dccc18, end=0x0000700000dccc08, (null)=0x0000000000000000) + 20 at foreach.hpp:742, stop reason = step over
    frame #0: 0x000000010072d8c4 libremote.dylib`bool boost::foreach_detail_::done, std::__1::allocator > >, mpl_::bool_ >(cur=0x0000700000dccc18, end=0x0000700000dccc08, (null)=0x0000000000000000) + 20 at foreach.hpp:742
   739  inline bool done(auto_any_t cur, auto_any_t end, type2type *)
   740  {
   741      typedef BOOST_DEDUCED_TYPENAME foreach_iterator::type iter_t;
-> 742      return auto_any_cast(cur) == auto_any_cast(end);
   743  }
   744
   745  #ifndef BOOST_NO_FUNCTION_TEMPLATE_ORDERING
(lldb)
[2016-05-02 17:07:54 +0200] warning/ThreadPool: Event call took 2.25659s
Process 19909 stopped
* thread #28: tid = 0xa3b3f4, 0x000000010072c58d libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 717 at configobjectutility.cpp:181, stop reason = step over
    frame #0: 0x000000010072c58d libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 717 at configobjectutility.cpp:181
   178          return false;
   179      }
   180
-> 181      BOOST_FOREACH(const Object::Ptr& pobj, parents) {
   182          ConfigObject::Ptr parentObj = dynamic_pointer_cast(pobj);
   183
   184          if (!parentObj)
(lldb)
Process 19909 stopped
* thread #28: tid = 0xa3b3f4, 0x000000010072d920 libremote.dylib`boost::foreach_detail_::foreach_reference, std::__1::allocator > >, mpl_::bool_ >::type boost::foreach_detail_::deref, std::__1::allocator > >, mpl_::bool_ >(cur=0x0000000100808d85, (null)=0x00000001005cd3f5) at foreach.hpp:769, stop reason = step over
    frame #0: 0x000000010072d920 libremote.dylib`boost::foreach_detail_::foreach_reference, std::__1::allocator > >, mpl_::bool_ >::type boost::foreach_detail_::deref, std::__1::allocator > >, mpl_::bool_ >(cur=0x0000000100808d85, (null)=0x00000001005cd3f5) at foreach.hpp:769
   766  template
   767  inline BOOST_DEDUCED_TYPENAME foreach_reference::type
   768  deref(auto_any_t cur, type2type *)
-> 769  {
   770      typedef BOOST_DEDUCED_TYPENAME foreach_iterator::type iter_t;
   771      return *auto_any_cast(cur);
   772  }
(lldb)
Process 19909 stopped
* thread #28: tid = 0xa3b3f4, 0x000000010072d930 libremote.dylib`boost::foreach_detail_::foreach_reference, std::__1::allocator > >, mpl_::bool_ >::type boost::foreach_detail_::deref, std::__1::allocator > >, mpl_::bool_ >(cur=0x0000700000dccc18, (null)=0x0000000000000000) + 16 at foreach.hpp:771, stop reason = step over
    frame #0: 0x000000010072d930 libremote.dylib`boost::foreach_detail_::foreach_reference, std::__1::allocator > >, mpl_::bool_ >::type boost::foreach_detail_::deref, std::__1::allocator > >, mpl_::bool_ >(cur=0x0000700000dccc18, (null)=0x0000000000000000) + 16 at foreach.hpp:771
   768  deref(auto_any_t cur, type2type *)
   769  {
   770      typedef BOOST_DEDUCED_TYPENAME foreach_iterator::type iter_t;
-> 771      return *auto_any_cast(cur);
   772  }
   773
   774  /////////////////////////////////////////////////////////////////////////////
(lldb)
Process 19909 stopped
* thread #28: tid = 0xa3b3f4, 0x000000010072c5f8 libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 824 at configobjectutility.cpp:181, stop reason = step over
    frame #0: 0x000000010072c5f8 libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 824 at configobjectutility.cpp:181
   178          return false;
   179      }
   180
-> 181      BOOST_FOREACH(const Object::Ptr& pobj, parents) {
   182          ConfigObject::Ptr parentObj = dynamic_pointer_cast(pobj);
   183
   184          if (!parentObj)
(lldb)
Process 19909 stopped
* thread #28: tid = 0xa3b3f4, 0x000000010072c627 libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 871 at configobjectutility.cpp:182, stop reason = step over
    frame #0: 0x000000010072c627 libremote.dylib`icinga::ConfigObjectUtility::DeleteObjectHelper(object=0x0000700000dcd0b8, cascade=true, errors=0x0000700000dcd048) + 871 at configobjectutility.cpp:182
   179      }
   180
   181      BOOST_FOREACH(const Object::Ptr& pobj, parents) {
-> 182          ConfigObject::Ptr parentObj = dynamic_pointer_cast(pobj);
   183
   184          if (!parentObj)
   185              continue;
(lldb) p parents
(std::__1::vector, std::__1::allocator > >) $147 = size=1 {
  [0] = {
    px = 0x00000001088d59f0
  }
}
@icinga-migration

This comment has been minimized.

Copy link
Member Author

commented May 6, 2016

Updated by saurabh_hirani on 2016-05-06 14:26:34 +00:00

Sorry to be rushing but is there a targeted fix for it? This is making icinga2 API unsuable and causing crashes multiple times a day.

@icinga-migration

This comment has been minimized.

Copy link
Member Author

commented May 6, 2016

Updated by mfriedrich on 2016-05-06 14:34:04 +00:00

If you can help me find a fix, I'll happily schedule a release target. Apparently I know where the problem lies but investigating on a fix takes longer than expected.

Kind regards,
Michael

@icinga-migration

This comment has been minimized.

Copy link
Member Author

commented May 7, 2016

Updated by saurabh_hirani on 2016-05-07 04:04:37 +00:00

dnsmichi wrote:

If you can help me find a fix, I'll happily schedule a release target. Apparently I know where the problem lies but investigating on a fix takes longer than expected.

Kind regards,
Michael

Got that. Thanks for the update Michael. I am not good at C/C** - so I cannot decipher the coding part clearly enough to contribute back. But I can help out with scenario testing in any way if possible.

--
regards,
Saurabh

@icinga-migration

This comment has been minimized.

Copy link
Member Author

commented May 9, 2016

Updated by gbeutner on 2016-05-09 12:29:43 +00:00

  • Assigned to changed from mfriedrich to gbeutner
  • Target Version set to 2.4.8
@icinga-migration

This comment has been minimized.

Copy link
Member Author

commented May 9, 2016

Updated by gbeutner on 2016-05-09 12:31:11 +00:00

  • Relates set to 11374
@icinga-migration

This comment has been minimized.

Copy link
Member Author

commented May 9, 2016

Updated by gbeutner on 2016-05-09 12:32:38 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

Applied in changeset b8e911b.

@icinga-migration

This comment has been minimized.

Copy link
Member Author

commented May 10, 2016

Updated by gbeutner on 2016-05-10 09:54:53 +00:00

  • Relates set to 11750
@icinga-migration

This comment has been minimized.

Copy link
Member Author

commented May 10, 2016

Updated by mfriedrich on 2016-05-10 11:21:26 +00:00

  • Relates set to 11753
@icinga-migration

This comment has been minimized.

Copy link
Member Author

commented May 10, 2016

Updated by mfriedrich on 2016-05-10 11:36:48 +00:00

  • Relates set to 11370
@icinga-migration

This comment has been minimized.

Copy link
Member Author

commented May 11, 2016

Updated by mfriedrich on 2016-05-11 07:38:56 +00:00

  • Parent Id deleted 11415
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.