rsyslog/logrotate configuration snippets #17

fghaas · 2012-01-06T21:07:41Z

Fixed patch set, originally discussed here:

http://oss.clusterlabs.org/pipermail/pacemaker/2012-January/012626.html

beekhof · 2012-02-17T03:12:43Z

Closing.

As stated elsewhere, diverting all logging from pacemaker's daemons away from syslog, especially to per-daemon files, isn't something we want to encourage.

The preferred approach involves ratcheting down the amount of logging we're sending to syslog in the first place.
Happily, switching to libqb will still allow us to be horribly verbose when logging to a file.

fghaas · 2012-02-17T07:11:01Z

"As stated elsewhere," logging to both the split-out logfiles and a central log location would be a matter of simply commenting out the & ~ lines in the rsyslog config snippets. Also "as stated elsewhere", shipping with those lines commented out by default would be something I would be perfectly fine with.
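For readers unfamiliar with the syntax, a snippet of the kind discussed here might look roughly like the following. This is only a sketch, not the contents of the patch: the daemon name `crmd` as a filter value and the log path are assumptions chosen for illustration.

```
# Hypothetical example; the program name and file path are assumptions.
# Write every message logged by the crmd daemon to its own file.
:programname, isequal, "crmd"    /var/log/pacemaker/crmd.log
# Discard the message just matched, so it does not also reach the
# central log. Comment this line out to log to both locations.
& ~
```

With the `& ~` line commented out, a matched message continues on to later rules, so it lands in both the per-daemon file and the central log.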

beekhof · 2012-02-17T10:01:31Z

Doubling the size of the logs is an interesting approach but does nothing to address the main objection.
This is not an approach we are interested in pursuing.

fghaas · 2012-02-17T10:10:20Z

You're forgetting the royal wave. :)

beekhof · 2012-02-17T10:15:16Z

I tried being polite, but you're like a dog with a bone sometimes ;-)

fghaas · 2012-02-17T10:16:06Z

Yes, Your Highness. Woof.

With "service" class of resources, by chance, lrmd hangs on the futex() syscall:

root@node2:~ # cat /proc/2503/stack
[<ffffffff810fa0c0>] futex_wait_queue_me+0xc0/0x130
[<ffffffff810faf23>] futex_wait+0x163/0x250
[<ffffffff810fc870>] do_futex+0xe0/0x540
[<ffffffff810fcd3e>] SyS_futex+0x6e/0x140
[<ffffffff815e142e>] entry_SYSCALL_64_fastpath+0x12/0x6d
[<ffffffffffffffff>] 0xffffffffffffffff

The cluster no longer behaves and cannot recover from the situation.

According to the backtrace, it seems to be due to the reentrancy of dbus_connection_dispatch():

(gdb) bt
#0 0x00007f07f7d2e0af in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f07f6c29925 in _dbus_connection_acquire_dispatch (connection=0x13411f0) at dbus-connection.c:4142
#2 0x00007f07f6c2b3bc in dbus_connection_dispatch (connection=connection@entry=0x13411f0) at dbus-connection.c:4577
#3 0x00007f07f8d88e50 in pcmk_dbus_connection_dispatch (connection=connection@entry=0x13411f0, new_status=new_status@entry=DBUS_DISPATCH_DATA_REMAINS, data=data@entry=0x0) at dbus.c:410
#4 0x00007f07f6c29b70 in _dbus_connection_update_dispatch_status_and_unlock (connection=0x13411f0, new_status=DBUS_DISPATCH_DATA_REMAINS) at dbus-connection.c:4346
#5 0x00007f07f6c29f79 in check_for_reply_and_update_dispatch_unlocked (connection=connection@entry=0x13411f0, pending=pending@entry=0x135a8b0) at dbus-connection.c:2355
#6 0x00007f07f6c2a08b in _dbus_connection_block_pending_call (pending=0x135a8b0) at dbus-connection.c:2461
#7 0x00007f07f6c396ba in dbus_pending_call_block (pending=<optimized out>) at dbus-pending-call.c:741
#8 0x00007f07f8d8929c in pcmk_dbus_send_recv (msg=msg@entry=0x1340940, connection=0x13411f0, error=error@entry=0x7ffc5d148fc0, timeout=-1) at dbus.c:141
#9 0x00007f07f8d8d2d7 in systemd_unit_by_name (arg_name=arg_name@entry=0x133dcb0 "service", op=op@entry=0x0) at systemd.c:296
#10 0x00007f07f8d8d45b in systemd_unit_exists (name=name@entry=0x133dcb0 "service") at systemd.c:416
#11 0x00007f07f8d83dc5 in resources_find_service_class (agent=0x133dcb0 "service") at services.c:88
#12 0x0000000000405b05 in action_complete (action=0x134e0b0) at lrmd.c:876
#13 0x00007f07f8d867e3 in operation_finalize (op=0x134e0b0) at services_linux.c:257
#14 0x00007f07f8d899d8 in pcmk_dbus_lookup_result (reply=reply@entry=0x135cc80, data=data@entry=0x1355e30) at dbus.c:289
#15 0x00007f07f8d89ba4 in pcmk_dbus_lookup_cb (pending=<optimized out>, user_data=0x1355e30) at dbus.c:334
#16 0x00007f07f6c28032 in complete_pending_call_and_unlock (connection=0x13411f0, pending=0x135a2c0, message=<optimized out>) at dbus-connection.c:2331
#17 0x00007f07f6c2b401 in dbus_connection_dispatch (connection=connection@entry=0x13411f0) at dbus-connection.c:4626
#18 0x00007f07f8d88e50 in pcmk_dbus_connection_dispatch (connection=connection@entry=0x13411f0, new_status=new_status@entry=DBUS_DISPATCH_DATA_REMAINS, data=data@entry=0x0) at dbus.c:410
#19 0x00007f07f6c29b70 in _dbus_connection_update_dispatch_status_and_unlock (connection=0x13411f0, new_status=DBUS_DISPATCH_DATA_REMAINS) at dbus-connection.c:4346
#20 0x00007f07f6c29ca6 in _dbus_connection_handle_watch (watch=<optimized out>, condition=1, data=0x13411f0) at dbus-connection.c:1520
#21 0x00007f07f6c40f2a in dbus_watch_handle (watch=watch@entry=0x133d6a0, flags=flags@entry=1) at dbus-watch.c:722
#22 0x00007f07f8d887da in pcmk_dbus_watch_dispatch (userdata=0x133d6a0) at dbus.c:448
#23 0x00007f07f8fcfef7 in mainloop_gio_callback (gio=<optimized out>, condition=G_IO_IN, data=0x133f210) at mainloop.c:673
#24 0x00007f07f82a0015 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#25 0x00007f07f82a0388 in ?? () from /usr/lib64/libglib-2.0.so.0
#26 0x00007f07f82a064a in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#27 0x0000000000402c0e in main (argc=<optimized out>, argv=0x7ffc5d149818) at main.c:476

As described in:

https://dbus.freedesktop.org/doc/api/html/group__DBusConnection.html#ga55ff88cd22c0672441c7deffbfb68fbf

dbus_connection_dispatch() MUST NOT BE CALLED from inside the DBusDispatchStatusFunction. It seems that pcmk_dbus_watch_dispatch() is an appropriate place to do it instead.

ed51cc9 introduced an issue that'd cause reentrancy of mainloop_del_fd() on stopping crmd, which would complain like:

"crit: GLib: Source ID 22 was not found when attempting to remove it"

```
#0 0xb7f29cc9 in __kernel_vsyscall ()
#1 0xb7b538e2 in raise () from /lib/libc.so.6
#2 0xb7b54fd1 in abort () from /lib/libc.so.6
#3 0xb7e51a1e in crm_abort (file=0xb7e83c9f "logging.c", function=0xb7e85b00 <__FUNCTION__.21612> "crm_glib_handler", line=73, assert_condition=0x63c6a0 "Source ID 22 was not found when attempting to remove it", do_core=1, do_fork=<optimized out>) at utils.c:689
#4 0xb7e7257f in crm_glib_handler (log_domain=0xb7d93e8e "GLib", flags=G_LOG_LEVEL_CRITICAL, message=0x63c6a0 "Source ID 22 was not found when attempting to remove it", user_data=0x0) at logging.c:73
#5 0xb7d5055c in g_logv () from /usr/lib/libglib-2.0.so.0
#6 0xb7d506a5 in g_log () from /usr/lib/libglib-2.0.so.0
#7 0xb7d484f6 in g_source_remove () from /usr/lib/libglib-2.0.so.0
#8 0xb7e6f921 in mainloop_del_fd (client=0x61a110) at mainloop.c:862
#9 0xb7e6f9b8 in mainloop_del_ipc_client (client=0x61a110) at mainloop.c:797
#10 0x005069a8 in pe_subsystem_free () at pengine.c:43
#11 pe_ipc_destroy (user_data=0x0) at pengine.c:126
#12 0xb7e6dab4 in mainloop_gio_destroy (c=0x61a110) at mainloop.c:748
#13 0xb7d46088 in ?? () from /usr/lib/libglib-2.0.so.0
#14 0xb7d46b6a in ?? () from /usr/lib/libglib-2.0.so.0
#15 0xb7d484cd in g_source_remove () from /usr/lib/libglib-2.0.so.0
#16 0xb7e6f921 in mainloop_del_fd (client=0x61a110) at mainloop.c:862
#17 0xb7e6f9b8 in mainloop_del_ipc_client (client=0x61a110) at mainloop.c:797
#18 0x00506cac in pe_subsystem_free () at pengine.c:43
#19 do_pe_control (action=2251799813685248, cause=C_FSA_INTERNAL, cur_state=S_STOPPING, current_input=I_STOP, msg_data=0x645600) at pengine.c:211
#20 0x004f0b66 in do_fsa_action (fsa_data=fsa_data@entry=0x645600, an_action=<optimized out>, function=0x506ac0 <do_pe_control>) at fsa.c:139
#21 0x004f2c8c in s_crmd_fsa_actions (fsa_data=0x645600) at fsa.c:405
#22 0x004f43c7 in s_crmd_fsa (cause=C_FSA_INTERNAL) at fsa.c:233
#23 0x004fea0b in crm_fsa_trigger (user_data=0x0) at callbacks.c:298
#24 0xb7e6e7ca in crm_trigger_dispatch (source=0x60c4a0, callback=0x4fe9d0 <crm_fsa_trigger>, userdata=0x60c4a0) at mainloop.c:109
#25 0xb7d49944 in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#26 0xb7d49d49 in ?? () from /usr/lib/libglib-2.0.so.0
#27 0xb7d4a0f9 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#28 0x004efc68 in crmd_init () at main.c:162
#29 0x004ef937 in main (argc=<optimized out>, argv=<optimized out>) at main.c:121
```

fghaas added 2 commits January 5, 2012 22:49

extra: add rsyslog configuration snippet

9e9bafd

extra: add logrotate configuration snippet

12f7beb

beekhof closed this Feb 17, 2012

gao-yan mentioned this pull request Dec 14, 2016

Fix: dbus: Prevent lrmd from hanging on dbus calls #1201

Merged

gao-yan mentioned this pull request Mar 23, 2023

Fix: controller: avoid use-after-free when disconnecting proxy IPCs during shutdown #3060

Merged
