
Coredump occurs when restarting sssd-ifp.service while sssd.service is inactive #6324

Closed
huangzq6 opened this issue Aug 25, 2022 · 11 comments
Labels
Closed: Fixed Issue was closed as fixed.

Comments

@huangzq6

huangzq6 commented Aug 25, 2022

Hi, a coredump occurs when I restart sssd-ifp.service while sssd.service is inactive. The probability of occurrence is very high; alternatively, restarting sssd-ifp.service and sssd.service at the same time will also crash it.
I am investigating the reason; it may be a double free.

The system runs a 5.10 kernel. Relevant versions:
sssd: 2.6.1-1
glibc: 2.34-70

The coredump log is as follows (partial):
localhost:/home/hzq # gdb /usr/sbin/sssd core.sssd.0.b3ab7388b13f4a979fa9fe86fcaae558.57906.1661343892000000
GNU gdb (GDB) openEuler 11.1-1.h2.openEuler
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-openEuler-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
https://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/sssd...
Reading symbols from /usr/lib/debug//usr/sbin/sssd-2.6.1-1.h7.openEuler.aarch64.debug...
[New LWP 57906]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/sssd -i --logger=files'.
Program terminated with signal SIGABRT, Aborted.
#0 __pthread_kill_implementation (threadid=281473457532960, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44 return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
Missing separate debuginfos, use: dnf debuginfo-install ...
(gdb) bt
#0 __pthread_kill_implementation (threadid=281473457532960, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x0000ffffa4ac7d14 in __pthread_kill_internal (signo=, threadid=) at pthread_kill.c:78
#2 0x0000ffffa4a83cbc in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x0000ffffa4a71d2c in __GI_abort () at abort.c:79
#4 0x0000ffffa4dfbdf8 in _talloc_free () from /usr/lib64/libtalloc.so.2
#5 0x0000ffffa4e7f190 in sbus_connection_destructor (conn=0xaaaafa94b6c0) at src/sbus/connection/sbus_connection.c:78
#6 0x0000ffffa4dfaef8 in ?? () from /usr/lib64/libtalloc.so.2
#7 0x0000ffffa4dfab58 in ?? () from /usr/lib64/libtalloc.so.2
#8 0x0000ffffa4dfab58 in ?? () from /usr/lib64/libtalloc.so.2
#9 0x0000ffffa4dfab58 in ?? () from /usr/lib64/libtalloc.so.2
#10 0x0000ffffa4dfab58 in ?? () from /usr/lib64/libtalloc.so.2
#11 0x0000ffffa55a00f0 in server_atexit () at src/util/server.c:45
#12 0x0000ffffa4a864d0 in __run_exit_handlers (status=0, listp=0xffffa4be7698 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true,
run_dtors=run_dtors@entry=true) at exit.c:113
#13 0x0000ffffa4a8662c in __GI_exit (status=) at exit.c:143
#14 0x0000aaaad64b6f58 in monitor_quit (mt_ctx=, ret=0) at src/monitor/monitor.c:1438
#15 0x0000aaaad64b76d0 in monitor_quit_signal (ev=, se=, signum=15, count=, siginfo=,
private_data=) at src/monitor/monitor.c:1455
#16 0x0000ffffa4e24248 in tevent_common_invoke_signal_handler () from /usr/lib64/libtevent.so.0
#17 0x0000ffffa4e243f8 in tevent_common_check_signal () from /usr/lib64/libtevent.so.0
#18 0x0000ffffa4e228c4 in ?? () from /usr/lib64/libtevent.so.0
#19 0x0000ffffa4e1f4d4 in _tevent_loop_once () from /usr/lib64/libtevent.so.0
#20 0x0000ffffa4e1f7b0 in tevent_common_loop_wait () from /usr/lib64/libtevent.so.0
#21 0x0000ffffa55a1510 in server_loop (main_ctx=0xaaaafa93ee60) at src/util/server.c:733
#22 0x0000aaaad64b5974 in main (argc=, argv=) at src/monitor/monitor.c:2582

@huangzq6
Author

Sorry for the late update.
I think the problem can be fixed by the following change:


src/monitor/monitor.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/src/monitor/monitor.c b/src/monitor/monitor.c
index 55cb083..53e7895 100644
--- a/src/monitor/monitor.c
+++ b/src/monitor/monitor.c
@@ -337,6 +337,10 @@ monitor_sbus_RegisterService(TALLOC_CTX *mem_ctx,
         return ret;
     }
 
+    if (strcasecmp(name, "ifp") == 0) {
+        svc->socket_activated = true;
+    }
+
     *_monitor_version = MONITOR_VERSION;
 
     return EOK;
--
2.27.0

Some analysis below:
When the responder reconnects and gets stale information from the service list (get_service_in_the_list), the corresponding socket_activated flag is set to false (svc->socket_activated = false).
This causes the main process, on exit (monitor_quit), to skip setting the corresponding destructor to NULL (talloc_set_destructor(svc->conn, NULL)), which finally causes a double free during destruction, resulting in the coredump.
Therefore, the socket_activated flag needs to be set back to true at the end of the responder's reconnection.

In fact, all services should receive this treatment, but I have only reproduced the coredump scenario after an ifp reconnection; the other responders have not been tested. To keep the modification minimal, an if check is added here that handles only the ifp case.
A complete solution requires further rework of the socket_activated mechanism; I will do further research on that.

@huangzq6
Author

The reproduction steps are:
systemctl restart sssd-ifp
systemctl restart sssd
systemctl stop sssd

The sssd version is 2.6.1.

@alexey-tikhonov
Member

Hi @huangzq6,

what's in the services option of your sssd.conf? Does it list ifp?

@huangzq6 huangzq6 reopened this Jan 20, 2023
@huangzq6
Author

Hi @alexey-tikhonov,
sorry for the late reply, I am on vacation now.
I'm just doing a robustness test of the sssd service, so I did not configure ifp in sssd.conf.
If needed, I will attach it later.

@alexey-tikhonov
Member

Hi,

  1. I wasn't able to reproduce the crash on RHEL 8.6 with sssd-2.6.2.
  2. According to the backtrace provided, the crash happens on the exit path, so hopefully the severity isn't too high.
  3. Taking into account plans to get rid of the monitor entirely, it probably doesn't make sense to hunt this bug.

The patch proposed in #6324 (comment) doesn't take into account that 'ifp' can be configured in sssd.conf to be run by the sssd service (and not by sssd-ifp).

@alexey-tikhonov
Member

alexey-tikhonov commented Feb 3, 2023

When the responder reconnects and gets old information in the service list (get_service_in_the_list), the corresponding socket_activated flag will be set to false (svc->socket_activated = false).

Ah... Do you mean:

  • when a socket-activated service connects for the first time, it's added to mt_ctx->svc_list by socket_activated_service_not_found() with a proper socket_activated = true
  • but when it reconnects again, get_service_in_the_list() finds it in mt_ctx->svc_list and overwrites socket_activated = false unconditionally
    ?

@alexey-tikhonov
Member

Hi @huangzq6,

could you please check if the following patch resolves the issue for you:

diff --git a/src/monitor/monitor.c b/src/monitor/monitor.c
index 6d3d05cd3..0972465f9 100644
--- a/src/monitor/monitor.c
+++ b/src/monitor/monitor.c
@@ -242,7 +242,6 @@ get_service_in_the_list(struct mt_ctx *mt_ctx,
 
     for (svc = mt_ctx->svc_list; svc != NULL; svc = svc->next) {
         if (strcasecmp(svc->identity, svc_name) == 0) {
-            svc->socket_activated = false;
             *_svc = svc;
             return EOK;
         }
@@ -1802,6 +1801,8 @@ static int start_service(struct mt_svc *svc)
 
     DEBUG(SSSDBG_CONF_SETTINGS,"Queueing service %s for startup\n", svc->name);
 
+    svc->socket_activated = false;
+
     tv = tevent_timeval_current();
 
     /* Add a timed event to start up the service.

?

@huangzq6
Author

> Hi @huangzq6,
>
> could you please check if the following patch resolves the issue for you:
>
> [patch quoted above]

Hi, thanks for your patch.
I have applied it and retested on sssd-2.6.1; the coredump never happens again.

@HelloCarry
Contributor

HelloCarry commented Feb 11, 2023

Hi @alexey-tikhonov, I also tried your modification and found it works. This is the PR submitted according to your suggestion:
#6574

alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Feb 13, 2023
When a socket-activated service connects for the first time, it is added to
`mt_ctx->svc_list` by `socket_activated_service_not_found()` with a proper
`socket_activated = true`.
But when it reconnects again, `get_service_in_the_list()` finds it in
`mt_ctx->svc_list` and overwrites `socket_activated = false` unconditionally.
This patch moves `socket_activated = false` to `start_service()`.

Resolves: SSSD#6324
@alexey-tikhonov
Member

Thank you for the confirmations.

alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Feb 16, 2023
When a socket-activated service connects for the first time, it is added to
`mt_ctx->svc_list` by `socket_activated_service_not_found()` with a proper
`socket_activated = true`.
But when it reconnects again, `get_service_in_the_list()` finds it in
`mt_ctx->svc_list` and overwrites `socket_activated = false` unconditionally.
This patch moves `socket_activated = false` to `start_service()`.

Resolves: SSSD#6324
alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Feb 16, 2023
When socket activated service connects for the first time, it is added to
`mt_ctx->svc_list` by `socket_activated_service_not_found()` with a proper
`socket_activated = true`.
But when it reconnects again, `get_service_in_the_list()` finds it in
`mt_ctx->svc_list` and overwrites `socket_activated = false` unconditionally.
This patch moves `socket_activated = false` to `start_service()`.

Resolves: SSSD#6324
alexey-tikhonov added a commit that referenced this issue Feb 17, 2023
When socket activated service connects for the first time, it is added to
`mt_ctx->svc_list` by `socket_activated_service_not_found()` with a proper
`socket_activated = true`.
But when it reconnects again, `get_service_in_the_list()` finds it in
`mt_ctx->svc_list` and overwrites `socket_activated = false` unconditionally.
This patch moves `socket_activated = false` to `start_service()`.

Resolves: #6324

Reviewed-by: Iker Pedrosa <ipedrosa@redhat.com>
Reviewed-by: Pavel Březina <pbrezina@redhat.com>
(cherry picked from commit d4f7ed6)
@alexey-tikhonov
Member

Pushed PR: #6575

  • master
    • d4f7ed6 - MONITOR: fix socket_activated flag initialization
  • sssd-2-8
    • ae0accc - MONITOR: fix socket_activated flag initialization

@alexey-tikhonov alexey-tikhonov added the Closed: Fixed Issue was closed as fixed. label Feb 17, 2023