Cloned from Pagure issue: https://pagure.io/SSSD/sssd/issue/3118
testing sssd with overrides (users and groups) on centos 7.2 (sssd 1.13.0, libldb 1.1.25), i soon ran in to problems with sssd_nss crashing. dmesg shows:
sssd_nss[28935]: segfault at 51 ip 00007fa5e39d46af sp 00007ffcd6f18290 error 4 in libldb.so.1.1.25[7fa5e39c4000+2d000]
backtrace:
running sssd with -d 9, and having added ldb tracing to src/db/sysdb.c (line 59 in the unpatched source):
ret = ldb_connect(ldb, filename, LDB_FLG_ENABLE_TRACING, NULL);
things die just after retrieving the override information for a member of a group, e.g. (username/domain name removed):
it always seems to fail on the first or second member of the group, and it is always when the group being looked at has overrides (gid).
when dropping back to the previous ldb packages for centos 7.2 (ldb 1.1.20) everything seems to work just fine, so i looked at the differences, and it seems that this change, which was added in ldb 1.1.24 might be significant:
reverting this change stops things from crashing, as does just adding 1 to num_elements in the talloc_realloc call, e.g.:
i have had a bit of a poke around, but can't say i have been able to work out exactly why this is the case ...
i would like to have been able to give a better report of the exact cause of the problem, but have unfortunately run out of time to look at this for now.
at the moment, i can stick with ldb-1.1.20, but that's not really a long term solution. i did also do some quick testing with sssd-1.14.0, and the problem remains.
let me know if i can provide any more information.
thanks,
richard
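Editorial note: the dmesg line in the report can be decoded a little further. The sketch below is only an illustration; the helper names `decode_pf_error` and `mapping_offset` are hypothetical and not part of sssd or libldb. It assumes an x86 kernel, where the number after "error" is the page-fault error code and the bracketed pair after the library name is the mapping's base address and size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits (the number after "error" in dmesg). */
#define PF_PROT  0x1u /* set: protection violation; clear: page not present */
#define PF_WRITE 0x2u /* set: write access; clear: read access */
#define PF_USER  0x4u /* set: the fault happened in user mode */

/* Hypothetical helper type: the three low bits of the error code. */
struct fault_info {
    bool protection;
    bool write;
    bool user;
};

/* Split the error code into its three low flag bits. */
struct fault_info decode_pf_error(unsigned err)
{
    struct fault_info f;
    f.protection = (err & PF_PROT) != 0;
    f.write = (err & PF_WRITE) != 0;
    f.user = (err & PF_USER) != 0;
    return f;
}

/* Offset of the faulting instruction inside the mapped library.
 * ip, base and size are the values printed by the kernel. */
uint64_t mapping_offset(uint64_t ip, uint64_t base, uint64_t size)
{
    assert(ip >= base && ip - base < size);
    return ip - base;
}
```

For the line above, `mapping_offset(0x00007fa5e39d46af, 0x7fa5e39c4000, 0x2d000)` gives 0x106af, and error 4 decodes to a user-mode read of a page that is not mapped. Together with the very low faulting address (0x51), that is what a read through a garbage pointer typically looks like. With the libldb debuginfo installed, the offset could be resolved to a source line, for example with addr2line.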
Comments
Comment from lslebodn at 2016-08-02 21:24:24
Fields changed
description: (previous text, identical to the new text except that it lacked the {{{ }}} code-block markers) => testing sssd with overrides (users and groups) on centos 7.2 (sssd 1.13.0, libldb 1.1.25), i soon ran in to problems with sssd_nss crashing. dmesg shows:
sssd_nss[28935]: segfault at 51 ip 00007fa5e39d46af sp 00007ffcd6f18290 error 4 in libldb.so.1.1.25[7fa5e39c4000+2d000]
backtrace:
{{{
Program received signal SIGSEGV, Segmentation fault.
0x00007f4b514276af in ldb_dn_from_ldb_val (mem_ctx=mem_ctx@entry=0x7f4b52297300, ldb=0x7f4b5229dad0, strdn=0x25) at ../common/ldb_dn.c:97
97 if (strdn && strdn->data
(gdb) bt
#0 0x00007f4b514276af in ldb_dn_from_ldb_val (mem_ctx=mem_ctx@entry=0x7f4b52297300, ldb=0x7f4b5229dad0, strdn=0x25) at ../common/ldb_dn.c:97
#1 0x00007f4b5188109a in sysdb_add_group_member_overrides (domain=domain@entry=0x7f4b52296120, obj=0x7f4b522a3e40) at src/db/sysdb_views.c:1308
#2 0x00007f4b5187373c in sysdb_getgrgid_with_views (mem_ctx=mem_ctx@entry=0x7f4b52295ea0, domain=domain@entry=0x7f4b52296120, gid=65751, res=res@entry=0x7f4b522a3260) at src/db/sysdb_search.c:659
#3 0x00007f4b51ef292c in nss_cmd_getgrgid_search (dctx=dctx@entry=0x7f4b522a3240) at src/responder/nss/nsssrv_cmd.c:3349
#4 0x00007f4b51ef672d in nss_cmd_getbyid (cmd=, cctx=0x7f4b522a0900) at src/responder/nss/nsssrv_cmd.c:1975
#5 0x00007f4b51f01b2e in client_cmd_execute (sss_cmds=0x7f4b521182e0 <nss_cmds>, cctx=0x7f4b522a0900) at src/responder/common/responder_common.c:249
#6 client_recv (cctx=0x7f4b522a0900) at src/responder/common/responder_common.c:283
#7 client_fd_handler (ev=, fde=, flags=, ptr=) at src/responder/common/responder_common.c:335
#8 0x00007f4b4e15bd0b in epoll_event_loop_once () from /lib64/libtevent.so.0
#9 0x00007f4b4e15a1d7 in std_event_loop_once () from /lib64/libtevent.so.0
#10 0x00007f4b4e15636d in _tevent_loop_once () from /lib64/libtevent.so.0
#11 0x00007f4b4e15650b in tevent_common_loop_wait () from /lib64/libtevent.so.0
#12 0x00007f4b4e15a177 in std_event_loop_wait () from /lib64/libtevent.so.0
#13 0x00007f4b51891553 in server_loop (main_ctx=0x7f4b5228d2a0) at src/util/server.c:668
#14 0x00007f4b51eeaf77 in main (argc=8, argv=) at src/responder/nss/nsssrv.c:626
}}}
running sssd with -d 9, and having added ldb tracing to src/db/sysdb.c (line 59 in the unpatched source):
ret = ldb_connect(ldb, filename, LDB_FLG_ENABLE_TRACING, NULL);
things die just after retrieving the override information for a member of a group, e.g. (username/domain name removed):
{{{
[sssd[nss]] [ldb] (0x4000): ldb_trace_response: ENTRY
dn: overrideAnchorUUID=:LOCAL:name\3Dusername,cn\3Dusers,cn\3Ddomainname,cn\3Dsysdb,cn=LOCAL,cn=views,cn=sysdb
loginShell: /bin/tcsh
name: username
objectClass: userOverride
overrideObjectDN: name=username,cn=users,cn=domainname,cn=sysdb
uidNumber: 6272
[sssd[nss]] [ldb] (0x4000): Destroying timer event 0x7f4b522a3cc0 "ltdb_timeout"
[sssd[nss]] [ldb] (0x4000): Ending timer event 0x7f4b522ae710 "ltdb_callback"
[sssd[nss]] [sysdb_add_group_member_overrides] (0x4000): Added [username] to [overridememberUid].
}}}
it always seems to fail on the first or second member of the group, and it is always when the group being looked at has overrides (gid).
when dropping back to the previous ldb packages for centos 7.2 (ldb 1.1.20) everything seems to work just fine, so i looked at the differences, and it seems that this change, which was added in ldb 1.1.24 might be significant:
{{{
--- ldb-1.1.20/ldb_tdb/ldb_search.c 2014-09-16 19:04:31.000000000 +0100
+++ ldb-1.1.25/ldb_tdb/ldb_search.c 2015-12-10 11:01:40.000000000 +0000
@@ -407,10 +407,18 @@
}
}}}
reverting this change stops things from crashing, as does just adding 1 to num_elements in the talloc_realloc call, e.g.:
{{{
--- ldb-1.1.25/ldb_tdb/ldb_search.c 2015-12-10 11:01:40.000000000 +0000
+++ ldb-1.1.25.test/ldb_tdb/ldb_search.c 2016-08-02 16:37:01.823488833 +0100
@@ -410,7 +410,7 @@
}}}
i have had a bit of a poke around, but can't say i have been able to work out exactly why this is the case ...
i would like to have been able to give a better report of the exact cause of the problem, but have unfortunately run out of time to look at this for now.
at the moment, i can stick with ldb-1.1.20, but that's not really a long term solution. i did also do some quick testing with sssd-1.14.0, and the problem remains.
let me know if i can provide any more information.
thanks,
richard
Comment from lslebodn at 2016-08-02 21:31:28
I doubt that the bug is in libldb.
There is a much higher chance that the bug is directly in sssd.
Could you try to reproduce with the latest 1.13 [1]? (1.14 might have some bugs.) I would say there is a high chance that the bug is fixed in 1.13. If it is not fixed, could you provide steps to reproduce the crash?
[1] https://copr.fedorainfracloud.org/coprs/g/sssd/sssd-1-13/
cc: => lslebodn
Comment from rrigby at 2016-08-05 10:58:27
thanks for getting back to me. i'm afraid i haven't had too much time for further testing.
unfortunately, the problem still exists in current 1.13.
to reproduce, this seems to work for me ... override a group:
override a few users who are members of that group:
i'm not sure that it really matters which attributes are overwritten, this is just what i was testing when things started to fail.
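then, to trigger the problem, something like:
systemctl stop sssd
\rm /var/lib/sss/mc/*
systemctl start sssd
groups user1
systemctl stop sssd
\rm /var/lib/sss/mc/*
systemctl start sssd
groups user2
systemctl stop sssd
\rm /var/lib/sss/mc/*
systemctl start sssd
groups user3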
repeat as necessary. for me the problems seem to start once there are at least 2 memberuid entries for the group in the cache ldb file.
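where the segmentation fault occurs (around sysdb_views.c:1307 in unpatched 1.13.4 source):
for (c = 0; c < members->num_values; c++) {
member_dn = ldb_dn_from_ldb_val(tmp_ctx, domain->sysdb->ldb,
&members->values[c]);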
it seems to fail after a couple of iterations on the overridden group (e.g. c=2), at which point members has been overwritten / wiped out somewhere along the way.
that's about as far as i have managed to get just now ... if i have a bit of time over the weekend, i'll see if i can give it another look.
hope that's of some use, but let me know if you need any more details from me.
thanks,
richard
Comment from lslebodn at 2016-08-05 11:51:54
If you have time, it would be good to try valgrind.
Add the following line to the nss section of sssd.conf:
Restart sssd (you might need to have SELinux in permissive mode), reproduce the crash, and then provide the valgrind log file.
Comment from rrigby at 2016-08-05 17:54:04
sssd_nss valgrind output
valgrind_nss_15269.log
Comment from rrigby at 2016-08-05 18:42:43
thanks for the suggestion - valgrind log attached, and i think i can now see what is going on.
members hangs off obj.
during the members loop mentioned previously, there is a call to ldb_msg_add_string (line 1445 in unpatched 1.13.4 source):
which eventually reaches _ldb_msg_add_el in ldb_msg.c, where this happens:
this reallocates obj, and invalidates members.
hope that's of some use, and thanks again for your help.
richard
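Editorial note: the failure mode described in this comment can be sketched without ldb at all. The code below is a minimal illustration that uses plain `realloc` and toy types; `struct message`, `struct element`, `msg_add` and `add_overrides` are hypothetical stand-ins, not the ldb or sssd API, and this is not the patch that was attached to the ticket. It shows one common way to stay safe: never keep a pointer into the element array across a call that may reallocate it, and re-read the entry by index instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for a message and its element array (hypothetical types). */
struct element {
    const char *name;
    int value;
};

struct message {
    struct element *elements;
    size_t num_elements;
};

/* Append one element. The array is reallocated on every call, so any
 * pointer into msg->elements taken before the call may be stale afterwards. */
int msg_add(struct message *msg, const char *name, int value)
{
    assert(msg != NULL);
    struct element *els = realloc(msg->elements,
                                  (msg->num_elements + 1) * sizeof(*els));
    if (els == NULL) {
        return -1;
    }
    els[msg->num_elements].name = name;
    els[msg->num_elements].value = value;
    msg->elements = els;
    msg->num_elements++;
    return 0;
}

/* For each "member" element present on entry, append an "override" element.
 * Each entry is re-read through msg->elements[i] on every iteration and its
 * value is copied before msg_add() runs, so a moved array cannot leave a
 * dangling pointer behind. */
int add_overrides(struct message *msg)
{
    size_t original = msg->num_elements;
    for (size_t i = 0; i < original; i++) {
        if (strcmp(msg->elements[i].name, "member") != 0) {
            continue;
        }
        int value = msg->elements[i].value; /* copy before the array can move */
        if (msg_add(msg, "override", value + 1000) != 0) {
            return -1;
        }
    }
    return 0;
}
```

The unsafe variant would take `struct element *members = &msg->elements[0];` before the loop and keep dereferencing it after `msg_add()`. That only works while the allocator happens to extend the block in place.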
Comment from lslebodn at 2016-08-05 21:12:07
testing patch
test.diff
Comment from lslebodn at 2016-08-05 21:14:32
You are right.
Could you test the attached (untested) patch?
https://fedorahosted.org/sssd/wiki/Contribute#BuildingSSSD
Comment from lslebodn at 2016-08-05 22:33:57
BTW, I tested the patch and it works for me.
Thank you very much for your time and your analysis.
I really appreciate it.
I am changing the title a little bit, because the same crash can happen with libldb-1.1.20. It worked for you with that version only by chance: it depends on glibc/kernel whether the reallocation moves the memory or just extends the current block.
summary: using overides causes segfault in libldb > 1.1.23 => using overides causes segfault in libldb
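Editorial note: the "only by chance" point can be made concrete with a small sketch. It assumes nothing about ldb or talloc and only uses the C standard library; `grow` is a hypothetical helper name. It grows a buffer with `realloc` and reports whether the block was moved. The C standard allows either outcome, so code that keeps using the old address is correct only when the block happens not to move.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Grow *buf to new_len bytes. On success, stores the (possibly new) address
 * in *buf, sets *moved to whether realloc returned a different address, and
 * returns 0. On failure, returns -1 and leaves *buf untouched. */
int grow(char **buf, size_t new_len, bool *moved)
{
    assert(buf != NULL && moved != NULL);

    /* Record the old address as an integer before realloc, because the
     * old pointer value must not be used once the block has been freed. */
    uintptr_t before = (uintptr_t)*buf;

    char *p = realloc(*buf, new_len);
    if (p == NULL) {
        return -1;
    }
    *moved = ((uintptr_t)p != before);
    *buf = p;
    return 0;
}
```

Whether `*moved` ends up true depends on the allocator, the sizes involved and the neighbouring allocations, which is why a change in libldb's allocation pattern could expose a bug that had been present in the caller all along.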
Comment from lslebodn at 2016-08-05 22:34:11
Fields changed
owner: somebody => lslebodn
status: new => assigned
Comment from rrigby at 2016-08-05 23:42:33
i applied the patch against 1.13.4, and ran the same tests which had reliably been causing the crashes, put it through a debugger, etc., added 'loads' of user and group overrides, set it off looking up information for 2000 or so users, and all seems to be working just fine - no issues or signs of any trouble at all.
so, looks good to me.
glad to have been of some use in tracking down the problem, and thanks for your help in working things out, and getting it fixed.
richard
Comment from lslebodn at 2016-08-06 10:29:58
Actually,
the crash is indirectly fixed in git master by commit 1594701
Comment from jhrozek at 2016-08-07 21:54:30
Should we still commit the patch to the stable branch, though?
Comment from rrigby at 2016-08-08 08:42:30
glad to see things are already fixed in the repository.
for me, the fix is really required in the el7 packages ... shall i report the issue in the red hat bugzilla?
thanks,
richard
Comment from jhrozek at 2016-08-08 10:25:58
The bug will be fixed in RHEL-7.3, so I guess you should be good :) (Testing of the 7.3 packages would be most welcome in the meantime!)
Comment from rrigby at 2016-08-08 18:31:06
that's great. thanks.
i'm quite happy to test the packages when available. i'm guessing that will be when 7.3 beta is released, unless i should also be looking somewhere else for the updated packages?
thanks again.
richard
Comment from jhrozek at 2016-08-17 15:54:05
Fields changed
milestone: NEEDS_TRIAGE => SSSD 1.13.5
Comment from jhrozek at 2016-08-17 16:06:33
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1367802
rhbz: => [https://bugzilla.redhat.com/show_bug.cgi?id=1367802 1367802]
Comment from sbose at 2016-10-24 17:01:31
I sent a backported version of the groupmembers override patch together with 2 fixes by Lukas to the list.
patch: 0 => 1
Comment from lslebodn at 2016-11-08 10:37:03
sssd-1-13:
resolution: => fixed
status: assigned => closed
Comment from rrigby at 2017-02-24 15:08:19
Metadata Update from @rrigby: