Cloned from Pagure issue: https://pagure.io/SSSD/sssd/issue/1037
sssd is configured against Active Directory.
sssd_be crashed dumping core:
{{{
Core was generated by `/usr/libexec/sssd/sssd_be --domain default --debug-to-files'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000411a93 in fo_set_port_status (server=0x50eef600,
status=PORT_WORKING) at src/providers/fail_over.c:1332
1332 if (!siter->common || !siter->common->name) continue;
(gdb) list
1327 /* It is possible to introduce duplicates when expanding SRV results
1328 * into fo_server structures. Find the duplicates and set the same
1329 * status */
1330 DLIST_FOR_EACH(siter, server->service->server_list) {
1331 if (siter == server) continue;
1332 if (!siter->common || !siter->common->name) continue;
1333
1334 if (siter->port == server->port &&
1335 (strcasecmp(siter->common->name, server->common->name) == 0)) {
1336 DEBUG(7, ("Marking port %d of duplicate server '%s' as '%s'\n",
(gdb) print siter->common
$1 = (struct server_common *) 0xa0
(gdb) print siter->common->name
Cannot access memory at address 0xc0
}}}
/var/log/secure:
{{{
Oct 7 04:26:01 blah crond[6022]: pam_sss(crond:account): Request to sssd failed. Timer expired
}}}
core file has been retained (but is large - 1.3Gbytes), and an sssd_default.log is available at log level 9.
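For context, the code in the backtrace is the walk that propagates a port-status change onto the duplicate fo_server entries created when SRV records are expanded. The sketch below is a simplified, self-contained illustration of that pattern, not the actual fail_over.c code: only the fo_set_port_status name and the quoted guard come from the ticket, while the struct layouts, the plain loop standing in for DLIST_FOR_EACH, and the hostnames are illustrative. It also marks where this crash bites: the guard at line 1332 only rejects a literal NULL, so a garbage non-NULL value such as the 0xa0 seen above passes the first test and the read of ->name faults.
{{{
/* Minimal sketch, NOT the real SSSD code: struct layouts and everything
 * except the quoted guard are simplified stand-ins. */
#include <stdio.h>
#include <strings.h>   /* strcasecmp */

struct server_common {
    char *name;
};

struct fo_server {
    struct fo_server *prev;
    struct fo_server *next;
    int port;
    int port_status;               /* stand-in for PORT_WORKING etc. */
    struct server_common *common;  /* NULL for plain-IP entries */
    struct fo_service *service;
};

struct fo_service {
    struct fo_server *server_list;
};

/* Duplicate-marking walk as in fail_over.c:1330-1336 (plain loop instead
 * of DLIST_FOR_EACH): after setting the status on one server, mirror it
 * onto every other entry with the same name and port. */
static void fo_set_port_status(struct fo_server *server, int status)
{
    struct fo_server *siter;

    server->port_status = status;

    for (siter = server->service->server_list; siter != NULL; siter = siter->next) {
        if (siter == server) continue;

        /* Line 1332 from the backtrace: it only rejects a literal NULL.
         * A stale entry whose 'common' holds garbage such as the 0xa0 seen
         * in the core passes the first test, and dereferencing it to read
         * ->name is exactly what faulted in the reported crash. */
        if (!siter->common || !siter->common->name) continue;

        if (siter->port == server->port &&
            strcasecmp(siter->common->name, server->common->name) == 0) {
            siter->port_status = status;
        }
    }
}

int main(void)
{
    struct server_common c1 = { "dc1.example.com" };   /* hypothetical host */
    struct fo_service svc = { 0 };
    struct fo_server a = { .port = 389, .common = &c1, .service = &svc };
    struct fo_server b = { .port = 389, .common = &c1, .service = &svc };

    /* Two entries for the same SRV host, as SRV expansion can produce. */
    a.next = &b;
    b.prev = &a;
    svc.server_list = &a;

    fo_set_port_status(&a, 1 /* "working" */);
    printf("duplicate entry status: %d\n", b.port_status);   /* prints 1 */
    return 0;
}
}}}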
Comments
Comment from prefect at 2011-10-07 15:41:13
Easier to read log from gdb
gdb-log
Comment from sgallagh at 2011-10-07 15:44:03
Fields changed
component: SSSD => Data Provider
description: sssd is configured against Active Directory.
sssd_be crashed dumping core:
Core was generated by `/usr/libexec/sssd/sssd_be --domain default --debug-to-files'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000411a93 in fo_set_port_status (server=0x50eef600,
status=PORT_WORKING) at src/providers/fail_over.c:1332
1332 if (!siter->common || !siter->common->name) continue;
(gdb) list
1327 /* It is possible to introduce duplicates when expanding SRV results
1328 * into fo_server structures. Find the duplicates and set the same
1329 * status */
1330 DLIST_FOR_EACH(siter, server->service->server_list) {
1331 if (siter == server) continue;
1332 if (!siter->common || !siter->common->name) continue;
1333
1334 if (siter->port == server->port &&
1335 (strcasecmp(siter->common->name, server->common->name) == 0)) {
1336 DEBUG(7, ("Marking port %d of duplicate server '%s' as '%s'\n",
(gdb) print siter->common
$1 = (struct server_common *) 0xa0
(gdb) print siter->common->name
Cannot access memory at address 0xc0
/var/log/secure:
Oct 7 04:26:01 blah crond[6022]: pam_sss(crond:account): Request to sssd failed. Timer expired
core file has been retained (but is large - 1.3Gbytes), and an sssd_default.log is available at log level 9. => sssd is configured against Active Directory.
sssd_be crashed dumping core:
{{{
Core was generated by `/usr/libexec/sssd/sssd_be --domain default --debug-to-files'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000411a93 in fo_set_port_status (server=0x50eef600,
status=PORT_WORKING) at src/providers/fail_over.c:1332
1332 if (!siter->common || !siter->common->name) continue;
(gdb) list
1327 /* It is possible to introduce duplicates when expanding SRV results
1328 * into fo_server structures. Find the duplicates and set the same
1329 * status */
1330 DLIST_FOR_EACH(siter, server->service->server_list) {
1331 if (siter == server) continue;
1332 if (!siter->common || !siter->common->name) continue;
1333
1334 if (siter->port == server->port &&
1335 (strcasecmp(siter->common->name, server->common->name) == 0)) {
1336 DEBUG(7, ("Marking port %d of duplicate server '%s' as '%s'\n",
(gdb) print siter->common
$1 = (struct server_common *) 0xa0
(gdb) print siter->common->name
Cannot access memory at address 0xc0
}}}
/var/log/secure:
{{{
Oct 7 04:26:01 blah crond[6022]: pam_sss(crond:account): Request to sssd failed. Timer expired
}}}
core file has been retained (but is large - 1.3Gbytes), and an sssd_default.log is available at log level 9.
priority: major => critical
Comment from jhrozek at 2011-10-10 18:56:58
Sorry for not asking this sooner, but do you still have SSSD logs from when the bug happened? It would be very beneficial to see what resolving steps SSSD performed, etc.
Comment from jhrozek at 2011-10-11 18:19:28
Also, if you still have the core file, can you examine some data structures for me, please?
I would like to see the following from inside the fo_set_port_status() function:
{{{
print server->service->ctx
print *server->service->ctx
print *server->service->ctx->server_common_list
}}}
Thank you!
Comment from dpal at 2011-10-13 14:48:18
Fields changed
milestone: NEEDS_TRIAGE => SSSD 1.7.0
priority: critical => blocker
Comment from dpal at 2011-10-13 14:48:42
Fields changed
owner: somebody => jzeleny
Comment from jzeleny at 2011-10-18 10:26:11
Besides information jhrozek asked for earlier, I'd also greatly appreciate a reproducer, i.e. sanitized config file and steps you had to perform to induce this segfault. I'd like a core file of my own so I could inspect the code in detail.
Thanks
Jan
Comment from jzeleny at 2011-11-21 08:24:57
There has been no activity for some time in this ticket. I'd like to ask you once more for the additional information we requested. If no more info is provided, I'll close the ticket as worksforme.
Comment from prefect at 2011-11-23 10:35:53
Replying to [comment:7 jzeleny]:
Sorry for not getting back to you, I'd not seen the movement on this ticket. I've still got the sssd logs and the core dumps, but not the matching build of 1.6.1 I had installed at the time, so I'm not sure of their value. I've not got the matching /var/log/secure, which makes lining up the timings of when things went wrong with the 4.4 Gbyte sssd_default.log a little fun.
I upgraded to 1.6.3 and have not seen this problem again. I've left in place a script that monitors the logs for this failure, so should be able to catch it again if it happens in future. Before it was happening every week or two on a heavily loaded system, so it should crop up again soon enough if the problem's not fixed.
I have had crashes of sssd_be since, but they've all recovered gracefully.
jh
Comment from prefect at 2011-11-23 10:54:38
Replying to [comment:3 jhrozek]:
Actually, I seem to have an instance or three of this crash against 1.6.3 built straight from git, so maybe I shouldn't write off this bug yet. Log level is 0 unfortunately, so I have nothing there.
Core was generated by `/usr/libexec/sssd/sssd_be --domain default --debug-to-files'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000411c03 in fo_set_port_status (server=0x21c5420, status=PORT_WORKING) at src/providers/fail_over.c:1332
1332 if (!siter->common || !siter->common->name) continue;
(gdb) print server->service->ctx
$1 = (struct fo_ctx *) 0x1ff4120
(gdb) print *server->service->ctx
$2 = {service_list = 0x2018270, server_common_list = 0x21e3ec0, opts = 0x1ff75a0}
(gdb) print *server->service->ctx->server_common_list
$3 = {DO_NOT_TOUCH_THIS_MEMBER_refcount = 5, ctx = 0x1ff4120, prev = 0x0, next = 0x21e40f0,
name = 0x21e3f70 "az24.qa.fails.co.zn", rhostent = 0x2039c00, request_list = 0x0, server_status = 3,
last_status_change = {tv_sec = 1321099319, tv_usec = 18606}}
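As an aside, that dump makes the two addresses from the original backtrace consistent: with the members gdb shows (refcount, ctx, prev, next, name, ...), 'name' lands 0x20 bytes into struct server_common on a 64-bit build, so a garbage 'common' pointer of 0xa0 faults at exactly 0xa0 + 0x20 = 0xc0 when line 1332 reads ->name. A minimal sketch checking that arithmetic (the layout below only mirrors the members printed above and is an assumption, not the authoritative SSSD definition):
{{{
#include <stdio.h>
#include <stddef.h>
#include <netdb.h>      /* struct hostent */
#include <sys/time.h>   /* struct timeval */

/* Simplified mirror of the members gdb printed above. */
struct server_common_sketch {
    int refcount;                        /* DO_NOT_TOUCH_THIS_MEMBER_refcount */
    void *ctx;
    struct server_common_sketch *prev;
    struct server_common_sketch *next;
    char *name;
    struct hostent *rhostent;
    void *request_list;
    int server_status;
    struct timeval last_status_change;
};

int main(void)
{
    size_t off = offsetof(struct server_common_sketch, name);

    /* On LP64: 4-byte refcount padded to 8, then three 8-byte pointers,
     * so 'name' starts at byte 32 (0x20). */
    printf("offsetof(name)         = %#zx\n", off);

    /* siter->common was 0xa0 in the core, so reading ->name touches
     * 0xa0 + 0x20 = 0xc0, the address gdb could not access. */
    printf("expected fault address = %#lx\n", (unsigned long)(0xa0 + off));
    return 0;
}
}}}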
I'll bob the config on in a minute.
version: 1.6.1 => 1.6.3
Comment from prefect at 2011-11-23 11:01:46
I don't have a reliable reproducer unfortunately and there's not an obvious pattern. The machine sits in service with a reasonable number of users coming in and out over ssh. Over the last month (a mix of the old 1.6.1 and the newer 1.6.3) sssd_be has crashed 9 times. What log level would be useful?
sssd.conf:
Comment from jzeleny at 2011-11-29 14:45:17
Fields changed
status: new => assigned
Comment from jzeleny at 2011-12-06 09:21:39
I'm going to close this one, as the patch which probably fixes this has been pushed to master. Please feel free to reopen if the error persists on your system.
Fixed in: d4d9091
resolution: => fixed
status: assigned => closed
Comment from sgallagh at 2012-01-30 22:10:06
Fields changed
rhbz: => 0
Comment from prefect at 2017-02-24 15:00:18
Metadata Update from @prefect: