Cloned from Pagure issue: https://pagure.io/SSSD/sssd/issue/1037
sssd is configured against Active Directory.
sssd_be crashed dumping core:
{{{
Core was generated by `/usr/libexec/sssd/sssd_be --domain default --debug-to-files'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000411a93 in fo_set_port_status (server=0x50eef600,
status=PORT_WORKING) at src/providers/fail_over.c:1332
1332 if (!siter->common || !siter->common->name) continue;
(gdb) list
1327 /* It is possible to introduce duplicates when expanding SRV results
1328 * into fo_server structures. Find the duplicates and set the same
1329 * status */
1330 DLIST_FOR_EACH(siter, server->service->server_list) {
1331 if (siter == server) continue;
1332 if (!siter->common || !siter->common->name) continue;
1333
1334 if (siter->port == server->port &&
1335 (strcasecmp(siter->common->name, server->common->name) == 0)) {
1336 DEBUG(7, ("Marking port %d of duplicate server '%s' as '%s'\n",
(gdb) print siter->common
$1 = (struct server_common *) 0xa0
(gdb) print siter->common->name
Cannot access memory at address 0xc0
}}}
/var/log/secure:
{{{
Oct 7 04:26:01 blah crond[6022]: pam_sss(crond:account): Request to sssd failed. Timer expired
}}}
core file has been retained (but is large - 1.3Gbytes), and an sssd_default.log is available at log level 9.
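For context, the code in the backtrace is the walk that propagates a port-status change onto the duplicate fo_server entries created when SRV records are expanded. The sketch below is a simplified, self-contained illustration of that pattern, not the actual fail_over.c code: only the fo_set_port_status name and the quoted guard come from the ticket, while the struct layouts, the plain loop standing in for DLIST_FOR_EACH, and the hostnames are illustrative. It also marks where this crash bites: the guard at line 1332 only rejects a literal NULL, so a garbage non-NULL value such as the 0xa0 seen above passes the first test and the read of ->name faults.
{{{
/* Minimal sketch, NOT the real SSSD code: struct layouts and everything
 * except the quoted guard are simplified stand-ins. */
#include <stdio.h>
#include <strings.h>   /* strcasecmp */

struct server_common {
    char *name;
};

struct fo_server {
    struct fo_server *prev;
    struct fo_server *next;
    int port;
    int port_status;               /* stand-in for PORT_WORKING etc. */
    struct server_common *common;  /* NULL for plain-IP entries */
    struct fo_service *service;
};

struct fo_service {
    struct fo_server *server_list;
};

/* Duplicate-marking walk as in fail_over.c:1330-1336 (plain loop instead
 * of DLIST_FOR_EACH): after setting the status on one server, mirror it
 * onto every other entry with the same name and port. */
static void fo_set_port_status(struct fo_server *server, int status)
{
    struct fo_server *siter;

    server->port_status = status;

    for (siter = server->service->server_list; siter != NULL; siter = siter->next) {
        if (siter == server) continue;

        /* Line 1332 from the backtrace: it only rejects a literal NULL.
         * A stale entry whose 'common' holds garbage such as the 0xa0 seen
         * in the core passes the first test, and dereferencing it to read
         * ->name is exactly what faulted in the reported crash. */
        if (!siter->common || !siter->common->name) continue;

        if (siter->port == server->port &&
            strcasecmp(siter->common->name, server->common->name) == 0) {
            siter->port_status = status;
        }
    }
}

int main(void)
{
    struct server_common c1 = { "dc1.example.com" };   /* hypothetical host */
    struct fo_service svc = { 0 };
    struct fo_server a = { .port = 389, .common = &c1, .service = &svc };
    struct fo_server b = { .port = 389, .common = &c1, .service = &svc };

    /* Two entries for the same SRV host, as SRV expansion can produce. */
    a.next = &b;
    b.prev = &a;
    svc.server_list = &a;

    fo_set_port_status(&a, 1 /* "working" */);
    printf("duplicate entry status: %d\n", b.port_status);   /* prints 1 */
    return 0;
}
}}}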
Comments
Comment from prefect at 2011-10-07 15:41:13
Easier to read log from gdb
gdb-log
Comment from sgallagh at 2011-10-07 15:44:03
Fields changed
component: SSSD => Data Provider
description: sssd is configured against Active Directory.
sssd_be crashed dumping core:
Core was generated by `/usr/libexec/sssd/sssd_be --domain default --debug-to-files'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000411a93 in fo_set_port_status (server=0x50eef600,
status=PORT_WORKING) at src/providers/fail_over.c:1332
1332 if (!siter->common || !siter->common->name) continue;
(gdb) list
1327 /* It is possible to introduce duplicates when expanding SRV results
1328 * into fo_server structures. Find the duplicates and set the same
1329 * status */
1330 DLIST_FOR_EACH(siter, server->service->server_list) {
1331 if (siter == server) continue;
1332 if (!siter->common || !siter->common->name) continue;
1333
1334 if (siter->port == server->port &&
1335 (strcasecmp(siter->common->name, server->common->name) == 0)) {
1336 DEBUG(7, ("Marking port %d of duplicate server '%s' as '%s'\n",
(gdb) print siter->common
$1 = (struct server_common *) 0xa0
(gdb) print siter->common->name
Cannot access memory at address 0xc0
/var/log/secure:
Oct 7 04:26:01 blah crond[6022]: pam_sss(crond:account): Request to sssd failed. Timer expired
core file has been retained (but is large - 1.3Gbytes), and an sssd_default.log is available at log level 9. => sssd is configured against Active Directory.
sssd_be crashed dumping core:
{{{
Core was generated by `/usr/libexec/sssd/sssd_be --domain default --debug-to-files'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000411a93 in fo_set_port_status (server=0x50eef600,
status=PORT_WORKING) at src/providers/fail_over.c:1332
1332 if (!siter->common || !siter->common->name) continue;
(gdb) list
1327 /* It is possible to introduce duplicates when expanding SRV results
1328 * into fo_server structures. Find the duplicates and set the same
1329 * status */
1330 DLIST_FOR_EACH(siter, server->service->server_list) {
1331 if (siter == server) continue;
1332 if (!siter->common || !siter->common->name) continue;
1333
1334 if (siter->port == server->port &&
1335 (strcasecmp(siter->common->name, server->common->name) == 0)) {
1336 DEBUG(7, ("Marking port %d of duplicate server '%s' as '%s'\n",
(gdb) print siter->common
$1 = (struct server_common *) 0xa0
(gdb) print siter->common->name
Cannot access memory at address 0xc0
}}}
/var/log/secure:
{{{
Oct 7 04:26:01 blah crond[6022]: pam_sss(crond:account): Request to sssd failed. Timer expired
}}}
core file has been retained (but is large - 1.3Gbytes), and an sssd_default.log is available at log level 9.
priority: major => critical
Comment from jhrozek at 2011-10-10 18:56:58
Sorry for not asking this sooner, but do you still have SSSD logs from when the bug happened? It would be very beneficial to see what resolving steps SSSD performed, etc.
Comment from jhrozek at 2011-10-11 18:19:28
Also, if you still have the core file, can you examine some data structures for me, please?
I would like to see the following from inside the fo_set_port_status() function:
{{{
print server->service->ctx
print *server->service->ctx
print *server->service->ctx->server_common_list
}}}
Thank you!
Comment from dpal at 2011-10-13 14:48:18
Fields changed
milestone: NEEDS_TRIAGE => SSSD 1.7.0
priority: critical => blocker
Comment from dpal at 2011-10-13 14:48:42
Fields changed
owner: somebody => jzeleny
Comment from jzeleny at 2011-10-18 10:26:11
Besides information jhrozek asked for earlier, I'd also greatly appreciate a reproducer, i.e. sanitized config file and steps you had to perform to induce this segfault. I'd like a core file of my own so I could inspect the code in detail.
Thanks
Jan
Comment from jzeleny at 2011-11-21 08:24:57
There has been no activity for some time in this ticket. I'd like to ask you once more for the additional information we requested. If no more info is provided, I'll close the ticket as worksforme.
Comment from prefect at 2011-11-23 10:35:53
Replying to [comment:7 jzeleny]:
Sorry for not getting back to you, I'd not seen the movement on this ticket. I've still got the sssd logs and the core dumps, but not the matching build of 1.6.1 I had installed at the time, so I'm not sure of their value. I've not got the matching /var/log/secure, which makes lining up the timings of when things went wrong with the 4.4 Gbyte sssd_default.log a little fun.
I upgraded to 1.6.3 and have not seen this problem again. I've left in place a script that monitors the logs for this failure, so should be able to catch it again if it happens in future. Before it was happening every week or two on a heavily loaded system, so it should crop up again soon enough if the problem's not fixed.
I have had crashes of sssd_be since, but they've all recovered gracefully.
jh
Comment from prefect at 2011-11-23 10:54:38
Replying to [comment:3 jhrozek]:
Actually, I seem to have an instance or three of this crash against 1.6.3 built straight from git, so maybe I shouldn't write off this bug yet. Log level is 0 unfortunately, so I have nothing there.
Core was generated by `/usr/libexec/sssd/sssd_be --domain default --debug-to-files'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000411c03 in fo_set_port_status (server=0x21c5420, status=PORT_WORKING) at src/providers/fail_over.c:1332
1332 if (!siter->common || !siter->common->name) continue;
(gdb) print server->service->ctx
$1 = (struct fo_ctx *) 0x1ff4120
(gdb) print *server->service->ctx
$2 = {service_list = 0x2018270, server_common_list = 0x21e3ec0, opts = 0x1ff75a0}
(gdb) print *server->service->ctx->server_common_list
$3 = {DO_NOT_TOUCH_THIS_MEMBER_refcount = 5, ctx = 0x1ff4120, prev = 0x0, next = 0x21e40f0,
name = 0x21e3f70 "az24.qa.fails.co.zn", rhostent = 0x2039c00, request_list = 0x0, server_status = 3,
last_status_change = {tv_sec = 1321099319, tv_usec = 18606}}
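As an aside, that dump makes the two addresses from the original backtrace consistent: with the members gdb shows (refcount, ctx, prev, next, name, ...), 'name' lands 0x20 bytes into struct server_common on a 64-bit build, so a garbage 'common' pointer of 0xa0 faults at exactly 0xa0 + 0x20 = 0xc0 when line 1332 reads ->name. A minimal sketch checking that arithmetic (the layout below only mirrors the members printed above and is an assumption, not the authoritative SSSD definition):
{{{
#include <stdio.h>
#include <stddef.h>
#include <netdb.h>      /* struct hostent */
#include <sys/time.h>   /* struct timeval */

/* Simplified mirror of the members gdb printed above. */
struct server_common_sketch {
    int refcount;                        /* DO_NOT_TOUCH_THIS_MEMBER_refcount */
    void *ctx;
    struct server_common_sketch *prev;
    struct server_common_sketch *next;
    char *name;
    struct hostent *rhostent;
    void *request_list;
    int server_status;
    struct timeval last_status_change;
};

int main(void)
{
    size_t off = offsetof(struct server_common_sketch, name);

    /* On LP64: 4-byte refcount padded to 8, then three 8-byte pointers,
     * so 'name' starts at byte 32 (0x20). */
    printf("offsetof(name)         = %#zx\n", off);

    /* siter->common was 0xa0 in the core, so reading ->name touches
     * 0xa0 + 0x20 = 0xc0, the address gdb could not access. */
    printf("expected fault address = %#lx\n", (unsigned long)(0xa0 + off));
    return 0;
}
}}}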
I'll bob the config on in a minute.
version: 1.6.1 => 1.6.3
Comment from prefect at 2011-11-23 11:01:46
I don't have a reliable reproducer unfortunately and there's not an obvious pattern. The machine sits in service with a reasonable number of users coming in and out over ssh. Over the last month (a mix of the old 1.6.1 and the newer 1.6.3) sssd_be has crashed 9 times. What log level would be useful?
sssd.conf:
Comment from jzeleny at 2011-11-29 14:45:17
Fields changed
status: new => assigned
Comment from jzeleny at 2011-12-06 09:21:39
I'm going to close this one, as the patch which probably fixes this has been pushed to master. Please feel free to reopen if the error persists on your system.
Fixed in: d4d9091
resolution: => fixed
status: assigned => closed
Comment from sgallagh at 2012-01-30 22:10:06
Fields changed
rhbz: => 0
Comment from prefect at 2017-02-24 15:00:18
Metadata Update from @prefect: