Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LDAP/AD: resolve domain local groups for remote users #57

Closed
wants to merge 3 commits into from

Conversation

sumit-bose
Copy link
Contributor

This set of patches should solve https://fedorahosted.org/sssd/ticket/3206. To
test this you have to join SSSD into an AD forest with at least two domains.
Assuming the domain SSSD is joined to is called A and there is another domain
in the forest called B you have to create a domain local group in domain A,
e.g. A\dom_loc. Then create e.g. a universal group in domain B, e.g. B\uni and
make B\uni a member of A\dom_loc. Any user from domain B which is a member of
B\uni should now be a member of A\dom_loc as well because SSSD is joined to
domain A and any domain local group from domain A is valid in this case.

If SSSD is joined to domain B A\dom_loc is not a valid group in this domain and
any user which is a member of B\uni should not be a member of A\dom_loc.

The lookup to the domain local groups is added as a second step after the
initial initgroups lookup and should work with and without the tokenGroups
request.

There is one issue I currently do not have a solution for. It is possible to
add the user from domain B to A\dom_loc directly. It this user is an indirect
member of A\dom_loc as well, e.g. via B\uni, and this direct membership is
removed the initgroups request will not return A\dom_loc. This is due to the
nature of SSSD's cache. The memberOf attribute will be removed from the user
entry during the initial lookup because this lookup is not aware of the domain
local groups and will remove the missing direct parent. During the domain local
lookup no changes are detected, because the direct membership is already
removed and there are no changes in the objects related to the indirect
membership. Even adding all nested memberships again won't help because the
sysdb code will detect that the membership is already present and skip it. And
unconditionally removing and adding all nested membership again sounds too
costly to me for this special case especially since the plan to use the PAC to
determine the memberships in the next release even for user which are not
already logged in.

@jhrozek jhrozek self-assigned this Oct 21, 2016
@jhrozek
Copy link
Contributor

jhrozek commented Oct 21, 2016

Hmm, looks like github ate my mail, so let's paste the comment again (and sorry if it arrives twice). Coverity detected some warnings:

Error: COMPILER_WARNING:
sssd-1.14.90/src/providers/ldap/sdap_async_initgroups_ad.c:1554:12: warning: unused variable 'd' [-Wunused-variable]
#     size_t d;
#            ^
# 1552|       int ret;
# 1553|       size_t c;
# 1554|->     size_t d;
# 1555|       char **groupnamelist = NULL;
# 1556|       struct sysdb_attrs *groups[1];

Error: COMPILER_WARNING:
sssd-1.14.90/src/providers/ldap/sdap_async_initgroups_ad.c:1562:25: warning: unused variable 'msg' [-Wunused-variable]
#     struct ldb_message *msg;
#                         ^
# 1560|       const char *class;
# 1561|       struct sss_domain_info *obj_dom;
# 1562|->     struct ldb_message *msg;
# 1563|       struct ldb_message_element *el;
# 1564|       const char *obj_attrs[] = {SYSDB_NAME, SYSDB_MEMBEROF, NULL};

Error: COMPILER_WARNING:
sssd-1.14.90/src/providers/ldap/sdap_async_initgroups_ad.c:1563:33: warning: unused variable 'el' [-Wunused-variable]
#     struct ldb_message_element *el;
#                                 ^
# 1561|       struct sss_domain_info *obj_dom;
# 1562|       struct ldb_message *msg;
# 1563|->     struct ldb_message_element *el;
# 1564|       const char *obj_attrs[] = {SYSDB_NAME, SYSDB_MEMBEROF, NULL};
# 1565|       char *local_groups_base_dn;

Error: COMPILER_WARNING:
sssd-1.14.90/src/providers/ldap/sdap_async_initgroups_ad.c:1564:17: warning: unused variable 'obj_attrs' [-Wunused-variable]
#     const char *obj_attrs[] = {SYSDB_NAME, SYSDB_MEMBEROF, NULL};
#                 ^
# 1562|       struct ldb_message *msg;
# 1563|       struct ldb_message_element *el;
# 1564|->     const char *obj_attrs[] = {SYSDB_NAME, SYSDB_MEMBEROF, NULL};
# 1565|       char *local_groups_base_dn;
# 1566|       uint8_t *obj_base_dn;

Error: COMPILER_WARNING:
sssd-1.14.90/src/providers/ldap/sdap_async_initgroups_ad.c:1566:14: warning: unused variable 'obj_base_dn' [-Wunused-variable]
#     uint8_t *obj_base_dn;
#              ^
# 1564|       const char *obj_attrs[] = {SYSDB_NAME, SYSDB_MEMBEROF, NULL};
# 1565|       char *local_groups_base_dn;
# 1566|->     uint8_t *obj_base_dn;
# 1567|       char **cached_local_parents = NULL;
# 1568|       uint8_t *name_start;

Error: COMPILER_WARNING:
sssd-1.14.90/src/providers/ldap/sdap_async_initgroups_ad.c: scope_hint: In function 'sdap_ad_get_domain_local_groups_parse_parents'
sssd-1.14.90/src/providers/ldap/sdap_async_initgroups_ad.c:1568:14: warning: unused variable 'name_start' [-Wunused-variable]
#     uint8_t *name_start;
#              ^
# 1566|       uint8_t *obj_base_dn;
# 1567|       char **cached_local_parents = NULL;
# 1568|->     uint8_t *name_start;
# 1569|       char **add_list = NULL;
# 1570|       char **del_list = NULL;

if (primary_fqdn == NULL) {
ret = ENOMEM;
goto done;
if (strchr(shortname, '@') == NULL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this check is safe for the future. We already have a request to support group names with @ in them, so I think this check could cause some false positives in the future. I wonder if it was safer to always parse the name and also pass in the "expected" domain and compare if the parsed and expected domain match. I think if the function only performed a check and didn't return anything except a true/false boolean, we could even avoid memory allocation, so the performance impact would be minimal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this patch and modified the caller rfc2307bis_nested_groups_step() to check if sdap_get_group_primary_name() must be called or not.

sdap_id_op_handle(state->op),
state->search_bases,
state->groups, state->num_groups,
state->group_hash, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not related to this patch per se, but I was recently looking at performance issues where the user was a member of a huge number of groups (thousands) and just issuing the requests one-by-one took a long time there. I was wondering what do you think in general about issuing a number of parallel requests up to a certain point (maybe up to 5 or 10 parallel lookups at the time..)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general yes, but I think we should do it on a case-by-case basis and we have to take care not to overrun the server-side because I think there are limits how many parallel requests can be send from a single connection. An alternative to parallel request might be to make the search filter longer so that the server can return multiple results in a single request.

That said I think rfc2307bis_nested_groups_send() might not be the most important one to optimize because this path is not used with the default tokenGroups based lookup and with the current patch only by the AD provider for user from different domains. And since we try to add the 24U2Self/PAC based lookups for the next release this might be gone soon in the default setup as well.

But in the request where we lookup groups based on SIDs we got from the PAC or the tokenGroups request this kind of improvement would make sense.

goto done;
}

if (strchr(sysdb_name, '@') == NULL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about detecting a qualified name as I had before applies also here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the check and added a comment that at this point we process data from LDAP and we unconditionally can expand the name to an internal fully-qualified name.

}

if (add_list == NULL && del_list == NULL) {
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can make this optimization even a bit more aggressive by also checking if either list only contains a NULL. In one of my basic tests (with no domain-local groups so far..) I'm seeing:

(gdb) p del_list
$4 = (char **) 0x14421e0
(gdb) p del_list[0]
$5 = 0x0

It's not a big deal because sysdb_update_members would then just loop through the list, but at least we would avoid the (now potentially confusing) debug message later and avoid starting and then commiting an empty transaction in sysdb_update_members.

Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so far more cases need to be tested, both manually and using tests, but maybe we can at least discuss the code for now

@lslebodn
Copy link
Contributor

retest this please

Currently sysdb_get_direct_parents() only return direct parents from the
same domain as the child object. In setups with sub-domains this might
not be sufficient. A new option parent_dom is added which allows to
specify a domain the direct parents should be lookup up in. If it is
NULL the whole cache is searched.
sdap_nested_groups_store() and rfc2307bis_nested_groups_send/recv() will
be reused for domain local group lookups.
If a user from a trusted domain in the same forest is a direct or
indirect member of domain local groups from the local domain those
memberships must be resolved as well. Since those domain local groups
are not valid in the trusted domain a DC from the trusted domain which
is used to lookup the user data is not aware of them. As a consequence
those memberships must be resolved against a local DC in a second step.

Resolves https://fedorahosted.org/sssd/ticket/3206
@jhrozek
Copy link
Contributor

jhrozek commented Oct 25, 2016

Thanks for the changes, at least with the previous version all the tests with the domain-local groups were working for me and the internal ad_forest test didn't catch any new regressions (some tests are failing, but those are failing even with the old version). I haven't had the chance to run the IPA-AD tests yet.

@lslebodn
Copy link
Contributor

On (21/10/16 01:22), Jakub Hrozek wrote:

Hmm, looks like github ate my mail, so let's paste the comment again (and sorry if it arrives twice). Coverity detected some warnings:
It has nothing to do with coverity.
All warnings are reported by gcc.

covscan != coverity

LS

@sumit-bose
Copy link
Contributor Author

On Fri, Oct 21, 2016 at 01:22:08AM -0700, Jakub Hrozek wrote:

Hmm, looks like github ate my mail, so let's paste the comment again (and sorry if it arrives twice). Coverity detected some warnings:

Error: COMPILER_WARNING:
sssd-1.14.90/src/providers/ldap/sdap_async_initgroups_ad.c:1554:12: warning: unused variable 'd' [-Wunused-variable]
#     size_t d;
#            ^
....

hm, I wonder why gcc 4.9.2 didn't show those warnings for
sdap_async_initgroups_ad.c but shows it for other files?

Nevertheless, I can see the warnings with newer versions of gcc and
updated the patch.

bye,
Sumit

@jhrozek
Copy link
Contributor

jhrozek commented Oct 25, 2016

On Wed, Oct 19, 2016 at 02:45:02AM -0700, sumit-bose wrote:

This set of patches should solve https://fedorahosted.org/sssd/ticket/3206. To
test this you have to join SSSD into an AD forest with at least two domains.
Assuming the domain SSSD is joined to is called A and there is another domain
in the forest called B you have to create a domain local group in domain A,
e.g. A\dom_loc. Then create e.g. a universal group in domain B, e.g. B\uni and
make B\uni a member of A\dom_loc. Any user from domain B which is a member of
B\uni should now be a member of A\dom_loc as well because SSSD is joined to
domain A and any domain local group from domain A is valid in this case.

If SSSD is joined to domain B A\dom_loc is not a valid group in this domain and
any user which is a member of B\uni should not be a member of A\dom_loc.

The lookup to the domain local groups is added as a second step after the
initial initgroups lookup and should work with and without the tokenGroups
request.

There is one issue I currently do not have a solution for. It is possible to
add the user from domain B to A\dom_loc directly. It this user is an indirect
member of A\dom_loc as well, e.g. via B\uni, and this direct membership is
removed the initgroups request will not return A\dom_loc. This is due to the
nature of SSSD's cache. The memberOf attribute will be removed from the user
entry during the initial lookup because this lookup is not aware of the domain
local groups and will remove the missing direct parent. During the domain local
lookup no changes are detected, because the direct membership is already
removed and there are no changes in the objects related to the indirect
membership. Even adding all nested memberships again won't help because the
sysdb code will detect that the membership is already present and skip it. And
unconditionally removing and adding all nested membership again sounds too
costly to me for this special case especially since the plan to use the PAC to
determine the memberships in the next release even for user which are not
already logged in.
You can view, comment on, or merge this pull request online at:

#57

-- Commit Summary --

  • sysdb: add parent_dom to sysdb_get_direct_parents()
  • sdap: sdap_get_primary_fqdn() check if name is already fully-qualified
  • sdap: make some nested group related calls public
  • LDAP/AD: resolve domain local groups for remote users

-- File Changes --

M src/db/sysdb.h (22)
M src/db/sysdb_search.c (7)
M src/providers/ldap/sdap.c (12)
M src/providers/ldap/sdap_async_initgroups.c (162)
M src/providers/ldap/sdap_async_initgroups_ad.c (409)
M src/providers/ldap/sdap_async_private.h (26)

-- Patch Links --

https://github.com/SSSD/sssd/pull/57.patch
https://github.com/SSSD/sssd/pull/57.diff

As first part of the review, I ran Coverity and it found some warnings:

rror: COMPILER_WARNING:
sssd-1.14.90/src/providers/ldap/sdap_async_initgroups_ad.c:1554:12:
warning: unused variable 'd' [-Wunused-variable]
#     size_t d;
#            ^
# 1552|       int ret;
# 1553|       size_t c;
# 1554|->     size_t d;
# 1555|       char **groupnamelist = NULL;
# 1556|       struct sysdb_attrs *groups[1];

Error: COMPILER_WARNING:
sssd-1.14.90/src/providers/ldap/sdap_async_initgroups_ad.c:1562:25:
warning: unused variable 'msg' [-Wunused-variable]
#     struct ldb_message *msg;
#                         ^
# 1560|       const char *class;
# 1561|       struct sss_domain_info *obj_dom;
# 1562|->     struct ldb_message *msg;
# 1563|       struct ldb_message_element *el;
# 1564|       const char *obj_attrs[] = {SYSDB_NAME, SYSDB_MEMBEROF,
# NULL};

Error: COMPILER_WARNING:
sssd-1.14.90/src/providers/ldap/sdap_async_initgroups_ad.c:1563:33:
warning: unused variable 'el' [-Wunused-variable]
#     struct ldb_message_element *el;
#                                 ^
# 1561|       struct sss_domain_info *obj_dom;
# 1562|       struct ldb_message *msg;
# 1563|->     struct ldb_message_element *el;
# 1564|       const char *obj_attrs[] = {SYSDB_NAME, SYSDB_MEMBEROF,
# NULL};
# 1565|       char *local_groups_base_dn;

Error: COMPILER_WARNING:
sssd-1.14.90/src/providers/ldap/sdap_async_initgroups_ad.c:1564:17:
warning: unused variable 'obj_attrs' [-Wunused-variable]
#     const char *obj_attrs[] = {SYSDB_NAME, SYSDB_MEMBEROF, NULL};
#                 ^
# 1562|       struct ldb_message *msg;
# 1563|       struct ldb_message_element *el;
# 1564|->     const char *obj_attrs[] = {SYSDB_NAME, SYSDB_MEMBEROF,
# NULL};
# 1565|       char *local_groups_base_dn;
# 1566|       uint8_t *obj_base_dn;

Error: COMPILER_WARNING:
sssd-1.14.90/src/providers/ldap/sdap_async_initgroups_ad.c:1566:14:
warning: unused variable 'obj_base_dn' [-Wunused-variable]
#     uint8_t *obj_base_dn;
#              ^
# 1564|       const char *obj_attrs[] = {SYSDB_NAME, SYSDB_MEMBEROF,
# NULL};
# 1565|       char *local_groups_base_dn;
# 1566|->     uint8_t *obj_base_dn;
# 1567|       char **cached_local_parents = NULL;
# 1568|       uint8_t *name_start;

Error: COMPILER_WARNING:
sssd-1.14.90/src/providers/ldap/sdap_async_initgroups_ad.c: scope_hint:
In function 'sdap_ad_get_domain_local_groups_parse_parents'
sssd-1.14.90/src/providers/ldap/sdap_async_initgroups_ad.c:1568:14:
warning: unused variable 'name_start' [-Wunused-variable]
#     uint8_t *name_start;
#              ^
# 1566|       uint8_t *obj_base_dn;
# 1567|       char **cached_local_parents = NULL;
# 1568|->     uint8_t *name_start;
# 1569|       char **add_list = NULL;
# 1570|       char **del_list = NULL;

Next, I'm running downstream tests to see if any functionality
regressed..(so I haven't started looking at the code yet..)

@jhrozek
Copy link
Contributor

jhrozek commented Oct 27, 2016

Thank you, this version reads better to me and still works fine (adding and removing the global group as well..)

So far ACK, I will push the patch once I'm able to re-run the downstream tests with this version.

@jhrozek
Copy link
Contributor

jhrozek commented Oct 28, 2016

CI: http://sssd-ci.duckdns.org/logs/job/56/05/summary.html

Coverity also didn't find any errors. For some reason, re-running just the tests that failed previously made them pass, but they fail when I run the whole test suite. I will take another look at the logs, but I don't think that a failure in test that tries the ad_access_filter option can be relevant to this patch.

So final ACK

@jhrozek
Copy link
Contributor

jhrozek commented Oct 31, 2016

@jhrozek
Copy link
Contributor

jhrozek commented Oct 31, 2016

sssd-1-4: ec1829d..c1f3b29

@jhrozek
Copy link
Contributor

jhrozek commented Oct 31, 2016

Umm sorry, the ranges are not so nice, let's list the commits

@jhrozek jhrozek added Pushed and removed Accepted labels Oct 31, 2016
@jhrozek jhrozek closed this Oct 31, 2016
fidencio added a commit to fidencio/sssd that referenced this pull request Dec 12, 2016
While debugging rhbz#1396912 some deadlock on sssd_be was noticed[0] and
it's been caused by the use of non async-signal-safe functions from the
signal_handler (please, see man 7 signal for more info about which are
the async-signal-safe functions that can be used).

The removal of those functions we lose some debug messages, which is the
least of problems now. Also instead of re-setting the watchdog timeout
in case of time-shift (please, see b8ceaeb for more info), let's just
re-set the timer's time.

Along with this commit, the function to teardown the watchdog has been
removed as it's not used anymore.

[0]:
  [root@dusan ~]# pstack 17922
    #0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
    SSSD#1  0x00007fe707d04f93 in _L_lock_14932 () from /lib64/libc.so.6
    SSSD#2  0x00007fe707d02013 in __GI___libc_malloc (bytes=140630248638304, bytes@entry=15) at malloc.c:2891
    SSSD#3  0x00007fe707d0888a in __GI___strdup (s=0x7fe707dff4f7 "/etc/localtime") at strdup.c:42
    SSSD#4  0x00007fe707d31b61 in tzset_internal (always=<optimized out>, explicit=explicit@entry=1) at tzset.c:438
    SSSD#5  0x00007fe707d32523 in __tz_convert (timer=timer@entry=0x7ffcd5d2b090, use_localtime=use_localtime@entry=1, tp=tp@entry=0x7fe708041d40 <_tmbuf>) at tzset.c:621
    SSSD#6  0x00007fe707d30521 in __GI_localtime (t=t@entry=0x7ffcd5d2b090) at localtime.c:42
    SSSD#7  0x00007fe70886c7b0 in sss_vdebug_fn (file=<optimized out>, line=<optimized out>, function=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=16, flags=flags@entry=0, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n", ap=ap@entry=0x7ffcd5d2b130) at src/util/debug.c:248
    SSSD#8  0x00007fe70886c995 in sss_debug_fn (file=file@entry=0x7fe70bff263b "src/util/util_watchdog.c", line=line@entry=82, function=function@entry=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=level@entry=16, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n") at src/util/debug.c:284
    SSSD#9  0x00007fe70bfdb409 in watchdog_handler (sig=<optimized out>) at src/util/util_watchdog.c:81
    SSSD#10 <signal handler called>
    SSSD#11 0x00007fe707cff664 in _int_malloc (av=av@entry=0x7fe70803c760 <main_arena>, bytes=bytes@entry=151) at malloc.c:3494
    SSSD#12 0x00007fe707d01fbc in __GI___libc_malloc (bytes=bytes@entry=151) at malloc.c:2893
    SSSD#13 0x00007fe708450749 in __talloc_with_prefix (prefix_len=0, size=55, context=0x7fe718373210) at ../talloc.c:668
    SSSD#14 __talloc (size=55, context=0x7fe718373210) at ../talloc.c:708
    SSSD#15 _talloc_named_const (name=0x7fe70bb7015d "../common/ldb_pack.c:425", size=55, context=0x7fe718373210) at ../talloc.c:865
    SSSD#16 talloc_named_const (context=<optimized out>, size=size@entry=55, name=name@entry=0x7fe70bb7015d "../common/ldb_pack.c:425") at ../talloc.c:1606
    SSSD#17 0x00007fe70bb61803 in ldb_unpack_data_only_attr_list (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=0x7fe7184aa1e0, list=list@entry=0x0, list_size=list_size@entry=0, nb_elements_in_db=nb_elements_in_db@entry=0x0) at ../common/ldb_pack.c:425
    SSSD#18 0x00007fe70bb61a7d in ldb_unpack_data (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=<optimized out>) at ../common/ldb_pack.c:470
    SSSD#19 0x00007fe6fdc29b46 in ltdb_parse_data_unpack (key=..., data=..., private_data=0x7ffcd5d2ba70) at ../ldb_tdb/ldb_search.c:249
    SSSD#20 0x00007fe70a5e0a24 in tdb_parse_data (tdb=tdb@entry=0x7fe70e4eaa10, key=..., offset=15619748, len=414772, parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/io.c:637
    SSSD#21 0x00007fe70a5dc1fc in tdb_parse_record (tdb=0x7fe70e4eaa10, key=..., parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/tdb.c:253
    SSSD#22 0x00007fe6fdc29e7b in ltdb_search_dn1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, msg=msg@entry=0x7fe7184aa1e0) at ../ldb_tdb/ldb_search.c:287
    SSSD#23 0x00007fe6fdc2acbb in ltdb_dn_list_load (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, list=list@entry=0x7fe7183c3a30) at ../ldb_tdb/ldb_index.c:181
    SSSD#24 0x00007fe6fdc2bbbb in ltdb_index_add1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", v_idx=v_idx@entry=0, el=<optimized out>, el=<optimized out>) at ../ldb_tdb/ldb_index.c:1134
    SSSD#25 0x00007fe6fdc2c62c in ltdb_index_add_el (el=0x7fe7184aa3e0, dn=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", module=0x7fe70e4eab50) at ../ldb_tdb/ldb_index.c:1180
    SSSD#26 ltdb_index_add_element (module=module@entry=0x7fe70e4eab50, dn=<optimized out>, el=el@entry=0x7fe7184aa3e0) at ../ldb_tdb/ldb_index.c:1290
    SSSD#27 0x00007fe6fdc290bb in ltdb_modify_internal (module=module@entry=0x7fe70e4eab50, msg=0x7fe7183bf0c0, req=req@entry=0x7fe7183bdc10) at ../ldb_tdb/ldb_tdb.c:903
    SSSD#28 0x00007fe6fdc2958a in ltdb_modify (ctx=0x7fe7183c2950, ctx=0x7fe7183c2950) at ../ldb_tdb/ldb_tdb.c:998
    SSSD#29 ltdb_callback (ev=<optimized out>, te=<optimized out>, t=..., private_data=<optimized out>) at ../ldb_tdb/ldb_tdb.c:1380
    SSSD#30 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4d2890) at ../tevent_timed.c:341
    SSSD#31 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4d2890, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#32 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4d2890, location=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent_standard.c:114
    SSSD#33 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4d2890, location=location@entry=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent.c:533
    SSSD#34 0x00007fe70bb6bc4f in ldb_wait (handle=0x7fe7183c4530, type=<optimized out>) at ../common/ldb.c:631
    SSSD#35 0x00007fe70bb6c793 in ldb_autotransaction_request (ldb=0x7fe70e4d52c0, req=0x7fe7183bdc10) at ../common/ldb.c:573
    SSSD#36 0x00007fe70bb6d263 in ldb_modify (ldb=ldb@entry=0x7fe70e4d52c0, message=<optimized out>) at ../common/ldb.c:1655
    SSSD#37 0x00007fe70bfa2ab5 in sysdb_set_cache_entry_attr (ldb=0x7fe70e4d52c0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bf680, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1159
    SSSD#38 0x00007fe70bfa304d in sysdb_rep_ts_entry_attr (sysdb=0x7fe70e4eadd0, attrs=0x7fe7183bf680, entry_dn=0x7fe7183c4760) at src/db/sysdb_ops.c:1218
    SSSD#39 sysdb_set_ts_entry_attr (sysdb=sysdb@entry=0x7fe70e4eadd0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1248
    SSSD#40 0x00007fe70bfa4aa9 in sysdb_set_entry_attr (sysdb=0x7fe70e4eadd0, entry_dn=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1199
    SSSD#41 0x00007fe70bfa4b5f in sysdb_set_user_attr (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1285
    SSSD#42 0x00007fe70bfa58c3 in sysdb_add_user (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", uid=uid@entry=1415408147, gid=<optimized out>, gid@entry=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:1928
    SSSD#43 0x00007fe70bfab271 in sysdb_store_new_user (now=1481105315, cache_timeout=5400, attrs=0x7fe7183bb840, orig_dn=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", shell=0x0, homedir=0x0, gecos=0x7fe710465d00 "Test User7045", gid=1415400513, uid=1415408147, name=0x7fe7183c01f0 "testuser7045@domain.com", domain=0x7fe70e4d62f0) at src/db/sysdb_ops.c:2549
    SSSD#44 sysdb_store_user (domain=domain@entry=0x7fe70e4d62f0, name=0x7fe7183c01f0 "testuser7045@domain.com", pwd=pwd@entry=0x0, uid=1415408147, gid=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, remove_attrs=0x7fe7183c08a0, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:2499
    SSSD#45 0x00007fe6fba0d9f9 in sdap_save_user (memctx=memctx@entry=0x7fe70e544ee0, opts=opts@entry=0x7fe70e518400, dom=dom@entry=0x7fe70e4d62f0, attrs=<optimized out>, _usn_value=_usn_value@entry=0x7ffcd5d2c260, now=now@entry=1481105315) at src/providers/ldap/sdap_async_users.c:509
    SSSD#46 0x00007fe6fba0df9a in sdap_save_users (memctx=memctx@entry=0x7fe70e544e40, sysdb=0x7fe70e4eadd0, dom=0x7fe70e4d62f0, opts=0x7fe70e518400, users=<optimized out>, num_users=10006, _usn_value=_usn_value@entry=0x7fe70e544e60) at src/providers/ldap/sdap_async_users.c:572
    SSSD#47 0x00007fe6fba0e460 in sdap_get_users_done (subreq=<optimized out>) at src/providers/ldap/sdap_async_users.c:938
    SSSD#48 0x00007fe6fba0c9d5 in sdap_search_user_process (subreq=0x0) at src/providers/ldap/sdap_async_users.c:814
    SSSD#49 0x00007fe6fba07379 in generic_ext_search_handler (subreq=0x0, opts=<optimized out>) at src/providers/ldap/sdap_async.c:1689
    SSSD#50 0x00007fe6fba0991b in sdap_get_generic_op_finished (op=<optimized out>, reply=<optimized out>, error=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:1621
    SSSD#51 0x00007fe6fba083cd in sdap_process_message (ev=<optimized out>, sh=<optimized out>, msg=0x7fe70e5f9ce0) at src/providers/ldap/sdap_async.c:353
    SSSD#52 sdap_process_result (ev=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:197
    SSSD#53 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4cbc30) at ../tevent_timed.c:341
    SSSD#54 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4cbc30, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#55 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:114
    SSSD#56 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4cbc30, location=location@entry=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:533
    SSSD#57 0x00007fe7086605ab in tevent_common_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:637
    SSSD#58 0x00007fe7086641f7 in std_event_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:140
    SSSD#59 0x00007fe70bfd1993 in server_loop (main_ctx=0x7fe70e4cd080) at src/util/server.c:702
    SSSD#60 0x00007fe70c84cb82 in main (argc=8, argv=<optimized out>) at src/providers/data_provider_be.c:587

Resolves:
https://fedorahosted.org/sssd/ticket/3266

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
fidencio added a commit to fidencio/sssd that referenced this pull request Dec 13, 2016
While debugging rhbz#1396912 some deadlock on sssd_be was noticed[0] and
it's been caused by the use of non async-signal-safe functions from the
signal_handler (please, see man 7 signal for more info about which are
the async-signal-safe functions that can be used).

The removal of those functions we lose some debug messages, which is the
least of problems now. Also instead of re-setting the watchdog timeout
in case of time-shift (please, see b8ceaeb for more info), let's just
exit the let the monitor restart the process.

A proper fix for the clock screw situation should be implemented on
samba's side, by having tevent using monotonic (or boottime) clock.

[0]:
  [root@dusan ~]# pstack 17922
    #0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
    SSSD#1  0x00007fe707d04f93 in _L_lock_14932 () from /lib64/libc.so.6
    SSSD#2  0x00007fe707d02013 in __GI___libc_malloc (bytes=140630248638304, bytes@entry=15) at malloc.c:2891
    SSSD#3  0x00007fe707d0888a in __GI___strdup (s=0x7fe707dff4f7 "/etc/localtime") at strdup.c:42
    SSSD#4  0x00007fe707d31b61 in tzset_internal (always=<optimized out>, explicit=explicit@entry=1) at tzset.c:438
    SSSD#5  0x00007fe707d32523 in __tz_convert (timer=timer@entry=0x7ffcd5d2b090, use_localtime=use_localtime@entry=1, tp=tp@entry=0x7fe708041d40 <_tmbuf>) at tzset.c:621
    SSSD#6  0x00007fe707d30521 in __GI_localtime (t=t@entry=0x7ffcd5d2b090) at localtime.c:42
    SSSD#7  0x00007fe70886c7b0 in sss_vdebug_fn (file=<optimized out>, line=<optimized out>, function=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=16, flags=flags@entry=0, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n", ap=ap@entry=0x7ffcd5d2b130) at src/util/debug.c:248
    SSSD#8  0x00007fe70886c995 in sss_debug_fn (file=file@entry=0x7fe70bff263b "src/util/util_watchdog.c", line=line@entry=82, function=function@entry=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=level@entry=16, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n") at src/util/debug.c:284
    SSSD#9  0x00007fe70bfdb409 in watchdog_handler (sig=<optimized out>) at src/util/util_watchdog.c:81
    SSSD#10 <signal handler called>
    SSSD#11 0x00007fe707cff664 in _int_malloc (av=av@entry=0x7fe70803c760 <main_arena>, bytes=bytes@entry=151) at malloc.c:3494
    SSSD#12 0x00007fe707d01fbc in __GI___libc_malloc (bytes=bytes@entry=151) at malloc.c:2893
    SSSD#13 0x00007fe708450749 in __talloc_with_prefix (prefix_len=0, size=55, context=0x7fe718373210) at ../talloc.c:668
    SSSD#14 __talloc (size=55, context=0x7fe718373210) at ../talloc.c:708
    SSSD#15 _talloc_named_const (name=0x7fe70bb7015d "../common/ldb_pack.c:425", size=55, context=0x7fe718373210) at ../talloc.c:865
    SSSD#16 talloc_named_const (context=<optimized out>, size=size@entry=55, name=name@entry=0x7fe70bb7015d "../common/ldb_pack.c:425") at ../talloc.c:1606
    SSSD#17 0x00007fe70bb61803 in ldb_unpack_data_only_attr_list (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=0x7fe7184aa1e0, list=list@entry=0x0, list_size=list_size@entry=0, nb_elements_in_db=nb_elements_in_db@entry=0x0) at ../common/ldb_pack.c:425
    SSSD#18 0x00007fe70bb61a7d in ldb_unpack_data (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=<optimized out>) at ../common/ldb_pack.c:470
    SSSD#19 0x00007fe6fdc29b46 in ltdb_parse_data_unpack (key=..., data=..., private_data=0x7ffcd5d2ba70) at ../ldb_tdb/ldb_search.c:249
    SSSD#20 0x00007fe70a5e0a24 in tdb_parse_data (tdb=tdb@entry=0x7fe70e4eaa10, key=..., offset=15619748, len=414772, parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/io.c:637
    SSSD#21 0x00007fe70a5dc1fc in tdb_parse_record (tdb=0x7fe70e4eaa10, key=..., parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/tdb.c:253
    SSSD#22 0x00007fe6fdc29e7b in ltdb_search_dn1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, msg=msg@entry=0x7fe7184aa1e0) at ../ldb_tdb/ldb_search.c:287
    SSSD#23 0x00007fe6fdc2acbb in ltdb_dn_list_load (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, list=list@entry=0x7fe7183c3a30) at ../ldb_tdb/ldb_index.c:181
    SSSD#24 0x00007fe6fdc2bbbb in ltdb_index_add1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", v_idx=v_idx@entry=0, el=<optimized out>, el=<optimized out>) at ../ldb_tdb/ldb_index.c:1134
    SSSD#25 0x00007fe6fdc2c62c in ltdb_index_add_el (el=0x7fe7184aa3e0, dn=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", module=0x7fe70e4eab50) at ../ldb_tdb/ldb_index.c:1180
    SSSD#26 ltdb_index_add_element (module=module@entry=0x7fe70e4eab50, dn=<optimized out>, el=el@entry=0x7fe7184aa3e0) at ../ldb_tdb/ldb_index.c:1290
    SSSD#27 0x00007fe6fdc290bb in ltdb_modify_internal (module=module@entry=0x7fe70e4eab50, msg=0x7fe7183bf0c0, req=req@entry=0x7fe7183bdc10) at ../ldb_tdb/ldb_tdb.c:903
    SSSD#28 0x00007fe6fdc2958a in ltdb_modify (ctx=0x7fe7183c2950, ctx=0x7fe7183c2950) at ../ldb_tdb/ldb_tdb.c:998
    SSSD#29 ltdb_callback (ev=<optimized out>, te=<optimized out>, t=..., private_data=<optimized out>) at ../ldb_tdb/ldb_tdb.c:1380
    SSSD#30 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4d2890) at ../tevent_timed.c:341
    SSSD#31 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4d2890, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#32 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4d2890, location=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent_standard.c:114
    SSSD#33 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4d2890, location=location@entry=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent.c:533
    SSSD#34 0x00007fe70bb6bc4f in ldb_wait (handle=0x7fe7183c4530, type=<optimized out>) at ../common/ldb.c:631
    SSSD#35 0x00007fe70bb6c793 in ldb_autotransaction_request (ldb=0x7fe70e4d52c0, req=0x7fe7183bdc10) at ../common/ldb.c:573
    SSSD#36 0x00007fe70bb6d263 in ldb_modify (ldb=ldb@entry=0x7fe70e4d52c0, message=<optimized out>) at ../common/ldb.c:1655
    SSSD#37 0x00007fe70bfa2ab5 in sysdb_set_cache_entry_attr (ldb=0x7fe70e4d52c0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bf680, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1159
    SSSD#38 0x00007fe70bfa304d in sysdb_rep_ts_entry_attr (sysdb=0x7fe70e4eadd0, attrs=0x7fe7183bf680, entry_dn=0x7fe7183c4760) at src/db/sysdb_ops.c:1218
    SSSD#39 sysdb_set_ts_entry_attr (sysdb=sysdb@entry=0x7fe70e4eadd0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1248
    SSSD#40 0x00007fe70bfa4aa9 in sysdb_set_entry_attr (sysdb=0x7fe70e4eadd0, entry_dn=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1199
    SSSD#41 0x00007fe70bfa4b5f in sysdb_set_user_attr (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1285
    SSSD#42 0x00007fe70bfa58c3 in sysdb_add_user (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", uid=uid@entry=1415408147, gid=<optimized out>, gid@entry=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:1928
    SSSD#43 0x00007fe70bfab271 in sysdb_store_new_user (now=1481105315, cache_timeout=5400, attrs=0x7fe7183bb840, orig_dn=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", shell=0x0, homedir=0x0, gecos=0x7fe710465d00 "Test User7045", gid=1415400513, uid=1415408147, name=0x7fe7183c01f0 "testuser7045@domain.com", domain=0x7fe70e4d62f0) at src/db/sysdb_ops.c:2549
    SSSD#44 sysdb_store_user (domain=domain@entry=0x7fe70e4d62f0, name=0x7fe7183c01f0 "testuser7045@domain.com", pwd=pwd@entry=0x0, uid=1415408147, gid=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, remove_attrs=0x7fe7183c08a0, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:2499
    SSSD#45 0x00007fe6fba0d9f9 in sdap_save_user (memctx=memctx@entry=0x7fe70e544ee0, opts=opts@entry=0x7fe70e518400, dom=dom@entry=0x7fe70e4d62f0, attrs=<optimized out>, _usn_value=_usn_value@entry=0x7ffcd5d2c260, now=now@entry=1481105315) at src/providers/ldap/sdap_async_users.c:509
    SSSD#46 0x00007fe6fba0df9a in sdap_save_users (memctx=memctx@entry=0x7fe70e544e40, sysdb=0x7fe70e4eadd0, dom=0x7fe70e4d62f0, opts=0x7fe70e518400, users=<optimized out>, num_users=10006, _usn_value=_usn_value@entry=0x7fe70e544e60) at src/providers/ldap/sdap_async_users.c:572
    SSSD#47 0x00007fe6fba0e460 in sdap_get_users_done (subreq=<optimized out>) at src/providers/ldap/sdap_async_users.c:938
    SSSD#48 0x00007fe6fba0c9d5 in sdap_search_user_process (subreq=0x0) at src/providers/ldap/sdap_async_users.c:814
    SSSD#49 0x00007fe6fba07379 in generic_ext_search_handler (subreq=0x0, opts=<optimized out>) at src/providers/ldap/sdap_async.c:1689
    SSSD#50 0x00007fe6fba0991b in sdap_get_generic_op_finished (op=<optimized out>, reply=<optimized out>, error=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:1621
    SSSD#51 0x00007fe6fba083cd in sdap_process_message (ev=<optimized out>, sh=<optimized out>, msg=0x7fe70e5f9ce0) at src/providers/ldap/sdap_async.c:353
    SSSD#52 sdap_process_result (ev=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:197
    SSSD#53 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4cbc30) at ../tevent_timed.c:341
    SSSD#54 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4cbc30, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#55 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:114
    SSSD#56 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4cbc30, location=location@entry=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:533
    SSSD#57 0x00007fe7086605ab in tevent_common_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:637
    SSSD#58 0x00007fe7086641f7 in std_event_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:140
    SSSD#59 0x00007fe70bfd1993 in server_loop (main_ctx=0x7fe70e4cd080) at src/util/server.c:702
    SSSD#60 0x00007fe70c84cb82 in main (argc=8, argv=<optimized out>) at src/providers/data_provider_be.c:587

Resolves:
https://fedorahosted.org/sssd/ticket/3266

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
fidencio added a commit to fidencio/sssd that referenced this pull request Dec 20, 2016
While debugging rhbz#1396912 a deadlock on sssd_be was noticed[0] and
it's been caused by the use of non async-signal-safe functions from the
signal_handler (please, see man 7 signal for more info about which are
the async-signal-safe functions that can be used).

In order to work this situation around a pipe has been added to the
watchdog_ctx structure and a single byte is written to this pipe (which
is an async-signal-safe operation) and the logic currently done by the
timer handler will be done inside the fd handler in a safe way.

It's really worth to mention that a proper fix the clock screw situation
should be implemented on samba's side, by having tevent using monotonic
(or boottime) clock.

[0]:
  [root@dusan ~]# pstack 17922
    #0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
    SSSD#1  0x00007fe707d04f93 in _L_lock_14932 () from /lib64/libc.so.6
    SSSD#2  0x00007fe707d02013 in __GI___libc_malloc (bytes=140630248638304, bytes@entry=15) at malloc.c:2891
    SSSD#3  0x00007fe707d0888a in __GI___strdup (s=0x7fe707dff4f7 "/etc/localtime") at strdup.c:42
    SSSD#4  0x00007fe707d31b61 in tzset_internal (always=<optimized out>, explicit=explicit@entry=1) at tzset.c:438
    SSSD#5  0x00007fe707d32523 in __tz_convert (timer=timer@entry=0x7ffcd5d2b090, use_localtime=use_localtime@entry=1, tp=tp@entry=0x7fe708041d40 <_tmbuf>) at tzset.c:621
    SSSD#6  0x00007fe707d30521 in __GI_localtime (t=t@entry=0x7ffcd5d2b090) at localtime.c:42
    SSSD#7  0x00007fe70886c7b0 in sss_vdebug_fn (file=<optimized out>, line=<optimized out>, function=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=16, flags=flags@entry=0, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n", ap=ap@entry=0x7ffcd5d2b130) at src/util/debug.c:248
    SSSD#8  0x00007fe70886c995 in sss_debug_fn (file=file@entry=0x7fe70bff263b "src/util/util_watchdog.c", line=line@entry=82, function=function@entry=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=level@entry=16, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n") at src/util/debug.c:284
    SSSD#9  0x00007fe70bfdb409 in watchdog_handler (sig=<optimized out>) at src/util/util_watchdog.c:81
    SSSD#10 <signal handler called>
    SSSD#11 0x00007fe707cff664 in _int_malloc (av=av@entry=0x7fe70803c760 <main_arena>, bytes=bytes@entry=151) at malloc.c:3494
    SSSD#12 0x00007fe707d01fbc in __GI___libc_malloc (bytes=bytes@entry=151) at malloc.c:2893
    SSSD#13 0x00007fe708450749 in __talloc_with_prefix (prefix_len=0, size=55, context=0x7fe718373210) at ../talloc.c:668
    SSSD#14 __talloc (size=55, context=0x7fe718373210) at ../talloc.c:708
    SSSD#15 _talloc_named_const (name=0x7fe70bb7015d "../common/ldb_pack.c:425", size=55, context=0x7fe718373210) at ../talloc.c:865
    SSSD#16 talloc_named_const (context=<optimized out>, size=size@entry=55, name=name@entry=0x7fe70bb7015d "../common/ldb_pack.c:425") at ../talloc.c:1606
    SSSD#17 0x00007fe70bb61803 in ldb_unpack_data_only_attr_list (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=0x7fe7184aa1e0, list=list@entry=0x0, list_size=list_size@entry=0, nb_elements_in_db=nb_elements_in_db@entry=0x0) at ../common/ldb_pack.c:425
    SSSD#18 0x00007fe70bb61a7d in ldb_unpack_data (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=<optimized out>) at ../common/ldb_pack.c:470
    SSSD#19 0x00007fe6fdc29b46 in ltdb_parse_data_unpack (key=..., data=..., private_data=0x7ffcd5d2ba70) at ../ldb_tdb/ldb_search.c:249
    SSSD#20 0x00007fe70a5e0a24 in tdb_parse_data (tdb=tdb@entry=0x7fe70e4eaa10, key=..., offset=15619748, len=414772, parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/io.c:637
    SSSD#21 0x00007fe70a5dc1fc in tdb_parse_record (tdb=0x7fe70e4eaa10, key=..., parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/tdb.c:253
    SSSD#22 0x00007fe6fdc29e7b in ltdb_search_dn1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, msg=msg@entry=0x7fe7184aa1e0) at ../ldb_tdb/ldb_search.c:287
    SSSD#23 0x00007fe6fdc2acbb in ltdb_dn_list_load (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, list=list@entry=0x7fe7183c3a30) at ../ldb_tdb/ldb_index.c:181
    SSSD#24 0x00007fe6fdc2bbbb in ltdb_index_add1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", v_idx=v_idx@entry=0, el=<optimized out>, el=<optimized out>) at ../ldb_tdb/ldb_index.c:1134
    SSSD#25 0x00007fe6fdc2c62c in ltdb_index_add_el (el=0x7fe7184aa3e0, dn=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", module=0x7fe70e4eab50) at ../ldb_tdb/ldb_index.c:1180
    SSSD#26 ltdb_index_add_element (module=module@entry=0x7fe70e4eab50, dn=<optimized out>, el=el@entry=0x7fe7184aa3e0) at ../ldb_tdb/ldb_index.c:1290
    SSSD#27 0x00007fe6fdc290bb in ltdb_modify_internal (module=module@entry=0x7fe70e4eab50, msg=0x7fe7183bf0c0, req=req@entry=0x7fe7183bdc10) at ../ldb_tdb/ldb_tdb.c:903
    SSSD#28 0x00007fe6fdc2958a in ltdb_modify (ctx=0x7fe7183c2950, ctx=0x7fe7183c2950) at ../ldb_tdb/ldb_tdb.c:998
    SSSD#29 ltdb_callback (ev=<optimized out>, te=<optimized out>, t=..., private_data=<optimized out>) at ../ldb_tdb/ldb_tdb.c:1380
    SSSD#30 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4d2890) at ../tevent_timed.c:341
    SSSD#31 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4d2890, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#32 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4d2890, location=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent_standard.c:114
    SSSD#33 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4d2890, location=location@entry=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent.c:533
    SSSD#34 0x00007fe70bb6bc4f in ldb_wait (handle=0x7fe7183c4530, type=<optimized out>) at ../common/ldb.c:631
    SSSD#35 0x00007fe70bb6c793 in ldb_autotransaction_request (ldb=0x7fe70e4d52c0, req=0x7fe7183bdc10) at ../common/ldb.c:573
    SSSD#36 0x00007fe70bb6d263 in ldb_modify (ldb=ldb@entry=0x7fe70e4d52c0, message=<optimized out>) at ../common/ldb.c:1655
    SSSD#37 0x00007fe70bfa2ab5 in sysdb_set_cache_entry_attr (ldb=0x7fe70e4d52c0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bf680, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1159
    SSSD#38 0x00007fe70bfa304d in sysdb_rep_ts_entry_attr (sysdb=0x7fe70e4eadd0, attrs=0x7fe7183bf680, entry_dn=0x7fe7183c4760) at src/db/sysdb_ops.c:1218
    SSSD#39 sysdb_set_ts_entry_attr (sysdb=sysdb@entry=0x7fe70e4eadd0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1248
    SSSD#40 0x00007fe70bfa4aa9 in sysdb_set_entry_attr (sysdb=0x7fe70e4eadd0, entry_dn=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1199
    SSSD#41 0x00007fe70bfa4b5f in sysdb_set_user_attr (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1285
    SSSD#42 0x00007fe70bfa58c3 in sysdb_add_user (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", uid=uid@entry=1415408147, gid=<optimized out>, gid@entry=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:1928
    SSSD#43 0x00007fe70bfab271 in sysdb_store_new_user (now=1481105315, cache_timeout=5400, attrs=0x7fe7183bb840, orig_dn=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", shell=0x0, homedir=0x0, gecos=0x7fe710465d00 "Test User7045", gid=1415400513, uid=1415408147, name=0x7fe7183c01f0 "testuser7045@domain.com", domain=0x7fe70e4d62f0) at src/db/sysdb_ops.c:2549
    SSSD#44 sysdb_store_user (domain=domain@entry=0x7fe70e4d62f0, name=0x7fe7183c01f0 "testuser7045@domain.com", pwd=pwd@entry=0x0, uid=1415408147, gid=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, remove_attrs=0x7fe7183c08a0, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:2499
    SSSD#45 0x00007fe6fba0d9f9 in sdap_save_user (memctx=memctx@entry=0x7fe70e544ee0, opts=opts@entry=0x7fe70e518400, dom=dom@entry=0x7fe70e4d62f0, attrs=<optimized out>, _usn_value=_usn_value@entry=0x7ffcd5d2c260, now=now@entry=1481105315) at src/providers/ldap/sdap_async_users.c:509
    SSSD#46 0x00007fe6fba0df9a in sdap_save_users (memctx=memctx@entry=0x7fe70e544e40, sysdb=0x7fe70e4eadd0, dom=0x7fe70e4d62f0, opts=0x7fe70e518400, users=<optimized out>, num_users=10006, _usn_value=_usn_value@entry=0x7fe70e544e60) at src/providers/ldap/sdap_async_users.c:572
    SSSD#47 0x00007fe6fba0e460 in sdap_get_users_done (subreq=<optimized out>) at src/providers/ldap/sdap_async_users.c:938
    SSSD#48 0x00007fe6fba0c9d5 in sdap_search_user_process (subreq=0x0) at src/providers/ldap/sdap_async_users.c:814
    SSSD#49 0x00007fe6fba07379 in generic_ext_search_handler (subreq=0x0, opts=<optimized out>) at src/providers/ldap/sdap_async.c:1689
    SSSD#50 0x00007fe6fba0991b in sdap_get_generic_op_finished (op=<optimized out>, reply=<optimized out>, error=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:1621
    SSSD#51 0x00007fe6fba083cd in sdap_process_message (ev=<optimized out>, sh=<optimized out>, msg=0x7fe70e5f9ce0) at src/providers/ldap/sdap_async.c:353
    SSSD#52 sdap_process_result (ev=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:197
    SSSD#53 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4cbc30) at ../tevent_timed.c:341
    SSSD#54 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4cbc30, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#55 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:114
    SSSD#56 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4cbc30, location=location@entry=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:533
    SSSD#57 0x00007fe7086605ab in tevent_common_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:637
    SSSD#58 0x00007fe7086641f7 in std_event_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:140
    SSSD#59 0x00007fe70bfd1993 in server_loop (main_ctx=0x7fe70e4cd080) at src/util/server.c:702
    SSSD#60 0x00007fe70c84cb82 in main (argc=8, argv=<optimized out>) at src/providers/data_provider_be.c:587

Resolves:
https://fedorahosted.org/sssd/ticket/3266

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
fidencio added a commit to fidencio/sssd that referenced this pull request Dec 21, 2016
While debugging rhbz#1396912 a deadlock on sssd_be was noticed[0] and
it's been caused by the use of non async-signal-safe functions from the
signal_handler (please, see man 7 signal for more info about which are
the async-signal-safe functions that can be used).

In order to work this situation around a pipe has been added to the
watchdog_ctx structure and, in case of clock screw,  a single byte is
written to this pipe (which is an async-signal-safe operation) and the
logic currently done by the timer handler to reset the watchdog will be
done inside the fd handler in a safe way.

With this patch we ended up losing some debug messages as
orderly_shutdown() has been replaced by _exit(). Personally I don't
think is worth the trouble to try to log those messages properly in this
specific case.

It's really worth to mention that a proper fix the clock screw situation
should be implemented on samba's side, by having tevent using monotonic
(or boottime) clock.

[0]:
  [root@dusan ~]# pstack 17922
    #0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
    SSSD#1  0x00007fe707d04f93 in _L_lock_14932 () from /lib64/libc.so.6
    SSSD#2  0x00007fe707d02013 in __GI___libc_malloc (bytes=140630248638304, bytes@entry=15) at malloc.c:2891
    SSSD#3  0x00007fe707d0888a in __GI___strdup (s=0x7fe707dff4f7 "/etc/localtime") at strdup.c:42
    SSSD#4  0x00007fe707d31b61 in tzset_internal (always=<optimized out>, explicit=explicit@entry=1) at tzset.c:438
    SSSD#5  0x00007fe707d32523 in __tz_convert (timer=timer@entry=0x7ffcd5d2b090, use_localtime=use_localtime@entry=1, tp=tp@entry=0x7fe708041d40 <_tmbuf>) at tzset.c:621
    SSSD#6  0x00007fe707d30521 in __GI_localtime (t=t@entry=0x7ffcd5d2b090) at localtime.c:42
    SSSD#7  0x00007fe70886c7b0 in sss_vdebug_fn (file=<optimized out>, line=<optimized out>, function=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=16, flags=flags@entry=0, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n", ap=ap@entry=0x7ffcd5d2b130) at src/util/debug.c:248
    SSSD#8  0x00007fe70886c995 in sss_debug_fn (file=file@entry=0x7fe70bff263b "src/util/util_watchdog.c", line=line@entry=82, function=function@entry=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=level@entry=16, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n") at src/util/debug.c:284
    SSSD#9  0x00007fe70bfdb409 in watchdog_handler (sig=<optimized out>) at src/util/util_watchdog.c:81
    SSSD#10 <signal handler called>
    SSSD#11 0x00007fe707cff664 in _int_malloc (av=av@entry=0x7fe70803c760 <main_arena>, bytes=bytes@entry=151) at malloc.c:3494
    SSSD#12 0x00007fe707d01fbc in __GI___libc_malloc (bytes=bytes@entry=151) at malloc.c:2893
    SSSD#13 0x00007fe708450749 in __talloc_with_prefix (prefix_len=0, size=55, context=0x7fe718373210) at ../talloc.c:668
    SSSD#14 __talloc (size=55, context=0x7fe718373210) at ../talloc.c:708
    SSSD#15 _talloc_named_const (name=0x7fe70bb7015d "../common/ldb_pack.c:425", size=55, context=0x7fe718373210) at ../talloc.c:865
    SSSD#16 talloc_named_const (context=<optimized out>, size=size@entry=55, name=name@entry=0x7fe70bb7015d "../common/ldb_pack.c:425") at ../talloc.c:1606
    SSSD#17 0x00007fe70bb61803 in ldb_unpack_data_only_attr_list (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=0x7fe7184aa1e0, list=list@entry=0x0, list_size=list_size@entry=0, nb_elements_in_db=nb_elements_in_db@entry=0x0) at ../common/ldb_pack.c:425
    SSSD#18 0x00007fe70bb61a7d in ldb_unpack_data (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=<optimized out>) at ../common/ldb_pack.c:470
    SSSD#19 0x00007fe6fdc29b46 in ltdb_parse_data_unpack (key=..., data=..., private_data=0x7ffcd5d2ba70) at ../ldb_tdb/ldb_search.c:249
    SSSD#20 0x00007fe70a5e0a24 in tdb_parse_data (tdb=tdb@entry=0x7fe70e4eaa10, key=..., offset=15619748, len=414772, parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/io.c:637
    SSSD#21 0x00007fe70a5dc1fc in tdb_parse_record (tdb=0x7fe70e4eaa10, key=..., parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/tdb.c:253
    SSSD#22 0x00007fe6fdc29e7b in ltdb_search_dn1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, msg=msg@entry=0x7fe7184aa1e0) at ../ldb_tdb/ldb_search.c:287
    SSSD#23 0x00007fe6fdc2acbb in ltdb_dn_list_load (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, list=list@entry=0x7fe7183c3a30) at ../ldb_tdb/ldb_index.c:181
    SSSD#24 0x00007fe6fdc2bbbb in ltdb_index_add1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", v_idx=v_idx@entry=0, el=<optimized out>, el=<optimized out>) at ../ldb_tdb/ldb_index.c:1134
    SSSD#25 0x00007fe6fdc2c62c in ltdb_index_add_el (el=0x7fe7184aa3e0, dn=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", module=0x7fe70e4eab50) at ../ldb_tdb/ldb_index.c:1180
    SSSD#26 ltdb_index_add_element (module=module@entry=0x7fe70e4eab50, dn=<optimized out>, el=el@entry=0x7fe7184aa3e0) at ../ldb_tdb/ldb_index.c:1290
    SSSD#27 0x00007fe6fdc290bb in ltdb_modify_internal (module=module@entry=0x7fe70e4eab50, msg=0x7fe7183bf0c0, req=req@entry=0x7fe7183bdc10) at ../ldb_tdb/ldb_tdb.c:903
    SSSD#28 0x00007fe6fdc2958a in ltdb_modify (ctx=0x7fe7183c2950, ctx=0x7fe7183c2950) at ../ldb_tdb/ldb_tdb.c:998
    SSSD#29 ltdb_callback (ev=<optimized out>, te=<optimized out>, t=..., private_data=<optimized out>) at ../ldb_tdb/ldb_tdb.c:1380
    SSSD#30 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4d2890) at ../tevent_timed.c:341
    SSSD#31 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4d2890, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#32 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4d2890, location=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent_standard.c:114
    SSSD#33 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4d2890, location=location@entry=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent.c:533
    SSSD#34 0x00007fe70bb6bc4f in ldb_wait (handle=0x7fe7183c4530, type=<optimized out>) at ../common/ldb.c:631
    SSSD#35 0x00007fe70bb6c793 in ldb_autotransaction_request (ldb=0x7fe70e4d52c0, req=0x7fe7183bdc10) at ../common/ldb.c:573
    SSSD#36 0x00007fe70bb6d263 in ldb_modify (ldb=ldb@entry=0x7fe70e4d52c0, message=<optimized out>) at ../common/ldb.c:1655
    SSSD#37 0x00007fe70bfa2ab5 in sysdb_set_cache_entry_attr (ldb=0x7fe70e4d52c0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bf680, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1159
    SSSD#38 0x00007fe70bfa304d in sysdb_rep_ts_entry_attr (sysdb=0x7fe70e4eadd0, attrs=0x7fe7183bf680, entry_dn=0x7fe7183c4760) at src/db/sysdb_ops.c:1218
    SSSD#39 sysdb_set_ts_entry_attr (sysdb=sysdb@entry=0x7fe70e4eadd0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1248
    SSSD#40 0x00007fe70bfa4aa9 in sysdb_set_entry_attr (sysdb=0x7fe70e4eadd0, entry_dn=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1199
    SSSD#41 0x00007fe70bfa4b5f in sysdb_set_user_attr (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1285
    SSSD#42 0x00007fe70bfa58c3 in sysdb_add_user (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", uid=uid@entry=1415408147, gid=<optimized out>, gid@entry=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:1928
    SSSD#43 0x00007fe70bfab271 in sysdb_store_new_user (now=1481105315, cache_timeout=5400, attrs=0x7fe7183bb840, orig_dn=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", shell=0x0, homedir=0x0, gecos=0x7fe710465d00 "Test User7045", gid=1415400513, uid=1415408147, name=0x7fe7183c01f0 "testuser7045@domain.com", domain=0x7fe70e4d62f0) at src/db/sysdb_ops.c:2549
    SSSD#44 sysdb_store_user (domain=domain@entry=0x7fe70e4d62f0, name=0x7fe7183c01f0 "testuser7045@domain.com", pwd=pwd@entry=0x0, uid=1415408147, gid=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, remove_attrs=0x7fe7183c08a0, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:2499
    SSSD#45 0x00007fe6fba0d9f9 in sdap_save_user (memctx=memctx@entry=0x7fe70e544ee0, opts=opts@entry=0x7fe70e518400, dom=dom@entry=0x7fe70e4d62f0, attrs=<optimized out>, _usn_value=_usn_value@entry=0x7ffcd5d2c260, now=now@entry=1481105315) at src/providers/ldap/sdap_async_users.c:509
    SSSD#46 0x00007fe6fba0df9a in sdap_save_users (memctx=memctx@entry=0x7fe70e544e40, sysdb=0x7fe70e4eadd0, dom=0x7fe70e4d62f0, opts=0x7fe70e518400, users=<optimized out>, num_users=10006, _usn_value=_usn_value@entry=0x7fe70e544e60) at src/providers/ldap/sdap_async_users.c:572
    SSSD#47 0x00007fe6fba0e460 in sdap_get_users_done (subreq=<optimized out>) at src/providers/ldap/sdap_async_users.c:938
    SSSD#48 0x00007fe6fba0c9d5 in sdap_search_user_process (subreq=0x0) at src/providers/ldap/sdap_async_users.c:814
    SSSD#49 0x00007fe6fba07379 in generic_ext_search_handler (subreq=0x0, opts=<optimized out>) at src/providers/ldap/sdap_async.c:1689
    SSSD#50 0x00007fe6fba0991b in sdap_get_generic_op_finished (op=<optimized out>, reply=<optimized out>, error=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:1621
    SSSD#51 0x00007fe6fba083cd in sdap_process_message (ev=<optimized out>, sh=<optimized out>, msg=0x7fe70e5f9ce0) at src/providers/ldap/sdap_async.c:353
    SSSD#52 sdap_process_result (ev=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:197
    SSSD#53 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4cbc30) at ../tevent_timed.c:341
    SSSD#54 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4cbc30, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#55 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:114
    SSSD#56 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4cbc30, location=location@entry=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:533
    SSSD#57 0x00007fe7086605ab in tevent_common_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:637
    SSSD#58 0x00007fe7086641f7 in std_event_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:140
    SSSD#59 0x00007fe70bfd1993 in server_loop (main_ctx=0x7fe70e4cd080) at src/util/server.c:702
    SSSD#60 0x00007fe70c84cb82 in main (argc=8, argv=<optimized out>) at src/providers/data_provider_be.c:587

Resolves:
https://fedorahosted.org/sssd/ticket/3266

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
fidencio added a commit to fidencio/sssd that referenced this pull request Jan 3, 2017
While debugging rhbz#1396912 a deadlock on sssd_be was noticed[0] and
it's been caused by the use of non async-signal-safe functions from the
signal_handler (please, see man 7 signal for more info about which are
the async-signal-safe functions that can be used).

In order to work this situation around a pipe has been added to the
watchdog_ctx structure and, in case of clock screw,  a single byte is
written to this pipe (which is an async-signal-safe operation) and the
logic currently done by the timer handler to reset the watchdog will be
done inside the fd handler in a safe way.

With this patch we ended up losing some debug messages as
orderly_shutdown() has been replaced by _exit(). Personally I don't
think is worth the trouble to try to log those messages properly in this
specific case.

It's really worth to mention that a proper fix the clock screw situation
should be implemented on samba's side, by having tevent using monotonic
(or boottime) clock.

[0]:
  [root@dusan ~]# pstack 17922
    #0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
    SSSD#1  0x00007fe707d04f93 in _L_lock_14932 () from /lib64/libc.so.6
    SSSD#2  0x00007fe707d02013 in __GI___libc_malloc (bytes=140630248638304, bytes@entry=15) at malloc.c:2891
    SSSD#3  0x00007fe707d0888a in __GI___strdup (s=0x7fe707dff4f7 "/etc/localtime") at strdup.c:42
    SSSD#4  0x00007fe707d31b61 in tzset_internal (always=<optimized out>, explicit=explicit@entry=1) at tzset.c:438
    SSSD#5  0x00007fe707d32523 in __tz_convert (timer=timer@entry=0x7ffcd5d2b090, use_localtime=use_localtime@entry=1, tp=tp@entry=0x7fe708041d40 <_tmbuf>) at tzset.c:621
    SSSD#6  0x00007fe707d30521 in __GI_localtime (t=t@entry=0x7ffcd5d2b090) at localtime.c:42
    SSSD#7  0x00007fe70886c7b0 in sss_vdebug_fn (file=<optimized out>, line=<optimized out>, function=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=16, flags=flags@entry=0, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n", ap=ap@entry=0x7ffcd5d2b130) at src/util/debug.c:248
    SSSD#8  0x00007fe70886c995 in sss_debug_fn (file=file@entry=0x7fe70bff263b "src/util/util_watchdog.c", line=line@entry=82, function=function@entry=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=level@entry=16, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n") at src/util/debug.c:284
    SSSD#9  0x00007fe70bfdb409 in watchdog_handler (sig=<optimized out>) at src/util/util_watchdog.c:81
    SSSD#10 <signal handler called>
    SSSD#11 0x00007fe707cff664 in _int_malloc (av=av@entry=0x7fe70803c760 <main_arena>, bytes=bytes@entry=151) at malloc.c:3494
    SSSD#12 0x00007fe707d01fbc in __GI___libc_malloc (bytes=bytes@entry=151) at malloc.c:2893
    SSSD#13 0x00007fe708450749 in __talloc_with_prefix (prefix_len=0, size=55, context=0x7fe718373210) at ../talloc.c:668
    SSSD#14 __talloc (size=55, context=0x7fe718373210) at ../talloc.c:708
    SSSD#15 _talloc_named_const (name=0x7fe70bb7015d "../common/ldb_pack.c:425", size=55, context=0x7fe718373210) at ../talloc.c:865
    SSSD#16 talloc_named_const (context=<optimized out>, size=size@entry=55, name=name@entry=0x7fe70bb7015d "../common/ldb_pack.c:425") at ../talloc.c:1606
    SSSD#17 0x00007fe70bb61803 in ldb_unpack_data_only_attr_list (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=0x7fe7184aa1e0, list=list@entry=0x0, list_size=list_size@entry=0, nb_elements_in_db=nb_elements_in_db@entry=0x0) at ../common/ldb_pack.c:425
    SSSD#18 0x00007fe70bb61a7d in ldb_unpack_data (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=<optimized out>) at ../common/ldb_pack.c:470
    SSSD#19 0x00007fe6fdc29b46 in ltdb_parse_data_unpack (key=..., data=..., private_data=0x7ffcd5d2ba70) at ../ldb_tdb/ldb_search.c:249
    SSSD#20 0x00007fe70a5e0a24 in tdb_parse_data (tdb=tdb@entry=0x7fe70e4eaa10, key=..., offset=15619748, len=414772, parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/io.c:637
    SSSD#21 0x00007fe70a5dc1fc in tdb_parse_record (tdb=0x7fe70e4eaa10, key=..., parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/tdb.c:253
    SSSD#22 0x00007fe6fdc29e7b in ltdb_search_dn1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, msg=msg@entry=0x7fe7184aa1e0) at ../ldb_tdb/ldb_search.c:287
    SSSD#23 0x00007fe6fdc2acbb in ltdb_dn_list_load (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, list=list@entry=0x7fe7183c3a30) at ../ldb_tdb/ldb_index.c:181
    SSSD#24 0x00007fe6fdc2bbbb in ltdb_index_add1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", v_idx=v_idx@entry=0, el=<optimized out>, el=<optimized out>) at ../ldb_tdb/ldb_index.c:1134
    SSSD#25 0x00007fe6fdc2c62c in ltdb_index_add_el (el=0x7fe7184aa3e0, dn=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", module=0x7fe70e4eab50) at ../ldb_tdb/ldb_index.c:1180
    SSSD#26 ltdb_index_add_element (module=module@entry=0x7fe70e4eab50, dn=<optimized out>, el=el@entry=0x7fe7184aa3e0) at ../ldb_tdb/ldb_index.c:1290
    SSSD#27 0x00007fe6fdc290bb in ltdb_modify_internal (module=module@entry=0x7fe70e4eab50, msg=0x7fe7183bf0c0, req=req@entry=0x7fe7183bdc10) at ../ldb_tdb/ldb_tdb.c:903
    SSSD#28 0x00007fe6fdc2958a in ltdb_modify (ctx=0x7fe7183c2950, ctx=0x7fe7183c2950) at ../ldb_tdb/ldb_tdb.c:998
    SSSD#29 ltdb_callback (ev=<optimized out>, te=<optimized out>, t=..., private_data=<optimized out>) at ../ldb_tdb/ldb_tdb.c:1380
    SSSD#30 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4d2890) at ../tevent_timed.c:341
    SSSD#31 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4d2890, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#32 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4d2890, location=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent_standard.c:114
    SSSD#33 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4d2890, location=location@entry=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent.c:533
    SSSD#34 0x00007fe70bb6bc4f in ldb_wait (handle=0x7fe7183c4530, type=<optimized out>) at ../common/ldb.c:631
    SSSD#35 0x00007fe70bb6c793 in ldb_autotransaction_request (ldb=0x7fe70e4d52c0, req=0x7fe7183bdc10) at ../common/ldb.c:573
    SSSD#36 0x00007fe70bb6d263 in ldb_modify (ldb=ldb@entry=0x7fe70e4d52c0, message=<optimized out>) at ../common/ldb.c:1655
    SSSD#37 0x00007fe70bfa2ab5 in sysdb_set_cache_entry_attr (ldb=0x7fe70e4d52c0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bf680, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1159
    SSSD#38 0x00007fe70bfa304d in sysdb_rep_ts_entry_attr (sysdb=0x7fe70e4eadd0, attrs=0x7fe7183bf680, entry_dn=0x7fe7183c4760) at src/db/sysdb_ops.c:1218
    SSSD#39 sysdb_set_ts_entry_attr (sysdb=sysdb@entry=0x7fe70e4eadd0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1248
    SSSD#40 0x00007fe70bfa4aa9 in sysdb_set_entry_attr (sysdb=0x7fe70e4eadd0, entry_dn=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1199
    SSSD#41 0x00007fe70bfa4b5f in sysdb_set_user_attr (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1285
    SSSD#42 0x00007fe70bfa58c3 in sysdb_add_user (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", uid=uid@entry=1415408147, gid=<optimized out>, gid@entry=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:1928
    SSSD#43 0x00007fe70bfab271 in sysdb_store_new_user (now=1481105315, cache_timeout=5400, attrs=0x7fe7183bb840, orig_dn=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", shell=0x0, homedir=0x0, gecos=0x7fe710465d00 "Test User7045", gid=1415400513, uid=1415408147, name=0x7fe7183c01f0 "testuser7045@domain.com", domain=0x7fe70e4d62f0) at src/db/sysdb_ops.c:2549
    SSSD#44 sysdb_store_user (domain=domain@entry=0x7fe70e4d62f0, name=0x7fe7183c01f0 "testuser7045@domain.com", pwd=pwd@entry=0x0, uid=1415408147, gid=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, remove_attrs=0x7fe7183c08a0, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:2499
    SSSD#45 0x00007fe6fba0d9f9 in sdap_save_user (memctx=memctx@entry=0x7fe70e544ee0, opts=opts@entry=0x7fe70e518400, dom=dom@entry=0x7fe70e4d62f0, attrs=<optimized out>, _usn_value=_usn_value@entry=0x7ffcd5d2c260, now=now@entry=1481105315) at src/providers/ldap/sdap_async_users.c:509
    SSSD#46 0x00007fe6fba0df9a in sdap_save_users (memctx=memctx@entry=0x7fe70e544e40, sysdb=0x7fe70e4eadd0, dom=0x7fe70e4d62f0, opts=0x7fe70e518400, users=<optimized out>, num_users=10006, _usn_value=_usn_value@entry=0x7fe70e544e60) at src/providers/ldap/sdap_async_users.c:572
    SSSD#47 0x00007fe6fba0e460 in sdap_get_users_done (subreq=<optimized out>) at src/providers/ldap/sdap_async_users.c:938
    SSSD#48 0x00007fe6fba0c9d5 in sdap_search_user_process (subreq=0x0) at src/providers/ldap/sdap_async_users.c:814
    SSSD#49 0x00007fe6fba07379 in generic_ext_search_handler (subreq=0x0, opts=<optimized out>) at src/providers/ldap/sdap_async.c:1689
    SSSD#50 0x00007fe6fba0991b in sdap_get_generic_op_finished (op=<optimized out>, reply=<optimized out>, error=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:1621
    SSSD#51 0x00007fe6fba083cd in sdap_process_message (ev=<optimized out>, sh=<optimized out>, msg=0x7fe70e5f9ce0) at src/providers/ldap/sdap_async.c:353
    SSSD#52 sdap_process_result (ev=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:197
    SSSD#53 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4cbc30) at ../tevent_timed.c:341
    SSSD#54 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4cbc30, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#55 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:114
    SSSD#56 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4cbc30, location=location@entry=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:533
    SSSD#57 0x00007fe7086605ab in tevent_common_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:637
    SSSD#58 0x00007fe7086641f7 in std_event_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:140
    SSSD#59 0x00007fe70bfd1993 in server_loop (main_ctx=0x7fe70e4cd080) at src/util/server.c:702
    SSSD#60 0x00007fe70c84cb82 in main (argc=8, argv=<optimized out>) at src/providers/data_provider_be.c:587

Resolves:
https://fedorahosted.org/sssd/ticket/3266

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
fidencio added a commit to fidencio/sssd that referenced this pull request Jan 3, 2017
While debugging rhbz#1396912 a deadlock on sssd_be was noticed[0] and
it's been caused by the use of non async-signal-safe functions from the
signal_handler (please, see man 7 signal for more info about which are
the async-signal-safe functions that can be used).

In order to work this situation around a pipe has been added to the
watchdog_ctx structure and, in case of clock screw,  a single byte is
written to this pipe (which is an async-signal-safe operation) and the
logic currently done by the timer handler to reset the watchdog will be
done inside the fd handler in a safe way.

With this patch we ended up losing some debug messages as
orderly_shutdown() has been replaced by _exit(). Personally I don't
think is worth the trouble to try to log those messages properly in this
specific case.

It's really worth to mention that a proper fix the clock screw situation
should be implemented on samba's side, by having tevent using monotonic
(or boottime) clock.

[0]:
  [root@dusan ~]# pstack 17922
    #0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
    SSSD#1  0x00007fe707d04f93 in _L_lock_14932 () from /lib64/libc.so.6
    SSSD#2  0x00007fe707d02013 in __GI___libc_malloc (bytes=140630248638304, bytes@entry=15) at malloc.c:2891
    SSSD#3  0x00007fe707d0888a in __GI___strdup (s=0x7fe707dff4f7 "/etc/localtime") at strdup.c:42
    SSSD#4  0x00007fe707d31b61 in tzset_internal (always=<optimized out>, explicit=explicit@entry=1) at tzset.c:438
    SSSD#5  0x00007fe707d32523 in __tz_convert (timer=timer@entry=0x7ffcd5d2b090, use_localtime=use_localtime@entry=1, tp=tp@entry=0x7fe708041d40 <_tmbuf>) at tzset.c:621
    SSSD#6  0x00007fe707d30521 in __GI_localtime (t=t@entry=0x7ffcd5d2b090) at localtime.c:42
    SSSD#7  0x00007fe70886c7b0 in sss_vdebug_fn (file=<optimized out>, line=<optimized out>, function=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=16, flags=flags@entry=0, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n", ap=ap@entry=0x7ffcd5d2b130) at src/util/debug.c:248
    SSSD#8  0x00007fe70886c995 in sss_debug_fn (file=file@entry=0x7fe70bff263b "src/util/util_watchdog.c", line=line@entry=82, function=function@entry=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=level@entry=16, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n") at src/util/debug.c:284
    SSSD#9  0x00007fe70bfdb409 in watchdog_handler (sig=<optimized out>) at src/util/util_watchdog.c:81
    SSSD#10 <signal handler called>
    SSSD#11 0x00007fe707cff664 in _int_malloc (av=av@entry=0x7fe70803c760 <main_arena>, bytes=bytes@entry=151) at malloc.c:3494
    SSSD#12 0x00007fe707d01fbc in __GI___libc_malloc (bytes=bytes@entry=151) at malloc.c:2893
    SSSD#13 0x00007fe708450749 in __talloc_with_prefix (prefix_len=0, size=55, context=0x7fe718373210) at ../talloc.c:668
    SSSD#14 __talloc (size=55, context=0x7fe718373210) at ../talloc.c:708
    SSSD#15 _talloc_named_const (name=0x7fe70bb7015d "../common/ldb_pack.c:425", size=55, context=0x7fe718373210) at ../talloc.c:865
    SSSD#16 talloc_named_const (context=<optimized out>, size=size@entry=55, name=name@entry=0x7fe70bb7015d "../common/ldb_pack.c:425") at ../talloc.c:1606
    SSSD#17 0x00007fe70bb61803 in ldb_unpack_data_only_attr_list (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=0x7fe7184aa1e0, list=list@entry=0x0, list_size=list_size@entry=0, nb_elements_in_db=nb_elements_in_db@entry=0x0) at ../common/ldb_pack.c:425
    SSSD#18 0x00007fe70bb61a7d in ldb_unpack_data (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=<optimized out>) at ../common/ldb_pack.c:470
    SSSD#19 0x00007fe6fdc29b46 in ltdb_parse_data_unpack (key=..., data=..., private_data=0x7ffcd5d2ba70) at ../ldb_tdb/ldb_search.c:249
    SSSD#20 0x00007fe70a5e0a24 in tdb_parse_data (tdb=tdb@entry=0x7fe70e4eaa10, key=..., offset=15619748, len=414772, parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/io.c:637
    SSSD#21 0x00007fe70a5dc1fc in tdb_parse_record (tdb=0x7fe70e4eaa10, key=..., parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/tdb.c:253
    SSSD#22 0x00007fe6fdc29e7b in ltdb_search_dn1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, msg=msg@entry=0x7fe7184aa1e0) at ../ldb_tdb/ldb_search.c:287
    SSSD#23 0x00007fe6fdc2acbb in ltdb_dn_list_load (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, list=list@entry=0x7fe7183c3a30) at ../ldb_tdb/ldb_index.c:181
    SSSD#24 0x00007fe6fdc2bbbb in ltdb_index_add1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", v_idx=v_idx@entry=0, el=<optimized out>, el=<optimized out>) at ../ldb_tdb/ldb_index.c:1134
    SSSD#25 0x00007fe6fdc2c62c in ltdb_index_add_el (el=0x7fe7184aa3e0, dn=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", module=0x7fe70e4eab50) at ../ldb_tdb/ldb_index.c:1180
    SSSD#26 ltdb_index_add_element (module=module@entry=0x7fe70e4eab50, dn=<optimized out>, el=el@entry=0x7fe7184aa3e0) at ../ldb_tdb/ldb_index.c:1290
    SSSD#27 0x00007fe6fdc290bb in ltdb_modify_internal (module=module@entry=0x7fe70e4eab50, msg=0x7fe7183bf0c0, req=req@entry=0x7fe7183bdc10) at ../ldb_tdb/ldb_tdb.c:903
    SSSD#28 0x00007fe6fdc2958a in ltdb_modify (ctx=0x7fe7183c2950, ctx=0x7fe7183c2950) at ../ldb_tdb/ldb_tdb.c:998
    SSSD#29 ltdb_callback (ev=<optimized out>, te=<optimized out>, t=..., private_data=<optimized out>) at ../ldb_tdb/ldb_tdb.c:1380
    SSSD#30 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4d2890) at ../tevent_timed.c:341
    SSSD#31 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4d2890, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#32 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4d2890, location=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent_standard.c:114
    SSSD#33 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4d2890, location=location@entry=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent.c:533
    SSSD#34 0x00007fe70bb6bc4f in ldb_wait (handle=0x7fe7183c4530, type=<optimized out>) at ../common/ldb.c:631
    SSSD#35 0x00007fe70bb6c793 in ldb_autotransaction_request (ldb=0x7fe70e4d52c0, req=0x7fe7183bdc10) at ../common/ldb.c:573
    SSSD#36 0x00007fe70bb6d263 in ldb_modify (ldb=ldb@entry=0x7fe70e4d52c0, message=<optimized out>) at ../common/ldb.c:1655
    SSSD#37 0x00007fe70bfa2ab5 in sysdb_set_cache_entry_attr (ldb=0x7fe70e4d52c0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bf680, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1159
    SSSD#38 0x00007fe70bfa304d in sysdb_rep_ts_entry_attr (sysdb=0x7fe70e4eadd0, attrs=0x7fe7183bf680, entry_dn=0x7fe7183c4760) at src/db/sysdb_ops.c:1218
    SSSD#39 sysdb_set_ts_entry_attr (sysdb=sysdb@entry=0x7fe70e4eadd0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1248
    SSSD#40 0x00007fe70bfa4aa9 in sysdb_set_entry_attr (sysdb=0x7fe70e4eadd0, entry_dn=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1199
    SSSD#41 0x00007fe70bfa4b5f in sysdb_set_user_attr (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1285
    SSSD#42 0x00007fe70bfa58c3 in sysdb_add_user (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", uid=uid@entry=1415408147, gid=<optimized out>, gid@entry=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:1928
    SSSD#43 0x00007fe70bfab271 in sysdb_store_new_user (now=1481105315, cache_timeout=5400, attrs=0x7fe7183bb840, orig_dn=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", shell=0x0, homedir=0x0, gecos=0x7fe710465d00 "Test User7045", gid=1415400513, uid=1415408147, name=0x7fe7183c01f0 "testuser7045@domain.com", domain=0x7fe70e4d62f0) at src/db/sysdb_ops.c:2549
    SSSD#44 sysdb_store_user (domain=domain@entry=0x7fe70e4d62f0, name=0x7fe7183c01f0 "testuser7045@domain.com", pwd=pwd@entry=0x0, uid=1415408147, gid=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, remove_attrs=0x7fe7183c08a0, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:2499
    SSSD#45 0x00007fe6fba0d9f9 in sdap_save_user (memctx=memctx@entry=0x7fe70e544ee0, opts=opts@entry=0x7fe70e518400, dom=dom@entry=0x7fe70e4d62f0, attrs=<optimized out>, _usn_value=_usn_value@entry=0x7ffcd5d2c260, now=now@entry=1481105315) at src/providers/ldap/sdap_async_users.c:509
    SSSD#46 0x00007fe6fba0df9a in sdap_save_users (memctx=memctx@entry=0x7fe70e544e40, sysdb=0x7fe70e4eadd0, dom=0x7fe70e4d62f0, opts=0x7fe70e518400, users=<optimized out>, num_users=10006, _usn_value=_usn_value@entry=0x7fe70e544e60) at src/providers/ldap/sdap_async_users.c:572
    SSSD#47 0x00007fe6fba0e460 in sdap_get_users_done (subreq=<optimized out>) at src/providers/ldap/sdap_async_users.c:938
    SSSD#48 0x00007fe6fba0c9d5 in sdap_search_user_process (subreq=0x0) at src/providers/ldap/sdap_async_users.c:814
    SSSD#49 0x00007fe6fba07379 in generic_ext_search_handler (subreq=0x0, opts=<optimized out>) at src/providers/ldap/sdap_async.c:1689
    SSSD#50 0x00007fe6fba0991b in sdap_get_generic_op_finished (op=<optimized out>, reply=<optimized out>, error=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:1621
    SSSD#51 0x00007fe6fba083cd in sdap_process_message (ev=<optimized out>, sh=<optimized out>, msg=0x7fe70e5f9ce0) at src/providers/ldap/sdap_async.c:353
    SSSD#52 sdap_process_result (ev=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:197
    SSSD#53 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4cbc30) at ../tevent_timed.c:341
    SSSD#54 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4cbc30, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#55 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:114
    SSSD#56 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4cbc30, location=location@entry=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:533
    SSSD#57 0x00007fe7086605ab in tevent_common_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:637
    SSSD#58 0x00007fe7086641f7 in std_event_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:140
    SSSD#59 0x00007fe70bfd1993 in server_loop (main_ctx=0x7fe70e4cd080) at src/util/server.c:702
    SSSD#60 0x00007fe70c84cb82 in main (argc=8, argv=<optimized out>) at src/providers/data_provider_be.c:587

Resolves:
https://fedorahosted.org/sssd/ticket/3266

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
fidencio added a commit to fidencio/sssd that referenced this pull request Jan 3, 2017
While debugging rhbz#1396912 a deadlock on sssd_be was noticed[0] and
it's been caused by the use of non async-signal-safe functions from the
signal_handler (please, see man 7 signal for more info about which are
the async-signal-safe functions that can be used).

In order to work this situation around a pipe has been added to the
watchdog_ctx structure and, in case of clock screw,  a single byte is
written to this pipe (which is an async-signal-safe operation) and the
logic currently done by the timer handler to reset the watchdog will be
done inside the fd handler in a safe way.

With this patch we ended up losing some debug messages as
orderly_shutdown() has been replaced by kill(-getpgrp(), SIGTERM).
Personally I don't think is worth the trouble to try to log those messages
properly in this specific case.

It's really worth to mention that a proper fix the clock screw situation
should be implemented on samba's side, by having tevent using monotonic
(or boottime) clock.

[0]:
  [root@dusan ~]# pstack 17922
    #0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
    SSSD#1  0x00007fe707d04f93 in _L_lock_14932 () from /lib64/libc.so.6
    SSSD#2  0x00007fe707d02013 in __GI___libc_malloc (bytes=140630248638304, bytes@entry=15) at malloc.c:2891
    SSSD#3  0x00007fe707d0888a in __GI___strdup (s=0x7fe707dff4f7 "/etc/localtime") at strdup.c:42
    SSSD#4  0x00007fe707d31b61 in tzset_internal (always=<optimized out>, explicit=explicit@entry=1) at tzset.c:438
    SSSD#5  0x00007fe707d32523 in __tz_convert (timer=timer@entry=0x7ffcd5d2b090, use_localtime=use_localtime@entry=1, tp=tp@entry=0x7fe708041d40 <_tmbuf>) at tzset.c:621
    SSSD#6  0x00007fe707d30521 in __GI_localtime (t=t@entry=0x7ffcd5d2b090) at localtime.c:42
    SSSD#7  0x00007fe70886c7b0 in sss_vdebug_fn (file=<optimized out>, line=<optimized out>, function=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=16, flags=flags@entry=0, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n", ap=ap@entry=0x7ffcd5d2b130) at src/util/debug.c:248
    SSSD#8  0x00007fe70886c995 in sss_debug_fn (file=file@entry=0x7fe70bff263b "src/util/util_watchdog.c", line=line@entry=82, function=function@entry=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=level@entry=16, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n") at src/util/debug.c:284
    SSSD#9  0x00007fe70bfdb409 in watchdog_handler (sig=<optimized out>) at src/util/util_watchdog.c:81
    SSSD#10 <signal handler called>
    SSSD#11 0x00007fe707cff664 in _int_malloc (av=av@entry=0x7fe70803c760 <main_arena>, bytes=bytes@entry=151) at malloc.c:3494
    SSSD#12 0x00007fe707d01fbc in __GI___libc_malloc (bytes=bytes@entry=151) at malloc.c:2893
    SSSD#13 0x00007fe708450749 in __talloc_with_prefix (prefix_len=0, size=55, context=0x7fe718373210) at ../talloc.c:668
    SSSD#14 __talloc (size=55, context=0x7fe718373210) at ../talloc.c:708
    SSSD#15 _talloc_named_const (name=0x7fe70bb7015d "../common/ldb_pack.c:425", size=55, context=0x7fe718373210) at ../talloc.c:865
    SSSD#16 talloc_named_const (context=<optimized out>, size=size@entry=55, name=name@entry=0x7fe70bb7015d "../common/ldb_pack.c:425") at ../talloc.c:1606
    SSSD#17 0x00007fe70bb61803 in ldb_unpack_data_only_attr_list (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=0x7fe7184aa1e0, list=list@entry=0x0, list_size=list_size@entry=0, nb_elements_in_db=nb_elements_in_db@entry=0x0) at ../common/ldb_pack.c:425
    SSSD#18 0x00007fe70bb61a7d in ldb_unpack_data (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=<optimized out>) at ../common/ldb_pack.c:470
    SSSD#19 0x00007fe6fdc29b46 in ltdb_parse_data_unpack (key=..., data=..., private_data=0x7ffcd5d2ba70) at ../ldb_tdb/ldb_search.c:249
    SSSD#20 0x00007fe70a5e0a24 in tdb_parse_data (tdb=tdb@entry=0x7fe70e4eaa10, key=..., offset=15619748, len=414772, parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/io.c:637
    SSSD#21 0x00007fe70a5dc1fc in tdb_parse_record (tdb=0x7fe70e4eaa10, key=..., parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/tdb.c:253
    SSSD#22 0x00007fe6fdc29e7b in ltdb_search_dn1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, msg=msg@entry=0x7fe7184aa1e0) at ../ldb_tdb/ldb_search.c:287
    SSSD#23 0x00007fe6fdc2acbb in ltdb_dn_list_load (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, list=list@entry=0x7fe7183c3a30) at ../ldb_tdb/ldb_index.c:181
    SSSD#24 0x00007fe6fdc2bbbb in ltdb_index_add1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", v_idx=v_idx@entry=0, el=<optimized out>, el=<optimized out>) at ../ldb_tdb/ldb_index.c:1134
    SSSD#25 0x00007fe6fdc2c62c in ltdb_index_add_el (el=0x7fe7184aa3e0, dn=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", module=0x7fe70e4eab50) at ../ldb_tdb/ldb_index.c:1180
    SSSD#26 ltdb_index_add_element (module=module@entry=0x7fe70e4eab50, dn=<optimized out>, el=el@entry=0x7fe7184aa3e0) at ../ldb_tdb/ldb_index.c:1290
    SSSD#27 0x00007fe6fdc290bb in ltdb_modify_internal (module=module@entry=0x7fe70e4eab50, msg=0x7fe7183bf0c0, req=req@entry=0x7fe7183bdc10) at ../ldb_tdb/ldb_tdb.c:903
    SSSD#28 0x00007fe6fdc2958a in ltdb_modify (ctx=0x7fe7183c2950, ctx=0x7fe7183c2950) at ../ldb_tdb/ldb_tdb.c:998
    SSSD#29 ltdb_callback (ev=<optimized out>, te=<optimized out>, t=..., private_data=<optimized out>) at ../ldb_tdb/ldb_tdb.c:1380
    SSSD#30 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4d2890) at ../tevent_timed.c:341
    SSSD#31 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4d2890, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#32 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4d2890, location=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent_standard.c:114
    SSSD#33 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4d2890, location=location@entry=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent.c:533
    SSSD#34 0x00007fe70bb6bc4f in ldb_wait (handle=0x7fe7183c4530, type=<optimized out>) at ../common/ldb.c:631
    SSSD#35 0x00007fe70bb6c793 in ldb_autotransaction_request (ldb=0x7fe70e4d52c0, req=0x7fe7183bdc10) at ../common/ldb.c:573
    SSSD#36 0x00007fe70bb6d263 in ldb_modify (ldb=ldb@entry=0x7fe70e4d52c0, message=<optimized out>) at ../common/ldb.c:1655
    SSSD#37 0x00007fe70bfa2ab5 in sysdb_set_cache_entry_attr (ldb=0x7fe70e4d52c0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bf680, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1159
    SSSD#38 0x00007fe70bfa304d in sysdb_rep_ts_entry_attr (sysdb=0x7fe70e4eadd0, attrs=0x7fe7183bf680, entry_dn=0x7fe7183c4760) at src/db/sysdb_ops.c:1218
    SSSD#39 sysdb_set_ts_entry_attr (sysdb=sysdb@entry=0x7fe70e4eadd0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1248
    SSSD#40 0x00007fe70bfa4aa9 in sysdb_set_entry_attr (sysdb=0x7fe70e4eadd0, entry_dn=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1199
    SSSD#41 0x00007fe70bfa4b5f in sysdb_set_user_attr (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1285
    SSSD#42 0x00007fe70bfa58c3 in sysdb_add_user (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", uid=uid@entry=1415408147, gid=<optimized out>, gid@entry=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:1928
    SSSD#43 0x00007fe70bfab271 in sysdb_store_new_user (now=1481105315, cache_timeout=5400, attrs=0x7fe7183bb840, orig_dn=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", shell=0x0, homedir=0x0, gecos=0x7fe710465d00 "Test User7045", gid=1415400513, uid=1415408147, name=0x7fe7183c01f0 "testuser7045@domain.com", domain=0x7fe70e4d62f0) at src/db/sysdb_ops.c:2549
    SSSD#44 sysdb_store_user (domain=domain@entry=0x7fe70e4d62f0, name=0x7fe7183c01f0 "testuser7045@domain.com", pwd=pwd@entry=0x0, uid=1415408147, gid=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, remove_attrs=0x7fe7183c08a0, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:2499
    SSSD#45 0x00007fe6fba0d9f9 in sdap_save_user (memctx=memctx@entry=0x7fe70e544ee0, opts=opts@entry=0x7fe70e518400, dom=dom@entry=0x7fe70e4d62f0, attrs=<optimized out>, _usn_value=_usn_value@entry=0x7ffcd5d2c260, now=now@entry=1481105315) at src/providers/ldap/sdap_async_users.c:509
    SSSD#46 0x00007fe6fba0df9a in sdap_save_users (memctx=memctx@entry=0x7fe70e544e40, sysdb=0x7fe70e4eadd0, dom=0x7fe70e4d62f0, opts=0x7fe70e518400, users=<optimized out>, num_users=10006, _usn_value=_usn_value@entry=0x7fe70e544e60) at src/providers/ldap/sdap_async_users.c:572
    SSSD#47 0x00007fe6fba0e460 in sdap_get_users_done (subreq=<optimized out>) at src/providers/ldap/sdap_async_users.c:938
    SSSD#48 0x00007fe6fba0c9d5 in sdap_search_user_process (subreq=0x0) at src/providers/ldap/sdap_async_users.c:814
    SSSD#49 0x00007fe6fba07379 in generic_ext_search_handler (subreq=0x0, opts=<optimized out>) at src/providers/ldap/sdap_async.c:1689
    SSSD#50 0x00007fe6fba0991b in sdap_get_generic_op_finished (op=<optimized out>, reply=<optimized out>, error=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:1621
    SSSD#51 0x00007fe6fba083cd in sdap_process_message (ev=<optimized out>, sh=<optimized out>, msg=0x7fe70e5f9ce0) at src/providers/ldap/sdap_async.c:353
    SSSD#52 sdap_process_result (ev=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:197
    SSSD#53 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4cbc30) at ../tevent_timed.c:341
    SSSD#54 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4cbc30, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#55 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:114
    SSSD#56 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4cbc30, location=location@entry=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:533
    SSSD#57 0x00007fe7086605ab in tevent_common_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:637
    SSSD#58 0x00007fe7086641f7 in std_event_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:140
    SSSD#59 0x00007fe70bfd1993 in server_loop (main_ctx=0x7fe70e4cd080) at src/util/server.c:702
    SSSD#60 0x00007fe70c84cb82 in main (argc=8, argv=<optimized out>) at src/providers/data_provider_be.c:587

Resolves:
https://fedorahosted.org/sssd/ticket/3266

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
fidencio added a commit to fidencio/sssd that referenced this pull request Jan 5, 2017
While debugging rhbz#1396912 a deadlock on sssd_be was noticed[0] and
it's been caused by the use of non async-signal-safe functions from the
signal_handler (please, see man 7 signal for more info about which are
the async-signal-safe functions that can be used).

In order to work this situation around a pipe has been added to the
watchdog_ctx structure and, in case of clock screw,  a single byte is
written to this pipe (which is an async-signal-safe operation) and the
logic currently done by the timer handler to reset the watchdog will be
done inside the fd handler in a safe way.

With this patch we ended up losing some debug messages as
orderly_shutdown() has been replaced by _exit(1). Personally I don't
think is worth the trouble to try to log those messages properly in
this specific case.

It's really worth to mention that a proper fix the clock screw situation
should be implemented on samba's side, by having tevent using monotonic
(or boottime) clock.

[0]:
  [root@dusan ~]# pstack 17922
    #0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
    SSSD#1  0x00007fe707d04f93 in _L_lock_14932 () from /lib64/libc.so.6
    SSSD#2  0x00007fe707d02013 in __GI___libc_malloc (bytes=140630248638304, bytes@entry=15) at malloc.c:2891
    SSSD#3  0x00007fe707d0888a in __GI___strdup (s=0x7fe707dff4f7 "/etc/localtime") at strdup.c:42
    SSSD#4  0x00007fe707d31b61 in tzset_internal (always=<optimized out>, explicit=explicit@entry=1) at tzset.c:438
    SSSD#5  0x00007fe707d32523 in __tz_convert (timer=timer@entry=0x7ffcd5d2b090, use_localtime=use_localtime@entry=1, tp=tp@entry=0x7fe708041d40 <_tmbuf>) at tzset.c:621
    SSSD#6  0x00007fe707d30521 in __GI_localtime (t=t@entry=0x7ffcd5d2b090) at localtime.c:42
    SSSD#7  0x00007fe70886c7b0 in sss_vdebug_fn (file=<optimized out>, line=<optimized out>, function=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=16, flags=flags@entry=0, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n", ap=ap@entry=0x7ffcd5d2b130) at src/util/debug.c:248
    SSSD#8  0x00007fe70886c995 in sss_debug_fn (file=file@entry=0x7fe70bff263b "src/util/util_watchdog.c", line=line@entry=82, function=function@entry=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=level@entry=16, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n") at src/util/debug.c:284
    SSSD#9  0x00007fe70bfdb409 in watchdog_handler (sig=<optimized out>) at src/util/util_watchdog.c:81
    SSSD#10 <signal handler called>
    SSSD#11 0x00007fe707cff664 in _int_malloc (av=av@entry=0x7fe70803c760 <main_arena>, bytes=bytes@entry=151) at malloc.c:3494
    SSSD#12 0x00007fe707d01fbc in __GI___libc_malloc (bytes=bytes@entry=151) at malloc.c:2893
    SSSD#13 0x00007fe708450749 in __talloc_with_prefix (prefix_len=0, size=55, context=0x7fe718373210) at ../talloc.c:668
    SSSD#14 __talloc (size=55, context=0x7fe718373210) at ../talloc.c:708
    SSSD#15 _talloc_named_const (name=0x7fe70bb7015d "../common/ldb_pack.c:425", size=55, context=0x7fe718373210) at ../talloc.c:865
    SSSD#16 talloc_named_const (context=<optimized out>, size=size@entry=55, name=name@entry=0x7fe70bb7015d "../common/ldb_pack.c:425") at ../talloc.c:1606
    SSSD#17 0x00007fe70bb61803 in ldb_unpack_data_only_attr_list (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=0x7fe7184aa1e0, list=list@entry=0x0, list_size=list_size@entry=0, nb_elements_in_db=nb_elements_in_db@entry=0x0) at ../common/ldb_pack.c:425
    SSSD#18 0x00007fe70bb61a7d in ldb_unpack_data (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=<optimized out>) at ../common/ldb_pack.c:470
    SSSD#19 0x00007fe6fdc29b46 in ltdb_parse_data_unpack (key=..., data=..., private_data=0x7ffcd5d2ba70) at ../ldb_tdb/ldb_search.c:249
    SSSD#20 0x00007fe70a5e0a24 in tdb_parse_data (tdb=tdb@entry=0x7fe70e4eaa10, key=..., offset=15619748, len=414772, parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/io.c:637
    SSSD#21 0x00007fe70a5dc1fc in tdb_parse_record (tdb=0x7fe70e4eaa10, key=..., parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/tdb.c:253
    SSSD#22 0x00007fe6fdc29e7b in ltdb_search_dn1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, msg=msg@entry=0x7fe7184aa1e0) at ../ldb_tdb/ldb_search.c:287
    SSSD#23 0x00007fe6fdc2acbb in ltdb_dn_list_load (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, list=list@entry=0x7fe7183c3a30) at ../ldb_tdb/ldb_index.c:181
    SSSD#24 0x00007fe6fdc2bbbb in ltdb_index_add1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", v_idx=v_idx@entry=0, el=<optimized out>, el=<optimized out>) at ../ldb_tdb/ldb_index.c:1134
    SSSD#25 0x00007fe6fdc2c62c in ltdb_index_add_el (el=0x7fe7184aa3e0, dn=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", module=0x7fe70e4eab50) at ../ldb_tdb/ldb_index.c:1180
    SSSD#26 ltdb_index_add_element (module=module@entry=0x7fe70e4eab50, dn=<optimized out>, el=el@entry=0x7fe7184aa3e0) at ../ldb_tdb/ldb_index.c:1290
    SSSD#27 0x00007fe6fdc290bb in ltdb_modify_internal (module=module@entry=0x7fe70e4eab50, msg=0x7fe7183bf0c0, req=req@entry=0x7fe7183bdc10) at ../ldb_tdb/ldb_tdb.c:903
    SSSD#28 0x00007fe6fdc2958a in ltdb_modify (ctx=0x7fe7183c2950, ctx=0x7fe7183c2950) at ../ldb_tdb/ldb_tdb.c:998
    SSSD#29 ltdb_callback (ev=<optimized out>, te=<optimized out>, t=..., private_data=<optimized out>) at ../ldb_tdb/ldb_tdb.c:1380
    SSSD#30 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4d2890) at ../tevent_timed.c:341
    SSSD#31 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4d2890, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#32 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4d2890, location=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent_standard.c:114
    SSSD#33 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4d2890, location=location@entry=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent.c:533
    SSSD#34 0x00007fe70bb6bc4f in ldb_wait (handle=0x7fe7183c4530, type=<optimized out>) at ../common/ldb.c:631
    SSSD#35 0x00007fe70bb6c793 in ldb_autotransaction_request (ldb=0x7fe70e4d52c0, req=0x7fe7183bdc10) at ../common/ldb.c:573
    SSSD#36 0x00007fe70bb6d263 in ldb_modify (ldb=ldb@entry=0x7fe70e4d52c0, message=<optimized out>) at ../common/ldb.c:1655
    SSSD#37 0x00007fe70bfa2ab5 in sysdb_set_cache_entry_attr (ldb=0x7fe70e4d52c0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bf680, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1159
    SSSD#38 0x00007fe70bfa304d in sysdb_rep_ts_entry_attr (sysdb=0x7fe70e4eadd0, attrs=0x7fe7183bf680, entry_dn=0x7fe7183c4760) at src/db/sysdb_ops.c:1218
    SSSD#39 sysdb_set_ts_entry_attr (sysdb=sysdb@entry=0x7fe70e4eadd0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1248
    SSSD#40 0x00007fe70bfa4aa9 in sysdb_set_entry_attr (sysdb=0x7fe70e4eadd0, entry_dn=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1199
    SSSD#41 0x00007fe70bfa4b5f in sysdb_set_user_attr (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1285
    SSSD#42 0x00007fe70bfa58c3 in sysdb_add_user (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", uid=uid@entry=1415408147, gid=<optimized out>, gid@entry=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:1928
    SSSD#43 0x00007fe70bfab271 in sysdb_store_new_user (now=1481105315, cache_timeout=5400, attrs=0x7fe7183bb840, orig_dn=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", shell=0x0, homedir=0x0, gecos=0x7fe710465d00 "Test User7045", gid=1415400513, uid=1415408147, name=0x7fe7183c01f0 "testuser7045@domain.com", domain=0x7fe70e4d62f0) at src/db/sysdb_ops.c:2549
    SSSD#44 sysdb_store_user (domain=domain@entry=0x7fe70e4d62f0, name=0x7fe7183c01f0 "testuser7045@domain.com", pwd=pwd@entry=0x0, uid=1415408147, gid=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, remove_attrs=0x7fe7183c08a0, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:2499
    SSSD#45 0x00007fe6fba0d9f9 in sdap_save_user (memctx=memctx@entry=0x7fe70e544ee0, opts=opts@entry=0x7fe70e518400, dom=dom@entry=0x7fe70e4d62f0, attrs=<optimized out>, _usn_value=_usn_value@entry=0x7ffcd5d2c260, now=now@entry=1481105315) at src/providers/ldap/sdap_async_users.c:509
    SSSD#46 0x00007fe6fba0df9a in sdap_save_users (memctx=memctx@entry=0x7fe70e544e40, sysdb=0x7fe70e4eadd0, dom=0x7fe70e4d62f0, opts=0x7fe70e518400, users=<optimized out>, num_users=10006, _usn_value=_usn_value@entry=0x7fe70e544e60) at src/providers/ldap/sdap_async_users.c:572
    SSSD#47 0x00007fe6fba0e460 in sdap_get_users_done (subreq=<optimized out>) at src/providers/ldap/sdap_async_users.c:938
    SSSD#48 0x00007fe6fba0c9d5 in sdap_search_user_process (subreq=0x0) at src/providers/ldap/sdap_async_users.c:814
    SSSD#49 0x00007fe6fba07379 in generic_ext_search_handler (subreq=0x0, opts=<optimized out>) at src/providers/ldap/sdap_async.c:1689
    SSSD#50 0x00007fe6fba0991b in sdap_get_generic_op_finished (op=<optimized out>, reply=<optimized out>, error=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:1621
    SSSD#51 0x00007fe6fba083cd in sdap_process_message (ev=<optimized out>, sh=<optimized out>, msg=0x7fe70e5f9ce0) at src/providers/ldap/sdap_async.c:353
    SSSD#52 sdap_process_result (ev=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:197
    SSSD#53 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4cbc30) at ../tevent_timed.c:341
    SSSD#54 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4cbc30, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#55 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:114
    SSSD#56 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4cbc30, location=location@entry=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:533
    SSSD#57 0x00007fe7086605ab in tevent_common_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:637
    SSSD#58 0x00007fe7086641f7 in std_event_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:140
    SSSD#59 0x00007fe70bfd1993 in server_loop (main_ctx=0x7fe70e4cd080) at src/util/server.c:702
    SSSD#60 0x00007fe70c84cb82 in main (argc=8, argv=<optimized out>) at src/providers/data_provider_be.c:587

Resolves:
https://fedorahosted.org/sssd/ticket/3266

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
fidencio added a commit to fidencio/sssd that referenced this pull request Jan 9, 2017
While debugging rhbz#1396912 a deadlock on sssd_be was noticed[0] and
it's been caused by the use of non async-signal-safe functions from the
signal_handler (please, see man 7 signal for more info about which are
the async-signal-safe functions that can be used).

In order to work this situation around a pipe has been added to the
watchdog_ctx structure and, in case of clock screw,  a single byte is
written to this pipe (which is an async-signal-safe operation) and the
logic currently done by the timer handler to reset the watchdog will be
done inside the fd handler in a safe way.

With this patch we ended up losing some debug messages as
orderly_shutdown() has been replaced by kill(-getpgrp(), SIGTERM).
Personally I don't think is worth the trouble to try to log those messages
properly in this specific case.

It's really worth to mention that a proper fix the clock screw situation
should be implemented on samba's side, by having tevent using monotonic
(or boottime) clock.

[0]:
  [root@dusan ~]# pstack 17922
    #0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
    SSSD#1  0x00007fe707d04f93 in _L_lock_14932 () from /lib64/libc.so.6
    SSSD#2  0x00007fe707d02013 in __GI___libc_malloc (bytes=140630248638304, bytes@entry=15) at malloc.c:2891
    SSSD#3  0x00007fe707d0888a in __GI___strdup (s=0x7fe707dff4f7 "/etc/localtime") at strdup.c:42
    SSSD#4  0x00007fe707d31b61 in tzset_internal (always=<optimized out>, explicit=explicit@entry=1) at tzset.c:438
    SSSD#5  0x00007fe707d32523 in __tz_convert (timer=timer@entry=0x7ffcd5d2b090, use_localtime=use_localtime@entry=1, tp=tp@entry=0x7fe708041d40 <_tmbuf>) at tzset.c:621
    SSSD#6  0x00007fe707d30521 in __GI_localtime (t=t@entry=0x7ffcd5d2b090) at localtime.c:42
    SSSD#7  0x00007fe70886c7b0 in sss_vdebug_fn (file=<optimized out>, line=<optimized out>, function=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=16, flags=flags@entry=0, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n", ap=ap@entry=0x7ffcd5d2b130) at src/util/debug.c:248
    SSSD#8  0x00007fe70886c995 in sss_debug_fn (file=file@entry=0x7fe70bff263b "src/util/util_watchdog.c", line=line@entry=82, function=function@entry=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=level@entry=16, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n") at src/util/debug.c:284
    SSSD#9  0x00007fe70bfdb409 in watchdog_handler (sig=<optimized out>) at src/util/util_watchdog.c:81
    SSSD#10 <signal handler called>
    SSSD#11 0x00007fe707cff664 in _int_malloc (av=av@entry=0x7fe70803c760 <main_arena>, bytes=bytes@entry=151) at malloc.c:3494
    SSSD#12 0x00007fe707d01fbc in __GI___libc_malloc (bytes=bytes@entry=151) at malloc.c:2893
    SSSD#13 0x00007fe708450749 in __talloc_with_prefix (prefix_len=0, size=55, context=0x7fe718373210) at ../talloc.c:668
    SSSD#14 __talloc (size=55, context=0x7fe718373210) at ../talloc.c:708
    SSSD#15 _talloc_named_const (name=0x7fe70bb7015d "../common/ldb_pack.c:425", size=55, context=0x7fe718373210) at ../talloc.c:865
    SSSD#16 talloc_named_const (context=<optimized out>, size=size@entry=55, name=name@entry=0x7fe70bb7015d "../common/ldb_pack.c:425") at ../talloc.c:1606
    SSSD#17 0x00007fe70bb61803 in ldb_unpack_data_only_attr_list (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=0x7fe7184aa1e0, list=list@entry=0x0, list_size=list_size@entry=0, nb_elements_in_db=nb_elements_in_db@entry=0x0) at ../common/ldb_pack.c:425
    SSSD#18 0x00007fe70bb61a7d in ldb_unpack_data (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=<optimized out>) at ../common/ldb_pack.c:470
    SSSD#19 0x00007fe6fdc29b46 in ltdb_parse_data_unpack (key=..., data=..., private_data=0x7ffcd5d2ba70) at ../ldb_tdb/ldb_search.c:249
    SSSD#20 0x00007fe70a5e0a24 in tdb_parse_data (tdb=tdb@entry=0x7fe70e4eaa10, key=..., offset=15619748, len=414772, parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/io.c:637
    SSSD#21 0x00007fe70a5dc1fc in tdb_parse_record (tdb=0x7fe70e4eaa10, key=..., parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/tdb.c:253
    SSSD#22 0x00007fe6fdc29e7b in ltdb_search_dn1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, msg=msg@entry=0x7fe7184aa1e0) at ../ldb_tdb/ldb_search.c:287
    SSSD#23 0x00007fe6fdc2acbb in ltdb_dn_list_load (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, list=list@entry=0x7fe7183c3a30) at ../ldb_tdb/ldb_index.c:181
    SSSD#24 0x00007fe6fdc2bbbb in ltdb_index_add1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", v_idx=v_idx@entry=0, el=<optimized out>, el=<optimized out>) at ../ldb_tdb/ldb_index.c:1134
    SSSD#25 0x00007fe6fdc2c62c in ltdb_index_add_el (el=0x7fe7184aa3e0, dn=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", module=0x7fe70e4eab50) at ../ldb_tdb/ldb_index.c:1180
    SSSD#26 ltdb_index_add_element (module=module@entry=0x7fe70e4eab50, dn=<optimized out>, el=el@entry=0x7fe7184aa3e0) at ../ldb_tdb/ldb_index.c:1290
    SSSD#27 0x00007fe6fdc290bb in ltdb_modify_internal (module=module@entry=0x7fe70e4eab50, msg=0x7fe7183bf0c0, req=req@entry=0x7fe7183bdc10) at ../ldb_tdb/ldb_tdb.c:903
    SSSD#28 0x00007fe6fdc2958a in ltdb_modify (ctx=0x7fe7183c2950, ctx=0x7fe7183c2950) at ../ldb_tdb/ldb_tdb.c:998
    SSSD#29 ltdb_callback (ev=<optimized out>, te=<optimized out>, t=..., private_data=<optimized out>) at ../ldb_tdb/ldb_tdb.c:1380
    SSSD#30 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4d2890) at ../tevent_timed.c:341
    SSSD#31 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4d2890, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#32 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4d2890, location=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent_standard.c:114
    SSSD#33 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4d2890, location=location@entry=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent.c:533
    SSSD#34 0x00007fe70bb6bc4f in ldb_wait (handle=0x7fe7183c4530, type=<optimized out>) at ../common/ldb.c:631
    SSSD#35 0x00007fe70bb6c793 in ldb_autotransaction_request (ldb=0x7fe70e4d52c0, req=0x7fe7183bdc10) at ../common/ldb.c:573
    SSSD#36 0x00007fe70bb6d263 in ldb_modify (ldb=ldb@entry=0x7fe70e4d52c0, message=<optimized out>) at ../common/ldb.c:1655
    SSSD#37 0x00007fe70bfa2ab5 in sysdb_set_cache_entry_attr (ldb=0x7fe70e4d52c0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bf680, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1159
    SSSD#38 0x00007fe70bfa304d in sysdb_rep_ts_entry_attr (sysdb=0x7fe70e4eadd0, attrs=0x7fe7183bf680, entry_dn=0x7fe7183c4760) at src/db/sysdb_ops.c:1218
    SSSD#39 sysdb_set_ts_entry_attr (sysdb=sysdb@entry=0x7fe70e4eadd0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1248
    SSSD#40 0x00007fe70bfa4aa9 in sysdb_set_entry_attr (sysdb=0x7fe70e4eadd0, entry_dn=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1199
    SSSD#41 0x00007fe70bfa4b5f in sysdb_set_user_attr (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1285
    SSSD#42 0x00007fe70bfa58c3 in sysdb_add_user (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", uid=uid@entry=1415408147, gid=<optimized out>, gid@entry=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:1928
    SSSD#43 0x00007fe70bfab271 in sysdb_store_new_user (now=1481105315, cache_timeout=5400, attrs=0x7fe7183bb840, orig_dn=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", shell=0x0, homedir=0x0, gecos=0x7fe710465d00 "Test User7045", gid=1415400513, uid=1415408147, name=0x7fe7183c01f0 "testuser7045@domain.com", domain=0x7fe70e4d62f0) at src/db/sysdb_ops.c:2549
    SSSD#44 sysdb_store_user (domain=domain@entry=0x7fe70e4d62f0, name=0x7fe7183c01f0 "testuser7045@domain.com", pwd=pwd@entry=0x0, uid=1415408147, gid=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, remove_attrs=0x7fe7183c08a0, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:2499
    SSSD#45 0x00007fe6fba0d9f9 in sdap_save_user (memctx=memctx@entry=0x7fe70e544ee0, opts=opts@entry=0x7fe70e518400, dom=dom@entry=0x7fe70e4d62f0, attrs=<optimized out>, _usn_value=_usn_value@entry=0x7ffcd5d2c260, now=now@entry=1481105315) at src/providers/ldap/sdap_async_users.c:509
    SSSD#46 0x00007fe6fba0df9a in sdap_save_users (memctx=memctx@entry=0x7fe70e544e40, sysdb=0x7fe70e4eadd0, dom=0x7fe70e4d62f0, opts=0x7fe70e518400, users=<optimized out>, num_users=10006, _usn_value=_usn_value@entry=0x7fe70e544e60) at src/providers/ldap/sdap_async_users.c:572
    SSSD#47 0x00007fe6fba0e460 in sdap_get_users_done (subreq=<optimized out>) at src/providers/ldap/sdap_async_users.c:938
    SSSD#48 0x00007fe6fba0c9d5 in sdap_search_user_process (subreq=0x0) at src/providers/ldap/sdap_async_users.c:814
    SSSD#49 0x00007fe6fba07379 in generic_ext_search_handler (subreq=0x0, opts=<optimized out>) at src/providers/ldap/sdap_async.c:1689
    SSSD#50 0x00007fe6fba0991b in sdap_get_generic_op_finished (op=<optimized out>, reply=<optimized out>, error=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:1621
    SSSD#51 0x00007fe6fba083cd in sdap_process_message (ev=<optimized out>, sh=<optimized out>, msg=0x7fe70e5f9ce0) at src/providers/ldap/sdap_async.c:353
    SSSD#52 sdap_process_result (ev=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:197
    SSSD#53 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4cbc30) at ../tevent_timed.c:341
    SSSD#54 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4cbc30, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#55 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:114
    SSSD#56 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4cbc30, location=location@entry=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:533
    SSSD#57 0x00007fe7086605ab in tevent_common_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:637
    SSSD#58 0x00007fe7086641f7 in std_event_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:140
    SSSD#59 0x00007fe70bfd1993 in server_loop (main_ctx=0x7fe70e4cd080) at src/util/server.c:702
    SSSD#60 0x00007fe70c84cb82 in main (argc=8, argv=<optimized out>) at src/providers/data_provider_be.c:587

Resolves:
https://fedorahosted.org/sssd/ticket/3266

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
fidencio added a commit to fidencio/sssd that referenced this pull request Jan 9, 2017
While debugging rhbz#1396912 a deadlock on sssd_be was noticed[0] and
it's been caused by the use of non async-signal-safe functions from the
signal_handler (please, see man 7 signal for more info about which are
the async-signal-safe functions that can be used).

In order to work this situation around a pipe has been added to the
watchdog_ctx structure and, in case of clock screw,  a single byte is
written to this pipe (which is an async-signal-safe operation) and the
logic currently done by the timer handler to reset the watchdog will be
done inside the fd handler in a safe way.

With this patch we ended up losing some debug messages as
orderly_shutdown() has been replaced by kill(-getpgrp(), SIGTERM).
Personally I don't think is worth the trouble to try to log those messages
properly in this specific case.

It's really worth to mention that a proper fix the clock screw situation
should be implemented on samba's side, by having tevent using monotonic
(or boottime) clock.

[0]:
  [root@dusan ~]# pstack 17922
    #0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
    SSSD#1  0x00007fe707d04f93 in _L_lock_14932 () from /lib64/libc.so.6
    SSSD#2  0x00007fe707d02013 in __GI___libc_malloc (bytes=140630248638304, bytes@entry=15) at malloc.c:2891
    SSSD#3  0x00007fe707d0888a in __GI___strdup (s=0x7fe707dff4f7 "/etc/localtime") at strdup.c:42
    SSSD#4  0x00007fe707d31b61 in tzset_internal (always=<optimized out>, explicit=explicit@entry=1) at tzset.c:438
    SSSD#5  0x00007fe707d32523 in __tz_convert (timer=timer@entry=0x7ffcd5d2b090, use_localtime=use_localtime@entry=1, tp=tp@entry=0x7fe708041d40 <_tmbuf>) at tzset.c:621
    SSSD#6  0x00007fe707d30521 in __GI_localtime (t=t@entry=0x7ffcd5d2b090) at localtime.c:42
    SSSD#7  0x00007fe70886c7b0 in sss_vdebug_fn (file=<optimized out>, line=<optimized out>, function=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=16, flags=flags@entry=0, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n", ap=ap@entry=0x7ffcd5d2b130) at src/util/debug.c:248
    SSSD#8  0x00007fe70886c995 in sss_debug_fn (file=file@entry=0x7fe70bff263b "src/util/util_watchdog.c", line=line@entry=82, function=function@entry=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=level@entry=16, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n") at src/util/debug.c:284
    SSSD#9  0x00007fe70bfdb409 in watchdog_handler (sig=<optimized out>) at src/util/util_watchdog.c:81
    SSSD#10 <signal handler called>
    SSSD#11 0x00007fe707cff664 in _int_malloc (av=av@entry=0x7fe70803c760 <main_arena>, bytes=bytes@entry=151) at malloc.c:3494
    SSSD#12 0x00007fe707d01fbc in __GI___libc_malloc (bytes=bytes@entry=151) at malloc.c:2893
    SSSD#13 0x00007fe708450749 in __talloc_with_prefix (prefix_len=0, size=55, context=0x7fe718373210) at ../talloc.c:668
    SSSD#14 __talloc (size=55, context=0x7fe718373210) at ../talloc.c:708
    SSSD#15 _talloc_named_const (name=0x7fe70bb7015d "../common/ldb_pack.c:425", size=55, context=0x7fe718373210) at ../talloc.c:865
    SSSD#16 talloc_named_const (context=<optimized out>, size=size@entry=55, name=name@entry=0x7fe70bb7015d "../common/ldb_pack.c:425") at ../talloc.c:1606
    SSSD#17 0x00007fe70bb61803 in ldb_unpack_data_only_attr_list (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=0x7fe7184aa1e0, list=list@entry=0x0, list_size=list_size@entry=0, nb_elements_in_db=nb_elements_in_db@entry=0x0) at ../common/ldb_pack.c:425
    SSSD#18 0x00007fe70bb61a7d in ldb_unpack_data (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=<optimized out>) at ../common/ldb_pack.c:470
    SSSD#19 0x00007fe6fdc29b46 in ltdb_parse_data_unpack (key=..., data=..., private_data=0x7ffcd5d2ba70) at ../ldb_tdb/ldb_search.c:249
    SSSD#20 0x00007fe70a5e0a24 in tdb_parse_data (tdb=tdb@entry=0x7fe70e4eaa10, key=..., offset=15619748, len=414772, parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/io.c:637
    SSSD#21 0x00007fe70a5dc1fc in tdb_parse_record (tdb=0x7fe70e4eaa10, key=..., parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/tdb.c:253
    SSSD#22 0x00007fe6fdc29e7b in ltdb_search_dn1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, msg=msg@entry=0x7fe7184aa1e0) at ../ldb_tdb/ldb_search.c:287
    SSSD#23 0x00007fe6fdc2acbb in ltdb_dn_list_load (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, list=list@entry=0x7fe7183c3a30) at ../ldb_tdb/ldb_index.c:181
    SSSD#24 0x00007fe6fdc2bbbb in ltdb_index_add1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", v_idx=v_idx@entry=0, el=<optimized out>, el=<optimized out>) at ../ldb_tdb/ldb_index.c:1134
    SSSD#25 0x00007fe6fdc2c62c in ltdb_index_add_el (el=0x7fe7184aa3e0, dn=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", module=0x7fe70e4eab50) at ../ldb_tdb/ldb_index.c:1180
    SSSD#26 ltdb_index_add_element (module=module@entry=0x7fe70e4eab50, dn=<optimized out>, el=el@entry=0x7fe7184aa3e0) at ../ldb_tdb/ldb_index.c:1290
    SSSD#27 0x00007fe6fdc290bb in ltdb_modify_internal (module=module@entry=0x7fe70e4eab50, msg=0x7fe7183bf0c0, req=req@entry=0x7fe7183bdc10) at ../ldb_tdb/ldb_tdb.c:903
    SSSD#28 0x00007fe6fdc2958a in ltdb_modify (ctx=0x7fe7183c2950, ctx=0x7fe7183c2950) at ../ldb_tdb/ldb_tdb.c:998
    SSSD#29 ltdb_callback (ev=<optimized out>, te=<optimized out>, t=..., private_data=<optimized out>) at ../ldb_tdb/ldb_tdb.c:1380
    SSSD#30 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4d2890) at ../tevent_timed.c:341
    SSSD#31 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4d2890, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#32 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4d2890, location=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent_standard.c:114
    SSSD#33 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4d2890, location=location@entry=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent.c:533
    SSSD#34 0x00007fe70bb6bc4f in ldb_wait (handle=0x7fe7183c4530, type=<optimized out>) at ../common/ldb.c:631
    SSSD#35 0x00007fe70bb6c793 in ldb_autotransaction_request (ldb=0x7fe70e4d52c0, req=0x7fe7183bdc10) at ../common/ldb.c:573
    SSSD#36 0x00007fe70bb6d263 in ldb_modify (ldb=ldb@entry=0x7fe70e4d52c0, message=<optimized out>) at ../common/ldb.c:1655
    SSSD#37 0x00007fe70bfa2ab5 in sysdb_set_cache_entry_attr (ldb=0x7fe70e4d52c0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bf680, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1159
    SSSD#38 0x00007fe70bfa304d in sysdb_rep_ts_entry_attr (sysdb=0x7fe70e4eadd0, attrs=0x7fe7183bf680, entry_dn=0x7fe7183c4760) at src/db/sysdb_ops.c:1218
    SSSD#39 sysdb_set_ts_entry_attr (sysdb=sysdb@entry=0x7fe70e4eadd0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1248
    SSSD#40 0x00007fe70bfa4aa9 in sysdb_set_entry_attr (sysdb=0x7fe70e4eadd0, entry_dn=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1199
    SSSD#41 0x00007fe70bfa4b5f in sysdb_set_user_attr (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1285
    SSSD#42 0x00007fe70bfa58c3 in sysdb_add_user (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", uid=uid@entry=1415408147, gid=<optimized out>, gid@entry=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:1928
    SSSD#43 0x00007fe70bfab271 in sysdb_store_new_user (now=1481105315, cache_timeout=5400, attrs=0x7fe7183bb840, orig_dn=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", shell=0x0, homedir=0x0, gecos=0x7fe710465d00 "Test User7045", gid=1415400513, uid=1415408147, name=0x7fe7183c01f0 "testuser7045@domain.com", domain=0x7fe70e4d62f0) at src/db/sysdb_ops.c:2549
    SSSD#44 sysdb_store_user (domain=domain@entry=0x7fe70e4d62f0, name=0x7fe7183c01f0 "testuser7045@domain.com", pwd=pwd@entry=0x0, uid=1415408147, gid=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, remove_attrs=0x7fe7183c08a0, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:2499
    SSSD#45 0x00007fe6fba0d9f9 in sdap_save_user (memctx=memctx@entry=0x7fe70e544ee0, opts=opts@entry=0x7fe70e518400, dom=dom@entry=0x7fe70e4d62f0, attrs=<optimized out>, _usn_value=_usn_value@entry=0x7ffcd5d2c260, now=now@entry=1481105315) at src/providers/ldap/sdap_async_users.c:509
    SSSD#46 0x00007fe6fba0df9a in sdap_save_users (memctx=memctx@entry=0x7fe70e544e40, sysdb=0x7fe70e4eadd0, dom=0x7fe70e4d62f0, opts=0x7fe70e518400, users=<optimized out>, num_users=10006, _usn_value=_usn_value@entry=0x7fe70e544e60) at src/providers/ldap/sdap_async_users.c:572
    SSSD#47 0x00007fe6fba0e460 in sdap_get_users_done (subreq=<optimized out>) at src/providers/ldap/sdap_async_users.c:938
    SSSD#48 0x00007fe6fba0c9d5 in sdap_search_user_process (subreq=0x0) at src/providers/ldap/sdap_async_users.c:814
    SSSD#49 0x00007fe6fba07379 in generic_ext_search_handler (subreq=0x0, opts=<optimized out>) at src/providers/ldap/sdap_async.c:1689
    SSSD#50 0x00007fe6fba0991b in sdap_get_generic_op_finished (op=<optimized out>, reply=<optimized out>, error=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:1621
    SSSD#51 0x00007fe6fba083cd in sdap_process_message (ev=<optimized out>, sh=<optimized out>, msg=0x7fe70e5f9ce0) at src/providers/ldap/sdap_async.c:353
    SSSD#52 sdap_process_result (ev=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:197
    SSSD#53 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4cbc30) at ../tevent_timed.c:341
    SSSD#54 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4cbc30, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#55 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:114
    SSSD#56 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4cbc30, location=location@entry=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:533
    SSSD#57 0x00007fe7086605ab in tevent_common_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:637
    SSSD#58 0x00007fe7086641f7 in std_event_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:140
    SSSD#59 0x00007fe70bfd1993 in server_loop (main_ctx=0x7fe70e4cd080) at src/util/server.c:702
    SSSD#60 0x00007fe70c84cb82 in main (argc=8, argv=<optimized out>) at src/providers/data_provider_be.c:587

Resolves:
https://fedorahosted.org/sssd/ticket/3266

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
fidencio added a commit to fidencio/sssd that referenced this pull request Jan 10, 2017
While debugging rhbz#1396912 a deadlock on sssd_be was noticed[0] and
it's been caused by the use of non async-signal-safe functions from the
signal_handler (please, see man 7 signal for more info about which are
the async-signal-safe functions that can be used).

In order to work this situation around a pipe has been added to the
watchdog_ctx structure and, in case of clock screw,  a single byte is
written to this pipe (which is an async-signal-safe operation) and the
logic currently done by the timer handler to reset the watchdog will be
done inside the fd handler in a safe way.

With this patch we ended up losing some debug messages as
orderly_shutdown() has been replaced by kill(-getpgrp(), SIGTERM) (or
_exit(1) considering the cases where setting up the process group during
the server_setup() has failed).
Personally I don't think is worth the trouble to try to log those messages
properly in this specific case.

It's really worth to mention that a proper fix the clock screw situation
should be implemented on samba's side, by having tevent using monotonic
(or boottime) clock.

[0]:
  [root@dusan ~]# pstack 17922
    #0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
    SSSD#1  0x00007fe707d04f93 in _L_lock_14932 () from /lib64/libc.so.6
    SSSD#2  0x00007fe707d02013 in __GI___libc_malloc (bytes=140630248638304, bytes@entry=15) at malloc.c:2891
    SSSD#3  0x00007fe707d0888a in __GI___strdup (s=0x7fe707dff4f7 "/etc/localtime") at strdup.c:42
    SSSD#4  0x00007fe707d31b61 in tzset_internal (always=<optimized out>, explicit=explicit@entry=1) at tzset.c:438
    SSSD#5  0x00007fe707d32523 in __tz_convert (timer=timer@entry=0x7ffcd5d2b090, use_localtime=use_localtime@entry=1, tp=tp@entry=0x7fe708041d40 <_tmbuf>) at tzset.c:621
    SSSD#6  0x00007fe707d30521 in __GI_localtime (t=t@entry=0x7ffcd5d2b090) at localtime.c:42
    SSSD#7  0x00007fe70886c7b0 in sss_vdebug_fn (file=<optimized out>, line=<optimized out>, function=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=16, flags=flags@entry=0, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n", ap=ap@entry=0x7ffcd5d2b130) at src/util/debug.c:248
    SSSD#8  0x00007fe70886c995 in sss_debug_fn (file=file@entry=0x7fe70bff263b "src/util/util_watchdog.c", line=line@entry=82, function=function@entry=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=level@entry=16, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n") at src/util/debug.c:284
    SSSD#9  0x00007fe70bfdb409 in watchdog_handler (sig=<optimized out>) at src/util/util_watchdog.c:81
    SSSD#10 <signal handler called>
    SSSD#11 0x00007fe707cff664 in _int_malloc (av=av@entry=0x7fe70803c760 <main_arena>, bytes=bytes@entry=151) at malloc.c:3494
    SSSD#12 0x00007fe707d01fbc in __GI___libc_malloc (bytes=bytes@entry=151) at malloc.c:2893
    SSSD#13 0x00007fe708450749 in __talloc_with_prefix (prefix_len=0, size=55, context=0x7fe718373210) at ../talloc.c:668
    SSSD#14 __talloc (size=55, context=0x7fe718373210) at ../talloc.c:708
    SSSD#15 _talloc_named_const (name=0x7fe70bb7015d "../common/ldb_pack.c:425", size=55, context=0x7fe718373210) at ../talloc.c:865
    SSSD#16 talloc_named_const (context=<optimized out>, size=size@entry=55, name=name@entry=0x7fe70bb7015d "../common/ldb_pack.c:425") at ../talloc.c:1606
    SSSD#17 0x00007fe70bb61803 in ldb_unpack_data_only_attr_list (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=0x7fe7184aa1e0, list=list@entry=0x0, list_size=list_size@entry=0, nb_elements_in_db=nb_elements_in_db@entry=0x0) at ../common/ldb_pack.c:425
    SSSD#18 0x00007fe70bb61a7d in ldb_unpack_data (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=<optimized out>) at ../common/ldb_pack.c:470
    SSSD#19 0x00007fe6fdc29b46 in ltdb_parse_data_unpack (key=..., data=..., private_data=0x7ffcd5d2ba70) at ../ldb_tdb/ldb_search.c:249
    SSSD#20 0x00007fe70a5e0a24 in tdb_parse_data (tdb=tdb@entry=0x7fe70e4eaa10, key=..., offset=15619748, len=414772, parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/io.c:637
    SSSD#21 0x00007fe70a5dc1fc in tdb_parse_record (tdb=0x7fe70e4eaa10, key=..., parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/tdb.c:253
    SSSD#22 0x00007fe6fdc29e7b in ltdb_search_dn1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, msg=msg@entry=0x7fe7184aa1e0) at ../ldb_tdb/ldb_search.c:287
    SSSD#23 0x00007fe6fdc2acbb in ltdb_dn_list_load (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, list=list@entry=0x7fe7183c3a30) at ../ldb_tdb/ldb_index.c:181
    SSSD#24 0x00007fe6fdc2bbbb in ltdb_index_add1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", v_idx=v_idx@entry=0, el=<optimized out>, el=<optimized out>) at ../ldb_tdb/ldb_index.c:1134
    SSSD#25 0x00007fe6fdc2c62c in ltdb_index_add_el (el=0x7fe7184aa3e0, dn=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", module=0x7fe70e4eab50) at ../ldb_tdb/ldb_index.c:1180
    SSSD#26 ltdb_index_add_element (module=module@entry=0x7fe70e4eab50, dn=<optimized out>, el=el@entry=0x7fe7184aa3e0) at ../ldb_tdb/ldb_index.c:1290
    SSSD#27 0x00007fe6fdc290bb in ltdb_modify_internal (module=module@entry=0x7fe70e4eab50, msg=0x7fe7183bf0c0, req=req@entry=0x7fe7183bdc10) at ../ldb_tdb/ldb_tdb.c:903
    SSSD#28 0x00007fe6fdc2958a in ltdb_modify (ctx=0x7fe7183c2950, ctx=0x7fe7183c2950) at ../ldb_tdb/ldb_tdb.c:998
    SSSD#29 ltdb_callback (ev=<optimized out>, te=<optimized out>, t=..., private_data=<optimized out>) at ../ldb_tdb/ldb_tdb.c:1380
    SSSD#30 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4d2890) at ../tevent_timed.c:341
    SSSD#31 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4d2890, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#32 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4d2890, location=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent_standard.c:114
    SSSD#33 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4d2890, location=location@entry=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent.c:533
    SSSD#34 0x00007fe70bb6bc4f in ldb_wait (handle=0x7fe7183c4530, type=<optimized out>) at ../common/ldb.c:631
    SSSD#35 0x00007fe70bb6c793 in ldb_autotransaction_request (ldb=0x7fe70e4d52c0, req=0x7fe7183bdc10) at ../common/ldb.c:573
    SSSD#36 0x00007fe70bb6d263 in ldb_modify (ldb=ldb@entry=0x7fe70e4d52c0, message=<optimized out>) at ../common/ldb.c:1655
    SSSD#37 0x00007fe70bfa2ab5 in sysdb_set_cache_entry_attr (ldb=0x7fe70e4d52c0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bf680, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1159
    SSSD#38 0x00007fe70bfa304d in sysdb_rep_ts_entry_attr (sysdb=0x7fe70e4eadd0, attrs=0x7fe7183bf680, entry_dn=0x7fe7183c4760) at src/db/sysdb_ops.c:1218
    SSSD#39 sysdb_set_ts_entry_attr (sysdb=sysdb@entry=0x7fe70e4eadd0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1248
    SSSD#40 0x00007fe70bfa4aa9 in sysdb_set_entry_attr (sysdb=0x7fe70e4eadd0, entry_dn=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1199
    SSSD#41 0x00007fe70bfa4b5f in sysdb_set_user_attr (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1285
    SSSD#42 0x00007fe70bfa58c3 in sysdb_add_user (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", uid=uid@entry=1415408147, gid=<optimized out>, gid@entry=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:1928
    SSSD#43 0x00007fe70bfab271 in sysdb_store_new_user (now=1481105315, cache_timeout=5400, attrs=0x7fe7183bb840, orig_dn=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", shell=0x0, homedir=0x0, gecos=0x7fe710465d00 "Test User7045", gid=1415400513, uid=1415408147, name=0x7fe7183c01f0 "testuser7045@domain.com", domain=0x7fe70e4d62f0) at src/db/sysdb_ops.c:2549
    SSSD#44 sysdb_store_user (domain=domain@entry=0x7fe70e4d62f0, name=0x7fe7183c01f0 "testuser7045@domain.com", pwd=pwd@entry=0x0, uid=1415408147, gid=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, remove_attrs=0x7fe7183c08a0, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:2499
    SSSD#45 0x00007fe6fba0d9f9 in sdap_save_user (memctx=memctx@entry=0x7fe70e544ee0, opts=opts@entry=0x7fe70e518400, dom=dom@entry=0x7fe70e4d62f0, attrs=<optimized out>, _usn_value=_usn_value@entry=0x7ffcd5d2c260, now=now@entry=1481105315) at src/providers/ldap/sdap_async_users.c:509
    SSSD#46 0x00007fe6fba0df9a in sdap_save_users (memctx=memctx@entry=0x7fe70e544e40, sysdb=0x7fe70e4eadd0, dom=0x7fe70e4d62f0, opts=0x7fe70e518400, users=<optimized out>, num_users=10006, _usn_value=_usn_value@entry=0x7fe70e544e60) at src/providers/ldap/sdap_async_users.c:572
    SSSD#47 0x00007fe6fba0e460 in sdap_get_users_done (subreq=<optimized out>) at src/providers/ldap/sdap_async_users.c:938
    SSSD#48 0x00007fe6fba0c9d5 in sdap_search_user_process (subreq=0x0) at src/providers/ldap/sdap_async_users.c:814
    SSSD#49 0x00007fe6fba07379 in generic_ext_search_handler (subreq=0x0, opts=<optimized out>) at src/providers/ldap/sdap_async.c:1689
    SSSD#50 0x00007fe6fba0991b in sdap_get_generic_op_finished (op=<optimized out>, reply=<optimized out>, error=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:1621
    SSSD#51 0x00007fe6fba083cd in sdap_process_message (ev=<optimized out>, sh=<optimized out>, msg=0x7fe70e5f9ce0) at src/providers/ldap/sdap_async.c:353
    SSSD#52 sdap_process_result (ev=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:197
    SSSD#53 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4cbc30) at ../tevent_timed.c:341
    SSSD#54 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4cbc30, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#55 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:114
    SSSD#56 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4cbc30, location=location@entry=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:533
    SSSD#57 0x00007fe7086605ab in tevent_common_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:637
    SSSD#58 0x00007fe7086641f7 in std_event_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:140
    SSSD#59 0x00007fe70bfd1993 in server_loop (main_ctx=0x7fe70e4cd080) at src/util/server.c:702
    SSSD#60 0x00007fe70c84cb82 in main (argc=8, argv=<optimized out>) at src/providers/data_provider_be.c:587

Resolves:
https://fedorahosted.org/sssd/ticket/3266

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
jhrozek pushed a commit that referenced this pull request Jan 25, 2017
While debugging rhbz#1396912 a deadlock on sssd_be was noticed[0] and
it's been caused by the use of non async-signal-safe functions from the
signal_handler (please, see man 7 signal for more info about which are
the async-signal-safe functions that can be used).

In order to work this situation around a pipe has been added to the
watchdog_ctx structure and, in case of clock screw,  a single byte is
written to this pipe (which is an async-signal-safe operation) and the
logic currently done by the timer handler to reset the watchdog will be
done inside the fd handler in a safe way.

With this patch we ended up losing some debug messages as
orderly_shutdown() has been replaced by kill(-getpgrp(), SIGTERM) (or
_exit(1) considering the cases where setting up the process group during
the server_setup() has failed).
Personally I don't think is worth the trouble to try to log those messages
properly in this specific case.

It's really worth to mention that a proper fix the clock screw situation
should be implemented on samba's side, by having tevent using monotonic
(or boottime) clock.

[0]:
  [root@dusan ~]# pstack 17922
    #0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
    #1  0x00007fe707d04f93 in _L_lock_14932 () from /lib64/libc.so.6
    #2  0x00007fe707d02013 in __GI___libc_malloc (bytes=140630248638304, bytes@entry=15) at malloc.c:2891
    #3  0x00007fe707d0888a in __GI___strdup (s=0x7fe707dff4f7 "/etc/localtime") at strdup.c:42
    #4  0x00007fe707d31b61 in tzset_internal (always=<optimized out>, explicit=explicit@entry=1) at tzset.c:438
    #5  0x00007fe707d32523 in __tz_convert (timer=timer@entry=0x7ffcd5d2b090, use_localtime=use_localtime@entry=1, tp=tp@entry=0x7fe708041d40 <_tmbuf>) at tzset.c:621
    #6  0x00007fe707d30521 in __GI_localtime (t=t@entry=0x7ffcd5d2b090) at localtime.c:42
    #7  0x00007fe70886c7b0 in sss_vdebug_fn (file=<optimized out>, line=<optimized out>, function=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=16, flags=flags@entry=0, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n", ap=ap@entry=0x7ffcd5d2b130) at src/util/debug.c:248
    #8  0x00007fe70886c995 in sss_debug_fn (file=file@entry=0x7fe70bff263b "src/util/util_watchdog.c", line=line@entry=82, function=function@entry=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=level@entry=16, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n") at src/util/debug.c:284
    #9  0x00007fe70bfdb409 in watchdog_handler (sig=<optimized out>) at src/util/util_watchdog.c:81
    #10 <signal handler called>
    #11 0x00007fe707cff664 in _int_malloc (av=av@entry=0x7fe70803c760 <main_arena>, bytes=bytes@entry=151) at malloc.c:3494
    #12 0x00007fe707d01fbc in __GI___libc_malloc (bytes=bytes@entry=151) at malloc.c:2893
    #13 0x00007fe708450749 in __talloc_with_prefix (prefix_len=0, size=55, context=0x7fe718373210) at ../talloc.c:668
    #14 __talloc (size=55, context=0x7fe718373210) at ../talloc.c:708
    #15 _talloc_named_const (name=0x7fe70bb7015d "../common/ldb_pack.c:425", size=55, context=0x7fe718373210) at ../talloc.c:865
    #16 talloc_named_const (context=<optimized out>, size=size@entry=55, name=name@entry=0x7fe70bb7015d "../common/ldb_pack.c:425") at ../talloc.c:1606
    #17 0x00007fe70bb61803 in ldb_unpack_data_only_attr_list (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=0x7fe7184aa1e0, list=list@entry=0x0, list_size=list_size@entry=0, nb_elements_in_db=nb_elements_in_db@entry=0x0) at ../common/ldb_pack.c:425
    #18 0x00007fe70bb61a7d in ldb_unpack_data (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=<optimized out>) at ../common/ldb_pack.c:470
    #19 0x00007fe6fdc29b46 in ltdb_parse_data_unpack (key=..., data=..., private_data=0x7ffcd5d2ba70) at ../ldb_tdb/ldb_search.c:249
    #20 0x00007fe70a5e0a24 in tdb_parse_data (tdb=tdb@entry=0x7fe70e4eaa10, key=..., offset=15619748, len=414772, parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/io.c:637
    #21 0x00007fe70a5dc1fc in tdb_parse_record (tdb=0x7fe70e4eaa10, key=..., parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/tdb.c:253
    #22 0x00007fe6fdc29e7b in ltdb_search_dn1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, msg=msg@entry=0x7fe7184aa1e0) at ../ldb_tdb/ldb_search.c:287
    #23 0x00007fe6fdc2acbb in ltdb_dn_list_load (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, list=list@entry=0x7fe7183c3a30) at ../ldb_tdb/ldb_index.c:181
    #24 0x00007fe6fdc2bbbb in ltdb_index_add1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", v_idx=v_idx@entry=0, el=<optimized out>, el=<optimized out>) at ../ldb_tdb/ldb_index.c:1134
    #25 0x00007fe6fdc2c62c in ltdb_index_add_el (el=0x7fe7184aa3e0, dn=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", module=0x7fe70e4eab50) at ../ldb_tdb/ldb_index.c:1180
    #26 ltdb_index_add_element (module=module@entry=0x7fe70e4eab50, dn=<optimized out>, el=el@entry=0x7fe7184aa3e0) at ../ldb_tdb/ldb_index.c:1290
    #27 0x00007fe6fdc290bb in ltdb_modify_internal (module=module@entry=0x7fe70e4eab50, msg=0x7fe7183bf0c0, req=req@entry=0x7fe7183bdc10) at ../ldb_tdb/ldb_tdb.c:903
    #28 0x00007fe6fdc2958a in ltdb_modify (ctx=0x7fe7183c2950, ctx=0x7fe7183c2950) at ../ldb_tdb/ldb_tdb.c:998
    #29 ltdb_callback (ev=<optimized out>, te=<optimized out>, t=..., private_data=<optimized out>) at ../ldb_tdb/ldb_tdb.c:1380
    #30 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4d2890) at ../tevent_timed.c:341
    #31 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4d2890, location=<optimized out>) at ../tevent_epoll.c:911
    #32 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4d2890, location=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent_standard.c:114
    #33 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4d2890, location=location@entry=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent.c:533
    #34 0x00007fe70bb6bc4f in ldb_wait (handle=0x7fe7183c4530, type=<optimized out>) at ../common/ldb.c:631
    #35 0x00007fe70bb6c793 in ldb_autotransaction_request (ldb=0x7fe70e4d52c0, req=0x7fe7183bdc10) at ../common/ldb.c:573
    #36 0x00007fe70bb6d263 in ldb_modify (ldb=ldb@entry=0x7fe70e4d52c0, message=<optimized out>) at ../common/ldb.c:1655
    #37 0x00007fe70bfa2ab5 in sysdb_set_cache_entry_attr (ldb=0x7fe70e4d52c0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bf680, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1159
    #38 0x00007fe70bfa304d in sysdb_rep_ts_entry_attr (sysdb=0x7fe70e4eadd0, attrs=0x7fe7183bf680, entry_dn=0x7fe7183c4760) at src/db/sysdb_ops.c:1218
    #39 sysdb_set_ts_entry_attr (sysdb=sysdb@entry=0x7fe70e4eadd0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1248
    #40 0x00007fe70bfa4aa9 in sysdb_set_entry_attr (sysdb=0x7fe70e4eadd0, entry_dn=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1199
    #41 0x00007fe70bfa4b5f in sysdb_set_user_attr (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1285
    #42 0x00007fe70bfa58c3 in sysdb_add_user (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", uid=uid@entry=1415408147, gid=<optimized out>, gid@entry=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:1928
    #43 0x00007fe70bfab271 in sysdb_store_new_user (now=1481105315, cache_timeout=5400, attrs=0x7fe7183bb840, orig_dn=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", shell=0x0, homedir=0x0, gecos=0x7fe710465d00 "Test User7045", gid=1415400513, uid=1415408147, name=0x7fe7183c01f0 "testuser7045@domain.com", domain=0x7fe70e4d62f0) at src/db/sysdb_ops.c:2549
    #44 sysdb_store_user (domain=domain@entry=0x7fe70e4d62f0, name=0x7fe7183c01f0 "testuser7045@domain.com", pwd=pwd@entry=0x0, uid=1415408147, gid=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, remove_attrs=0x7fe7183c08a0, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:2499
    #45 0x00007fe6fba0d9f9 in sdap_save_user (memctx=memctx@entry=0x7fe70e544ee0, opts=opts@entry=0x7fe70e518400, dom=dom@entry=0x7fe70e4d62f0, attrs=<optimized out>, _usn_value=_usn_value@entry=0x7ffcd5d2c260, now=now@entry=1481105315) at src/providers/ldap/sdap_async_users.c:509
    #46 0x00007fe6fba0df9a in sdap_save_users (memctx=memctx@entry=0x7fe70e544e40, sysdb=0x7fe70e4eadd0, dom=0x7fe70e4d62f0, opts=0x7fe70e518400, users=<optimized out>, num_users=10006, _usn_value=_usn_value@entry=0x7fe70e544e60) at src/providers/ldap/sdap_async_users.c:572
    #47 0x00007fe6fba0e460 in sdap_get_users_done (subreq=<optimized out>) at src/providers/ldap/sdap_async_users.c:938
    #48 0x00007fe6fba0c9d5 in sdap_search_user_process (subreq=0x0) at src/providers/ldap/sdap_async_users.c:814
    #49 0x00007fe6fba07379 in generic_ext_search_handler (subreq=0x0, opts=<optimized out>) at src/providers/ldap/sdap_async.c:1689
    #50 0x00007fe6fba0991b in sdap_get_generic_op_finished (op=<optimized out>, reply=<optimized out>, error=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:1621
    #51 0x00007fe6fba083cd in sdap_process_message (ev=<optimized out>, sh=<optimized out>, msg=0x7fe70e5f9ce0) at src/providers/ldap/sdap_async.c:353
    #52 sdap_process_result (ev=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:197
    #53 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4cbc30) at ../tevent_timed.c:341
    #54 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4cbc30, location=<optimized out>) at ../tevent_epoll.c:911
    #55 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:114
    #56 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4cbc30, location=location@entry=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:533
    #57 0x00007fe7086605ab in tevent_common_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:637
    #58 0x00007fe7086641f7 in std_event_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:140
    #59 0x00007fe70bfd1993 in server_loop (main_ctx=0x7fe70e4cd080) at src/util/server.c:702
    #60 0x00007fe70c84cb82 in main (argc=8, argv=<optimized out>) at src/providers/data_provider_be.c:587

Resolves:
https://fedorahosted.org/sssd/ticket/3266

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Pavel Březina <pbrezina@redhat.com>
Reviewed-by: Jakub Hrozek <jhrozek@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
jhrozek pushed a commit that referenced this pull request Jan 25, 2017
While debugging rhbz#1396912 a deadlock on sssd_be was noticed[0] and
it's been caused by the use of non async-signal-safe functions from the
signal_handler (please, see man 7 signal for more info about which are
the async-signal-safe functions that can be used).

In order to work this situation around a pipe has been added to the
watchdog_ctx structure and, in case of clock screw,  a single byte is
written to this pipe (which is an async-signal-safe operation) and the
logic currently done by the timer handler to reset the watchdog will be
done inside the fd handler in a safe way.

With this patch we ended up losing some debug messages as
orderly_shutdown() has been replaced by kill(-getpgrp(), SIGTERM) (or
_exit(1) considering the cases where setting up the process group during
the server_setup() has failed).
Personally I don't think is worth the trouble to try to log those messages
properly in this specific case.

It's really worth to mention that a proper fix the clock screw situation
should be implemented on samba's side, by having tevent using monotonic
(or boottime) clock.

[0]:
  [root@dusan ~]# pstack 17922
    #0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
    #1  0x00007fe707d04f93 in _L_lock_14932 () from /lib64/libc.so.6
    #2  0x00007fe707d02013 in __GI___libc_malloc (bytes=140630248638304, bytes@entry=15) at malloc.c:2891
    #3  0x00007fe707d0888a in __GI___strdup (s=0x7fe707dff4f7 "/etc/localtime") at strdup.c:42
    #4  0x00007fe707d31b61 in tzset_internal (always=<optimized out>, explicit=explicit@entry=1) at tzset.c:438
    #5  0x00007fe707d32523 in __tz_convert (timer=timer@entry=0x7ffcd5d2b090, use_localtime=use_localtime@entry=1, tp=tp@entry=0x7fe708041d40 <_tmbuf>) at tzset.c:621
    #6  0x00007fe707d30521 in __GI_localtime (t=t@entry=0x7ffcd5d2b090) at localtime.c:42
    #7  0x00007fe70886c7b0 in sss_vdebug_fn (file=<optimized out>, line=<optimized out>, function=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=16, flags=flags@entry=0, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n", ap=ap@entry=0x7ffcd5d2b130) at src/util/debug.c:248
    #8  0x00007fe70886c995 in sss_debug_fn (file=file@entry=0x7fe70bff263b "src/util/util_watchdog.c", line=line@entry=82, function=function@entry=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=level@entry=16, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n") at src/util/debug.c:284
    #9  0x00007fe70bfdb409 in watchdog_handler (sig=<optimized out>) at src/util/util_watchdog.c:81
    #10 <signal handler called>
    #11 0x00007fe707cff664 in _int_malloc (av=av@entry=0x7fe70803c760 <main_arena>, bytes=bytes@entry=151) at malloc.c:3494
    #12 0x00007fe707d01fbc in __GI___libc_malloc (bytes=bytes@entry=151) at malloc.c:2893
    #13 0x00007fe708450749 in __talloc_with_prefix (prefix_len=0, size=55, context=0x7fe718373210) at ../talloc.c:668
    #14 __talloc (size=55, context=0x7fe718373210) at ../talloc.c:708
    #15 _talloc_named_const (name=0x7fe70bb7015d "../common/ldb_pack.c:425", size=55, context=0x7fe718373210) at ../talloc.c:865
    #16 talloc_named_const (context=<optimized out>, size=size@entry=55, name=name@entry=0x7fe70bb7015d "../common/ldb_pack.c:425") at ../talloc.c:1606
    #17 0x00007fe70bb61803 in ldb_unpack_data_only_attr_list (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=0x7fe7184aa1e0, list=list@entry=0x0, list_size=list_size@entry=0, nb_elements_in_db=nb_elements_in_db@entry=0x0) at ../common/ldb_pack.c:425
    #18 0x00007fe70bb61a7d in ldb_unpack_data (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=<optimized out>) at ../common/ldb_pack.c:470
    #19 0x00007fe6fdc29b46 in ltdb_parse_data_unpack (key=..., data=..., private_data=0x7ffcd5d2ba70) at ../ldb_tdb/ldb_search.c:249
    #20 0x00007fe70a5e0a24 in tdb_parse_data (tdb=tdb@entry=0x7fe70e4eaa10, key=..., offset=15619748, len=414772, parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/io.c:637
    #21 0x00007fe70a5dc1fc in tdb_parse_record (tdb=0x7fe70e4eaa10, key=..., parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/tdb.c:253
    #22 0x00007fe6fdc29e7b in ltdb_search_dn1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, msg=msg@entry=0x7fe7184aa1e0) at ../ldb_tdb/ldb_search.c:287
    #23 0x00007fe6fdc2acbb in ltdb_dn_list_load (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, list=list@entry=0x7fe7183c3a30) at ../ldb_tdb/ldb_index.c:181
    #24 0x00007fe6fdc2bbbb in ltdb_index_add1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", v_idx=v_idx@entry=0, el=<optimized out>, el=<optimized out>) at ../ldb_tdb/ldb_index.c:1134
    #25 0x00007fe6fdc2c62c in ltdb_index_add_el (el=0x7fe7184aa3e0, dn=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", module=0x7fe70e4eab50) at ../ldb_tdb/ldb_index.c:1180
    #26 ltdb_index_add_element (module=module@entry=0x7fe70e4eab50, dn=<optimized out>, el=el@entry=0x7fe7184aa3e0) at ../ldb_tdb/ldb_index.c:1290
    #27 0x00007fe6fdc290bb in ltdb_modify_internal (module=module@entry=0x7fe70e4eab50, msg=0x7fe7183bf0c0, req=req@entry=0x7fe7183bdc10) at ../ldb_tdb/ldb_tdb.c:903
    #28 0x00007fe6fdc2958a in ltdb_modify (ctx=0x7fe7183c2950, ctx=0x7fe7183c2950) at ../ldb_tdb/ldb_tdb.c:998
    #29 ltdb_callback (ev=<optimized out>, te=<optimized out>, t=..., private_data=<optimized out>) at ../ldb_tdb/ldb_tdb.c:1380
    #30 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4d2890) at ../tevent_timed.c:341
    #31 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4d2890, location=<optimized out>) at ../tevent_epoll.c:911
    #32 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4d2890, location=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent_standard.c:114
    #33 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4d2890, location=location@entry=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent.c:533
    #34 0x00007fe70bb6bc4f in ldb_wait (handle=0x7fe7183c4530, type=<optimized out>) at ../common/ldb.c:631
    #35 0x00007fe70bb6c793 in ldb_autotransaction_request (ldb=0x7fe70e4d52c0, req=0x7fe7183bdc10) at ../common/ldb.c:573
    #36 0x00007fe70bb6d263 in ldb_modify (ldb=ldb@entry=0x7fe70e4d52c0, message=<optimized out>) at ../common/ldb.c:1655
    #37 0x00007fe70bfa2ab5 in sysdb_set_cache_entry_attr (ldb=0x7fe70e4d52c0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bf680, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1159
    #38 0x00007fe70bfa304d in sysdb_rep_ts_entry_attr (sysdb=0x7fe70e4eadd0, attrs=0x7fe7183bf680, entry_dn=0x7fe7183c4760) at src/db/sysdb_ops.c:1218
    #39 sysdb_set_ts_entry_attr (sysdb=sysdb@entry=0x7fe70e4eadd0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1248
    #40 0x00007fe70bfa4aa9 in sysdb_set_entry_attr (sysdb=0x7fe70e4eadd0, entry_dn=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1199
    #41 0x00007fe70bfa4b5f in sysdb_set_user_attr (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1285
    #42 0x00007fe70bfa58c3 in sysdb_add_user (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", uid=uid@entry=1415408147, gid=<optimized out>, gid@entry=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:1928
    #43 0x00007fe70bfab271 in sysdb_store_new_user (now=1481105315, cache_timeout=5400, attrs=0x7fe7183bb840, orig_dn=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", shell=0x0, homedir=0x0, gecos=0x7fe710465d00 "Test User7045", gid=1415400513, uid=1415408147, name=0x7fe7183c01f0 "testuser7045@domain.com", domain=0x7fe70e4d62f0) at src/db/sysdb_ops.c:2549
    #44 sysdb_store_user (domain=domain@entry=0x7fe70e4d62f0, name=0x7fe7183c01f0 "testuser7045@domain.com", pwd=pwd@entry=0x0, uid=1415408147, gid=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, remove_attrs=0x7fe7183c08a0, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:2499
    #45 0x00007fe6fba0d9f9 in sdap_save_user (memctx=memctx@entry=0x7fe70e544ee0, opts=opts@entry=0x7fe70e518400, dom=dom@entry=0x7fe70e4d62f0, attrs=<optimized out>, _usn_value=_usn_value@entry=0x7ffcd5d2c260, now=now@entry=1481105315) at src/providers/ldap/sdap_async_users.c:509
    #46 0x00007fe6fba0df9a in sdap_save_users (memctx=memctx@entry=0x7fe70e544e40, sysdb=0x7fe70e4eadd0, dom=0x7fe70e4d62f0, opts=0x7fe70e518400, users=<optimized out>, num_users=10006, _usn_value=_usn_value@entry=0x7fe70e544e60) at src/providers/ldap/sdap_async_users.c:572
    #47 0x00007fe6fba0e460 in sdap_get_users_done (subreq=<optimized out>) at src/providers/ldap/sdap_async_users.c:938
    #48 0x00007fe6fba0c9d5 in sdap_search_user_process (subreq=0x0) at src/providers/ldap/sdap_async_users.c:814
    #49 0x00007fe6fba07379 in generic_ext_search_handler (subreq=0x0, opts=<optimized out>) at src/providers/ldap/sdap_async.c:1689
    #50 0x00007fe6fba0991b in sdap_get_generic_op_finished (op=<optimized out>, reply=<optimized out>, error=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:1621
    #51 0x00007fe6fba083cd in sdap_process_message (ev=<optimized out>, sh=<optimized out>, msg=0x7fe70e5f9ce0) at src/providers/ldap/sdap_async.c:353
    #52 sdap_process_result (ev=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:197
    #53 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4cbc30) at ../tevent_timed.c:341
    #54 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4cbc30, location=<optimized out>) at ../tevent_epoll.c:911
    #55 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:114
    #56 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4cbc30, location=location@entry=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:533
    #57 0x00007fe7086605ab in tevent_common_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:637
    #58 0x00007fe7086641f7 in std_event_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:140
    #59 0x00007fe70bfd1993 in server_loop (main_ctx=0x7fe70e4cd080) at src/util/server.c:702
    #60 0x00007fe70c84cb82 in main (argc=8, argv=<optimized out>) at src/providers/data_provider_be.c:587

Resolves:
https://fedorahosted.org/sssd/ticket/3266

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Pavel Březina <pbrezina@redhat.com>
Reviewed-by: Jakub Hrozek <jhrozek@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
(cherry picked from commit e6a5f8c)
jhrozek pushed a commit to jhrozek/sssd that referenced this pull request Jul 20, 2017
While debugging rhbz#1396912 a deadlock on sssd_be was noticed[0] and
it's been caused by the use of non async-signal-safe functions from the
signal_handler (please, see man 7 signal for more info about which are
the async-signal-safe functions that can be used).

In order to work this situation around a pipe has been added to the
watchdog_ctx structure and, in case of clock screw,  a single byte is
written to this pipe (which is an async-signal-safe operation) and the
logic currently done by the timer handler to reset the watchdog will be
done inside the fd handler in a safe way.

With this patch we ended up losing some debug messages as
orderly_shutdown() has been replaced by kill(-getpgrp(), SIGTERM) (or
_exit(1) considering the cases where setting up the process group during
the server_setup() has failed).
Personally I don't think is worth the trouble to try to log those messages
properly in this specific case.

It's really worth to mention that a proper fix the clock screw situation
should be implemented on samba's side, by having tevent using monotonic
(or boottime) clock.

[0]:
  [root@dusan ~]# pstack 17922
    #0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
    #1  0x00007fe707d04f93 in _L_lock_14932 () from /lib64/libc.so.6
    #2  0x00007fe707d02013 in __GI___libc_malloc (bytes=140630248638304, bytes@entry=15) at malloc.c:2891
    #3  0x00007fe707d0888a in __GI___strdup (s=0x7fe707dff4f7 "/etc/localtime") at strdup.c:42
    #4  0x00007fe707d31b61 in tzset_internal (always=<optimized out>, explicit=explicit@entry=1) at tzset.c:438
    #5  0x00007fe707d32523 in __tz_convert (timer=timer@entry=0x7ffcd5d2b090, use_localtime=use_localtime@entry=1, tp=tp@entry=0x7fe708041d40 <_tmbuf>) at tzset.c:621
    #6  0x00007fe707d30521 in __GI_localtime (t=t@entry=0x7ffcd5d2b090) at localtime.c:42
    SSSD#7  0x00007fe70886c7b0 in sss_vdebug_fn (file=<optimized out>, line=<optimized out>, function=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=16, flags=flags@entry=0, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n", ap=ap@entry=0x7ffcd5d2b130) at src/util/debug.c:248
    SSSD#8  0x00007fe70886c995 in sss_debug_fn (file=file@entry=0x7fe70bff263b "src/util/util_watchdog.c", line=line@entry=82, function=function@entry=0x7fe70bff27f0 <__FUNCTION__.9379> "watchdog_handler", level=level@entry=16, format=format@entry=0x7fe70bff2760 "Watchdog timer overflow, killing process!\n") at src/util/debug.c:284
    SSSD#9  0x00007fe70bfdb409 in watchdog_handler (sig=<optimized out>) at src/util/util_watchdog.c:81
    SSSD#10 <signal handler called>
    SSSD#11 0x00007fe707cff664 in _int_malloc (av=av@entry=0x7fe70803c760 <main_arena>, bytes=bytes@entry=151) at malloc.c:3494
    SSSD#12 0x00007fe707d01fbc in __GI___libc_malloc (bytes=bytes@entry=151) at malloc.c:2893
    SSSD#13 0x00007fe708450749 in __talloc_with_prefix (prefix_len=0, size=55, context=0x7fe718373210) at ../talloc.c:668
    SSSD#14 __talloc (size=55, context=0x7fe718373210) at ../talloc.c:708
    SSSD#15 _talloc_named_const (name=0x7fe70bb7015d "../common/ldb_pack.c:425", size=55, context=0x7fe718373210) at ../talloc.c:865
    SSSD#16 talloc_named_const (context=<optimized out>, size=size@entry=55, name=name@entry=0x7fe70bb7015d "../common/ldb_pack.c:425") at ../talloc.c:1606
    SSSD#17 0x00007fe70bb61803 in ldb_unpack_data_only_attr_list (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=0x7fe7184aa1e0, list=list@entry=0x0, list_size=list_size@entry=0, nb_elements_in_db=nb_elements_in_db@entry=0x0) at ../common/ldb_pack.c:425
    SSSD#18 0x00007fe70bb61a7d in ldb_unpack_data (ldb=ldb@entry=0x7fe70e4d52c0, data=data@entry=0x7ffcd5d2b990, message=<optimized out>) at ../common/ldb_pack.c:470
    SSSD#19 0x00007fe6fdc29b46 in ltdb_parse_data_unpack (key=..., data=..., private_data=0x7ffcd5d2ba70) at ../ldb_tdb/ldb_search.c:249
    SSSD#20 0x00007fe70a5e0a24 in tdb_parse_data (tdb=tdb@entry=0x7fe70e4eaa10, key=..., offset=15619748, len=414772, parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/io.c:637
    SSSD#21 0x00007fe70a5dc1fc in tdb_parse_record (tdb=0x7fe70e4eaa10, key=..., parser=parser@entry=0x7fe6fdc29b10 <ltdb_parse_data_unpack>, private_data=private_data@entry=0x7ffcd5d2ba70) at ../common/tdb.c:253
    SSSD#22 0x00007fe6fdc29e7b in ltdb_search_dn1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, msg=msg@entry=0x7fe7184aa1e0) at ../ldb_tdb/ldb_search.c:287
    SSSD#23 0x00007fe6fdc2acbb in ltdb_dn_list_load (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183c4940, list=list@entry=0x7fe7183c3a30) at ../ldb_tdb/ldb_index.c:181
    SSSD#24 0x00007fe6fdc2bbbb in ltdb_index_add1 (module=module@entry=0x7fe70e4eab50, dn=dn@entry=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", v_idx=v_idx@entry=0, el=<optimized out>, el=<optimized out>) at ../ldb_tdb/ldb_index.c:1134
    SSSD#25 0x00007fe6fdc2c62c in ltdb_index_add_el (el=0x7fe7184aa3e0, dn=0x7fe7183bf3e0 "name=testuser7045@domain.com,cn=users,cn=DOMAIN.COM,cn=sysdb", module=0x7fe70e4eab50) at ../ldb_tdb/ldb_index.c:1180
    SSSD#26 ltdb_index_add_element (module=module@entry=0x7fe70e4eab50, dn=<optimized out>, el=el@entry=0x7fe7184aa3e0) at ../ldb_tdb/ldb_index.c:1290
    SSSD#27 0x00007fe6fdc290bb in ltdb_modify_internal (module=module@entry=0x7fe70e4eab50, msg=0x7fe7183bf0c0, req=req@entry=0x7fe7183bdc10) at ../ldb_tdb/ldb_tdb.c:903
    SSSD#28 0x00007fe6fdc2958a in ltdb_modify (ctx=0x7fe7183c2950, ctx=0x7fe7183c2950) at ../ldb_tdb/ldb_tdb.c:998
    SSSD#29 ltdb_callback (ev=<optimized out>, te=<optimized out>, t=..., private_data=<optimized out>) at ../ldb_tdb/ldb_tdb.c:1380
    SSSD#30 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4d2890) at ../tevent_timed.c:341
    SSSD#31 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4d2890, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#32 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4d2890, location=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent_standard.c:114
    SSSD#33 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4d2890, location=location@entry=0x7fe70bb72ec5 "../common/ldb.c:631") at ../tevent.c:533
    SSSD#34 0x00007fe70bb6bc4f in ldb_wait (handle=0x7fe7183c4530, type=<optimized out>) at ../common/ldb.c:631
    SSSD#35 0x00007fe70bb6c793 in ldb_autotransaction_request (ldb=0x7fe70e4d52c0, req=0x7fe7183bdc10) at ../common/ldb.c:573
    SSSD#36 0x00007fe70bb6d263 in ldb_modify (ldb=ldb@entry=0x7fe70e4d52c0, message=<optimized out>) at ../common/ldb.c:1655
    SSSD#37 0x00007fe70bfa2ab5 in sysdb_set_cache_entry_attr (ldb=0x7fe70e4d52c0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bf680, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1159
    SSSD#38 0x00007fe70bfa304d in sysdb_rep_ts_entry_attr (sysdb=0x7fe70e4eadd0, attrs=0x7fe7183bf680, entry_dn=0x7fe7183c4760) at src/db/sysdb_ops.c:1218
    SSSD#39 sysdb_set_ts_entry_attr (sysdb=sysdb@entry=0x7fe70e4eadd0, entry_dn=entry_dn@entry=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1248
    SSSD#40 0x00007fe70bfa4aa9 in sysdb_set_entry_attr (sysdb=0x7fe70e4eadd0, entry_dn=0x7fe7183c4760, attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1199
    SSSD#41 0x00007fe70bfa4b5f in sysdb_set_user_attr (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", attrs=attrs@entry=0x7fe7183bb840, mod_op=mod_op@entry=2) at src/db/sysdb_ops.c:1285
    SSSD#42 0x00007fe70bfa58c3 in sysdb_add_user (domain=domain@entry=0x7fe70e4d62f0, name=name@entry=0x7fe7183c01f0 "testuser7045@domain.com", uid=uid@entry=1415408147, gid=<optimized out>, gid@entry=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:1928
    SSSD#43 0x00007fe70bfab271 in sysdb_store_new_user (now=1481105315, cache_timeout=5400, attrs=0x7fe7183bb840, orig_dn=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", shell=0x0, homedir=0x0, gecos=0x7fe710465d00 "Test User7045", gid=1415400513, uid=1415408147, name=0x7fe7183c01f0 "testuser7045@domain.com", domain=0x7fe70e4d62f0) at src/db/sysdb_ops.c:2549
    SSSD#44 sysdb_store_user (domain=domain@entry=0x7fe70e4d62f0, name=0x7fe7183c01f0 "testuser7045@domain.com", pwd=pwd@entry=0x0, uid=1415408147, gid=1415400513, gecos=gecos@entry=0x7fe710465d00 "Test User7045", homedir=homedir@entry=0x0, shell=shell@entry=0x0, orig_dn=orig_dn@entry=0x7fe710465940 "CN=Test User7045,OU=Sales,DC=DOMAIN,DC=COM", attrs=attrs@entry=0x7fe7183bb840, remove_attrs=0x7fe7183c08a0, cache_timeout=cache_timeout@entry=5400, now=now@entry=1481105315) at src/db/sysdb_ops.c:2499
    SSSD#45 0x00007fe6fba0d9f9 in sdap_save_user (memctx=memctx@entry=0x7fe70e544ee0, opts=opts@entry=0x7fe70e518400, dom=dom@entry=0x7fe70e4d62f0, attrs=<optimized out>, _usn_value=_usn_value@entry=0x7ffcd5d2c260, now=now@entry=1481105315) at src/providers/ldap/sdap_async_users.c:509
    SSSD#46 0x00007fe6fba0df9a in sdap_save_users (memctx=memctx@entry=0x7fe70e544e40, sysdb=0x7fe70e4eadd0, dom=0x7fe70e4d62f0, opts=0x7fe70e518400, users=<optimized out>, num_users=10006, _usn_value=_usn_value@entry=0x7fe70e544e60) at src/providers/ldap/sdap_async_users.c:572
    SSSD#47 0x00007fe6fba0e460 in sdap_get_users_done (subreq=<optimized out>) at src/providers/ldap/sdap_async_users.c:938
    SSSD#48 0x00007fe6fba0c9d5 in sdap_search_user_process (subreq=0x0) at src/providers/ldap/sdap_async_users.c:814
    SSSD#49 0x00007fe6fba07379 in generic_ext_search_handler (subreq=0x0, opts=<optimized out>) at src/providers/ldap/sdap_async.c:1689
    SSSD#50 0x00007fe6fba0991b in sdap_get_generic_op_finished (op=<optimized out>, reply=<optimized out>, error=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:1621
    SSSD#51 0x00007fe6fba083cd in sdap_process_message (ev=<optimized out>, sh=<optimized out>, msg=0x7fe70e5f9ce0) at src/providers/ldap/sdap_async.c:353
    SSSD#52 sdap_process_result (ev=<optimized out>, pvt=<optimized out>) at src/providers/ldap/sdap_async.c:197
    SSSD#53 0x00007fe708664b4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7fe70e4cbc30) at ../tevent_timed.c:341
    SSSD#54 0x00007fe708665b5a in epoll_event_loop_once (ev=0x7fe70e4cbc30, location=<optimized out>) at ../tevent_epoll.c:911
    SSSD#55 0x00007fe708664257 in std_event_loop_once (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:114
    SSSD#56 0x00007fe70866040d in _tevent_loop_once (ev=ev@entry=0x7fe70e4cbc30, location=location@entry=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:533
    SSSD#57 0x00007fe7086605ab in tevent_common_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent.c:637
    SSSD#58 0x00007fe7086641f7 in std_event_loop_wait (ev=0x7fe70e4cbc30, location=0x7fe70bfee8e7 "src/util/server.c:702") at ../tevent_standard.c:140
    SSSD#59 0x00007fe70bfd1993 in server_loop (main_ctx=0x7fe70e4cd080) at src/util/server.c:702
    SSSD#60 0x00007fe70c84cb82 in main (argc=8, argv=<optimized out>) at src/providers/data_provider_be.c:587

Resolves:
https://fedorahosted.org/sssd/ticket/3266

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Pavel Březina <pbrezina@redhat.com>
Reviewed-by: Jakub Hrozek <jhrozek@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
(cherry picked from commit e6a5f8c)
(cherry picked from commit 0606a71)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants