Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deadlock on cos cache rebuild #2138

Closed
389-ds-bot opened this issue Sep 13, 2020 · 11 comments
Closed

deadlock on cos cache rebuild #2138

389-ds-bot opened this issue Sep 13, 2020 · 11 comments
Labels
closed: fixed Migration flag - Issue
Milestone

Comments

@389-ds-bot
Copy link

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/49079


The problem is easily reproducible using replication (see test case) but may also exists on standalone instance.

There is a thread responsible to rebuild cos cache when it is notified that a cos definition was updated. To do so this thread, holding the notification lock, will rebuild the cache reading the definitions (on backend) trying to acquire db page.

A thread that updates a cos definition will notify the first thread. The update acquire db page and tries to acquire notification lock.

This is a scenario of deadlock because of locks taken in the opposite order.

@389-ds-bot 389-ds-bot added the closed: fixed Migration flag - Issue label Sep 13, 2020
@389-ds-bot 389-ds-bot added this to the 1.3.6 backlog milestone Sep 13, 2020
@389-ds-bot
Copy link
Author

Comment from tbordaz (@tbordaz) at 2017-01-06 21:05:02

attachment
ticket49079_test.py

@389-ds-bot
Copy link
Author

Comment from tbordaz (@tbordaz) at 2017-01-06 21:07:56

Signature of the deadlock is

Thread 41 (Thread 0x7fa6dce71700 (LWP 4933)):
0  pthread_cond_wait@@GLIBC_2.3.2 (...)
1  0x00007fa6e89802f3 in __db_pthread_mutex_condwait (...)
2  __db_hybrid_mutex_suspend (...)
3  0x00007fa6e897f640 in __db_tas_mutex_lock_int (...)
4  __db_tas_mutex_lock (...)
5  0x00007fa6e8a29d0a in __lock_get_internal (...)
6  0x00007fa6e8a2a7f0 in __lock_get (...)
7  0x00007fa6e8a56132 in __db_lget (...)
8  0x00007fa6e899d5e5 in __bam_search (...)
9  0x00007fa6e8988256 in __bamc_search (...)
10 0x00007fa6e8989d0f in __bamc_get (...)
11 0x00007fa6e8a42c76 in __dbc_iget (...)
12 0x00007fa6e8a4344d in __dbc_get (...)
13 0x00007fa6e8a4f863 in __db_get (...)
14 0x00007fa6e8a53143 in __db_get_pp (...)
15 0x00007fa6e3d11b6b in id2entry (...)
16 0x00007fa6e3d4d2dd in ldbm_back_next_search_entry_ext (...)
17 0x00007fa6f179782b in iterate (...)
18 0x00007fa6f1797daa in send_results_ext (...)
19 0x00007fa6f1799811 in op_shared_search (...)
20 0x00007fa6f17a9ede in search_internal_callback_pb (...)
21 0x00007fa6f17aa459 in slapi_search_internal_callback_pb (...)
22 0x00007fa6e6c4da14 in cos_cache_add_dn_tmpls (...)
23 cos_cache_add_defn (...)
24 cos_dn_defs_cb (...)
25 0x00007fa6f17bb49d in send_ldap_search_entry_ext (...)
26 0x00007fa6f17bbd2c in send_ldap_search_entry (...)
27 0x00007fa6f1797c59 in iterate (...)
28 0x00007fa6f1797daa in send_results_ext (...)
29 0x00007fa6f1799811 in op_shared_search (...)
30 0x00007fa6f17a9ede in search_internal_callback_pb (...)
31 0x00007fa6f17aa459 in slapi_search_internal_callback_pb (...)
32 0x00007fa6e6c4eec0 in cos_cache_add_dn_defs (...)
33 cos_cache_build_definition_list (...)
34 cos_cache_create (...)
35 0x00007fa6e6c4f70f in cos_cache_wait_on_change (...)
36 0x00007fa6ef97f96b in _pt_root (...)
37 0x00007fa6ef31fdc5 in start_thread (...)
38 0x00007fa6ef04d1cd in clone (...)

Thread 20 (Thread 0x7fa6c0ff9700 (LWP 4955)):
0  __lll_lock_wait (...)
1  0x00007fa6ef321d38 in _L_lock_975 () from /lib64/libpthread.so.0
2  0x00007fa6ef321ce1 in __GI___pthread_mutex_lock (...)
3  0x00007fa6ef979e39 in PR_Lock (...)
4  0x00007fa6f17c7f7a in slapi_lock_mutex (...)
5  0x00007fa6e6c4ff04 in cos_cache_change_notify (...)
6  0x00007fa6e6c4abfd in cos_post_op (...)
7  0x00007fa6f17a4d38 in plugin_call_func (...)
8  0x00007fa6f17a4fc3 in plugin_call_list (...)
9  plugin_call_plugins (...)
10 0x00007fa6f1792575 in op_shared_modify (...)
11 0x00007fa6f1793014 in modify_internal_pb (...)
12 0x00007fa6f1793b13 in slapi_modify_internal_pb (...)
13 0x00007fa6e3a96926 in urp_fixup_modify_entry (...)
14 0x00007fa6e3a96a62 in mod_namingconflict_attr (...)
15 0x00007fa6e3a974ea in urp_annotate_dn (...)
16 0x00007fa6e3a977fb in urp_add_operation (...)
17 0x00007fa6e3a7d2e8 in multimaster_bepreop_add (...)
18 0x00007fa6f17a4d38 in plugin_call_func (...)
19 0x00007fa6f17a4fc3 in plugin_call_list (...)
20 plugin_call_plugins (...)
21 0x00007fa6e3d297b1 in ldbm_back_add (...)
22 0x00007fa6f1748ee8 in op_shared_add (...)
23 0x00007fa6f174a1b0 in do_add (...)
24 0x00007fa6f1c7e9b3 in connection_dispatch_operation (...)
25 connection_threadmain (...)
26 0x00007fa6ef97f96b in _pt_root (...)
27 0x00007fa6ef31fdc5 in start_thread (...)
28 0x00007fa6ef04d1cd in clone (...)

@389-ds-bot
Copy link
Author

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-01-11 06:06:14

A small note, is cos_cache_create_unloc()? meant to be cos_cache_create_unlock()?

Otherwise, I can understand the fix and how it works, it seems reasonable. I'm wondering if there is a cleaner way to achieve this with reordering of the backend lock compared to the change_lock, or with monitors.

Regardless, I think this solution will work,

@389-ds-bot
Copy link
Author

@389-ds-bot
Copy link
Author

Comment from tbordaz (@tbordaz) at 2017-01-11 18:42:36

Thanks William and Ludwig for the reviews

'''git push origin master'''
Counting objects: 7, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 1.90 KiB | 0 bytes/s, done.
Total 7 (delta 4), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
9feca58..ac44337 master -> master

commit ac44337
Author: Thierry Bordaz tbordaz@redhat.com
Date: Tue Jan 10 14:32:53 2017 +0100

@389-ds-bot
Copy link
Author

Comment from nhosoi (@nhosoi) at 2017-01-26 05:14:26

Pushed to 389-ds-base-1.3.5:
79a3dea..3ac12cb 389-ds-base-1.3.5 -> 389-ds-base-1.3.5
commit 3ac12cb

@389-ds-bot
Copy link
Author

Comment from tbordaz (@tbordaz) at 2017-02-11 22:49:40

Metadata Update from @tbordaz:

  • Issue assigned to tbordaz
  • Issue set to the milestone: 1.3.6 backlog

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-03-13 04:13:38

Hey Thierry, is this issue now resolved? Would you like to close it?

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2017-03-13 04:13:47

Metadata Update from @Firstyear:

  • Issue close_status updated to: None

@389-ds-bot
Copy link
Author

Comment from tbordaz (@tbordaz) at 2017-03-29 15:35:07

Metadata Update from @tbordaz:

  • Issue close_status updated to: fixed
  • Issue status updated to: Closed (was: Open)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed: fixed Migration flag - Issue
Projects
None yet
Development

No branches or pull requests

1 participant