mgr: fix deadlock in ActivePyModules::get_osdmap() #38762

Merged
1 commit merged on Jan 15, 2021

peng-jiaqi
Contributor

@peng-jiaqi peng-jiaqi commented Jan 5, 2021

"ActivePyModules::get_osdmap()" does not read or write the "ActivePyModules" object, so it is safe to stop taking "ActivePyModules::lock" in that function. Dropping the lock also keeps other threads from waiting on "ActivePyModules::lock" while "get_osdmap()" runs.

Fixes: https://tracker.ceph.com/issues/48852

Signed-off-by: peng jiaqi <peng.jiaqi@zte.com.cn>

@tchaikov
Contributor

tchaikov commented Jan 5, 2021

@peng-jiaqi i think this issue has been addressed by 0601b31. could you give it a try?

@peng-jiaqi
Contributor Author

I think my issue and 0601b31 are not the same. In my issue, Thread 39 held lock "Objecter::rwlock" in function "Objecter::handle_osd_map()" and waited on lock "ActivePyModules::lock" in function "ActivePyModules::notify_all()". Thread 30 held lock "ActivePyModules::lock" in function "ActivePyModules::get_osdmap()" and waited on lock "Objecter::rwlock" in function "with_osdmap()". Each thread was waiting for the lock the other one held, so Thread 39 and Thread 30 deadlocked. In 0601b31, the deadlock is caused by the GIL. @tchaikov
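
The inversion described in this comment can be sketched as a small lock-order graph. This is only an illustration, not Ceph code: `LockOrderGraph`, `before_fix()`, `after_fix()` and `check_lock_order()` are hypothetical names, and the lock names are plain strings taken from the comment. Each code path is recorded as the ordered list of locks it takes, and a cycle in the resulting "held, then acquired" graph means two paths take the same locks in opposite orders.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Records "held -> acquired next" edges between lock names. A cycle means
// two code paths can each hold one lock while waiting for the other.
class LockOrderGraph {
public:
  // Record one code path as the ordered list of locks it acquires.
  void add_path(const std::vector<std::string>& locks) {
    for (std::size_t i = 0; i < locks.size(); ++i)
      for (std::size_t j = i + 1; j < locks.size(); ++j)
        edges_[locks[i]].insert(locks[j]);
  }

  // True if any lock can be reached again by following the edges.
  bool has_cycle() const {
    for (const auto& entry : edges_) {
      std::set<std::string> seen;
      if (reaches(entry.first, entry.first, seen)) return true;
    }
    return false;
  }

private:
  // Depth-first search from `from`, looking for an edge back to `target`.
  bool reaches(const std::string& from, const std::string& target,
               std::set<std::string>& seen) const {
    auto it = edges_.find(from);
    if (it == edges_.end()) return false;
    for (const auto& next : it->second) {
      if (next == target) return true;
      if (seen.insert(next).second && reaches(next, target, seen)) return true;
    }
    return false;
  }

  std::map<std::string, std::set<std::string>> edges_;
};

// Lock order of the two threads as described in the comment.
inline LockOrderGraph before_fix() {
  LockOrderGraph g;
  // Thread 39: handle_osd_map() holds rwlock, notify_all() wants the module lock.
  g.add_path({"Objecter::rwlock", "ActivePyModules::lock"});
  // Thread 30: get_osdmap() holds the module lock, with_osdmap() wants rwlock.
  g.add_path({"ActivePyModules::lock", "Objecter::rwlock"});
  return g;
}

// Same paths once get_osdmap() no longer takes ActivePyModules::lock.
inline LockOrderGraph after_fix() {
  LockOrderGraph g;
  g.add_path({"Objecter::rwlock", "ActivePyModules::lock"});
  g.add_path({"Objecter::rwlock"});
  return g;
}

// Usage: the cycle exists before the change and is gone after it.
inline void check_lock_order() {
  assert(before_fix().has_cycle());
  assert(!after_fix().has_cycle());
}
```

With only one ordering left (rwlock first, module lock second), neither thread can end up holding the lock the other one needs.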

@tchaikov
Contributor

tchaikov commented Jan 6, 2021

@peng-jiaqi thanks for the explanation! it makes sense to me. could you please

  1. create a tracker ticket at https://tracker.ceph.com so we can backport this fix,
  2. add the backtrace in the tracker ticket as part of its description,
  3. outline the deadlock in the commit message instead of dumping the backtrace, and
  4. connect this fix to the tracker ticket in the commit message using the notation of Fixes: https://tracker.ceph.com/issues/<id>

?

also, you might want to note down the reason why it's safe to drop the lock in the commit message as well. my guess is that objecter already protects its internal state with the shared lock.
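
The guess above, that the objecter guards its own state with a shared lock, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual Ceph implementation: `FakeOSDMap`, `FakeObjecter`, `get_osdmap_copy()` and `check_copy()` are hypothetical stand-ins, and the real map is reduced to a single epoch field. The point is that a caller which touches none of its own members can read the map through the callback without holding any lock of its own.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <utility>

// Hypothetical stand-in for the OSD map; only an epoch is modelled.
struct FakeOSDMap {
  unsigned epoch = 0;
};

// Hypothetical stand-in for the objecter: every access to `osdmap` goes
// through `rwlock`, so callers need no extra lock to read it.
class FakeObjecter {
public:
  // Readers: run `cb` on the map while holding the lock in shared mode.
  template <typename Callback>
  auto with_osdmap(Callback&& cb) const {
    std::shared_lock<std::shared_timed_mutex> l(rwlock);
    return std::forward<Callback>(cb)(osdmap);
  }

  // Writer: install a newer epoch while holding the lock exclusively.
  void handle_osd_map(unsigned new_epoch) {
    std::unique_lock<std::shared_timed_mutex> l(rwlock);
    osdmap.epoch = new_epoch;
  }

private:
  // shared_timed_mutex (rather than shared_mutex) keeps this C++14-friendly.
  mutable std::shared_timed_mutex rwlock;
  FakeOSDMap osdmap;
};

// The copy is made under the objecter's shared lock; the caller holds no
// lock of its own because it reads none of its own state.
inline FakeOSDMap get_osdmap_copy(const FakeObjecter& objecter) {
  return objecter.with_osdmap([](const FakeOSDMap& m) { return m; });
}

// Usage: a reader sees the value installed by the most recent writer.
inline void check_copy() {
  FakeObjecter objecter;
  assert(get_osdmap_copy(objecter).epoch == 0);
  objecter.handle_osd_map(7);
  assert(get_osdmap_copy(objecter).epoch == 7);
}
```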

@tchaikov
Contributor

@peng-jiaqi ping?

"ActivePyModules::get_osdmap()" does not read or write the
"ActivePyModules" object, so it is safe to stop taking
"ActivePyModules::lock" in that function. Dropping the lock also keeps
other threads from waiting on "ActivePyModules::lock".

Fixes: https://tracker.ceph.com/issues/48852

Signed-off-by: peng jiaqi <peng.jiaqi@zte.com.cn>
peng-jiaqi changed the title from "mgr: not use ActivePyModules lock in ActivePyModules::get_osdmap()" to "mgr: fix deadlock in ActivePyModules::get_osdmap()" on Jan 15, 2021
@peng-jiaqi
Contributor Author

I'm back. @tchaikov.

Fixes: https://tracker.ceph.com/issues/48852

"ActivePyModules::get_osdmap()" does not read or write the "ActivePyModules" object, so it is safe to stop taking "ActivePyModules::lock" in that function. Dropping the lock also keeps other threads from waiting on "ActivePyModules::lock" while "get_osdmap()" runs.

@tchaikov
Contributor

@peng-jiaqi thank you! i added nautilus and octopus to the backport field of the ticket you created.
