Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix frequent segmentation fault of Python tests by refactoring ExecutorService #12427

Merged

Conversation

BewareMyPower
Copy link
Contributor

Fixes #11635

Motivation

Python tests failed very often with segmentation fault. It can be reproduced easily based on the apachepulsar/pulsar-build:ubuntu-16.04-pb3 image in local env. After dumping the core file and running gdb python core to debug, we can see the following backtrace.

#0  0x00007f655e5e8d0d in std::__atomic_base<long>::load (__m=std::memory_order_seq_cst, this=0xb8) at /usr/include/c++/5/bits/atomic_base.h:396
#1  std::__atomic_base<long>::operator long (this=0xb8) at /usr/include/c++/5/bits/atomic_base.h:259
#2  0x00007f655e5e6348 in boost::asio::detail::task_io_service::run (this=0x0, ec=...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:136
#3  0x00007f655e5e6cc4 in boost::asio::io_service::run (this=0x1f11990) at /usr/include/boost/asio/impl/io_service.ipp:59
#4  0x00007f655e5e32d0 in pulsar::ExecutorService::startWorker (this=0x1f11130, io_service=std::shared_ptr (count 2, weak 0) 0x1e8ba00) at /pulsar/pulsar-client-cpp/lib/ExecutorService.cc:37

The io_service object is not null, but the internal impl_ field, whose type is a task_io_service pointer, is null (see (this=0x0 of #2).

The cause is io_service::run is called in a dependent thread, while the destructor of ExecutorService is called in another thread. After the ExecutorService is destructed, the internal thread field worker_ and the io_service field will both be invalid to access. io_service::run should not be called after that.

Modifications

Refactor the ExecutorService. Since it's not copyable, it's redundant to store io_service and io_service::work as smart pointers. In addition, the thread that runs io_service is never used, just detach it when ExecutorService is created.

The key point is to check whether ExecutorService#close is called in the thread by examining the atomic boolean value closed_. Besides, the shared pointer is captured in the thread so that the ExecutorService's lifetime could be extended.

@BewareMyPower
Copy link
Contributor Author

Though it has been verified in my local env for many times, I'm still looking forward to what will happen in CI. Please don't rerun the cpp tests if it failed.

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job @BewareMyPower !

@BewareMyPower
Copy link
Contributor Author

PeriodicTaskTest failed before running Python tests. And there're some compilation errors on CentOS 7. I'll fix them soon.

@BewareMyPower BewareMyPower added this to the 2.10.0 milestone Oct 20, 2021
@merlimat merlimat merged commit af0ea69 into apache:master Oct 21, 2021
@BewareMyPower BewareMyPower deleted the bewaremypower/fix-py-tests-segfault branch October 21, 2021 07:01
codelipenghui pushed a commit that referenced this pull request Oct 21, 2021
…orService (#12427)

* Refactor ExecutorService to avoid usage of pointers

* Remove unnecessary friend class

* Fix CentOS 7 build

* Fix PeriodicalTest

(cherry picked from commit af0ea69)
@codelipenghui codelipenghui added the cherry-picked/branch-2.8 Archived: 2.8 is end of life label Oct 21, 2021
eolivelli pushed a commit to eolivelli/pulsar that referenced this pull request Nov 29, 2021
…orService (apache#12427)

* Refactor ExecutorService to avoid usage of pointers

* Remove unnecessary friend class

* Fix CentOS 7 build

* Fix PeriodicalTest
codelipenghui pushed a commit that referenced this pull request Dec 20, 2021
…orService (#12427)

* Refactor ExecutorService to avoid usage of pointers

* Remove unnecessary friend class

* Fix CentOS 7 build

* Fix PeriodicalTest

(cherry picked from commit af0ea69)
@codelipenghui codelipenghui added the cherry-picked/branch-2.9 Archived: 2.9 is end of life label Dec 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-picked/branch-2.8 Archived: 2.8 is end of life cherry-picked/branch-2.9 Archived: 2.9 is end of life doc-not-needed Your PR changes do not impact docs release/2.8.2 release/2.9.2 type/flaky-tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Tests] "python pulsar_test.py" failed with segmentation fault and core dump
5 participants