2.6.0 cpp client crash #7327

Closed
firefeifei opened this issue Jun 21, 2020 · 14 comments
Labels
area/client type/bug The PR fixed a bug or issue reported a bug

Comments

@firefeifei

Describe the bug

Calling this API crashes the C++ client:

pulsar_client_create_producer(m_client, topic.c_str(), producer_conf, &producer)

Desktop (please complete the following information):

  • OS: Linux kernel 4.14.105

Additional context

(gdb) bt
#0 0x00007f8307f4c377 in raise () from /lib64/libc.so.6
#1 0x00007f8307f4da68 in abort () from /lib64/libc.so.6
#2 0x0000000000c7c515 in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x0000000000c27896 in __cxxabiv1::__terminate (handler=) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:38
#4 0x0000000000c278c3 in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
#5 0x0000000000c6e775 in std::(anonymous namespace)::execute_native_thread_routine (__p=) at ../../../../../libstdc++-v3/src/c++11/thread.cc:92
#6 0x00007f8308d00ea5 in start_thread () from /lib64/libpthread.so.0
#7 0x00007f83080148cd in clone () from /lib64/libc.so.6

@firefeifei firefeifei added the type/bug The PR fixed a bug or issue reported a bug label Jun 21, 2020
@sijie
Member

sijie commented Jun 23, 2020

@firefeifei Can you share the code example so that we can reproduce?

@firefeifei
Author

If the configured IP and port point to a Pulsar cluster that is not running, the client crashes.
If the Pulsar service at the configured IP and port is running normally, there is no crash.
@sijie

@merlimat
Contributor

@firefeifei Please provide a simple, complete code example that reproduces the issue. That would make it far easier to understand what's happening.

@firefeifei
Author

m_conf = pulsar_client_configuration_create();
pulsar_client_configuration_set_logger(m_conf, PulsarProducer::pulsar_logger, (void*)this);
m_client = pulsar_client_create(service_url.c_str(), m_conf);
pulsar_producer_configuration_t* producer_conf = pulsar_producer_configuration_create();
pulsar_producer_configuration_set_batching_enabled(producer_conf, 0);
pulsar_producer_configuration_set_max_pending_messages(producer_conf, 1000000);
pulsar_result err = pulsar_client_create_producer(m_client, topic.c_str(), producer_conf, &producer);
if (err != pulsar_result_Ok) {
    pulsar_producer_configuration_free(producer_conf);
    return NULL;
}

service_url: 127.0.0.1:5678

@merlimat
Contributor

Have you tried removing the custom logger?

@firefeifei
Author

You can test this code without the custom logger.

@BewareMyPower
Contributor

I cannot reproduce the problem, assuming your code is like:

#include <iostream>
#include <string>
#include <pulsar/c/client.h>

int main(int argc, char* argv[]) {
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " service-url" << std::endl;
        return 1;
    }
    std::string topic = "Foo";
    std::string service_url = argv[1];
    auto m_conf = pulsar_client_configuration_create();
    auto m_client = pulsar_client_create(service_url.c_str(), m_conf);
    auto producer_conf = pulsar_producer_configuration_create();
    pulsar_producer_configuration_set_batching_enabled(producer_conf, 0);
    pulsar_producer_configuration_set_max_pending_messages(producer_conf, 10000);
    pulsar_producer_t* producer = NULL;
    auto err = pulsar_client_create_producer(m_client, topic.c_str(), producer_conf, &producer);
    if (err != pulsar_result_Ok) {
        pulsar_producer_configuration_free(producer_conf);
        return 1;
    }
    return 0;
}

The running result:

$ ./examples/SampleProducer pulsar://127.0.0.1:5678
2020-07-04 02:32:31.475 INFO  [140369736358592] ConnectionPool:85 | Created connection for pulsar://127.0.0.1:5678
2020-07-04 02:32:31.477 ERROR [140369636267776] ClientConnection:385 | [<none> -> pulsar://127.0.0.1:5678] Failed to establish connection: Connection refused
2020-07-04 02:32:31.477 INFO  [140369636267776] ClientConnection:1372 | [<none> -> pulsar://127.0.0.1:5678] Connection closed
2020-07-04 02:32:31.477 ERROR [140369636267776] ClientImpl:180 | Error Checking/Getting Partition Metadata while creating producer on persistent://public/default/Foo -- ConnectError
2020-07-04 02:32:31.477 INFO  [140369636267776] ClientConnection:235 | [<none> -> pulsar://127.0.0.1:5678] Destroyed connection

Could you provide your log? In addition, could you debug with a newer GCC version? Your backtrace lost some debug info because of a libstdc++ bug; see "How we discovered why C++ exceptions disappear in stack trace".

@firefeifei
Author

firefeifei commented Nov 2, 2020

#0  0x0000000000c3ba64 in boost::asio::detail::task_io_service::wake_one_thread_and_unlock(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&) ()
#1  0x0000000000e5902e in boost::asio::detail::task_io_service::init_task (this=0x7f75200022c0) at /usr/local/include/boost/asio/detail/impl/task_io_service.ipp:130
#2  0x0000000000e590e0 in init_task (this=<optimized out>) at /usr/local/include/boost/asio/detail/impl/epoll_reactor.ipp:145
#3  reactive_socket_service_base (io_service=..., this=0x7f7520002b08) at /usr/local/include/boost/asio/detail/impl/reactive_socket_service_base.ipp:34
#4  reactive_socket_service (io_service=..., this=0x7f7520002b08) at /usr/local/include/boost/asio/detail/reactive_socket_service.hpp:77
#5  stream_socket_service (io_service=..., this=0x7f7520002ae0) at /usr/local/include/boost/asio/stream_socket_service.hpp:91
#6  boost::asio::detail::service_registry::create<boost::asio::stream_socket_service<boost::asio::ip::tcp> > (owner=...) at /usr/local/include/boost/asio/detail/impl/service_registry.hpp:81
#7  0x0000000000c3b467 in boost::asio::detail::service_registry::do_use_service(boost::asio::io_service::service::key const&, boost::asio::io_service::service* (*)(boost::asio::io_service&)) ()
#8  0x0000000000e576a9 in use_service<boost::asio::stream_socket_service<boost::asio::ip::tcp> > (this=<optimized out>) at /usr/local/include/boost/asio/detail/impl/service_registry.hpp:48
#9  use_service<boost::asio::stream_socket_service<boost::asio::ip::tcp> > (ios=...) at /usr/local/include/boost/asio/impl/io_service.hpp:33
#10 basic_io_object (io_service=..., this=0x7f7520002ab0) at /usr/local/include/boost/asio/basic_io_object.hpp:183
#11 basic_socket (io_service=..., this=0x7f7520002ab0) at /usr/local/include/boost/asio/basic_socket.hpp:71
#12 basic_stream_socket (io_service=..., this=0x7f7520002ab0) at /usr/local/include/boost/asio/basic_stream_socket.hpp:73
#13 pulsar::ExecutorService::createSocket (this=<optimized out>) at /data/commlibsrc/pulsar-2.6.0/pulsar-client-cpp/lib/ExecutorService.cc:48
#14 0x0000000000f18d2c in pulsar::ClientConnection::ClientConnection (this=0x7f7520004070, logicalAddress="pulsar://pulsar-broker.service.ft:6650", physicalAddress="pulsar://pulsar-broker.service.ft:6650", executor=std::shared_ptr (count 36865104, weak -1) 0x7f753d278f30, 
    clientConfiguration=..., authentication=std::shared_ptr (count 3, weak 0) 0x2328430) at /data/commlibsrc/pulsar-2.6.0/pulsar-client-cpp/lib/ClientConnection.cc:169
#15 0x0000000000e43564 in pulsar::ConnectionPool::getConnectionAsync (this=0x2328520, logicalAddress="pulsar://pulsar-broker.service.ft:6650", physicalAddress="pulsar://pulsar-broker.service.ft:6650") at /data/commlibsrc/pulsar-2.6.0/pulsar-client-cpp/lib/ConnectionPool.cc:83
#16 0x0000000000f0b840 in pulsar::BinaryProtoLookupService::getPartitionMetadataAsync (this=0x2328748, topicName=...) at /data/commlibsrc/pulsar-2.6.0/pulsar-client-cpp/lib/BinaryProtoLookupService.cc:72
#17 0x0000000000e35ab5 in pulsar::ClientImpl::createProducerAsync(std::string const&, pulsar::ProducerConfiguration, std::function<void (pulsar::Result, pulsar::Producer)>) (this=<optimized out>, topic="persistent://public/data/test_pulsar_ds591", conf=..., callback=...)
    at /data/commlibsrc/pulsar-2.6.0/pulsar-client-cpp/lib/ClientImpl.cc:158
#18 0x0000000000e30775 in pulsar::Client::createProducerAsync(std::string const&, pulsar::ProducerConfiguration, std::function<void (pulsar::Result, pulsar::Producer)>) (this=this@entry=0x23281e0, topic="persistent://public/data/test_pulsar_ds591", conf=..., callback=...)
    at /data/commlibsrc/pulsar-2.6.0/pulsar-client-cpp/lib/Client.cc:65
#19 0x0000000000e30ee0 in pulsar::Client::createProducer (this=0x23281e0, topic="persistent://public/data/test_pulsar_ds591", conf=..., producer=...) at /data/commlibsrc/pulsar-2.6.0/pulsar-client-cpp/lib/Client.cc:53

@firefeifei
Author

firefeifei commented Nov 2, 2020

Hello @sijie,
While debugging basic functionality, connecting to Pulsar produces a core dump.

@BewareMyPower
Contributor

From the stack we can see the segmentation fault was caused by boost::asio::ip::tcp::socket's constructor. I found some similar issues:

Could you check which Boost version libpulsar.so depends on by running ldd libpulsar.so?

@firefeifei
Author

Boost 1.54.0.
We link against the static library.

@BewareMyPower
Contributor

I've reproduced it with boost 1.54.

#include <string>
#include <pulsar/c/client.h>

int main(int argc, char* argv[]) {
    std::string service_url = "pulsar://localhost:65531";
    auto m_conf = pulsar_client_configuration_create();
    auto m_client = pulsar_client_create(service_url.c_str(), m_conf);
    auto producer_conf = pulsar_producer_configuration_create();
    pulsar_producer_t* producer = NULL;
    auto err = pulsar_client_create_producer(m_client, "my-topic", producer_conf, &producer);
    if (err != pulsar_result_Ok) {
        pulsar_producer_configuration_free(producer_conf);
        return 1;
    }
    return 0;
}

The output:

2020-11-06 21:46:16.585 INFO  [140320113425088] ConnectionPool:85 | Created connection for pulsar://localhost:65531
2020-11-06 21:46:16.589 INFO  [140320018294528] ClientConnection:235 | [<none> -> pulsar://localhost:65531] Destroyed connection
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >'
  what():  remote_endpoint: Transport endpoint is not connected
Aborted (core dumped)

I'm not sure whether this is the same issue, or whether it happens only because the Boost version is too old. I'll look into it later.

@BewareMyPower
Contributor

Please try the following diff and check again:

diff --git a/pulsar-client-cpp/lib/ClientConnection.cc b/pulsar-client-cpp/lib/ClientConnection.cc
index 22cc420..824d54a 100644
--- a/pulsar-client-cpp/lib/ClientConnection.cc
+++ b/pulsar-client-cpp/lib/ClientConnection.cc
@@ -340,9 +340,15 @@ void ClientConnection::handleTcpConnected(const boost::system::error_code& err,
                                           tcp::resolver::iterator endpointIterator) {
     if (!err) {
         std::stringstream cnxStringStream;
-        cnxStringStream << "[" << socket_->local_endpoint() << " -> " << socket_->remote_endpoint() << "] ";
-        cnxString_ = cnxStringStream.str();
-
+        try {
+            cnxStringStream << "[" << socket_->local_endpoint() << " -> " << socket_->remote_endpoint()
+                            << "] ";
+            cnxString_ = cnxStringStream.str();
+        } catch (const boost::system::system_error& e) {
+            LOG_ERROR("Failed to get endpoint: " << e.what());
+            close();
+            return;
+        }
         if (logicalAddress_ == physicalAddress_) {
             LOG_INFO(cnxString_ << "Connected to broker");
         } else {

@BewareMyPower
Contributor

After discussing with @firefeifei, it turned out the issue was the link order of libpulsar.a. After putting libpulsar.a before its other dependencies, such as Boost, on the link line, it works well.
