Segmentation fault sometimes happens when running unit tests #203

Open
BewareMyPower opened this issue Mar 12, 2024 · 2 comments

Comments

@BewareMyPower
Contributor

See the following log excerpt from a unit-test run:

...
2024-03-12T02:35:31.1396530Z 2024-03-12 02:35:31.139 DEBUG [140455429797440] ClientConnection:886 | [[::1]:52958 -> [::1]:6650] Handling incoming command: SEND_RECEIPT
2024-03-12T02:35:31.1398821Z 2024-03-12 02:35:31.139 DEBUG [140455429797440] ClientConnection:1498 | [[::1]:52958 -> [::1]:6650] Got receipt for producer: 0 -- msg: 3-- message id: (81,3,-1,-1)
2024-03-12T02:35:31.1401720Z 2024-03-12 02:35:31.139 DEBUG [140455429797440] ProducerImpl:940 | [persistent://public/default/my-python-topic-keyshare-sticky-1710210931.0501215, standalone-0-57] Received ack for msg 3
2024-03-12T02:35:31.1404238Z 2024-03-12 02:35:31.139 DEBUG [140455429797440] ClientConnection:861 | [[::1]:52958 -> [::1]:6650] Received a message from the server for consumer: 1
2024-03-12T02:35:31.1405802Z 2024-03-12 02:35:31.139 DEBUG [140455429797440] ConsumerImpl:516 | [persistent://public/default/my-python-topic-keyshare-sticky-1710210931.0501215, my-sub, 1] Received Message -- Size: 7
2024-03-12T02:35:31.1407524Z 2024-03-12 02:35:31.139 DEBUG [140455429797440] ConsumerImpl:564 | [persistent://public/default/my-python-topic-keyshare-sticky-1710210931.0501215, my-sub, 1]  metadata.num_messages_in_batch() = 1
2024-03-12T02:35:31.1410975Z 2024-03-12 02:35:31.139 DEBUG [140455429797440] ConsumerImpl:566 | [persistent://public/default/my-python-topic-keyshare-sticky-1710210931.0501215, my-sub, 1]  metadata.has_num_messages_in_batch() = 0
2024-03-12T02:35:31.1413569Z 2024-03-12 02:35:31.140 DEBUG [140455413016128] RetryableOperation:129 | Run operation get-partition-metadata-persistent://public/default/partitioned_topic_name_test, remaining time: 29900 ms
2024-03-12T02:35:35.1471950Z ....................................../tests/run-unit-tests.sh: line 28:  3956 Segmentation fault      (core dumped) python3 pulsar_test.py
@BewareMyPower
Contributor Author

2024-03-12 02:35:31.040 INFO  [140455429797440] RetryableOperation:114 | Reschedule get-partition-metadata-persistent://public/default/partitioned_topic_name_test for 100 ms, remaining time: 29900 ms
2024-03-12 02:35:31.040 DEBUG [140455429797440] ExecutorService:44 | Event loop of ExecutorService exits successfully
2024-03-12 02:35:31.040 DEBUG [140457878760320] ClientImpl:775 | listenerExecutorProvider_ is closed
2024-03-12 02:35:31.040 DEBUG [140457878760320] ClientImpl:780 | partitionListenerExecutorProvider_ is closed
2024-03-12 02:35:31.041 DEBUG [140457878760320] ProducerImpl:117 | [persistent://public/default/partitioned_topic_name_test-partition-0, standalone-0-53] ~ProducerImpl
2024-03-12 02:35:31.041 INFO  [140457878760320] ProducerImpl:757 | Producer - [persistent://public/default/partitioned_topic_name_test-partition-0, standalone-0-53] , [batching  = off]
2024-03-12 02:35:31.041 DEBUG [140457878760320] ProducerImpl:117 | [persistent://public/default/partitioned_topic_name_test-partition-1, standalone-0-54] ~ProducerImpl
2024-03-12 02:35:31.041 INFO  [140457878760320] ProducerImpl:757 | Producer - [persistent://public/default/partitioned_topic_name_test-partition-1, standalone-0-54] , [batching  = off]
2024-03-12 02:35:31.041 DEBUG [140457878760320] ProducerImpl:117 | [persistent://public/default/partitioned_topic_name_test-partition-2, standalone-0-55] ~ProducerImpl
2024-03-12 02:35:31.041 INFO  [140457878760320] ProducerImpl:757 | Producer - [persistent://public/default/partitioned_topic_name_test-partition-2, standalone-0-55] , [batching  = off]
# another test...
2024-03-12 02:35:31.140 DEBUG [140455413016128] RetryableOperation:129 | Run operation get-partition-metadata-persistent://public/default/partitioned_topic_name_test, remaining time: 29900 ms

It seems that the RetryableOperation is still scheduled after client.close() is called.

@BewareMyPower
Contributor Author

The root cause might be an incorrect implementation of the shutdown of a partitioned producer.

Here is the output when I ran PulsarTest.test_shutdown_client with debug-level logs:

2024-03-12 15:09:35.278 DEBUG [0x1e4d42100] ClientImpl:699 | 1 producers and 0 consumers have been shutdown.
2024-03-12 15:09:35.278 DEBUG [0x16df5f000] ConnectionPool:86 | Got connection from pool for pulsar://localhost:6650-0 use_count: 2 @ 0x15ca15e00
2024-03-12 15:09:35.278 INFO  [0x1e4d42100] ClientConnection:1325 | [127.0.0.1:63468 -> 127.0.0.1:6650] Connection disconnected (refCnt: 2)
2024-03-12 15:09:35.278 DEBUG [0x16de47000] ClientConnection:1262 | [127.0.0.1:63468 -> 127.0.0.1:6650]  Ignoring timer cancelled event, code[system:89]
2024-03-12 15:09:35.278 INFO  [0x1e4d42100] HandlerBase:150 | [persistent://public/default/partitioned_topic_name_test-partition-0, standalone-155-58] Schedule reconnection in 0.1 s
2024-03-12 15:09:35.279 INFO  [0x1e4d42100] HandlerBase:150 | [persistent://public/default/partitioned_topic_name_test-partition-1, standalone-155-59] Schedule reconnection in 0.1 s
2024-03-12 15:09:35.279 DEBUG [0x16df5f000] BinaryProtoLookupService:148 | PartitionMetadataLookup failed for persistent://public/default/partitioned_topic_name_test, result NotConnected
2024-03-12 15:09:35.279 INFO  [0x1e4d42100] HandlerBase:150 | [persistent://public/default/partitioned_topic_name_test-partition-2, standalone-155-60] Schedule reconnection in 0.1 s

As you can see, the internal producers still scheduled reconnections after shutdown.

After using another topic name in test_shutdown_client, no reconnection happened:

2024-03-12 15:20:36.796 DEBUG [0x1e4d42100] ClientImpl:699 | 1 producers and 0 consumers have been shutdown.
2024-03-12 15:20:36.796 INFO  [0x1e4d42100] ClientConnection:1325 | [127.0.0.1:63850 -> 127.0.0.1:6650] Connection disconnected (refCnt: 1)
2024-03-12 15:20:36.796 DEBUG [0x1e4d42100] ClientImpl:705 | ConnectionPool is closed
2024-03-12 15:20:36.796 DEBUG [0x16fd93000] ClientConnection:1262 | [127.0.0.1:63850 -> 127.0.0.1:6650]  Ignoring timer cancelled event, code[system:89]
2024-03-12 15:20:36.796 INFO  [0x16fd93000] ClientConnection:274 | [127.0.0.1:63850 -> 127.0.0.1:6650] Destroyed connection to pulsar://localhost:6650-0
2024-03-12 15:20:36.796 DEBUG [0x16fd93000] ExecutorService:44 | Event loop of ExecutorService exits successfully
2024-03-12 15:20:36.796 DEBUG [0x1e4d42100] ClientImpl:715 | ioExecutorProvider_ is closed
2024-03-12 15:20:36.796 DEBUG [0x1e4d42100] ClientImpl:720 | listenerExecutorProvider_ is closed
2024-03-12 15:20:36.796 DEBUG [0x1e4d42100] ClientImpl:725 | partitionListenerExecutorProvider_ is closed
