Skip to content

[fix][transaction] Remove single thread pool and fix the concurrent problem to avoid performance issues#14940

Closed
mattisonchao wants to merge 12 commits intoapache:masterfrom
mattisonchao:fix_handle_tc_connect
Closed

[fix][transaction] Remove single thread pool and fix the concurrent problem to avoid performance issues#14940
mattisonchao wants to merge 12 commits intoapache:masterfrom
mattisonchao:fix_handle_tc_connect

Conversation

@mattisonchao
Copy link
Member

@mattisonchao mattisonchao commented Mar 30, 2022

Motivation

According to #13969, We can know that adding a single-threaded pool is because deques in a concurrent environment will cause problems with requesting additions and requesting clears.

We can do some changes and remove the single-threaded pool to improve performance.

e.g:

Current we have threads A, B, C and D.

Backgound
Threads A and B request handleTcClientConnect concurrently, then A acquires the semaphore from openTransactionMetadataStore, and B adds the failure to acquire the semaphore to the deque.

Success openTransactionMetadataStore condition
When A's call to handleTcClientConnect succeeds, they will first release the semaphore and then complete all futures in the deque. Because if we release the semaphore after finishing the deque future, which will cause thread B to reconnect and add to the deque before thread A releases the semaphore and never completes Thread B.

Exceptionally openTransactionMetadataStore condition
When A fails to call handleTcClientConnect, they will release the semaphore first and get all futures(it's a threshold) that need to be complete exceptionally after releasing the semaphore to avoid an infinite loop caused by client reconnection, because If we release the semaphore after all deque futures are complete exceptionally, the client will immediately reconnect and add to the deque (because thread A does not release the semaphore) which may cause an infinite loop (which means something like, because We have an end time that will break the loop)

Other problems rely on the current implementation

  • When the future of the semaphore acquisition failure exceeds the maximum timeout. We currently clear the deque directly, which is an issue where the CompletableFuture may not complete.

  • When TransactionMetadataStoreService#handleTcClientConnect invoke method as bellow we omit the exception.

pulsarService.getBrokerService().checkTopicNsOwnership(TopicName
       .TRANSACTION_COORDINATOR_ASSIGN.getPartition((int) tcId.getId()).toString()).thenRun(() -> {
           //.... omit some code
      })
  • Avoid frequent context switching using thread pools.

  • When checking the TC status for the second time, we just finish the future and don't return. (the code is below)

// TransactionMetadataStoreService.java 181-185
// omit some code...
 if (stores.get(tcId) != null) {
      completableFuture.complete(null);
 }

openTransactionMetadataStore(tcId)
// omit some code...

Modifications

  • Remove the single thread pool to avoid poor performance.
  • Fix concurrent deque issue.
  • Use a bounded queue instead of the deque.
  • Modified to chained calls, all exceptions will not be ignored.
  • Refactor some code.

Verifying this change

  • Make sure that the change passes the CI checks.

Documentation

  • no-need-doc

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Mar 30, 2022
@mattisonchao mattisonchao changed the title [fix][Transaction] Fix unhandled exception when checkTopicNsOwnership. [fix][Transaction] Fix potential unfinished future. Mar 30, 2022
@mattisonchao mattisonchao changed the title [fix][Transaction] Fix potential unfinished future. [fix][Transaction] Fix potentially unfinished CompletableFuture. Mar 30, 2022
@mattisonchao mattisonchao changed the title [fix][Transaction] Fix potentially unfinished CompletableFuture. [fix][transaction] Fix potentially unfinished CompletableFuture. Mar 30, 2022
@Technoboy- Technoboy- closed this Mar 30, 2022
@Technoboy- Technoboy- reopened this Mar 30, 2022
Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are moving the execution out of the pinned executor.
I am not sure this is a good move
Can you please clarify?

@mattisonchao mattisonchao requested a review from eolivelli March 30, 2022 09:01
@mattisonchao mattisonchao changed the title [fix][transaction] Fix potentially unfinished CompletableFuture. [fix][transaction] Remove single thread pool and fix the concurrent problem to avoid performance issues Mar 31, 2022
@mattisonchao
Copy link
Member Author

@eolivelli I updated this PR, could you please review it again?

@mattisonchao mattisonchao marked this pull request as draft March 31, 2022 02:59
@mattisonchao mattisonchao marked this pull request as ready for review March 31, 2022 03:37
@mattisonchao mattisonchao reopened this Mar 31, 2022
@mattisonchao mattisonchao reopened this Apr 1, 2022
@mattisonchao mattisonchao reopened this Apr 1, 2022
@mattisonchao mattisonchao marked this pull request as draft April 1, 2022 13:54
@mattisonchao
Copy link
Member Author

I chose to use Ordered-Executor to ensure thread-safe. so, close it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc-not-needed Your PR changes do not impact docs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants