
Revert "Issue #1791: Read Submission should bypass OSE Threads" #3106

Closed
eolivelli wants to merge 1 commit

Conversation

eolivelli
Contributor

This reverts commit 6b99ff7.

Description of the changes in this PR:
Revert #1792 "Issue #1791: Read Submission should bypass OSE Threads"

Motivation

See #3104

@eolivelli
Contributor Author

@diegosalvi this change is also important for HerdDB and DistributedLog

@lhotari
Member

lhotari commented Mar 14, 2022

There's #3105 as a possible solution for mitigating the performance issue that was the original motivation for the PR being reverted here.

@diegosalvi
Contributor

@eolivelli interesting, thank you. We'll test disabling the Netty recycler, as suggested in apache/pulsar#14436, for now.
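For reference, a minimal sketch of disabling the Netty recycler via the io.netty.recycler.maxCapacityPerThread system property; this is only an assumption about one common approach, and the exact setting recommended in the Pulsar issue may differ:

```java
// Sketch only: setting io.netty.recycler.maxCapacityPerThread to 0 disables
// Netty's object recycler. The property must be set before any Netty Recycler
// class is initialized, so in practice it is usually passed as a JVM flag:
//   -Dio.netty.recycler.maxCapacityPerThread=0
public class DisableNettyRecycler {
    public static void main(String[] args) {
        System.setProperty("io.netty.recycler.maxCapacityPerThread", "0");
        // ... start the BookKeeper/Pulsar client only after this point
    }
}
```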

@dlg99
Contributor

dlg99 commented Mar 14, 2022

@Ghatage FYI

Contributor

@dlg99 left a comment


@congbobo184 @lhotari As I understand it, there is no confirmation that reverting this PR actually fixes the problem reported in Pulsar.

If this problem is related to calling Netty's recycle from another thread (a class of bug Netty has had and fixed before), I'd love to see something that actually proves the theory so that Netty can fix it.

I don't see anything in PendingReadOp that requires initiate() and readEntryComplete() to run on the same thread for a read-only ledger. I may be wrong; please help me understand what I am missing. readEntryComplete() still runs on the OSE.

What you see in the log mentioned in the comment might happen if (see the sketch after this list):

  • the request took longer than the speculative retry time
  • a speculative read got submitted
  • both the original request and the speculative retry succeeded
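A minimal sketch of that race, using a hypothetical readFrom(bookie) helper rather than the actual PendingReadOp code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SpeculativeReadSketch {
    static final ScheduledExecutorService SCHEDULER =
            Executors.newSingleThreadScheduledExecutor();

    // Fire the original read, then a speculative read if the first has not
    // completed within speculativeTimeoutMs; the first completion wins.
    static CompletableFuture<byte[]> readWithSpeculation(int firstBookie,
                                                         int secondBookie,
                                                         long speculativeTimeoutMs) {
        CompletableFuture<byte[]> result = new CompletableFuture<>();

        readFrom(firstBookie).thenAccept(result::complete);

        SCHEDULER.schedule(() -> {
            if (!result.isDone()) {
                // Both requests may still succeed; only the first completion
                // is observed through the result future.
                readFrom(secondBookie).thenAccept(result::complete);
            }
        }, speculativeTimeoutMs, TimeUnit.MILLISECONDS);

        return result;
    }

    // Hypothetical stand-in for an asynchronous read from one bookie.
    static CompletableFuture<byte[]> readFrom(int bookieIndex) {
        return CompletableFuture.supplyAsync(() -> new byte[0]);
    }
}
```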

It is possible that handling of this case somehow corrupts a recyclable entry, though I don't yet see where that could happen (callbacks should be handled on the same OSE thread, etc.).

I need more proof to confirm this actually fixes the problem, rather than just reverting a PR that touches the code you suspect.
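For context, a minimal sketch of the allocate-on-one-thread, recycle-on-another pattern in question; the class and field names are illustrative only, not the actual PendingReadOp or LedgerEntry code:

```java
import io.netty.util.Recycler;

public class RecycleAcrossThreads {

    static final class PooledEntry {
        private static final Recycler<PooledEntry> RECYCLER = new Recycler<PooledEntry>() {
            @Override
            protected PooledEntry newObject(Handle<PooledEntry> handle) {
                return new PooledEntry(handle);
            }
        };

        private final Recycler.Handle<PooledEntry> handle;
        long entryId;

        private PooledEntry(Recycler.Handle<PooledEntry> handle) {
            this.handle = handle;
        }

        static PooledEntry get(long entryId) {
            PooledEntry e = RECYCLER.get();
            e.entryId = entryId;
            return e;
        }

        void recycle() {
            entryId = -1;
            handle.recycle(this); // may run on a different thread than get()
        }
    }

    public static void main(String[] args) throws Exception {
        PooledEntry entry = PooledEntry.get(42L);  // obtained on the main thread
        Thread other = new Thread(entry::recycle); // recycled on another thread
        other.start();
        other.join();
    }
}
```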

@jvrao
Contributor

jvrao commented Mar 14, 2022

+1 on @dlg99's comment. I could not understand where exactly the problem is, or how it would be addressed by making create and recycle run on the same thread. You could always have outstanding read responses after erroring out. But even if that happens, what exactly is the error/corruption that is visible to the client?

@congbobo184
Contributor

#3110 will fix it

@lhotari
Member

lhotari commented Mar 15, 2022

You could always have outstanding read responses after erroring out. But even if that happens, what exactly the error/corruption that is visible to the client?

@jvrao The issue description is in #3104, and it references a Pulsar issue with a lot of troubleshooting comments. @congbobo184 has also summarized the behavior in the description of PR #3110.

@eolivelli
Contributor Author

Closing for now

@eolivelli eolivelli closed this Mar 15, 2022
Contributor

@hangc0276 left a comment


The root cause of #3104 is not a thread-safety problem. Also, isolating the thread pools for cold and hot data reads greatly reduces thread-switching overhead and makes throughput more stable.
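A minimal sketch of that kind of isolation, with hypothetical pool names and sizes rather than BookKeeper's actual configuration:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: route "hot" (likely cached) and "cold" (likely disk-bound)
// reads to separate executors so slow cold reads do not stall hot ones.
public class ReadIsolationSketch {
    private final ExecutorService hotReadPool = Executors.newFixedThreadPool(4);
    private final ExecutorService coldReadPool = Executors.newFixedThreadPool(4);

    void submitRead(boolean likelyInCache, Runnable readTask) {
        (likelyInCache ? hotReadPool : coldReadPool).execute(readTask);
    }
}
```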

This issue has been fixed by #3110
