Revert "Issue #1791: Read Submission should bypass OSE Threads" #3106
This reverts commit 6b99ff7.
@diegosalvi this change is also important for HerdDB and DistributedLog.
#3105 is a possible way to mitigate the performance issue that originally motivated the PR being reverted here.
@eolivelli interesting, thank you. For now, we'll test disabling the Netty recycler as described in apache/pulsar#14436.
@Ghatage FYI
@congbobo184 @lhotari As I understand it, there is no confirmation that reverting this PR actually fixes the problem reported in Pulsar.
If this problem is related to calling Netty's recycle from another thread (an issue Netty had and fixed), I'd love to see something that actually proves the theory so Netty can fix it.
I don't see anything in PendingReadOp that requires initiate() and readEntryComplete() to run on the same thread for the read-only ledger. I could be wrong, so please help me understand what I am missing. readEntryComplete() still runs on the OSE.
What you see in the log mentioned in the comment might happen if:
- request took longer than speculative retry time
- speculative read got submitted
- both original request and the speculative retry succeeded
It is possible that the handling of this case somehow corrupts the recyclable entry, though I don't yet see where that could happen (callbacks should be handled on the same OSE thread, etc.).
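To make the scenario above concrete, here is a minimal sketch of a request that receives two successful responses, one from the original read and one from the speculative retry. It is not BookKeeper's `PendingReadOp`: the `EntryRequest` class, its fields, and the payload strings are hypothetical names chosen for illustration. It assumes the intended behavior is "first response wins, later responses are dropped".

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class SpeculativeReadSketch {

    /** Hypothetical stand-in for a per-entry read request (not BookKeeper code). */
    static final class EntryRequest {
        private final AtomicBoolean complete = new AtomicBoolean(false);
        final AtomicInteger ignoredResponses = new AtomicInteger();
        final CompletableFuture<String> result = new CompletableFuture<>();

        /**
         * Called once per successful response.
         * Returns true only for the first response; later ones are counted and dropped,
         * so the payload handed to the caller is never replaced or released twice.
         */
        boolean onResponse(String payload) {
            if (complete.compareAndSet(false, true)) {
                result.complete(payload);
                return true;
            }
            ignoredResponses.incrementAndGet();
            return false;
        }
    }

    public static void main(String[] args) {
        EntryRequest req = new EntryRequest();
        boolean first = req.onResponse("from-original-read");     // original request succeeds
        boolean second = req.onResponse("from-speculative-read"); // speculative retry also succeeds
        // prints: true false from-original-read
        System.out.println(first + " " + second + " " + req.result.join());
    }
}
```

The compare-and-set guard keeps the outcome the same regardless of which thread delivers each response. Whether the real code path has an equivalent guard around the recycled entry is exactly the open question in this thread.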
I need more proof to confirm this actually fixes the problem and not just reverts some PR that touches code you are suspecting.
+1 on @dlg99's comment. I could not understand where exactly the problem is or how it would be addressed by making create and recycle run on the same thread. You could always have outstanding read responses after erroring out. But even if that happens, what exactly is the error/corruption that is visible to the client?
#3110 will fix it.
@jvrao The issue is described in #3104, which references a Pulsar issue with a lot of troubleshooting comments. @congbobo184 has also summarized the behavior in the description of PR #3110.
Closing for now.
Descriptions of the changes in this PR:
Revert #1792 "Issue #1791: Read Submission should bypass OSE Threads"
Motivation
See #3104