Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply the backpressure changes on the V2 requests #3324

Merged
merged 5 commits into from
Jun 16, 2022

Conversation

zymap
Copy link
Member

@zymap zymap commented Jun 9, 2022


Motivation

If one bookie is slow (not down, just slow), the BK client
will the acks to the user that the entries are written after
the first 2 acks. In the meantime, it will keep waiting for
the 3rd bookie to respond. If the bookie responds within the
timeout, the entries can now be dropped from memory, otherwise
the write will timeout internally and it will get replayed
to a new bookie.

In the V3 request, we have server-side backpressure
to impact the client-side behaviors. We should apply the same
changes to the V2 request. That would help this issue
to be resolved.

Modification

Descriptions of the changes in this PR:

---

*Motivation*

If one bookie is slow (not down, just slow), the BK client
will the acks to the user that the entries are written after
the first 2 acks. In the meantime, it will keep waiting for
the 3rd bookie to respond. If the bookie responds within the
timeout, the entries can now be dropped from memory, otherwise
the write will timeout internally and it will get replayed
to a new bookie.

In the V3 request, we have [server-side backpressure](apache#1410)
to impact the client-side behaviors. We should apply the same
changes to the V2 request. That would help this [issue](apache/pulsar#14861)
to be resolved.

*Modification*

- Apply the change apache#1410 to V2 protocol
@zymap zymap added this to the 4.16.0 milestone Jun 9, 2022
@zymap zymap self-assigned this Jun 9, 2022
@zymap
Copy link
Member Author

zymap commented Jun 9, 2022

As we mentioned in this issue, broker is easy to OOM when there has one bookie slow.
After this change and applying the client-side configuration waitTimeoutOnBackpressureMs=50, the OOM issue looks resolved.

Broker Direct Memory:
image

Bookie Add latency:
image

@hangc0276
Copy link
Contributor

rerun failure checks

Copy link
Contributor

@hangc0276 hangc0276 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job!

@hangc0276
Copy link
Contributor

rerun failure checks

@hangc0276
Copy link
Contributor

rerun failure checks

1 similar comment
@hangc0276
Copy link
Contributor

rerun failure checks

@zymap
Copy link
Member Author

zymap commented Jun 15, 2022

@eolivelli @merlimat Please help to take a review when you have time.

Once the backpressure applies to the v2 requests, this will help to control the client's memory usage. Also can help to resolve the issue I mentioned in this issue

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dlg99 PTAL

Copy link
Contributor

@dlg99 dlg99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job!

@shoothzj shoothzj merged commit 62400bd into apache:master Jun 16, 2022
zymap added a commit that referenced this pull request Jun 16, 2022
---

*Motivation*

If one bookie is slow (not down, just slow), the BK client
will the acks to the user that the entries are written after
the first 2 acks. In the meantime, it will keep waiting for
the 3rd bookie to respond. If the bookie responds within the
timeout, the entries can now be dropped from memory, otherwise
the write will timeout internally and it will get replayed
to a new bookie.

In the V3 request, we have [server-side backpressure](#1410)
to impact the client-side behaviors. We should apply the same
changes to the V2 request. That would help this [issue](apache/pulsar#14861)
to be resolved.

*Modification*

- Apply the change #1410 to V2 protocol

Descriptions of the changes in this PR:

(cherry picked from commit 62400bd)
zymap added a commit that referenced this pull request Jun 16, 2022
---

*Motivation*

If one bookie is slow (not down, just slow), the BK client
will the acks to the user that the entries are written after
the first 2 acks. In the meantime, it will keep waiting for
the 3rd bookie to respond. If the bookie responds within the
timeout, the entries can now be dropped from memory, otherwise
the write will timeout internally and it will get replayed
to a new bookie.

In the V3 request, we have [server-side backpressure](#1410)
to impact the client-side behaviors. We should apply the same
changes to the V2 request. That would help this [issue](apache/pulsar#14861)
to be resolved.

*Modification*

- Apply the change #1410 to V2 protocol

Descriptions of the changes in this PR:

(cherry picked from commit 62400bd)
@zymap
Copy link
Member Author

zymap commented Jun 16, 2022

Thank you!

I have cherry-picked this PR to branch-4.14 and branch-4.15

eolivelli pushed a commit to datastax/bookkeeper that referenced this pull request Jul 27, 2022
---

*Motivation*

If one bookie is slow (not down, just slow), the BK client
will the acks to the user that the entries are written after
the first 2 acks. In the meantime, it will keep waiting for
the 3rd bookie to respond. If the bookie responds within the
timeout, the entries can now be dropped from memory, otherwise
the write will timeout internally and it will get replayed
to a new bookie.

In the V3 request, we have [server-side backpressure](apache#1410)
to impact the client-side behaviors. We should apply the same
changes to the V2 request. That would help this [issue](apache/pulsar#14861)
to be resolved.

*Modification*

- Apply the change apache#1410 to V2 protocol

Descriptions of the changes in this PR:

(cherry picked from commit 62400bd)
(cherry picked from commit b025f7b)
lhotari pushed a commit to datastax/bookkeeper that referenced this pull request Aug 9, 2022
---

*Motivation*

If one bookie is slow (not down, just slow), the BK client
will the acks to the user that the entries are written after
the first 2 acks. In the meantime, it will keep waiting for
the 3rd bookie to respond. If the bookie responds within the
timeout, the entries can now be dropped from memory, otherwise
the write will timeout internally and it will get replayed
to a new bookie.

In the V3 request, we have [server-side backpressure](apache#1410)
to impact the client-side behaviors. We should apply the same
changes to the V2 request. That would help this [issue](apache/pulsar#14861)
to be resolved.

*Modification*

- Apply the change apache#1410 to V2 protocol

Descriptions of the changes in this PR:

(cherry picked from commit 62400bd)
(cherry picked from commit b025f7b)
lhotari added a commit to lhotari/bookkeeper that referenced this pull request Aug 9, 2022
- cherry-picking apache#3324 requires this backport
lhotari added a commit to datastax/bookkeeper that referenced this pull request Aug 9, 2022
- cherry-picking apache#3324 requires this backport
@lhotari
Copy link
Member

lhotari commented Aug 9, 2022

@zymap It looks like cherry-picking to branch-4.14 requires this change:
#3443 .

lhotari added a commit to lhotari/bookkeeper that referenced this pull request Aug 9, 2022
- cherry-picking apache#3324 requires this backport
lhotari added a commit to datastax/bookkeeper that referenced this pull request Aug 9, 2022
- cherry-picking apache#3324 requires this backport
Ghatage pushed a commit to sijie/bookkeeper that referenced this pull request Jul 12, 2024
---

*Motivation*

If one bookie is slow (not down, just slow), the BK client
will the acks to the user that the entries are written after
the first 2 acks. In the meantime, it will keep waiting for
the 3rd bookie to respond. If the bookie responds within the
timeout, the entries can now be dropped from memory, otherwise
the write will timeout internally and it will get replayed
to a new bookie.

In the V3 request, we have [server-side backpressure](apache#1410)
to impact the client-side behaviors. We should apply the same
changes to the V2 request. That would help this [issue](apache/pulsar#14861)
to be resolved.

*Modification*

- Apply the change apache#1410 to V2 protocol

Descriptions of the changes in this PR:
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants