Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retry to tolerate the offload index file read failure #12452

Merged
merged 2 commits into from
Oct 22, 2021

Conversation

zymap
Copy link
Member

@zymap zymap commented Oct 21, 2021


Motivation

We met the ReadLedgerMetadata exception when reading the index
file. The index file only read once, so it may not read all the
data from the stream and cause the metadata read failed. We need
to ensure the all data is read from the stream or the stream is
end. When the stream is end, we will receive the EOF exception,
so we need to use readFully not read.

Add the retry logic to tolerate the failure cause by the network.
Because the stream is from the HTTP, so it's may break on some
case. Add a small retry to avoid it to backoff by the dispatcher.

Modifications

  • Use readFully to replace the read method
  • Add a small retry for handling the index block build

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API: (yes / no)
  • The schema: (yes / no / don't know)
  • The default values of configurations: (yes / no)
  • The wire protocol: (yes / no)
  • The rest endpoints: (yes / no)
  • The admin cli options: (yes / no)
  • Anything that affects deployment: (yes / no / don't know)

Documentation

Check the box below and label this PR (if you have committer privilege).

Need to update docs?

  • doc-required

    (If you need help on updating docs, create a doc issue)

  • no-need-doc

    (Please explain why)

  • doc

    (If this PR contains doc changes)

---

*Motivation*

We met the ReadLedgerMetadata exception when reading the index
file. The index file only read once, so it may not read all the
data from the stream and cause the metadata read failed. We need
to ensure the all data is read from the stream or the stream is
end. When the stream is end, we will receive the EOF exception,
so we need to use `readFully` not `read`.

Add the retry logic to tolerate the failure cause by the network.
Because the stream is from the HTTP, so it's may break on some
case. Add a small retry to avoid it to backoff by the dispatcher.

*Modifications*

- Use `readFully` to replace the `read` method
- Add a small retry for handling the index block build
@zymap zymap added type/bug The PR fixed a bug or issue reported a bug area/tieredstorage doc-not-needed Your PR changes do not impact docs release/2.8.2 release/2.9.1 labels Oct 21, 2021
@zymap zymap added this to the 2.10.0 milestone Oct 21, 2021
@zymap zymap self-assigned this Oct 21, 2021
OffloadIndexBlock index;
try (InputStream payLoadStream = blob.getPayload().openStream()) {
index = (OffloadIndexBlock) indexBuilder.fromStream(payLoadStream);
int retryCount = 3;
Copy link
Member Author

@zymap zymap Oct 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This retry is for a quick retry, most of the time it shouldn't trigger. We just don't want to exit to wait for the next backoff.

@codelipenghui codelipenghui merged commit 33bcc17 into apache:master Oct 22, 2021
zymap added a commit that referenced this pull request Oct 22, 2021
* Add retry to tolerate the offload index file read failure
---

*Motivation*

We met the ReadLedgerMetadata exception when reading the index
file. The index file only read once, so it may not read all the
data from the stream and cause the metadata read failed. We need
to ensure the all data is read from the stream or the stream is
end. When the stream is end, we will receive the EOF exception,
so we need to use `readFully` not `read`.

Add the retry logic to tolerate the failure cause by the network.
Because the stream is from the HTTP, so it's may break on some
case. Add a small retry to avoid it to backoff by the dispatcher.

*Modifications*

- Use `readFully` to replace the `read` method
- Add a small retry for handling the index block build

* Add comments and enrich log

(cherry picked from commit 33bcc17)
@zymap zymap added the cherry-picked/branch-2.8 Archived: 2.8 is end of life label Oct 22, 2021
eolivelli pushed a commit to eolivelli/pulsar that referenced this pull request Nov 29, 2021
* Add retry to tolerate the offload index file read failure
---

*Motivation*

We met the ReadLedgerMetadata exception when reading the index
file. The index file only read once, so it may not read all the
data from the stream and cause the metadata read failed. We need
to ensure the all data is read from the stream or the stream is
end. When the stream is end, we will receive the EOF exception,
so we need to use `readFully` not `read`.

Add the retry logic to tolerate the failure cause by the network.
Because the stream is from the HTTP, so it's may break on some
case. Add a small retry to avoid it to backoff by the dispatcher.

*Modifications*

- Use `readFully` to replace the `read` method
- Add a small retry for handling the index block build

* Add comments and enrich log
codelipenghui pushed a commit that referenced this pull request Dec 20, 2021
* Add retry to tolerate the offload index file read failure
---

*Motivation*

We met the ReadLedgerMetadata exception when reading the index
file. The index file only read once, so it may not read all the
data from the stream and cause the metadata read failed. We need
to ensure the all data is read from the stream or the stream is
end. When the stream is end, we will receive the EOF exception,
so we need to use `readFully` not `read`.

Add the retry logic to tolerate the failure cause by the network.
Because the stream is from the HTTP, so it's may break on some
case. Add a small retry to avoid it to backoff by the dispatcher.

*Modifications*

- Use `readFully` to replace the `read` method
- Add a small retry for handling the index block build

* Add comments and enrich log

(cherry picked from commit 33bcc17)
@codelipenghui codelipenghui added the cherry-picked/branch-2.9 Archived: 2.9 is end of life label Dec 20, 2021
lhotari pushed a commit to datastax/pulsar that referenced this pull request Apr 11, 2022
* Add retry to tolerate the offload index file read failure
---

*Motivation*

We met the ReadLedgerMetadata exception when reading the index
file. The index file only read once, so it may not read all the
data from the stream and cause the metadata read failed. We need
to ensure the all data is read from the stream or the stream is
end. When the stream is end, we will receive the EOF exception,
so we need to use `readFully` not `read`.

Add the retry logic to tolerate the failure cause by the network.
Because the stream is from the HTTP, so it's may break on some
case. Add a small retry to avoid it to backoff by the dispatcher.

*Modifications*

- Use `readFully` to replace the `read` method
- Add a small retry for handling the index block build

* Add comments and enrich log

(cherry picked from commit 33bcc17)
lhotari pushed a commit to datastax/pulsar that referenced this pull request Apr 11, 2022
* Add retry to tolerate the offload index file read failure
---

*Motivation*

We met the ReadLedgerMetadata exception when reading the index
file. The index file only read once, so it may not read all the
data from the stream and cause the metadata read failed. We need
to ensure the all data is read from the stream or the stream is
end. When the stream is end, we will receive the EOF exception,
so we need to use `readFully` not `read`.

Add the retry logic to tolerate the failure cause by the network.
Because the stream is from the HTTP, so it's may break on some
case. Add a small retry to avoid it to backoff by the dispatcher.

*Modifications*

- Use `readFully` to replace the `read` method
- Add a small retry for handling the index block build

* Add comments and enrich log

(cherry picked from commit 33bcc17)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/tieredstorage cherry-picked/branch-2.8 Archived: 2.8 is end of life cherry-picked/branch-2.9 Archived: 2.9 is end of life doc-not-needed Your PR changes do not impact docs release/2.8.2 release/2.9.2 type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants