Fix compactor skips data from last compacted Ledger #12429
Merged
codelipenghui merged 1 commit into apache:master from codelipenghui:penghui/fix-loss-compacted-data on Oct 21, 2021
The compaction task builds the new snapshot from the last snapshot plus the incremental entries. For the compaction cursor, we therefore need to force a seek of the read position so that the compactor can read the complete last snapshot, because the compactor reads data that lies before the compaction cursor's mark-delete position.
codelipenghui requested review from eolivelli, 315157973, hangc0276, merlimat and sijie on October 20, 2021 01:31
@codelipenghui: Thanks for your contribution. For this PR, do we need to update docs?
codelipenghui added the release/2.8.2, type/bug, and doc-not-needed labels and removed the doc-label-missing label on Oct 20, 2021
merlimat approved these changes on Oct 21, 2021
hangc0276 approved these changes on Oct 21, 2021
codelipenghui added a commit that referenced this pull request on Oct 21, 2021
## Motivation

This PR fixes the loss of compacted data during compaction. We saw only a few event deletions, yet the number of compacted events dropped sharply.

![image](https://user-images.githubusercontent.com/12592133/138008777-00eb7c0b-358e-4291-bfd4-f4b27cbedbf4.png)

Further investigation showed that only the first read operation reads from the compacted ledger; from the second read operation onward, the broker reads from the original topic.

```
2021-10-19T23:09:30,021+0800 [broker-topic-workers-OrderedScheduler-7-0] INFO org.apache.pulsar.compaction.CompactedTopicImpl - =====[public/default/persistent/c499d42c-75d7-48d1-9225-2e724c0e1d83] Read from compacted Ledger = cursor position: -1:-1, Horizon: 16:-1, isFirstRead: true
2021-10-19T23:09:30,049+0800 [broker-topic-workers-OrderedScheduler-7-0] INFO org.apache.pulsar.compaction.CompactedTopicImpl - =====[public/default/persistent/c499d42c-75d7-48d1-9225-2e724c0e1d83] Read from original Ledger = cursor position: 16:0, Horizon: 16:-1, isFirstRead: false
```

## Modifications

The compaction task builds the new snapshot from the last snapshot plus the incremental entries. For the compaction cursor, we therefore need to force a seek of the read position so that the compactor can read the complete last snapshot, because the compactor reads data that lies before the compaction cursor's mark-delete position.

## Verifying this change

A new test was added to check that compacted data is not lost.

(cherry picked from commit 1830f90)
codelipenghui added a commit to codelipenghui/incubator-pulsar that referenced this pull request on Oct 22, 2021
…l reader/consumer also skips data while enabled read compacted data and read from the earliest position.
merlimat pushed a commit that referenced this pull request on Oct 24, 2021
…er/consumer (#12464) also skips data while enabled read compacted data and read from the earliest position.
zeo1995 pushed a commit to zeo1995/pulsar that referenced this pull request on Oct 25, 2021
* up/master: (46 commits)
  [website][upgrade]feat: docs migration - version-2.7.2 Pulsar Schema (apache#12393)
  [docs] io-develop, fix broken link (apache#12414)
  docs(function): fix incorrect classname in python runtime sample (apache#12476)
  Remove redundant null check for getInternalListener (apache#12474)
  Fix the retry topic's `REAL_TOPIC` & `ORIGIN_MESSAGE_ID` property should not be modified once it has been written. (apache#12451)
  [cli] Fix output format of string by pulsar-admin command (apache#11878)
  fix the race of delete subscription and delete topic (apache#12240)
  fix influxdb yaml doc (apache#12460)
  [Modernizer] Add Maven Modernizer plugin in pulsar-proxy module (apache#12326)
  fix DefaultCryptoKeyReaderTest can not run on windows (apache#12475)
  apache#12429 only fixed the compactor skips data issue, but the normal reader/consumer (apache#12464)
  broker resource group test optimize fail msg (apache#12438)
  Stop OffsetStore when stopping the connector (apache#12457)
  fix a typo in UnAckedMessageTracker (apache#12467)
  docs(function): fix typo in pip install (apache#12468)
  Optimize the code: remove extra spaces (apache#12470)
  optimize SecurityUtility code flow (apache#12431)
  Update lombok to 1.18.22 (apache#12466)
  Update team.js to add David K. as a committer (apache#12440)
  Fix java demo error in reset cursor admin (apache#12454)
  ...

# Conflicts:
# site2/website-next/versioned_docs/version-2.7.2/schema-evolution-compatibility.md
# site2/website-next/versioned_docs/version-2.7.2/schema-get-started.md
# site2/website-next/versioned_docs/version-2.7.2/schema-manage.md
# site2/website-next/versioned_docs/version-2.7.2/schema-understand.md
# site2/website-next/versioned_sidebars/version-2.7.2-sidebars.json
eolivelli pushed a commit to eolivelli/pulsar that referenced this pull request on Nov 29, 2021
eolivelli pushed a commit to eolivelli/pulsar that referenced this pull request on Nov 29, 2021
…l reader/consumer (apache#12464) also skips data while enabled read compacted data and read from the earliest position.
codelipenghui added a commit that referenced this pull request on Dec 20, 2021
eolivelli pushed a commit to eolivelli/pulsar that referenced this pull request on Feb 25, 2022
(cherry picked from commit 1830f90)
(cherry picked from commit 1fbc7ed)
eolivelli pushed a commit to eolivelli/pulsar that referenced this pull request on Feb 25, 2022
…l reader/consumer (apache#12464) also skips data while enabled read compacted data and read from the earliest position. (cherry picked from commit dd90657) (cherry picked from commit 39ee36c)
Technoboy- pushed a commit to Technoboy-/pulsar that referenced this pull request on Jun 30, 2022
Technoboy- pushed a commit to Technoboy-/pulsar that referenced this pull request on Jul 5, 2022
Labels
area/compaction
cherry-picked/branch-2.7
Archived: 2.7 is end of life
cherry-picked/branch-2.8
Archived: 2.8 is end of life
cherry-picked/branch-2.9
Archived: 2.9 is end of life
doc-not-needed
Your PR changes do not impact docs
release/2.7.5
release/2.8.2
release/2.9.1
type/bug
The PR fixed a bug or issue reported a bug
Motivation
This PR fixes the loss of compacted data during compaction.
We saw only a few event deletions, yet the number of compacted events dropped sharply.
Further investigation showed that only the first read operation reads from the compacted ledger;
from the second read operation onward, the broker reads from the original topic.
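The routing seen in the two log lines quoted in the commit message can be sketched as below. This is a toy model in Python, not Pulsar's Java code. `Position`, `read_source`, and the rule "a cursor position at or before the horizon is served from the compacted ledger" are assumptions inferred from the logged cursor positions and horizon.

```python
from typing import NamedTuple


class Position(NamedTuple):
    """Hypothetical (ledger_id, entry_id) pair; tuples compare field by field."""
    ledger_id: int
    entry_id: int


def read_source(cursor: Position, horizon: Position) -> str:
    """Pick where a read is served from (assumed rule, inferred from the log)."""
    # Assumption: positions at or before the compaction horizon are covered by
    # the compacted ledger; positions past it exist only in the original topic.
    return "compacted" if cursor <= horizon else "original"


horizon = Position(16, -1)

# Cursor positions taken from the two log lines.
first = read_source(Position(-1, -1), horizon)
second = read_source(Position(16, 0), horizon)
```

Under this model, once the read position has moved past the horizon, every later read bypasses the compacted ledger, which matches the behaviour described above.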
Modifications
The compaction task builds the new snapshot from the last snapshot plus
the incremental entries. For the compaction cursor, we therefore need to
force a seek of the read position so that the compactor can read the
complete last snapshot, because the compactor reads data that lies
before the compaction cursor's mark-delete position.
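To illustrate why the compactor needs the complete last snapshot, here is a minimal sketch of one compaction round as "last snapshot plus incremental entries". It is a simplified Python model, not Pulsar's implementation. The `compact` function, the dictionary snapshot, and the use of `None` as a deletion marker are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical entry shape: (key, value); a value of None marks a deletion.
Entry = Tuple[str, Optional[str]]


def compact(last_snapshot: Dict[str, str],
            incremental: Iterable[Entry]) -> Dict[str, str]:
    """Build a new snapshot from the previous one plus newer entries."""
    new_snapshot = dict(last_snapshot)  # start from the previous snapshot
    for key, value in incremental:
        if value is None:
            new_snapshot.pop(key, None)  # deletion removes the key
        else:
            new_snapshot[key] = value    # latest value per key wins
    return new_snapshot


last_snapshot = {"a": "1", "b": "2", "c": "3"}
incremental = [("b", "20"), ("c", None), ("d", "4")]

# The compactor reads the whole last snapshot, then the incremental entries.
complete = compact(last_snapshot, incremental)

# The last snapshot is skipped: keys that were not rewritten disappear.
skipped = compact({}, incremental)
```

In this model only one key is deleted, yet skipping the last snapshot also drops key `a`, which mirrors the symptom in the report: few deletions but a large drop in compacted events.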
Verifying this change
A new test was added to check that compacted data is not lost.