Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scavenge: Handle events that have been deleted from (now empty) chunks but not from the index #3813

Merged
merged 1 commit into from
Apr 21, 2023

Conversation

shaan1337
Copy link
Member

@shaan1337 shaan1337 commented Apr 12, 2023

Fixed: Handle events that have been deleted from (now empty) chunks but not from the index

Previously this would result in the scavenge stopping with an error like "Could not get TimeStamp range for chunk "

But this is a possible state if the old scavenger had scavenged the chunks but not the index. This fix correctly handles this scenario so that the scavenge process can continue

@shaan1337 shaan1337 force-pushed the new-scavenger-fix branch 2 times, most recently from caab6f9 to 9e1b740 Compare April 13, 2023 11:06
@shaan1337 shaan1337 marked this pull request as ready for review April 13, 2023 11:15
@hayley-jean hayley-jean self-requested a review April 13, 2023 11:19
@shaan1337 shaan1337 marked this pull request as draft April 14, 2023 05:34
@shaan1337
Copy link
Member Author

please hold on the review - i'm improving the test cases

@shaan1337 shaan1337 force-pushed the new-scavenger-fix branch 4 times, most recently from 6ca83d0 to 31a1fef Compare April 17, 2023 09:01
@shaan1337 shaan1337 marked this pull request as ready for review April 17, 2023 09:42
@timothycoleman
Copy link
Contributor

great, nice work 👍 squash the changes and i'll approve

@timothycoleman
Copy link
Contributor

timothycoleman commented Apr 18, 2023

for posterity, shaan and i identified that the scenario where the if (allDiscardedSoFar) check returns false is quite narrow:

  1. we have an event E in the index that is not in the chunk
  2. the logical chunk that it would have been in, in fact, has no prepares left in it at all
  3. there is an earlier event for the same stream, the stream has a maxage, and we MaybeDiscard it in the calculator
  4. but we do, in fact, want to keep that previous event D based on the current metadata (which means that E was removed in error or because of an earlier metadata)

the net effect is that in this scenario, if the allDiscardedSoFar check was not present, D would be removed, perhaps to the surprise of the user.

note that a stream in this state is hard to read. in order to read D the read must not also try to include E

if it wasn't for this corner case, then the allDiscardedSoFar variable and the AlreadyDiscarded discard decision would be unnecessary, we could just return Discard (although it would unnecessarily add weight to that chunk)

…from the index

Previously this would result in the scavenge stopping with an error like "Could not get TimeStamp range for chunk <chunk number>"
But this is a possible state if the old scavenge had scavenged the chunks but not the index.
This fix correctly handles this scenario so that the scavenge process can continue
Copy link
Contributor

@timothycoleman timothycoleman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@shaan1337
Copy link
Member Author

great, nice work +1 squash the changes and i'll approve

thanks for the thorough review!

@timothycoleman timothycoleman changed the title Handle events that have been deleted from (now empty) chunks but not from the index Scavenge: Handle events that have been deleted from (now empty) chunks but not from the index Apr 19, 2023
@timothycoleman
Copy link
Contributor

great, nice work +1 squash the changes and i'll approve

thanks for the thorough review!

thanks for the careful code!

@hayley-jean hayley-jean merged commit d7d8ea5 into master Apr 21, 2023
9 checks passed
@hayley-jean hayley-jean deleted the new-scavenger-fix branch April 21, 2023 13:07
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hayley-jean 👉 Created pull request targeting release/oss-v22.10: #3821

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants