Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ldtdeser trunk CASSANDRA-18648 #2464

Closed
wants to merge 2 commits into from
Closed

Conversation

bereng
Copy link
Contributor

@bereng bereng commented Jul 5, 2023

Thanks for sending a pull request! Here are some tips if you're new here:

  • Ensure you have added or run the appropriate tests for your PR.
  • Be sure to keep the PR description updated to reflect all changes.
  • Write your PR title to summarize what this PR proposes.
  • If possible, provide a concise example to reproduce the issue for a faster review.
  • Read our contributor guidelines
  • If you're making a documentation change, see our guide to documentation contribution

Commit messages should follow the following format:

<One sentence description, usually Jira title or CHANGES.txt summary>

<Optional lengthier description (context on patch)>

patch by <Authors>; reviewed by <Reviewers> for CASSANDRA-#####

Co-authored-by: Name1 <email1>
Co-authored-by: Name2 <email2>

The Cassandra Jira

@bereng
Copy link
Contributor Author

bereng commented Jul 5, 2023

CI j8: All green except for 2 known flakies
CI j11: All green except for a known flaky. The upgrade failures are aligned to a trunk reference run I triggered as baseline

@bereng
Copy link
Contributor Author

bereng commented Jul 14, 2023

Note to self. Should add CHANGES.txt entry and notes on upgrades

@pkolaczk
Copy link
Contributor

pkolaczk commented Jul 17, 2023

What are the situations when we need both MFDA and LDT at the same time?
I feel that we need both concepts, but not at the same time.

When the cell is live, we don't know when it will be deleted, so why store LDT?
When the cell is dead, we don't need to store MFDA anymore because it was logically deleted already.
So if they are a proper union, then we can save up even more bytes.

What am I missing?

@jacek-lewandowski
Copy link
Contributor

jacek-lewandowski commented Jul 17, 2023

@pkolaczk

When the cell is live, we don't know when it will be deleted, so why store LDT?

we don't - we store only a flag that the cell is live

When the cell is dead, we don't need to store MFDA anymore because it was logically deleted already

I need to check, but if deletion clears MFDA, the second flag comes into play; otherwise maybe we should clear MFDA along with deletion

@bereng
Copy link
Contributor Author

bereng commented Jul 18, 2023

I don't think @pkolaczk is asking that but:

  • If a cell has been deleted why store ldt? mfda should be enough
  • if a cell has become dead why store the mdfa? ldt should be enough

I didn't know yesterday but then I remembered gc_seconds is a configurable parameter. It can change per table and through time so it needs to be recorded somehow. If that were the case we could store ldt as a delta in seconds from mfda and lift the 2106 limit... 🤔

@jacek-lewandowski
Copy link
Contributor

right, my comment was off

@bereng
Copy link
Contributor Author

bereng commented Jul 19, 2023

@pkolaczk I think the question you raised is legit. But it touches on one of the corner stones of the codebase. I am not comfortable touching that so close to a release. I think the change is big enough it deserves it's own ticket. Are you ok we address that separately?

@pkolaczk
Copy link
Contributor

pkolaczk commented Jul 19, 2023

Are you ok we address that separately?

@bereng sure, no problem. For the most common case - live cells the timestamps aren't stored at all so this is already a very good improvement.

@blambov
Copy link
Contributor

blambov commented Jul 20, 2023

AFAIK both local and markedFor times are needed and serve different purposes. They should, however, be close to each other normally (i.e. local should be ~markedFor/1000). How about, as an additional space saving measure, storing local - markedFor / 1000 as a signed vint instead of the four-byte int for local? (Note: the vint may become up to 6 bytes in odd cases, but that's something I'd happily trade for having 1 most of the time.)

I do not insist on this, saving the bytes for the LIVE encoding is a sufficiently good improvement.

@bereng
Copy link
Contributor Author

bereng commented Jul 21, 2023

@blambov thx for the suggestion. Yes a Vint to mfda, to some synthetic sstable epoch, storing gc_grace_sedonds instead etc. Are all alternatives I want to look at. But first I want to javadoc the whole OA sstable format for everyone's benefit and to see if there are more opportunities around as I don't know this code.

@bereng
Copy link
Contributor Author

bereng commented Jul 21, 2023

I have rebased and pushed a version whose only change is that we only use the sign bit to encode LIVE DTs. That allows us to support the full Long range and imposes no restrictions on the users. We can revisit this and use the full byte when we have a justification in the future, have a better opinion on how to improve the ldt serialization, etc.

Ci is green with only known flakies.
Jmh numbers still look very good.

Copy link
Contributor

@blambov blambov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

You will probably need to regenerate the legacy files after rebasing because of a new flag from CASSANDRA-18397.

@bereng
Copy link
Contributor Author

bereng commented Jul 24, 2023

@jacek-lewandowski @pkolaczk this is ready for (hopefully) final review. Then I'll rebase and re-run CI.

Copy link
Contributor

@jacek-lewandowski jacek-lewandowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@bereng
Copy link
Contributor Author

bereng commented Jul 25, 2023

Rebased to trunk. CI with only the expected failures after the jdk8 drop.

The resolution of 18676 will inform how to solve the corruption detection.

@bereng bereng force-pushed the ldtdeser-trunk branch 2 times, most recently from a1d4848 to 74ce260 Compare July 26, 2023 05:09
@bereng
Copy link
Contributor Author

bereng commented Jul 26, 2023

Added sstable corruption detection and CI is green

@bereng bereng force-pushed the ldtdeser-trunk branch 2 times, most recently from 568c28a to 73e680b Compare July 26, 2023 06:42
@bereng
Copy link
Contributor Author

bereng commented Jul 26, 2023

Switched to IOException and all is green. Failures are the j17 known ones.

@blambov
Copy link
Contributor

blambov commented Jul 26, 2023

LGTM

src/java/org/apache/cassandra/db/DeletionTime.java Outdated Show resolved Hide resolved
src/java/org/apache/cassandra/db/DeletionTime.java Outdated Show resolved Hide resolved
@bereng
Copy link
Contributor Author

bereng commented Jul 26, 2023

Squashed before commit.

@bereng
Copy link
Contributor Author

bereng commented Jul 26, 2023

Note on reviewers: @jakubzytka is OOO currently, knowing his only concern has been cleared, knowing him personally and after having checked with PMC I have been allowed to merge.

@bereng bereng closed this Jul 26, 2023
@bereng bereng deleted the ldtdeser-trunk branch July 26, 2023 11:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants