KAFKA-14491: [3/N] Add logical key value segments #13143

Merged

mjsax merged 5 commits into apache:trunk from vcrfxia:kip-889-logical-segments

Feb 4, 2023

Collaborator

vcrfxia commented Jan 21, 2023 •

edited

Today's KeyValueSegments create a new RocksDB instance for each KeyValueSegment. This PR introduces an analogous LogicalKeyValueSegments implementation, with corresponding LogicalKeyValueSegment, which shares a single physical RocksDB instance across all "logical" segments. This will be used for the RocksDB versioned store implementation proposed in KIP-889.
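To illustrate the idea, here is a minimal sketch, not the PR's actual code. A `TreeMap` stands in for the single physical RocksDB instance, and each logical segment prepends a segment-specific prefix to its keys. The class names (`SharedStoreSketch`, `LogicalSegmentSketch`) and the 8-byte big-endian prefix encoding are assumptions made for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.TreeMap;

// Hypothetical stand-in for the one physical store shared by all logical segments.
class SharedStoreSketch {
    // Keys sort in unsigned lexicographic byte order, like RocksDB's default comparator.
    final TreeMap<byte[], byte[]> data = new TreeMap<byte[], byte[]>(Arrays::compareUnsigned);
}

// Hypothetical logical segment: a view over the shared store, namespaced by a prefix.
class LogicalSegmentSketch {
    private final SharedStoreSketch store;
    private final byte[] prefix;

    LogicalSegmentSketch(final SharedStoreSketch store, final long segmentId) {
        this.store = store;
        // Assumed encoding: the segment id as 8 big-endian bytes.
        this.prefix = ByteBuffer.allocate(Long.BYTES).putLong(segmentId).array();
    }

    // Builds the physical key: segment prefix followed by the caller's key bytes.
    private byte[] prefixed(final String key) {
        final byte[] raw = key.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + raw.length).put(prefix).put(raw).array();
    }

    void put(final String key, final String value) {
        store.data.put(prefixed(key), value.getBytes(StandardCharsets.UTF_8));
    }

    String get(final String key) {
        final byte[] value = store.data.get(prefixed(key));
        return value == null ? null : new String(value, StandardCharsets.UTF_8);
    }
}
```

Two segments built on the same `SharedStoreSketch` can store the same key without colliding, because the physical keys differ by prefix.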

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

vcrfxia mentioned this pull request

KAFKA-14491: [2/N] Refactor RocksDB store open iterator management #13142

Merged

3 tasks

mjsax added streams kip labels

mjsax reviewed

streams/src/main/java/org/apache/kafka/streams/state/internals/LogicalKeyValueSegment.java Outdated

    
                      }

                      open = false;

                      closeOpenIterators();

Member

mjsax Jan 24, 2023

If the store was never open, it seems it's still safe to call closeOpenIterators and it should just be an empty list? -- Could we inline the code into close() directly?

Collaborator Author

vcrfxia Jan 24, 2023

Sure, I don't feel strongly so I made the change. Besides guarding against closing a segment which was never opened, the usage of open also guarded against closing the same segment twice. I've inlined closeOpenIterators() and accounted for this by clearing openIterators after it's copied.
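The copy-then-clear approach described above can be sketched as follows. This is an illustration under assumptions, not the merged code: `SegmentCloseSketch`, `TrackedIterator`, and `register` are hypothetical names, and the real class tracks RocksDB-backed iterators.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical iterator that records how often it was closed.
class TrackedIterator {
    int closeCount = 0;

    void close() {
        closeCount++;
    }
}

// Hypothetical segment that tracks its open iterators without an `open` flag.
class SegmentCloseSketch {
    private final Set<TrackedIterator> openIterators = new HashSet<>();

    synchronized void register(final TrackedIterator iterator) {
        openIterators.add(iterator);
    }

    synchronized void close() {
        // Copy first so the loop does not iterate over a set that may change,
        // then clear the original so a second close() finds nothing to do.
        final Set<TrackedIterator> toClose = new HashSet<>(openIterators);
        openIterators.clear();
        for (final TrackedIterator iterator : toClose) {
            iterator.close();
        }
    }
}
```

With this shape, closing a segment that never had iterators is a no-op, and closing the same segment twice closes each iterator only once.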

streams/src/main/java/org/apache/kafka/streams/state/internals/LogicalKeyValueSegment.java

    
                      }

                      Bytes getPrefix() {

                          return Bytes.wrap(prefix);

Member

mjsax Jan 24, 2023

How large is the overhead of calling wrap()? (Besides creating a new object, what does it do?)

We pass in Bytes prefix in the constructor, and it seems that if we kept a reference, we could just return it (without the need to unwrap in the constructor and re-wrap here)?

Collaborator Author

vcrfxia Jan 24, 2023

wrap() just creates the new object (after performing a null check) so it's very lightweight.

It's more convenient to keep prefix as byte[] than Bytes because all the other operations require byte[] rather than Bytes. If we really wanted we could keep both (one copy as byte[] and another as Bytes) but that feels like overkill.

vcrfxia commented

Collaborator Author

vcrfxia left a comment •

edited

Thanks @mjsax for the speedy review! Addressed your comments in the latest commit.

Member

mjsax commented Jan 26, 2023

Seems some of your newly added tests fail. Can you have a look?

Collaborator Author

vcrfxia commented Jan 26, 2023

Thanks. Needed to update the test for the latest changes which now set isOpen = true always. Fixed now.

Member

mjsax commented Jan 27, 2023

Checkstyle error:

> Task :streams:checkstyleTest

[2023-01-26T23:03:19.137Z] [ant:checkstyle] [ERROR] /home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-13143/streams/src/test/java/org/apache/kafka/streams/state/internals/LogicalKeyValueSegmentsTest.java:23:15: Unused import - org.junit.Assert.assertFalse. [UnusedImports]

Member

mjsax commented Jan 31, 2023

Merged the other PR -- can you rebase this one?


          add logical segments implementation

9aff136

vcrfxia force-pushed the kip-889-logical-segments branch from 82f47a6 to 9aff136 Compare

January 31, 2023 16:30

mjsax reviewed

Member

mjsax left a comment

Just some minor follow up on tests.

streams/src/test/java/org/apache/kafka/streams/state/internals/LogicalKeyValueSegmentTest.java Outdated

+                      segment2.put(new Bytes(kv0.key.getBytes(UTF_8)), kv0.value.getBytes(UTF_8));
+                      segment2.put(new Bytes(kv1.key.getBytes(UTF_8)), kv1.value.getBytes(UTF_8));
+                      assertEquals("a", getAndDeserialize(segment1, "1"));

Member

mjsax Feb 1, 2023

Should we also get on the physical store to see if the logic works as expected? (Also for other tests)

Collaborator Author

vcrfxia Feb 2, 2023

I was on the fence about this because it requires testing the internals of the class (i.e., specifically how the segment prefixes are serialized) rather than just the public-facing methods. In the end I opted to test indirectly instead, by inserting the same keys into different segments and checking that their values do not collide.

If you prefer checking the contents of the physical store itself, I can make the update.

Member

mjsax Feb 3, 2023

I see your point, but the test does not really achieve this, as we put the same data into both segments? To test "segment isolation" we would need to put 4 different records (2 per segment) and test both positive (put on s1 allows us to get on s1) and negative (put on s1 does not allow get on s2 to see the data)?

Might apply to other tests, too?

Collaborator Author

vcrfxia Feb 3, 2023

Ah good point. That's definitely a gap in shouldPut() and shouldPutAll(). All of the other tests are already set up in a way that they fail if segments are not properly isolated from each other. Just pushed a fix to the two tests which didn't ensure that, and some minor cleanup to a few of the other tests.
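The isolation check suggested in this thread could look roughly like the sketch below: four distinct records, two per segment, with both positive and negative assertions. It is not the test that was pushed. `IsolatedSegmentSketch` and `SegmentIsolationCheck` are hypothetical, and a `HashMap` with string prefixes stands in for the shared physical store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical segment stand-in: namespaces keys in a shared map with a string prefix.
class IsolatedSegmentSketch {
    private final Map<String, String> shared;
    private final String prefix;

    IsolatedSegmentSketch(final Map<String, String> shared, final String prefix) {
        this.shared = shared;
        this.prefix = prefix;
    }

    void put(final String key, final String value) {
        shared.put(prefix + key, value);
    }

    String get(final String key) {
        return shared.get(prefix + key);
    }
}

class SegmentIsolationCheck {
    static void run() {
        final Map<String, String> physical = new HashMap<>();
        final IsolatedSegmentSketch segment1 = new IsolatedSegmentSketch(physical, "s1:");
        final IsolatedSegmentSketch segment2 = new IsolatedSegmentSketch(physical, "s2:");

        // Four different records, two per segment.
        segment1.put("1", "a");
        segment1.put("2", "b");
        segment2.put("3", "c");
        segment2.put("4", "d");

        // Positive: each segment reads back what was put into it.
        check("a".equals(segment1.get("1")), "segment1 should return its own record");
        check("d".equals(segment2.get("4")), "segment2 should return its own record");

        // Negative: a record put into one segment is invisible to the other.
        check(segment2.get("1") == null, "segment2 must not see segment1's record");
        check(segment1.get("3") == null, "segment1 must not see segment2's record");
    }

    private static void check(final boolean condition, final String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }
}
```

Using distinct keys matters: if both segments hold the same keys, a positive-only assertion cannot tell a missing prefix from a working one when the values also match.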

streams/src/test/java/org/apache/kafka/streams/state/internals/LogicalKeyValueSegmentTest.java Outdated

+                      expectedContents.add(kv0);
+                      expectedContents.add(kv1);
+                      try (final KeyValueIterator<Bytes, byte[]> iterator = segment1.range(null, new Bytes(STRING_SERIALIZER.serialize(null, "1")))) {

Member

mjsax Feb 1, 2023

Should we test different ranges? All combinations of null/not-null lower and upper bounds?

Collaborator Author

vcrfxia Feb 2, 2023

Heh, this additional test coverage caught a bug. Pushed a fix in the latest commit.
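The thread does not say what the bug was, so the following is only a sketch of why null bounds need care when segments share one physical store: an unbounded scan must still be clamped to the segment's own prefix. The names, the one-byte prefix, and the inclusive upper bound are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical shared store with prefix-aware range scans.
class PrefixedRangeSketch {
    // Unsigned lexicographic ordering, so all keys of one prefix are contiguous.
    private final TreeMap<byte[], byte[]> physical =
        new TreeMap<byte[], byte[]>(Arrays::compareUnsigned);

    private static byte[] prefixed(final byte prefix, final String key) {
        final byte[] raw = key.getBytes(StandardCharsets.UTF_8);
        final byte[] out = new byte[1 + raw.length];
        out[0] = prefix;
        System.arraycopy(raw, 0, out, 1, raw.length);
        return out;
    }

    void put(final byte prefix, final String key, final String value) {
        physical.put(prefixed(prefix, key), value.getBytes(StandardCharsets.UTF_8));
    }

    // Returns values for keys in [from, to] within one segment; null means unbounded.
    List<String> range(final byte prefix, final String from, final String to) {
        // A null lower bound starts at the bare prefix, the smallest key of the segment.
        final byte[] lower = prefixed(prefix, from == null ? "" : from);
        final NavigableMap<byte[], byte[]> view;
        if (to == null) {
            // A null upper bound stops just before the next prefix (exclusive).
            // Assumes the prefix byte is not 0xFF, so incrementing does not wrap.
            final byte[] nextPrefix = {(byte) (prefix + 1)};
            view = physical.subMap(lower, true, nextPrefix, false);
        } else {
            view = physical.subMap(lower, true, prefixed(prefix, to), true);
        }
        final List<String> values = new ArrayList<>();
        for (final byte[] value : view.values()) {
            values.add(new String(value, StandardCharsets.UTF_8));
        }
        return values;
    }
}
```

Covering all four null/not-null combinations, as suggested in the review, exercises both branches of the upper-bound handling as well as the lower-bound default.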

streams/src/test/java/org/apache/kafka/streams/state/internals/LogicalKeyValueSegmentsTest.java Outdated

+                  @Test
+                  public void shouldCreateSegments() {
+                      final LogicalKeyValueSegment segment1 = segments.getOrCreateSegmentIfLive(0, context, -1L);

Member

mjsax Feb 1, 2023

Should we call getOrCreateSegment instead? Otherwise we mainly test the logic of AbstractSegments?

Collaborator Author

vcrfxia Feb 2, 2023

See above.

streams/src/test/java/org/apache/kafka/streams/state/internals/LogicalKeyValueSegmentsTest.java

+                  }
+                  @Test
+                  public void shouldCleanupSegmentsThatHaveExpired() {

Member

mjsax Feb 1, 2023

Sounds like we test AbstractSegments logic here -- do we need to do this?

Collaborator Author

vcrfxia Feb 2, 2023

You're right that these tests are testing logic from AbstractSegments and not anything specific about LogicalKeyValueSegments. The thing is, AbstractSegments doesn't have its own test file at the moment (I assume because it's abstract). If you think it's worth it, I can remove these tests from here and also from KeyValueSegmentsTest.java, and create a dummy AbstractSegments implementation to add an AbstractSegmentsTest.java. I'd like to do that as a follow-up PR instead of as part of this change, though.

(Also, for this specific test, I would like to have it here because I plan to refactor the cleanup logic in AbstractSegments in a follow-up PR. The current approach (cleanup as part of getOrCreateSegmentIfLive()) is not very efficient for the versioned store use case because this method is called multiple times during a single put operation. It will be better to only perform cleanup once per put.)
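To make the efficiency concern concrete, here is one possible shape of such a refactor, sketched under assumptions. It is not what the follow-up PR does. `SegmentCleanupSketch`, its fields, and the segment-id arithmetic are hypothetical and only loosely modeled on the behavior described in this comment.

```java
import java.util.TreeMap;

// Hypothetical segment registry where lookup and cleanup are separate steps.
class SegmentCleanupSketch {
    private final TreeMap<Long, String> segments = new TreeMap<>();
    private final long segmentInterval;
    private final long retentionPeriod;
    int cleanupRuns = 0;

    SegmentCleanupSketch(final long segmentInterval, final long retentionPeriod) {
        this.segmentInterval = segmentInterval;
        this.retentionPeriod = retentionPeriod;
    }

    private long minLiveSegmentId(final long streamTime) {
        return (streamTime - retentionPeriod) / segmentInterval;
    }

    // Returns null for an expired segment, but does not trigger cleanup itself.
    String getOrCreateSegmentIfLive(final long timestamp, final long streamTime) {
        final long segmentId = timestamp / segmentInterval;
        if (segmentId < minLiveSegmentId(streamTime)) {
            return null;
        }
        return segments.computeIfAbsent(segmentId, id -> "segment-" + id);
    }

    // Drops every segment older than the retention window.
    void cleanupExpiredSegments(final long streamTime) {
        cleanupRuns++;
        segments.headMap(minLiveSegmentId(streamTime), false).clear();
    }

    // A single put may look up several segments, yet cleans up only once.
    void put(final long streamTime, final long... timestamps) {
        for (final long timestamp : timestamps) {
            getOrCreateSegmentIfLive(timestamp, streamTime);
        }
        cleanupExpiredSegments(streamTime);
    }

    int liveSegmentCount() {
        return segments.size();
    }
}
```

The design choice is simply to move the side effect out of the lookup path, so that the number of cleanup passes tracks the number of puts rather than the number of segment lookups.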

Member

mjsax Feb 3, 2023

If you think it's worth it, I can remove these tests from here and also from KeyValueSegmentsTest.java, and create a dummy AbstractSegments implementation to add an AbstractSegmentsTest.java. I'd like to do that as a follow-up PR instead of as part of this change, though.

Sounds cleaner to me. And yes, follow up PR is preferable.

streams/src/test/java/org/apache/kafka/streams/state/internals/LogicalKeyValueSegmentsTest.java

+                  }
+                  @Test
+                  public void shouldGetSegmentForTimestamp() {

Member

mjsax Feb 1, 2023

As above

streams/src/test/java/org/apache/kafka/streams/state/internals/LogicalKeyValueSegmentsTest.java

+                  }
+                  @Test
+                  public void shouldGetSegmentsWithinTimeRange() {

Member

mjsax Feb 1, 2023

As above

vcrfxia commented

Collaborator Author

vcrfxia left a comment

Thanks for your review, @mjsax ! Responded to your comments inline. Will push a commit with the latest changes soon.

vcrfxia added 2 commits

February 1, 2023 20:04


          review feedback

af961e3


          fix bug with null bounds in range

bbc64be

This was referenced Feb 2, 2023

KAFKA-14491: [5/N] Basic operations for RocksDB versioned store #13188

Merged

KAFKA-14491: [6/N] Support restoring RocksDB versioned store from changelog #13189

Merged

vcrfxia added 2 commits

February 3, 2023 10:59


          unit test changes

069a6e0


          extra unit test fix

fb51c2a

mjsax merged commit 4a7fedd into apache:trunk
