
KAFKA-14481: Move LogSegment/LogSegments to storage module #14529

Merged
merged 7 commits into apache:trunk from ijuma:log-segment-java on Oct 16, 2023

Conversation

@ijuma
Contributor

@ijuma ijuma commented Oct 11, 2023

A few notes:

  • Delete a few methods from `UnifiedLog` that were simply invoking the related method in `LogFileUtils`
  • Fix `CoreUtils.swallow` to use the passed-in `logging`
  • Fix `LogCleanerParameterizedIntegrationTest` to close `log` before reopening
  • Minor tweaks in `LogSegment` for readability

For broader context on this change, please check:

  • KAFKA-14470: Move log layer to storage module

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@ijuma ijuma force-pushed the log-segment-java branch 2 times, most recently from 32ac0c1 to db75c14 Compare October 11, 2023 14:24
-        LogSegmentData segmentData = new LogSegmentData(logFile.toPath(), toPathIfExists(segment.lazyOffsetIndex().get().file()),
-                toPathIfExists(segment.lazyTimeIndex().get().file()), Optional.ofNullable(toPathIfExists(segment.txnIndex().file())),
+        LogSegmentData segmentData = new LogSegmentData(logFile.toPath(), toPathIfExists(segment.offsetIndex().file()),
+                toPathIfExists(segment.timeIndex().file()), Optional.ofNullable(toPathIfExists(segment.txnIndex().file())),
Contributor Author

@satishd Is it intentional that we force the indexes to be materialized here? We could pass the file without materializing if that's better.

Contributor

This metadata is passed to the RSM plugin, which is external to Kafka. I would like to hide the details (that we have a lazily materialized index) from the external RSM plugin and instead have a clean contract which states: "Kafka guarantees that these files will be present; RSM can pick them up and upload them". This provides a clean decoupling where Kafka <-> RSM plugin state sharing happens only via files.
That is why we need to materialize the indexes before giving them to the RSM plugin.

Contributor Author

There is a cost to materializing, which is why we should be sure it's needed. If that's the case here, then we're all good.

Contributor

I have been thinking about this. I take back my original statement.

We want to ensure that the file we are passing to the RSM plugin contains all the data present in the mapped byte buffer, i.e. we should have flushed the buffer to the file using force(). In Kafka, when we close a segment, indexes are flushed asynchronously. Hence, it is possible that when we pass the file to RSM, the file doesn't yet contain the flushed data. This is a bug, but it is not related to the change we are trying to make here. I will create a separate JIRA for this.

RSM doesn't need to read the index file during archiving, hence it's OK to pass just the file without materializing the index into the in-memory mmapped buffers.

In short, we should flush() the content of the indexes into the file before this operation, but it is not necessary to materialize them (i.e. read the content of the file into memory).

@satishd we require your opinion here.

Contributor Author

Perhaps we can file the JIRA and discuss it there.

Member

@ijuma, good catch! AFAIK, there is no need to load the lazyIndex to write the contents of the index files to remote storage.

@Divij Right, we do not need to materialize the indexes for RSM to write them to the remote storage. Whenever a log segment is rolled over, the segment and its indexes are flushed to disk in an asynchronous manner. As the indexes are mmapped, any file reads fetch from the page cache, which will be consistent with whatever was written to memory. We can explore whether the flush is really needed. RLM has access to the segment, which can be flushed before the files are passed to RSM.

Filed https://issues.apache.org/jira/browse/KAFKA-15612 for follow-up discussion.
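
To make the flush-versus-materialize distinction in this thread concrete, here is a minimal, editor-added sketch (plain Java, not Kafka code; the file name and size are arbitrary): mapping the file is the "materialization" cost under discussion, while force() is the flush that pushes in-memory writes out to the file before it is handed to an external component such as the RSM plugin.

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapFlushDemo {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("demo-index.tmp", "rw");
             FileChannel channel = raf.getChannel()) {
            // Materialize: map the file into memory. This is the cost the
            // reviewers want to avoid when only the file path is needed.
            MappedByteBuffer mmap = channel.map(FileChannel.MapMode.READ_WRITE, 0, 1024);
            mmap.putLong(0, 42L); // the write lands in the page cache
            // Flush: force dirty pages to the underlying file so an external
            // reader of the file sees the in-memory writes.
            mmap.force();
        }
    }
}
```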

@@ -1945,26 +1947,17 @@ object UnifiedLog extends Logging {
logOffsetsListener)
}

def logFile(dir: File, offset: Long, suffix: String = ""): File = LogFileUtils.logFile(dir, offset, suffix)
Contributor Author

This and other deleted methods were simply passing through to LogFileUtils and hence did not add enough value to retain.

case Level.WARN => logging.warn(e.getMessage, e)
case Level.INFO => logging.info(e.getMessage, e)
case Level.DEBUG => logging.debug(e.getMessage, e)
case Level.TRACE => logging.trace(e.getMessage, e)
Contributor Author

Fixed a bug where we were not using the passed-in logging.
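
For reference, a sketch of the fixed behavior; the method shape is assumed from the call sites in this PR (e.g. Utils.swallow(LOGGER, Level.WARN, "maybeAppend", ...)), and the point is that every branch logs through the caller-supplied logging instance rather than a static logger:

```java
import org.slf4j.Logger;
import org.slf4j.event.Level;

public final class SwallowSketch {
    // Runs `code`, swallowing any Throwable and logging it at the requested
    // level on the passed-in logger (the bug was logging elsewhere).
    public static void swallow(Logger logging, Level level, String name, Runnable code) {
        try {
            code.run();
        } catch (Throwable e) {
            switch (level) {
                case ERROR: logging.error(e.getMessage(), e); break;
                case WARN:  logging.warn(e.getMessage(), e);  break;
                case INFO:  logging.info(e.getMessage(), e);  break;
                case DEBUG: logging.debug(e.getMessage(), e); break;
                case TRACE: logging.trace(e.getMessage(), e); break;
            }
        }
    }
}
```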

Contributor

do we want to add a test (perhaps for one of the functions that are using this utility) which could have caught this? Could be done as a separate JIRA.

Contributor Author

I vote for separate JIRA.

Contributor

separate JIRA is good. Could be a great newbie task.

@ijuma ijuma force-pushed the log-segment-java branch 6 times, most recently from ac459c9 to 1600c76 Compare October 12, 2023 07:24
return lazyTimeIndex.get();
}

public File timeIndexFile() {
Contributor

We are leaking an implementation detail of the index here (the fact that it is backed by a file). IMO, this should not be a public method. If someone wants to access the underlying index file, they should ask the Index for it, i.e. use LogSegment.timeIndex().file(), rather than asking the LogSegment directly.

Contributor Author

@ijuma ijuma Oct 12, 2023

It's public because it needs to be a public method (this is an internal class though). What you suggested doesn't work because it forces materialization and it would be a serious regression.

Contributor

We are leaking the fact that the index is implemented lazily outside the segment.

Does the caller know that the File returned by timeIndexFile() may not be consistent with the in-memory state of the index? We are putting the responsibility on the caller to ensure that it calls flush() if it requires consistency. This means that we are leaking the internal implementation of the index (being lazy) to the caller. This is concerning because it can cause bugs where authors using this method in other parts of the code may not realize that it could be eventually inconsistent.

If we really want to provide an index reference which doesn't require materialization, why not share the LazyIndex with the caller, so the caller explicitly knows that this index is lazily evaluated?

Also, where are we using this function outside this class? Should this be private?

Contributor Author

We are not leaking anything that is not already exposed. It would be different if it were not already accessible via the overall interface. The discussion here is not whether this is exposed (it already is), but how to expose it.

Contributor

We are not leaking anything that is not already exposed. It would be different if it were not already accessible via the overall interface. The discussion here is not whether this is exposed (it already is), but how to expose it.

Fair point. Let's tackle the how question in this PR and take up plugging the underlying file-implementation leak separately later.

On the how part, can we make the index or LazyIndex the components that choose to expose internal implementation details (instead of LogSegment)? That means we would probably have a function here in LogSegment which returns a LazyIndex.

Thoughts?

Contributor Author

LazyIndex exposes a larger surface area, though, and it's confusing to expose both LazyIndex and the materialized index. The way it is now is actually simpler: local indexes are file-based, and that's core to how they work. LogSegment gives you the file if that's all you need, or the materialized index. LazyIndex is used internally to simplify the implementation.
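
To illustrate the design being described, an editor-added, hypothetical sketch of the lazy-index pattern (names and structure are illustrative, not Kafka's actual LazyIndex): the backing file is always available cheaply, while the index itself is only materialized on first get(). LogSegment can then expose both a file accessor and a materialized-index accessor without callers ever touching the lazy wrapper.

```java
import java.io.File;
import java.util.function.Supplier;

public final class LazyIndexSketch<T> {
    private final File file;          // known up front, no I/O required
    private final Supplier<T> loader; // mmaps/loads the index on demand
    private volatile T index;         // materialized lazily

    public LazyIndexSketch(File file, Supplier<T> loader) {
        this.file = file;
        this.loader = loader;
    }

    // Cheap: hands out the backing file without materializing the index.
    public File file() {
        return file;
    }

    // Potentially expensive on first call: materializes the index.
    public T get() {
        T result = index;
        if (result == null) {
            synchronized (this) {
                result = index;
                if (result == null)
                    index = result = loader.get();
            }
        }
        return result;
    }
}
```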

Contributor

OK, could we add a javadoc on this method noting that the physical file pointed to by this File object may not be consistent with the in-memory copy of the index? It doesn't totally address my concern about a potential bug when someone uses this File and assumes that it represents a consistent view of the index, but I am happy with a javadoc for now. The whole index-implemented-by-a-file thing needs to be hidden away, but that belongs in a separate JIRA.

Contributor Author

@ijuma ijuma Oct 13, 2023

The indexes are memory mapped, so in theory it should be consistent for reads that go through the page cache.

Contributor Author

One last clarification: I am not opposed to improving the overall modularity of the classes in the storage layer. As you said, in a separate PR/discussion.

@ijuma
Contributor Author

ijuma commented Oct 12, 2023

Thanks for the prompt review @divijvaidya! Please note that this is still in draft because there are still failing tests. Since it was still in draft mode, I have been force pushing and so on. In the future, please let me know if you are reviewing a draft PR so I can avoid force pushes (which are painful for you as the reviewer).

TestUtils.waitUntilTrue(() => log.logStartOffset == endOffset,
"Timed out waiting for deletion of old segments")
assertEquals(1, log.numberOfSegments)

cleaner.shutdown()
closeLog(log)
Contributor Author

I noticed we did not close the log before reopening it a bit later. It didn't seem to cause any problems, but it made it more difficult to debug test failures (since the behavior is not clearly defined in this case).

Contributor

Good catch!

* The first time this is invoked, it will result in a time index lookup (including potential materialization of
* the time index).
*/
public TimestampOffset readMaxTimestampAndOffsetSoFar() throws IOException {
Contributor Author

@ijuma ijuma Oct 12, 2023

I added a read prefix to this to make it clearer that it does more than read the field maxTimestampAndOffsetSoFar. The conversion from Scala to Java had originally caused some code to use the field instead of the method, which led to subtly different behavior in some cases.
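
An abridged sketch of the resulting pattern, in the style of the surrounding excerpts (simplified; lastEntry() is assumed here to be the time index's accessor for its final entry): the read prefix signals that the first call can trigger index work rather than a plain field read.

```java
private volatile TimestampOffset maxTimestampAndOffsetSoFar = TimestampOffset.UNKNOWN;

// The first call may materialize the time index to look up its last entry;
// later calls return the cached field.
public TimestampOffset readMaxTimestampAndOffsetSoFar() throws IOException {
    if (maxTimestampAndOffsetSoFar == TimestampOffset.UNKNOWN)
        maxTimestampAndOffsetSoFar = timeIndex().lastEntry();
    return maxTimestampAndOffsetSoFar;
}
```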

@ijuma
Contributor Author

ijuma commented Oct 12, 2023

I believe the tests should pass this time, let's see.

@ijuma ijuma marked this pull request as ready for review October 12, 2023 16:19
@ijuma ijuma requested a review from satishd October 12, 2023 16:20
@ijuma
Contributor Author

ijuma commented Oct 12, 2023

The Java 11 build has 3 unrelated failures:

Build / JDK 11 and Scala 2.13 / org.apache.kafka.common.network.SslTransportLayerTest.[3] tlsProtocol=TLSv1.3, useInlinePem=false
Build / JDK 11 and Scala 2.13 / org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldAddNamedTopologyToRunningApplicationWithSingleInitialNamedTopology()
Build / JDK 11 and Scala 2.13 / org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest.testFallbackPriorTaskAssignorLargePartitionCount

This is ready for review.

@ijuma
Contributor Author

ijuma commented Oct 13, 2023

Test failures:

Build / JDK 17 and Scala 2.13 / kafka.api.PlaintextConsumerTest.testSubsequentPatternSubscription()
Build / JDK 21 and Scala 2.13 / kafka.api.AuthorizerIntegrationTest.testAuthorizeByResourceTypePrefixedResourceDenyDominate(String).quorum=zk
Build / JDK 21 and Scala 2.13 / kafka.api.ConsumerBounceTest.testCloseDuringRebalance()
Build / JDK 21 and Scala 2.13 / kafka.api.DelegationTokenEndToEndAuthorizationWithOwnerTest.testCreateTokenForOtherUserFails(String).quorum=kraft
Build / JDK 21 and Scala 2.13 / org.apache.kafka.trogdor.coordinator.CoordinatorTest.testTaskRequestWithOldStartMsGetsUpdated()
Build / JDK 11 and Scala 2.13 / integration.kafka.server.FetchFromFollowerIntegrationTest.testRackAwareRangeAssignor()
Build / JDK 11 and Scala 2.13 / org.apache.kafka.tools.MetadataQuorumCommandTest.[1] Type=Raft-Combined, Name=testDescribeQuorumReplicationSuccessful, MetadataVersion=3.7-IV0, Security=PLAINTEXT

Java 8 passed and the failures look unrelated, but I kicked off another build to get more signal.

@satishd
Member

satishd commented Oct 13, 2023

@ijuma Thanks for the PR. I will review it tomorrow.

Contributor

@divijvaidya divijvaidya left a comment

Thank you for patiently answering my comments, Ismael. This looks good to me.

@ijuma
Contributor Author

ijuma commented Oct 13, 2023

Thank you for patiently answering my comments

Thanks for the review and for paying close attention to code quality (modularity, readability, etc.) - it's important!

@ijuma
Contributor Author

ijuma commented Oct 13, 2023

@ijuma Thanks for the PR. I will review it tomorrow.

Thanks @satishd. I won't merge until Tuesday to give you a chance to review.

Member

@satishd satishd left a comment

Thanks @ijuma for the PR covering the refactoring and the cleanup. LGTM.

else if (e instanceof RuntimeException)
throw (RuntimeException) e;
else
throw new IllegalStateException("Unexpected exception thrown: " + e, e);
Member

Good change to maintain the semantics while moving to Java.

public void close() throws IOException {
if (maxTimestampAndOffsetSoFar != TimestampOffset.UNKNOWN)
Utils.swallow(LOGGER, Level.WARN, "maybeAppend", () -> timeIndex().maybeAppend(maxTimestampSoFar(), offsetOfMaxTimestampSoFar(), true));
Utils.closeQuietly(lazyOffsetIndex, "offsetIndex", LOGGER);
Member

Nice to use closeQuietly here instead of wrapping the close in swallow.
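
For context, a sketch of what a closeQuietly-style helper does; the shape is assumed from the call site above, not copied from Kafka's Utils: any exception from close() is logged and suppressed, so one index failing to close cannot prevent the others from closing.

```java
import org.slf4j.Logger;

public final class CloseQuietlySketch {
    // Closes a resource, logging rather than propagating any failure.
    public static void closeQuietly(AutoCloseable closeable, String name, Logger log) {
        if (closeable != null) {
            try {
                closeable.close();
            } catch (Throwable t) {
                log.warn("Failed to close {}", name, t);
            }
        }
    }
}
```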

@ijuma ijuma merged commit 1073d43 into apache:trunk Oct 16, 2023
1 check was pending
@ijuma ijuma deleted the log-segment-java branch October 16, 2023 13:37
AnatolyPopov pushed a commit to aiven/kafka that referenced this pull request Feb 16, 2024
KAFKA-14481: Move LogSegment/LogSegments to storage module (apache#14529)

A few notes:
* Delete a few methods from `UnifiedLog` that were simply invoking the related method in `LogFileUtils`
* Fix `CoreUtils.swallow` to use the passed in `logging`
* Fix `LogCleanerParameterizedIntegrationTest` to close `log` before reopening
* Minor tweaks in `LogSegment` for readability
 
For broader context on this change, please check:

* KAFKA-14470: Move log layer to storage module

Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>