KAFKA-8717: Reuse cached offset metadata when reading from log #7081

Merged: 5 commits, Jul 30, 2019

hachikuji
Contributor

Although we currently cache offset metadata for the high watermark and last stable offset, we don't use it when reading from the log. Instead we always look it up from the index. This patch pushes fetch isolation into Log.read so that we are able to reuse the cached offset metadata.
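
To make the idea concrete, here is a minimal sketch of the difference between passing a numeric upper bound into the read path and passing the isolation level. It is a simplified model, not the code in this patch: `SketchLog`, `readUpperBoundByOffset`, `readUpperBoundByIsolation`, the `indexLookups` counter, and the shape of `OffsetMetadata` are all hypothetical names chosen for illustration, and the isolation names are assumptions as well.

```scala
// Simplified, hypothetical model of the idea; not the actual kafka.log.Log code.
final case class OffsetMetadata(messageOffset: Long, segmentBaseOffset: Long, relativePosition: Int)

// Assumed isolation levels: read to the log end, the high watermark, or the last stable offset.
sealed trait FetchIsolation
case object FetchLogEnd extends FetchIsolation
case object FetchHighWatermark extends FetchIsolation
case object FetchTxnCommitted extends FetchIsolation

final class SketchLog(
    logEnd: OffsetMetadata,
    highWatermark: OffsetMetadata,
    lastStableOffset: OffsetMetadata) {

  // Counts simulated index lookups so the two read paths can be compared.
  var indexLookups: Int = 0

  // Stand-in for translating an offset into a file position through the offset index.
  private def lookupFromIndex(offset: Long): OffsetMetadata = {
    indexLookups += 1
    OffsetMetadata(offset, 0L, (offset * 10).toInt)
  }

  // Caller passes only the numeric upper bound, so its position must be looked up again.
  def readUpperBoundByOffset(maxOffset: Long): OffsetMetadata =
    lookupFromIndex(maxOffset)

  // Caller passes the isolation level, so the log can return the metadata it already caches.
  def readUpperBoundByIsolation(isolation: FetchIsolation): OffsetMetadata =
    isolation match {
      case FetchLogEnd        => logEnd
      case FetchHighWatermark => highWatermark
      case FetchTxnCommitted  => lastStableOffset
    }
}
```

In this model the isolation-based path never touches the index for the upper bound, which is the one lookup the description refers to.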

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@ijuma
Contributor

ijuma commented Jul 15, 2019

Thanks for the PR. Concretely, does this mean we save one index lookup in the usual case of a consumer reading to the end of the log (HW or LSO)? Even though one would expect the index to be in the page cache, it is always good to do less work when possible.

@hachikuji
Contributor Author

hachikuji commented Jul 15, 2019

@ijuma Yes, that's right. This came up in the context of KIP-392 because we realized that, unlike the leader, the follower would always need one lookup to fetch the high watermark or LSO. However, we saw that the leader was already doing an extra lookup whenever it read from the log, so we thought we could eliminate that lookup and bring the follower fetch at least up to parity with the leader's current behavior.

To be perfectly honest, I am skeptical of the benefit of this offset metadata caching. I did some brief consumer performance testing with this patch and saw basically no difference. But it's a bit harder to say in a more general context with more partitions and more consumers. In any case, I thought this presented a nice opportunity to simplify some of the internal fetch APIs a little bit.

@hachikuji
Contributor Author

retest this please

Contributor

@junrao junrao left a comment

@hachikuji: Thanks for the PR. Great cleanup patch. A few comments below. Also, since this PR is kind of tricky, it's probably useful to have a JIRA to track it.

Resolved review threads (outdated):

  • core/src/main/scala/kafka/log/Log.scala (3 threads)
  • core/src/main/scala/kafka/cluster/Partition.scala (1 thread)

@hachikuji hachikuji changed the title from "MINOR: Reuse cached offset metadata when reading from log" to "KAFKA-8717: Reuse cached offset metadata when reading from log" on Jul 25, 2019
@hachikuji
Contributor Author

retest this please

Contributor

@junrao junrao left a comment

@hachikuji: Thanks for the PR. LGTM. Just a minor comment below.

@@ -2086,6 +2107,14 @@ class Log(@volatile var dir: File,
}
}

/**
* Get the largest log segment with a base offset less than the given offset, if one exists.
Contributor

less than => less than or equal to ?
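
The wording matters because "less than" and "less than or equal to" map to different lookups on a sorted map. The sketch below illustrates the distinction using `java.util.concurrent.ConcurrentSkipListMap`, assuming segments are keyed by base offset. The `SegmentLookupSketch` object, its helper names, and the string stand-ins for segments are hypothetical and are not taken from the patch.

```scala
import java.util.concurrent.ConcurrentSkipListMap

// Hypothetical stand-in: base offset -> segment label (a real log would hold segment objects).
object SegmentLookupSketch {
  val segments = new ConcurrentSkipListMap[java.lang.Long, String]()
  segments.put(0L, "segment-0")
  segments.put(100L, "segment-100")
  segments.put(200L, "segment-200")

  // floorEntry: greatest key less than OR EQUAL TO the given offset.
  def floorSegment(offset: Long): Option[String] =
    Option(segments.floorEntry(offset)).map(_.getValue)

  // lowerEntry: greatest key STRICTLY LESS THAN the given offset.
  def lowerSegment(offset: Long): Option[String] =
    Option(segments.lowerEntry(offset)).map(_.getValue)
}
```

The two helpers agree for an offset in the middle of a segment and differ only when the offset is exactly a segment's base offset, which is the case the doc comment needs to describe accurately.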

@hachikuji hachikuji merged commit a48b5d9 into apache:trunk Jul 30, 2019