KAFKA-10500: Thread Cache Resizes #9572

Merged
Merged 12 commits into apache:trunk on Nov 18, 2020

wcarlson5
Contributor

The thread cache can now be resized. This is a step toward being able to scale the number of stream threads.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@wcarlson5
Contributor Author

@cadonna Part 2

@wcarlson5 wcarlson5 mentioned this pull request Nov 9, 2020
Contributor

@cadonna cadonna left a comment

Thank you for the PR, @wcarlson5!

Here is my feedback.

Member

@mjsax mjsax left a comment

A few nits. Overall LGTM.

 }
-final long cacheSizePerThread = totalCacheSize / (numStreamThreads + (hasGlobalTopology ? 1 : 0));
+totalCacheSize = config.getLong(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG);
+final long cacheSizePerThread = totalCacheSize / (numStreamThreads + ((globalTaskTopology != null) ? 1 : 0));
Member

Why move away from using hasGlobalTopology?

Contributor Author

It was in a separate method without access to hasGlobalTopology. I suppose if it stays, we can move it back.

@@ -806,6 +803,13 @@ private KafkaStreams(final InternalTopologyBuilder internalTopologyBuilder,
        rocksDBMetricsRecordingService = maybeCreateRocksDBMetricsRecordingService(clientId, config);
    }

    private void resizeThreadCache(final int numStreamThreads) {
        final long cacheSizePreThread = totalCacheSize / (numStreamThreads + ((globalTaskTopology != null) ? 1 : 0));
Member

This seems to duplicate L733. It might be good to extract it into a small helper method.

Contributor Author

I did have it in a separate helper method, but when I removed the totalCacheSize < 0 check, @cadonna thought it would be more readable inline.

Member

Not sure why the totalCacheSize check is relevant for avoiding code duplication?

Contributor Author

I think it was about readability. I might be misremembering, though, as it was a conversation we had last week.

Contributor

If this line is duplicated, it should go in a method. When I proposed to move it inline, I was apparently not aware that the same line was used somewhere else.

Contributor Author

Moved to a new method. Glad we got that cleared up. LGTM?
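As a rough illustration of the refactoring agreed on above, the per-thread split can live in one helper that both the startup path and the resize path call. This is a minimal sketch, not the code merged in this PR: only the name getCacheSizePerThread comes from this review, while its parameters, the ResizableCache interface, and resizeThreadCaches are hypothetical.

```java
import java.util.List;

// Hypothetical sketch: only the name getCacheSizePerThread comes from the review;
// the signatures and the ResizableCache interface are assumptions.
final class CacheSizing {

    // Stand-in for any per-thread cache whose byte budget can be changed.
    interface ResizableCache {
        void resize(long newMaxBytes);
    }

    private CacheSizing() { }

    // Single home for the split: one share per stream thread, plus one more
    // share when a global thread exists.
    static long getCacheSizePerThread(final long totalCacheSize,
                                      final int numStreamThreads,
                                      final boolean hasGlobalTopology) {
        return totalCacheSize / (numStreamThreads + (hasGlobalTopology ? 1 : 0));
    }

    // Resize path: recompute the share and push it to every cache that owns one.
    static long resizeThreadCaches(final long totalCacheSize,
                                   final int numStreamThreads,
                                   final boolean hasGlobalTopology,
                                   final List<? extends ResizableCache> caches) {
        final long perThread = getCacheSizePerThread(totalCacheSize, numStreamThreads, hasGlobalTopology);
        for (final ResizableCache cache : caches) {
            cache.resize(perThread);
        }
        return perThread;
    }
}
```

With both call sites going through getCacheSizePerThread, the formula (including the extra share for a global thread) cannot drift between the constructor and the resize logic.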

Member

LGTM?

If this is a question, should it be LGTY? 😂

final CircularIterator<NamedCache> circularIterator = new CircularIterator<>(caches.values());
while (sizeBytes() > maxCacheSizeBytes) {
    if (!circularIterator.hasNext()) {
        log.error("Unable to remove any more entries as all caches are empty");
Member

Could this ever happen? If the max cache size is smaller than a single entry, wouldn't we evict that entry, so that the used cache size always shrinks to zero?
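The eviction loop quoted above can be sketched in isolation to see why it terminates. This hypothetical sketch is not Kafka's ThreadCache: it cycles through toy named caches with an index instead of CircularIterator, evicts the oldest-inserted entry (a simplification), and approximates an entry's size by its value length. With a non-negative target, each step either evicts an entry or skips an empty cache, so the loop ends; only a negative target, which a negative thread count could produce, would leave it spinning with every cache empty.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Kafka's ThreadCache: shrink a set of named caches
// by evicting entries round-robin until their combined size fits the budget.
final class RoundRobinShrink {

    // Toy named cache; entry size is approximated by the value length.
    static final class NamedCache {
        private final LinkedHashMap<String, byte[]> entries = new LinkedHashMap<>();
        private long sizeBytes;

        void put(final String key, final byte[] value) {
            final byte[] old = entries.put(key, value);
            if (old != null) {
                sizeBytes -= old.length;
            }
            sizeBytes += value.length;
        }

        boolean isEmpty() {
            return entries.isEmpty();
        }

        long sizeBytes() {
            return sizeBytes;
        }

        // Drop the oldest-inserted entry; callers check isEmpty() first.
        void evictOldest() {
            final Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
            sizeBytes -= it.next().getValue().length;
            it.remove();
        }
    }

    private RoundRobinShrink() { }

    static long totalBytes(final List<NamedCache> caches) {
        long total = 0;
        for (final NamedCache cache : caches) {
            total += cache.sizeBytes();
        }
        return total;
    }

    // Each step either evicts one entry or skips an empty cache. With a
    // non-negative target, total > target implies some cache is non-empty,
    // so the loop always terminates.
    static void shrinkTo(final long maxCacheSizeBytes, final List<NamedCache> caches) {
        if (maxCacheSizeBytes < 0) {
            // A negative target could never be met, even with every cache empty.
            throw new IllegalArgumentException("maxCacheSizeBytes must be >= 0");
        }
        int next = 0;
        while (totalBytes(caches) > maxCacheSizeBytes) {
            final NamedCache cache = caches.get(next);
            if (!cache.isEmpty()) {
                cache.evictOldest();
            }
            next = (next + 1) % caches.size();
        }
    }
}
```

Rejecting the impossible target up front replaces an "all caches are empty" error inside the loop with a single precondition check.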

Contributor Author

If we add a check to make sure the number of threads is positive, then probably not. I'll add that check and then remove this one.

Member

I see. I guess the misleading part was that this check was done inside the while loop.

Contributor Author

Yeah, in retrospect it was not very clear. Hopefully it's better this way now.

@mjsax mjsax added the streams label Nov 17, 2020
}

private void resizeThreadCache(final int numStreamThreads) {
    if (numStreamThreads < 0) {
Member

Can it be smaller than 0? Should the test be <= 0 or < 1 instead?

Contributor Author

It can be zero if you have a global thread, but since this is internal, the check might not be entirely necessary.

Member

mjsax Nov 17, 2020

Yes, it can be zero, but the check says < 0, so it would always evaluate to false?

And if we have zero threads, we should not resize the cache, as we might end up in an infinite loop? But we would only call this method if we "shrink", i.e., if the thread count grows, but it can never grow from negative to zero, right?
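To make the boundary in the numStreamThreads check concrete, here is a hypothetical validating variant of the per-thread calculation. It assumes, as noted above, that zero stream threads is legal only when a global thread also receives a share. A check that rejects only negative counts would otherwise let 0 stream threads without a global thread through to a division by zero, which on long operands throws ArithmeticException. The method name and the choice of IllegalArgumentException are assumptions.

```java
// Hypothetical validating variant of the per-thread split; the method name and
// the choice to throw IllegalArgumentException are assumptions.
final class ThreadCountCheck {

    private ThreadCountCheck() { }

    static long cacheSizePerThread(final long totalCacheSize,
                                   final int numStreamThreads,
                                   final boolean hasGlobalThread) {
        // A negative stream thread count is always a bug.
        if (numStreamThreads < 0) {
            throw new IllegalArgumentException("numStreamThreads must be >= 0, was " + numStreamThreads);
        }
        // Zero stream threads is fine only if a global thread takes a share;
        // otherwise the long division below would throw ArithmeticException.
        final int threadsWithCache = numStreamThreads + (hasGlobalThread ? 1 : 0);
        if (threadsWithCache == 0) {
            throw new IllegalArgumentException("no stream or global thread to give cache to");
        }
        return totalCacheSize / threadsWithCache;
    }
}
```

Under this sketch, 0 stream threads plus a global thread gives the whole budget to the global thread, while 0 threads of either kind fails fast with a descriptive message.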

Contributor Author

That is a good point. Maybe what we need to do is put a minimum size on the cache to limit how many stream threads an instance can have?

Member

mjsax Nov 17, 2020

Well, getCacheSizePerThread would eventually return zero (with a growing number of threads), which means that every put() into the cache would result in an immediate eviction. So I don't think we need to do anything for this corner case.
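The corner case described here can be reproduced with a toy byte-bounded LRU cache. In this hypothetical sketch (not Kafka's ThreadCache API), the budget comes from integer division, so a large enough thread count rounds it down to 0, and each put() of a non-empty value is evicted immediately instead of looping forever.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy cache, not Kafka's ThreadCache: a byte-bounded LRU map that
// evicts on put() whenever the total exceeds its budget.
final class TinyByteCache {

    // accessOrder = true keeps least-recently-used entries at the front.
    private final LinkedHashMap<String, byte[]> entries = new LinkedHashMap<>(16, 0.75f, true);
    private final long maxBytes;
    private long sizeBytes;
    private int evictions;

    TinyByteCache(final long maxBytes) {
        this.maxBytes = maxBytes;
    }

    void put(final String key, final byte[] value) {
        final byte[] old = entries.put(key, value);
        if (old != null) {
            sizeBytes -= old.length;
        }
        sizeBytes += value.length;
        // Evict from the LRU end until the budget holds; with maxBytes == 0 this
        // removes every non-empty entry, including the one just added.
        final Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (sizeBytes > maxBytes && it.hasNext()) {
            sizeBytes -= it.next().getValue().length;
            it.remove();
            evictions++;
        }
    }

    int size() {
        return entries.size();
    }

    int evictions() {
        return evictions;
    }
}
```

For instance, a 10-byte budget split across 11 threads gives 10 / 11 == 0 in long arithmetic, so new TinyByteCache(10L / 11) evicts on every put(). The loop still terminates because it stops once the iterator is exhausted.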

Contributor Author

That is a good point.

@mjsax mjsax merged commit d12fbb7 into apache:trunk Nov 18, 2020
@mjsax
Member

mjsax commented Nov 18, 2020

Thanks for the PR @wcarlson5. Merged to trunk.

@wcarlson5 wcarlson5 deleted the thread-cache-resizeable branch November 19, 2020 00:34
@mjsax mjsax added the kip Requires or implements a KIP label Jan 8, 2021
Labels
kip Requires or implements a KIP streams
3 participants