KAFKA-14414: Remove unnecessary usage of ObjectSerializationCache #12890
Conversation
@clolov please review when you get a chance.
@mimaison please take a look when you get a chance!
@@ -75,7 +77,17 @@ public void write(ByteBuffer buffer, ObjectSerializationCache serializationCache
     }

     public int size(ObjectSerializationCache serializationCache) {
-        return data.size(serializationCache, headerVersion);
+        if (this.size == SIZE_NOT_INITIALIZED) {
I think this method does not need to be public anymore; callers from other packages use the new overload.
Also, on the golden path size() -> size(ObjectSerializationCache serializationCache) we do the equality check twice. If we inline it, does it make a difference?
- I changed the access modifier of this method to package-private (default). It is used by tests, hence I am currently not changing it to private.
- I have made similar changes to ResponseHeader as well, to prevent incorrect future usage of its size() method that would cause similar performance problems.
- Added Javadoc, primarily meant for contributors, to ensure they use the correct intended method.
- In this refactor, the size(ObjectSerializationCache) method calculates the size every time (instead of using the cached value). This removes the double equality check.
- Also added some unit tests.
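The lazy-caching pattern being discussed can be sketched as follows. This is a hypothetical simplification, not the actual Kafka code: the class name, the stand-in `serialized` field, and `computeSize()` are assumptions; only the `SIZE_NOT_INITIALIZED` sentinel and the cached/uncached split mirror the diff above.

```java
// Hypothetical sketch of the lazy size-caching pattern; names other than
// SIZE_NOT_INITIALIZED are illustrative, not the real RequestHeader fields.
public class CachedSizeHeader {
    private static final int SIZE_NOT_INITIALIZED = -1;

    private final byte[] serialized; // stand-in for the parsed header bytes
    private int size = SIZE_NOT_INITIALIZED;

    public CachedSizeHeader(byte[] serialized) {
        this.serialized = serialized;
    }

    // Cached accessor: computes the size at most once, then reuses it.
    public int size() {
        if (size == SIZE_NOT_INITIALIZED) {
            size = computeSize();
        }
        return size;
    }

    // Recomputes every time; kept package-private so callers from other
    // packages go through the cached size() overload instead.
    int computeSize() {
        return serialized.length;
    }
}
```

This also illustrates the reviewer's point about the double equality check: with the recompute logic kept out of `size()`, the sentinel comparison happens only once per call.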
@@ -110,7 +110,7 @@ object RequestChannel extends Logging {

     def sizeOfBodyInBytes: Int = bodyAndSize.size

-    def sizeInBytes: Int = header.size(new ObjectSerializationCache) + sizeOfBodyInBytes
+    def sizeInBytes: Int = header.size + sizeOfBodyInBytes
Note for reviewers:
This is the main performance-impacting change in this PR: we use the cached value instead of calculating the size again.
     }

     @Override
     public int hashCode() {
-        return this.data.hashCode();
+        return Objects.hash(data, headerVersion);
Note for reviewers:
Piggybacking this minor change: headerVersion is now included in the equality comparison of two RequestHeader objects.
Test failures are unrelated.
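The reason for mixing headerVersion into the hash can be sketched with a toy stand-in. This is illustrative only: `VersionedHeader` and its `String data` field are assumptions standing in for the real `RequestHeaderData`; only the `Objects.hash(data, headerVersion)` line mirrors the diff.

```java
import java.util.Objects;

// Illustrative stand-in for RequestHeader equality: two headers with the
// same data but different versions must not compare equal, so hashCode
// mixes in headerVersion to keep the equals/hashCode contract consistent.
class VersionedHeader {
    final String data;         // stand-in for RequestHeaderData
    final short headerVersion;

    VersionedHeader(String data, short headerVersion) {
        this.data = data;
        this.headerVersion = headerVersion;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof VersionedHeader)) return false;
        VersionedHeader that = (VersionedHeader) o;
        return Objects.equals(data, that.data)
                && headerVersion == that.headerVersion;
    }

    @Override
    public int hashCode() {
        // Hash both fields, not just data, matching the change above.
        return Objects.hash(data, headerVersion);
    }
}
```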
LGTM, thanks for the PR.
Out of curiosity, what profiler was used to compute this? Also, did we see an actual improvement (e.g. was CPU usage or latency lower after the change)? I ask because profilers are known to have safepoint bias and can incorrectly attribute cost for certain method types.
More details on identity hash code here (it's pretty fast): https://shipilev.net/jvm/anatomy-quarks/26-identity-hash-code/
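Two properties of the identity hash code discussed in that link can be checked directly: it is per-object (independent of any overridden `hashCode()`) and stable for the lifetime of an object. A small demonstration, with hypothetical helper names:

```java
// Small demonstration that identity hash codes are per-object and stable,
// unlike the overridden, content-based String.hashCode().
final class IdentityHashDemo {
    static boolean contentHashesEqual(String a, String b) {
        // String.hashCode() depends only on content.
        return a.hashCode() == b.hashCode();
    }

    static boolean identityHashStable(Object o) {
        // The JVM computes the identity hash lazily on first request, then
        // stores it in the object header, so repeated calls must agree.
        return System.identityHashCode(o) == System.identityHashCode(o);
    }
}
```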
Hey @ijuma, thank you for your question.
Was this profile taken after the broker had been running long enough for JIT compilation to complete? The profile seems to show some things that look a bit odd. That said, I think the change is fine overall. It makes sense to avoid the map allocation (including the underlying array) and the overhead of mutating it (including the array resizes it requires). As you said, it also makes sense to avoid parsing the request header twice. My questions were mostly so that we understand what we are trying to achieve and have the right understanding of the underlying reason; this helps ensure future changes are not based on incorrect conclusions. Thanks for the improvement!
@ijuma yes, there was a warm-up workload prior to profiling. The JVM had probably been alive for ~7-8 min. before the profile capture started. What are the fishy things that you notice here? I can try again on a long-running server if you like. I am going to be using a similar profiler as motivation for some future changes I have lined up (I am currently writing a JMH benchmark for my other ArrayBuffer vs. ListBuffer PR), and I want to ensure we are on the same page wrt its effectiveness. Hence, let's resolve this. What can I change in my setup to help us understand the flame graph better? I would be happy to jump on a call to explain my setup if that makes things faster, or we can use the public Slack channel (ASF workspace, #kafka channel) to communicate faster on this.
@divijvaidya Thanks. Can you please attach the HTML file? I think it's possible to do it here, but if not then the JIRA would be helpful. I can take a closer look at the bits that seemed a bit suspicious.
…ache#12890) Reviewers: Mickael Maison <mickael.maison@gmail.com>
Motivation

We create an instance of ObjectSerializationCache at https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/RequestChannel.scala#L113 which does not get used at all. We always "add" to the cache but never retrieve from it (as is evident from the fact that we don't store a reference to the cache anywhere).

Adding information to the cache is expensive because it uses System.identityHashCode(Object), as demonstrated by the flame graph of producer requests against an Apache Kafka 3.3.1 plaintext broker. In that graph, everything below the yellow line is a caller of the function, and the horizontal axis shows the amount of CPU used by each caller. As you will observe, KafkaApis.handleProduceRequest is a major contributor to the CPU usage.

Change

Currently, the header of a request is parsed by the processor thread prior to adding the request to the RequestChannel. With this change, we cache the computed size in the RequestHeader object itself at the time the ByteBuffer is parsed into a RequestHeader. The cached size is re-used when it is required in RequestChannel.

After the change

Note how the CPU utilization by the above hotspot, KafkaApis.handleProduceRequest, has been eliminated, since we no longer use ObjectSerializationCache in that code path.
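The old and new code paths described above can be contrasted with a toy sketch. All names here are hypothetical simplifications, not the actual Kafka classes: the stand-in cache is keyed by object identity (and so pays the identity-hash cost on insert), while the new path reads a value computed once at parse time.

```java
import java.nio.ByteBuffer;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical simplification of the two code paths; all names are
// illustrative, not the real Kafka classes.
class SizeCachingSketch {
    // Stand-in for ObjectSerializationCache: keyed by object identity, so
    // every insert goes through System.identityHashCode (via IdentityHashMap).
    static final class SerializationCache {
        private final Map<Object, Integer> sizes = new IdentityHashMap<>();

        int sizeOf(Object key, int computed) {
            return sizes.computeIfAbsent(key, k -> computed);
        }
    }

    static final class Header {
        private final ByteBuffer raw;
        private final int cachedSize; // computed once, at parse time

        Header(ByteBuffer raw) {
            this.raw = raw;
            this.cachedSize = raw.remaining();
        }

        // Old path: allocates a throwaway cache per call that is never
        // read again, just to compute the size.
        int sizeOld() {
            return new SerializationCache().sizeOf(this, raw.remaining());
        }

        // New path: returns the value cached when the header was parsed.
        int size() {
            return cachedSize;
        }
    }
}
```

Both paths return the same size; the saving is the avoided per-request allocation and identity-hash insertion, which is what the flame graphs above show disappearing.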