
MINOR: Refactor Values class to fix checkstyle, add benchmark, optimize exceptions #15469

Merged: 10 commits into apache:trunk on May 23, 2024

Conversation

@gharris1727 (Contributor) commented Mar 5, 2024

The Values class has checkstyle suppressions for NPathComplexity, MethodLength, CyclomaticComplexity, and JavaNCSS. Rather than suppressing these problems, we should refactor the oversized methods into smaller methods that satisfy the checkstyle heuristics.

I added a benchmark to verify that this refactoring does not cause a performance regression. Once I had the benchmark, I found some really obvious optimizations:

  • parseAsTemporal was catching ParseExceptions, and I eliminated those by checking the ParsePosition.
  • number parsing was catching multiple ArithmeticExceptions when numbers had decimal parts, or were too large to fit in certain types. The new implementation does some equivalent rounding and cast-checking to avoid the exceptions.
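The two optimizations above can be sketched as follows. This is a minimal, hypothetical illustration (not the actual Values code): parse with a `ParsePosition` instead of catching `ParseException`, and check scale and range on a `BigDecimal` instead of catching the `ArithmeticException` thrown by the `*ValueExact` methods.

```java
import java.math.BigDecimal;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;

public class ExceptionFreeParsing {
    // ParseException-free date parsing: the ParsePosition overload returns null
    // on failure instead of throwing, so the hot path never builds a stack trace.
    public static Date tryParseDate(String input, String pattern) {
        ParsePosition pos = new ParsePosition(0);
        Date result = new SimpleDateFormat(pattern).parse(input, pos);
        // Also reject inputs with trailing garbage after the matched prefix.
        return (result != null && pos.getIndex() == input.length()) ? result : null;
    }

    // ArithmeticException-free narrowing: instead of byteValueExact() inside a
    // try/catch, check for a fractional part and an out-of-range value first.
    public static Byte tryExactByte(BigDecimal value) {
        if (value.stripTrailingZeros().scale() > 0) {
            return null; // has a decimal part, cannot be represented exactly
        }
        if (value.compareTo(BigDecimal.valueOf(Byte.MIN_VALUE)) < 0
                || value.compareTo(BigDecimal.valueOf(Byte.MAX_VALUE)) > 0) {
            return null; // too large or too small for a byte
        }
        return value.byteValue();
    }

    public static void main(String[] args) {
        System.out.println(tryParseDate("2024-03-05", "yyyy-MM-dd") != null); // true
        System.out.println(tryExactByte(new BigDecimal("42")));  // 42
        System.out.println(tryExactByte(new BigDecimal("1.5"))); // null
    }
}
```

Avoiding the exceptions matters on hot paths because constructing an exception fills in a stack trace, which is expensive relative to the arithmetic being guarded.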

With these changes, the parse() profile shows a significant improvement, while all of the other methods are nearly the same or slightly faster than before. I'll include the detailed results from my machine in a follow-up comment.

Finally, I did some simplification that makes the generic convertTo method call the specific methods (convertToBoolean, convertToByte, etc.), rather than having every one of the specific methods call the single generic method. This eliminates the need for casting Object to the return types, and eliminates flaws that could come from that cast failing. I fixed one such flaw in #15399 that I found while doing this refactor.
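The delegation inversion can be sketched like this (illustrative and simplified; the real Values signatures also take a Schema): each typed converter holds its own logic, and the generic entry point dispatches to them, so no unchecked Object-to-T cast is needed anywhere.

```java
import java.util.Map;
import java.util.function.Function;

public class ConvertDispatch {
    // Specific converters hold the real logic and return the exact type.
    public static Byte convertToByte(Object value) {
        if (value instanceof Number) return ((Number) value).byteValue();
        if (value instanceof String) return Byte.parseByte((String) value);
        throw new IllegalArgumentException("Cannot convert to Byte: " + value);
    }

    public static Boolean convertToBoolean(Object value) {
        if (value instanceof Boolean) return (Boolean) value;
        if (value instanceof String) return Boolean.parseBoolean((String) value);
        throw new IllegalArgumentException("Cannot convert to Boolean: " + value);
    }

    private static final Map<Class<?>, Function<Object, Object>> CONVERTERS =
        Map.of(Byte.class, ConvertDispatch::convertToByte,
               Boolean.class, ConvertDispatch::convertToBoolean);

    // The generic form delegates to the specific converters, inverting the old
    // arrangement where every specific method called (and cast the result of)
    // a single generic method.
    public static Object convertTo(Class<?> targetType, Object value) {
        Function<Object, Object> converter = CONVERTERS.get(targetType);
        if (converter == null) {
            throw new IllegalArgumentException("Unsupported target: " + targetType);
        }
        return converter.apply(value);
    }

    public static void main(String[] args) {
        System.out.println(convertTo(Byte.class, "7"));       // 7
        System.out.println(convertTo(Boolean.class, "true")); // true
    }
}
```

Callers that know the target type call the typed method directly and get a checked return type; only genuinely dynamic callers pay for the table lookup.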

The SchemaMerger is a refactor of the fix in #15399 because the additional variables and control flow exceeded the checkstyle heuristics.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@gharris1727 (Contributor, Author) commented Mar 5, 2024

| Benchmark | Before ns/op | Before Error | After ns/op | After Error | Speedup |
|---|---|---|---|---|---|
| ValuesBenchmark.testConvertToBoolean | 124.038 | 0.319 | 71.448 | 2.258 | 1.7 |
| ValuesBenchmark.testConvertToByte | 111.404 | 2.196 | 61.334 | 0.092 | 1.8 |
| ValuesBenchmark.testConvertToDate | 3698.989 | 135.292 | 3675.007 | 5.379 | 1.0 |
| ValuesBenchmark.testConvertToDecimal | 1433.731 | 20.883 | 1515.147 | 2.91 | 0.9 |
| ValuesBenchmark.testConvertToDouble | 196.389 | 68.131 | 59.401 | 0.357 | 3.3 |
| ValuesBenchmark.testConvertToFloat | 329.249 | 158.87 | 56.547 | 0.43 | 5.8 |
| ValuesBenchmark.testConvertToInteger | 130.939 | 0.479 | 64.801 | 0.465 | 2.0 |
| ValuesBenchmark.testConvertToList | 1212.389 | 19.245 | 1527.234 | 2.949 | 0.8 |
| ValuesBenchmark.testConvertToLong | 131.72 | 2.178 | 75.345 | 0.434 | 1.7 |
| ValuesBenchmark.testConvertToMap | 1641.567 | 27.802 | 1890.402 | 12.093 | 0.9 |
| ValuesBenchmark.testConvertToShort | 113.9 | 0.254 | 61.587 | 0.199 | 1.8 |
| ValuesBenchmark.testConvertToString | 1620.263 | 8.395 | 1669.339 | 3.622 | 1.0 |
| ValuesBenchmark.testConvertToStruct | 3.768 | 0.034 | 1.374 | 0.013 | 2.7 |
| ValuesBenchmark.testConvertToTime | 2775.386 | 8.146 | 2685.46 | 8.681 | 1.0 |
| ValuesBenchmark.testConvertToTimestamp | 2849.605 | 7.025 | 2731.251 | 6.568 | 1.0 |
| ValuesBenchmark.testInferSchema | 116.758 | 0.133 | 97.536 | 0.325 | 1.2 |
| ValuesBenchmark.testParseString | 41501.659 | 197.842 | 13606.346 | 287.035 | 3.1 |

This has some irregularities. The floating point tests were multi-modal with the old implementation but don't appear to be with the new one; I'm not sure why. The speedup numbers for those are therefore a bit misleading.

```
# Benchmark: org.apache.kafka.jmh.connect.ValuesBenchmark.testConvertToFloat
# Warmup Iteration   1: 280.384 ns/op
# Warmup Iteration   2: 133.229 ns/op
# Warmup Iteration   3: 132.223 ns/op
Iteration   1: 430.475 ns/op
Iteration   2: 284.555 ns/op
Iteration   3: 273.401 ns/op
Iteration   4: 328.282 ns/op
Iteration   5: 237.622 ns/op
Iteration   6: 344.620 ns/op
Iteration   7: 405.791 ns/op

# Benchmark: org.apache.kafka.jmh.connect.ValuesBenchmark.testConvertToDouble
# Warmup Iteration   1: 200.209 ns/op
# Warmup Iteration   2: 138.782 ns/op
# Warmup Iteration   3: 195.215 ns/op
Iteration   1: 196.872 ns/op
Iteration   2: 192.461 ns/op
Iteration   3: 200.401 ns/op
Iteration   4: 216.845 ns/op
Iteration   5: 195.567 ns/op
Iteration   6: 137.116 ns/op
Iteration   7: 235.458 ns/op
```

@mimaison (Member) left a comment:

Thanks Greg for looking into this, this Values class was a bit of a monster. I made a couple of suggestions and this needs rebasing to resolve the checkstyle/suppressions.xml conflict.

```diff
@@ -766,135 +852,23 @@ protected static boolean canParseSingleTokenLiteral(Parser parser, boolean embed
 protected static SchemaAndValue parse(Parser parser, boolean embedded) throws NoSuchElementException {
```
Member:
I wonder if it would make sense to move all these parse<>() methods to the Parser class, and extract Parser to its own file. WDYT?

I made a quick attempt in 10f4910#diff-024f49f1f6adf07bcc1cab6aa8caa0d931ba2c6be887d96ab575ae032be4d051

Contributor Author:

That's a cool idea, I think that makes a lot of sense when a bunch of these static methods take a Parser argument anyway.

Since this is in the public API, I'll focus on moving some of the internal methods to instance methods of a protected/package local Parser, and leave the public static methods in Values.

Member:

Ah right, Parser is also part of the public API, I thought it was only a private inner class.

Contributor Author:

Parser was protected, so I think it's still safe to refactor. The class doesn't show up here: https://javadoc.io/doc/org.apache.kafka/connect-api/latest/index.html

I moved the existing Parser to Tokenizer, as it had a good interface already, and adding methods would just be clutter. The methods which took a Parser argument have now been moved to a single top-level class named Parser. Both of these classes are package-local, so they shouldn't appear in the API docs.

I left almost all of the private/protected static methods in Values, just bringing a few over that were only ever called by the Parser. I tried moving things from Values to Parser to break the circular dependency, but this required moving nearly everything to Parser. The two classes are really intertwined, and I'm not really satisfied with this refactor now.

Contributor Author:

My core difficulty is that the parsing logic and the conversion logic mutually depend on one another:

  1. The convertTo methods check if the input is a String, and then run it through the Parser.
  2. After parsing a map or array, the Parser calls convertTo on the elements to "cast" them to a common schema

I'm pretty sure convertTo -> parser -> convertTo is a reasonable cycle, and should happen all the time via convertToList, convertToMap.

I don't think that parser -> convertTo -> parser is a useful cycle for multiple reasons, but proving that is a little bit slippery. With some time I think I can break this part of the cycle so that this doesn't end up as one big ball of code again.
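A stripped-down sketch of the first, legitimate cycle (hypothetical names and a trivial grammar; the real classes also handle quoting, schemas, and nesting): convertToList hands string input to the parser, and the parser calls the scalar converter on each element to "cast" the elements to a common type.

```java
import java.util.ArrayList;
import java.util.List;

public class ParseConvertCycle {
    // convertTo -> parser: string inputs are delegated to the parser.
    public static List<Long> convertToLongList(Object value) {
        if (value instanceof String) return parseLongArray((String) value);
        throw new IllegalArgumentException("Cannot convert to List<Long>: " + value);
    }

    // parser -> convertTo: each parsed token is converted to the element type.
    private static List<Long> parseLongArray(String input) {
        String body = input.substring(1, input.length() - 1); // strip '[' and ']'
        List<Long> result = new ArrayList<>();
        for (String token : body.split(",")) {
            result.add(convertToLong(token.trim()));
        }
        return result;
    }

    public static Long convertToLong(Object value) {
        if (value instanceof Number) return ((Number) value).longValue();
        return Long.parseLong(value.toString());
    }

    public static void main(String[] args) {
        System.out.println(convertToLongList("[1, 2, 3]")); // [1, 2, 3]
    }
}
```

In this shape the cycle is bounded: the parser only calls the scalar converter, which never re-enters the parser, so there is no unbounded parser -> convertTo -> parser recursion.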

@mimaison (Member) commented May 13, 2024:

The Parser class shows up in https://kafka.apache.org/37/javadoc/org/apache/kafka/connect/data/Values.Parser.html so not sure if we can make it private. I think the other inner classes are not really meant to be used by users. Should we do a small KIP to hide them all?

Member:

Regarding the parser -> convertTo -> parser cycle, if we want to tackle it, it's definitely for another PR.

Contributor Author:

I'm confused. Locally for me this doesn't appear in the API docs, and it's because of this change: https://issues.apache.org/jira/browse/KAFKA-14839 . The ticket says it's fixed in 3.5.0, and that commit is definitely in 3.6.0+ so it should have applied when doing the 3.7.0 release.

I'm happy to preserve all of the protected methods (both the existing ones in Values, and the whole Parser class) if these are truly public API, but if this other commit already removed them from the Javadoc, I think we should try and follow through on the renaming here. I just need to figure out how they showed up in the 37 release docs...

@gharris1727 (Contributor Author) commented May 13, 2024:

Oh it's because there are two javadoc configs, and they got out-of-sync:
./gradlew javadoc uses this one:

kafka/build.gradle, lines 597 to 610 in 5439914:

```groovy
javadoc {
  options.charSet = 'UTF-8'
  options.docEncoding = 'UTF-8'
  options.encoding = 'UTF-8'
  options.memberLevel = JavadocMemberLevel.PUBLIC // Document only public members/API
  // Turn off doclint for now, see https://blog.joda.org/2014/02/turning-off-doclint-in-jdk-8-javadoc.html for rationale
  options.addStringOption('Xdoclint:none', '-quiet')
  // The URL structure was changed to include the locale after Java 8
  if (JavaVersion.current().isJava11Compatible())
    options.links "https://docs.oracle.com/en/java/javase/${JavaVersion.current().majorVersion}/docs/api/"
  else
    options.links "https://docs.oracle.com/javase/8/docs/api/"
}
```

and ./gradlew aggregatedJavadoc uses this one:

kafka/build.gradle, lines 3335 to 3353 in 5439914:

```groovy
task aggregatedJavadoc(type: Javadoc, dependsOn: compileJava) {
  def projectsWithJavadoc = subprojects.findAll { it.javadoc.enabled }
  source = projectsWithJavadoc.collect { it.sourceSets.main.allJava }
  classpath = files(projectsWithJavadoc.collect { it.sourceSets.main.compileClasspath })
  includes = projectsWithJavadoc.collectMany { it.javadoc.getIncludes() }
  excludes = projectsWithJavadoc.collectMany { it.javadoc.getExcludes() }
  options.charSet = 'UTF-8'
  options.docEncoding = 'UTF-8'
  options.encoding = 'UTF-8'
  // Turn off doclint for now, see https://blog.joda.org/2014/02/turning-off-doclint-in-jdk-8-javadoc.html for rationale
  options.addStringOption('Xdoclint:none', '-quiet')
  // The URL structure was changed to include the locale after Java 8
  if (JavaVersion.current().isJava11Compatible())
    options.links "https://docs.oracle.com/en/java/javase/${JavaVersion.current().majorVersion}/docs/api/"
  else
    options.links "https://docs.oracle.com/javase/8/docs/api/"
}
```

Apparently the javadoc website I linked uses the javadoc jar where the protected members are excluded, and the Kafka website uses the aggregatedJavadoc tree which still has the protected members.

Member:

Good find! In that case I'm fine with the changes, no need for a KIP.

```diff
@@ -177,7 +213,12 @@ public static Long convertToLong(Schema schema, Object value) throws DataExcepti
  * @throws DataException if the value could not be converted to a float
  */
 public static Float convertToFloat(Schema schema, Object value) throws DataException {
```
Member:

A few of these convertTo<>() methods are not covered by unit tests. It's ok not to address this in this PR if you'd prefer as it's already huge.

Contributor Author:

I added tests to get the methods themselves up to 100% coverage, but the overall class still is missing some coverage. Thanks for pointing this out, as there were certainly some pretty obvious cases that weren't tested.

@gharris1727 gharris1727 marked this pull request as draft May 6, 2024 23:02
Signed-off-by: Greg Harris <greg.harris@aiven.io>
@gharris1727 gharris1727 force-pushed the minor-refactor-values-checkstyle branch from 5390b3b to 80dbeb7 Compare May 7, 2024 18:35
@gharris1727 (Contributor Author):
Here's the Values test coverage changes:

| State | Class % | Method % | Line % |
|---|---|---|---|
| Initial | 100% (4/4) | 81% (40/49) | 78% (464/589) |
| Added tests | 100% (4/4) | 97% (48/49) | 85% (502/589) |
| Refactored | 100% (6/6) | 93% (77/82) | 84% (565/669) |

There are more classes, methods, and lines, but the percentage coverage went up.

@gharris1727 gharris1727 marked this pull request as ready for review May 7, 2024 18:44
@mimaison (Member) left a comment:

LGTM

@gharris1727 (Contributor Author):
Here's the final performance changes:

| Benchmark | Before ns/op | Before Error | After ns/op | After Error | Speedup |
|---|---|---|---|---|---|
| ValuesBenchmark.testConvertToBoolean | 123.749 | 0.842 | 72.577 | 0.412 | 1.7 |
| ValuesBenchmark.testConvertToByte | 117.883 | 1.397 | 62.387 | 0.148 | 1.9 |
| ValuesBenchmark.testConvertToDate | 3700.318 | 160.043 | 3522.214 | 36.075 | 1.1 |
| ValuesBenchmark.testConvertToDecimal | 1530.936 | 49.503 | 1485.654 | 11.766 | 1.0 |
| ValuesBenchmark.testConvertToDouble | 163.937 | 55.591 | 60.378 | 0.577 | 2.7 |
| ValuesBenchmark.testConvertToFloat | 204.591 | 109.567 | 57.611 | 0.49 | 3.6 |
| ValuesBenchmark.testConvertToInteger | 140.586 | 3.809 | 66.496 | 0.371 | 2.1 |
| ValuesBenchmark.testConvertToList | 1276.364 | 54.037 | 1601.568 | 28.972 | 0.8 |
| ValuesBenchmark.testConvertToLong | 132.029 | 3.118 | 76.744 | 1.112 | 1.7 |
| ValuesBenchmark.testConvertToMap | 1361.082 | 78.59 | 1244.339 | 11.019 | 1.1 |
| ValuesBenchmark.testConvertToShort | 121.575 | 4.6 | 63.167 | 0.311 | 1.9 |
| ValuesBenchmark.testConvertToString | 1667.243 | 51.186 | 1580.031 | 11.391 | 1.1 |
| ValuesBenchmark.testConvertToStruct | 3.819 | 0.082 | 1.395 | 0.009 | 2.7 |
| ValuesBenchmark.testConvertToTime | 2864.609 | 163.586 | 2701.721 | 60.677 | 1.1 |
| ValuesBenchmark.testConvertToTimestamp | 2789.008 | 30.371 | 2738.573 | 19.6 | 1.0 |
| ValuesBenchmark.testInferSchema | 123.196 | 3.292 | 99.336 | 0.867 | 1.2 |
| ValuesBenchmark.testParseString | 43826.599 | 922.077 | 13429.742 | 133.089 | 3.3 |

There's a consistent performance degradation for testConvertToList that seems to come from the parseString implementation being slightly less efficient for array inputs. I'll follow up on that separately since this PR already has too much scope, and the degradation is only slight.

One interesting final observation is that the variance/error for all of the tests is lower than with the previous implementation. I suspect this is because many of the methods now avoid traversing the switch-case in the convertTo implementation, which lessens the number of branches and increases the branch predictor's success rate.

And here's the final coverage changes:

| State | Class | Method | Line |
|---|---|---|---|
| Before | 100% (4/4) | 81% (40/49) | 78% (464/589) |
| After | 100% (6/6) | 93% (77/82) | 84% (565/669) |

@gharris1727 gharris1727 merged commit 11ad5e8 into apache:trunk May 23, 2024
1 check failed
apourchet added a commit to apourchet/kafka that referenced this pull request May 29, 2024
commit cc269b0
Author: Antoine Pourchet <antoine@responsive.dev>
Date:   Wed May 29 14:15:02 2024 -0600

    KAFKA-15045: (KIP-924 pt. 14) Callback to TaskAssignor::onAssignmentComputed (apache#16123)

    This PR adds the logic and wiring necessary to make the callback to
    TaskAssignor::onAssignmentComputed with the necessary parameters.

    We also fixed some log statements in the actual assignment error
    computation, as well as modified the ApplicationState::allTasks method
    to return a Map instead of a Set of TaskInfos.

    Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>

commit 862ea12
Author: Eugene Mitskevich <emitskevich@bloomberg.net>
Date:   Wed May 29 16:14:37 2024 -0400

    MINOR: Fix rate metric spikes (apache#15889)

    Rate reports value in the form of sumOrCount/monitoredWindowSize. It has a bug in monitoredWindowSize calculation, which leads to spikes in result values.

    Reviewers: Jun Rao <junrao@gmail.com>

commit 0f0c9ec
Author: gongxuanzhang <gongxuanzhangmelt@gmail.com>
Date:   Thu May 30 01:08:17 2024 +0800

    KAFKA-16771 First log directory printed twice when formatting storage (apache#16010)

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

commit 2d9994e
Author: Andrew Schofield <aschofield@confluent.io>
Date:   Wed May 29 16:31:52 2024 +0100

    KAFKA-16722: Introduce ConsumerGroupPartitionAssignor interface (apache#15998)

    KIP-932 introduces share groups to go alongside consumer groups. Both kinds of group use server-side assignors but it is unlikely that a single assignor class would be suitable for both. As a result, the KIP introduces specific interfaces for consumer group and share group partition assignors.

    This PR introduces only the consumer group interface, `o.a.k.coordinator.group.assignor.ConsumerGroupPartitionAssignor`. The share group interface will come in a later release. The existing implementations of the general `PartitionAssignor` interface have been changed to implement `ConsumerGroupPartitionAssignor` instead and all other code changes are just propagating the change throughout the codebase.

    Note that the code in the group coordinator that actually calculates assignments uses the general `PartitionAssignor` interface so that it can be used with both kinds of group, even though the assignors themselves are specific.

    Reviewers: Apoorv Mittal <amittal@confluent.io>, David Jacot <djacot@confluent.io>

commit 0b75cf7
Author: gongxuanzhang <gongxuanzhangmelt@gmail.com>
Date:   Wed May 29 22:38:00 2024 +0800

    KAFKA-16705 the flag "started" of RaftClusterInstance is false even though the cluster is started (apache#15946)

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

commit 8d11d95
Author: Loïc GREFFIER <loic.greffier@hotmail.fr>
Date:   Wed May 29 14:09:22 2024 +0200

    KAFKA-16448: Add ProcessingExceptionHandler interface and implementations (apache#16090)

    This PR is part of KAFKA-16448 which aims to bring a ProcessingExceptionHandler to Kafka Streams in order to deal with exceptions that occur during processing.

    This PR brings ProcessingExceptionHandler interface and default implementations.

    Co-authored-by: Dabz <d.gasparina@gmail.com>
    Co-authored-by: sebastienviale <sebastien.viale@michelin.com>

    Reviewer: Bruno Cadonna <cadonna@apache.org>

commit b73f479
Author: Ramin Gharib <ramingharib@gmail.com>
Date:   Wed May 29 13:12:54 2024 +0200

    KAFKA-16362: Fix type-unsafety in KStreamKStreamJoin caused by isLeftSide (apache#15601)

    The introduced changes provide a cleaner definition of the join side in KStreamKStreamJoin. Before, this was done by using a Boolean flag, which led to returning a raw LeftOrRightValue without generic arguments because the generic type arguments depended on the boolean input.

    Reviewers: Greg Harris <greg.harris@aiven.io>, Bruno Cadonna <cadonna@apache.org>

commit 897cab2
Author: Luke Chen <showuon@gmail.com>
Date:   Wed May 29 15:30:18 2024 +0800

    KAFKA-16399: Add JBOD support in tiered storage (apache#15690)

    After JBOD is supported in KRaft, we should also enable JBOD support in tiered storage. Unit tests and Integration tests are also added.

    Reviewers: Satish Duggana <satishd@apache.org>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Igor Soarez <soarez@apple.com>, Mickael Maison <mickael.maison@gmail.com>

commit eefd114
Author: Dongnuo Lyu <139248811+dongnuo123@users.noreply.github.com>
Date:   Wed May 29 02:21:30 2024 -0400

    KAFKA-16832; LeaveGroup API for upgrading ConsumerGroup (apache#16057)

    This patch implements the LeaveGroup API to the consumer groups that are in the mixed mode.

    Reviewers: Jeff Kim <jeff.kim@confluent.io>, David Jacot <djacot@confluent.io>

commit 9562143
Author: A. Sophie Blee-Goldman <ableegoldman@gmail.com>
Date:   Tue May 28 21:35:02 2024 -0700

    HOTFIX: remove unnecessary list creation (apache#16117)

    Removing a redundant list declaration in the new StickyTaskAssignor implementation

    Reviewers: Antoine Pourchet <antoine@responsive.dev>

commit d64e3fb
Author: Antoine Pourchet <antoine@responsive.dev>
Date:   Tue May 28 20:43:30 2024 -0600

    KAFKA-15045: (KIP-924 pt. 13) AssignmentError calculation added (apache#16114)

    This PR adds the post-processing of the TaskAssignment to figure out if the new assignment is valid, and return an AssignmentError otherwise.

    Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>

commit 8d243df
Author: Antoine Pourchet <antoine@responsive.dev>
Date:   Tue May 28 19:01:18 2024 -0600

    KAFKA-15045: (KIP-924 pt. 12) Wiring in new assignment configs and logic (apache#16074)

    This PR creates the new public config of KIP-924 in StreamsConfig and uses it to instantiate user-created TaskAssignors. If such a TaskAssignor is found and successfully created we then use that assignor to perform the task assignment, otherwise we revert back to the pre KIP-924 world with the internal task assignors.

    Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Almog Gavra <almog@responsive.dev>

commit 56ee139
Author: Antoine Pourchet <antoine@responsive.dev>
Date:   Tue May 28 18:05:51 2024 -0600

    KAFKA-15045: (KIP-924 pt. 11) Implemented StickyTaskAssignor (apache#16052)

    This PR implements the StickyTaskAssignor with the new KIP 924 API.

    Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>

commit 59ba555
Author: Nick Telford <nick.telford@gmail.com>
Date:   Wed May 29 00:23:23 2024 +0100

    KAFKA-15541: Add oldest-iterator-open-since-ms metric (apache#16041)

    Part of [KIP-989](https://cwiki.apache.org/confluence/x/9KCzDw).

    This new `StateStore` metric tracks the timestamp that the oldest
    surviving Iterator was created.

    This timestamp should continue to climb, and closely track the current
    time, as old iterators are closed and new ones created. If the timestamp
    remains very low (i.e. old), that suggests an Iterator has leaked, which
    should enable users to isolate the affected store.

    It will report no data when there are no currently open Iterators.

    Reviewers: Matthias J. Sax <matthias@confluent.io>

commit 4eb60b5
Author: Frederik Rouleau <frouleau@confluent.io>
Date:   Tue May 28 23:56:47 2024 +0200

    KAFKA-16507 Add KeyDeserializationException and ValueDeserializationException with record content (apache#15691)

    Implements KIP-1036.

    Add raw ConsumerRecord data to RecordDeserialisationException to make DLQ implementation easier.

    Reviewers: Kirk True <ktrue@confluent.io>, Andrew Schofield <aschofield@confluent.io>, Matthias J. Sax <matthias@confluent.io>

commit 4d04eb8
Author: PoAn Yang <payang@apache.org>
Date:   Wed May 29 03:13:33 2024 +0800

    KAFKA-16796 Introduce new org.apache.kafka.tools.api.Decoder to replace kafka.serializer.Decoder (apache#16064)

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

commit a649bc4
Author: Luke Chen <showuon@gmail.com>
Date:   Wed May 29 00:05:49 2024 +0800

    KAFKA-16711: Make sure to update highestOffsetInRemoteStorage after log dir change (apache#15947)

    Reviewers: Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Satish Duggana <satishd@apache.org>

commit 64f699a
Author: Omnia Ibrahim <o.g.h.ibrahim@gmail.com>
Date:   Tue May 28 15:22:54 2024 +0100

    KAFKA-15853: Move general configs out of KafkaConfig (apache#16040)

    Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>

commit 699438b
Author: Sanskar Jhajharia <122860866+sjhajharia@users.noreply.github.com>
Date:   Tue May 28 16:34:44 2024 +0530

    MINOR: Fix the config name in ProducerFailureHandlingTest (apache#16099)

    When moving from KafkaConfig.ReplicaFetchMaxBytesProp we used ReplicationConfigs.REPLICA_LAG_TIME_MAX_MS_CONFIG instead of ReplicationConfigs.REPLICA_FETCH_MAX_BYTES_CONFIG. This PR patches the same.

    Reviewers: Omnia Ibrahim <o.g.h.ibrahim@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>

commit a57c05b
Author: Ken Huang <100591800+m1a2st@users.noreply.github.com>
Date:   Tue May 28 17:42:33 2024 +0900

    KAFKA-16805 Stop using a ClosureBackedAction to configure Spotbugs reports (apache#16081)

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

commit 91284d8
Author: Luke Chen <showuon@gmail.com>
Date:   Tue May 28 12:23:34 2024 +0800

    KAFKA-16709: abortAndPauseCleaning only when future log is not existed (apache#15951)

    When doing alter replica logDirs, we'll create a future log and pause log cleaning for the partition( here). And this log cleaning pausing will resume after alter replica logDirs completes (here). And when in the resuming log cleaning, we'll decrement 1 for the LogCleaningPaused count. Once the count reached 0, the cleaning pause is really resuming. (here). For more explanation about the logCleaningPaused state can check here.

    But, there's still one factor that could increase the LogCleaningPaused count: leadership change (here). When there's a leadership change, we'll check if there's a future log in this partition, if so, we'll create future log and pauseCleaning (LogCleaningPaused count + 1). So, if during the alter replica logDirs:

    1. alter replica logDirs for tp0 triggered (LogCleaningPaused count = 1)
    2. tp0 leadership changed (LogCleaningPaused count = 2)
    3. alter replica logDirs completes, resuming logCleaning (LogCleaningPaused count = 1)
    4. LogCleaning keeps paused because the count is always >  0

    This PR fixes this issue by only abortAndPauseCleaning when future log is not existed. We did the same check in alterReplicaLogDirs. So this change can make sure there's only 1 abortAndPauseCleaning for either abortAndPauseCleaning or maybeAddLogDirFetchers. Tests also added.

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Igor Soarez <soarez@apple.com>

commit adab48d
Author: Greg Harris <greg.harris@aiven.io>
Date:   Mon May 27 18:33:01 2024 -0700

    MINOR: Disable JDK 11 and 17 tests on PRs (apache#16051)

    Signed-off-by: Greg Harris <greg.harris@aiven.io>
    Reviewers: Justine Olshan <jolshan@confluent.io>, David Arthur <mumrah@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>

commit bac8df5
Author: Colin P. McCabe <cmccabe@apache.org>
Date:   Mon May 27 08:53:53 2024 -0700

    MINOR: fix typo in KAFKA-16515

commit da3304e
Author: David Jacot <djacot@confluent.io>
Date:   Mon May 27 17:10:37 2024 +0200

    KAFKA-16371; fix lingering pending commit when handling OFFSET_METADATA_TOO_LARGE (apache#16072)

    This patch was initially created in apache#15536.

    When there is a commit for multiple topic partitions and some, but not all, exceed the offset metadata limit, the pending commit is not properly cleaned up leading to UNSTABLE_OFFSET_COMMIT errors when trying to fetch the offsets with read_committed. This change makes it so the invalid commits are not added to the pendingOffsetCommits set.

    Co-authored-by: Kyle Phelps <kyle.phelps@datadoghq.com>

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Justine Olshan <jolshan@confluent.io>

commit 524ad1e
Author: Kamal Chandraprakash <kchandraprakash@uber.com>
Date:   Mon May 27 15:14:23 2024 +0530

    KAFKA-16452: Don't throw OOORE when converting the offset to metadata (apache#15825)

    Don't throw OFFSET_OUT_OF_RANGE error when converting the offset to metadata, and next time the leader should increment the high watermark by itself after receiving fetch requests from followers. This can happen when checkpoint files are missing and being elected as a leader.

    Reviewers: Luke Chen <showuon@gmail.com>, Jun Rao <junrao@apache.org>

commit d9ee9c9
Author: Nick Telford <nick.telford@gmail.com>
Date:   Sat May 25 20:22:56 2024 +0100

    KAFKA-15541: Use LongAdder instead of AtomicInteger (apache#16076)

    `LongAdder` performs better than `AtomicInteger` when under contention
    from many threads. Since it's possible that many Interactive Query
    threads could create a large number of `KeyValueIterator`s, we don't
    want contention on a metric to be a performance bottleneck.

    The trade-off is memory, as `LongAdder` uses more memory to space out
    independent counters across different cache lines. In practice, I don't
    expect this to cause too many problems, as we're only constructing 1
    per-store.

    Reviewers: Matthias J. Sax <matthias@confluent.io>

commit a8d166c
Author: Ritika Reddy <98577846+rreddy-22@users.noreply.github.com>
Date:   Sat May 25 09:06:15 2024 -0700

    KAFKA-16625; Reverse lookup map from topic partitions to members (apache#15974)

    This patch speeds up the computation of the unassigned partitions by exposing the inverted target assignment. It allows the assignor to check whether a partition is assigned or not.

    Reviewers: Jeff Kim <jeff.kim@confluent.io>, David Jacot <djacot@confluent.io>

commit d585a49
Author: Jeff Kim <kimkb2011@gmail.com>
Date:   Fri May 24 16:33:57 2024 -0400

    KAFKA-16831: CoordinatorRuntime should initialize MemoryRecordsBuilder with max batch size write limit (apache#16059)

    CoordinatorRuntime should initialize MemoryRecordsBuilder with max batch size write limit. Otherwise, we default the write limit to the min buffer size of 16384 for the write limit. This causes the coordinator to threw RecordTooLargeException even when it's under the 1MB max batch size limit.

    Reviewers: David Jacot <djacot@confluent.io>

commit 8eea6b8
Author: Edoardo Comar <ecomar@uk.ibm.com>
Date:   Fri May 24 20:33:00 2024 +0100

    MINOR: mention KAFKA-15905 in docs "Notable changes in 3.7.1" (apache#16070)

    * MINOR: mention KAFKA-15905 in docs "Notable changes in 3.7.1/3.8.0"

    Co-Authored-By: Adrian Preston <prestona@uk.ibm.com>

commit 4f55786
Author: Colin P. McCabe <cmccabe@apache.org>
Date:   Mon May 20 15:41:52 2024 -0700

    KAFKA-16515: Fix the ZK Metadata cache confusion between brokers and controllers

    ZkMetadataCache could theoretically return KRaft controller information from a call to
    ZkMetadataCache.getAliveBrokerNode, which doesn't make sense. KRaft controllers are not part of the
    set of brokers. The only use-case for this functionality was in MetadataCacheControllerNodeProvider
    during ZK migration, where it allowed ZK brokers in migration mode to forward requests to
    kcontrollers when appropriate. This PR changes MetadataCacheControllerNodeProvider to simply
    delegate to quorumControllerNodeProvider in this case.

    Reviewers: José Armando García Sancio <jsancio@apache.org>

commit 90892ae
Author: Colin P. McCabe <cmccabe@apache.org>
Date:   Mon May 20 16:23:27 2024 -0700

    KAFKA-16516: Fix the controller node provider for broker to control channel

    Fix the code in the RaftControllerNodeProvider to query RaftManager to find Node information,
    rather than consulting a static map. Add a RaftManager.voterNode function to supply this
    information. In KRaftClusterTest, add testControllerFailover to get more coverage of controller
    failovers.

    Reviewers: José Armando García Sancio <jsancio@apache.org>

commit 2432a18
Author: KrishVora01 <156789009+KrishVora01@users.noreply.github.com>
Date:   Fri May 24 22:21:02 2024 +0530

    KAFKA-16373: KIP-1028:  Adding code to support Apache Kafka Docker Official Images (apache#16027)

    This PR aims to add a JVM-based Docker Official Image for Apache Kafka, as per the following KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1028%3A+Docker+Official+Image+for+Apache+Kafka

    This PR introduces support for Apache Kafka Docker Official Images via:

    GitHub Workflows:

    - Workflow to prepare static source files for Docker images
    - Workflow to build and test Docker official images
    - Scripts to prepare source files and perform Docker image builds and tests

    A new directory for Docker Official Images, docker/docker_official_images, to house all Docker Official Image assets.

    Co-authored-by: Vedarth Sharma <vesharma@confluent.io>

    Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Vedarth Sharma <vesharma@confluent.io>

commit 0143c72
Author: Lianet Magrans <98415067+lianetm@users.noreply.github.com>
Date:   Fri May 24 14:19:43 2024 +0200

    KAFKA-16815: Handle FencedInstanceId in HB response (apache#16047)

    Handle the FencedInstanceIdException that a consumer may receive in the heartbeat response. This will be the case when a static consumer is removed from the group by an admin client and another member joins with the same group.instance.id (which is allowed). The first member will receive a FencedInstanceId error on its next heartbeat. The expectation is that this should be handled as a fatal error.

    There are no actual changes in logic with this PR: without explicit handling, the FencedInstanceId was treated as an "unexpected error", and all unexpected errors are treated as fatal, so the outcome remains the same. This small change is introduced for accuracy in the logic and the logs: FencedInstanceId is expected during a heartbeat, so a log line now describes the situation and why it happened (and it is still treated as a fatal error, just as before this PR).

    This PR also improves the test to ensure that the error propagated to the app thread matches the one received in the HB.

    Reviewers: Andrew Schofield <aschofield@confluent.io>, David Jacot <djacot@confluent.io>
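The error handling described in this commit can be sketched generically. The exception and classifier below are illustrative stand-ins for the consumer internals, not the actual heartbeat request manager:

```java
// Sketch of the heartbeat error handling described above: FencedInstanceId is
// an expected, fatal error with a descriptive log line, rather than falling
// through as a generic "unexpected error".
public class HeartbeatErrors {

    static class FencedInstanceIdException extends RuntimeException {
        FencedInstanceIdException(String message) { super(message); }
    }

    static String classify(RuntimeException error, String groupInstanceId) {
        if (error instanceof FencedInstanceIdException) {
            // Expected: another member joined with the same group.instance.id.
            return "Fatal: group.instance.id " + groupInstanceId
                    + " fenced by another member using the same id";
        }
        // Everything else remains an unexpected (also fatal) error.
        return "Fatal: unexpected heartbeat error: " + error.getMessage();
    }

    public static void main(String[] args) {
        System.out.println(classify(new FencedInstanceIdException("fenced"), "consumer-a"));
        System.out.println(classify(new IllegalStateException("boom"), "consumer-a"));
    }
}
```

Both branches end in a fatal error, matching the commit's point that only the accuracy of the log line changes, not the outcome.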

commit c5cd190
Author: Gantigmaa Selenge <39860586+tinaselenge@users.noreply.github.com>
Date:   Fri May 24 11:50:47 2024 +0100

    MINOR: Refactor SSL/SASL admin integration tests to not use a custom authorizer (apache#15377)

    Reviewers: Mickael Maison <mickael.maison@gmail.com>

commit 520aa86
Author: Jeff Kim <kimkb2011@gmail.com>
Date:   Fri May 24 03:51:50 2024 -0400

    KAFKA-16626; Lazily convert subscribed topic names to topic ids (apache#15970)

    This patch removes the data structure that stored the conversion from topic names to topic ids, which was taking as much time to build as the actual assignment computation. Instead, we reuse the already existing ConsumerGroupMember.subscribedTopicNames() and do the conversion to topic ids when the iterator is requested.

    Reviewers: David Jacot <djacot@confluent.io>
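The lazy conversion pattern in this commit can be sketched with a plain Iterable. The names and types below are illustrative; the actual code uses the group coordinator's internal types rather than java.util.UUID:

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Sketch of the lazy conversion: no name -> id map is materialized up front;
// each subscribed topic name is translated to an id only as the iterator is
// consumed.
public class LazyTopicIds implements Iterable<UUID> {

    private final List<String> subscribedTopicNames;
    private final Map<String, UUID> topicIdsByName; // existing metadata lookup

    LazyTopicIds(List<String> subscribedTopicNames, Map<String, UUID> topicIdsByName) {
        this.subscribedTopicNames = subscribedTopicNames;
        this.topicIdsByName = topicIdsByName;
    }

    @Override
    public Iterator<UUID> iterator() {
        Iterator<String> names = subscribedTopicNames.iterator();
        return new Iterator<UUID>() {
            @Override
            public boolean hasNext() { return names.hasNext(); }

            @Override
            public UUID next() { return topicIdsByName.get(names.next()); }
        };
    }

    public static void main(String[] args) {
        Map<String, UUID> ids = Map.of(
                "orders", UUID.randomUUID(),
                "payments", UUID.randomUUID());
        for (UUID id : new LazyTopicIds(List.of("orders", "payments"), ids)) {
            System.out.println(id); // each id resolved only when requested
        }
    }
}
```

Because nothing is converted until iteration, callers that never consume the iterator pay no conversion cost at all.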

commit 6941598
Author: Krishna Agarwal <62741600+kagarwal06@users.noreply.github.com>
Date:   Fri May 24 12:16:01 2024 +0530

    KAFKA-16826: Integrate Native Docker Image with github actions (apache#16045)

    This PR integrates the Native Docker image with the existing GitHub Actions jobs for the JVM Docker image of Apache Kafka.

    The integration covers the following workflows:

    - docker_build_and_test.yml: Builds the Docker image and runs sanity tests and a CVE scan
    - docker_rc_release.yml: Builds the RC Docker image for both amd and arm platforms and pushes it to Docker Hub
    - docker_promote.yml: Promotes the RC Docker image to the released image tag

    Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Vedarth Sharma <142404391+VedarthConfluent@users.noreply.github.com>

commit de32028
Author: Kuan-Po (Cooper) Tseng <brandboat@gmail.com>
Date:   Fri May 24 05:25:53 2024 +0800

    KAFKA-16828 RackAwareTaskAssignorTest failed (apache#16044)

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

commit 11ad5e8
Author: Greg Harris <greg.harris@aiven.io>
Date:   Thu May 23 13:23:18 2024 -0700

    MINOR: Refactor Values class to fix checkstyle, add benchmark, optimize exceptions (apache#15469)

    Signed-off-by: Greg Harris <greg.harris@aiven.io>
    Reviewers: Mickael Maison <mickael.maison@gmail.com>