KAFKA-14593: Move LeaderElectionCommand to tools #13204

Merged
merged 7 commits into from
Oct 3, 2023

OmniaGM
Contributor

@OmniaGM OmniaGM commented Feb 6, 2023

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@OmniaGM OmniaGM force-pushed the KAFKA-14593 branch 3 times, most recently from 5828376 to 8b8c1f4 Compare February 6, 2023 23:52
Collaborator

@fvaleri fvaleri left a comment

Hi @OmniaGM, thanks for working on this.

I think you need to rebase, as I get lots of checkstyle errors that seem unrelated to your changes.

I also left some comments and suggestions.

build.gradle Outdated
@@ -1756,6 +1756,7 @@ project(':tools') {

dependencies {
implementation project(':clients')
implementation project(':core')
Collaborator

The direction from previous migrations is that we don't want tools to depend on core. I see there are a few kafka.* dependencies in the LeaderElectionCommand class. Do you think we can also migrate them as part of this PR?

Contributor Author

It would be a bit harder to move them to tools. For example, the command uses kafka.utils.json, which is also used by kafka.server and kafka.security.authorizer. And kafka.admin.AdminOperationException is used by kafka.admin, kafka.controller, kafka.zk, etc. So I'm not sure it's easy to move them out.

Contributor

These classes may be candidates to land in the module server-common?

Member

Let's avoid making tools depend on core if possible. I opened https://issues.apache.org/jira/browse/KAFKA-14737 to move the kafka.utils.json classes to server-common.

Contributor Author

I've removed the dependencies on core now. Out of curiosity, do we plan to move kafka.utils.TestUtils out of core, maybe into common, at some point?

@OmniaGM OmniaGM force-pushed the KAFKA-14593 branch 2 times, most recently from bd21b9d to 817730d Compare February 12, 2023 22:05
@github-actions

This PR is being marked as stale since it has not had any activity in 90 days. If you would like to keep this PR alive, please ask a committer for review. If the PR has merge conflicts, please update it with the latest from trunk (or the appropriate release branch).
If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.

@github-actions github-actions bot added the stale Stale PRs label Jun 10, 2023
@github-actions github-actions bot removed the stale Stale PRs label Jun 22, 2023
@OmniaGM OmniaGM requested a review from fvaleri August 2, 2023 15:11
Collaborator

@fvaleri fvaleri left a comment

Hi @OmniaGM, thanks. I left a few minor comments, but I think we are almost there with this.

Collaborator

@fvaleri fvaleri left a comment

LGTM. Thanks.

Member

@mimaison mimaison left a comment

Thanks for the PR. It looks good overall. I left a few suggestions to clean up some logic.

@OmniaGM OmniaGM force-pushed the KAFKA-14593 branch 2 times, most recently from b2bcafd to 638f48a Compare August 15, 2023 11:35
@OmniaGM
Contributor Author

OmniaGM commented Aug 17, 2023

The build is failing because of unrelated test failures in :storage:test with JDK 11 and Scala 2.13.

@mimaison
Member

I've re-kicked a build; hopefully that will clear the test failures.

@mimaison
Member

Still with all the LeaderElectionCommandTest failures
https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-13204/15/testReport/

@OmniaGM
Contributor Author

OmniaGM commented Aug 21, 2023

Still with all the LeaderElectionCommandTest failures https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-13204/15/testReport/

I will look into the LeaderElectionCommandTest failures.

@OmniaGM
Contributor Author

OmniaGM commented Aug 22, 2023

@mimaison I can't reproduce the problem locally. I ran ./gradlew -PscalaVersion=2.13 test --profile --continue -PkeepAliveMode=session -PtestLoggingEvents=started,passed,skipped,failed -PignoreFailures=true -PmaxParallelForks=2 -PmaxTestRetries=1 -PmaxTestRetryFailures=10 but I get a different set of failed tests. Any suggestions on how to reproduce this locally?

@mimaison
Member

To be honest, I'm not sure. Try rebasing on trunk. Maybe there's a conflict with a recent commit.

@OmniaGM OmniaGM force-pushed the KAFKA-14593 branch 2 times, most recently from e0f51e5 to 926dec0 Compare August 23, 2023 09:04
@OmniaGM
Contributor Author

OmniaGM commented Aug 23, 2023

I managed to reproduce some of the failed tests locally. It seems that any test that shuts down a broker and brings it back up ends in an odd state where the leader of the topic shrinks the ISR to itself even though another replica is online.
This is impacting LeaderElectionCommandTest and PlaintextAdminIntegrationTest.

So far the only thing I'm noticing is that every time the test restarts a broker, the broker logs these two errors:

 ERROR [Broker id=1] Error while processing LeaderAndIsr request with correlationId 11 received from controller 1 epoch 2 (state.change.logger:76)
java.lang.NoClassDefFoundError: org/apache/kafka/server/log/remote/storage/RemoteStorageException
	at kafka.server.ReplicaFetcherThread.<init>(ReplicaFetcherThread.scala:40)
	at kafka.server.ReplicaFetcherManager.createFetcherThread(ReplicaFetcherManager.scala:50)
	at kafka.server.ReplicaFetcherManager.createFetcherThread(ReplicaFetcherManager.scala:26)
	at kafka.server.AbstractFetcherManager.addAndStartFetcherThread$1(AbstractFetcherManager.scala:136)
	at kafka.server.AbstractFetcherManager.$anonfun$addFetcherForPartitions$3(AbstractFetcherManager.scala:152)
	at kafka.server.AbstractFetcherManager.$anonfun$addFetcherForPartitions$3$adapted(AbstractFetcherManager.scala:142)
	at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
	at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
	at scala.collection.IterableOps$WithFilter.foreach(Iterable.scala:903)
	at kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:142)
	at kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:2098)
	at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:284)
	at kafka.server.KafkaApis.handle(KafkaApis.scala:184)
	at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:149)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.server.log.remote.storage.RemoteStorageException
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
	... 16 more

and

ERROR [KafkaApi-0] Unexpected error handling request RequestHeader(apiKey=LEADER_AND_ISR, apiVersion=7, clientId=1, correlationId=1, headerVersion=2) -- LeaderAndIsrRequestData(controllerId=1, isKRaftController=false, controllerEpoch=2, brokerEpoch=126, type=0, ungroupedPartitionStates=[], topicStates=[LeaderAndIsrTopicState(topicName='__consumer_offsets', topicId=I8g-vLMFQLW9-P5tM1sIAA, partitionStates=[LeaderAndIsrPartitionState(topicName='__consumer_offsets', partitionIndex=3, controllerEpoch=2, leader=2, leaderEpoch=3, isr=[2], partitionEpoch=3, replicas=[2, 0], addingReplicas=[], removingReplicas=[], isNew=false, leaderRecoveryState=0), LeaderAndIsrPartitionState(topicName='__consumer_offsets', partitionIndex=4, controllerEpoch=2, leader=-1, leaderEpoch=2, isr=[0], partitionEpoch=2, replicas=[0, 1], addingReplicas=[], removingReplicas=[], isNew=false, leaderRecoveryState=0), LeaderAndIsrPartitionState(topicName='__consumer_offsets', partitionIndex=1, controllerEpoch=2, leader=2, leaderEpoch=3, isr=[2], partitionEpoch=3, replicas=[0, 2], addingReplicas=[], removingReplicas=[], isNew=false, leaderRecoveryState=0), LeaderAndIsrPartitionState(topicName='__consumer_offsets', partitionIndex=2, controllerEpoch=2, leader=-1, leaderEpoch=2, isr=[0], partitionEpoch=2, replicas=[1, 0], addingReplicas=[], removingReplicas=[], isNew=false, leaderRecoveryState=0)])], liveLeaders=[LeaderAndIsrLiveLeader(brokerId=2, hostName='localhost', port=56589)]) with context RequestContext(header=RequestHeader(apiKey=LEADER_AND_ISR, apiVersion=7, clientId=1, correlationId=1, headerVersion=2), connectionId='127.0.0.1:56583-127.0.0.1:56622-2', clientAddress=/127.0.0.1, principal=User:ANONYMOUS, listenerName=ListenerName(PLAINTEXT), securityProtocol=PLAINTEXT, clientInformation=ClientInformation(softwareName=unknown, softwareVersion=unknown), fromPrivilegedListener=true, principalSerde=Optional[org.apache.kafka.common.security.authenticator.DefaultKafkaPrincipalBuilder@4a30c780]) 
(kafka.server.KafkaApis:76)
java.lang.NoClassDefFoundError: org/apache/kafka/server/log/remote/storage/RemoteStorageException
	at kafka.server.ReplicaFetcherThread.<init>(ReplicaFetcherThread.scala:40)
	at kafka.server.ReplicaFetcherManager.createFetcherThread(ReplicaFetcherManager.scala:50)
	at kafka.server.ReplicaFetcherManager.createFetcherThread(ReplicaFetcherManager.scala:26)
	at kafka.server.AbstractFetcherManager.addAndStartFetcherThread$1(AbstractFetcherManager.scala:136)
	at kafka.server.AbstractFetcherManager.$anonfun$addFetcherForPartitions$3(AbstractFetcherManager.scala:152)
	at kafka.server.AbstractFetcherManager.$anonfun$addFetcherForPartitions$3$adapted(AbstractFetcherManager.scala:142)
	at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
	at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
	at scala.collection.IterableOps$WithFilter.foreach(Iterable.scala:903)
	at kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:142)
	at kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:2098)
	at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:284)
	at kafka.server.KafkaApis.handle(KafkaApis.scala:184)
	at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:149)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.server.log.remote.storage.RemoteStorageException
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
	... 16 more

I'm not sure what changed, but the broker is failing to start ReplicaFetcherThread.
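
A NoClassDefFoundError caused by a ClassNotFoundException, as in the traces above, usually means the class was available at compile time but is missing from the runtime classpath. As a rough diagnostic sketch (not part of this PR; the ClasspathCheck class and its isLoadable helper are hypothetical names introduced for illustration), one way to check whether a class is visible to the current class loader is:

```java
import java.util.List;

public class ClasspathCheck {
    /** Returns true if the named class can be located by this class's class loader. */
    public static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: not on the classpath.
            // LinkageError (includes NoClassDefFoundError): found, but a dependency is missing or broken.
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> names = List.of(
            "java.lang.String",
            // The class reported as missing in the stack traces above.
            "org.apache.kafka.server.log.remote.storage.RemoteStorageException");
        for (String name : names) {
            System.out.println(name + " loadable: " + isLoadable(name));
        }
    }
}
```

Running this with the same classpath as the failing test module would, under these assumptions, show whether the storage API classes are actually present at runtime.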

@OmniaGM
Contributor Author

OmniaGM commented Sep 13, 2023

I found out why the tests were failing: we are hitting a problem similar to Gradle issue #847, which causes transitive dependency problems. In our case, storage:api and connect:api were causing unintended conflicts. The Gradle issue does not seem to be fully solved, so I pushed a workaround that renames the storage:api project to storage:storage-api. This shouldn't impact the name of the final jar.
I pushed the changes and am waiting for the pipelines to pass.
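
To illustrate the kind of conflict described above, here is a toy model in Java. It is not Gradle's actual resolution logic. It assumes, for illustration only, that each project is identified by a "group:name" coordinate whose name is the last segment of the project path, and that all projects share the group org.apache.kafka. The ProjectNameCollision class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProjectNameCollision {
    /** Builds "group:name", where name is the last segment of the project path (an assumption). */
    public static String coordinate(String group, String projectPath) {
        String name = projectPath.substring(projectPath.lastIndexOf(':') + 1);
        return group + ":" + name;
    }

    /** Returns only those coordinates that are shared by two or more project paths. */
    public static Map<String, List<String>> collisions(String group, List<String> paths) {
        Map<String, List<String>> byCoordinate = new LinkedHashMap<>();
        for (String path : paths) {
            byCoordinate.computeIfAbsent(coordinate(group, path), k -> new ArrayList<>()).add(path);
        }
        // Drop coordinates that map to a single project: those are unambiguous.
        byCoordinate.values().removeIf(projects -> projects.size() < 2);
        return byCoordinate;
    }

    public static void main(String[] args) {
        String group = "org.apache.kafka"; // assumed shared group
        // Before the rename: both projects end in "api" and collide.
        System.out.println(collisions(group, List.of(":storage:api", ":connect:api")));
        // After the rename described above: no shared coordinate remains.
        System.out.println(collisions(group, List.of(":storage:storage-api", ":connect:api")));
    }
}
```

In this model the first call reports both paths under the single coordinate org.apache.kafka:api, and the second reports no collisions, which is the intuition behind renaming one of the two projects.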

@OmniaGM
Contributor Author

OmniaGM commented Sep 14, 2023

The tests are now failing for different packages (https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-13204/19/testReport/), and one of the tasks built successfully but then failed with this error:

Publishing failed.

The response from https://ge.apache.org/scans/publish/gradle/3.14.1/token was not from Gradle Enterprise.
The specified server address may be incorrect, or your network environment may be interfering.

Member

@mimaison mimaison left a comment

Thanks for the PR and for figuring out the dependency issue. LGTM

@mimaison mimaison merged commit 7553d3f into apache:trunk Oct 3, 2023
1 check failed
@raboof raboof mentioned this pull request Oct 3, 2023
3 tasks
@nizhikov
Contributor

nizhikov commented Oct 3, 2023

Hello @OmniaGM @mimaison @fvaleri

Looks like 7553d3f overlaps with 8f8dbad and breaks trunk compilation.

I prepared #14475 to fix compilation. Please take a look.

k-wall pushed a commit to k-wall/kafka that referenced this pull request Nov 21, 2023
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Federico Valeri <fedevaleri@gmail.com>
AnatolyPopov pushed a commit to aiven/kafka that referenced this pull request Feb 16, 2024
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Federico Valeri <fedevaleri@gmail.com>