KAFKA-10727; Handle Kerberos error during re-login as transient failure in clients #9605
Conversation
Hi Rajini. LGTM. I wondered whether OAuth Bearer tokens could hit a similar, currently non-retriable failure, but it appears that we support multiple simultaneous tokens there (see ExpiringCredentialRefreshingLogin; OAuthBearerLoginModule sets loginRefreshReloginAllowedBeforeLogout to true). A new token is retrieved and logged in before the old one is logged out, so there is never a moment without valid credentials.
@rajinisivaram Thanks for the PR. LGTM.
@omkreddy @rondagostino Thanks for the reviews. The quota test failure is not related, so merging to trunk.
…re in clients (apache#9605) We use a background thread for Kerberos to perform re-login before tickets expire. The thread performs logout() followed by login(), relying on the Java library to clear and then populate credentials in Subject. This leaves a timing window where clients fail to authenticate because credentials are not available. We cannot introduce any form of locking since authentication is performed on the network thread. So this commit treats NO_CRED as a transient failure rather than a fatal authentication exception in clients. Reviewers: Ron Dagostino <rdagostino@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
We use a background thread for Kerberos to perform re-login before tickets expire. The thread performs logout() followed by login(), relying on the Java library to clear and then repopulate the credentials in Subject. This leaves a timing window in which clients fail to authenticate because no credentials are available. Because authentication is performed on the network thread, we cannot introduce any form of locking. This PR therefore treats NO_CRED as a transient failure in clients rather than a fatal authentication exception.
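As a rough illustration of the classification described above, the sketch below checks whether a failure was caused by a GSS-API `NO_CRED` error. It is not the code from this PR: the class `ReloginErrorClassifier` and the method `isTransient` are hypothetical names chosen for the example, and the only real API it relies on is the JDK's `org.ietf.jgss.GSSException`.

```java
import org.ietf.jgss.GSSException;

/**
 * Hypothetical helper (not a Kafka class): decides whether an authentication
 * failure looks like the "no credentials during re-login" window.
 */
public final class ReloginErrorClassifier {

    private ReloginErrorClassifier() { }

    /**
     * Returns true if any exception in the cause chain is a GSSException
     * whose major code is NO_CRED, meaning no credentials were available.
     */
    public static boolean isTransient(Throwable error) {
        Throwable current = error;
        // Bounded walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; current != null && depth < 32; depth++) {
            if (current instanceof GSSException
                    && ((GSSException) current).getMajor() == GSSException.NO_CRED) {
                return true;
            }
            current = current.getCause();
        }
        return false;
    }
}
```

A caller could use a check like this to choose between retrying the connection and reporting a fatal authentication error. No lock is taken, which matches the constraint that authentication runs on the network thread.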