KAFKA-14702: Extend server side assignor to support rack aware replica placement #14099

Merged
merged 15 commits into from Jul 28, 2023


@rreddy-22 rreddy-22 commented Jul 25, 2023

This patch introduces the SubscribedTopicDescriber interface to the group-coordinator module and updates the affected files so that assignors can be rack aware. The existing range assignor now has the facilities needed to test rack awareness once that is implemented.
To obtain the rack information from the topic and cluster metadata and provide it to the assignors, the following modifications were made:

  1. SubscribedTopicDescriber - Interface passed to the assignor to obtain topic and partition metadata.
    - The assign function in the PartitionAssignor interface now takes two arguments, AssignmentSpec and
    SubscribedTopicDescriber.
    - Removed the AssignmentTopicMetadata class and the topics attribute from AssignmentSpec; all topic
    metadata is now provided by the SubscribedTopicDescriber.
  2. SubscribedTopicMetadata - Implementation of the interface above.
  3. Added a partitionMetadata field to ConsumerGroupPartitionMetadataValue.json; it is empty if rack info doesn't exist.
  4. TopicMetadata has a new attribute Map<Integer, Set> partitionRackInfo - a map from partition ID to the set of its racks.
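
As a rough illustration of the shape described in the list above, here is a minimal, self-contained sketch of how an assignor might query rack information through a describer. All type and method names (`TopicDescriberSketch`, `numPartitions`, `racksForPartition`, `isRackLocal`) are assumptions made for this example and are not taken from the patch; topic IDs are plain strings here for simplicity.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the describer interface; method names are assumptions.
interface TopicDescriberSketch {
    // Number of partitions of the topic, or -1 if the topic is unknown.
    int numPartitions(String topicId);

    // Racks hosting replicas of the partition; empty when no rack info exists.
    Set<String> racksForPartition(String topicId, int partition);
}

// Hypothetical implementation backed by a map shaped like partitionRackInfo.
final class TopicMetadataSketch implements TopicDescriberSketch {
    private final Map<String, Integer> partitionCounts = new HashMap<>();
    private final Map<String, Map<Integer, Set<String>>> racksByTopic = new HashMap<>();

    void addTopic(String topicId, int numPartitions, Map<Integer, Set<String>> partitionRackInfo) {
        partitionCounts.put(topicId, numPartitions);
        racksByTopic.put(topicId, partitionRackInfo);
    }

    @Override
    public int numPartitions(String topicId) {
        return partitionCounts.getOrDefault(topicId, -1);
    }

    @Override
    public Set<String> racksForPartition(String topicId, int partition) {
        Map<Integer, Set<String>> racks = racksByTopic.getOrDefault(topicId, Collections.emptyMap());
        return racks.getOrDefault(partition, Collections.emptySet());
    }
}

final class RackAwareCheckSketch {
    // True when the member's rack hosts a replica of the given partition.
    static boolean isRackLocal(TopicDescriberSketch describer, String topicId, int partition, String memberRack) {
        return memberRack != null && describer.racksForPartition(topicId, partition).contains(memberRack);
    }
}
```

The point of the indirection is that the assignor only sees the describer, so the metadata source can change without touching the assignment logic.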

This is a new PR with all the previous commits squashed.
Link to the old PR: #13998

rreddy-22 and others added 4 commits July 24, 2023 19:18
commit 79b8c96
Author: David Mao <47232755+splett2@users.noreply.github.com>
Date:   Mon Jul 24 13:22:25 2023 -0700

    KAFKA-14990: Dynamic producer ID expiration should be applied on a broker restart (apache#13707)

    Dynamic overrides for the producer ID expiration config are not picked up on broker restart in Zookeeper mode. Based on the integration test, this does not apply to KRaft mode.

    Adds a broker restart that fails without the corresponding KafkaConfig change.

    Reviewers: Justine Olshan <jolshan@confluent.io>

commit 38781f9
Author: Justine Olshan <jolshan@confluent.io>
Date:   Mon Jul 24 13:08:57 2023 -0700

    KAFKA-14920: Address timeouts and out of order sequences (apache#14033)

    When creating a verification state entry, we also store sequence and epoch. On subsequent requests, we will take the latest epoch seen and the earliest sequence seen. That way, if we try to append a sequence after the earliest seen sequence, we can block that and retry. This addresses potential OutOfOrderSequence loops caused by errors during verification (coordinator loading, timeouts, etc).
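
The bookkeeping described above could look roughly like the following sketch. The class and method names are hypothetical, the epoch is held as an `int` for brevity, sequence wraparound is ignored, and resetting the lowest sequence when a newer epoch arrives is an assumption rather than something the commit message states.

```java
// Hypothetical sketch of a per-producer verification state entry.
final class VerificationStateSketch {
    private int epoch;           // latest producer epoch seen
    private int lowestSequence;  // earliest sequence seen for that epoch

    VerificationStateSketch(int epoch, int sequence) {
        this.epoch = epoch;
        this.lowestSequence = sequence;
    }

    // Fold another request into the entry: keep the latest epoch and the earliest sequence.
    void update(int requestEpoch, int requestSequence) {
        if (requestEpoch > epoch) {
            // Assumption: a newer epoch starts a fresh sequence window.
            epoch = requestEpoch;
            lowestSequence = requestSequence;
        } else if (requestEpoch == epoch && requestSequence < lowestSequence) {
            lowestSequence = requestSequence;
        }
        // Requests from an older epoch leave the entry unchanged.
    }

    // True when appending this sequence would skip past an earlier pending one,
    // so the caller should block the append and let the client retry.
    boolean shouldBlock(int sequence) {
        return sequence > lowestSequence;
    }

    int epoch() {
        return epoch;
    }

    int lowestSequence() {
        return lowestSequence;
    }
}
```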

    Reviewers: David Jacot <david.jacot@gmail.com>, Artem Livshits <alivshits@confluent.io>

commit 71f8488e5bf30af6f4a465d1fac52ccb9a341396
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Tue Jul 25 15:22:10 2023 -0700

    Deleted PartitionMetadata and added TopicMetadata to the SubscribedTopicMetadata, PR comments

commit 5a935324c11e6f6309ee5451f90ed5052276a734
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Tue Jul 25 10:51:11 2023 -0700

    Minor

commit 1c955b5e251aedc174cf28f43037f08162d89b49
Merge: 2d40125fa8 81f1ccd7a2
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Mon Jul 24 13:02:54 2023 -0700

    Merge branch 'rreddy-22/Rack-Assignor-Interface-Changes' of github.com:rreddy-22/kafka-rreddy into rreddy-22/Rack-Assignor-Interface-Changes

commit 81f1ccd7a25d85eaf1cff713df402e0100d4b4a3
Merge: a39e76f63f 84691b11f6
Author: Ritika Reddy <98577846+rreddy-22@users.noreply.github.com>
Date:   Mon Jul 24 12:56:40 2023 -0700

    Merge branch 'apache:trunk' into rreddy-22/Rack-Assignor-Interface-Changes

commit 2d40125fa8ba16b4f07e6bfe410ceacfd8a112e8
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Mon Jul 24 11:45:21 2023 -0700

    minor

commit a39e76f63f312ecacdcebd2c5015777e160ca0b3
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Mon Jul 24 10:04:10 2023 -0700

    reverted reviewers.py changes

commit c0b464c8936f1df4f6540b8abe7c1fa88fce6e81
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Mon Jul 24 00:02:03 2023 -0700

    reverted grammar changes

commit 6fe72dfce921ab9d77ba6b28d4104c5d92c6dbf3
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Sun Jul 23 23:56:35 2023 -0700

    minor

commit a30a53463e55b72699617360d1c888b7ee310336
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Sun Jul 23 23:55:35 2023 -0700

    minor

commit 57b9a6233a4f697f285b85e6394d3dd407f16404
Merge: 061dac797b 4981fa939d
Author: Ritika Reddy <98577846+rreddy-22@users.noreply.github.com>
Date:   Sun Jul 23 19:52:07 2023 -0700

    Merge branch 'apache:trunk' into rreddy-22/Rack-Assignor-Interface-Changes

commit 061dac797bae56d91659fbc8d7900117eb10a866
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Sun Jul 23 19:43:12 2023 -0700

    Moved SubscribedTopicMetadata to the consumer package

commit 6650ea5cd707f03713fc2c69ba76fafd56d886b8
Merge: 22f2df02de faafed25a1
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Sun Jul 23 19:33:17 2023 -0700

    Merge remote-tracking branch 'origin/rreddy-22/Rack-Assignor-Interface-Changes' into rreddy-22/Rack-Assignor-Interface-Changes

commit 22f2df02deabe1a6b5b8b8cceb432486b9ef4470
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Sun Jul 23 19:25:11 2023 -0700

    minor changes

commit faafed25a10d7ff79fb8a24ab62aa222c0e6516a
Merge: 4511683d4e 1bf73d89d0
Author: Ritika Reddy <98577846+rreddy-22@users.noreply.github.com>
Date:   Fri Jul 21 12:50:59 2023 -0700

    Merge branch 'trunk' into rreddy-22/Rack-Assignor-Interface-Changes

commit 4511683d4ef489cb979769ff6775dfe0250cdeeb
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Fri Jul 21 03:09:27 2023 -0700

    Changes based on PR comments

commit 579bb948fcc78afd163410a5bc43832bc87aaf71
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Wed Jul 19 14:45:18 2023 -0700

    New interfaces and classes added to facilitate passing rack information to the assignor, modified tests to incorporate changes

commit f68471a99f2e0260b36047a7596b0f9de48794ad
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Thu Jul 13 11:55:20 2023 -0700

    Small edits

commit c50f3bdbf79f1ca127355d2a9002b1fdbf0baa2f
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Tue Jul 11 16:30:32 2023 -0700

    Removed extra lines

commit 478488e0943aa879b27eec5ae5eb79a23dd9cab3
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Tue Jul 11 16:21:54 2023 -0700

    Added rack interface changes, added a new TopicAndClusterMetadata class to handle topic and cluster images and added an AbstractPartitionAssignor

commit ce540f943b8dae0786a87e51cafacb66efd1e64d
Merge: 9815d6db04 fd5b300b57
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Fri Jul 7 12:45:27 2023 -0600

    Merge remote-tracking branch 'upstream/trunk' into rreddy-22/KAFKA-14515-Optimized

commit 9815d6db04c264456b7ead7fc5f735513a868c7d
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Fri Jul 7 12:23:00 2023 -0600

    Abstract Partition Assignor

commit e9ebc4f0a48c5ea08391377ffcdb01c375f2be14
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Thu Jun 29 16:36:53 2023 -0700

    Refactored code to reduce format conversion time

commit 931bcf7f82f9ef9a1b2f0be4b7d92c7f226ace08
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Wed May 31 16:21:01 2023 -0700

    test changes and code changes

commit e1a032579129a2cd3aae1e93f46570bd57d3343c
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Tue May 16 09:28:57 2023 -0700

    small changes

commit 1c2475b98f08b92bf876f2ca4eb979a2fca5848a
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Thu May 11 20:36:59 2023 -0700

    Squashed commit of the following:

    commit c757af5f7c630d532bfee5f6dc45aec603ad8a29
    Author: Federico Valeri <fedevaleri@gmail.com>
    Date:   Fri May 12 04:39:12 2023 +0200

        KAFKA-14752: Kafka examples improvements - demo changes (#13517)

        KAFKA-14752: Kafka examples improvements - demo changes

        Reviewers: Luke Chen <showuon@gmail.com>

    commit 54a4067f81e1434d956ef797274f7b437fe49ea1
    Author: Kamal Chandraprakash <kchandraprakash@uber.com>
    Date:   Thu May 11 21:49:21 2023 +0530

        KAFKA-14559: Fix JMX tool to handle the object names with wildcard and optional attributes (#13060)

        Reviewers: Federico Valeri <fedevaleri@gmail.com>, Satish Duggana <satishd@apache.org>

    commit bd65db82b4bad623b0bb31398979e466978148da
    Author: Josep Prat <josep.prat@aiven.io>
    Date:   Thu May 11 17:55:26 2023 +0200

        MINOR: clean up unused methods in core utils (#13706)

        Reviewers: Manyanda Chitimbo <manyanda.chitimbo@gmail.com>, Mickael Maison <mimaison@apache.org>

    commit 38adb3956979d7025d148612975cc0b82200b2e1
    Author: Jeff Kim <kimkb2011@gmail.com>
    Date:   Thu May 11 07:51:41 2023 -0400

        MINOR: add test tag for testDeadToDeadIllegalTransition (#13694)

        Reviewers: David Jacot <djacot@confluent.io>

    commit ee4132863553fb4fd2df715b2fbd77f349f978b8
    Author: Federico Valeri <fedevaleri@gmail.com>
    Date:   Thu May 11 11:19:32 2023 +0200

        KAFKA-14752: Kafka examples improvements - processor changes (#13516)

        Reviewers: Luke Chen <showuon@gmail.com>

    commit a263627adb75f1ca5c87f1482cc70b994ba49d63
    Author: Mickael Maison <mimaison@users.noreply.github.com>
    Date:   Thu May 11 11:02:45 2023 +0200

        MINOR: Remove unused methods in CoreUtils (#13170)

        Reviewers: Josep Prat <josep.prat@aiven.io>, Christo Lolov <christololov@gmail.com>

    commit 920a3601ffa7266edf12f3559cfe97c8a5929d03
    Author: Manyanda Chitimbo <manyanda.chitimbo@gmail.com>
    Date:   Thu May 11 04:13:29 2023 +0200

        MINOR: fix a small typo in SharedServer.scala (#13693)

        Diabled -> Disabled

    commit 6d2ad4a38340176184e3f19027b5e0e024c1f2cc
    Author: A. Sophie Blee-Goldman <sophie@confluent.io>
    Date:   Wed May 10 13:39:15 2023 -0700

        HOTFIX: fix the VersionedKeyValueToBytesStoreAdapter#isOpen API (#13695)

        The VersionedKeyValueToBytesStoreAdapter#isOpen API accidentally returns the value of inner.persistent() when it should be returning inner.isOpen()

        Reviewers: Matthias J. Sax <mjsax@apache.org>, Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>, Victoria Xia <victoria.xia@confluent.io>

    commit f17fb75b2de32512f14cb94a7d1bfb0f37485780
    Author: Dániel Urbán <48119872+urbandan@users.noreply.github.com>
    Date:   Wed May 10 16:41:52 2023 +0200

        KAFKA-14978 ExactlyOnceWorkerSourceTask should remove parent metrics (#13690)

        Reviewers: Chris Egerton <chrise@aiven.io>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>

    commit 4653507926a42dccda5c086fcae6278afcfc53ca
    Author: Ritika Reddy <98577846+rreddy-22@users.noreply.github.com>
    Date:   Wed May 10 05:09:12 2023 -0700

        KAFKA-14514; Add Range Assignor on the Server (KIP-848) (#13443)

        This patch adds the RangeAssignor on the server for KIP-848. This range assignor is very different from the old client side implementation. We added functionality to make these assignments sticky while also inheriting crucial properties of the range assignor such as facilitating joins and distributing partitions of a topic somewhat equally amongst its subscribers.

        Reviewers: Philip Nee <philipnee@gmail.com>, Jeff Kim <jeff.kim@confluent.io>, David Jacot <djacot@confluent.io>

    commit 625ef176ee5f167786003c2d88498632b0b7014b
    Author: Luke Chen <showuon@gmail.com>
    Date:   Wed May 10 16:40:20 2023 +0800

        MINOR: remove kraft readme link (#13691)

        The config/kraft/README.md is already removed. We should also remove the link.

        Reviewers: dengziming <dengziming1993@gmail.com>

    commit 228434d23583189cdcaa7f4a90ebb178ccc17c73
    Author: Jeff Kim <kimkb2011@gmail.com>
    Date:   Tue May 9 10:49:27 2023 -0400

        KAFKA-14500; [1/N] Rewrite MemberMetadata in Java (#13644)

        This patch adds GenericGroupMember which is a rewrite of MemberMetadata in Java.

        Reviewers: David Jacot <djacot@confluent.io>

    commit 59ba9dbbc927ddc8660d0d98d9422909fd306758
    Author: Yash Mayya <yash.mayya@gmail.com>
    Date:   Tue May 9 17:58:45 2023 +0530

        KAFKA-14974: Restore backward compatibility in KafkaBasedLog (#13688)

        `KafkaBasedLog` is a widely used utility class that provides a generic implementation of a shared, compacted log of records in a Kafka topic. It isn't in Connect's public API, but has been used outside of Connect and we try to preserve backward compatibility whenever possible. KAFKA-14455 modified the two overloaded void `KafkaBasedLog::send` methods to return a `Future`. While this change is source compatible, it isn't binary compatible. We can restore backward compatibility simply by renaming the new Future returning send methods, and reinstating the older send methods to delegate to the newer methods.

        This refactoring changes no functionality other than restoring the older methods.
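
The reason the original change was source compatible but not binary compatible is that a method's return type is part of its JVM descriptor, so callers compiled against the `void` version fail to link against a `Future`-returning method of the same name. The sketch below shows the delegation pattern the message describes; `CompactedLogSketch` and `sendWithFuture` are hypothetical names chosen for this example, not the names used in `KafkaBasedLog`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

// Hypothetical stand-in for a log utility whose send method changed its return type.
final class CompactedLogSketch {
    private final List<String> records = new ArrayList<>();

    // Renamed method carrying the new behavior: returns a Future with the record's index.
    Future<Integer> sendWithFuture(String key, String value) {
        records.add(key + "=" + value);
        return CompletableFuture.completedFuture(records.size() - 1);
    }

    // Reinstated original signature: delegates to the new method and discards the Future,
    // so callers compiled against the old void method keep linking.
    void send(String key, String value) {
        sendWithFuture(key, value);
    }

    int size() {
        return records.size();
    }
}
```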

        Reviewers: Randall Hauch <rhauch@gmail.com>

    commit b40a7fc037bb1543c3355fad9c71570f770f5177
    Author: Matthias J. Sax <matthias@confluent.io>
    Date:   Mon May 8 14:24:11 2023 -0700

        HOTFIX: fix broken Streams upgrade system test (#13654)

        Reviewers: Victoria Xia <victoria.xia@confluent.io>, John Roesler <john@confluent.io>

    commit 7634eee2627da39937e3112ffc58bd7cfedc98f2
    Author: David Jacot <djacot@confluent.io>
    Date:   Mon May 8 20:46:07 2023 +0200

        KAFKA-14462; [11/N] Add CurrentAssignmentBuilder (#13638)

        This patch adds the `CurrentAssignmentBuilder` class which encapsulates the reconciliation engine of the consumer group protocol. Given the current state of a member and a desired or target assignment state, the state machine takes the necessary steps to converge the member to its desired state.

        Reviewers: Ritika Reddy <rreddy@confluent.io>, Calvin Liu <caliu@confluent.io>, Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>

    commit 86daf8ce6573eb79d6e78381dbab738055f914c4
    Author: vamossagar12 <sagarmeansocean@gmail.com>
    Date:   Mon May 8 20:09:47 2023 +0530

        KAFKA-14913: Using ThreadUtils.shutdownExecutorServiceQuietly to close executors in Connect Runtime (#13594)

        #13557 introduced a utils method to close executors silently. This PR leverages that method to close executors in connect runtime. There was duplicate code while closing the executors which isn't the case with this PR.

        Note that there are a few more executors used in Connect runtime, but their close methods don't follow this pattern of shutdown, await and shutdown. Some of them have additional logic, like the executor in Worker, so those places are left unchanged.

        ---------

        Co-authored-by: Sagar Rao <sagarrao@Sagars-MacBook-Pro.local>

        Reviewers: Daniel Urban <durban@cloudera.com>, Yash Mayya <yash.mayya@gmail.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>

    commit 2b98f8553ba12ab5d8cb88f5cd0d1198cb424df6
    Author: Christo Lolov <lolovc@amazon.com>
    Date:   Mon May 8 15:24:52 2023 +0100

        KAFKA-14133: Migrate ChangeLogReader mock in TaskManagerTest to Mockito (#13621)

        Migrates ChangeLogReader mock in TaskManagerTest to mockito.

        Reviewer: Bruno Cadonna <cadonna@apache.org>

    commit 347238948b86882a47faee4a2916d1b01333d95f
    Author: Gantigmaa Selenge <39860586+tinaselenge@users.noreply.github.com>
    Date:   Mon May 8 07:36:36 2023 +0100

        KAFKA-14662: Update the ACL list in the doc (#13660)

        Added the missing ACLs to the doc.

        Reviewers: Luke Chen <showuon@gmail.com>

    commit a556def5724efb1dc96bd2d389411a8d2f802f53
    Author: Divij Vaidya <diviv@amazon.com>
    Date:   Mon May 8 08:35:14 2023 +0200

        MINOR: Print the cause of failure for test (#13672)

        Motivation

        PlaintextAdminIntegrationTest fails in a flaky manner with the following trace (e.g. in this build):

        org.opentest4j.AssertionFailedError: expected: <false> but was: <true>
        	at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
        	at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
        	at org.junit.jupiter.api.AssertFalse.failNotFalse(AssertFalse.java:63)
        	at org.junit.jupiter.api.AssertFalse.assertFalse(AssertFalse.java:36)
        	at org.junit.jupiter.api.AssertFalse.assertFalse(AssertFalse.java:31)
        	at org.junit.jupiter.api.Assertions.assertFalse(Assertions.java:228)
        	at kafka.api.PlaintextAdminIntegrationTest.testElectUncleanLeadersForOnePartition(PlaintextAdminIntegrationTest.scala:1583)
        	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

        The std output doesn't contain useful information that we could use to debug the cause of failure. This is because the test, currently, validates if there is an exception and fails when one is present. It does not print what the exception is.

        Change

            1. Make the test a bit more robust by waiting for server startup.
            2. Fail the test with the actual unexpected exception which will help in debugging the cause of failure.

        Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>

    commit 2607e0edb7c165bf2c340e81c2a39d7bb3b63fbf
    Author: Federico Valeri <fedevaleri@gmail.com>
    Date:   Mon May 8 08:33:31 2023 +0200

        MINOR: Fix producer Callback comment (#13669)

        Fixes the wrong exception name: OffsetMetadataTooLargeException.

        Reviewers: Manyanda Chitimbo <manyanda.chitimbo@gmail.com>, Luke Chen <showuon@gmail.com>

    commit 78090bb4cdd2494f0b720d34e17ee0cc645fc399
    Author: Federico Valeri <fedevaleri@gmail.com>
    Date:   Mon May 8 04:15:52 2023 +0200

        KAFKA-14752: Kafka examples improvements - producer changes (#13515)

        KAFKA-14752: Kafka examples improvements - producer changes

        Reviewers: Luke Chen <showuon@gmail.com>, Christo Lolov <christololov@gmail.com>

    commit 6e7144ac24973afdb71ef59a63c6bacbbb1d2714
    Author: Chia-Ping Tsai <chia7712@gmail.com>
    Date:   Sat May 6 02:56:26 2023 +0800

        MINOR: add docs to remind reader that impl of ConsumerPartitionAssign… (#13659)

        Reviewers: David Jacot <djacot@confluent.io>, Kirk True <kirk@kirktrue.pro>

    commit 6bcc497c36a1aef19204b1bfe3b17a8c1c84c059
    Author: Divij Vaidya <diviv@amazon.com>
    Date:   Fri May 5 14:05:20 2023 +0200

        KAFKA-14766: Improve performance of VarInt encoding and decoding (#13312)

        Motivation

        Reading/writing the protocol buffer varInt32 and varInt64 (also called varLong in our code base) is in the hot path of data plane code in Apache Kafka. We read multiple varInt in a record and in long. Hence, even a minor improvement in performance could add up to a larger overall benefit.

        In this PR, we only update varInt32 encoding/decoding.

        Changes

        This change uses loop unrolling and reduces the number of repeated calculations. Based on the empirical results from the benchmark, the code has been modified to use the best-performing implementation.

        Results

        Performance has been evaluated using JMH benchmarks on JDK 17.0.6. Various implementations have been added to the benchmark, and benchmarking has been done for different sizes of varints and varlongs. The benchmarks for the various implementations have been added in ByteUtilsBenchmark.java
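
To make the loop-unrolling idea concrete, here is a small sketch that writes an unsigned 32-bit varint in two ways: a conventional loop and a fully unrolled version with at most five byte writes. This is an illustration of the technique only; `VarintSketch` and its method names are hypothetical, and the code is not claimed to match the implementation selected in the PR.

```java
import java.nio.ByteBuffer;

final class VarintSketch {
    // Baseline: emit 7 bits per byte, setting the high bit while more bits remain.
    static void writeLoop(int value, ByteBuffer buf) {
        while ((value & 0xFFFFFF80) != 0) {
            buf.put((byte) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        buf.put((byte) value);
    }

    // Unrolled: one explicit branch per output byte, no loop counter or repeated shifts of value.
    static void writeUnrolled(int value, ByteBuffer buf) {
        if ((value & (0xFFFFFFFF << 7)) == 0) {
            buf.put((byte) value);
        } else {
            buf.put((byte) (value & 0x7F | 0x80));
            if ((value & (0xFFFFFFFF << 14)) == 0) {
                buf.put((byte) (value >>> 7));
            } else {
                buf.put((byte) ((value >>> 7) & 0x7F | 0x80));
                if ((value & (0xFFFFFFFF << 21)) == 0) {
                    buf.put((byte) (value >>> 14));
                } else {
                    buf.put((byte) ((value >>> 14) & 0x7F | 0x80));
                    if ((value & (0xFFFFFFFF << 28)) == 0) {
                        buf.put((byte) (value >>> 21));
                    } else {
                        buf.put((byte) ((value >>> 21) & 0x7F | 0x80));
                        buf.put((byte) (value >>> 28));
                    }
                }
            }
        }
    }
}
```

Both methods treat the `int` as unsigned, so a negative input is written as five bytes.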

        Reviewers: Ismael Juma <mlists@juma.me.uk>, Luke Chen <showuon@gmail.com>, Alexandre Dupriez <alexandre.dupriez@gmail.com>

    commit e34f88403159cc8381da23dafdf7e3d7403114a2
    Author: Divij Vaidya <diviv@amazon.com>
    Date:   Fri May 5 13:55:17 2023 +0200

        KAFKA-14926: Remove metrics on Log Cleaner shutdown (#13623)

        When Log cleaning is shutdown, it doesn't remove metrics that were registered to `KafkaYammerMetrics.defaultRegistry()` which has one instance per server. Log cleaner's lifecycle is associated with lifecycle of `LogManager` and hence, there is no possibility where log cleaner will be shutdown but the broker won't. Broker shutdown will close the `jmxReporter` and hence, there is no current metric leak here. The motivation for this code change is to "do the right thing" from a code hygiene perspective.

        Reviewers: Manyanda Chitimbo <manyanda.chitimbo@gmail.com>, Kirk True <kirk@mustardgrain.com>, David Jacot <djacot@confluent.io>

    commit 0822ce0ed1a106a510930bc9ac53a266f54684d7
    Author: David Arthur <mumrah@gmail.com>
    Date:   Fri May 5 04:35:26 2023 -0400

        KAFKA-14840: Support for snapshots during ZK migration (#13461)

        This patch adds support for handling metadata snapshots while in dual-write mode. Prior to this change, if the active
        controller loaded a snapshot, it would get out of sync with the ZK state.

        In order to reconcile the snapshot state with ZK, several methods were added to scan through the metadata in ZK to
        compute differences with the MetadataImage. Since this introduced a lot of code, I opted to split out a lot of methods
        from ZkMigrationClient into their own client interfaces, such as TopicMigrationClient, ConfigMigrationClient, and
        AclMigrationClient. Each of these has some iterator method that lets the caller examine the ZK state in a single pass
        and without using too much memory.

        Reviewers: Colin P. McCabe <cmccabe@apache.org>, Luke Chen <showuon@gmail.com>

    commit 97c36f3f3142580325daa1a6aadb662893390561
    Author: Colin P. McCabe <cmccabe@apache.org>
    Date:   Thu May 4 12:20:33 2023 -0700

        HOTFIX: fix file deletions left out of MINOR: improve QuorumController logging #13540

    commit 63f9f23ec0aaa62f0da93ebc42934f5fce743ddb
    Author: Colin P. McCabe <cmccabe@apache.org>
    Date:   Thu May 4 11:18:03 2023 -0700

        MINOR: improve QuorumController logging #13540

        When creating the QuorumController, log whether ZK migration is enabled.

        When applying a feature level record which sets the metadata version, log the metadata version enum
        rather than the numeric feature level.

        Improve the logging when we replay snapshots in QuorumController. Log both the beginning and the
        end of replay.

        When TRACE is enabled, log every record that is replayed in QuorumController. Since some records
        may contain sensitive information, create RecordRedactor to assist in logging only what is safe to
        put in the log4j file.

        Add logging to ControllerPurgatory. Successful completions are logged at DEBUG; failures are logged
        at INFO, and additions are logged at TRACE.

        Remove SnapshotReason.java, SnapshotReasonTest.java, and
        QuorumController#generateSnapshotScheduled. They are deadcode now that snapshot generation moved to
        org.apache.kafka.image.publisher.SnapshotGenerator.

        Reviewers: David Arthur <mumrah@gmail.com>, José Armando García Sancio <jsancio@apache.org>

    commit ffd814d25fb97f2ee0b73000788c93ec1d5b9bff
    Author: Justine Olshan <jolshan@confluent.io>
    Date:   Thu May 4 09:55:45 2023 -0700

        KAFKA-14916: Fix code that assumes transactional ID implies all records are transactional (#13607)

        Also modifies verification to only add a partition to verify if it is transactional.

        When verifying we look at all the transactional producer IDs and throw INVALID_RECORD on the request if one is different.

        Reviewers: Kirk True <ktrue@confluent.io>, Artem Livshits <alivshits@confluent.io>, Jason Gustafson <jason@confluent.io>

    commit ea81e99e5980c807414651034a8c60426a158ca4
    Author: Philip Nee <pnee@confluent.io>
    Date:   Thu May 4 09:20:01 2023 -0700

        KAFKA-13668: Retry upon missing initProducerId due to authorization error (#12149)

        Producers used to throw a fatal error upon failing initProducerId, which can be caused by authorization errors. In this case, the user will need to instantiate a producer.

        This PR makes authorization errors non-fatal so that the user can retry until the permission is fixed by an admin.

        Here we first transition the producer to the ABORTABLE state, then to the UNINITIALIZED state (so that the producer is recoverable). Upon the subsequent send, the producer will transition to INITIALIZING and attempt to send another InitProducerIdRequest.

        Reviewers: Kirk True <ktrue@confluent.io>, David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>, Justine Olshan <jolshan@confluent.io>

    commit dc7819d7f1fe6b0160cd95246420ab10c335410b
    Author: Christo Lolov <lolovc@amazon.com>
    Date:   Thu May 4 11:00:33 2023 +0100

        KAFKA-14594: Move LogDirsCommand to tools module (#13122)

        Reviewers: Mickael Maison <mickael.maison@gmail.com>

    commit d46c3f259cce25c43f20fba3943d5cb34ed909ea
    Author: David Mao <47232755+splett2@users.noreply.github.com>
    Date:   Wed May 3 17:09:43 2023 -0700

        MINOR: Reduce number of threads created for integration test brokers (#13655)

        The integration tests seem to create an unnecessarily large number of threads. This reduces the number of threads created per integration test harness broker.

        Reviewers: Luke Chen <showuon@gmail.com>, Justine Olshan <jolshan@confluent.io>

    commit c08120f83f7318f15dcf14d525876d18caf6afd0
    Author: Jason Gustafson <jason@confluent.io>
    Date:   Wed May 3 15:25:32 2023 -0700

        MINOR: Allow tagged fields with version subset of flexible version range (#13551)

        The generated message types are missing a range check for the case when the tagged version range is a subset of
        the flexible version range. This causes the tagged field count, which is computed correctly, to conflict with the
        number of tags serialized.

        Reviewers: Colin P. McCabe <cmccabe@apache.org>

    commit b620c03ccf48d6d92b219cba35bb1e5e248d2547
    Author: Luke Chen <showuon@gmail.com>
    Date:   Thu May 4 01:08:25 2023 +0800

        KAFKA-14946: fix NPE when merging the deltatable (#13653)

        Fix NPE while merging the deltatable. Because it's possible that hashTier is
        not null but deltatable is null (ex: removing data), we should have null check
        while merging for deltatable like other places did. Also added tests that will
        fail without this change.

        Reviewers: Colin P. McCabe <cmccabe@apache.org>

    commit 4a0b6ebf60ed7614f042443460b490971e8662a4
    Author: Yash Mayya <yash.mayya@gmail.com>
    Date:   Tue May 2 23:16:46 2023 +0530

        KAFKA-14876: Document the new 'PUT /connectors/{name}/stop' REST API for Connect (#13657)

        Reviewers: Chris Egerton <chrise@aiven.io>

    commit 16fc8e1cfff6f0ac29209704a079b0ddcbd0625e
    Author: David Jacot <djacot@confluent.io>
    Date:   Tue May 2 18:04:50 2023 +0200

        KAFKA-14462; [10/N] Add TargetAssignmentBuilder (#13637)

        This patch adds TargetAssignmentBuilder. It is responsible for computing a target assignment for a given group.

        Reviewers: Ritika Reddy <rreddy@confluent.io>, Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>

    commit f44ee4fab7ef7adf715ecf2b96defa5cc8311949
    Author: Christo Lolov <lolovc@amazon.com>
    Date:   Tue May 2 16:39:31 2023 +0100

        MINOR: Remove unnecessary code in client/connect (#13259)

        Reviewers: Mickael Maison <mickael.maison@gmail.com>

    commit 33012b5ec34305a5133eb6e9e2fb6e8c3178f3b3
    Author: Federico Valeri <fedevaleri@gmail.com>
    Date:   Tue May 2 14:28:42 2023 +0200

        KAFKA-14752: Kafka examples improvements - consumer changes (#13514)

        KAFKA-14752: Kafka examples improvements - consumer changes

        This is extracted from the original PR for better review.
        https://github.com/apache/kafka/pull/13492

        Signed-off-by: Federico Valeri <fedevaleri@gmail.com>

        Reviewers: Christo Lolov <christololov@gmail.com>, Luke Chen <showuon@gmail.com>

    commit 141c76a2c904705f2cd484e96767fcb217c5db25
    Author: Bruno Cadonna <cadonna@apache.org>
    Date:   Tue May 2 14:00:34 2023 +0200

        KAFKA-14133: Migrate topology builder mock in TaskManagerTest to mockito (#13529)

        1. Migrates topology builder mock in TaskManagerTest to mockito.

        2. Replaces the unit test to verify if subscribed partitions are added
        to topology metadata.

        3. Modifies signatures of methods for adding subscribed partitions to
        topology metadata to use sets instead of lists. This makes the
        intent of the methods clearer and makes the tests more portable.

        Reviewers: Christo Lolov <lolovc@amazon.com>, Matthias J. Sax <mjsax@apache.org>

    commit 21af1918eafa30812a955c3c0295b9e968841cd3
    Author: Luke Chen <showuon@gmail.com>
    Date:   Tue May 2 09:54:12 2023 +0800

        MINOR: Add reason to exceptions in QuorumController (#13648)

        Saw this error message in log:

        ERROR [QuorumController id=1] writeNoOpRecord: unable to start processing because of RejectedExecutionException. Reason: null (org.apache.kafka.controller.QuorumController)

        The null reason is not helpful with only RejectedExecutionException. Adding the reason to it.

        Reviewers: David Arthur <mumrah@gmail.com>, Divij Vaidya <diviv@amazon.com>, Manyanda Chitimbo <manyanda.chitimbo@gmail.com>

    commit 4773961a44d1f0d1e11d662c7e0fc955027bced2
    Author: Yash Mayya <yash.mayya@gmail.com>
    Date:   Mon May 1 21:51:09 2023 +0530

        MINOR: Fix Javadoc for configureAdminResources  in Connect's RestServer (#13635)

        Reviewers: Manyanda Chitimbo <manyanda.chitimbo@gmail.com>, Chris Egerton <chrise@aiven.io>

    commit e29942347acc70aa85d47e84e2021f9c24cd7c80
    Author: Proven Provenzano <93720617+pprovenzano@users.noreply.github.com>
    Date:   Mon May 1 09:56:04 2023 -0400

        KAFKA-14859: SCRAM ZK to KRaft migration with dual write (#13628)

        Handle migrating SCRAM records in ZK when migrating from ZK to KRaft.

        This includes handling writing back SCRAM records to ZK while in dual write mode where metadata updates are written to both the KRaft metadata log and to ZK. This allows for rollback of migration to include SCRAM metadata changes.

        Reviewers: David Arthur <mumrah@gmail.com>

    commit 64ebbc577de757830b2f26ee3c8c7b1ddf10f86c
    Author: Philip Nee <pnee@confluent.io>
    Date:   Fri Apr 28 08:46:40 2023 -0700

        MINOR: Fixing typos in the ConsumerCoordinator (#13618)

        Reviewers: Divij Vaidya <diviv@amazon.com>, Christo Lolov <lolovc@amazon.com>, David Jacot <djacot@confluent.io>

    commit e55fbceb6667cc1455f7fb2e96421c85741fa7df
    Author: Anton Agestam <anton.agestam@aiven.io>
    Date:   Fri Apr 28 11:54:28 2023 +0100

        MINOR: Fix incorrect description of SessionLifetimeMs (#13649)

        Reviewers: Mickael Maison <mickael.maison@gmail.com>

    commit c6ad151ac3bac0d8d1d6985d230eacaa170b8984
    Author: Philip Nee <pnee@confluent.io>
    Date:   Fri Apr 28 02:08:32 2023 -0700

        KAFKA-14639: A single partition may be revoked and assign during a single round of rebalance (#13550)

        This is a really long story, but the incident started in KAFKA-13419 when we observed a member sending out a topic partition owned from the previous generation when a member missed a rebalance cycle due to REBALANCE_IN_PROGRESS.

        This patch changes the AbstractStickyAssignor.AllSubscriptionsEqual method.  In short, it should no longer check and validate only the highest generation.  Instead, we consider 3 cases:
        1. A member will continue to hold on to its partition if there are no other owners.
        2. If there is more than one owner of the same partition, the one with the highest generation wins.
        3. If two members of the same generation hold on to the same partition, we log an error and remove both from the assignment (same as the current logic).
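
        The three cases above can be sketched roughly as follows. This is a hypothetical illustration, not the AbstractStickyAssignor code: the class OwnerResolutionSketch, the Claim type, and resolveOwners are names assumed for the example, and partitions are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the ownership rules; all names here are assumptions.
final class OwnerResolutionSketch {

    /** A member's claim on a partition, tagged with the generation it was made in. */
    static final class Claim {
        final String memberId;
        final int generation;

        Claim(String memberId, int generation) {
            this.memberId = memberId;
            this.generation = generation;
        }
    }

    /**
     * Returns the winning member per partition. Partitions whose highest
     * generation is claimed by more than one member are left unowned.
     */
    static Map<String, String> resolveOwners(Map<String, List<Claim>> claimsByPartition) {
        Map<String, String> owners = new HashMap<>();
        claimsByPartition.forEach((partition, claims) -> {
            int highest = Integer.MIN_VALUE;
            List<String> winners = new ArrayList<>();
            for (Claim claim : claims) {
                if (claim.generation > highest) {
                    // A newer generation replaces every earlier claimant (case 2).
                    highest = claim.generation;
                    winners.clear();
                    winners.add(claim.memberId);
                } else if (claim.generation == highest) {
                    winners.add(claim.memberId);
                }
            }
            if (winners.size() == 1) {
                // Sole owner, or the single highest generation (cases 1 and 2).
                owners.put(partition, winners.get(0));
            } else if (winners.size() > 1) {
                // Same generation on both sides: log and drop both (case 3).
                System.err.println("Conflicting owners for " + partition + ": " + winners);
            }
        });
        return owners;
    }
}
```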

        Here are some important notes that lead to the patch:
        - If a member is kicked out of the group, an `UNKNOWN_MEMBER_ID` error will be thrown.
        - It seems to be a common situation that members are late to joinGroup and therefore get a `REBALANCE_IN_PROGRESS` error.  This is why we don't want to reset the generation: doing so might cause lots of revocations and can be disruptive.

        To summarize the current behavior of different errors:
        `REBALANCE_IN_PROGRESS`
        - heartbeat: requestRejoin if member state is stable
        - joinGroup: rejoin immediately
        - syncGroup: rejoin immediately
        - commit: requestRejoin and fail the commit. Raise this exception if the generation is stale, i.e. another rebalance is already in progress.

        `UNKNOWN_MEMBER_ID`
        - heartbeat: resetStateAndRejoin if generation hasn't changed, otherwise ignore
        - joinGroup: resetStateAndRejoin if generation unchanged, otherwise rejoin immediately
        - syncGroup:  resetStateAndRejoin if generation unchanged, otherwise rejoin immediately

        `ILLEGAL_GENERATION`
        - heartbeat: resetStateAndRejoin if generation hasn't changed, otherwise ignore
        - syncGroup: raised if the generation has been reset or the member hasn't completed rebalancing; resetStateAndRejoin if generation unchanged, otherwise rejoin immediately

        Reviewers: David Jacot <djacot@confluent.io>

    commit 10b3e667132934084a2d275a204a1a782c2df94e
    Author: Federico Valeri <fedevaleri@gmail.com>
    Date:   Fri Apr 28 10:32:11 2023 +0200

        KAFKA-14584: Deprecate StateChangeLogMerger tool (#13171)

        Reviewers: Mickael Maison <mickael.maison@gmail.com>

    commit d796480fe87fd819fc0ac560ca318759180d4644
    Author: Luke Chen <showuon@gmail.com>
    Date:   Fri Apr 28 14:35:12 2023 +0800

        KAFKA-14909: check zkMigrationReady tag before migration (#13631)

        1. add ZkMigrationReady in apiVersionsResponse
        2. check that ZkMigrationReady is set on all nodes before moving to the next migration state

        Reviewers: David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>

    commit c708f7ba5f4f449920cec57a5b69e84e92128b54
    Author: Colin Patrick McCabe <cmccabe@apache.org>
    Date:   Thu Apr 27 19:15:26 2023 -0700

        MINOR: remove spurious call to fatalFaultHandler (#13651)

        Remove a spurious call to fatalFaultHandler accidentally introduced by KAFKA-14805.  We should only
        invoke the fatal fault handler if we are unable to generate the activation records. If we are
        unable to write the activation records, a controller failover should be sufficient to remedy the
        situation.

        Co-authored-by: Luke Chen showuon@gmail.com

        Reviewers: Luke Chen <showuon@gmail.com>, David Arthur <mumrah@gmail.com>

    commit 056657d84d84e116ffc9460872945b4d2b479ff3
    Author: David Arthur <mumrah@gmail.com>
    Date:   Thu Apr 27 17:04:56 2023 -0400

        MINOR add license to reviewers.py

    commit a08f31ecfefca1a51d64137a344e62a236740e62
    Author: David Arthur <mumrah@gmail.com>
    Date:   Thu Apr 27 14:32:19 2023 -0400

        MINOR: Adding reviewers.py (#11096)

        This script can be used to help build the "Reviewers: " string we include in commit messages.

        Reviewers: Ismael Juma <ismael@juma.me.uk>, Luke Chen <showuon@gmail.com>, José Armando García Sancio <jsancio@apache.org>

    commit 70493336171363cfa95237a0fe14ef57090553e4
    Author: Colin P. McCabe <cmccabe@apache.org>
    Date:   Wed Apr 26 16:10:46 2023 -0700

        KAFKA-14943: Fix ClientQuotaControlManager validation

        Don't allow setting negative or zero values for quotas. Don't allow SCRAM mechanism names to be
        used as client quota names. SCRAM mechanisms are not client quotas. (The confusion arose because of
        internal ZK representation details that treated them both as "client configs.")

        Add unit tests for ClientQuotaControlManager.isValidIpEntity and
        ClientQuotaControlManager.configKeysForEntityType.

        This change doesn't affect metadata record application, only input validation. If there are bad
        client quotas that are set currently, this change will not alter the current behavior (of throwing
        an exception and ignoring the bad quota).

    commit 8bde4e79cdeea7761b54e24516a7d2cc9f52e051
    Author: David Jacot <djacot@confluent.io>
    Date:   Thu Apr 27 14:05:41 2023 +0200

        KAFKA-14462; [9/N] Add RecordHelpers (#13544)

        This patch adds RecordHelpers.

        Reviewers: Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>

    commit dd6690a7a0a565a681c52dbfe0c7c89875bdf8c9
    Author: LinShunKang <linshunkang.chn@gmail.com>
    Date:   Thu Apr 27 10:44:08 2023 +0800

        KAFKA-14944: Reduce CompletedFetch#parseRecord() memory copy (#12545)

        This implements KIP-863: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=225152035
        Use ByteBuffer directly instead of byte[] to deserialize.

        Reviewers: Luke Chen <showuon@gmail.com>, Kirk True <kirk@kirktrue.pro>

    commit c1b5c75d9271638776392822a094e9e7ef37f490
    Author: David Arthur <mumrah@gmail.com>
    Date:   Wed Apr 26 10:20:30 2023 -0400

        KAFKA-14805 KRaft controller supports pre-migration mode (#13407)

        This patch adds the concept of pre-migration mode to the KRaft controller. While in this mode,
        the controller will only allow certain write operations. The purpose of this is to disallow metadata
        changes when the controller is waiting for the ZK migration records to be committed.

        The following ControllerWriteEvent operations are permitted in pre-migration mode

        * completeActivation
        * maybeFenceReplicas
        * writeNoOpRecord
        * processBrokerHeartbeat
        * registerBroker (only for migrating ZK brokers)
        * unregisterBroker

        Raft events and other controller events do not follow the same code path as ControllerWriteEvent,
        so they are not affected by this new behavior.

        This patch also adds a new metric as defined in KIP-868: kafka.controller:type=KafkaController,name=ZkMigrationState

        In order to support upgrades from 3.4.0, this patch also redefines the enum value of value 1 to mean
        MIGRATION rather than PRE_MIGRATION.

        Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Colin P. McCabe <cmccabe@apache.org>

    commit 007c0d375a6e70aefb65f58f9f096016c4cbea16
    Author: vamossagar12 <sagarmeansocean@gmail.com>
    Date:   Wed Apr 26 17:53:56 2023 +0530

        KAFKA-14929: Fixing flaky test putTopicStateRetriableFailure (#13634)

        Co-authored-by: Sagar Rao <sagarrao@Sagars-MacBook-Pro.local>

        Reviewers: Daniel Urban <durban@cloudera.com>, Justine Olshan <jolshan@confluent.io>, Manyanda Chitimbo <manyanda.chitimbo@gmail.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>

    commit baf127a6633161cb52747467880b006d2f54d3bd
    Author: Greg Harris <greg.harris@aiven.io>
    Date:   Wed Apr 26 00:30:13 2023 -0700

        KAFKA-14666: Add MM2 in-memory offset translation index for offsets behind replication (#13429)

        Reviewers: Daniel Urban <durban@cloudera.com>, Chris Egerton <chrise@aiven.io>

    commit ced1f62c1b1cc0d547dc31fbce538885c29ed1ef
    Author: Victoria Xia <victoria.xia@confluent.io>
    Date:   Tue Apr 25 22:39:23 2023 -0400

        KAFKA-14834: [13/N] Docs updates for versioned store semantics (#13622)

        Reviewers: Matthias J. Sax <matthias@confluent.io>

    commit a7d0b3f753708a93aea92e614833f6f6e7443234
    Author: Said Boudjelda <bmscomp@gmail.com>
    Date:   Tue Apr 25 23:31:04 2023 +0200

        MINOR: Upgrade gradle to 8.1.1 (#13625)

        Also upgrade gradle plugins:
         - `org.owasp.dependencycheck` gradle plugin to version `8.2.1`
         - `com.github.johnrengelman.shadow` gradle plugin to version `8.1.1`

        Gradle release notes:
        * https://docs.gradle.org/8.1.1/release-notes.html

        Reviewers: Ismael Juma <ismael@juma.me.uk>

    commit 9a36da12b7359b7158332c541655716312efb5b3
    Author: David Jacot <djacot@confluent.io>
    Date:   Tue Apr 25 18:50:51 2023 +0200

        KAFKA-14462; [8/N] Add ConsumerGroupMember (#13538)

        This patch adds ConsumerGroupMember.

        Reviewers: Christo Lolov <lolovc@amazon.com>, Jeff Kim <jeff.kim@confluent.io>, Jason Gustafson <jason@confluent.io>

    commit 4780dc773f2cd5a20fa5be38d20137745690a888
    Author: Yash Mayya <yash.mayya@gmail.com>
    Date:   Tue Apr 25 21:20:35 2023 +0530

        KAFKA-14933: Document Connect's log level REST APIs from KIP-495 (#13636)

        Reviewers: Mickael Maison <mickael.maison@gmail.com>, Manyanda Chitimbo <manyanda.chitimbo@gmail.com>

    commit ea540fa40042c5e2d808cc4dfc71c71f7466fbe4
    Author: Gantigmaa Selenge <39860586+tinaselenge@users.noreply.github.com>
    Date:   Tue Apr 25 13:28:37 2023 +0100

        KAFKA-14592: Move FeatureCommand to tools (#13459)

        KAFKA-14592: Move FeatureCommand to tools

        Reviewers: Luke Chen <showuon@gmail.com>

    commit d83a734c41c43a03aa571e602cf8535f0893fd79
    Author: Manyanda Chitimbo <manyanda.chitimbo@gmail.com>
    Date:   Tue Apr 25 12:02:24 2023 +0200

        MINOR: only set sslEngine#setUseClientMode to false once when ssl mode is server (#13626)

        The sslEngine.setUseClientMode(false) call was duplicated when the ssl mode is server during SSLEngine creation
        in DefaultSslEngineFactory.java. The patch removes the duplicated call.

        Reviewers:   maulin-vasavada <maulin.vasavada@gmail.com>, Divij Vaidya <diviv@amazon.com>, Manikumar Reddy <manikumar.reddy@gmail.com>

    commit 2557a4b842b07ac796193bd9a3ef6b724dc995cf
    Author: Matthias J. Sax <matthias@confluent.io>
    Date:   Mon Apr 24 15:29:57 2023 -0700

        KAFKA-12446: update change encoding to use varint (#13533)

        KIP-904 had the goal in mind to save space when encoding the size on a byte array. However, using UINT32 does not achieve this goal. This PR changes the encoding to VARINT instead.
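
        To see why a fixed-width UINT32 does not save space while a variable-length integer does, here is a minimal sketch of a generic unsigned base-128 varint. The class and method names are assumptions made for illustration, and the exact variant used by this PR (for example, whether it applies zigzag encoding) is not shown here.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: a generic unsigned base-128 varint, not the PR's code.
final class VarintSizeSketch {

    /** Writes value using 7 payload bits per byte; the high bit marks "more bytes follow". */
    static void writeUnsignedVarint(int value, ByteBuffer buffer) {
        while ((value & 0xFFFFFF80) != 0) {
            buffer.put((byte) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        buffer.put((byte) value);
    }

    /** Reads a value written by writeUnsignedVarint. */
    static int readUnsignedVarint(ByteBuffer buffer) {
        int value = 0;
        int shift = 0;
        byte b;
        while (((b = buffer.get()) & 0x80) != 0) {
            value |= (b & 0x7F) << shift;
            shift += 7;
            if (shift > 28) {
                throw new IllegalArgumentException("varint is longer than 5 bytes");
            }
        }
        return value | (b << shift);
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(8);
        writeUnsignedVarint(300, buffer);
        // A fixed-width 32-bit size always costs Integer.BYTES bytes.
        System.out.println("varint bytes: " + buffer.position() + ", fixed bytes: " + Integer.BYTES);
    }
}
```

        Small sizes, which are the common case for short byte arrays, take one or two bytes in this scheme, while values close to the 32-bit limit take five.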

        Reviewers: Victoria Xia <victoria.xia@confluent.io>,  Farooq Qaiser <fqaiser94@gmail.com>, Walker Carlson <wcarlson@confluent.io>

    commit ab8f2850973b1e9fd548d5b7b8eae458fdd26402
    Author: Victoria Xia <victoria.xia@confluent.io>
    Date:   Mon Apr 24 17:06:26 2023 -0400

        KAFKA-14834: [12/N] Minor code cleanups relating to versioned stores (#13615)

        Reviewers: Matthias J. Sax <matthias@confluent.io>

    commit 6dcdb017327587a6943fa868595fa3488c7f7ef7
    Author: Matthias J. Sax <matthias@confluent.io>
    Date:   Mon Apr 24 12:40:25 2023 -0700

        KAFKA-14862: Outer stream-stream join does not output all results with multiple input partitions (#13592)

        Stream-stream outer join uses a "shared time tracker" to track stream-time progress for left and right input in a single place. This time tracker is incorrectly shared across tasks.

        This PR introduces a supplier to create a "shared time tracker" object per task, to be shared between the left and right join processors.

        Reviewers: Victoria Xia <victoria.xia@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Walker Carlson <wcarlson@confluent.io>

    commit 2271e748a11919d07698ebce759dca2e3075596a
    Author: Chia-Ping Tsai <chia7712@gmail.com>
    Date:   Mon Apr 24 17:21:19 2023 +0800

        MINOR: fix zookeeper_migration_test.py (#13620)

        Reviewers: Mickael Maison <mimaison@users.noreply.github.com>

    commit c3241236296f9aaee103e475f242e32f04d4c256
    Author: Yash Mayya <yash.mayya@gmail.com>
    Date:   Mon Apr 24 14:06:20 2023 +0530

        KAFKA-14876: Document the new 'GET /connectors/{name}/offsets' REST API for Connect (#13587)

        Reviewers: Mickael Maison <mickael.maison@gmail.com>

    commit 7061475445cb6314e7cf4f9848384224b14f4395
    Author: Greg Harris <greg.harris@aiven.io>
    Date:   Fri Apr 21 12:55:41 2023 -0700

        KAFKA-14905: Reduce flakiness in MM2 ForwardingAdmin test due to admin timeouts (#13575)

        Reduce flakiness of `MirrorConnectorsWithCustomForwardingAdminIntegrationTest`

        Reviewers: Josep Prat <jlprat@apache.org>

    commit ecdef88f744410039b88e90a5979078d5735aa06
    Author: Matthias J. Sax <matthias@confluent.io>
    Date:   Fri Apr 21 12:48:05 2023 -0700

        MINOR: updated KS release notes for 3.5 (#13577)

        Reviewers: Walker Carlson <wcarlson@confluent.io>

    commit dd63d88ac3ea7a9a55a6dacf9c5473e939322a55
    Author: Manyanda Chitimbo <manyanda.chitimbo@gmail.com>
    Date:   Fri Apr 21 15:02:06 2023 +0200

        MINOR: fix noticed typo in raft and metadata projects (#13612)

        Reviewers: Josep Prat <jlprat@apache.org>

    commit c39bf714bbaf2a632be4c2a7e446553fe40ba129
    Author: David Jacot <djacot@confluent.io>
    Date:   Fri Apr 21 11:22:16 2023 +0200

        KAFKA-14462; [7/N] Add ClientAssignor, Assignment, TopicMetadata and VersionedMetadata (#13537)

        This patch adds ClientAssignor, Assignment, TopicMetadata and VersionedMetadata classes.

        Reviewers: Christo Lolov <lolovc@amazon.com>, Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>

    commit 2d0b816150c79057c813387bd126523b6326a1fc
    Author: David Jacot <djacot@confluent.io>
    Date:   Fri Apr 21 11:19:04 2023 +0200

        MINOR: Move `ControllerPurgatory` to `server-common` (#13555)

        This patch renames `ControllerPurgatory` to `DeferredEventQueue` and moves it from the `metadata` module to the `server-common` module.

        Reviewers: Alexandre Dupriez <alexandre.dupriez@gmail.com>, Ziming Deng <dengziming1993@gmail.com>, José Armando García Sancio <jsancio@apache.org>

    commit df137752542c005c6998c37c03222ffbeca0f349
    Author: Purshotam Chauhan <pchauhan@confluent.io>
    Date:   Fri Apr 21 14:08:23 2023 +0530

        KAFKA-14828: Remove R/W locks using persistent data structures (#13437)

        Currently, StandardAuthorizer uses an R/W lock for maintaining the consistency of data. For clusters with very high traffic, we will typically see an increase in latencies whenever a write operation comes. The intent of this PR is to get rid of the R/W lock with the help of immutable or persistent collections. Basically, new object references are used to hold the intermediate state of the write operation. After the completion of the operation, the main reference to the cache is changed to point to the new object. Also, for the read operation, the code is changed such that all accesses to the cache for a single read operation are done to a particular cache object only.
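
        The reference-swapping idea can be sketched as below. This is a simplified illustration under stated assumptions, not the StandardAuthorizer code: SnapshotCacheSketch and its methods are hypothetical, ACLs are plain strings, and the snapshot is rebuilt with a full copy on each write, which is exactly the cost that a persistent collection with structural sharing would avoid.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of lock-free reads via snapshot swapping; names are assumptions.
final class SnapshotCacheSketch {

    // Readers always observe a fully built, unmodifiable snapshot.
    private volatile Set<String> acls = Collections.emptySet();

    /** Writers are serialized; each builds a new snapshot and then publishes it. */
    synchronized void addAcl(String acl) {
        Set<String> next = new HashSet<>(acls);
        next.add(acl);
        // Swapping the reference is the only step readers can observe.
        acls = Collections.unmodifiableSet(next);
    }

    /** Both lookups use the same snapshot, so one read never mixes two states. */
    boolean hasBoth(String first, String second) {
        Set<String> snapshot = acls;
        return snapshot.contains(first) && snapshot.contains(second);
    }
}
```

        Reading the volatile field once per operation is what keeps a single read consistent even if a writer publishes a new snapshot halfway through.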

        In the PR description, you can find the performance of various libraries at the time of both read and write. Read performance is checked with the existing AuthorizerBenchmark. For write performance, a new AuthorizerUpdateBenchmark has been added which evaluates the performance of the addAcl operation.

        Reviewers:  Ron Dagostino <rndgstn@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>,  Divij Vaidya <diviv@amazon.com>

    commit 2ee770ac7e576c85a911ccb307339be3d58a8942
    Author: Colin P. McCabe <cmccabe@apache.org>
    Date:   Thu Apr 20 10:21:26 2023 -0700

        Revert "KAFKA-14908: Set setReuseAddress on the kafka server socket (#13572)"

        This reverts commit d04c3e56c29fc6cb876a1074b1108db2c0f37afc.

    commit ef09a2e3fc11a738f6681fd57fb84ad109593fd3
    Author: Justine Olshan <jolshan@confluent.io>
    Date:   Thu Apr 20 09:30:11 2023 -0700

        KAFKA-14904: Pending state blocked verification of transactions (#13579)

        KAFKA-14561 added verification to transactional produce requests to confirm an ongoing transaction.

        There is an edge case where the transaction is added, but the coordinator is writing to the log for another partition. In this case, when verifying, we return CONCURRENT_TRANSACTIONS and retry. However, the next inflight batch is often successful because the write completes.

        When a partition has no entry in the PSM, it will allow any sequence number. This means if we retry the first write to the partition (or first write in a while) we will never be able to write it and get OutOfOrderSequence exceptions. This is a known issue. Since the verification makes this more common, I propose allowing verification on pending ongoing state since the pending state doesn't prevent us from checking the already added partitions.

        The good news is part 2 of KIP-890 will allow us to enforce that the first write for a transaction is sequence 0 and this issue will go away entirely.

        This PR also adds the locking back into the addPartitions/verify path that was incorrectly removed.

        Reviewers: Ismael Juma <ismael@juma.me.uk>, Artem Livshits <alivshits@confluent.io>, Jason Gustafson <jason@confluent.io>

    commit 7f175feacaba4187bbb3631dff3a1330060f191e
    Author: vamossagar12 <sagarmeansocean@gmail.com>
    Date:   Thu Apr 20 13:50:29 2023 +0530

        KAFKA-14586: Adding redirection for StreamsResetter (#13614)

        Reviewers: Mickael Maison <mickael.maison@gmail.com>, Federico Valeri <fedevaleri@gmail.com>

    commit e14dd8024adae746c50c8b7d9cd268e859669576
    Author: Dimitar Dimitrov <30328539+dimitarndimitrov@users.noreply.github.com>
    Date:   Thu Apr 20 05:29:27 2023 +0200

        KAFKA-14821 Implement the listOffsets API with AdminApiDriver (#13432)

        We currently handle the complex ListOffsets workflow by chaining together MetadataCall instances and ListOffsetsCall instances, which makes the logic complex and error-prone. In this PR we rewrote it with the `AdminApiDriver` infra. Notable improvements over the old logic:
        1. Retry the lookup stage on receiving `NOT_LEADER_OR_FOLLOWER` and `LEADER_NOT_AVAILABLE`, whereas in the past we failed the partition directly without retry.
        2. Remove the class field `supportsMaxTimestamp` and calculate it on the fly to avoid mutable state; this won't change any behavior of the client.
        3. Retry the fulfillment stage on `RetriableException`, whereas in the past we retried the fulfillment stage only on `InvalidMetadataException`; this means we will retry on `TimeoutException` and other `RetriableException`s.

        We also add `handleUnsupportedVersionException` to `AdminApiHandler` and `AdminApiLookupStrategy`. They are used to keep consistency with the old logic, and we can continue to improve them.

        Reviewers: Ziming Deng <dengziming1993@gmail.com>, David Jacot <djacot@confluent.io>

    commit f5de4daa71f0ad31aa64443c5ff43c59712feea5
    Author: Ron Dagostino <rndgstn@gmail.com>
    Date:   Wed Apr 19 19:45:26 2023 -0400

        KAFKA-14887: FinalizedFeatureChangeListener should not shut down when ZK session expires

        FinalizedFeatureChangeListener shuts the broker down when it encounters an issue trying to process feature change
        events. However, it does not distinguish between issues related to feature changes actually failing and other
        exceptions like ZooKeeper session expiration. This introduces the possibility that Zookeeper session expiration
        could cause the broker to shut down, which is not intended. This patch updates the code to distinguish between
        these two types of exceptions. In the case of something like a ZK session expiration it logs a warning and continues.
        We shut down the broker only for FeatureCacheUpdateException.

        Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Christo Lolov <christololov@gmail.com>, Colin P. McCabe <cmccabe@apache.org>

    commit 11c8bf4826197533807b2132cfc6599ba70de1c1
    Author: Victoria Xia <victoria.xia@confluent.io>
    Date:   Wed Apr 19 19:34:36 2023 -0400

        KAFKA-14834: [11/N] Update table joins to identify out-of-order records with `isLatest` (#13609)

        This PR fixes a bug in the table-table join handling of out-of-order records in versioned tables that occurs when the latest value for a particular key is a tombstone. The fix uses the isLatest value from the Change object instead of calling get(key) on the state store to fetch timestamps to compare against. As part of this fix, this PR also updates table-table joins to determine whether upstream tables are versioned by using the GraphNode mechanism, instead of checking the table's value getter.

        Part of KIP-914.

        Reviewer: Matthias J. Sax <matthias@confluent.io>

    commit 809966a9a06664e0b521c2298fa0de834e443607
    Author: Matthew de Detrich <matthew.dedetrich@aiven.io>
    Date:   Wed Apr 19 20:54:07 2023 +0200

        KAFKA-13299: Accept duplicate listener on port for IPv4/IPv6 (#11478)

        Loosens the validation so that Kafka can accept duplicate listeners on the same port if and only if the listeners are valid IP addresses, with one address being an IPv4 address and the other being an IPv6 address.
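
        A rough sketch of such a check is shown below. It is an illustration only, not the validation code in this PR: DualStackListenerSketch and mayShareAPort are hypothetical names, and the sketch accepts only IP literals so that no DNS lookup is attempted for names without a colon.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.regex.Pattern;

// Hypothetical sketch of the "one IPv4 plus one IPv6 listener per port" rule.
final class DualStackListenerSketch {

    private static final String OCTET = "(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";
    private static final Pattern IPV4 =
        Pattern.compile("^" + OCTET + "(\\." + OCTET + "){3}$");

    static boolean isIpv4Literal(String host) {
        return IPV4.matcher(host).matches();
    }

    static boolean isIpv6Literal(String host) {
        // Only colon-containing hosts are passed on, so plain host names are never resolved.
        if (!host.contains(":")) {
            return false;
        }
        try {
            return InetAddress.getByName(host) instanceof Inet6Address;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    /** True only when exactly one host is an IPv4 literal and the other an IPv6 literal. */
    static boolean mayShareAPort(String hostA, String hostB) {
        return (isIpv4Literal(hostA) && isIpv6Literal(hostB))
            || (isIpv6Literal(hostA) && isIpv4Literal(hostB));
    }
}
```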

        Reviewers: Josep Prat <jlprat@apache.org>, Luke Chen <showuon@apache.org>

    commit 750cfd86bf36ac7c71c5670e50eb8668f97b4246
    Author: David Arthur <mumrah@gmail.com>
    Date:   Wed Apr 19 14:19:13 2023 -0400

        KAFKA-14918 Only send controller RPCs to migrating ZK brokers (#13606)

        This patch fixes an issue where the KRaft controller could incorrectly send ZK controller RPCs to KRaft brokers.

        Reviewers: Colin P. McCabe <cmccabe@apache.org>

    commit b10716e72370c4e128bddb17bcc107ccab221e47
    Author: hudeqi <1217150961@qq.com>
    Date:   Thu Apr 20 00:49:08 2023 +0800

        KAFKA-14868: Remove all ReplicaManager metrics when it is closed (#13471)

        Reviewers: Mickael Maison <mickael.maison@gmail.com>, Divij Vaidya <diviv@amazon.com>

    commit d04c3e56c29fc6cb876a1074b1108db2c0f37afc
    Author: Keith Wall <kwall@redhat.com>
    Date:   Wed Apr 19 03:58:29 2023 +0100

        KAFKA-14908: Set setReuseAddress on the kafka server socket (#13572)

        Changes SocketServer to set the setReuseAddress(true) socket option.

        This aids use cases where Kafka is started and stopped on the same port in rapid succession, for example when a Kafka cluster is embedded in an integration test suite that starts and stops a cluster before and after each test.

        Reviewers: Luke Chen <showuon@gmail.com>, Tom Bentley <tbentley@redhat.com>, Divij Vaidya <diviv@amazon.com>

    commit f905a5a45d87c94d369cde9d1326e6d18b95cf7e
    Author: vamossagar12 <sagarmeansocean@gmail.com>
    Date:   Wed Apr 19 00:46:03 2023 +0530

        MINOR: Fixing gradle build during compileScala and compileTestScala (#13588)

        Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

    commit 3388adf1b52f545e2d69edd30cdd53241b2887f3
    Author: Matthias J. Sax <matthias@confluent.io>
    Date:   Tue Apr 18 11:32:27 2023 -0700

        MINOR: rename internal FK-join processor classes (#13589)

        Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>

    commit abca86511ecc9e081d676976ff6d3b845308f444
    Author: Proven Provenzano <93720617+pprovenzano@users.noreply.github.com>
    Date:   Tue Apr 18 12:41:38 2023 -0400

        KAFKA-14881: Rework UserScramCredentialRecord (#13513)

        Rework UserScramCredentialRecord to store serverKey and StoredKey rather than saltedPassword. This
        is necessary to support migration from ZK, since those are the fields we stored in ZK.  Update
        latest MetadataVersion to IBP_3_5_IV2 and make SCRAM support conditional on this version.  Moved
        ScramCredentialData.java from org.apache.kafka.image to org.apache.kafka.metadata, which seems more
        appropriate.

        Reviewers: Colin P. McCabe <cmccabe@apache.org>

    commit 61530d68ce83467de6190a52da37b3c0af84f0ef
    Author: Jeff Kim <kimkb2011@gmail.com>
    Date:   Tue Apr 18 09:37:04 2023 -0400

        KAFKA-14869: Bump coordinator value records to flexible versions (KIP-915, Part-2) (#13526)

        This patch implemented the second part of KIP-915. It bumps the versions of the value records used by the group coordinator and the transaction coordinator to make them flexible versions. The new versions are not used when writing to the partitions but only when reading from the partitions. This allows downgrades from future versions that will include tagged fields.

        Reviewers: David Jacot <djacot@confluent.io>

    commit b36a170aa3b2738177d7229859db765e0115385b
    Author: Manyanda Chitimbo <manyanda.chitimbo@gmail.com>
    Date:   Tue Apr 18 13:36:56 2023 +0200

        MINOR: fix typos in MigrationClient, StandardAuthorizer, StandardAuthorizerData and KafkaConfigSchema files (#13593)

        Reviewers: Luke Chen <showuon@gmail.com>

    commit 5e9d4de748dff7b91043a9b799716ab4becdae7e
    Author: Jeff Kim <kimkb2011@gmail.com>
    Date:   Tue Apr 18 04:41:54 2023 -0400

        KAFKA-14869: Ignore unknown record types for coordinators (KIP-915, Part-1) (#13511)

        This patch implemented the first part of KIP-915. It updates the group coordinator and the transaction coordinator to ignore unknown record types while loading their respective state from the partitions. This allows downgrades from future versions that will include new record types.

        Reviewers: Alexandre Dupriez <alexandre.dupriez@gmail.com>, David Jacot <djacot@confluent.io>

    commit 454b72161a76b1687a1263157d7cc30a1bdb2506
    Author: Dániel Urbán <48119872+urbandan@users.noreply.github.com>
    Date:   Tue Apr 18 09:40:14 2023 +0200

        KAFKA-14902: KafkaStatusBackingStore retries on a dedicated background thread to avoid stack overflows (#13557)

        KafkaStatusBackingStore uses an infinite retry logic on producer send, which can lead to a stack overflow.
        To avoid the problem, a background thread was added, and the callback submits the retry onto the background thread.
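
        The pattern can be sketched as follows, assuming a sender whose callback may run synchronously on the caller's stack. BackgroundRetrySketch and its Sender interface are hypothetical stand-ins for illustration, not KafkaStatusBackingStore or the producer API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical sketch: retries are handed to a dedicated thread instead of recursing.
final class BackgroundRetrySketch {

    /** Assumed sender; the callback receives null on success or the error on failure. */
    interface Sender {
        void send(String record, Consumer<Exception> callback);
    }

    private final Sender sender;
    private final ExecutorService retryExecutor = Executors.newSingleThreadExecutor();

    BackgroundRetrySketch(Sender sender) {
        this.sender = sender;
    }

    /** Sends the record, retrying indefinitely; the future completes on first success. */
    CompletableFuture<Void> sendWithRetry(String record) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        attempt(record, done);
        return done;
    }

    private void attempt(String record, CompletableFuture<Void> done) {
        sender.send(record, error -> {
            if (error == null) {
                done.complete(null);
            } else {
                // Calling attempt() here directly would nest one frame per failure.
                // Submitting it lets the current stack unwind before the retry runs.
                retryExecutor.submit(() -> attempt(record, done));
            }
        });
    }

    void close() {
        retryExecutor.shutdown();
    }
}
```

        With a synchronous callback, a direct recursive retry grows the stack by a few frames per failed attempt, whereas each submitted retry starts from a fresh, shallow stack on the executor thread.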

    commit e27926f92b1f6b34ed6731f33c712a5d0d594275
    Author: Ron Dagostino <rndgstn@gmail.com>
    Date:   Mon Apr 17 17:52:28 2023 -0400

        KAFKA-14735: Improve KRaft metadata image change performance at high … (#13280)

        topic counts.

        Introduces the use of persistent data structures in the KRaft metadata image to avoid copying the entire TopicsImage upon every change.  Performance that was O(<number of topics in the cluster>) is now O(<number of topics changing>), which has dramatic time and GC improvements for the most common topic-related metadata events.  We abstract away the chosen underlying persistent collection library via ImmutableMap<> and ImmutableSet<> interfaces and static factory methods.

        Reviewers: Luke Chen <showuon@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>, Purshotam Chauhan <pchauhan@confluent.io>

    commit 7159f6c1a81ead277030f55f293943270346ad4e
    Author: Alyssa Huang <ahuang@confluent.io>
    Date:   Mon Apr 17 11:03:45 2023 -0700

        MINOR: KRaftMetadataCache.getPartitionInfo must set all relevant fields

        Fix a case where KRaftMetadataCache.getPartitionInfo was not setting all the PartitionInfo fields it
        should have been. Add a regression test.

        Reviewers: Colin P. McCabe <cmccabe@apache.org>

commit cc88822986e4004e772f376eae47dbe08e29d137
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Wed May 10 16:29:27 2023 -0700

    small changes

commit feb12ec2ac6e3218ee6b1e56f912da81c8d2773e
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Tue Apr 25 14:58:20 2023 -0700

    grammar and code comments changes

commit 5550e4934ade28f069e4ae9ab97e76982ff3401f
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Wed Apr 19 17:39:48 2023 -0700

    fixed formatting stuff

commit 28ab29819bf327309ce07196a3fb0492f7fb2281
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Wed Apr 19 11:38:57 2023 -0700

    consumers -> members

commit e56ded267456bbe1c22ded0d037953545b921d1d
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Mon Apr 17 11:26:23 2023 -0700

    Interface changes incorporated

commit 2d50c278c18baa11b0235330149dd8f9def57c3d
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Thu Apr 13 14:47:33 2023 -0700

    Renamed topicIdToPartition class

commit d33bbfac053ccac77204ac4253698ed356d7c621
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Tue Apr 11 17:31:26 2023 -0700

    Added stickiness test

commit 0cd618affc5af5aa2f66d1cc81c2efa9cf515184
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Tue Apr 11 13:28:24 2023 -0700

    Separated uniform and general builder in different files, added more tests

commit ef814867aba68fc4813359c5cef6ae155ed49a60
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Fri Apr 7 11:24:27 2023 -0700

    import changes

commit dffad677c6e4a33625401d621f046af6f21feb99
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Thu Apr 6 10:16:33 2023 -0700

    Fixed checkstyle and added test file

commit 7fd779418df3504a490c0d10f3848360a4417077
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Wed Apr 5 15:13:38 2023 -0700

    First draft optimised uniform assignor code

commit 309cc6141f7962e794c90d4107f460fafd51697d
Merge: 3d9264547c 2c1cf03a89
Author: Ritika Reddy <98577846+rreddy-22@users.noreply.github.com>
Date:   Mon Apr 17 11:03:07 2023 -0700

    Merge branch 'trunk' into rreddy-22/KAFKA-14514

commit 3d9264547cf9ba9afe3c27ce333bd7fe332216cc
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Mon Apr 17 10:58:16 2023 -0700

    Interface changes incorporated, added stickiness test and non-existent topic test.

commit 0737c8d2d2dce48c79a889468319c8e3bc9c9436
Merge: a1bdb58e08 57b94ca208
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Mon Apr 17 09:49:23 2023 -0700

    Merge remote-tracking branch 'origin/rreddy-22/KAFKA-14514' into rreddy-22/KAFKA-14514

    # Conflicts:
    #	group-coordinator/src/main/java/org/apache/kafka/coordinator/group/assignor/AssignmentMemberSpec.java
    #	group-coordinator/src/main/java/org/apache/kafka/coordinator/group/assignor/GroupAssignment.java
    #	group-coordinator/src/main/java/org/apache/kafka/coordinator/group/assignor/MemberAssignment.java

commit a1bdb58e082d98b73f50860db5b35684f4eba51b
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Wed Apr 12 20:01:59 2023 -0700

    Removed putList, putSet, changed generic pair to remainingAssignmentsForMember, addressed PR comments, changed Map.Entry to Map.forEach etc.

commit 288fa1f5f853a0aa5d140bc930d2a8a52eeeff71
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Wed Apr 12 11:28:40 2023 -0700

    Addressed some PR comments

commit 918d262ae52892c13766649db847ede6928921f8
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Thu Apr 6 11:12:16 2023 -0700

    Made subscribed topics a collection, removed * import exception

commit 7369db3744b4c89f625f6b73db331eaf0512c4eb
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Thu Apr 6 10:55:06 2023 -0700

    minor

commit 7d1626990abc6bfdf29b5c227751d10d8211ed5f
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Mon Apr 3 11:54:59 2023 -0700

    Addressed PR comments

commit a925b02dd3d78c8b9c1415c083f0544edcb404fe
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Wed Mar 29 12:30:13 2023 -0700

    Addressed PR comments

commit 9f3a423d6a2b482c6ac6d74c4e65bbed70898dfa
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Tue Mar 28 12:21:33 2023 -0700

    removed reduce partitions case

commit dcb8198355e82f36f12d042ed0d0a4c1d71ea8c6
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Mon Mar 27 20:17:14 2023 -0700

    removed reduce partitions case

commit 48a982232c6f1ff371cb44333fd12d6b37ef381a
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Thu Mar 23 13:02:31 2023 -0700

    Server Side Sticky Range Assignor full implementation

commit 68d53b37b5e3fd78b8f96d7917b1fdc565e188dd
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Wed Apr 12 13:00:07 2023 -0700

    Removed topicIdToPartition class, changed attribute names and added java doc for getter methods

commit 73c7fdcaf824a31af442b1a4aa2db81b37cc5a9e
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Tue Apr 11 15:09:44 2023 -0700

    Made all attributes private and added getter methods, changed names according to PR comments

commit 100a04e5e064be0bda392ca8f02cb7a7e89f16c8
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Fri Apr 7 11:25:33 2023 -0700

    Added toString method and hash

commit 5065109332845541c5591a52683f21d37982d80a
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Thu Apr 6 13:43:26 2023 -0700

    Interface changes for assignment map and subscribed topics, added new TopicIdToPartition data structure

commit 57b94ca208cb374b22d61eb044ef8004ea09479f
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Thu Apr 6 11:12:16 2023 -0700

    Made subscribed topics a collection, removed * import exception

commit 7352cbb5683f676382f56e6ae5c926f3cae98824
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Thu Apr 6 10:55:06 2023 -0700

    minor

commit b4041a7461c53e76d96dfa915ea7c3cd5cd9a1e0
Merge: 4b321aa537 637bc92ba1
Author: Ritika Reddy <98577846+rreddy-22@users.noreply.github.com>
Date:   Thu Apr 6 10:37:58 2023 -0700

    Merge branch 'apache:trunk' into rreddy-22/KAFKA-14514

commit 4b321aa5374427a15346be9b87b7c545ccb7db61
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Mon Apr 3 11:54:59 2023 -0700

    Addressed PR comments

commit 3b6e65ea3b8452dd923991c3dba49d370c8f0d4e
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Wed Mar 29 12:30:13 2023 -0700

    Addressed PR comments

commit 74105291cc687fe41b75de692fc624ab1335341a
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Tue Mar 28 12:21:33 2023 -0700

    removed reduce partitions case

commit 3df34605e078beb9babcbeadc7b119a70da63796
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Mon Mar 27 20:17:14 2023 -0700

    removed reduce partitions case

commit e8d36070e0b4f799bd10b6388fa1ef85e8d51328
Author: Ritika Reddy <rreddy@confluent.io>
Date:   Thu Mar 23 13:02:31 2023 -0700

    Server Side Sticky Range Assignor full implementation
@dajac dajac added the KIP-848 label Jul 26, 2023
@dajac dajac changed the title KAFKA-14702:Extend server side assignor to support rack aware replica placement KAFKA-14702: Extend server side assignor to support rack aware replica placement Jul 26, 2023
dajac commented Jul 26, 2023

@rreddy-22 The build does not look good. Could you check it? Could you also please update the description of the PR to better explain what the PR is doing?

@dajac dajac left a comment

@rreddy-22 Thanks for the PR. I left some comments.

@dajac dajac left a comment

LGTM, thanks for the patch.

dajac commented Jul 28, 2023

The build for JDK 17 is stuck... The three others look good. I have seen a few builds get stuck this week in other PRs, so this is clearly not related to the changes made in this one. Moreover, we had successful builds previously and we addressed nits in the meantime. Therefore, I will merge this PR.

@dajac dajac merged commit 3709901 into apache:trunk Jul 28, 2023
1 check was pending
rreddy-22 added a commit to rreddy-22/kafka-rreddy that referenced this pull request Aug 8, 2023
commit 938fee2
Author: David Arthur <mumrah@gmail.com>
Date:   Mon Jul 31 09:21:22 2023 -0400

    Fix a Scala 2.12 compile issue (apache#14126)

    Reviewers: Luke Chen <showuon@gmail.com>, Qichao Chu

commit 3ba718e
Author: Yash Mayya <yash.mayya@gmail.com>
Date:   Fri Jul 28 19:35:42 2023 +0100

    MINOR: Remove duplicate instantiation of MockConnectMetrics in AbstractWorkerSourceTaskTest (apache#14091)

    Reviewers: Christo Lolov <christololov@gmail.com>, Manyanda Chitimbo <manyanda.chitimbo@gmail.com>, Greg Harris <greg.harris@aiven.io>

commit 1574b9f
Author: David Jacot <djacot@confluent.io>
Date:   Fri Jul 28 20:28:54 2023 +0200

    MINOR: Code cleanups in group-coordinator module (apache#14117)

    This patch does a few code cleanups in the group-coordinator module.

    It renames Coordinator to CoordinatorShard;
    It renames ReplicatedGroupCoordinator to GroupCoordinatorShard. I was never really happy with this name. The new name makes more sense to me;
    It removes TopicPartition from the GroupMetadataManager. It was only used in log messages. The log context already includes it so we don't have to log it again.
    It renames assignors to consumerGroupAssignors.

    Reviewers: Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>

commit 3709901
Author: Ritika Reddy <98577846+rreddy-22@users.noreply.github.com>
Date:   Fri Jul 28 10:30:04 2023 -0700

    KAFKA-14702: Extend server side assignor to support rack aware replica placement (apache#14099)

    This patch updates the `PartitionAssignor` interface to support rack-awareness. The change introduces the `SubscribedTopicDescriber` interface that can be used to retrieve topic metadata such as the number of partitions or the racks from within an assignor. We use an interface because it allows us to wrap internal data structures instead of having to copy them. It is more efficient.

    Reviewers: David Jacot <djacot@confluent.io>
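
The design choice described in this commit message (an interface that wraps internal data structures instead of copying them) can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the patch: `TopicDescriberSketch`, `MapBackedDescriber`, the method names, the `-1` return for an unknown topic, and the use of `java.util.UUID` for topic IDs are all assumptions made for the example.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical read-only view that an assignor could query for topic metadata.
interface TopicDescriberSketch {
    // Returns the number of partitions, or -1 if the topic is unknown (assumed convention).
    int numPartitions(UUID topicId);

    // Returns the racks hosting the partition, or an empty set if unknown.
    Set<String> racksForPartition(UUID topicId, int partition);
}

// Wraps an existing map by reference; nothing is copied.
final class MapBackedDescriber implements TopicDescriberSketch {
    // topic id -> (partition id -> racks)
    private final Map<UUID, Map<Integer, Set<String>>> partitionRacks;

    MapBackedDescriber(Map<UUID, Map<Integer, Set<String>>> partitionRacks) {
        this.partitionRacks = partitionRacks;
    }

    @Override
    public int numPartitions(UUID topicId) {
        Map<Integer, Set<String>> partitions = partitionRacks.get(topicId);
        return partitions == null ? -1 : partitions.size();
    }

    @Override
    public Set<String> racksForPartition(UUID topicId, int partition) {
        Map<Integer, Set<String>> partitions = partitionRacks.get(topicId);
        if (partitions == null) {
            return Collections.emptySet();
        }
        return partitions.getOrDefault(partition, Collections.emptySet());
    }
}
```

Because the describer holds a reference to the underlying map, later changes to that map are visible through the interface without rebuilding anything, which is the efficiency argument made above.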

commit 32c39c8
Author: David Arthur <mumrah@gmail.com>
Date:   Fri Jul 28 13:02:47 2023 -0400

    KAFKA-15263 Check KRaftMigrationDriver state in each event (apache#14115)

    Reviewers: Colin P. McCabe <cmccabe@apache.org>

commit 811ae01
Author: Philip Nee <pnee@confluent.io>
Date:   Fri Jul 28 09:11:20 2023 -0700

    MINOR: Test assign() and assignment() in the integration test (apache#14086)

    A missing piece from KAFKA-14950. This is to test assign() and assignment() in the integration test.

    Also fixed an accidental mistake in the committed API.

    Reviewers: Jun Rao <junrao@gmail.com>

commit 19f9e1e
Author: Jeff Kim <kimkb2011@gmail.com>
Date:   Fri Jul 28 09:13:27 2023 -0400

    KAFKA-14501: Implement Heartbeat protocol in new GroupCoordinator (apache#14056)

    This patch implements the existing Heartbeat API in the new Group Coordinator.

    Reviewers: David Jacot <djacot@confluent.io>

commit dcabc29
Author: David Jacot <djacot@confluent.io>
Date:   Fri Jul 28 14:49:48 2023 +0200

    KAFKA-14048; CoordinatorContext should be protected by a lock (apache#14090)

    Accessing the `CoordinatorContext` in the `CoordinatorRuntime` should be protected by a lock. The runtime guarantees that the context is never accessed concurrently; however, it is accessed by multiple threads. The lock is here to ensure that we have a proper memory barrier. The patch does the following:
    1) Adds a lock to `CoordinatorContext`;
    2) Adds helper methods to get the context and acquire/release the lock;
    3) Allows the transition from Failed to Loading. Previously, the context was recreated in this case.

    Reviewers: Justine Olshan <jolshan@confluent.io>
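
The locking pattern described in this commit message might look roughly like the sketch below. It is only an illustration under stated assumptions: `ContextSketch`, `withLock`, and `transitionTo` are hypothetical names, and the transition table is invented for the example except for the Failed to Loading transition, which the commit message mentions.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Hypothetical stand-in for a per-partition coordinator context.
final class ContextSketch {
    enum State { INITIAL, LOADING, ACTIVE, FAILED, CLOSED }

    private final ReentrantLock lock = new ReentrantLock();
    private State state = State.INITIAL;

    // Runs the action while holding the lock. Acquiring and releasing the lock
    // gives the memory barrier between the different threads touching the context.
    <T> T withLock(Function<ContextSketch, T> action) {
        lock.lock();
        try {
            return action.apply(this);
        } finally {
            lock.unlock();
        }
    }

    // Must be called while holding the lock.
    void transitionTo(State next) {
        if (!lock.isHeldByCurrentThread()) {
            throw new IllegalStateException("lock must be held");
        }
        boolean allowed;
        switch (next) {
            // FAILED -> LOADING is allowed, so the context need not be recreated.
            case LOADING: allowed = state == State.INITIAL || state == State.FAILED; break;
            case ACTIVE:  allowed = state == State.LOADING; break;
            case FAILED:  allowed = state == State.LOADING; break;
            case CLOSED:  allowed = true; break;
            default:      allowed = false;
        }
        if (!allowed) {
            throw new IllegalStateException(state + " -> " + next);
        }
        state = next;
    }

    // Intended to be read from inside withLock.
    State state() {
        return state;
    }
}
```

A caller would wrap every access, for example `ctx.withLock(c -> { c.transitionTo(ContextSketch.State.LOADING); return null; })`.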

commit afe631c
Author: James Shaw <js102@zepler.net>
Date:   Fri Jul 28 10:45:15 2023 +0100

    KAFKA-14967: fix NPE in CreateTopicsResult in MockAdminClient (apache#13671)

    Co-authored-by: James Shaw <james.shaw@masabi.com>
    Reviewers: Mickael Maison <mickael.maison@gmail.com>

commit 722b259
Author: Christo Lolov <lolovc@amazon.com>
Date:   Fri Jul 28 06:40:37 2023 +0100

    KAFKA-14038: Optimise calculation of size for log in remote tier (apache#14049)

    Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Divij Vaidya <diviv@amazon.com>, Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>

commit 10bcd4f
Author: Colin Patrick McCabe <cmccabe@apache.org>
Date:   Thu Jul 27 17:01:55 2023 -0700

    KAFKA-15213: provide the exact offset to QuorumController.replay (apache#13643)

    Provide the exact record offset to QuorumController.replay() in all cases. There are several situations
    where this is useful, such as logging, implementing metadata transactions, or handling broker
    registration records.

    In the case where the QC is inactive, and simply replaying records, it is easy to compute the exact
    record offset from the batch base offset and the record index.

    The active QC case is more difficult. Technically, when we submit records to the Raft layer, it can
    choose a batch base offset later than the one we expect, if someone else is also adding records.
    While the QC is the only entity submitting data records, control records may be added at any time.
    In the current implementation, these are really only used for leadership elections. However, this
    could change with the addition of quorum reconfiguration or similar features.

    Therefore, this PR allows the QC to tell the Raft layer that a record append should fail if it
    would have resulted in a batch base offset other than what was expected. This in turn will trigger a
    controller failover. In the future, if automatically added control records become more common, we
    may wish to have a more sophisticated system than this simple optimistic concurrency mechanism. But
    for now, this will allow us to rely on the offset as correct.

    In order that the active QC can learn what offset to start writing at, the PR also adds a new
    RaftClient#endOffset function.

    At the Raft level, this PR adds a new exception, UnexpectedBaseOffsetException. This gets thrown
    when we request a base offset that doesn't match the one the Raft layer would have given us.
    Although this exception should cause a failover, it should not be considered a fault. This
    complicated the exception handling a bit and motivated splitting more of it out into the new
    EventHandlerExceptionInfo class. This will also let us unit test things like slf4j log messages a
    bit better.

    Reviewers: David Arthur <mumrah@gmail.com>, José Armando García Sancio <jsancio@apache.org>
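
The two ideas in this commit message (computing a record's exact offset from the batch base offset and record index, and failing an append whose base offset is not the expected one) can be sketched as follows. This is a toy in-memory model, not the Raft layer: `OffsetSketch`, its methods, and the nested `UnexpectedBaseOffset` exception are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory log used only to illustrate the idea.
final class OffsetSketch {
    // Thrown when the log's next offset differs from what the caller expected.
    static final class UnexpectedBaseOffset extends RuntimeException {
        UnexpectedBaseOffset(long expected, long actual) {
            super("expected base offset " + expected + " but log end offset is " + actual);
        }
    }

    private final List<String> records = new ArrayList<>();

    // Offset at which the next appended record would be written.
    long endOffset() {
        return records.size();
    }

    // Optimistic concurrency: append only if the batch would start at expectedBaseOffset.
    long append(long expectedBaseOffset, List<String> batch) {
        long actual = endOffset();
        if (actual != expectedBaseOffset) {
            throw new UnexpectedBaseOffset(expectedBaseOffset, actual);
        }
        records.addAll(batch);
        return actual;
    }

    // Exact offset of the record at position indexInBatch within a batch.
    static long recordOffset(long batchBaseOffset, int indexInBatch) {
        return batchBaseOffset + indexInBatch;
    }
}
```

In this model, a writer reads `endOffset()` once, assumes it stays valid, and treats a mismatch on append as a signal to step down rather than as a fault.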

commit e5861ee
Author: Alyssa Huang <ahuang@confluent.io>
Date:   Thu Jul 27 13:12:25 2023 -0700

    [MINOR] Add latest versions to kraft upgrade kafkatest (apache#14084)

    Reviewers: Ron Dagostino <rndgstn@gmail.com>

commit 6f39ef0
Author: Justine Olshan <jolshan@confluent.io>
Date:   Thu Jul 27 09:36:32 2023 -0700

    MINOR: Adjust Invalid Record Exception for Invalid Txn State as mentioned in KIP-890 (apache#14088)

    Invalid record is a newer error. INVALID_TXN_STATE has been around as long as transactions and is not retriable. This is the desired behavior.

commit 29825ee
Author: David Jacot <djacot@confluent.io>
Date:   Thu Jul 27 13:18:10 2023 +0200

    KAFKA-14499: [3/N] Implement OffsetCommit API (apache#14067)

    This patch introduces the `OffsetMetadataManager` and implements the `OffsetCommit` API for both the old rebalance protocol and the new rebalance protocol. It introduces version 9 of the API but keeps it as unstable for now. The patch adds unit tests to test the API. Integration tests will be done separately.

    Reviewers: Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>

commit 353141e
Author: Divij Vaidya <diviv@amazon.com>
Date:   Thu Jul 27 12:33:34 2023 +0200

    KAFKA-15251: Add 3.5.1 to system tests (apache#14069)

    Reviewers: Matthias J. Sax <matthias@confluent.io>

commit d2fc907
Author: Jeff Kim <kimkb2011@gmail.com>
Date:   Thu Jul 27 02:02:29 2023 -0400

    KAFKA-14500; [6/6] Implement SyncGroup protocol in new GroupCoordinator (apache#14017)

    This patch implements the SyncGroup API in the new group coordinator. All the new unit tests are based on the existing scala tests.

    Reviewers: David Jacot <djacot@confluent.io>

commit ed44bcd
Author: Hao Li <1127478+lihaosky@users.noreply.github.com>
Date:   Wed Jul 26 16:02:52 2023 -0700

    KAFKA-15022: [3/N] use graph to compute rack aware assignment for active stateful tasks (apache#14030)

    Part of KIP-925.

    Reviewers: Matthias J. Sax <matthias@confluent.io>

commit 8135b6d
Author: Said Boudjelda <bmscomp@gmail.com>
Date:   Wed Jul 26 19:52:02 2023 +0200

    KAFKA-15235: Fix broken coverage reports since migration to Gradle 8.x (apache#14075)

    Reviewers: Divij Vaidya <diviv@amazon.com>

commit e5fb9b6
Author: Said Boudjelda <bmscomp@gmail.com>
Date:   Wed Jul 26 19:12:27 2023 +0200

    MINOR: upgrade version of gradle plugin (ben-manes.versions) to 0.47.0 (apache#14098)

    Reviewers: Divij Vaidya <diviv@amazon.com>

commit a900794
Author: David Arthur <mumrah@gmail.com>
Date:   Wed Jul 26 12:54:59 2023 -0400

    KAFKA-15196 Additional ZK migration metrics (apache#14028)

    This patch adds several metrics defined in KIP-866:

    * MigratingZkBrokerCount: the number of zk brokers registered with KRaft
    * ZkWriteDeltaTimeMs: time spent writing MetadataDelta to ZK
    * ZkWriteSnapshotTimeMs: time spent writing MetadataImage to ZK
    * Adds value 4 for "ZK" to ZkMigrationState

    Also fixes a typo in the metric name introduced in apache#14009 (ZKWriteBehindLag -> ZkWriteBehindLag)

    Reviewers: Luke Chen <showuon@gmail.com>, Colin P. McCabe <cmccabe@apache.org>

commit 6d81698
Author: sciclon2 <74413315+sciclon2@users.noreply.github.com>
Date:   Wed Jul 26 15:48:09 2023 +0200

    KAFKA-15243: Set decoded user names to DescribeUserScramCredentialsResponse (apache#14094)

    Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>

commit ff390ab
Author: vamossagar12 <sagarmeansocean@gmail.com>
Date:   Wed Jul 26 17:56:20 2023 +0530

    [MINOR] Fix Javadoc comment in KafkaFuture#toCompletionStage (apache#14100)

    Fix Javadoc comment in KafkaFuture#toCompletionStage

    Reviewers: Luke Chen <showuon@gmail.com>

commit bb677c4
Author: Federico Valeri <fedevaleri@gmail.com>
Date:   Wed Jul 26 12:04:34 2023 +0200

    KAFKA-14583: Move ReplicaVerificationTool to tools (apache#14059)

    Reviewers: Mickael Maison <mickael.maison@gmail.com>

commit 4d30cbf
Author: Said Boudjelda <bmscomp@gmail.com>
Date:   Wed Jul 26 11:21:36 2023 +0200

    MINOR: Upgrade the minor version of snappy dependency to 1.1.10.3 (apache#14072)

    Reviewers: Divij Vaidya <diviv@amazon.com>

commit 206a4af
Author: Divij Vaidya <diviv@amazon.com>
Date:   Wed Jul 26 11:19:56 2023 +0200

    MINOR: Add co-authors to release email template (apache#14080)

    Reviewers: Mickael Maison <mickael.maison@gmail.com>

commit 46a8a28
Author: vamossagar12 <sagarmeansocean@gmail.com>
Date:   Wed Jul 26 07:21:23 2023 +0530

    KAFKA-15218: Avoid NPE thrown while deleting topic and fetch from follower concurrently (apache#14051)

    When deleting topics, we first clear the remoteReplicaMap in stopPartitions. At that point, a fetch request may still arrive from a follower, and the broker then checks whether the replica is eligible to be added to the ISR. Because the replica is no longer in the map, an NPE is thrown. This is harmless since the topic is already deleted, but it is better to avoid the exception.

    Reviewers: Luke Chen <showuon@gmail.com>
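
The kind of guard this commit message describes can be sketched as below. This is not the broker code: `ReplicaLookupSketch` and its members are hypothetical, and the sketch only shows the general pattern of treating a replica that was concurrently removed as "not eligible" instead of dereferencing it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a partition's follower replica map.
final class ReplicaLookupSketch {
    static final class Replica {
        final boolean caughtUp;

        Replica(boolean caughtUp) {
            this.caughtUp = caughtUp;
        }
    }

    private final Map<Integer, Replica> remoteReplicas = new ConcurrentHashMap<>();

    void addReplica(int brokerId, boolean caughtUp) {
        remoteReplicas.put(brokerId, new Replica(caughtUp));
    }

    // Simulates the topic deletion path clearing the map.
    void stopPartition() {
        remoteReplicas.clear();
    }

    // A missing replica is reported as not eligible rather than causing an NPE.
    boolean isEligibleForIsr(int brokerId) {
        Replica replica = remoteReplicas.get(brokerId);
        return replica != null && replica.caughtUp;
    }
}
```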

commit af1f50f
Author: Matthias J. Sax <matthias@confluent.io>
Date:   Tue Jul 25 14:56:58 2023 -0700

    MINOR: fix docs markup (apache#14085)

    Reviewers: Qichao Chu (@ex172000), Mickael Maison <mickael.maison@gmail.com>

commit e794bc7
Author: David Arthur <mumrah@gmail.com>
Date:   Tue Jul 25 16:05:04 2023 -0400

    MINOR: Add a Builder for KRaftMigrationDriver (apache#14062)

    Reviewers: Justine Olshan <jolshan@confluent.io>

commit 8b027b6
Author: tison <wander4096@gmail.com>
Date:   Tue Jul 25 23:56:49 2023 +0800

    MINOR: Fix typo in ProduceRequest.json (apache#14070)

    Reviewers: Mickael Maison <mickael.maison@gmail.com>

commit 08b3820
Author: Yash Mayya <yash.mayya@gmail.com>
Date:   Tue Jul 25 14:03:29 2023 +0100

    KAFKA-15238: Move DLQ reporter setup from the DistributedHerder's tick thread to the sink task thread (apache#14079)

    Reviewers: Chris Egerton <chrise@aiven.io>

commit 58b8c5c
Author: Chris Egerton <chrise@aiven.io>
Date:   Tue Jul 25 05:12:46 2023 -0700

    MINOR: Downgrade log level for conflicting Connect plugin aliases (apache#14081)

    Reviewers: Greg Harris <greg.harris@aiven.io>

commit c7de30f
Author: Colin Patrick McCabe <cmccabe@apache.org>
Date:   Mon Jul 24 21:13:58 2023 -0700

    KAFKA-15183: Add more controller, loader, snapshot emitter metrics (apache#14010)

    Implement some of the metrics from KIP-938: Add more metrics for
    measuring KRaft performance.

    Add these metrics to QuorumControllerMetrics:
        kafka.controller:type=KafkaController,name=TimedOutBrokerHeartbeatCount
        kafka.controller:type=KafkaController,name=EventQueueOperationsStartedCount
        kafka.controller:type=KafkaController,name=EventQueueOperationsTimedOutCount
        kafka.controller:type=KafkaController,name=NewActiveControllersCount

    Create LoaderMetrics with these new metrics:
        kafka.server:type=MetadataLoader,name=CurrentMetadataVersion
        kafka.server:type=MetadataLoader,name=HandleLoadSnapshotCount

    Create SnapshotEmitterMetrics with these new metrics:
        kafka.server:type=SnapshotEmitter,name=LatestSnapshotGeneratedBytes
        kafka.server:type=SnapshotEmitter,name=LatestSnapshotGeneratedAgeMs

    Reviewers: Ron Dagostino <rndgstn@gmail.com>

commit 79b8c96
Author: David Mao <47232755+splett2@users.noreply.github.com>
Date:   Mon Jul 24 13:22:25 2023 -0700

    KAFKA-14990: Dynamic producer ID expiration should be applied on a broker restart (apache#13707)

    Dynamic overrides for the producer ID expiration config are not picked up on broker restart in Zookeeper mode. Based on the integration test, this does not apply to KRaft mode.

    Adds a broker restart that fails without the corresponding KafkaConfig change.

    Reviewers: Justine Olshan <jolshan@confluent.io>

commit 38781f9
Author: Justine Olshan <jolshan@confluent.io>
Date:   Mon Jul 24 13:08:57 2023 -0700

    KAFKA-14920: Address timeouts and out of order sequences (apache#14033)

    When creating a verification state entry, we also store sequence and epoch. On subsequent requests, we will take the latest epoch seen and the earliest sequence seen. That way, if we try to append a sequence after the earliest seen sequence, we can block that and retry. This addresses potential OutOfOrderSequence loops caused by errors during verification (coordinator loading, timeouts, etc).

    Reviewers: David Jacot <david.jacot@gmail.com>, Artem Livshits <alivshits@confluent.io>
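
The bookkeeping described in this commit message (keep the latest epoch seen and the earliest sequence seen, and block an append that comes after that earliest sequence) might be sketched as follows. The class and method names are hypothetical, sequence wraparound is ignored, and the exact update rules are an assumption made for illustration.

```java
// Hypothetical per-producer verification state; names are illustrative.
final class VerificationStateSketch {
    private short epoch = -1;         // latest producer epoch seen, -1 if none yet
    private int lowestSequence = -1;  // earliest sequence seen for that epoch

    // Records an (epoch, sequence) pair seen on a request that is being verified.
    void observe(short incomingEpoch, int incomingSequence) {
        if (epoch == -1 || incomingEpoch > epoch) {
            // A newer epoch restarts sequence tracking.
            epoch = incomingEpoch;
            lowestSequence = incomingSequence;
        } else if (incomingEpoch == epoch && incomingSequence < lowestSequence) {
            lowestSequence = incomingSequence;
        }
    }

    // True if this append would come after an earlier sequence that was seen
    // for the same epoch, in which case the caller should block and retry.
    boolean shouldBlock(short incomingEpoch, int incomingSequence) {
        return incomingEpoch == epoch && incomingSequence > lowestSequence;
    }
}
```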
rreddy-22 added a commit to rreddy-22/kafka-rreddy that referenced this pull request Aug 8, 2023
commit e072706
Author: José Armando García Sancio <jsancio@users.noreply.github.com>
Date:   Tue Aug 8 14:31:42 2023 -0700

    KAFKA-15312; Force channel before atomic file move (apache#14162)

    On ext4 file systems we have seen snapshots with zero-length files. This is possible if
    the file is closed and moved before forcing the channel to write to disk.

    Reviewers: Ron Dagostino <rndgstn@gmail.com>, Alok Thatikunta <athatikunta@confluent.io>
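
The ordering this commit message describes (force the channel, then close and move the file) can be illustrated with standard `java.nio` calls. The class and method names below are hypothetical, and the sketch assumes the temporary file and the destination are on the same file system so that an atomic move is possible.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

final class DurableMoveSketch {
    // Writes data to tmp, forces it to disk, then atomically renames tmp to dest.
    static void writeThenMove(Path tmp, Path dest, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(tmp,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(data);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            // Flush file content and metadata before the file is closed and renamed.
            channel.force(true);
        }
        Files.move(tmp, dest, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Without the `force` call, the rename could become durable before the file's contents do, which matches the zero-length files reported above.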

commit a1cb4b4
Author: Lucia Cerchie <luciacerchie@gmail.com>
Date:   Tue Aug 8 12:03:42 2023 -0700

    add changes made before merge (apache#14137)

    Change in response to KIP-941.

    New PR due to merge issue.

    Changes line 57 in the RangeQuery class file from:

    public static <K, V> RangeQuery<K, V> withRange(final K lower, final K upper) {
        return new RangeQuery<>(Optional.of(lower), Optional.of(upper));
    }
    to

    public static <K, V> RangeQuery<K, V> withRange(final K lower, final K upper) {
        return new RangeQuery<>(Optional.ofNullable(lower), Optional.ofNullable(upper));
    }

    Testing strategy:

    Since null bounds can now be passed to a RangeQuery to request a full scan, I changed the logic defining query starting at line 1085 in IQv2StoreIntegrationTest.java from:

            final RangeQuery<Integer, V> query;
            if (lower.isPresent() && upper.isPresent()) {
                query = RangeQuery.withRange(lower.get(), upper.get());
            } else if (lower.isPresent()) {
                query = RangeQuery.withLowerBound(lower.get());
            } else if (upper.isPresent()) {
                query = RangeQuery.withUpperBound(upper.get());
            } else {
                query = RangeQuery.withNoBounds();
            }
    to

    query = RangeQuery.withRange(lower.orElse(null), upper.orElse(null));
    because handling the different combinations of isPresent() on the bounds is no longer necessary.

    Reviewers: John Roesler <vvcephei@apache.org>, Bill Bejeck <bbejeck@apache.org>
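
The difference between `Optional.of` and `Optional.ofNullable` that motivates this change can be shown with a small stand-in class. `RangeSketch` is hypothetical and is not the Kafka Streams `RangeQuery`; it only mirrors the shape of `withRange` quoted above.

```java
import java.util.Optional;

// Hypothetical stand-in for a range query with optional bounds.
final class RangeSketch<K> {
    private final Optional<K> lower;
    private final Optional<K> upper;

    private RangeSketch(Optional<K> lower, Optional<K> upper) {
        this.lower = lower;
        this.upper = upper;
    }

    // A null bound means "unbounded on that side". Optional.of would throw a
    // NullPointerException here, while Optional.ofNullable yields an empty Optional.
    static <K> RangeSketch<K> withRange(K lower, K upper) {
        return new RangeSketch<>(Optional.ofNullable(lower), Optional.ofNullable(upper));
    }

    Optional<K> lowerBound() {
        return lower;
    }

    Optional<K> upperBound() {
        return upper;
    }

    // True when neither bound is set, i.e. the query covers all keys.
    boolean isFullScan() {
        return !lower.isPresent() && !upper.isPresent();
    }
}
```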

commit ff4fed5
Author: Greg Harris <greg.harris@aiven.io>
Date:   Tue Aug 8 10:06:35 2023 -0700

    KAFKA-15031: Add plugin.discovery to Connect worker configuration (KIP-898) (apache#14055)

    Reviewers: Chris Egerton <chrise@aiven.io>

commit 60a5117
Author: Hao Li <1127478+lihaosky@users.noreply.github.com>
Date:   Tue Aug 8 08:01:05 2023 -0700

    KAFKA-15022: [7/N] use RackAwareTaskAssignor in HAAssignor (apache#14139)

    Part of KIP-925.

    - Change TaskAssignor interface to take RackAwareTaskAssignor
    - Integrate RackAwareTaskAssignor to StreamsPartitionAssignor and HighAvailabilityTaskAssignor
    - Update HAAssignor tests

    Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Matthias J. Sax <matthias@confluent.io>

commit 1c04ae8
Author: Matthias J. Sax <matthias@confluent.io>
Date:   Tue Aug 8 07:51:59 2023 -0700

    MINOR: Improve JavaDocs of KafkaStreams `context.commit()` (apache#14163)

    Reviewers: Bill Bejeck <bill@confluent.io>

commit 8dec3e6
Author: Hao Li <1127478+lihaosky@users.noreply.github.com>
Date:   Mon Aug 7 11:21:55 2023 -0700

    KAFKA-15022: [6/N] add rack aware assignor configs and update standby optimizer (apache#14150)

    Part of KIP-925.

    - Add configs for rack aware assignor
    - Update standby optimizer in RackAwareTaskAssignor to have more rounds
    - Refactor some methods in RackAwareTaskAssignorTest into test utils so that they can also be used in HighAvailabilityTaskAssignorTest and other tests

    Reviewers: Matthias J. Sax <matthias@confluent.io>

commit ac6a536
Author: Maros Orsak <maros.orsak159@gmail.com>
Date:   Mon Aug 7 15:19:55 2023 +0200

    MINOR: Fix MiniKdc Java 17 issue in system tests (apache#14011)

    Kafka system tests with Java version 17 are failing on this issue:

    ```python
    TimeoutError("MiniKdc didn't finish startup",)
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/ducktape/tests/runner_client.py", line 186, in _do_run
        data = self.run_test()
      File "/usr/local/lib/python3.6/site-packages/ducktape/tests/runner_client.py", line 246, in run_test
        return self.test_context.function(self.test)
      File "/usr/local/lib/python3.6/site-packages/ducktape/mark/_mark.py", line 433, in wrapper
        return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
      File "/opt/kafka-dev/tests/kafkatest/sanity_checks/test_verifiable_producer.py", line 74, in test_simple_run
        self.kafka.start()
      File "/opt/kafka-dev/tests/kafkatest/services/kafka/kafka.py", line 635, in start
        self.start_minikdc_if_necessary(add_principals)
      File "/opt/kafka-dev/tests/kafkatest/services/kafka/kafka.py", line 596, in start_minikdc_if_necessary
        self.minikdc.start()
      File "/usr/local/lib/python3.6/site-packages/ducktape/services/service.py", line 265, in start
        self.start_node(node, **kwargs)
      File "/opt/kafka-dev/tests/kafkatest/services/security/minikdc.py", line 114, in start_node
        monitor.wait_until("MiniKdc Running", timeout_sec=60, backoff_sec=1, err_msg="MiniKdc didn't finish startup")
      File "/usr/local/lib/python3.6/site-packages/ducktape/cluster/remoteaccount.py", line 754, in wait_until
        allow_fail=True) == 0, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/ducktape/utils/util.py", line 58, in wait_until
        raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
    ducktape.errors.TimeoutError: MiniKdc didn't finish startup
    ```

    Specifically, when one runs the test cases and looks at the logs of the MiniKdc:
    ```java
    Exception in thread "main" java.lang.IllegalAccessException: class kafka.security.minikdc.MiniKdc cannot access class sun.security.krb5.Config (in module java.security.jgss) because module java.security.jgss does not export sun.security.krb5 to unnamed module @24959ca4
        at java.base/jdk.internal.reflect.Reflection.newIllegalAccessException(Reflection.java:392)
        at java.base/java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:674)
        at java.base/java.lang.reflect.Method.invoke(Method.java:560)
        at kafka.security.minikdc.MiniKdc.refreshJvmKerberosConfig(MiniKdc.scala:268)
        at kafka.security.minikdc.MiniKdc.initJvmKerberosConfig(MiniKdc.scala:245)
        at kafka.security.minikdc.MiniKdc.start(MiniKdc.scala:123)
        at kafka.security.minikdc.MiniKdc$.start(MiniKdc.scala:375)
        at kafka.security.minikdc.MiniKdc$.main(MiniKdc.scala:366)
        at kafka.security.minikdc.MiniKdc.main(MiniKdc.scala)
    ```

    This error occurs because the internal sun.security.krb5 package is strongly encapsulated by default in Java 16 and higher (see [1]).
    There are two ways to solve it, and this patch implements one of them. The other is to export the ENV variable during the deployment of the containers using Ducktape in [2].

    [1] - https://openjdk.org/jeps/396
    [2] - https://github.com/apache/kafka/blob/trunk/tests/docker/ducker-ak#L308

    Reviewers: Ismael Juma <ismael@juma.me.uk>, Luke Chen <showuon@gmail.com>

commit 7a2e11c
Author: Matthias J. Sax <matthias@confluent.io>
Date:   Sun Aug 6 10:20:08 2023 -0700

    MINOR: update Kafka Streams state.dir doc (apache#14155)

    The default state directory was changed in the 2.8.0 release (cf. KAFKA-10604).

    Reviewers: Guozhang Wang <wangguoz@gmail.com>

commit 748175c
Author: Luke Chen <showuon@gmail.com>
Date:   Sat Aug 5 13:00:16 2023 +0800

    KAFKA-15189: only init remote topic metrics when enabled (apache#14133)

    Only initialize remote topic metrics when system-wide remote storage is enabled, to avoid impacting performance for existing brokers. Also add tests.

    Reviewers: Divij Vaidya <diviv@amazon.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>

commit faf3635
Author: Matthias J. Sax <matthias@confluent.io>
Date:   Fri Aug 4 21:06:53 2023 -0700

    MINOR: improve logging for FK-join (apache#14105)

    Reviewers: Colt McNealy <colt@littlehorse.io>, Walker Carlson <wcarlson@confluent.io>

commit b3db905
Author: Ivan Yurchenko <ivanyu@aiven.io>
Date:   Fri Aug 4 15:53:25 2023 +0300

    KAFKA-15107: Support custom metadata for remote log segment (apache#13984)

    * KAFKA-15107: Support custom metadata for remote log segment

    This commit makes the changes discussed in KIP-917. Mainly, it changes the `RemoteStorageManager` interface to return `CustomMetadata` and ensures that this custom metadata is stored, propagated, and (de-)serialized correctly along with the standard metadata throughout the whole lifecycle. It introduces `remote.log.metadata.custom.metadata.max.size` to limit the custom metadata size accepted by the broker and to stop uploading if a piece of metadata exceeds this limit.

    On testing:
    1. `RemoteLogManagerTest` checks the case when a piece of custom metadata is larger than the configured limit.
    2. `RemoteLogSegmentMetadataTest` checks if `createWithUpdates` works correctly, including custom metadata.
    3. `RemoteLogSegmentMetadataTransformTest`, `RemoteLogSegmentMetadataSnapshotTransformTest`, and `RemoteLogSegmentMetadataUpdateTransformTest` were added to test the corresponding class (de-)serialization, including custom metadata.
    4. `FileBasedRemoteLogMetadataCacheTest` checks if custom metadata is correctly saved to and loaded from a file (indirectly, via `equals`).
    5. `RemoteLogManagerConfigTest` checks if the configuration setting is handled correctly.

    Reviewers: Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>, Divij Vaidya <diviv@amazon.com>

commit 7782741
Author: Bruno Cadonna <cadonna@apache.org>
Date:   Fri Aug 4 09:07:58 2023 +0200

    KAFKA-10199: Change to RUNNING if no pending task to recycle exist (apache#14145)

    A stream thread should only change to RUNNING if there are no
    active tasks in restoration in the state updater and if there
    are no pending tasks to recycle.

    There are situations in which a stream thread might only have
    standby tasks that are recycled to active tasks after a rebalance.
    In such situations, the stream thread might check for active tasks
    in restoration before the state updater has removed the standby
    tasks to recycle. If that happens, the stream thread changes to
    RUNNING although it should wait until the standby tasks are
    recycled to active tasks and restored.

    Reviewers: Walker Carlson <wcarlson@confluent.io>, Matthias J. Sax <matthias@confluent.io>
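
The condition stated in this commit message can be written as a small predicate. `RunningCheckSketch` and its parameter names are hypothetical; the actual check lives in the stream thread and state updater, and is not reproduced here.

```java
// Hypothetical predicate; the parameter names are illustrative.
final class RunningCheckSketch {
    // The thread may move to RUNNING only when nothing is being restored
    // and no standby task is still waiting to be recycled into an active task.
    static boolean canTransitionToRunning(int activeTasksInRestoration,
                                          int pendingTasksToRecycle) {
        return activeTasksInRestoration == 0 && pendingTasksToRecycle == 0;
    }
}
```

The second operand covers the race described above: a standby task awaiting recycling is not yet counted as an active task in restoration, so checking only the first operand would let the thread switch to RUNNING too early.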

commit e0b7499
Author: flashmouse <jackson_666@qq.com>
Date:   Fri Aug 4 02:17:08 2023 +0800

    KAFKA-15106: Fix AbstractStickyAssignor isBalanced predict (apache#13920)

    In 3.5.0, AbstractStickyAssignor may loop needlessly in performReassignments because isBalanced contains a small mistake, which can result in a rebalance timeout in some situations.

    Co-authored-by: lixy <lixy@tuya.com>
    Reviewers: Ritika Reddy <rreddy@confluent.io>, Philip Nee <pnee@confluent.io>, Kirk True <kirk@mustardgrain.com>, Guozhang Wang <wangguoz@gmail.com>

commit b9936d6
Author: Yash Mayya <yash.mayya@gmail.com>
Date:   Thu Aug 3 18:07:35 2023 +0100

    KAFKA-7438: Replace PowerMockRunner with MockitoJUnitRunner in RetryUtilTest (apache#14143)

    Reviewers: Chris Egerton <chrise@aiven.io>

commit 7d39d74
Author: Divij Vaidya <diviv@amazon.com>
Date:   Thu Aug 3 11:05:01 2023 +0200

    MINOR: Fix debug logs to display TimeIndexOffset (apache#13935)

    Reviewers: Luke Chen <showuon@gmail.com>

commit d89b26f
Author: Kamal Chandraprakash <kchandraprakash@uber.com>
Date:   Thu Aug 3 13:56:00 2023 +0530

    KAFKA-12969: Add broker level config synonyms for topic level tiered storage configs (apache#14114)

    KAFKA-12969: Add broker level config synonyms for topic level tiered storage configs.

    Topic -> Broker Synonym:
    local.retention.bytes -> log.local.retention.bytes
    local.retention.ms -> log.local.retention.ms

    We cannot add a synonym for the `remote.storage.enable` topic property because it depends on KIP-950.

    Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>

commit bb48b15
Author: Hao Li <1127478+lihaosky@users.noreply.github.com>
Date:   Wed Aug 2 19:20:23 2023 -0700

    KAFKA-15022: [5/N] compute rack aware assignment for standby tasks (apache#14108)

    Part of KIP-925.

    Reviewer: Matthias J. Sax <matthias@confluent.io>

commit 8aaf7da
Author: Abhijeet Kumar <abhijeet.cse.kgp@gmail.com>
Date:   Wed Aug 2 12:27:25 2023 +0530

    KAFKA-15236: Rename tiered storage metrics (apache#14074)

    Rename tiered storage metrics

    Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Divij Vaidya <diviv@amazon.com>, Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>

commit ffe5f9f
Author: Kamal Chandraprakash <kchandraprakash@uber.com>
Date:   Wed Aug 2 12:05:40 2023 +0530

    KAFKA-15272: Fix the logic which finds candidate log segments to upload it to tiered storage (apache#14128)

    In tiered storage, a segment is eligible for deletion from local disk when it gets uploaded to the remote storage.

    If the topic's active segment contains some messages and there are no new incoming messages, then the active segment gets rotated to a passive segment after the configured log.roll.ms timeout.

    The logic that finds candidate segments in RemoteLogManager does not treat the recently rotated passive segment as eligible for upload to remote storage, so the passive segment is not removed even after it breaches the retention time/size. That is, the topic won't become empty after it goes stale.

    Added unit test to cover the scenario which will fail without this patch.

    Reviewers: Christo Lolov <lolovc@amazon.com>, Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>

commit 0ce1640
Author: Hao Li <1127478+lihaosky@users.noreply.github.com>
Date:   Tue Aug 1 17:33:24 2023 -0700

    KAFKA-15022: [4/N] use client tag assignor for rack aware standby task assignment (apache#14097)

    Part of KIP-925.

    For rack aware standby task assignment, we can either use the already existing "rack tags" or as a fall-back the newly added "rack.id". This PR unifies both without the need to change the actual standby task assignment logic.

    Reviewers: Matthias J. Sax <matthias@confluent.io>

commit b9a4554
Author: Greg Harris <greg.harris@aiven.io>
Date:   Tue Aug 1 10:05:46 2023 -0700

    KAFKA-15244: Remove PluginType.from(Class) (apache#14089)

    Reviewers: Chris Egerton <chrise@aiven.io>

commit 7ecf518
Author: Christo Lolov <lolovc@amazon.com>
Date:   Tue Aug 1 15:10:39 2023 +0100

    KAFKA-14661: Upgrade Zookeeper to 3.8.1 (apache#13260)

    Reviewers: Divij Vaidya <diviv@amazon.com>, Mickael Maison <mickael.maison@gmail.com>

commit 660e6fe
Author: hzh0425 <642256541@qq.com>
Date:   Tue Aug 1 14:53:42 2023 +0800

    MINOR: Fix some typos in remote.metadata.storage (apache#13133)

    Fix some typos in remote.metadata.storage

    Reviewers: Luke Chen <showuon@gmail.com>

commit 938fee2
Author: David Arthur <mumrah@gmail.com>
Date:   Mon Jul 31 09:21:22 2023 -0400

    Fix a Scala 2.12 compile issue (apache#14126)

    Reviewers: Luke Chen <showuon@gmail.com>, Qichao Chu

commit 3ba718e
Author: Yash Mayya <yash.mayya@gmail.com>
Date:   Fri Jul 28 19:35:42 2023 +0100

    MINOR: Remove duplicate instantiation of MockConnectMetrics in AbstractWorkerSourceTaskTest (apache#14091)

    Reviewers: Christo Lolov <christololov@gmail.com>, Manyanda Chitimbo <manyanda.chitimbo@gmail.com>, Greg Harris <greg.harris@aiven.io>

commit 1574b9f
Author: David Jacot <djacot@confluent.io>
Date:   Fri Jul 28 20:28:54 2023 +0200

    MINOR: Code cleanups in group-coordinator module (apache#14117)

    This patch does a few code cleanups in the group-coordinator module.

    It renames Coordinator to CoordinatorShard;
    It renames ReplicatedGroupCoordinator to GroupCoordinatorShard. I was never really happy with this name. The new name makes more sense to me;
    It removes TopicPartition from the GroupMetadataManager. It was only used in log messages. The log context already includes it so we don't have to log it again.
    It renames assignors to consumerGroupAssignors.

    Reviewers: Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>

commit 3709901
Author: Ritika Reddy <98577846+rreddy-22@users.noreply.github.com>
Date:   Fri Jul 28 10:30:04 2023 -0700

    KAFKA-14702: Extend server side assignor to support rack aware replica placement (apache#14099)

    This patch updates the `PartitionAssignor` interface to support rack-awareness. The change introduces the `SubscribedTopicDescriber` interface that can be used to retrieve topic metadata such as the number of partitions or the racks from within an assignor. We use an interface because it allows us to wrap internal data structures instead of having to copy them. It is more efficient.

    Reviewers: David Jacot <djacot@confluent.io>
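The "wrap instead of copy" argument for `SubscribedTopicDescriber` can be sketched as follows. This is a simplified illustration, not the code from the patch: `TopicDescriber`, `TopicInfo`, `MapBackedTopicDescriber`, the method names, the `-1` sentinel, and the use of `java.util.UUID` as a topic id are all assumptions made for the example.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical, simplified stand-in for the interface handed to an assignor.
interface TopicDescriber {
    // Number of partitions of the topic, or -1 if the topic is unknown.
    int numPartitions(UUID topicId);

    // Racks hosting replicas of the partition; empty if no rack info exists.
    Set<String> racksForPartition(UUID topicId, int partition);
}

// Hypothetical per-topic metadata holding a "partition id -> racks" map.
final class TopicInfo {
    final int numPartitions;
    final Map<Integer, Set<String>> partitionRacks;

    TopicInfo(int numPartitions, Map<Integer, Set<String>> partitionRacks) {
        this.numPartitions = numPartitions;
        this.partitionRacks = partitionRacks;
    }
}

// Holds the coordinator-side map by reference: nothing is copied per assignment.
final class MapBackedTopicDescriber implements TopicDescriber {
    private final Map<UUID, TopicInfo> topics;

    MapBackedTopicDescriber(Map<UUID, TopicInfo> topics) {
        this.topics = topics;
    }

    @Override
    public int numPartitions(UUID topicId) {
        TopicInfo info = topics.get(topicId);
        return info == null ? -1 : info.numPartitions;
    }

    @Override
    public Set<String> racksForPartition(UUID topicId, int partition) {
        TopicInfo info = topics.get(topicId);
        if (info == null) {
            return Collections.emptySet();
        }
        return info.partitionRacks.getOrDefault(partition, Collections.emptySet());
    }
}
```

An assignor written against the interface only sees the two query methods, so the backing data structure can change without touching assignor code.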

commit 32c39c8
Author: David Arthur <mumrah@gmail.com>
Date:   Fri Jul 28 13:02:47 2023 -0400

    KAFKA-15263 Check KRaftMigrationDriver state in each event (apache#14115)

    Reviewers: Colin P. McCabe <cmccabe@apache.org>

commit 811ae01
Author: Philip Nee <pnee@confluent.io>
Date:   Fri Jul 28 09:11:20 2023 -0700

    MINOR: Test assign() and assignment() in the integration test (apache#14086)

    A missing piece from KAFKA-14950. This is to test assign() and assignment() in the integration test.

    Also fixed an accidental mistake in the committed API.

    Reviewers: Jun Rao <junrao@gmail.com>

commit 19f9e1e
Author: Jeff Kim <kimkb2011@gmail.com>
Date:   Fri Jul 28 09:13:27 2023 -0400

    KAFKA-14501: Implement Heartbeat protocol in new GroupCoordinator (apache#14056)

    This patch implements the existing Heartbeat API in the new Group Coordinator.

    Reviewers: David Jacot <djacot@confluent.io>

commit dcabc29
Author: David Jacot <djacot@confluent.io>
Date:   Fri Jul 28 14:49:48 2023 +0200

    KAFKA-14048; CoordinatorContext should be protected by a lock (apache#14090)

    Accessing the `CoordinatorContext` in the `CoordinatorRuntime` should be protected by a lock. The runtime guarantees that the context is never accessed concurrently; however, it is accessed by multiple threads. The lock is here to ensure that we have a proper memory barrier. The patch does the following:
    1) Adds a lock to `CoordinatorContext`;
    2) Adds helper methods to get the context and acquire/release the lock;
    3) Allows the transition from Failed to Loading. Previously, the context was recreated in this case.

    Reviewers: Justine Olshan <jolshan@confluent.io>
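A minimal sketch of the lock-protected context idea described in KAFKA-14048, assuming a `ReentrantLock` and a small state machine. `LockedContext`, `ContextState`, the `INITIAL` and `ACTIVE` states, and the helper names are hypothetical; only the Failed to Loading transition is taken from the commit message.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical states; the commit message only names Failed and Loading.
enum ContextState { INITIAL, LOADING, ACTIVE, FAILED }

final class LockedContext {
    private final ReentrantLock lock = new ReentrantLock();
    private ContextState state = ContextState.INITIAL;

    // Helper that acquires the lock, runs the action, and always releases it.
    // Lock/unlock also gives the memory barrier between threads.
    void runWithLock(Runnable action) {
        lock.lock();
        try {
            action.run();
        } finally {
            lock.unlock();
        }
    }

    ContextState state() {
        requireLock();
        return state;
    }

    void transitionTo(ContextState next) {
        requireLock();
        if (!canTransition(state, next)) {
            throw new IllegalStateException("Cannot go from " + state + " to " + next);
        }
        state = next;
    }

    // FAILED -> LOADING is allowed, so a failed context can be reused.
    private static boolean canTransition(ContextState from, ContextState to) {
        switch (to) {
            case LOADING:
                return from == ContextState.INITIAL || from == ContextState.FAILED;
            case ACTIVE:
            case FAILED:
                return from == ContextState.LOADING;
            default:
                return false;
        }
    }

    private void requireLock() {
        if (!lock.isHeldByCurrentThread()) {
            throw new IllegalStateException("Context accessed without holding the lock");
        }
    }
}
```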

commit afe631c
Author: James Shaw <js102@zepler.net>
Date:   Fri Jul 28 10:45:15 2023 +0100

    KAFKA-14967: fix NPE in CreateTopicsResult in MockAdminClient (apache#13671)

    Co-authored-by: James Shaw <james.shaw@masabi.com>
    Reviewers: Mickael Maison <mickael.maison@gmail.com>

commit 722b259
Author: Christo Lolov <lolovc@amazon.com>
Date:   Fri Jul 28 06:40:37 2023 +0100

    KAFKA-14038: Optimise calculation of size for log in remote tier (apache#14049)

    Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Divij Vaidya <diviv@amazon.com>, Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>

commit 10bcd4f
Author: Colin Patrick McCabe <cmccabe@apache.org>
Date:   Thu Jul 27 17:01:55 2023 -0700

    KAFKA-15213: provide the exact offset to QuorumController.replay (apache#13643)

    Provide the exact record offset to QuorumController.replay() in all cases. There are several situations
    where this is useful, such as logging, implementing metadata transactions, or handling broker
    registration records.

    In the case where the QC is inactive, and simply replaying records, it is easy to compute the exact
    record offset from the batch base offset and the record index.

    The active QC case is more difficult. Technically, when we submit records to the Raft layer, it can
    choose a batch base offset later than the one we expect, if someone else is also adding records.
    While the QC is the only entity submitting data records, control records may be added at any time.
    In the current implementation, these are really only used for leadership elections. However, this
    could change with the addition of quorum reconfiguration or similar features.

    Therefore, this PR allows the QC to tell the Raft layer that a record append should fail if it
    would have resulted in a batch base offset other than what was expected. This in turn will trigger a
    controller failover. In the future, if automatically added control records become more common, we
    may wish to have a more sophisticated system than this simple optimistic concurrency mechanism. But
    for now, this will allow us to rely on the offset as correct.

    In order that the active QC can learn what offset to start writing at, the PR also adds a new
    RaftClient#endOffset function.

    At the Raft level, this PR adds a new exception, UnexpectedBaseOffsetException. This gets thrown
    when we request a base offset that doesn't match the one the Raft layer would have given us.
    Although this exception should cause a failover, it should not be considered a fault. This
    complicated the exception handling a bit and motivated splitting more of it out into the new
    EventHandlerExceptionInfo class. This will also let us unit test things like slf4j log messages a
    bit better.

    Reviewers: David Arthur <mumrah@gmail.com>, José Armando García Sancio <jsancio@apache.org>
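The two offset cases described in KAFKA-15213 can be illustrated with a toy in-memory log. This is a sketch under stated assumptions, not the Raft client API: `ToyLog`, `BaseOffsetMismatchException`, and all method signatures are hypothetical stand-ins for the optimistic concurrency check the commit message describes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical exception standing in for the one named in the commit message.
final class BaseOffsetMismatchException extends RuntimeException {
    BaseOffsetMismatchException(long expected, long actual) {
        super("Expected base offset " + expected + " but log end offset is " + actual);
    }
}

// Toy single-threaded log; offsets are simply list indexes.
final class ToyLog {
    private final List<String> records = new ArrayList<>();

    // Next offset to be written; the active controller starts from here.
    long endOffset() {
        return records.size();
    }

    // Replay case: the exact offset is the batch base offset plus the record index.
    static long recordOffset(long batchBaseOffset, int indexInBatch) {
        return batchBaseOffset + indexInBatch;
    }

    // Models a control record added by someone other than the controller.
    long appendControlRecord(String record) {
        long offset = endOffset();
        records.add(record);
        return offset;
    }

    // Optimistic append: fails unless the batch would start at the expected offset.
    long append(long expectedBaseOffset, List<String> batch) {
        long actual = endOffset();
        if (expectedBaseOffset != actual) {
            throw new BaseOffsetMismatchException(expectedBaseOffset, actual);
        }
        records.addAll(batch);
        return actual;
    }
}
```

In this sketch the caller would treat the exception as a signal to give up leadership rather than as a fault, mirroring the failover behaviour described above.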

commit e5861ee
Author: Alyssa Huang <ahuang@confluent.io>
Date:   Thu Jul 27 13:12:25 2023 -0700

    [MINOR] Add latest versions to kraft upgrade kafkatest (apache#14084)

    Reviewers: Ron Dagostino <rndgstn@gmail.com>

commit 6f39ef0
Author: Justine Olshan <jolshan@confluent.io>
Date:   Thu Jul 27 09:36:32 2023 -0700

    MINOR: Adjust Invalid Record Exception for Invalid Txn State as mentioned in KIP-890 (apache#14088)

    Invalid record is a newer error. INVALID_TXN_STATE has been around as long as transactions and is not retriable. This is the desired behavior.

commit 29825ee
Author: David Jacot <djacot@confluent.io>
Date:   Thu Jul 27 13:18:10 2023 +0200

    KAFKA-14499: [3/N] Implement OffsetCommit API (apache#14067)

    This patch introduces the `OffsetMetadataManager` and implements the `OffsetCommit` API for both the old rebalance protocol and the new rebalance protocol. It introduces version 9 of the API but keeps it as unstable for now. The patch adds unit tests to test the API. Integration tests will be done separately.

    Reviewers: Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>

commit 353141e
Author: Divij Vaidya <diviv@amazon.com>
Date:   Thu Jul 27 12:33:34 2023 +0200

    KAFKA-15251: Add 3.5.1 to system tests (apache#14069)

    Reviewers: Matthias J. Sax <matthias@confluent.io>

commit d2fc907
Author: Jeff Kim <kimkb2011@gmail.com>
Date:   Thu Jul 27 02:02:29 2023 -0400

    KAFKA-14500; [6/6] Implement SyncGroup protocol in new GroupCoordinator (apache#14017)

    This patch implements the SyncGroup API in the new group coordinator. All the new unit tests are based on the existing scala tests.

    Reviewers: David Jacot <djacot@confluent.io>

commit ed44bcd
Author: Hao Li <1127478+lihaosky@users.noreply.github.com>
Date:   Wed Jul 26 16:02:52 2023 -0700

    KAFKA-15022: [3/N] use graph to compute rack aware assignment for active stateful tasks (apache#14030)

    Part of KIP-925.

    Reviewers: Matthias J. Sax <matthias@confluent.io>

commit 8135b6d
Author: Said Boudjelda <bmscomp@gmail.com>
Date:   Wed Jul 26 19:52:02 2023 +0200

    KAFKA-15235: Fix broken coverage reports since migration to Gradle 8.x (apache#14075)

    Reviewers: Divij Vaidya <diviv@amazon.com>

commit e5fb9b6
Author: Said Boudjelda <bmscomp@gmail.com>
Date:   Wed Jul 26 19:12:27 2023 +0200

    MINOR: upgrade version of gradle plugin (ben-manes.versions) to 0.47.0 (apache#14098)

    Reviewers: Divij Vaidya <diviv@amazon.com>

commit a900794
Author: David Arthur <mumrah@gmail.com>
Date:   Wed Jul 26 12:54:59 2023 -0400

    KAFKA-15196 Additional ZK migration metrics (apache#14028)

    This patch adds several metrics defined in KIP-866:

    * MigratingZkBrokerCount: the number of zk brokers registered with KRaft
    * ZkWriteDeltaTimeMs: time spent writing MetadataDelta to ZK
    * ZkWriteSnapshotTimeMs: time spent writing MetadataImage to ZK
    * Adds value 4 for "ZK" to ZkMigrationState

    Also fixes a typo in the metric name introduced in apache#14009 (ZKWriteBehindLag -> ZkWriteBehindLag)

    Reviewers: Luke Chen <showuon@gmail.com>, Colin P. McCabe <cmccabe@apache.org>

commit 6d81698
Author: sciclon2 <74413315+sciclon2@users.noreply.github.com>
Date:   Wed Jul 26 15:48:09 2023 +0200

    KAFKA-15243: Set decoded user names to DescribeUserScramCredentialsResponse (apache#14094)

    Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>

commit ff390ab
Author: vamossagar12 <sagarmeansocean@gmail.com>
Date:   Wed Jul 26 17:56:20 2023 +0530

    [MINOR] Fix Javadoc comment in KafkaFuture#toCompletionStage (apache#14100)

    Fix Javadoc comment in KafkaFuture#toCompletionStage

    Reviewers: Luke Chen <showuon@gmail.com>

commit bb677c4
Author: Federico Valeri <fedevaleri@gmail.com>
Date:   Wed Jul 26 12:04:34 2023 +0200

    KAFKA-14583: Move ReplicaVerificationTool to tools (apache#14059)

    Reviewers: Mickael Maison <mickael.maison@gmail.com>

commit 4d30cbf
Author: Said Boudjelda <bmscomp@gmail.com>
Date:   Wed Jul 26 11:21:36 2023 +0200

    MINOR: Upgrade the minor version of snappy dependency to 1.1.10.3 (apache#14072)

    Reviewers: Divij Vaidya <diviv@amazon.com>

commit 206a4af
Author: Divij Vaidya <diviv@amazon.com>
Date:   Wed Jul 26 11:19:56 2023 +0200

    MINOR: Add co-authors to release email template (apache#14080)

    Reviewers: Mickael Maison <mickael.maison@gmail.com>

commit 46a8a28
Author: vamossagar12 <sagarmeansocean@gmail.com>
Date:   Wed Jul 26 07:21:23 2023 +0530

    KAFKA-15218: Avoid NPE thrown while deleting topic and fetch from follower concurrently (apache#14051)

    When deleting topics, we first clear the remoteReplicaMap in stopPartitions. At the same time, a fetch request may arrive from a follower and try to check whether the replica is eligible to be added to the ISR. At that moment, an NPE is thrown. Although this is harmless because the topic is already deleted, it is better to avoid it.

    Reviewers: Luke Chen <showuon@gmail.com>

commit af1f50f
Author: Matthias J. Sax <matthias@confluent.io>
Date:   Tue Jul 25 14:56:58 2023 -0700

    MINOR: fix docs markup (apache#14085)

    Reviewers: Qichao Chu (@ex172000), Mickael Maison <mickael.maison@gmail.com>

commit e794bc7
Author: David Arthur <mumrah@gmail.com>
Date:   Tue Jul 25 16:05:04 2023 -0400

    MINOR: Add a Builder for KRaftMigrationDriver (apache#14062)

    Reviewers: Justine Olshan <jolshan@confluent.io>

commit 8b027b6
Author: tison <wander4096@gmail.com>
Date:   Tue Jul 25 23:56:49 2023 +0800

    MINOR: Fix typo in ProduceRequest.json (apache#14070)

    Reviewers: Mickael Maison <mickael.maison@gmail.com>

commit 08b3820
Author: Yash Mayya <yash.mayya@gmail.com>
Date:   Tue Jul 25 14:03:29 2023 +0100

    KAFKA-15238: Move DLQ reporter setup from the DistributedHerder's tick thread to the sink task thread (apache#14079)

    Reviewers: Chris Egerton <chrise@aiven.io>

commit 58b8c5c
Author: Chris Egerton <chrise@aiven.io>
Date:   Tue Jul 25 05:12:46 2023 -0700

    MINOR: Downgrade log level for conflicting Connect plugin aliases (apache#14081)

    Reviewers: Greg Harris <greg.harris@aiven.io>

commit c7de30f
Author: Colin Patrick McCabe <cmccabe@apache.org>
Date:   Mon Jul 24 21:13:58 2023 -0700

    KAFKA-15183: Add more controller, loader, snapshot emitter metrics (apache#14010)

    Implement some of the metrics from KIP-938: Add more metrics for
    measuring KRaft performance.

    Add these metrics to QuorumControllerMetrics:
        kafka.controller:type=KafkaController,name=TimedOutBrokerHeartbeatCount
        kafka.controller:type=KafkaController,name=EventQueueOperationsStartedCount
        kafka.controller:type=KafkaController,name=EventQueueOperationsTimedOutCount
        kafka.controller:type=KafkaController,name=NewActiveControllersCount

    Create LoaderMetrics with these new metrics:
        kafka.server:type=MetadataLoader,name=CurrentMetadataVersion
        kafka.server:type=MetadataLoader,name=HandleLoadSnapshotCount

    Create SnapshotEmitterMetrics with these new metrics:
        kafka.server:type=SnapshotEmitter,name=LatestSnapshotGeneratedBytes
        kafka.server:type=SnapshotEmitter,name=LatestSnapshotGeneratedAgeMs

    Reviewers: Ron Dagostino <rndgstn@gmail.com>

commit 79b8c96
Author: David Mao <47232755+splett2@users.noreply.github.com>
Date:   Mon Jul 24 13:22:25 2023 -0700

    KAFKA-14990: Dynamic producer ID expiration should be applied on a broker restart (apache#13707)

    Dynamic overrides for the producer ID expiration config are not picked up on broker restart in Zookeeper mode. Based on the integration test, this does not apply to KRaft mode.

    Adds a broker restart that fails without the corresponding KafkaConfig change.

    Reviewers: Justine Olshan <jolshan@confluent.io>

commit 38781f9
Author: Justine Olshan <jolshan@confluent.io>
Date:   Mon Jul 24 13:08:57 2023 -0700

    KAFKA-14920: Address timeouts and out of order sequences (apache#14033)

    When creating a verification state entry, we also store sequence and epoch. On subsequent requests, we will take the latest epoch seen and the earliest sequence seen. That way, if we try to append a sequence after the earliest seen sequence, we can block that and retry. This addresses potential OutOfOrderSequence loops caused by errors during verification (coordinator loading, timeouts, etc).

    Reviewers:  David Jacot <david.jacot@gmail.com>,  Artem Livshits <alivshits@confluent.io>
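The "latest epoch, earliest sequence" rule from KAFKA-14920 can be sketched as a small state object. This is a simplified illustration: `VerificationState` and its methods are hypothetical, sequence wraparound is ignored, and resetting the sequence on a newer epoch is an assumption rather than something the commit message states.

```java
// Hypothetical, simplified verification state for a single producer ID.
final class VerificationState {
    private short epoch;
    private int lowestSequence;

    VerificationState(short epoch, int sequence) {
        this.epoch = epoch;
        this.lowestSequence = sequence;
    }

    // Folds in a later request: keeps the latest epoch and the earliest sequence.
    // Assumption: a newer epoch restarts sequence tracking; older epochs are ignored.
    void observe(short requestEpoch, int requestSequence) {
        if (requestEpoch > epoch) {
            epoch = requestEpoch;
            lowestSequence = requestSequence;
        } else if (requestEpoch == epoch) {
            lowestSequence = Math.min(lowestSequence, requestSequence);
        }
    }

    // A batch whose sequence is past the earliest one seen is blocked and retried.
    boolean mustRetry(short requestEpoch, int requestSequence) {
        return requestEpoch == epoch && requestSequence > lowestSequence;
    }

    short epoch() {
        return epoch;
    }

    int lowestSequence() {
        return lowestSequence;
    }
}
```

Blocking the later sequence keeps it from being appended ahead of the earlier one, which is what would otherwise feed an OutOfOrderSequence loop.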
jeqo pushed a commit to aiven/kafka that referenced this pull request Aug 15, 2023
…a placement (apache#14099)

This patch updates the `PartitionAssignor` interface to support rack-awareness. The change introduces the `SubscribedTopicDescriber` interface that can be used to retrieve topic metadata such as the number of partitions or the racks from within an assignor. We use an interface because it allows us to wrap internal data structures instead of having to copy them. It is more efficient.

Reviewers: David Jacot <djacot@confluent.io>
rreddy-22 added a commit to rreddy-22/kafka-rreddy that referenced this pull request Sep 20, 2023
…a placement (apache#14099)

This patch updates the `PartitionAssignor` interface to support rack-awareness. The change introduces the `SubscribedTopicDescriber` interface that can be used to retrieve topic metadata such as the number of partitions or the racks from within an assignor. We use an interface because it allows us to wrap internal data structures instead of having to copy them. It is more efficient.

Reviewers: David Jacot <djacot@confluent.io>