feat: merge from apache kafka 3.8 0971924ebc7e65eb7055010d2400626d31967d8c #1427
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Fix: reset the HB timer when the request is sent, rather than when a response is received. This makes HB timing more accurate, so that a member always sends a HB on the interval (not on the interval plus any delay in receiving the response). Together with the existing logic for checking in-flight requests, this change ensures that if the interval expires while a HB is in flight, the next HB is sent as soon as the response for the in-flight request is received, without waiting for another full interval. This is consistent with the timer reset and in-flight behaviour of the auto-commit interval. Reviewers: Kirk True <ktrue@confluent.io>, Bruno Cadonna <cadonna@apache.org>
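A minimal sketch of the timing rule described above, assuming a single-threaded caller that passes in the current time. `HeartbeatScheduler` and its methods are hypothetical names, not the consumer's actual heartbeat manager: the interval is measured from the send, an in-flight heartbeat blocks the next one, and the response does not restart the interval.

```java
/**
 * Hypothetical sketch (not Kafka's HeartbeatRequestManager): the interval timer is
 * reset when a heartbeat is SENT, and a heartbeat whose interval expired while
 * another one was in flight goes out as soon as that response arrives.
 */
final class HeartbeatScheduler {
    private final long intervalMs;
    private long lastSentMs = Long.MIN_VALUE; // sentinel: no heartbeat sent yet
    private boolean inFlight = false;

    HeartbeatScheduler(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    /** True if a heartbeat may be sent at nowMs. */
    boolean canSend(long nowMs) {
        if (inFlight) return false;                     // never overlap heartbeats
        if (lastSentMs == Long.MIN_VALUE) return true;  // first heartbeat goes out immediately
        return nowMs - lastSentMs >= intervalMs;        // interval measured from the last send
    }

    void onSend(long nowMs) {
        lastSentMs = nowMs; // timer reset on send, not on response
        inFlight = true;
    }

    void onResponse() {
        inFlight = false;   // the response does NOT restart the interval
    }
}
```

With a 100 ms interval, a heartbeat sent at t=100 whose response only arrives at t=230 allows the next heartbeat at t=230, rather than at t=330 as a reset-on-response timer would.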
…hen attempting to remove a partition that isn't assigned (#15737) Check that the TopicPartition is in the assignment before attempting to remove it. Also adds some logging and refactoring. Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Lianet Magrans <lianetmr@gmail.com>
…he user on consumer poll (#15742) When user-defined rebalance listeners fail with an exception, the expectation is that the error is propagated to the user as a KafkaException and breaks the poll loop (the behavior of the legacy coordinator). The new consumer executes callbacks in the application thread and sends an event to the background thread with the callback result and the error, if any, but it did not seem to propagate the exception to the user. Reviewers: Lianet Magrans <lianetmr@gmail.com>, Kirk True <ktrue@confluent.io>, Bruno Cadonna <cadonna@apache.org>
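A simplified sketch of the expected behaviour, assuming the background-thread round trip can be collapsed into "capture the error, then surface it on the next poll". `CallbackInvoker` is hypothetical, and the `KafkaException` below is a local stand-in for `org.apache.kafka.common.KafkaException`, not the real class.

```java
import java.util.Optional;

// Local stand-in for org.apache.kafka.common.KafkaException (hypothetical, for a self-contained sketch).
class KafkaException extends RuntimeException {
    KafkaException(String msg, Throwable cause) { super(msg, cause); }
}

final class CallbackInvoker {
    /** Runs a user rebalance callback on the application thread and captures any error. */
    static Optional<Exception> invoke(Runnable userCallback) {
        try {
            userCallback.run();
            return Optional.empty();
        } catch (Exception e) {
            return Optional.of(e); // in the real consumer this travels with the event to the background thread
        }
    }

    /** Called from poll(): surfaces the callback error instead of swallowing it. */
    static void maybeThrow(Optional<Exception> callbackError) {
        callbackError.ifPresent(e -> {
            throw new KafkaException("User rebalance callback threw an error", e);
        });
    }
}
```

Wrapping the original exception as the cause keeps the user's stack trace available while still breaking the poll loop.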
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Change the documentation of the Brokers field to make it clear that it doesn't always have all the brokers that are listed as replicas. Reviewer: Colin P. McCabe <cmccabe@apache.org>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…eRecord (#15802) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…nfig (#15796) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…(#15761)
* Make ClusterConfig immutable
* Make BrokerNode immutable
* Refactor out build argument in ControllerNode
* Add setPrefix and replace put property with set map in ClusterConfig
* Remove rollingBrokerRestart from ClusterInstance interface
* Refactor KRaftClusterTest#doOnStartedKafkaCluster
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
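An illustrative sketch of the "immutable config with a set-style builder" idea, assuming a builder-based design. `ClusterConfigSketch`, `setBrokers`, and `setServerProperties` are hypothetical names, not the test framework's actual `ClusterConfig` API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class ClusterConfigSketch {
    private final int brokers;
    private final Map<String, String> serverProperties;

    private ClusterConfigSketch(Builder b) {
        this.brokers = b.brokers;
        // Defensive, unmodifiable copy: later builder changes cannot leak into a built config.
        this.serverProperties = Collections.unmodifiableMap(new HashMap<>(b.serverProperties));
    }

    int brokers() { return brokers; }
    Map<String, String> serverProperties() { return serverProperties; }

    static Builder builder() { return new Builder(); }

    static final class Builder {
        private int brokers = 1;
        private Map<String, String> serverProperties = new HashMap<>();

        Builder setBrokers(int brokers) { this.brokers = brokers; return this; }

        /** Replaces the whole map (set-style) instead of putting one property at a time. */
        Builder setServerProperties(Map<String, String> props) {
            this.serverProperties = new HashMap<>(props);
            return this;
        }

        ClusterConfigSketch build() { return new ClusterConfigSketch(this); }
    }
}
```

Because the built object exposes only an unmodifiable copy, a test cannot accidentally mutate a configuration that other tests share.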
… by sorting the topic partition list (#15816) We are seeing a flaky test, `testConsumerGroupHeartbeatWithStableClassicGroup`, where the failure is caused by different orderings of the expected and actual values. The patch sorts the topic partition list in the records to fix the issue. Reviewers: Jeff Kim <jeff.kim@confluent.io>, Igor Soarez <soarez@apple.com>, David Jacot <djacot@confluent.io>
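A minimal sketch of why sorting removes the flakiness, assuming a total order on (topic, partition). The `TopicPartition` record here is a hypothetical stand-in for Kafka's class, and `PartitionOrdering` is an illustrative helper, not the code in the patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical minimal stand-in for a (topic, partition) pair.
record TopicPartition(String topic, int partition) { }

final class PartitionOrdering {
    static final Comparator<TopicPartition> ORDER =
        Comparator.comparing(TopicPartition::topic).thenComparingInt(TopicPartition::partition);

    /** Returns a sorted copy so expected and actual lists compare equal regardless of input order. */
    static List<TopicPartition> sorted(List<TopicPartition> partitions) {
        List<TopicPartition> copy = new ArrayList<>(partitions);
        copy.sort(ORDER);
        return copy;
    }
}
```

`List.equals` is order-sensitive, so two lists holding the same partitions in different (for example, hash-iteration) orders only compare equal after both are normalized this way.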
…nts (#15804) Small cleanup: remove the version when excluding shaded dependencies from the clients library, as it is not needed. Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
…rors (#15732) While running the ZK-to-KRaft migration, we encountered an issue where the migration hung and the ZkMigrationState could not move to the MIGRATION state. This happens because pollEvent did not retry on the retriable MigrationClientException (retriable ZK client errors), although it should. As a result, the poll event stopped polling, which caused the KRaftMigrationDriver to hang. This PR fixes the retry and adds a test. Reviewers: Luke Chen <showuon@gmail.com>, Igor Soarez <soarez@apple.com>, Akhilesh C <akhileshchg@users.noreply.github.com>
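A simplified sketch of the retry rule, assuming a poll step either succeeds or throws a retriable client error. `RetriableClientException` stands in for `MigrationClientException`, and `PollLoop` is hypothetical; the real driver reschedules its poll event rather than looping inline.

```java
// Hypothetical stand-in for a retriable ZK client error (like MigrationClientException).
class RetriableClientException extends RuntimeException {
    RetriableClientException(String msg) { super(msg); }
}

final class PollLoop {
    /**
     * Runs one poll step. On a retriable error it reports failure so the caller polls again,
     * instead of silently stopping (which is what left the driver hanging).
     */
    static boolean pollOnce(Runnable step) {
        try {
            step.run();
            return true;   // progressed
        } catch (RetriableClientException e) {
            return false;  // caller must schedule another poll
        }
    }

    /** Keeps polling until a step succeeds; returns the attempt number, or -1 if maxAttempts ran out. */
    static int pollUntilSuccess(Runnable step, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (pollOnce(step)) return attempt;
        }
        return -1;
    }
}
```

Only the retriable exception type is caught, so non-retriable failures still propagate to the caller.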
This fixes a consumer system test that was failing for the new protocol. The test expected the eager behaviour of partitions being revoked on every rebalance, and wrongly applied that expectation to runs with the new protocol too. The same situation was previously identified and fixed in other parts of the system tests with #15661. Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…ongIncarnationId (#15828) ControllerRegistrationManagerTest is flaky due to the poll in L221. The potential root cause is a race condition between the first poll (L221) and the second poll (L229). Before the second poll, we mock a response (L226) that should be processed by the second poll. However, if the first poll takes it instead, the second poll gets nothing, which can lead to an error. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Omnia Ibrahim <o.g.h.ibrahim@gmail.com>, Kuan-Po (Cooper) Tseng <brandboat@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
…h (#15824) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Add instructions for running the website locally to the docs folder. Signed-off-by: PoAn Yang <payang@apache.org> Reviewers: Luke Chen <showuon@gmail.com>
… if inflight (#15723) In some cases, the network layer is fast enough to process a response and send a follow-up request within the same millisecond. This caused problems because of the way we determined whether a request was already in flight. The previous logic tracked in-flight status with timestamps: if the timestamp of the last received response was less than the timestamp of the last sent request, we interpreted that as an in-flight request. However, this approach incorrectly returned false from RequestState.requestInFlight() when the two timestamps were equal. One result of this faulty logic was that the consumer could accidentally send multiple heartbeat requests to the consumer group coordinator. The coordinator interpreted these requests as 'join group' requests and created a member for each one, so it believed there were more members in the group than there really were. If your luck was really bad, the coordinator might then assign partitions to one of the duplicate members. Those partitions were assigned to a phantom consumer that was not reading any data, which led to flaky tests. This change introduces a simple flag in RequestState that is set in onSendAttempt and cleared in onSuccessfulAttempt, onFailedAttempt, and reset. A new unit test has been added, the change has been tested against all of the consumer unit and integration tests, and it has removed all known occurrences of phantom consumer group members in the system tests. Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Lianet Magrans <lianetmr@gmail.com>, Philip Nee <pnee@confluent.io>
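A sketch contrasting the two in-flight checks described above. `RequestStateSketch` is a hypothetical class loosely modeled on the method names in the description (`onSendAttempt`, `onSuccessfulAttempt`, `onFailedAttempt`, `reset`); it is not the actual `RequestState` source.

```java
/** Hypothetical sketch comparing timestamp-based and flag-based in-flight tracking. */
final class RequestStateSketch {
    private long lastSentMs = -1;
    private long lastReceivedMs = -1;
    private boolean requestInFlight = false;

    void onSendAttempt(long nowMs) {
        lastSentMs = nowMs;
        requestInFlight = true;
    }

    void onSuccessfulAttempt(long nowMs) {
        lastReceivedMs = nowMs;
        requestInFlight = false;
    }

    void onFailedAttempt(long nowMs) {
        lastReceivedMs = nowMs;
        requestInFlight = false;
    }

    void reset() {
        requestInFlight = false;
    }

    /** Old logic: wrong when a response and the next send share the same millisecond. */
    boolean requestInFlightByTimestamp() {
        return lastReceivedMs < lastSentMs;
    }

    /** New logic: a flag set on send and cleared on success, failure, or reset. */
    boolean requestInFlight() {
        return requestInFlight;
    }
}
```

In the failing scenario (send at t=10, response at t=20, follow-up send also at t=20), the timestamp check sees `20 < 20` as false and reports nothing in flight, while the flag correctly reports the follow-up request as in flight.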
The patch adds a boolean attribute `replayRecords` that specifies whether the records should be replayed. Reviewers: David Jacot <djacot@confluent.io>
This PR does the following cleanup in TestUtils.scala: 1) removes unused methods; 2) moves methods used by a single test class out of … Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Add more test cases to TopicImageNodeTest.java. Reviewers: Colin P. McCabe <cmccabe@apache.org>
… zero (#15773) Instead of pending forever, this PR invokes the next schedule after 1 ms. The side effect is busy-waiting, so this PR also updates the docs to warn users about setting a small log.segment.delete.delay.ms. Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
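A minimal sketch of the "schedule after at least 1 ms" rule, assuming the fix can be modeled as clamping the reschedule delay. `DelayClamp` and its methods are hypothetical helpers built on `java.util.concurrent.ScheduledExecutorService`, not the broker's scheduler code.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class DelayClamp {
    static final long MIN_DELAY_MS = 1L;

    /** Hypothetical helper: a zero delay is replaced with 1 ms; larger delays pass through unchanged. */
    static long effectiveDelayMs(long configuredDelayMs) {
        return Math.max(MIN_DELAY_MS, configuredDelayMs);
    }

    /** Schedules the next run using the clamped delay. */
    static void scheduleNext(ScheduledExecutorService scheduler, Runnable task, long configuredDelayMs) {
        scheduler.schedule(task, effectiveDelayMs(configuredDelayMs), TimeUnit.MILLISECONDS);
    }
}
```

The trade-off noted above follows directly: with a 0 ms setting, a task that keeps finding work reschedules itself every millisecond, which is effectively busy-waiting.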
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Luke Chen <showuon@gmail.com>, Apoorv Mittal <amittal@confluent.io>
…5821) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
… (#15779) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…16101) Reviewers: Chris Egerton <chrise@aiven.io>
…uration (#16092)" (#16141) This reverts commit 3f70c46. Reviewer: Lucas Brutschy <lbrutschy@confluent.io>
Added the implementation of the quota manager that will be used to throttle copy and fetch requests from remote storage. Reference: KIP-956. Reviewers: Luke Chen <showuon@gmail.com>, Kamal Chandraprakash <kchandraprakash@uber.com>, Jun Rao <junrao@gmail.com>
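An illustrative sketch of byte-rate throttling, assuming a simple fixed one-second window. `ByteRateQuota` and its methods are hypothetical; the KIP-956 quota manager is not reproduced here and likely relies on Kafka's sampled-rate metrics rather than a single fixed window.

```java
/**
 * Hypothetical fixed-window byte-rate quota: once the per-second byte budget is used,
 * callers should stop copying/fetching remote data until the window rolls over.
 */
final class ByteRateQuota {
    private final long bytesPerSecond;
    private long windowStartMs;
    private long bytesInWindow;

    ByteRateQuota(long bytesPerSecond, long nowMs) {
        this.bytesPerSecond = bytesPerSecond;
        this.windowStartMs = nowMs;
    }

    private void maybeRoll(long nowMs) {
        if (nowMs - windowStartMs >= 1000) { // start a fresh one-second window
            windowStartMs = nowMs;
            bytesInWindow = 0;
        }
    }

    /** True if the caller should hold off on more remote copy/fetch traffic right now. */
    boolean isQuotaExceeded(long nowMs) {
        maybeRoll(nowMs);
        return bytesInWindow >= bytesPerSecond;
    }

    /** Records bytes copied or fetched at nowMs. */
    void record(long bytes, long nowMs) {
        maybeRoll(nowMs);
        bytesInWindow += bytes;
    }

    /** Milliseconds until the current window ends if throttled, else 0. */
    long throttleTimeMs(long nowMs) {
        return isQuotaExceeded(nowMs) ? 1000 - (nowMs - windowStartMs) : 0;
    }
}
```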
This PR does the following:
* System tests should bring up the Kafka broker in native mode.
* System tests should run on the Kafka broker in native mode.
* Extract the native build command so that it can be reused.
* Allow system tests to run on the native Kafka broker using the Docker mechanism.
To run system tests by bringing up Kafka in native mode, pass kafka_mode as native in the ducktape globals: `--globals '{"kafka_mode":"native"}'`
To run system tests by bringing up Kafka in native mode via the Docker mechanism:
`_DUCKTAPE_OPTIONS="--globals '{\"kafka_mode\":\"native\"}'" TC_PATHS="tests/kafkatest/tests/" bash tests/docker/run_tests.sh`
To only bring up ducker nodes for native Kafka:
`bash tests/docker/ducker-ak up -m native`
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
This patch exposes the group coordinator config `CONSUMER_GROUP_MIGRATION_POLICY_CONFIG`. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <djacot@confluent.io>
…ion (#16135) Reviewers: Greg Harris <greg.harris@aiven.io>
…TransactionsTest (#16139) While working on apache/kafka#16120, I noticed that the transaction verification feature is disabled in `TransactionsTest` when the new group coordinator is enabled. We did this initially because the feature was not available in the new group coordinator but we fixed it a long time ago. We can enable it now. Reviewers: Justine Olshan <jolshan@confluent.io>
…d ports (#16112) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…062) Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
…e metadata config (#16130) As per KIP-1022, we rename the config that enables unstable metadata versions so that it supports all feature versions. Features is also updated to return the latest production and latest testing versions of each feature. A feature is production ready when the corresponding metadata version (bootstrapMetadataVersion) is production ready. Adds tests for the feature usage of the unstableFeatureVersionsEnabled config. Reviewers: David Jacot <djacot@confluent.io>, Jun Rao <junrao@gmail.com>
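A simplified sketch of the production-ready rule, assuming metadata versions can be compared as ordered integers and that level 0 means "feature disabled". `FeatureLevel`, `FeatureVersions`, and their methods are hypothetical; only the rule itself (production ready when the bootstrap metadata version is production ready) comes from the description above.

```java
import java.util.List;

// Hypothetical model: each feature level declares the metadata version it needs at bootstrap.
record FeatureLevel(short level, int bootstrapMetadataVersion) { }

final class FeatureVersions {
    /** Highest level whose bootstrap metadata version is production ready (0 if none). */
    static short latestProduction(List<FeatureLevel> levels, int latestProductionMetadataVersion) {
        short best = 0;
        for (FeatureLevel f : levels) {
            if (f.bootstrapMetadataVersion() <= latestProductionMetadataVersion && f.level() > best) {
                best = f.level();
            }
        }
        return best;
    }

    /** Highest level overall, including levels gated behind unstable metadata versions. */
    static short latestTesting(List<FeatureLevel> levels) {
        short best = 0;
        for (FeatureLevel f : levels) {
            if (f.level() > best) best = f.level();
        }
        return best;
    }

    /** Which level is allowed, depending on whether unstable feature versions are enabled. */
    static short latestSupported(List<FeatureLevel> levels, int latestProductionMetadataVersion,
                                 boolean unstableFeatureVersionsEnabled) {
        return unstableFeatureVersionsEnabled
            ? latestTesting(levels)
            : latestProduction(levels, latestProductionMetadataVersion);
    }
}
```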
… and finish rack-aware standby optimization (#16129) This fills in the implementation details of the standby task assignment utility functions within TaskAssignmentUtils. Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
…#16146) Reviewers: Justine Olshan <jolshan@confluent.io>, Satish Duggana <satishd@apache.org>
…lementations (#16090)" (#16142) This reverts commit 8d11d95. We decided not to release KIP-1033 with AK 3.8. Reviewer: Lucas Brutschy <lbrutschy@confluent.io>
Add more unit tests for LogSegments and do some small refactoring in LogSegments.java. Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
…t with legacy (#16125)
* Timeout exception fetching offsets
* Tests
…valid host and ports. (#16048) Modify the HOST_PORT_PATTERN regex to reject malformed hosts and ports. Reviewers: Luke Chen <showuon@gmail.com>
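An illustrative sketch of strict host:port validation with `java.util.regex`. The pattern below is a hypothetical example written for this note, not Kafka's actual `HOST_PORT_PATTERN`. It accepts a bracketed IPv6 literal or a plain hostname/IPv4 followed by a 1-5 digit port, and rejects everything else outright instead of extracting a partial match.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HostPort {
    // Hypothetical pattern (not Kafka's HOST_PORT_PATTERN):
    // group 1 = bracketed IPv6 literal, group 2 = hostname or IPv4, group 3 = port digits.
    static final Pattern HOST_PORT_PATTERN =
        Pattern.compile("^(?:\\[([0-9a-zA-Z:.%]+)\\]|([0-9a-zA-Z._-]+)):([0-9]{1,5})$");

    final String host;
    final int port;

    private HostPort(String host, int port) { this.host = host; this.port = port; }

    /** Returns empty for malformed input (bad characters, missing or out-of-range port, trailing junk). */
    static Optional<HostPort> parse(String address) {
        Matcher m = HOST_PORT_PATTERN.matcher(address);
        if (!m.matches()) return Optional.empty();
        String host = m.group(1) != null ? m.group(1) : m.group(2);
        int port = Integer.parseInt(m.group(3)); // at most 5 digits, so no overflow
        if (port > 65535) return Optional.empty();
        return Optional.of(new HostPort(host, port));
    }
}
```

Using `matches()` with explicit anchors means inputs such as `mydomain.com:8080:` or `local host:9092` fail as a whole.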
Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
Signed-off-by: Robin Han <hanxvdovehx@gmail.com>