@superhx
Collaborator

@superhx superhx commented Jun 14, 2024

No description provided.

OmniaGM and others added 30 commits April 26, 2024 03:08
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reset the HB timer when the request is sent, rather than when a response is received. This makes HB timing more accurate: a member always sends a HB on the interval, not on the interval plus any delay in receiving the response.
This change, together with the existing logic for checking in-flight requests, ensures that if the interval expires while a HB is in flight, the next HB is sent as soon as the response for the in-flight request is received, without waiting for another full interval. This is consistent with the timer reset and in-flight behaviour of the auto-commit interval.
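A minimal sketch of the timing rule described above, assuming a single timer and a boolean in-flight marker. The class and method names (`HeartbeatTimerSketch`, `shouldSendHeartbeat`, `onHeartbeatSent`, `onHeartbeatResponse`) are hypothetical and not the consumer's actual API:

```java
// Hypothetical sketch (not the consumer's actual heartbeat manager code):
// the interval timer restarts when a heartbeat is SENT, and no new heartbeat
// goes out while a previous one is still in flight.
class HeartbeatTimerSketch {
    private final long intervalMs;
    private long lastSentMs;
    private boolean inFlight = false;

    HeartbeatTimerSketch(long intervalMs, long nowMs) {
        this.intervalMs = intervalMs;
        this.lastSentMs = nowMs - intervalMs; // lets the first heartbeat go out immediately
    }

    /** A heartbeat is due once the interval has expired and nothing is in flight. */
    boolean shouldSendHeartbeat(long nowMs) {
        return !inFlight && nowMs - lastSentMs >= intervalMs;
    }

    /** Reset the timer at send time, not when the response arrives. */
    void onHeartbeatSent(long nowMs) {
        lastSentMs = nowMs;
        inFlight = true;
    }

    /** The response (success or failure) clears the in-flight marker. */
    void onHeartbeatResponse() {
        inFlight = false;
    }
}
```

With a 1000 ms interval, a heartbeat sent at t=0 whose response arrives at t=1500 makes the next heartbeat due immediately at t=1500, instead of at t=2500 as it would be if the timer were reset on the response.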

Reviewers: Kirk True <ktrue@confluent.io>, Bruno Cadonna <cadonna@apache.org>
…hen attempting to remove a partition that isn't assigned (#15737)

Checking that the TopicPartition is in assignment before attempting to remove it.

Also added some logging and refactoring.
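A minimal sketch of the check-before-remove idea, assuming the assignment is a map keyed by a plain "topic-partition" string instead of Kafka's `TopicPartition`. `AssignmentSketch` and its methods are hypothetical names used only for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the consumer's assignment bookkeeping; partitions are
// keyed by "topic-partition" strings instead of Kafka's TopicPartition class.
class AssignmentSketch {
    private final Map<String, Long> positions = new HashMap<>();

    void assign(String partition, long position) {
        positions.put(partition, position);
    }

    /** Removes the partition only if it is assigned; returns whether anything was removed. */
    boolean removeIfAssigned(String partition) {
        if (!positions.containsKey(partition)) {
            // Not assigned: log and skip instead of failing on a missing entry.
            System.out.println("Partition " + partition + " is not assigned; skipping removal");
            return false;
        }
        positions.remove(partition);
        return true;
    }

    boolean isAssigned(String partition) {
        return positions.containsKey(partition);
    }
}
```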

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Lianet Magrans <lianetmr@gmail.com>
…he user on consumer poll (#15742)

When user-defined rebalance listeners fail with an exception, the expectation is that the error is propagated to the user as a KafkaException and breaks the poll loop (the behavior of the legacy coordinator). The new consumer executes callbacks in the application thread and sends an event carrying the callback result (and the error, if any) to the background thread, but it does not seem to propagate the exception back to the user.
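A simplified sketch of the expected behavior, assuming the application thread records a listener failure and rethrows it, wrapped, from the next `poll()`. To stay dependency-free it uses a hypothetical `KafkaLikeException` in place of `org.apache.kafka.common.KafkaException`, and it omits the event sent to the background thread; `CallbackErrorSketch` and its methods are illustrative names only:

```java
// Hypothetical sketch: the application thread runs the user's rebalance listener,
// remembers any error, and rethrows it from the next poll() instead of swallowing it.
class CallbackErrorSketch {
    /** Stand-in for org.apache.kafka.common.KafkaException, so the sketch has no dependencies. */
    static class KafkaLikeException extends RuntimeException {
        KafkaLikeException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private RuntimeException pendingCallbackError;

    /** Runs the listener and records a failure instead of letting it vanish. */
    void invokeListener(Runnable listener) {
        try {
            listener.run();
        } catch (RuntimeException e) {
            pendingCallbackError = e;
        }
    }

    /** Surfaces a recorded listener failure to the caller, breaking the poll loop. */
    void poll() {
        RuntimeException error = pendingCallbackError;
        if (error != null) {
            pendingCallbackError = null; // report each failure once
            throw new KafkaLikeException("User rebalance callback threw an error", error);
        }
    }
}
```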

Reviewers: Lianet Magrans <lianetmr@gmail.com>, Kirk True <ktrue@confluent.io>, Bruno Cadonna <cadonna@apache.org>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Change the documentation of the Brokers field to make it clear that it doesn't always have all the
brokers that are listed as replicas.

Reviewer: Colin P. McCabe <cmccabe@apache.org>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…eRecord (#15802)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…nfig (#15796)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…(#15761)

* Make ClusterConfig immutable
* Make BrokerNode immutable
* Refactor out build argument in ControllerNode
* Add setPrefix and replace put property with set map in ClusterConfig
* Remove rollingBrokerRestart from ClusterInstance interface
* Refactor KRaftClusterTest#doOnStartedKafkaCluster

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
… by sorting the topic partition list (#15816)

We are seeing a flaky test in `testConsumerGroupHeartbeatWithStableClassicGroup`, where the error is caused by different ordering in the expected and actual values. The patch sorts the topic partition list in the records to fix the issue.
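A minimal sketch of why sorting removes the flakiness: once both the expected and actual lists are sorted by topic and then partition, their order no longer depends on iteration order. `PartitionSorting` and its `TP` pair are hypothetical stand-ins for Kafka's topic-partition types:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PartitionSorting {
    /** Hypothetical stand-in for a (topic, partition) pair. */
    static final class TP {
        final String topic;
        final int partition;

        TP(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }

        @Override
        public String toString() {
            return topic + "-" + partition;
        }
    }

    /** Returns a copy sorted by topic name, then partition number. */
    static List<TP> sorted(List<TP> in) {
        List<TP> out = new ArrayList<>(in);
        out.sort(Comparator.comparing((TP tp) -> tp.topic).thenComparingInt(tp -> tp.partition));
        return out;
    }
}
```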

Reviewers: Jeff Kim <jeff.kim@confluent.io>, Igor Soarez <soarez@apple.com>, David Jacot <djacot@confluent.io>
…nts (#15804)

Small cleanup: removed the version when excluding shaded dependencies from the clients library, as it's not needed.

Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
…rors (#15732)

When running the ZK-to-KRaft migration, we encountered an issue where the migration hangs and the ZkMigrationState cannot move to the MIGRATION state. This is because pollEvent did not retry on the retriable MigrationClientException (ZK client retriable errors) when it should have. As a result, the poll event stops polling, which leaves the KRaftMigrationDriver hanging. This PR fixes the issue and adds a test.
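A simplified sketch of the idea behind the fix, assuming a poll event that reschedules itself: a retriable client error is caught so that the next poll is still scheduled, instead of escaping and ending the polling cycle. `PollLoopSketch` and `RetriableClientException` are hypothetical; the latter only stands in for `MigrationClientException`:

```java
// Hypothetical sketch: a retriable client error inside the poll event
// must not stop the driver from scheduling the next poll.
class PollLoopSketch {
    /** Stand-in for a retriable ZK client error such as MigrationClientException. */
    static class RetriableClientException extends RuntimeException {
        RetriableClientException(String message) {
            super(message);
        }
    }

    private int scheduledPolls = 0;

    void runPollEvent(Runnable pollWork) {
        try {
            pollWork.run();
        } catch (RetriableClientException e) {
            // Retriable: log and fall through so that polling continues.
            System.out.println("Retriable error during poll, will retry: " + e.getMessage());
        }
        scheduleNextPoll();
    }

    private void scheduleNextPoll() {
        scheduledPolls++; // a real driver would enqueue a delayed poll event here
    }

    int scheduledPolls() {
        return scheduledPolls;
    }
}
```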

Reviewers: Luke Chen <showuon@gmail.com>, Igor Soarez <soarez@apple.com>, Akhilesh C <akhileshchg@users.noreply.github.com>
This fixes a consumer system test that was failing for the new protocol. The failure occurred because the test expected the eager behaviour of partitions being revoked on every rebalance and wrongly applied it to the runs with the new protocol too.
The same situation was previously identified and fixed in other parts of the system test in #15661.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…ongIncarnationId (#15828)

ControllerRegistrationManagerTest is flaky due to the poll in L221. The likely root cause is a race condition between the first poll (L221) and the second poll (L229). Before the second poll, we mock a response (L226), which should be processed by the second poll. However, if the first poll takes it instead, the second poll gets nothing, which can lead to an error.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Omnia Ibrahim <o.g.h.ibrahim@gmail.com>, Kuan-Po (Cooper) Tseng <brandboat@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
…h (#15824)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Add instructions for running the website locally to the docs folder.

Signed-off-by: PoAn Yang <payang@apache.org>

Reviewers: Luke Chen <showuon@gmail.com>
… if inflight (#15723)

In some cases, the network layer is very fast and can process a response and send out a follow-up request within the same millisecond timestamp. This is causing problems due to the way we determine if we already have an inflight request.

The previous logic for tracking inflight status used timestamps: if the timestamp from the last received response was less than the timestamp from the last sent request, we'd interpret that as having an inflight request. However, this approach would incorrectly return false from RequestState.requestInFlight() if the two timestamps were equal.

One result of this faulty logic is that in such cases, the consumer would accidentally send multiple heartbeat requests to the consumer group coordinator. The consumer group coordinator would interpret these requests as 'join group' requests and create members for each request. Therefore, the coordinator was under the false understanding that there were more members in the group than there really were. Consequently, if your luck was really bad, the coordinator might assign partitions to one of the duplicate members. Those partitions would be assigned to a phantom consumer that was not reading any data, and this led to flaky tests.

This change introduces a stupid simple flag to RequestState that is set in onSendAttempt and cleared in onSuccessfulAttempt, onFailedAttempt, and reset. A new unit test has been added and this has been tested against all of the consumer unit and integration tests, and has removed all known occurrences of phantom consumer group members in the system tests.
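A minimal sketch contrasting the two approaches described above. The hook names `onSendAttempt`, `onSuccessfulAttempt`, `onFailedAttempt`, `reset` and `requestInFlight` come from the description; the class name `RequestStateSketch`, the millisecond parameters and `requestInFlightByTimestamps` are assumptions made for illustration, not the real `RequestState` code:

```java
// Hypothetical sketch of the in-flight tracking change.
class RequestStateSketch {
    private long lastSentMs = -1;
    private long lastReceivedMs = -1;
    private boolean requestInFlight = false;

    /** Old approach: wrong when a response and the next send share a millisecond. */
    boolean requestInFlightByTimestamps() {
        return lastReceivedMs < lastSentMs;
    }

    /** New approach: an explicit flag set on send and cleared on completion. */
    boolean requestInFlight() {
        return requestInFlight;
    }

    void onSendAttempt(long nowMs) {
        lastSentMs = nowMs;
        requestInFlight = true;
    }

    void onSuccessfulAttempt(long nowMs) {
        lastReceivedMs = nowMs;
        requestInFlight = false;
    }

    void onFailedAttempt(long nowMs) {
        lastReceivedMs = nowMs;
        requestInFlight = false;
    }

    void reset() {
        requestInFlight = false;
    }
}
```

If a response arrives at t=10 and a follow-up request is sent in the same millisecond, `lastReceivedMs < lastSentMs` is `10 < 10`, i.e. false, so the timestamp check reports nothing in flight and a duplicate heartbeat could be sent; the flag still reports the request as in flight.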

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Lianet Magrans <lianetmr@gmail.com>, Philip Nee <pnee@confluent.io>
The patch adds a boolean attribute `replayRecords` that specifies whether the records should be replayed.

Reviewers: David Jacot <djacot@confluent.io>
This PR does the following cleanup for TestUtils.scala:

1) remove unused methods
2) move methods used by only a single test class out of TestUtils.scala

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Add more test cases to TopicImageNodeTest.java.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
… zero (#15773)

Instead of remaining pending forever, this PR schedules the next invocation after 1 ms. The side effect is busy-waiting, so this PR also updates the docs to warn users about setting a small log.segment.delete.delay.ms.

Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
mimaison and others added 22 commits May 30, 2024 11:51
Reviewers: Luke Chen <showuon@gmail.com>, Apoorv Mittal <amittal@confluent.io>
…5821)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
… (#15779)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…16101)

Reviewers: Chris Egerton <chrise@aiven.io>
…uration (#16092)" (#16141)

This reverts commit 3f70c46.

Reviewer: Lucas Brutschy <lbrutschy@confluent.io>
Added the implementation of the quota manager that will be used to throttle copy and fetch requests from remote storage. See KIP-956.

Reviewers: Luke Chen <showuon@gmail.com>, Kamal Chandraprakash <kchandraprakash@uber.com>, Jun Rao <junrao@gmail.com>
This PR does the following:

* System tests can bring up the Kafka broker in native mode
* System tests can run on the Kafka broker in native mode
* Extract the native build command so that it can be reused
* Allow system tests to run on the native Kafka broker using the Docker mechanism

To run system tests by bringing up Kafka in native mode, pass kafka_mode as native in the ducktape globals: --globals '{\"kafka_mode\":\"native\"}'

To run system tests by bringing up Kafka in native mode via the Docker mechanism:
_DUCKTAPE_OPTIONS="--globals '{\"kafka_mode\":\"native\"}'" TC_PATHS="tests/kafkatest/tests/"  bash tests/docker/run_tests.sh

To bring up only the ducker nodes for native Kafka:
bash tests/docker/ducker-ak up -m native

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
This patch exposes the group coordinator config `CONSUMER_GROUP_MIGRATION_POLICY_CONFIG`.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <djacot@confluent.io>
…ion (#16135)

Reviewers: Greg Harris <greg.harris@aiven.io>
…TransactionsTest (#16139)

While working on apache/kafka#16120, I noticed that the transaction verification feature is disabled in `TransactionsTest` when the new group coordinator is enabled. We did this initially because the feature was not available in the new group coordinator but we fixed it a long time ago. We can enable it now.

Reviewers: Justine Olshan <jolshan@confluent.io>
…d ports (#16112)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…062)

Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
…e metadata config (#16130)

As per KIP-1022, we will rename the unstable metadata versions enabled config to support all feature versions.

Features is also updated to return latest production and latest testing versions of each feature.

A feature is production ready when the corresponding metadata version (bootstrapMetadataVersion) is production ready.

Adds tests for the feature usage of the unstableFeatureVersionsEnabled config

Reviewers: David Jacot <djacot@confluent.io>, Jun Rao <junrao@gmail.com>
… and finish rack-aware standby optimization (#16129)

This fills in the implementation details of the standby task assignment utility functions within TaskAssignmentUtils.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
…#16146)

Reviewers: Justine Olshan <jolshan@confluent.io>, Satish Duggana <satishd@apache.org>
…lementations (#16090)" (#16142)

This reverts commit 8d11d95.

We decided not to release KIP-1033 with AK 3.8.

Reviewer: Lucas Brutschy <lbrutschy@confluent.io>
Add more unit tests to LogSegments and do some small refactoring in LogSegments.java.

Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
…t with legacy (#16125)

* Timeout exception fetching offsets

* Tests
…valid host and ports. (#16048)

Modify regex of HOST_PORT_PATTERN to prevent malformed hosts and ports.
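The modified regex itself is not shown here. As an illustration only, the sketch below uses a hypothetical anchored host:port pattern (not necessarily Kafka's actual `HOST_PORT_PATTERN`) to show how matching the whole string rejects malformed input such as embedded spaces or trailing characters after the port, while still accepting bracketed IPv6 hosts. `HostPortParser` is a hypothetical name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostPortParser {
    // Hypothetical anchored pattern: optional [ ] around the host (for IPv6),
    // then ':' and a numeric port, with nothing else allowed before or after.
    static final Pattern HOST_PORT = Pattern.compile("^\\[?([0-9a-zA-Z\\-%._:]+)\\]?:([0-9]+)$");

    /** Returns the host part, or null if the address is malformed. */
    static String host(String address) {
        Matcher m = HOST_PORT.matcher(address);
        return m.matches() ? m.group(1) : null;
    }

    /** Returns the port, or null if the address is malformed. */
    static Integer port(String address) {
        Matcher m = HOST_PORT.matcher(address);
        return m.matches() ? Integer.valueOf(m.group(2)) : null;
    }
}
```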

Reviewers: Luke Chen <showuon@gmail.com>
Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
@CLAassistant

CLAassistant commented Jun 14, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 32 committers have signed the CLA.

✅ superhx
❌ sjhajharia
❌ OmniaGM
❌ fred-ro
❌ FrankYang0529
❌ showuon
❌ apourchet
❌ dongnuo123
❌ ableegoldman
❌ gongxuanzhang
❌ AndrewJSchofield
❌ loicgreffier
❌ raminqaf
❌ CalvinConfluent
❌ mimaison
❌ emitskevich-blp
❌ jolshan
❌ dengziming
❌ dajac
❌ m1a2st
❌ fanyang
❌ cadonna
❌ muralibasani
❌ nicktelford
❌ chia7712
❌ ahuang98
❌ lianetm
❌ ahmedryasser
❌ abhijeetk88
❌ kagarwal06
❌ brandboat
❌ frankvicky

Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
@superhx superhx merged commit c039635 into main Jun 17, 2024
@superhx superhx deleted the merge_3.8 branch June 17, 2024 03:33