Kafka-16540: Update partitions if min isr config is changed. #15702
base: trunk
@mumrah Can you help take a look?
metadata/src/main/java/org/apache/kafka/controller/ConfigurationControlManager.java
void maybeTriggerMinIsrConfigUpdate(Optional<String> topicName) throws InterruptedException, ExecutionException {
    appendWriteEvent("partitionUpdateForMinIsrChange", OptionalLong.empty(),
        () -> replicationControl.getPartitionElrUpdatesForConfigChanges(topicName)).get();
}
Calling .get() on an appendWriteEvent doesn't look right to me. If I understand correctly, appendWriteEvents are handled in the quorum controller event loop thread. We would expect replay() to also be called in the event loop thread, so if we trigger an appendWriteEvent and block waiting for the result, it would always time out, since we are blocking the processing thread.
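The blocking concern above can be reproduced outside Kafka. The following is a minimal sketch, assuming a plain single-threaded `ExecutorService` as a stand-in for the controller event loop (the class and method names are hypothetical, and this is not the QuorumController's actual event queue). A task running on the only worker thread queues follow-up work on the same executor and waits for it, so the wait can only end by timing out.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class EventLoopBlockingDemo {
    /**
     * Returns true if a task running on a single-threaded executor times out
     * while waiting for a second task queued on that same executor.
     */
    public static boolean nestedGetTimesOut() throws Exception {
        // Assumption: one worker thread models the single event loop thread.
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> outer = eventLoop.submit(() -> {
                // Queue follow-up work on the thread we are currently occupying.
                Future<String> inner = eventLoop.submit(() -> "done");
                try {
                    // The inner task cannot start until this task returns.
                    inner.get(200, TimeUnit.MILLISECONDS);
                    return false;
                } catch (TimeoutException e) {
                    return true;
                }
            });
            return outer.get(5, TimeUnit.SECONDS);
        } finally {
            eventLoop.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("nested get() timed out: " + nestedGetTimesOut());
    }
}
```

Without the timeout on the inner `get()`, the outer task would wait forever, which is the failure mode the comment warns about.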
Got it. We basically only need to call appendWriteEvent and should not wait for the replay().
Hmm, I think a better way to think about it is that we want to append the min ISR config update atomically with the partition change records. Appending the partition change records once the config change is replayed is difficult to reason about and possibly incorrect. Thinking a bit more about it, triggering a write event from the replay() for the config change record means that every time we reload the metadata log, we would replay the config change record and generate new partition change records.
Perhaps one example to look at is ReplicationControlManager.handleBrokerFenced. When a broker is fenced, we generate a broker registration change record along with the leaderAndIsr partition change records. I assume we want to follow a similar model with the topic configuration change events.
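One way to picture the model described above is a handler that returns the config change and its dependent partition changes as one batch, with replay only applying state. This is a rough sketch under stated assumptions: `ConfigChange`, `PartitionChange`, `alterMinIsr`, and `replay` are hypothetical stand-ins, not Kafka's record classes or controller methods, and every partition of the topic gets a placeholder change record here, whereas the real decision would depend on ISR/ELR state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AtomicConfigChangeSketch {
    // Hypothetical stand-ins for metadata records; these are not Kafka's classes.
    public static class ConfigChange {
        final String topic;
        final String value;
        ConfigChange(String topic, String value) { this.topic = topic; this.value = value; }
    }

    public static class PartitionChange {
        final String topic;
        final int partition;
        PartitionChange(String topic, int partition) { this.topic = topic; this.partition = partition; }
    }

    /** Builds one batch: the config change first, then the partition changes it implies. */
    public static List<Object> alterMinIsr(String topic, String newValue,
                                           Map<String, List<Integer>> partitionsByTopic) {
        List<Object> batch = new ArrayList<>();
        batch.add(new ConfigChange(topic, newValue));
        for (int partition : partitionsByTopic.getOrDefault(topic, Collections.<Integer>emptyList())) {
            batch.add(new PartitionChange(topic, partition));
        }
        return batch;
    }

    /** Replay only applies a record to in-memory state; it never creates new records. */
    public static void replay(Object record, Map<String, String> minIsrByTopic) {
        if (record instanceof ConfigChange) {
            ConfigChange change = (ConfigChange) record;
            minIsrByTopic.put(change.topic, change.value);
        }
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> partitions = new HashMap<>();
        partitions.put("orders", Arrays.asList(0, 1));
        List<Object> batch = alterMinIsr("orders", "2", partitions);

        Map<String, String> state = new HashMap<>();
        // Replaying the batch twice (as on a log reload) must not grow the log.
        for (int pass = 0; pass < 2; pass++) {
            for (Object record : batch) {
                replay(record, state);
            }
        }
        System.out.println(batch.size() + " records, state=" + state);
    }
}
```

Because `replay` has no side effects beyond updating state, reloading the log any number of times leaves the set of records unchanged.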
Makes sense, I had a misunderstanding about the controller events. Will update. Thanks!
metadata/src/main/java/org/apache/kafka/controller/ConfigurationControlManager.java
if (configRecord.name().equals(TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG)) {
    minIsrRecords.add(configRecord);
    if (Type.forId(configRecord.resourceType()) == Type.TOPIC) {
        if (configRecord.value() == null) topicMap.put(configRecord.resourceName(), configRecord.value());
        else configRemovedTopicMap.put(configRecord.resourceName(), configRecord.value());
    }
}
}
});
What is the behavior if the default broker config for min.insync.replicas is changed? I am not actually sure how that impacts the min.insync.replicas for existing topics.
If min.insync.replicas is not set at the topic config level, the effective min.insync.replicas of a topic will change if the default broker config is updated.
for (ConfigRecord record : minIsrRecords) {
    replayInternal(record, configDataCopy, localSnapshotRegistry);
}
why are we calling replay here?
This is the challenging part of implementing this PR. Finding the effective min ISR value requires checking topic config -> dynamic broker config -> default broker config -> ...
Let's say the user updates the default broker config:
- All the topics could be affected.
- The effective min ISR values should be recalculated.
- We need to generate the partition change records along with the config change records, which means the ReplicationControlManager can't use the regular methods to look up the effective min ISR value. The value has to be determined from the pending config records together with the current configs.
I found it easier to make a copy of the configs and apply the min ISR updates to the copy, then let the ReplicationControlManager check all the partitions against the config copy.
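The copy-and-overlay idea can be sketched as follows. This is an illustration only, assuming a simplified two-layer chain (topic config, then a cluster-wide default, then a static fallback) rather than the full chain named above; `resolve` and `affectedTopics` are hypothetical helpers, and plain `HashMap`s stand in for the controller's timeline data structures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EffectiveMinIsrSketch {
    static final String MIN_ISR = "min.insync.replicas";

    /** The first layer that defines the key wins; the fallback is used if none does. */
    public static int resolve(List<Map<String, String>> layers, int fallback) {
        for (Map<String, String> layer : layers) {
            String value = layer.get(MIN_ISR);
            if (value != null) {
                return Integer.parseInt(value);
            }
        }
        return fallback;
    }

    /** Topics whose effective value changes if the default becomes newDefault (null means removed). */
    public static List<String> affectedTopics(Map<String, Map<String, String>> topicConfigs,
                                              Map<String, String> clusterDefault,
                                              String newDefault,
                                              int staticFallback) {
        // Apply the pending change to a copy so the committed configs stay untouched.
        Map<String, String> copy = new HashMap<>(clusterDefault);
        if (newDefault == null) {
            copy.remove(MIN_ISR);
        } else {
            copy.put(MIN_ISR, newDefault);
        }

        List<String> affected = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : topicConfigs.entrySet()) {
            int before = resolve(Arrays.asList(entry.getValue(), clusterDefault), staticFallback);
            int after = resolve(Arrays.asList(entry.getValue(), copy), staticFallback);
            if (before != after) {
                affected.add(entry.getKey());
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> topics = new LinkedHashMap<>();
        Map<String, String> pinned = new HashMap<>();
        pinned.put(MIN_ISR, "2");
        topics.put("pinned", pinned);             // has its own topic-level value
        topics.put("inherits", new HashMap<>());  // falls through to the default

        // Only the topic without a topic-level override is affected.
        System.out.println(affectedTopics(topics, new HashMap<>(), "3", 1));
    }
}
```

Only the affected topics would then need their partitions checked against the copied configs when generating partition change records.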
@@ -66,6 +69,7 @@ public class ConfigurationControlManager {
    private final TimelineHashMap<ConfigResource, TimelineHashMap<String, String>> configData;
    private final Map<String, Object> staticConfig;
    private final ConfigResource currentController;
    private final MinIsrConfigUpdatePartitionHandler minIsrConfigUpdatePartitionHandler;
Maybe more of a question for someone with more code ownership of the quorum controller code, but I wonder if it would be preferable to handle generating the replication control manager records in QuorumController.incrementalAlterConfigs. That would also make it a bit easier to handle validateOnly, which we are not currently handling.
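To illustrate why a single top-level entry point might simplify validateOnly, here is a small sketch. It assumes a hypothetical `Result` type and `alterConfigs` method with records modeled as strings; it does not reproduce the signature or behavior of QuorumController.incrementalAlterConfigs. The point is only that when config and partition records are assembled in one place, a dry run can drop the whole batch in one step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ValidateOnlySketch {
    /** Hypothetical result type: the records to commit plus a response for the caller. */
    public static final class Result {
        public final List<String> records;
        public final String response;

        Result(List<String> records, String response) {
            this.records = records;
            this.response = response;
        }
    }

    /** Combines both record sets; a validate-only request keeps the response but commits nothing. */
    public static Result alterConfigs(List<String> configRecords,
                                      List<String> partitionRecords,
                                      boolean validateOnly) {
        List<String> all = new ArrayList<>(configRecords);
        all.addAll(partitionRecords);
        List<String> toCommit = validateOnly ? Collections.<String>emptyList() : all;
        return new Result(toCommit, "OK");
    }

    public static void main(String[] args) {
        List<String> config = Arrays.asList("config:min.insync.replicas=2");
        List<String> partitions = Arrays.asList("partition:orders-0", "partition:orders-1");
        System.out.println(alterConfigs(config, partitions, false).records.size());
        System.out.println(alterConfigs(config, partitions, true).records.size());
    }
}
```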
https://issues.apache.org/jira/browse/KAFKA-16540
If the min isr config is changed, we need to update the partitions with ELR if possible.