Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][broker] Replicator failed to connect when only enable replication in topic level #21564

Closed

Conversation

liangyepianzhou
Copy link
Contributor

@liangyepianzhou liangyepianzhou commented Nov 12, 2023

Motivation

According to the introduction on our official website, geo-replication can be enabled at the namespace level or topic level.
But in the current implementation, creating producer and consumer will check if the cluster is configured in the namespace policy instead of the topic policy. So if replication is only enabled at the topic level, the replicator will connect fail when creating a producer.

protected CompletableFuture<PartitionedTopicMetadata> getPartitionedTopicMetadataAsync(
TopicName topicName, boolean authoritative, boolean checkAllowAutoCreation) {
// validates global-namespace contains local/peer cluster: if peer/local cluster present then lookup can
// serve/redirect request else fail partitioned-metadata-request so, client fails while creating
// producer/consumer
return validateClusterOwnershipAsync(topicName.getCluster())
.thenCompose(__ -> validateGlobalNamespaceOwnershipAsync(topicName.getNamespaceObject()))
.thenCompose(__ -> validateTopicOperationAsync(topicName, TopicOperation.LOOKUP))
.thenCompose(__ -> {
if (checkAllowAutoCreation) {
return pulsar().getBrokerService()
.fetchPartitionedTopicMetadataCheckAllowAutoCreationAsync(topicName);
} else {
return pulsar().getBrokerService().fetchPartitionedTopicMetadataAsync(topicName);
}
});
}

2023-11-12T23:00:21,915 - WARN  - [pulsar-io-105-5:PulsarWebResource@915] - Namespace missing local cluster name in clusters list: local_cluster=r2 ns=pulsar/testEnableReplicationInTopicLevel-479d54e6-5b63-4218-8af9-3bca99ffba44 clusters=[r1]
2023-11-12T23:00:21,916 - WARN  - [pulsar-io-105-5:ServerCnx@593] - Failed to get Partitioned Metadata [/127.0.0.1:52961] persistent://pulsar/testEnableReplicationInTopicLevel-479d54e6-5b63-4218-8af9-3bca99ffba44/__change_events: Namespace missing local cluster name in clusters list: local_cluster=r2 ns=pulsar/testEnableReplicationInTopicLevel-479d54e6-5b63-4218-8af9-3bca99ffba44 clusters=[r1]
org.apache.pulsar.broker.web.RestException: Namespace missing local cluster name in clusters list: local_cluster=r2 ns=pulsar/testEnableReplicationInTopicLevel-479d54e6-5b63-4218-8af9-3bca99ffba44 clusters=[r1]
	at org.apache.pulsar.broker.web.PulsarWebResource.lambda$checkLocalOrGetPeerReplicationCluster$27(PulsarWebResource.java:916) ~[classes/:?]
	at java.util.concurrent.CompletableFuture.uniAcceptNow(CompletableFuture.java:757) ~[?:?]
	at java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:735) ~[?:?]
	at java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2182) ~[?:?]
	at org.apache.pulsar.broker.web.PulsarWebResource.lambda$checkLocalOrGetPeerReplicationCluster$29(PulsarWebResource.java:906) ~[classes/:?]
	at java.util.concurrent.CompletableFuture.uniAcceptNow(CompletableFuture.java:757) ~[?:?]
	at java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:735) ~[?:?]
	at java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2182) ~[?:?]
	at org.apache.pulsar.broker.web.PulsarWebResource.checkLocalOrGetPeerReplicationCluster(PulsarWebResource.java:890) ~[classes/:?]
	at org.apache.pulsar.broker.web.PulsarWebResource.checkLocalOrGetPeerReplicationClusterAtTopicLevel(PulsarWebResource.java:950) ~[classes/:?]
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.unsafeGetPartitionedTopicMetadataAsync(PersistentTopicsBase.java:4382) ~[classes/:?]
	at org.apache.pulsar.broker.service.ServerCnx.lambda$handlePartitionMetadataRequest$7(ServerCnx.java:581) ~[classes/:?]
	at java.util.concurrent.CompletableFuture.uniApplyNow(CompletableFuture.java:684) ~[?:?]
	at java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:662) ~[?:?]
	at java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:2168) ~[?:?]
	at org.apache.pulsar.broker.service.ServerCnx.handlePartitionMetadataRequest(ServerCnx.java:578) ~[classes/:?]
	at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:134)

Why Does This Situation Occur and the Associated Modifications

  1. PIP-8 Introduce peer cluster for global namespace redirection
    Validation on PartitionedMetadata-Lookup: Fail request if global namespace's replication-clusters doesn't contain current/peer-clusters (Earlier this validation was only present at lookup only). So, client can't create producer/consumer object and doesn't do internal retry for lookup.
  2. Support for setting geo-replication clusters on topic level #12136 Support for setting geo-replication clusters on topic level
    This PR supports the configuration of geo-replication at the topic level.
    However, this PR's test only included tests for policy change, and there was no test to see if replication could work normally if only enabled on the topic level.

Modifications

Draft.

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@liangyepianzhou liangyepianzhou marked this pull request as draft November 12, 2023 15:31
@liangyepianzhou liangyepianzhou self-assigned this Nov 12, 2023
@liangyepianzhou liangyepianzhou changed the title [fix][broker] Replicator can not create producer when only enable replication in topic level [fix][broker] Replicator failed to connect when only enable replication in topic level Nov 12, 2023
@apache apache deleted a comment from github-actions bot Nov 13, 2023
@github-actions github-actions bot added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Nov 13, 2023
@Technoboy- Technoboy- added this to the 3.3.0 milestone Dec 22, 2023
liangyepianzhou added a commit that referenced this pull request Jan 23, 2024
…evel (#21648)

Co-authored-by: Jiwe Guo <technoboy@apache.org>
### Motivation
Because the configurations of clusters at the namespace level are unclear, some flexible topic policies can not work as expected, e.g. geo-replication at the topic level. see more information about the bug at #21564.
### Modifications
Use `replication-clusters`, `allowed-clusters`, and `topic-policy-synchronized-clusters` to replace a single `replication-clusters` originally in the namespace policy.
See more explanation in the proposal.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-not-needed Your PR changes do not impact docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants