Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Thread safety issues with topic policy mutations #21303

Open
1 of 2 tasks
lhotari opened this issue Oct 6, 2023 · 6 comments
Open
1 of 2 tasks

[Bug] Thread safety issues with topic policy mutations #21303

lhotari opened this issue Oct 6, 2023 · 6 comments
Assignees
Labels
category/reliability The function does not work properly in certain specific environments or failures. e.g. data lost Stale type/bug The PR fixed a bug or issue reported a bug

Comments

@lhotari
Copy link
Member

lhotari commented Oct 6, 2023

Search before asking

  • I searched in the issues and found nothing similar.

Version

Since 2.11

Minimal reproduce step

Topic policies are stored in system topics and stored as PulsarEvent instances which are serialized using Avro. The PulsarEvent can contain a TopicPoliciesEvent when it's eventType is TOPIC_POLICY. The TopicPoliciesEvent contains the TopicPolicies instance.

The problem is that TopicPolicies instances are cached and these cached instances are mutable in a non-thread safe way.

What did you expect to see?

TopicPolicies handling should be thread safe.

What did you see instead?

Thread safety concerns aren't covered in TopicPolicies.

Anything else?

There was a similar issue in mutating namespace policies, #9711. That was fixed with an approach where the shared instances are never mutated. The shared/cached instance is cloned and the clone is mutated.

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@lhotari lhotari added type/bug The PR fixed a bug or issue reported a bug category/reliability The function does not work properly in certain specific environments or failures. e.g. data lost labels Oct 6, 2023
@lhotari
Copy link
Member Author

lhotari commented Oct 7, 2023

@mattisonchao Since you worked on #21231 recently, would you be interested in checking this one out? I created this issue purely from the perspective of code analysis. The topic policy mutations aren't thread safe and that could lead to other consistency issues. Similar thread safety problems were fixed in the namespace policies (issue #9711). For example #9900 by @315157973 is one of the PRs where this problem was fixed. The fix was essentially about not modifying the shared (via cache) instance at all.

@mattisonchao
Copy link
Member

@lhotari Yep, I am working on it.

@mattisonchao mattisonchao self-assigned this Oct 7, 2023
Copy link

github-actions bot commented Nov 7, 2023

The issue had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label Nov 7, 2023
@lhotari
Copy link
Member Author

lhotari commented Jan 5, 2024

cc @poorbarcode

@lhotari
Copy link
Member Author

lhotari commented Jan 5, 2024

You can drill down to the Namespace Policy update from org.apache.pulsar.broker.admin.impl.NamespacesBase#updatePoliciesAsync . That uses org.apache.pulsar.metadata.api.MetadataCache#readModifyUpdate under the covers. For TopicPolicies, it would be different since Metadata Store isn't used.

@lhotari
Copy link
Member Author

lhotari commented Jan 29, 2024

There will always be possibilities for race conditions if topic mutations happen on any broker. The thread safety issues mentioned in the issue description might be just one part of the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category/reliability The function does not work properly in certain specific environments or failures. e.g. data lost Stale type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

No branches or pull requests

2 participants