Fix: modifying Map in policies is not thread-safe #9850
315157973 wants to merge 3 commits into apache:master
Conversation
@lhotari Please take a look
@315157973 Good work. I think it would be great if the issue gets fixed with such a simple change. I think that the changes would also have to cover the org.apache.pulsar.common.policies.data.AuthPolicies class in order to fix #9711. The mutations & thread-safe access of
copy-on-write alone isn't sufficient in Java since it could lead to "unsafe publication", where some values in the published reference aren't visible to the other thread (safe publication explained). Sharing a HashSet with another thread requires a Collections.unmodifiableSet wrapper (unless there is another way to ensure "safe publication"); the wrapper ensures safe publication. This is explained in this SO answer: https://stackoverflow.com/a/5379941 Similarly, sharing a HashMap reference requires a Collections.unmodifiableMap wrapper unless there is some other means to ensure "safe publication".
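The unmodifiable-wrapper idiom described above can be sketched as follows. This is a hypothetical example, not Pulsar code; `RolePolicy` and its method names are invented for illustration:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical example (not Pulsar code): publishing a Set safely.
class RolePolicy {
    // The unmodifiableSet wrapper stores its delegate in a final field, so
    // the JMM's final-field guarantees make the set's contents visible to
    // any thread that obtains the wrapper, even without locking.
    private final Set<String> roles;

    RolePolicy(Set<String> input) {
        // Defensive copy first, then wrap: nobody can mutate the copy
        // after construction, so the snapshot stays consistent.
        this.roles = Collections.unmodifiableSet(new HashSet<>(input));
    }

    Set<String> getRoles() {
        return roles;
    }
}
```

The key point is that safe publication here comes from the wrapper's final field, not from any locking on the reader's side.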
You're right. Sorry, I missed some points. This Set only has data added when it is created, and it is only read afterwards.
As long as the field is accessed from another thread, it's necessary to handle it.

What about
If this happens in different threads, it must be handled.
Different threads will not add data to the same
"safe publication" is also needed for objects that aren't modified. I guess there's an existing way to achieve "safe publication" in that case?
I understand that you are talking about the visibility of objects in Policies to other threads, similar to double-checked locking. If we want to solve the visibility problem, we could make all internal objects volatile, but that doesn't make much sense. All the attributes in Policies are now objects, and many of them are assigned directly. Does this kind of short-lived invisibility have a significant impact on us? If it does, we can open another issue to solve the visibility of all the objects in Policies.
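For reference, one low-cost way to get visibility without locking reads is to publish a replacement map through a volatile field. This is a generic sketch, not Pulsar code; `PolicyHolder` and its members are hypothetical:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: visibility via a volatile reference.
class PolicyHolder {
    // Readers always see a fully built map: the write to the volatile
    // field happens-before any subsequent read of that field.
    private volatile Map<String, String> properties = Collections.emptyMap();

    // Writers never mutate the published map in place; they build a fresh
    // copy and swap the reference (callers must serialize their writes).
    void replaceProperties(Map<String, String> fresh) {
        properties = Collections.unmodifiableMap(new HashMap<>(fresh));
    }

    Map<String, String> getProperties() {
        return properties;
    }
}
```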
Yes, I will add it soon.
Yes, it's really necessary to achieve "safe publication" when new objects are shared to other threads. Otherwise, thread safety issues won't be fixed. Using "volatile" isn't necessary for achieving "safe publication". Since the description of this PR says "Fixes #9711", I think that should be removed from the description or the remaining thread safety issues should be fixed as part of this PR.
Great work @315157973, really elegant.
It's great that the thread safety of the data structures is now handled, so we won't run into the infinite loops which would happen with plain HashMaps. With the ConcurrentHashMaps, it will also be possible to handle data consistency issues. There are some remaining data consistency ("lost update") issues when mutating
There might be some other locations besides the ones above. Would it help to use
@315157973 are you thinking of addressing the mutations of the
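The "lost update" problem mentioned above arises whenever a mutation is done as a separate lookup plus write-back: two threads can both read the same old value and the second write silently discards the first. A sketch with simplified stand-in types (`AuthMap` is hypothetical, modeled loosely on the policies maps):

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example contrasting a racy read-modify-write with an
// atomic compute-based update on a ConcurrentHashMap.
class AuthMap {
    private final Map<String, Set<String>> roleActions = new ConcurrentHashMap<>();

    // Lost-update pattern: when the key is absent, two threads can both
    // create a fresh set, each add their action, and the second put()
    // overwrites the first thread's entry.
    void addActionRacy(String role, String action) {
        Set<String> actions = roleActions.getOrDefault(role, ConcurrentHashMap.newKeySet());
        actions.add(action);
        roleActions.put(role, actions);
    }

    // Atomic alternative: compute runs the remapping function atomically
    // for the key, so concurrent adds cannot overwrite each other.
    void addActionAtomic(String role, String action) {
        roleActions.compute(role, (key, existing) -> {
            Set<String> actions = existing != null ? existing : ConcurrentHashMap.newKeySet();
            actions.add(action);
            return actions;
        });
    }

    Set<String> actions(String role) {
        return roleActions.getOrDefault(role, Set.of());
    }
}
```

Both variants look correct in a single-threaded test; the difference only shows up under concurrent callers, which is exactly why these spots are easy to miss in review.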
The solution I thought of:
Yes, I think that it's a good idea to handle the ZK and distributed updates issues separately. Please create a new issue for the data consistency issues around ZK and distributed modifications of policies, since that is a broader scope than #9711. My previous question was about preventing the "lost updates" problems that happen in the local data structures. I consider that problem a part of #9711. Please take a look if you could change the mutation logic to prevent the "lost updates" issue in the locations that were listed in my previous comment. @315157973 Could you resolve the local "lost updates" issues as part of this current PR, since after that I think it's ok to close #9711 with this PR? WDYT?
I thought of these solutions:
1. Tag Policies with the ZK version, lock them when reading the cache, and clone a copy of the Policies. When updating ZK, if the version is incorrect, re-read the latest state from ZK, redo the modification, and then update.
2. Per namespace, put all namespace operations into the orderedExecutor, so each namespace is processed by a fixed thread, then call back.
I tend to use method 2, and this PR will be closed.
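Option 2 above (a fixed thread per namespace) can be sketched with plain JDK executors. Pulsar itself would use its existing OrderedExecutor for this, so the class and method names below are hypothetical, purely for illustration:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of the single-writer-per-namespace idea: route
// every operation for a namespace to a fixed single-threaded lane so its
// policy mutations are serialized without locks.
class NamespaceSerializer {
    private final ExecutorService[] lanes;

    NamespaceSerializer(int threads) {
        lanes = new ExecutorService[threads];
        for (int i = 0; i < threads; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    // The same namespace always hashes to the same single-threaded lane,
    // so its operations run in submission order with no data races.
    void submit(String namespace, Runnable op) {
        int lane = Math.floorMod(namespace.hashCode(), lanes.length);
        lanes[lane].execute(op);
    }
}
```

Because all mutations for one namespace run on one thread, read-modify-write sequences no longer need atomic map operations for correctness, which is why this approach can supersede the compute-based fixes.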
Sounds good. It would be useful to write a separate issue which explains the current challenges in updating policies in a distributed setup with multiple brokers. The only problem with this approach is that we cannot close #9711 before the other PRs are delivered. I can provide a fix for #9711 in the meantime (before the ZK & single-writer approach is delivered) using Map.compute. That could become obsolete later, but it doesn't cause any harm as an intermediate step since it would fix #9711.
If it is single-threaded processing, there is no contention problem, and my PR will fix your problem.
Yes, it will eventually fix it. It would be helpful if you could create a separate issue about the ZK issues that you are planning to address since that is beyond #9711. Are you fine with that?
```java
policies.auth_policies.destination_auth.get(topicUri).put(role, actions);
Set<AuthAction> authActionSet = Collections.newSetFromMap(new ConcurrentHashMap<>());
authActionSet.addAll(actions);
```
I'd recommend using the Map.compute method here instead of separate lookups and mutations since besides fixing "lost updates", the code is cleaner also when concurrency is not a concern.
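A sketch of what this suggestion could look like, with simplified stand-in types for the Pulsar classes in the diff above (`DestinationAuth`, `grant`, and the enum values are hypothetical names):

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for the nested destination_auth map.
class DestinationAuth {
    enum AuthAction { produce, consume }

    final Map<String, Map<String, Set<AuthAction>>> destinationAuth = new ConcurrentHashMap<>();

    void grant(String topicUri, String role, Set<AuthAction> actions) {
        destinationAuth
                // Atomically create the per-topic map when absent, instead
                // of a separate get() that can race with other writers.
                .computeIfAbsent(topicUri, uri -> new ConcurrentHashMap<>())
                // compute makes the replacement of the role's action set
                // atomic with respect to concurrent updates of that role.
                .compute(role, (r, existing) -> {
                    Set<AuthAction> copy = Collections.newSetFromMap(new ConcurrentHashMap<>());
                    copy.addAll(actions);
                    return copy;
                });
    }
}
```

Besides closing the race, this also removes the NullPointerException risk of `get(topicUri)` returning null when the topic entry does not exist yet.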
```java
pulsarResources.getNamespaceResources().set(policiesPath, (policies) -> {
    Set<AuthAction> authActionSet = Collections.newSetFromMap(new ConcurrentHashMap<>());
    authActionSet.addAll(actions);
    policies.auth_policies.namespace_auth.put(role, authActionSet);
```
Use Map.compute for the mutation (rationale in the previous comment).
```diff
 } else {
-    policies.auth_policies.subscription_auth_roles.put(subscriptionName, roles);
+    Set<String> roleSet = Collections.newSetFromMap(new ConcurrentHashMap<>());
+    roleSet.addAll(roles);
+    policies.auth_policies.subscription_auth_roles.put(subscriptionName, roleSet);
 }
```
handle as part of the Map.compute function
```java
public BundlesData bundles;
@SuppressWarnings("checkstyle:MemberName")
public Map<BacklogQuota.BacklogQuotaType, BacklogQuota> backlog_quota_map = Maps.newHashMap();
@JsonDeserialize(as = ConcurrentHashMap.class)
```
I'd rather not follow this approach because it leaves out other fields (that are also non-thread-safe) and other classes that will have the same issue.
Instead, we should try to find a way to ensure that the handling of modifications to these policies is synchronized.
@merlimat That's true that there is this uncomfortable feeling. :) However, it seems that this ConcurrentHashMap based solution makes a lot of sense in this context. One of the benefits is that the change doesn't require huge architectural changes and the risk that it impacts performance is very minor.
Instead, we should try to find a way to ensure that the handling of modifications to these policies is synchronized.
Does that help? Wouldn't it mean that reads are also synchronized? Wouldn't that impact performance? Perhaps I misunderstood what you meant with "synchronized".
Wouldn't it mean that reads are also synchronized?
Not necessarily, we could enforce to always clone the POJO before updating them, in copy-on-write fashion.
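The clone-before-update idea can be sketched like this. It is a generic illustration, not Pulsar code; `PoliciesHolder` and the nested `Policies` POJO are hypothetical simplifications:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical copy-on-write sketch: never mutate the published POJO.
// Clone it, modify the clone, then atomically swap the reference, so
// readers only ever see fully built immutable snapshots and need no locks.
class PoliciesHolder {
    static class Policies {
        final Map<String, String> properties;

        Policies(Map<String, String> properties) {
            // Immutable snapshot behind a final field: safe publication.
            this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
        }
    }

    private final AtomicReference<Policies> current =
            new AtomicReference<>(new Policies(Collections.emptyMap()));

    void update(String key, String value) {
        // updateAndGet retries on contention, so concurrent updates
        // cannot lose each other's changes.
        current.updateAndGet(old -> {
            Map<String, String> copy = new HashMap<>(old.properties);
            copy.put(key, value);
            return new Policies(copy);
        });
    }

    Policies read() {
        return current.get();
    }
}
```

The trade-off versus ConcurrentHashMap fields is that each update copies the whole POJO, which is usually fine for policies that change rarely but are read constantly.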
@merlimat Yes, makes sense. There were some ideas in that direction in #9711 (comment) .
I will create a new PR and use orderedExecutor to make it thread safe
Fixes #9711
Motivation
The Policies of a namespace contain Maps, and when we modify the Policies, the Map API is called. If we use a plain HashMap, this causes thread safety issues.
Modifications
HashMap changed to ConcurrentHashMap
Verifying this change
After Jackson serialization and deserialization, the type of the map is still ConcurrentHashMap.
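The verification described above can be reproduced with a small round-trip test. This assumes the jackson-databind dependency is on the classpath; `PoliciesPojo` is a hypothetical stand-in for the real Policies class:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.annotation.JsonDeserialize;

// Hypothetical POJO: the annotation tells Jackson to deserialize into a
// ConcurrentHashMap instead of its default map implementation.
class PoliciesPojo {
    @JsonDeserialize(as = ConcurrentHashMap.class)
    public Map<String, String> properties = new ConcurrentHashMap<>();
}

class RoundTripDemo {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        PoliciesPojo pojo = new PoliciesPojo();
        pojo.properties.put("k", "v");

        String json = mapper.writeValueAsString(pojo);
        PoliciesPojo back = mapper.readValue(json, PoliciesPojo.class);

        // The concrete map type survives the round trip.
        System.out.println(back.properties.getClass().getSimpleName());
    }
}
```

Without the annotation, Jackson would pick its own default map type on deserialization, losing the thread-safety property this PR relies on.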