[improve][pip] PIP-453: Improve the metadata store threading model #25173
base: master
The metadata store thread even spends much more CPU time than the |
It's not related to this PIP, but there's also a possibility to save CPU in deserialization. For consistency reasons, the MetadataStore cache entries expire after 10 minutes. A background refresh is in use, which means that if an entry is used before it expires, a new refresh happens in the background between 5 and 10 minutes after the last refresh. Another detail related to wasted CPU: when an entry gets modified, it gets refreshed twice: pulsar/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/cache/impl/MetadataCacheImpl.java Lines 274 to 287 in 91e0bf2
pulsar/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/cache/impl/MetadataCacheImpl.java Lines 321 to 327 in 91e0bf2
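The expiry and refresh timing described above can be sketched roughly as follows. This is only an illustration of the 5-minute and 10-minute thresholds mentioned in the comment, not the code in MetadataCacheImpl; the class name `CacheEntryAge`, the `State` enum, and the `classify` method are all hypothetical.

```java
import java.time.Duration;

// Hypothetical sketch: classifies a cache entry by the time since its last refresh.
// The thresholds come from the comment above; everything else is an assumption.
class CacheEntryAge {
    enum State { FRESH, REFRESH_IN_BACKGROUND, EXPIRED }

    static final Duration REFRESH_AFTER = Duration.ofMinutes(5);
    static final Duration EXPIRE_AFTER = Duration.ofMinutes(10);

    // Decide what a read should do for an entry last refreshed `sinceLastRefresh` ago.
    static State classify(Duration sinceLastRefresh) {
        if (sinceLastRefresh.compareTo(EXPIRE_AFTER) >= 0) {
            // Too old: the entry is gone and the read must load it again.
            return State.EXPIRED;
        }
        if (sinceLastRefresh.compareTo(REFRESH_AFTER) >= 0) {
            // Still served from cache, but the read triggers a background reload.
            return State.REFRESH_IN_BACKGROUND;
        }
        return State.FRESH;
    }
}
```

Under these assumptions, an entry that keeps being read is reloaded (and deserialized again) at least once every 10 minutes even if it never changed, which is the CPU cost the comment points at.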
There is much room to improve the metadata store. I will open a series of PRs in the next few weeks. Regarding the cache, I think it should be okay because the cache refresh interval is 5 minutes, which is long enough. Actually, I don't think the cache here makes sense. The metadata store listener is able to update the cache. What I can think of is that the cache can prevent outdated metadata in case the listener didn't work correctly. But from that perspective, 5 minutes would be too long.
BTW, I just ran a round of tests with the new threading model (as well as a few improvements to move the compute-intensive tasks out of the metadata store thread). Before this change, the tasks executed by |
Makes sense. If multiple operations (flush/serialization/deserialization) depend on the same thread, that thread becomes a bottleneck. Using separate threads here looks good to me.
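As a rough illustration of the idea discussed above, the following sketch keeps the raw read on a single thread and moves deserialization to a separate pool. It is not the PIP-453 implementation: `OffloadSketch`, `readRaw`, `get`, and the thread names are hypothetical, and the "deserialization" is just a byte-to-string conversion standing in for real work.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of moving CPU-heavy work off a single metadata thread.
class OffloadSketch {
    // Stand-in for the single metadata store thread (assumption).
    static final ExecutorService metadataThread =
            Executors.newSingleThreadExecutor(r -> daemon(r, "metadata-store"));
    // Separate pool for serialization/deserialization (assumption).
    static final ExecutorService workerPool =
            Executors.newFixedThreadPool(2, r -> daemon(r, "metadata-worker"));

    static Thread daemon(Runnable r, String name) {
        Thread t = new Thread(r, name);
        t.setDaemon(true); // do not keep the JVM alive for this sketch
        return t;
    }

    // Simulated raw read that completes on the metadata thread.
    static CompletableFuture<byte[]> readRaw(String path) {
        return CompletableFuture.supplyAsync(
                () -> ("value-of:" + path).getBytes(StandardCharsets.UTF_8),
                metadataThread);
    }

    // Deserialization is explicitly scheduled on the worker pool, so the
    // metadata thread is free to serve the next operation.
    static CompletableFuture<String> get(String path) {
        return readRaw(path).thenApplyAsync(
                bytes -> Thread.currentThread().getName() + "|"
                        + new String(bytes, StandardCharsets.UTF_8),
                workerPool);
    }
}
```

The returned string is prefixed with the name of the thread that ran the conversion, which makes it easy to check that the work did not run on the metadata thread. Using `thenApplyAsync` with an explicit executor matters here: plain `thenApply` may run the callback on whichever thread completes the future, which in this sketch would be the metadata thread.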