[improve][pip] PIP-453: Improve the metadata store threading model #25173

BewareMyPower · 2026-01-21T13:25:49Z

Documentation

doc
doc-required
doc-not-needed
doc-complete

Matching PR in forked repository

PR in forked repository:

BewareMyPower · 2026-01-22T02:22:07Z

"metadata-store-10-1" #27 [72] prio=5 os_prio=0 cpu=17707.20ms elapsed=5258.47s
"configuration-metadata-store-13-1" #31 [76] prio=5 os_prio=0 cpu=17575.64ms elapsed=5257.48s
"bookkeeper-ml-scheduler-OrderedScheduler-0-0" #54 [98] prio=5 os_prio=0 cpu=105.04ms elapsed=5255.36s
"bookkeeper-ml-scheduler-OrderedScheduler-1-0" #55 [99] prio=5 os_prio=0 cpu=138.76ms elapsed=5255.36s
"bookkeeper-ml-scheduler-OrderedScheduler-2-0" #56 [100] prio=5 os_prio=0 cpu=113.28ms elapsed=5255.36s

Each metadata store thread spends far more CPU time than even the bookkeeper-ml worker threads (roughly 17.6 s each versus about 0.1 s).
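jstack prints the `cpu=` figure shown above for each thread. To track the same number from inside a JVM (for example, in a test that compares threading models), `java.lang.management.ThreadMXBean` exposes per-thread CPU time. The following is a minimal sketch; the class and helper names (`ThreadCpuSketch`, `cpuMillisByThread`) are hypothetical and not part of Pulsar.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: per-thread CPU time (ms) for threads whose name starts with a prefix.
public class ThreadCpuSketch {

    public static Map<String, Long> cpuMillisByThread(String prefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Long> result = new TreeMap<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return result; // this JVM cannot report per-thread CPU time
        }
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            long nanos = mx.getThreadCpuTime(id);
            // info is null / nanos is -1 if the thread has died; nanos is also -1 if measurement is disabled
            if (info == null || nanos < 0 || !info.getThreadName().startsWith(prefix)) {
                continue;
            }
            // Thread names are not unique, so sum entries that share a name
            result.merge(info.getThreadName(), nanos / 1_000_000, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Inside a broker JVM the prefix could be "metadata-store"; here we inspect the current JVM
        cpuMillisByThread("main").forEach((name, ms) -> System.out.println(name + " cpu=" + ms + "ms"));
    }
}
```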

lhotari · 2026-01-22T11:10:48Z

> The metadata store thread even spends much more CPU time than the bookkeeper-ml worker threads.

It's not related to this PIP, but there is also an opportunity to save CPU in deserialization. For consistency reasons, the MetadataStore cache entries expire after 10 minutes. A background refresh is used, which means that if an entry has been used before it expires, a new refresh happens in the background between 5 and 10 minutes after the last refresh.
In many cases nothing has changed since the last refresh, so the deserialization step is completely unnecessary: the previously deserialized value could be reused instead of deserializing again.
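The reuse idea can be sketched by remembering the store version each cached value was deserialized from and skipping deserialization when a refresh reads back the same version. This is a hypothetical illustration, not `MetadataCacheImpl` code: the `Cache`/`Entry` types and `onRefreshRead`/`onDeleted` methods are invented, and it assumes Deleted notifications evict the entry (in ZooKeeper a re-created node restarts at version 0, so a stale entry could otherwise match).

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: reuse the previously deserialized value when the version is unchanged.
public class VersionedRefreshSketch {

    /** A cached value plus the store version it was deserialized from. */
    public record Entry<T>(T value, long version) {}

    public static final class Cache<T> {
        private final Map<String, Entry<T>> entries = new ConcurrentHashMap<>();
        private final Function<byte[], T> deserializer;
        /** Counts real deserializations so the saving is observable. */
        public final AtomicInteger deserializations = new AtomicInteger();

        public Cache(Function<byte[], T> deserializer) {
            this.deserializer = deserializer;
        }

        /** Handles a refresh read: raw bytes plus the version reported by the store. */
        public T onRefreshRead(String path, long version, byte[] bytes) {
            Entry<T> entry = entries.compute(path, (p, old) -> {
                if (old != null && old.version() == version) {
                    return old; // nothing changed since the last refresh: skip deserialization
                }
                deserializations.incrementAndGet();
                return new Entry<>(deserializer.apply(bytes), version);
            });
            return entry.value();
        }

        /** Must be called on Deleted notifications, since a re-created node restarts its version. */
        public void onDeleted(String path) {
            entries.remove(path);
        }
    }

    public static void main(String[] args) {
        Cache<String> cache = new Cache<>(b -> new String(b, StandardCharsets.UTF_8));
        cache.onRefreshRead("/example/node", 1L, "v1".getBytes(StandardCharsets.UTF_8));
        cache.onRefreshRead("/example/node", 1L, "v1".getBytes(StandardCharsets.UTF_8)); // reused
        cache.onRefreshRead("/example/node", 2L, "v2".getBytes(StandardCharsets.UTF_8));
        System.out.println(cache.deserializations.get()); // prints 2
    }
}
```

The store read itself still happens; only the deserialization is skipped when the version matches.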

Another source of wasted CPU: when an entry is modified, it gets refreshed twice, once after the put completes and again when the Modified notification arrives:

pulsar/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/cache/impl/MetadataCacheImpl.java

Lines 274 to 287 in 91e0bf2

```java
@Override
public CompletableFuture<Void> put(String path, T value, EnumSet<CreateOption> options) {
    final byte[] bytes;
    try {
        bytes = serde.serialize(path, value);
    } catch (IOException e) {
        return CompletableFuture.failedFuture(e);
    }
    // First refresh: triggered as soon as the put completes
    if (storeExtended != null) {
        return storeExtended.put(path, bytes, Optional.empty(), options).thenAccept(__ -> refresh(path));
    } else {
        return store.put(path, bytes, Optional.empty()).thenAccept(__ -> refresh(path));
    }
}
```

pulsar/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/cache/impl/MetadataCacheImpl.java

Lines 321 to 327 in 91e0bf2

```java
public void accept(Notification t) {
    String path = t.getPath();
    switch (t.getType()) {
    case Created:
    case Modified:
        // Second refresh: the notification for the same write triggers another read
        refresh(path);
        break;
```
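One way to avoid the double refresh, sketched here as an assumption rather than a proposed patch, is a write-through put: since `put()` already holds the value it wrote, it could cache that value with the version returned by the store instead of reading it back, leaving the Modified notification (which is still needed for writes from other brokers) as the only refresh. The `Store` interface below is a simplified, hypothetical stand-in (the real store's put completes with a `Stat`), and the merge rule assumes versions only increase for a given path.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: cache the written value instead of re-reading it after put().
public class WriteThroughPutSketch {

    public record Entry<T>(T value, long version) {}

    /** Simplified stand-in for a metadata store: put completes with the node's new version. */
    public interface Store {
        CompletableFuture<Long> put(String path, byte[] bytes);
    }

    public static final class Cache<T> {
        private final Map<String, Entry<T>> entries = new ConcurrentHashMap<>();
        private final Store store;
        private final Function<T, byte[]> serializer;

        public Cache(Store store, Function<T, byte[]> serializer) {
            this.store = store;
            this.serializer = serializer;
        }

        public CompletableFuture<Void> put(String path, T value) {
            byte[] bytes = serializer.apply(value);
            return store.put(path, bytes).thenAccept(version ->
                    // Keep the newer entry in case a concurrent refresh already cached a later write
                    entries.merge(path, new Entry<>(value, version),
                            (old, fresh) -> fresh.version() >= old.version() ? fresh : old));
        }

        public T getIfPresent(String path) {
            Entry<T> entry = entries.get(path);
            return entry == null ? null : entry.value();
        }
    }

    public static void main(String[] args) {
        // In-memory store whose versions start at 0 and increase by 1 per write
        Map<String, Long> versions = new ConcurrentHashMap<>();
        Store store = (path, bytes) ->
                CompletableFuture.completedFuture(versions.merge(path, 0L, (v, ignored) -> v + 1));
        Cache<String> cache = new Cache<>(store, s -> s.getBytes(StandardCharsets.UTF_8));
        cache.put("/example/node", "a").join();
        cache.put("/example/node", "b").join();
        System.out.println(cache.getIfPresent("/example/node")); // prints b
    }
}
```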

BewareMyPower · 2026-01-22T12:47:48Z

There is much room for improvement in the metadata store. I will open a series of PRs in the next few weeks. Regarding the cache, I think the deserialization cost should be acceptable because the cache refresh interval is 5 minutes, which is long enough. Actually, I don't think the periodic refresh of this cache makes sense, since the metadata store listener is able to update the cache.

The only benefit I can think of is that the refresh prevents outdated metadata in case the listener doesn't work correctly. But from that perspective, 5 minutes would be too long.

BewareMyPower · 2026-01-22T12:54:51Z

BTW, I just ran a round of tests with the new threading model (plus a few improvements that move compute-intensive tasks out of the metadata store thread).

"metadata-store-serdes-OrderedExecutor-0-0" #27 [83] prio=5 os_prio=0 cpu=288.75ms
"metadata-store-serdes-OrderedExecutor-1-0" #28 [84] prio=5 os_prio=0 cpu=165.31ms
"metadata-store-serdes-OrderedExecutor-2-0" #29 [85] prio=5 os_prio=0 cpu=252.60ms
"metadata-store-batch-flusher-12-1" #30 [86] prio=5 os_prio=0 cpu=1217.19ms
"metadata-store-events-10-1" #59 [114] prio=5 os_prio=0 cpu=333.75ms
"main-EventThread" #32 [88] daemon prio=5 os_prio=0 cpu=89.03ms

Before this change, the tasks now executed by the batch-flusher thread (as well as by the 3 serdes threads) were all executed by the events thread.
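The serdes thread names above suggest BookKeeper's `OrderedExecutor`, which routes each task to a thread chosen by a key. The sketch below shows that routing idea with plain JDK executors only: work for the same path always lands on the same single-threaded executor, so per-path ordering is preserved while different paths are (de)serialized in parallel, off the events thread. The class name `OrderedSerdesSketch` and the `serdes-sketch-N` thread names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch of key-ordered executors for serialization/deserialization work.
public class OrderedSerdesSketch implements AutoCloseable {
    private final ExecutorService[] serdes;

    public OrderedSerdesSketch(int threads) {
        serdes = new ExecutorService[threads];
        for (int i = 0; i < threads; i++) {
            final String name = "serdes-sketch-" + i;
            serdes[i] = Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, name);
                t.setDaemon(true);
                return t;
            });
        }
    }

    /** Same key -> same thread, so tasks for one path run in submission order. */
    public <T> CompletableFuture<T> submit(String key, Supplier<T> task) {
        int index = Math.floorMod(key.hashCode(), serdes.length);
        return CompletableFuture.supplyAsync(task, serdes[index]);
    }

    @Override
    public void close() {
        for (ExecutorService e : serdes) {
            e.shutdown();
        }
        try {
            for (ExecutorService e : serdes) {
                e.awaitTermination(5, TimeUnit.SECONDS);
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
    }

    public static void main(String[] args) {
        try (OrderedSerdesSketch ex = new OrderedSerdesSketch(3)) {
            String a = ex.submit("/example/node", () -> Thread.currentThread().getName()).join();
            String b = ex.submit("/example/node", () -> Thread.currentThread().getName()).join();
            System.out.println(a.equals(b)); // prints true
        }
    }
}
```

A batch flusher would then be a separate single thread, so flushing never waits behind (de)serialization work.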

nodece · 2026-01-23T02:07:46Z

Makes sense. If multiple operations (flush/serialization/deserialization) depend on the same thread, that thread becomes a bottleneck. Using separate threads here looks good to me.

[improve][pip] PIP-453: Improve the metadata store threading model

83bbc87

github-actions bot added PIP doc labels Jan 21, 2026

Add discussion thread link

e82f028

BewareMyPower self-assigned this Jan 21, 2026

BewareMyPower added release/4.0.9 release/4.1.3 labels Jan 21, 2026

BewareMyPower requested review from Technoboy-, codelipenghui, lhotari, nodece, poorbarcode and shibd January 21, 2026 13:33

add a concern

dc6f286

add voting thread

14628d9
