high publish latency of Pulsar on HDD #324
Description
I deployed a Pulsar cluster on 6 identical machines following the Cluster Setup guide. Each machine has an Intel(R) Xeon(R) CPU E5645 @ 2.40GHz, 32 GB RAM, 4 x 1 TB 7200 rpm HDDs, and a 1 Gbit NIC. I used 3 machines to run the ZooKeeper ensemble and the brokers, 2 machines to run bookies (journal, ledger, and index files are on separate disk devices), and 1 machine to run the pulsar-perf produce command.
My plan was to repeat the test from the announcement post on the Yahoo Eng Blog. I used one client to publish messages to 1 topic for 1 hour at different message sizes and rates. My results are attached:

My results are clearly much worse than the official benchmark. From the logs I found that most of the latency is spent writing entries to BookKeeper. So I compared the client publish latency with the bookie write latency for several batches and visualized it as follows:

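For anyone who wants to reproduce this comparison, here is a minimal sketch of how the share of client publish latency spent in the bookie write could be computed per batch. The function name `bookie_share` and the numbers are placeholders for illustration only; they are not my measurements, and how the two latencies are extracted from the logs is left out.

```python
from statistics import mean


def bookie_share(client_ms, bookie_ms):
    """Return, per batch, the fraction of client publish latency
    that is accounted for by the bookie write latency."""
    if len(client_ms) != len(bookie_ms):
        raise ValueError("both series must have one value per batch")
    # Skip batches with a zero client latency to avoid dividing by zero.
    return [b / c for c, b in zip(client_ms, bookie_ms) if c > 0]


# Placeholder values in milliseconds (hypothetical, not measured data).
client_ms = [10.0, 20.0, 40.0]
bookie_ms = [8.0, 15.0, 36.0]

shares = bookie_share(client_ms, bookie_ms)
avg_share = mean(shares)
print("per-batch share:", shares)
print("average share:", round(avg_share, 3))
```

A share close to 1 for most batches would point at the bookie write path rather than the broker or the network.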
In addition, I recorded the disk I/O stats for the bookie, shown as follows:

From my stats, the bookie is under low load most of the time, so I think there is still something I could do to improve the performance of Pulsar on HDD.
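To make "low load most of the time" checkable, here is a rough sketch that counts how many disk utilization samples fall below a threshold. The helper name `low_load_fraction`, the 50% threshold, and the sample values are all assumptions for illustration; the real input would be the utilization column from the recorded disk stats.

```python
def low_load_fraction(util_percent, threshold=50.0):
    """Return the fraction of samples whose disk utilization (in percent)
    is strictly below the given threshold."""
    if not util_percent:
        raise ValueError("need at least one utilization sample")
    below = sum(1 for u in util_percent if u < threshold)
    return below / len(util_percent)


# Placeholder utilization samples in percent (hypothetical, not measured data).
samples = [5.0, 12.0, 80.0, 7.0]

fraction = low_load_fraction(samples)
print("fraction of samples under threshold:", fraction)
```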
However, I found almost nothing online about optimizing BookKeeper performance. I did try modifying parameters in the BookKeeper config file, but I saw no improvement. All of my tests were run with the default parameters in Pulsar.
Has anybody tried a similar test? Do you know how to improve the performance of Pulsar on HDD? I would be grateful for any practical advice.
System configuration
OS: CentOS 6.4
File System: ext4
RAID: No RAID
Pulsar version: 1.16.2