high publish latency of Pulsar on HDD

I deployed a Pulsar cluster on 6 identical machines following the [Cluster Setup](https://github.com/yahoo/pulsar/blob/master/docs/ClusterSetup.md) guide. Each machine has an Intel(R) Xeon(R) CPU E5645 @ 2.40GHz, 32 GB RAM, 4 x 1 TB 7200 rpm HDDs, and a 1 Gbit NIC. I used 3 machines to run the ZooKeeper ensemble and the brokers, 2 machines to run the bookies (journal, ledger, and index files are on separate disk devices), and 1 machine to run the `pulsar-perf produce` command.

My plan was to reproduce the test from the [announcement post on the Yahoo Eng Blog](https://yahooeng.tumblr.com/post/150078336821/open-sourcing-pulsar-pub-sub-messaging-at-scale). I used one client to publish messages to 1 topic for 1 hour at different message sizes and rates. My results are attached:
![myperformance](https://cloud.githubusercontent.com/assets/9900913/24497296/8b473dfc-156d-11e7-9d4a-1c9e2196c08a.png)
My results are clearly much worse than the official benchmark. From the logs, I found that most of the latency is spent writing entries to BookKeeper. So I compared the client publish latency with the bookie write latency for several batches and plotted them as follows:
  
![latency between client and bookie](https://cloud.githubusercontent.com/assets/9900913/24497995/c6e9e89e-156f-11e7-96e2-2efa2a5963a7.png)
In addition, I recorded the disk I/O stats for the bookie, shown below:

![bookie io stat](https://cloud.githubusercontent.com/assets/9900913/24498167/518e0ba6-1570-11e7-862d-a8b49c5c00d6.png)
From these stats, the bookie is under low load most of the time, so I think there is still room to improve the performance of Pulsar on HDD.
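
As a rough sanity check on what one of these 7200 rpm disks can deliver per write, the sketch below computes the rotational delay of a single platter revolution. It rests on an assumption that I have not measured here: that a synchronous journal write has to wait on platter rotation. Seek time, batching, and any write cache are ignored, and `rotational_latency_ms` is just a hypothetical helper name.

```python
def rotational_latency_ms(rpm):
    """Return (average, worst-case) rotational delay in milliseconds.

    Assumption: the target sector is at a uniformly random angle, so the
    average wait is half a revolution and the worst case is a full one.
    """
    full_rotation_ms = 60_000.0 / rpm  # duration of one revolution in ms
    return full_rotation_ms / 2.0, full_rotation_ms


# 7200 rpm is the spindle speed of the HDDs described above.
avg_ms, worst_ms = rotational_latency_ms(7200)
print(f"average: {avg_ms:.2f} ms, worst case: {worst_ms:.2f} ms")
# prints "average: 4.17 ms, worst case: 8.33 ms"
```

If that assumption holds, a few milliseconds per synchronous write would be expected from the hardware alone, even when the disk utilization looks low.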

However, I could find almost nothing online about optimizing BookKeeper performance. I did try modifying the parameters in the BookKeeper config file, but I didn't see any improvement. All of my tests were run with the default parameters in Pulsar.

Has anybody run a similar test? And does anyone know how to improve the performance of Pulsar on HDD? I would be grateful for any practical advice.

#### System configuration
**OS**: CentOS 6.4
**File System**: ext4
**RAID**: No RAID
**Pulsar version**: 1.16.2

