high publish latency of Pulsar on HDD #324

@krumo

I deployed a Pulsar cluster on 6 identical machines following the Cluster Setup guide. Each machine has an Intel(R) Xeon(R) CPU E5645 @ 2.40GHz, 32 GB RAM, 4 × 1 TB 7200 rpm HDDs, and a 1 Gbit NIC. I used 3 machines to run the ZooKeeper ensemble and the brokers, 2 machines to run bookies (journal, ledger, and index files are on separate disk devices), and 1 machine to run the pulsar-perf produce command.

My plan was to repeat the test described in the announcement post on the Yahoo Eng Blog, so I used one client to publish messages to 1 topic for 1 hour at different message sizes and rates. My results are attached:
[Image: myperformance]
My results are clearly much worse than the official benchmark. From the logs, I found that most of the latency is spent writing entries to BookKeeper. So I compared the client publish latency with the bookie write latency for several batches and plotted them as follows:

[Image: latency between client and bookie]
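
For anyone who wants to repeat this comparison, here is a minimal sketch of how the two latency series could be summarized side by side. It assumes the client publish latencies and the bookie write latencies have already been extracted from the logs into two lists of milliseconds; the function names and the sample numbers are hypothetical and are not measurements from this cluster. Percentiles are computed independently for each series, so this does not pair individual messages.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def compare(client_ms, bookie_ms, points=(50, 95, 99)):
    """Return {p: (client, bookie, share)} where share = bookie / client."""
    out = {}
    for p in points:
        c = percentile(client_ms, p)
        b = percentile(bookie_ms, p)
        out[p] = (c, b, b / c if c else float("nan"))
    return out

# Made-up numbers for illustration only.
client_ms = [12.0, 15.0, 11.0, 40.0, 13.0, 14.0, 90.0, 12.5, 13.5, 16.0]
bookie_ms = [9.0, 12.0, 8.5, 35.0, 10.0, 11.0, 80.0, 9.5, 10.5, 13.0]

for p, (c, b, share) in compare(client_ms, bookie_ms).items():
    print(f"p{p}: client={c:.1f} ms  bookie={b:.1f} ms  bookie/client={share:.2f}")
```

A share close to 1 at a given percentile would be consistent with the observation that most of the publish latency comes from the bookie write.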
In addition, I recorded the disk I/O stats for the bookie, shown below:

[Image: bookie io stat]
These stats show that the bookie is under low load most of the time, so I think there is still room to improve the performance of Pulsar on HDD.
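
To put a number on "low load", one option is to average a couple of columns from captured `iostat -x` output per device. The sketch below is only an illustration: it assumes the capture contains a header row starting with `Device` and columns named `await` and `%util` (column names differ between sysstat versions), and the device names and values in `SAMPLE` are hypothetical.

```python
def summarize_iostat(text, device, columns=("await", "%util")):
    """Average the chosen columns for one device across all samples in `text`."""
    header = None
    sums = {c: 0.0 for c in columns}
    count = 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("Device"):
            header = fields  # column names come from the header row
            continue
        if header and fields[0] == device and len(fields) == len(header):
            for c in columns:
                sums[c] += float(fields[header.index(c)])
            count += 1
    if count == 0:
        raise ValueError(f"no samples for {device}")
    return {c: sums[c] / count for c in columns}

# Hypothetical two-sample excerpt, trimmed to a few columns for illustration.
SAMPLE = """\
Device:  r/s   w/s  await  %util
sdb     0.00 120.0   8.50  35.00
sdc     0.00  40.0   3.00  10.00

Device:  r/s   w/s  await  %util
sdb     0.00 140.0   9.50  45.00
sdc     0.00  60.0   5.00  20.00
"""

print(summarize_iostat(SAMPLE, "sdb"))
```

Running it separately for the journal device and the ledger device would show whether one of them is noticeably busier than the other.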

However, I could find almost nothing online about tuning BookKeeper performance. I tried modifying parameters in the BookKeeper config file but saw no improvement. All of the tests above were run with the default parameters shipped with Pulsar.
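
When reporting tuning attempts like this, it helps to state exactly which parameters differed from the defaults. Below is a rough sketch that diffs two `key=value` config files. The helper names are hypothetical, and the keys and values in the sample text are placeholders for illustration, not recommendations and not the settings used in these tests; check the key names against the config file shipped with your version.

```python
def read_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props

def diff_properties(default_text, tuned_text):
    """Return {key: (default_value, tuned_value)} for every key that differs."""
    default = read_properties(default_text)
    tuned = read_properties(tuned_text)
    changed = {}
    for key in sorted(set(default) | set(tuned)):
        old, new = default.get(key), tuned.get(key)
        if old != new:
            changed[key] = (old, new)
    return changed

# Placeholder contents; in practice, read both files from disk.
DEFAULT = """
# journal settings
journalDirectory=/disk1/journal
journalAdaptiveGroupWrites=true
journalMaxGroupWaitMSec=1
"""
TUNED = """
journalDirectory=/disk1/journal
journalAdaptiveGroupWrites=true
journalMaxGroupWaitMSec=5
journalBufferedWritesThreshold=1048576
"""

for key, (old, new) in diff_properties(DEFAULT, TUNED).items():
    print(f"{key}: {old} -> {new}")
```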

Has anybody run a similar test? Do you know how to improve the performance of Pulsar on HDD? I would be grateful for any practical advice.

System configuration

OS: CentOS 6.4
File System: ext4
RAID: none
Pulsar version: 1.16.2
