
os/bluestore: default cache size of 3gb #15976

Merged
merged 1 commit into ceph:master from liewegas:wip-bluestore-big-cache on Jun 30, 2017

Conversation

@liewegas
Member

liewegas commented Jun 28, 2017

Signed-off-by: Sage Weil <sage@redhat.com>

os/bluestore: default cache size of 3gb
Signed-off-by: Sage Weil <sage@redhat.com>
@yuyuyu101

Member

yuyuyu101 commented Jun 28, 2017

Hmm, this will increase the recommended memory size for each OSD to 5GB (considering recovery and burst IO)...

@liewegas

Member Author

liewegas commented Jun 28, 2017

Yeah. The idea is that bluestore OSDs are newly deployed or converted OSDs, so it is a "safe" opportunity to introduce a new default. And that 5GB is more in line with what is deployed in the real world. Also, performance improves pretty dramatically going from 1GB to 2 or 3GB.

@xiexingguo

Member

xiexingguo commented Jun 29, 2017

I've got the same concern as @yuyuyu101.
3GB is a bit aggressive, and it would be better to keep the current conservative but safe default.
Anyway, this is configurable, and users can re-configure it if they'd like to.

@liewegas

Member Author

liewegas commented Jun 29, 2017

How much physical memory do you deploy for each OSD?

What value would you recommend?

@xiexingguo

Member

xiexingguo commented Jun 29, 2017

How much physical memory do you deploy for each OSD?

That depends. We usually deploy Ceph on hosts with 48GB/64GB/128GB of RAM, and each host may have up to 12 OSDs.

What value would you recommend?

1GB would be the preferred option, IMO.
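
For context, a back-of-the-envelope check of those figures against the proposed default (the ~2GB allowance for non-cache OSD overhead below is an assumption for illustration, not a measured value):

```cpp
// Hypothetical per-OSD memory budget for the deployments described above.
// The 2 GB "other overhead" figure is an assumption, not a measurement.
#include <cstdio>

int main() {
    const double host_ram_gb[] = {48, 64, 128};
    const int osds_per_host = 12;
    const double cache_defaults_gb[] = {1.0, 3.0};  // old vs. proposed default
    const double other_overhead_gb = 2.0;           // assumed non-cache OSD footprint

    for (double ram : host_ram_gb) {
        std::printf("%.0f GB host / %d OSDs -> %.1f GB available per OSD\n",
                    ram, osds_per_host, ram / osds_per_host);
    }
    for (double cache : cache_defaults_gb) {
        std::printf("%.0f GB cache + ~%.0f GB other overhead -> ~%.0f GB per OSD\n",
                    cache, other_overhead_gb, cache + other_overhead_gb);
    }
    return 0;
}
```

A 48GB host with 12 OSDs leaves only about 4GB per OSD, which is why a roughly 5GB per-OSD footprint (3GB cache plus recovery/burst overhead, as noted above) looks tight on that class of hardware.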

@yuyuyu101

Member

yuyuyu101 commented Jun 29, 2017

It depends on the disk type. Users could adopt a larger memory size for SSD and a smaller one for HDD... so 2-3GB per HDD (4TB) should be fine, and 4-5GB per SSD is just about right.

@markhpc

Member

markhpc commented Jun 29, 2017

FWIW, I think we need to fix the memory leak Igor found and see how much memory we end up using after that. I'd be in favor of a larger cache size (3GB) for NVMe by default and a smaller cache size (1GB) for HDD.

Especially for configurations where data is on HDD and the DB partition is placed on flash (big enough to hold all onode/extent metadata), I'm not sure how much a large onode cache in bluestore actually matters. The large flash based DB partition is probably more important. FWIW it looks like an 8GB DB partition is enough to hold metadata for ~600K 4K objects with min_alloc set to 64k (default for HDD).

When everything is on NVMe, the onode cache is very important to achieve high IOPS rates. That's where I think we really want to encourage (perhaps demand) that users allocate plenty of RAM for onode cache. This may matter less if DB/WAL partitions are on even faster storage technologies (nvdimm, optane, etc).
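
As a rough sanity check of what that DB-partition figure implies (a back-of-the-envelope calculation using only the numbers quoted above, not measured RocksDB internals):

```cpp
// Implied metadata footprint per object, using the figures quoted above.
#include <cstdio>

int main() {
    const double db_partition_kib = 8.0 * 1024 * 1024;  // 8 GiB DB partition, in KiB
    const double num_objects = 600000;                   // ~600K 4K objects
    std::printf("~%.1f KiB of DB space per object\n", db_partition_kib / num_objects);
    return 0;
}
```

That works out to roughly 14 KiB of DB space per object under those assumptions.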

@fiskn

Contributor

fiskn commented Jun 29, 2017

We currently run between 2GB and 4GB per OSD on existing nodes with Filestore (12 disks per node). The cost difference between 32GB and 64GB of RAM per host is not that great, and in that price range performance would be preferable.

liewegas merged commit a8dd57d into ceph:master on Jun 30, 2017

4 checks passed

Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
default: Build finished.
make check: make check succeeded

liewegas deleted the liewegas:wip-bluestore-big-cache branch on Jun 30, 2017

@liewegas

Member Author

liewegas commented Jun 30, 2017

Merging this for now; we can still adjust up/down as we go forward!

xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Jul 6, 2017

os/bluestore: differ default cache size for hdd/ssd backends
This is a follow-up change of ceph#15976
and makes the bluestore cache capacity self-adaptive for different backends.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
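
The "self-adaptive" behaviour described in this commit could, in rough terms, look like the sketch below (a minimal illustration of the selection logic, not the actual BlueStore code; the per-backend defaults of 1GB for HDD and 3GB for SSD mirror the suggestion in the discussion above and are assumptions here):

```cpp
// Minimal sketch of per-backend cache-size selection; not the actual BlueStore code.
// Option names and defaults in the comments below are assumptions for illustration.
#include <cstdint>
#include <cstdio>

uint64_t pick_cache_size(uint64_t configured,     // e.g. bluestore_cache_size (0 = auto)
                         bool rotational,         // true for HDD backends
                         uint64_t hdd_default,    // e.g. bluestore_cache_size_hdd
                         uint64_t ssd_default) {  // e.g. bluestore_cache_size_ssd
    if (configured > 0) {
        return configured;                        // an explicit setting always wins
    }
    return rotational ? hdd_default : ssd_default;  // otherwise adapt to the backend
}

int main() {
    const uint64_t GB = 1ull << 30;
    std::printf("hdd: %llu GB\n",
                (unsigned long long)(pick_cache_size(0, true, 1 * GB, 3 * GB) / GB));
    std::printf("ssd: %llu GB\n",
                (unsigned long long)(pick_cache_size(0, false, 1 * GB, 3 * GB) / GB));
    return 0;
}
```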

dingdangzhang added a commit to dingdangzhang/ceph that referenced this pull request Jul 12, 2017

os/bluestore: differ default cache size for hdd/ssd backends
This is a follow-up change of ceph#15976
and makes the bluestore cache capacity self-adaptive for
different backends.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>