
osd: hdd vs ssd defaults for osd op thread pool #15422

Merged

merged 6 commits into ceph:master from liewegas:wip-osd-rotational on Jun 7, 2017

Conversation

@liewegas
Member

liewegas commented Jun 2, 2017

The big question here is what other things we should have separate tunables for...

os/ObjectStore: is_rotational() interface
Signed-off-by: Sage Weil <sage@redhat.com>
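
A minimal sketch of what that hook plausibly looks like on the base class, assuming a conservative default of rotational when the backend cannot probe its device (the exact signature in the merged code may differ):

    // Sketch: a virtual hook on the ObjectStore base class; backends
    // such as BlueStore and FileStore override it to probe the device.
    virtual bool is_rotational() {
      return true;   // assume HDD when the media type is unknown
    }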

@liewegas liewegas requested a review from markhpc Jun 2, 2017

bool rotational = true;
int r = _open_path();
if (r < 0)
  return r;


@xiexingguo

xiexingguo Jun 2, 2017

Member

Shouldn't this return true or rotational here, instead of r?
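
The point being that return r in a bool-returning function silently converts the negative error code to true. A sketch of the suggested change (illustrative only):

    bool rotational = true;
    int r = _open_path();
    if (r < 0)
      return rotational;  // not 'return r;': a negative int coerces to true anyway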

@@ -1706,7 +1706,7 @@ int OSD::mkfs(CephContext *cct, ObjectStore *store, const string &dev,
     goto free_store;
   }
-  store->set_cache_shards(cct->_conf->osd_op_num_shards);
+  store->set_cache_shards(1);  // doesn't matter for mkfs!


@xiexingguo

xiexingguo Jun 2, 2017

Member

I think we can just safely drop this line. The BlueStore constructor will do the same.

@xiexingguo


Member

xiexingguo commented Jun 2, 2017

Also, this change fails the unittest_workqueue test.

liewegas added some commits Jun 1, 2017

os/bluestore: implement is_rotational()
Signed-off-by: Sage Weil <sage@redhat.com>
os/filestore: is_rotational()
Signed-off-by: Sage Weil <sage@redhat.com>
osd: give op threads hdd/ssd specific defaults
Signed-off-by: Sage Weil <sage@redhat.com>
osd: report overall 'rotational' status
Signed-off-by: Sage Weil <sage@redhat.com>
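
The gist of the defaults commit: at startup the OSD asks the store whether it is rotational and sizes its op workqueue accordingly. A sketch, assuming option names in the _hdd/_ssd style; treat the exact wiring and values as illustrative, not the verbatim patch:

    // Illustrative sketch of media-specific op-queue defaults: the store
    // reports its media type and the OSD picks the matching counts.
    bool rotational = store->is_rotational();
    uint64_t shards = rotational
      ? cct->_conf->osd_op_num_shards_hdd    // spinners: fewer shards
      : cct->_conf->osd_op_num_shards_ssd;   // flash: more parallelism
    uint64_t threads_per_shard = rotational
      ? cct->_conf->osd_op_num_threads_per_shard_hdd
      : cct->_conf->osd_op_num_threads_per_shard_ssd;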
@liewegas


Member

liewegas commented Jun 2, 2017

@markhpc


Member

markhpc commented Jun 2, 2017

This all looks fine, I think. Ideally the total number of threads across OSDs would scale with the number of cores in the node, but we're probably not at the point where that matters yet.

Probably the most controversial HDD setting to include here for filestore is the split and merge thresholds. Users like to bump the split threshold up because it delays the point at which write performance degrades, but it can make the process of splitting a PG really nasty, especially at the 2nd or 3rd split level.
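
For context, the knobs in question are filestore_merge_threshold and filestore_split_multiple; the values below are an example of the kind of bump users apply, not a recommendation:

    filestore_merge_threshold = 40   # default 10; negative disables merging
    filestore_split_multiple = 8     # default 2
    # a leaf directory splits at roughly
    # abs(filestore_merge_threshold) * 16 * filestore_split_multiple objects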

For bluestore I think we've already got quite a few of the most important settings segmented for HDD/SSD, though arguably we might want to assign larger default cache values for SSD-backed OSDs.
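
For example, in the same _hdd/_ssd style (hypothetical values, purely illustrative):

    bluestore_cache_size_hdd = 1073741824   # 1 GB, illustrative
    bluestore_cache_size_ssd = 3221225472   # 3 GB, illustrative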

@gregsfortytwo


Member

gregsfortytwo commented Jun 5, 2017

This looks good to me structurally, but we obviously want to get actual tuning numbers for the rotational versus SSD media or it's not much good.

I think this was also prompted by https://trello.com/c/MNrMr24m/10-automatic-ssd-hdd-tuning, which definitely meant to include all the internal FileStore throttles. Do you want to do those separately?

@liewegas


Member

liewegas commented Jun 5, 2017

@gregsfortytwo


Member

gregsfortytwo commented Jun 5, 2017

I think this originated from some comments I or Sam made? In which case, yes, all the WBThrottle values, which need to be tuned dramatically differently for HDD and SSD use cases. 5000 IOs is ~50 seconds on a hard drive, and since our default sync interval is 5 seconds or something, that doesn't work out well... :)
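
To spell out the arithmetic, assuming a spinning disk sustains on the order of 100 random IOPS: 5000 IOs / 100 IOPS = 50 s of queued backlog, against a filestore_max_sync_interval of ~5 s, so a sync can stall for an order of magnitude longer than intended.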

@markhpc


Member

markhpc commented Jun 6, 2017

So a little over a year ago Sam, Somnath, and I did a big investigation into various filestore settings and spent a ton of time going over the nuances of tweaking them on NVMe and spinning disks. I don't remember which round of testing this was, but I've attached some of the intermediate results.

In the end after various rounds of testing we settled on:

    filestore_queue_low_threshhold = 0.6
    filestore_queue_high_threshhold = 0.9
    journal_throttle_high_multiple = 0
    journal_throttle_max_multiple = 0
    filestore_queue_max_delay_multiple = 0
    filestore_queue_high_delay_multiple = 0
    filestore_queue_max_ops = 100
    filestore_queue_max_bytes = 209715200
    filestore_expected_throughput_bytes = 419430400
    filestore_expected_throughput_ops = 400
    filestore_caller_concurrency = 30 

That's not even touching any of the wbthrottle settings. Before we go through and start making separate WBThrottle settings for HDD and SSD, I'd like to see some evidence that it's actually holding us back in practice. Somnath did a ton of work on this before we started working on bluestore and it was never clear to me whether any of it really mattered in the end.
journal_throttle.xlsx

@markhpc

markhpc approved these changes Jun 6, 2017

See previous comments

@gregsfortytwo


Member

gregsfortytwo commented Jun 6, 2017

We can merge without addressing the wbthrottle stuff, but you're looking at it backwards. The purpose of values like filestore_wbthrottle_xfs_ios_hard_limit is to prevent the journal from running too far ahead of its backing store. When you're backed by a hard drive and set the limit to 5k IOs, you are essentially disabling that safety check and are liable to have syncs take so long that the OSD kills itself.

We used to have that value set to a much lower number, but it was so bad for performance on SSDs it got bumped up to support them by default.

Maybe with other stuff tuned appropriately, the wbthrottle isn't a necessary tunable any more, though.
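
If the wbthrottle did grow media-specific defaults, a hypothetical split might look like the following. Only the first option name is the real, current knob; the _hdd/_ssd variants are invented here for illustration:

    filestore_wbthrottle_xfs_ios_hard_limit = 5000       # current default, sized for SSD
    # hypothetical media-specific variants:
    filestore_wbthrottle_xfs_ios_hard_limit_hdd = 500    # ~5 s of backlog at ~100 IOPS
    filestore_wbthrottle_xfs_ios_hard_limit_ssd = 5000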

@liewegas


Member

liewegas commented Jun 7, 2017

I agree we should back down the hdd wbthrottle too, but we'll need to verify it doesn't slow down the write throughput workload. Putting it on my list!

@liewegas liewegas merged commit 0e28e75 into ceph:master Jun 7, 2017

3 checks passed

Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
default: Build finished.

@liewegas liewegas deleted the liewegas:wip-osd-rotational branch Jun 7, 2017
