mon: allow reweighting of osds by pg (instead of bytes used) #2199
Conversation
Thanks @liewegas for the patch! It looks good, except: if there are multiple pools and the per-PG weight differs across pools in terms of disk space consumption (e.g. .rgw.buckets vs .rgw.buckets.index), will it introduce deviation for the PGs that hold massive data?
Hi @liewegas, so glad to see reweight by pg for pools, thank you for the effort! I've discussed with @guangyy whether we should change the weight or the "crush weight" of an osd in the reweighting process. If we change the crush weight, we can keep each host's total crush weight unchanged. We have developed a tool for this, and done some simple tests comparing it to changing the weight. I used 100 OSDs and created several EC pools with 2048 pgs and k=8 m=3. Our concern is how decreasing some OSDs' weights will affect the balance when a new OSD is added.
I think that adding new OSDs will perturb the balance equally for both methods. For the host weights, I'm not sure it matters whether they are explicitly required to remain equal or whether we let them vary based on what the particular distribution works out to, as they are also susceptible to some variance. I think looking at the pgs per osd will be sufficient? In any case, it seems like adjusting either set of weights is effective. I still lean toward the osd weights, but mainly because that is what is implemented. I also like the idea of keeping the target weights separate from the adjusted values.
Thanks @liewegas. Another scenario this patch might not cover: if there are mixed-size disks/OSDs in the same host (e.g. 4TB and 6TB), will reweighting by PG cause trouble?
Looks good. Thanks again @liewegas.
BTW, will this patch be backported to firefly?
COMMAND("osd reweight-by-pg " \
"name=oload,type=CephInt,range=100 " \
"name=pools,type=CephPoolname,n=N,req=false", \
"reweight OSDs by utilization [overload-percentage-for-consideration, default 120]", \
should this line be changed to "reweight by pg"?
This is just like reweight-by-utilization, but looks purely at the PG to OSD mapping, not at the number of bytes used on the target disks. This allows the reweighting to be done before any data is written into the cluster, when no data will need to migrate as a result of the reweight. Signed-off-by: Sage Weil <sage@redhat.com>
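As a rough illustration of the commit message above, the core idea can be sketched as follows. This is a minimal Python sketch, not the actual C++ monitor code; the function name and data shapes are made up for illustration:

```python
def reweight_by_pg(pgs_per_osd, reweight, oload=120):
    """Scale down the reweight of OSDs holding more than oload% of the
    average PG count (hedged sketch of the idea, not the Ceph source).
    pgs_per_osd: {osd_id: pg_count}; reweight: {osd_id: 0.0..1.0}."""
    avg = sum(pgs_per_osd.values()) / len(pgs_per_osd)
    overload = avg * oload / 100.0
    adjusted = dict(reweight)
    for osd, pgs in pgs_per_osd.items():
        if pgs > overload:
            # reduce the weight in proportion to the excess PGs
            adjusted[osd] = max(0.0, reweight[osd] * avg / pgs)
    return adjusted
```

Because this looks only at the PG-to-OSD mapping, it can run on an empty cluster, so no data has to migrate as a result of the adjustment.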
Note when OSDs are underloaded, as well. If that is the case, adjust the OSD reweight value up, if possible. (It won't always be possible since weights are capped at 1.) Note that we set the underload threshold to the average, as we want to aggressively adjust weights up (back to 1.0) whenever possible. This gets us a more efficient mapping calculation and reduces the amount of "noise" in the weights. Signed-off-by: Sage Weil <sage@redhat.com>
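The underload side described in this commit can be sketched the same way (again a hypothetical Python illustration, not the Ceph implementation): any OSD below the average PG count with a reduced weight gets moved back toward 1.0, with the cap applied.

```python
def raise_underloaded(pgs_per_osd, reweight):
    """Move reweight values back toward 1.0 for OSDs below the average
    PG count; weights are capped at 1.0 (sketch, not the Ceph source)."""
    avg = sum(pgs_per_osd.values()) / len(pgs_per_osd)
    adjusted = dict(reweight)
    for osd, pgs in pgs_per_osd.items():
        if 0 < pgs < avg and reweight[osd] < 1.0:
            # raise the weight in proportion to the PG deficit, capped at 1.0
            adjusted[osd] = min(1.0, reweight[osd] * avg / pgs)
    return adjusted
```

Using the average itself as the underload threshold means weights drift back to 1.0 aggressively, which keeps the weight set close to the identity and reduces noise in the mapping.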
Allow the reweight-by-pg to look at a specific set of pools. If the list is omitted, use PGs from all pools. This allows you to focus on a specific pool (the one that will dominate data usage). Otherwise things may not be quite right because other pools may have PGs that contain much less data. Signed-off-by: Sage Weil <sage@redhat.com>
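The pool filter amounts to restricting which PGs are counted before the per-OSD totals are computed. A hedged sketch (the dictionary layout here is hypothetical, chosen only to make the idea concrete):

```python
def count_pgs_per_osd(pg_to_osds, pools=None):
    """Count PGs mapped to each OSD, optionally restricted to a set of
    pool ids. pg_to_osds: {(pool_id, pg_seq): [osd_id, ...]}
    (illustrative layout, not the Ceph data structure)."""
    counts = {}
    for (pool_id, _), osds in pg_to_osds.items():
        if pools is not None and pool_id not in pools:
            continue  # skip PGs from pools we were asked to ignore
        for osd in osds:
            counts[osd] = counts.get(osd, 0) + 1
    return counts
```

Restricting the count to the dominant pool avoids having many near-empty PGs from small pools dilute the statistics.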
Do not assume that all OSDs are weighted equally for reweight-by-pg. Note that reweight-by-utilization already reweights based on the size of the OSD volume; we presume that this is already reflected by the CRUSH weights. Signed-off-by: Sage Weil <sage@redhat.com>
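Dropping the equal-weight assumption means normalizing each OSD's PG count by its CRUSH weight before comparing against the average. A one-line sketch of that normalization (hypothetical helper, assuming all weights are positive):

```python
def pgs_per_unit_weight(pg_counts, crush_weight):
    """Normalize each OSD's PG count by its CRUSH weight so that
    differently sized disks (e.g. a 4TB and a 6TB OSD) are compared
    fairly (illustrative sketch; assumes every weight is > 0)."""
    return {osd: pg_counts[osd] / crush_weight[osd] for osd in pg_counts}
```

This also addresses the mixed-disk-size question raised earlier in the conversation: a 6TB OSD with proportionally more PGs than a 4TB OSD is not flagged as overloaded.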
mon: allow reweighting of osds by pg (instead of bytes used) Reviewed-by: Guang Yang <yguang@yahoo-inc.com>
Hi sage, I found a small problem with the output of reweight-by-pg: currently the output mixes the overloaded OSDs with the OSDs that are not overloaded but can be assigned a higher weight, i.e. the case where no OSD is overloaded but some OSD can be assigned a higher weight.
How about #2338?
No description provided.