
mon: allow reweighting of osds by pg (instead of bytes used) #2199

Merged
liewegas merged 5 commits into master from wip-reweight on Aug 19, 2014

Conversation

liewegas
Member

@liewegas liewegas commented Aug 4, 2014

No description provided.

@guangyy
Contributor

guangyy commented Aug 6, 2014

Thanks @liewegas for the patch!

It looks good, except: if there are multiple pools and the PGs of different pools are not equal in terms of disk space consumption (e.g. .rgw.buckets vs. .rgw.buckets.index), will it introduce deviation for the PGs that hold massive data?

@boydc2014

Hi @liewegas, so glad to see reweight by pg for pools, thank you for the effort!

I've discussed with @guangyy whether we should change the weight or the "crush weight" of the OSDs in the reweighting process.

If we change the crush weight, we can keep each host's total crush weight unchanged. We have developed a tool for this and done some simple tests comparing it to changing the weight.

I used 100 OSDs and created several EC pools with 2048 PGs and k=8, m=3.
The results show that both our method (which changes crush weights while keeping the host totals unchanged) and your method can keep the highest load < 1.05 * average load after several rounds (5-7).

Our concern is how decreasing some OSDs' weights will affect the balance when a new OSD is added.
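
For readers following the thread, a minimal sketch of the approach described above: redistribute CRUSH weights among one host's OSDs according to their PG counts while keeping the host's total CRUSH weight constant. The function name, the damping factor, and the update rule are all hypothetical; this is not the tool mentioned above.

```python
# Hypothetical sketch: rebalance CRUSH weights within one host while
# keeping the host's total CRUSH weight unchanged.  Not the actual tool.

def rebalance_host(crush_weights, pg_counts, damping=0.5):
    """crush_weights and pg_counts are dicts keyed by OSD id, for one host."""
    total_weight = sum(crush_weights.values())
    total_pgs = sum(pg_counts.values())
    if total_pgs == 0 or total_weight == 0:
        return dict(crush_weights)
    adjusted = {}
    for osd, w in crush_weights.items():
        if w <= 0:
            adjusted[osd] = w
            continue
        target_share = w / total_weight               # share this OSD should carry
        actual_share = pg_counts.get(osd, 0) / total_pgs
        # move part of the way toward the target each round, keeping steps bounded
        new_w = w * (1 + damping * (target_share - actual_share) / target_share)
        adjusted[osd] = max(new_w, 0.1 * w)
    scale = total_weight / sum(adjusted.values())     # renormalize the host total
    return {osd: w * scale for osd, w in adjusted.items()}
```

Run for several rounds, as in the 5 to 7 rounds reported above, until the most loaded OSD is within the desired bound of the average.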

@liewegas
Member Author

I think that adding new OSDs will perturb the balance for both methods equally. For the host weights, I'm not sure it matters whether they are explicitly required to remain equal or whether we let them vary based on what the particular distribution works out to, as they are also susceptible to some variance either way. I think looking at the PGs per OSD will be sufficient?

In any case, it seems like adjusting either set of weights is effective. I still lean toward the osd weights, but mainly because that is what is implemented. I also like the idea of keeping the target weights separate from the adjusted values.

@guangyy
Contributor

guangyy commented Aug 12, 2014

Thanks @liewegas. Another scenario this patch might not cover: if there are mixed-size disks/OSDs in the same host (e.g. 4TB and 6TB), will re-weighting by PG cause trouble?

@guangyy
Contributor

guangyy commented Aug 13, 2014

Looks good. Thanks again @liewegas.

@guangyy
Contributor

guangyy commented Aug 13, 2014

BTW, will this patch be backported to firefly?

COMMAND("osd reweight-by-pg " \
"name=oload,type=CephInt,range=100 " \
"name=pools,type=CephPoolname,n=N,req=false", \
"reweight OSDs by utilization [overload-percentage-for-consideration, default 120]", \


should this line be changed to "reweight by pg"?

This is just like reweight-by-utilization, but looks purely at the PG to
OSD mapping, not at the number of bytes used on the target disks.  This
allows the reweighting to be done before any data is written into the
cluster, when no data will need to migrate as a result of the reweight.

Signed-off-by: Sage Weil <sage@redhat.com>
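
To make the commit message concrete, here is a toy Python sketch of the idea (the actual implementation is C++ in the Ceph monitor; the names and the exact adjustment formula are illustrative assumptions, while the 120 default comes from the command's help text):

```python
# Toy model of reweight-by-pg: look only at the PG -> OSD mapping, never at
# bytes used, so it can run before any data is written to the cluster.

def reweight_by_pg(pg_to_osds, osd_ids, oload=120):
    """pg_to_osds: dict mapping each PG id to the list of OSDs it maps to.
    oload: overload threshold as a percentage of the average PG count."""
    pgs_per_osd = {osd: 0 for osd in osd_ids}
    for osds in pg_to_osds.values():
        for osd in osds:
            pgs_per_osd[osd] += 1
    average = sum(pgs_per_osd.values()) / len(pgs_per_osd)
    overload = average * oload / 100.0
    proposals = {}
    for osd, count in pgs_per_osd.items():
        if count > overload:
            # propose scaling this OSD's reweight down toward the mean
            proposals[osd] = average / count
    return proposals
```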
Note when OSDs are underloaded, as well.  If that is the case, adjust the
OSD reweight value up, if possible.  (It won't always be possible since
weights are capped at 1.)

Note that we set the underload threshold to the average, as we want to
aggressively adjust weights up (back to 1.0) whenever possible.  This gets
us a more efficient mapping calculation and reduces the amount of "noise"
in the weights.

Signed-off-by: Sage Weil <sage@redhat.com>
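
Sketching the underload half in the same toy model; the cap at 1.0 and the use of the average as the underload threshold come from the commit message above, while the scaling formula itself is an assumption:

```python
# Toy continuation: raise the reweight of underloaded OSDs back toward 1.0.
# Reweight values live in [0, 1], so 1.0 is the cap mentioned above.

def adjust_underloaded(pgs_per_osd, current_reweight, average):
    proposals = {}
    for osd, count in pgs_per_osd.items():
        cur = current_reweight.get(osd, 1.0)
        # the underload threshold is the average itself, so weights are
        # pushed back up whenever an OSD is below average and not yet at 1.0
        if count < average and cur < 1.0:
            proposals[osd] = min(1.0, cur * average / max(count, 1))
    return proposals
```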
Allow the reweight-by-pg to look at a specific set of pools.  If the list
is omitted, use PGs from all pools.  This allows you to focus on a
specific pool (the one that will dominate data usage).  Otherwise things
may not be quite right because other pools may have PGs that contain
much less data.

Signed-off-by: Sage Weil <sage@redhat.com>
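
Given the command definition quoted earlier, the invocation would presumably look like ceph osd reweight-by-pg 110 .rgw.buckets (the overload percentage first, then an optional list of pools). In the toy model, the pool filter amounts to dropping PGs from other pools before counting; this sketch assumes PG ids are strings of the form "<pool>.<seed>", e.g. "3.1f":

```python
# Toy pool filter: keep only the PGs belonging to the requested pools.
# PG ids are assumed to look like "3.1f", where "3" is the pool id.

def filter_pgs_by_pool(pg_to_osds, pool_ids=None):
    if not pool_ids:                  # no pools given: use PGs from all pools
        return pg_to_osds
    wanted = {str(p) for p in pool_ids}
    return {pgid: osds for pgid, osds in pg_to_osds.items()
            if pgid.split('.', 1)[0] in wanted}
```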
Do not assume that all OSDs are weighted equally for reweight-by-pg.

Note that reweight-by-utilization already reweights based on the size of
the OSD volume; we presume that this is already reflected by the CRUSH
weights.

Signed-off-by: Sage Weil <sage@redhat.com>
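
This last commit addresses the mixed-size OSD question raised above: rather than comparing raw PG counts, compare PGs per unit of CRUSH weight. In the toy model that is a one-step normalization (again a sketch, not the monitor code):

```python
# Toy normalization: a 6TB OSD typically has ~1.5x the CRUSH weight of a
# 4TB OSD and is expected to carry ~1.5x the PGs, so compare PGs per unit
# of CRUSH weight instead of raw counts.

def pgs_per_crush_weight(pgs_per_osd, crush_weights):
    return {osd: count / crush_weights[osd]
            for osd, count in pgs_per_osd.items()
            if crush_weights.get(osd, 0) > 0}
```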
liewegas added a commit that referenced this pull request Aug 19, 2014
mon: allow reweighting of osds by pg (instead of bytes used)

Reviewed-by: Guang Yang <yguang@yahoo-inc.com>
@liewegas liewegas merged commit c36b72c into master Aug 19, 2014
@liewegas liewegas deleted the wip-reweight branch August 19, 2014 17:40
@boydc2014

Hi sage, I found a little problem with the output of reweight-by-pg:

Currently, the output mixes the overloaded OSDs with the OSDs that are not overloaded but can be assigned a higher weight.

i.e. if no OSD is overloaded, but one OSD can be assigned a higher weight, the output may be:
average_util: XXX overloaded_util: XXX overloaded osds: osdid (which is not overloaded) [X -> X]
This makes it look like that OSD is overloaded.

@liewegas
Member Author

How about #2338?
