Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mon/OSDMonitor: batch OSDs nodown/noout support #15381

Merged
merged 4 commits into from Jun 14, 2017

Conversation

Projects
None yet
5 participants
@xiexingguo
Copy link
Member

xiexingguo commented May 31, 2017

This patch allow us to add batch OSDs(e.g., from a specific host which is currently in maintenance mode)
into a specific nodown/noout list, which can not be automatically marked down/out
hence can be eliminated from data migration.

This has the same effect with the global nodown/noout flag but is more fine-grained.

Signed-off-by: xie xingguo xie.xingguo@zte.com.cn

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-noout-osdlist branch from f6363cd to 4c106a2 May 31, 2017

@@ -174,6 +174,10 @@ class OSDMap {
float new_full_ratio = -1;

int8_t new_require_min_compat_client = -1;
vector<int32_t> new_noout_osds;

This comment has been minimized.

Copy link
@branch-predictor

branch-predictor May 31, 2017

Member

Use a bit vector/array for that, so each osd can be mapped to single bit and located in constant time and take less memory at the same time.

@@ -505,6 +505,10 @@ void OSDMap::Incremental::encode(bufferlist& bl, uint64_t features) const
if (target_v >= 6) {
::encode(new_require_min_compat_client, bl);
::encode(new_require_osd_release, bl);
::encode(new_noout_osds, bl);

This comment has been minimized.

Copy link
@branch-predictor

branch-predictor May 31, 2017

Member

Shouldn't that go into a separate version (and separate target_v check)?

vector<int32_t> new_noout_osds;
vector<int32_t> old_noout_osds;
vector<int32_t> new_nodown_osds;
vector<int32_t> old_nodown_osds;

This comment has been minimized.

Copy link
@liewegas

liewegas May 31, 2017

Member

I think the way to do this is actually with the osd_state bits... except that we're out of bits. I can do a quick PR to expand it to a uint32_t, though, with the switch to luminous. Then you can base this PR on top of that.

@liewegas

This comment has been minimized.

Copy link
Member

liewegas commented May 31, 2017

See #15390

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-noout-osdlist branch 5 times, most recently from 23fe821 to f06d99a Jun 7, 2017

@xiexingguo

This comment has been minimized.

Copy link
Member Author

xiexingguo commented Jun 7, 2017

See #15390

Done rebasing on that. @liewegas

"osd", "rw", "cli,rest")
COMMAND("osd rm-noout-osds " \
"name=ids,type=CephString,n=N", \
"allow osd(s) <id> [<id>...] to be marked out(if they are currently marked as noout)", \

This comment has been minimized.

Copy link
@liewegas

liewegas Jun 7, 2017

Member

s/out(/out (/
same below

cmd_getval(g_ceph_context, cmdmap, "ids", idvec);
for (unsigned j = 0; j < idvec.size(); j++) {
long osd_start = -1;
long osd_end = -1;

This comment has been minimized.

Copy link
@liewegas

liewegas Jun 7, 2017

Member

nit: usually use "end" in exclusive sense, like [begin, end).. for inclusive maybe [first,last].

This comment has been minimized.

Copy link
@xiexingguo

xiexingguo Jun 8, 2017

Author Member

Updated.

@liewegas

This comment has been minimized.

Copy link
Member

liewegas commented Jun 7, 2017

@jdurgin thoughts on this? I'm not wild about rm-noout-osds and add-noout-osds (maybe drop -osds suffix?). Also unsure about the range syntax; the caller can do that with $(seq a b).

@liewegas

This comment has been minimized.

Copy link
Member

liewegas commented Jun 7, 2017

also i suspect a more useful unit of specifying osds is via the crush hierarchy (so that you mark a host,rack,etc as noout or whatever). That suggests a generic syntax for specifying osds that is used for all of the related commands (osd down, osd out, etc.).

@liewegas

This comment has been minimized.

Copy link
Member

liewegas commented Jun 7, 2017

...or a 'ceph osd ls-tree host4` that spits out the list of osds so that you do the substitution in the shell...

@jdurgin

This comment has been minimized.

Copy link
Member

jdurgin commented Jun 7, 2017

I'd rather leave out the range syntax as well. I like the 'ceph osd ls-tree' idea, and just accepting a list of osd numbers here - it'd be useful for other osd-related commands too

@xiexingguo

This comment has been minimized.

Copy link
Member Author

xiexingguo commented Jun 8, 2017

I'd rather leave out the range syntax as well.

Actually, it's compatible with the old fashion(ceph osd add-noout-osds 0 1 2 still works) and is more flexible, e.g., allow you do it in a mixture of both:

ceph osd add-noout-osds 0 1~2

The benefit is that for large clusters, we do not want to transport a long list of osds, which is annoying for Calamari(or anything like that).

@xiexingguo

This comment has been minimized.

Copy link
Member Author

xiexingguo commented Jun 8, 2017

I like the 'ceph osd ls-tree' idea

Will give a try.

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-noout-osdlist branch 4 times, most recently from 9b176ed to b5231f4 Jun 8, 2017

@xiexingguo

This comment has been minimized.

Copy link
Member Author

xiexingguo commented Jun 8, 2017

changelog:

  • add-noout-osds -> add-noout
  • add "ceph osd ls-tree " cmd

Local test result:

    ~# ceph osd tree
    ID WEIGHT  TYPE NAME                                        UP/DOWN REWEIGHT PRIMARY-AFFINITY
    -5 3.00000 root foo
    -4 2.00000     rack foo-rack-1
    -3 1.00000         host foo-host-192-8-8-91
     1 1.00000             osd.1                                     up  1.00000          1.00000
    -6 1.00000         host foo-host-192-8-8-92
     2 1.00000             osd.2                                     up  1.00000          1.00000
    -8 1.00000     rack foo-rack-2
    -7 1.00000         host foo-host-192-8-8-93
     0 1.00000             osd.0                                     up  1.00000          1.00000
    -1       0 root default
    -2       0     host gitbuilder-ceph-rpm-centos7-amd64-basic
    
    ~# ceph osd ls-tree foo
    0
    1
    2
    
    ~# ceph osd ls-tree foo-rack-1
    1
    2
    
    ~# ceph osd ls-tree foo-rack-2
    0
    
    ~# ceph osd ls-tree foo-host-192-8-8-92
    2

@liewegas Can you look again?

@xiexingguo

This comment has been minimized.

Copy link
Member Author

xiexingguo commented Jun 8, 2017

retest this please

any = true;
}
}
}

This comment has been minimized.

Copy link
@liewegas

liewegas Jun 8, 2017

Member

I wonder if we'd be better served with new helper functions like pending_osd_state{set,clear}(int osd, unsigned bit) that hide this ugliness. I see that 'osd down' and most of the other places that touch new_state do bother to check pending_inc and can clobber a racing update. That'll simplify this quite a bit. (Can leave fixing the other would-be users as a follow-up)

// try single osd way
osd_first = parse_osd_id(idvec[j].c_str(), &ss);
osd_last = osd_first;
}

This comment has been minimized.

Copy link
@liewegas

liewegas Jun 8, 2017

Member

I'm still not wild about the range syntax here. The only time i can see thsi being useful is when you're marking the entire cluster down or noout or whatever. In practice, osd ids won't be sequential ranges on hosts, so I'm not sure it really helps anyone. And we have the new 'any' syntax for that.

It would be nice to move this into a helper, though, so that the other commands that take lists of osds can also inherit the 'any' syntax (and maybe we can also make 'all' or '*' work too?).

@liewegas

This comment has been minimized.

Copy link
Member

liewegas commented Jun 8, 2017

ls-tree looks great!

Thanks @xiexingguo and sorry for all the bike-shedding!

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-noout-osdlist branch from b5231f4 to f94b608 Jun 9, 2017

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-noout-osdlist branch 4 times, most recently from b37abf2 to f8bccb8 Jun 9, 2017

xiexingguo added some commits Apr 25, 2017

mon/OSDMonitor: batch OSDs nodown/noout support
This patch allow us to add batch OSDs(e.g., from a specific host which is currently in maintenance)
into a specific nodown/noout list, which can not be automatically marked down/out
and hence can be eliminated from data migration.

This has the same effect with the global nodown/noout flag but is more fine-grained.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
mon/OSDMonitor: add "ceph osd ls-tree" command
E.g.:
~# ceph osd tree
ID WEIGHT  TYPE NAME                                        UP/DOWN REWEIGHT PRIMARY-AFFINITY
-5 3.00000 root foo
-4 2.00000     rack foo-rack-1
-3 1.00000         host foo-host-192-8-8-91
 1 1.00000             osd.1                                     up  1.00000          1.00000
-6 1.00000         host foo-host-192-8-8-92
 2 1.00000             osd.2                                     up  1.00000          1.00000
-8 1.00000     rack foo-rack-2
-7 1.00000         host foo-host-192-8-8-93
 0 1.00000             osd.0                                     up  1.00000          1.00000
-1       0 root default
-2       0     host gitbuilder-ceph-rpm-centos7-amd64-basic

~# ceph osd ls-tree foo
0
1
2

~# ceph osd ls-tree foo-rack-1
1
2

~# ceph osd ls-tree foo-rack-2
0

~# ceph osd ls-tree foo-host-192-8-8-92
2

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
mon/MonCommand: fix description of "osd rm" command
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-noout-osdlist branch 2 times, most recently from 54a263f to 49c9b83 Jun 12, 2017

mon/OSDMonitor: wildcard support for osd down/out/in/remove commands
E.g.:
~# ceph osd out all
marked out osd.0. marked out osd.1. marked out osd.2.

~# ceph osd in '*'
marked in osd.0. marked in osd.1. marked in osd.2.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
@xiexingguo

This comment has been minimized.

Copy link
Member Author

xiexingguo commented Jun 12, 2017

@liewegas Sorry for the late response(I was travelling for Ceph Day Beijing last weekend).

I think most of the above issues have been addressed now. Can you recheck?

@liewegas liewegas added the needs-qa label Jun 12, 2017

@xiexingguo

This comment has been minimized.

Copy link
Member Author

xiexingguo commented Jun 13, 2017

retest this please

@yuriw yuriw merged commit 3ace41f into ceph:master Jun 14, 2017

3 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
make check make check succeeded
Details

@xiexingguo xiexingguo deleted the xiexingguo:wip-noout-osdlist branch Jun 15, 2017

xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Jun 16, 2017

mon/OSDMonitor: batch noup/noin osds support
This is a follow-up of ceph#15381.
This patch also simplifies the original code logic a bit.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Jun 16, 2017

mon/OSDMonitor: batch noup/noin osds support
This is a follow-up change of ceph#15381.
This patch also simplifies the original code logic a bit.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Jun 16, 2017

mon/OSDMonitor: batch noup/noin osds support
This is a follow-up change of ceph#15381.
This patch also simplifies the original code logic a bit.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Jun 16, 2017

mon/OSDMonitor: batch noup/noin osds support
This is a follow-up change of ceph#15381.
This patch also simplifies the original code logic a bit.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Jun 16, 2017

mon/OSDMonitor: batch noup/noin osds support
This is a follow-up change of ceph#15381.
This patch also simplifies the original code logic a bit.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Jun 16, 2017

mon/OSDMonitor: batch noup/noin osds support
This is a follow-up change of ceph#15381.
This patch also simplifies the original code logic a bit.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Jun 16, 2017

mon/OSDMonitor: batch noup/noin osds support
This is a follow-up change of ceph#15381.
This patch also simplifies the original code logic a bit.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Jun 18, 2017

mon/OSDMonitor: batch noup/noin osds support
This is a follow-up change of ceph#15381.
This patch also simplifies the original code logic a bit.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

runsisi pushed a commit to runsisi/ceph that referenced this pull request Jun 19, 2017

mon/OSDMonitor: batch noup/noin osds support
This is a follow-up change of ceph#15381.
This patch also simplifies the original code logic a bit.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

@xiexingguo xiexingguo referenced this pull request Jul 4, 2017

Merged

mon, osd: misc fixes #16078

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.