New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mon: replace osds with `osd destroy` and `osd new` #14074

Merged
merged 15 commits into from Jun 6, 2017

Conversation

Projects
None yet
3 participants
@jecluis
Member

jecluis commented Mar 22, 2017

for context, please see http://pad.ceph.com/p/osd-replacement

Signed-off-by: Joao Eduardo Luis <joao@suse.de>

@jecluis jecluis requested a review from liewegas Mar 22, 2017

@jecluis

This comment has been minimized.

Member

jecluis commented Mar 22, 2017

I'm aware this does not merge. And I am slightly concerned some of the cleanup code, as I haven't tested having the monitor being killed in certain places to check if the code holds without failing somewhere.

MonOpInfo() { }
virtual ~MonOpInfo() { }
};

This comment has been minimized.

@jecluis

jecluis Mar 22, 2017

Member

This is a stub. While it doesn't do anything now, I hope to figure out what it should hold to generically support other classes than OpInfoDestroyOSD for the upcoming osd purge command.

I could have gotten rid of it and instead have a void * in MonOpRequest, but think not only type safety but I'm assuming a void pointer will not play well with the scoped_ptr. Maybe I'm wrong, in which case I'd be delighted to be shown the path.

err = -ENOTSUP;
ss << "operation not supported";
goto done;
}

This comment has been minimized.

@liewegas

liewegas Mar 22, 2017

Member

could probably assert here since it's clear an internal bug

assert(op_info->reply_cb);
string luks_osd_key =
"dm-crypt/osd/" + stringify(op_info->osd_uuid) + "/luks";

This comment has been minimized.

@liewegas

liewegas Mar 22, 2017

Member

I thinkw e also want oc lean out all of daemon-private/osd.NNN/* (see MonCap.cc for the allowed prefix)

@jecluis

This comment has been minimized.

Member

jecluis commented May 3, 2017

@liewegas mind taking a look? This is roughly the final product.

I realized this morning I should probably have a barrier state while we're running osd new, to avoid competing commands from causing havoc while we're in the middle of the state machine, but that's a small change that will barely affect the rest of the code.

Missing are tests.

@jecluis jecluis changed the title from [DNM] mon: implement cross-service `osd destroy` to mon: replace osds with `osd destroy` and `osd new` May 3, 2017

@@ -705,6 +705,12 @@ COMMAND("osd create " \
"name=uuid,type=CephUUID,req=false " \
"name=id,type=CephOsdName,req=false", \
"create new osd (with optional UUID and ID)", "osd", "rw", "cli,rest")
COMMAND("osd new " \
"name=uuid,type=CephUUID,req=true " \
"name=replace_id,type=CephOsdName,req=true", \

This comment has been minimized.

@jecluis

jecluis May 3, 2017

Member

uh, this should probably be id first, uuid second.

goto reply;
}
pending_inc.new_state[id] = CEPH_OSD_DESTROYED;

This comment has been minimized.

@liewegas

liewegas May 5, 2017

Member

if the osd is already DESTROYED, this will un-destroy it (new_state is an xor, sorry).

This comment has been minimized.

@jecluis

jecluis May 5, 2017

Member

sigh right. needs a guard.

ss << "osd." << id << " does not exist";
err = -ENOENT;
goto reply;
} else if (!osdmap.get_info(id).lost_at) {

This comment has been minimized.

@liewegas

liewegas May 5, 2017

Member

I think we can replace this with a check for is_up(). i think 'osd destroy --yes-i-really-mean-it' is scary enough not to require the user to first do 'osd lost'.

This comment has been minimized.

@jecluis

jecluis May 5, 2017

Member

i don't have a strong opinion either way, so I guess just checking for is_up is fine.

@@ -118,6 +118,7 @@ struct ceph_eversion {
#define CEPH_OSD_NEARFULL (1<<5) /* osd is at or above nearfull threshold */
#define CEPH_OSD_BACKFILLFULL (1<<6) /* osd is at or above backfillfull threshold */
#define CEPH_OSD_DESTROYED (1<<7) /* osd has been destroyed */
#define CEPH_OSD_DESTROYING (1<<8) /* osd is being destroyed */

This comment has been minimized.

@liewegas

liewegas May 5, 2017

Member

I don't think we should burn a bit here. This in-between state is an implementation artifact/ugly of the current monitor, not something that should be exposed via the OSDMap. If we need to record in-progress state let's do it in a separate set of key/value pairs in the db?

This comment has been minimized.

@jecluis

jecluis May 5, 2017

Member

I suppose we could keep the state of the op as a key in the 'mon' prefix, and add it as we start the op, and remove it with the last transaction. Yeah. Let me do that.

@jecluis

This comment has been minimized.

Member

jecluis commented May 18, 2017

@liewegas pushed. I haven't run them by make check though, but they will have by the time I wake up tomorrow.

@liewegas

This comment has been minimized.

Member

liewegas commented May 18, 2017

@jecluis when you get a chance can you drop or squash out my hacky test patch?

vector<string> prefixes = { dmcrypt_prefix, daemon_prefix };
MonitorDBStore::TransactionRef t = paxos->get_pending_transaction();
for (auto &p : prefixes) {

This comment has been minimized.

@liewegas

liewegas May 18, 2017

Member

fwiw you can do a literal list here, e.g.

for (auto p : { dmcrypt_prefix, daemon_prefix }) {
   ...

This comment has been minimized.

@jecluis

jecluis May 18, 2017

Member

oh, fancy! TIL thanks :)

@@ -6832,6 +6871,9 @@ bool OSDMonitor::prepare_command_impl(MonOpRequestRef op,
return true;
}
int id = newcrush.get_item_id(name);
int ancestor = 0;
bool has_ancestor = false;

This comment has been minimized.

@liewegas

liewegas May 18, 2017

Member

fwiw this bool is redudnant; any ancestor will be a bucket with a value <0, so you can take 0 to mean 'no ancestor'.

bufferlist value;
value.append(dmcrypt_key);
if (store_exists(dmcrypt_prefix)) {

This comment has been minimized.

@liewegas

liewegas May 18, 2017

Member

will this work if there is a racing competing proposed value?

@liewegas

This looks great. I have a couple minor suggestions, but the main issue is that i think 'osd new' should work for both replacing an existing osd and creating a fresh one. I think that means switching the osd id to be a second optional arg and changing the checks a bit to match.

@@ -712,6 +712,13 @@ COMMAND("osd create " \
"name=uuid,type=CephUUID,req=false " \
"name=id,type=CephOsdName,req=false", \
"create new osd (with optional UUID and ID)", "osd", "rw", "cli,rest")
COMMAND("osd new " \
"name=replace_id,type=CephOsdName,req=true " \

This comment has been minimized.

@liewegas

liewegas May 18, 2017

Member

I think the id should be optional? so that osd new will also work to create a fresh osd? That probably means switching the argument order around

This comment has been minimized.

@jecluis

jecluis May 18, 2017

Member

if we do make osd new also create a fresh osd (which makes a lot more sense, given the command name), then we should also make uuid optional and basically deduplicate the code from osd create.

@liewegas

This comment has been minimized.

Member

liewegas commented May 18, 2017

@jecluis

This comment has been minimized.

Member

jecluis commented May 19, 2017

@liewegas osd create does not expect a uuid, and will generate a uuid in case it is not supplied. Same goes for id. The only thing is that a uuid is expected if an id is provided.

COMMAND("osd create " \
        "name=uuid,type=CephUUID,req=false " \
        "name=id,type=CephOsdName,req=false", \
        "create new osd (with optional UUID and ID)", "osd", "rw", "cli,rest")

I'll mimic osd create anyway. I think that will be the behavior users will be expecting.

@liewegas

This comment has been minimized.

Member

liewegas commented May 19, 2017

@jecluis

This comment has been minimized.

Member

jecluis commented May 30, 2017

regression on osd create. sigh i believe i've now fixed it, but waiting on workunit to confirm.
basically, i assumed the specified id on create would solely be used for idempotency, but apparently we allow gaps. And in adding create-like behavior to osd new, I kind of reworked the create enough to drop that, thus rendering not only osd new incompatible with the current osd create behavior, but also introducing a regression on create.

@liewegas

This comment has been minimized.

Member

liewegas commented Jun 2, 2017

2017-06-02T11:07:20.181 INFO:tasks.workunit.client.0.smithi199.stdout:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:96: map_enxio_to_eagain:  test 0 '!=' 0
2017-06-02T11:07:20.181 INFO:tasks.workunit.client.0.smithi199.stdout:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:100: map_enxio_to_eagain:  cat /tmp/cephtool.mab/map_enxio_to_eagain.15400
2017-06-02T11:07:20.181 INFO:tasks.workunit.client.0.smithi199.stdout:2017-06-02 11:07:20.153462 7f8a72f09700 -1 WARNING: all dangerous and experimental features are enabled.
2017-06-02T11:07:20.181 INFO:tasks.workunit.client.0.smithi199.stdout:2017-06-02 11:07:20.158795 7f8a72f09700 -1 WARNING: all dangerous and experimental features are enabled.
2017-06-02T11:07:20.181 INFO:tasks.workunit.client.0.smithi199.stdout:{
2017-06-02T11:07:20.181 INFO:tasks.workunit.client.0.smithi199.stdout:    "version": "ceph version  12.0.2-1995-g6bf7363 (6bf7363b3bddac5202fd115ab323602a20c32ff7) luminous (dev)"
2017-06-02T11:07:20.181 INFO:tasks.workunit.client.0.smithi199.stdout:}
2017-06-02T11:07:20.182 INFO:tasks.workunit.client.0.smithi199.stdout:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:101: map_enxio_to_eagain:  rm /tmp/cephtool.mab/map_enxio_to_eagain.15400
2017-06-02T11:07:20.182 INFO:tasks.workunit.client.0.smithi199.stdout:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:102: map_enxio_to_eagain:  return 0
2017-06-02T11:07:20.182 INFO:tasks.workunit.client.0.smithi199.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:80: retry_eagain:  rm /tmp/cephtool.mab/retry_eagain.15400
2017-06-02T11:07:20.182 INFO:tasks.workunit.client.0.smithi199.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:81: retry_eagain:  return 0
2017-06-02T11:07:20.182 INFO:tasks.workunit.client.0.smithi199.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1252: test_mon_osd:  ceph osd rm 0
2017-06-02T11:07:20.183 INFO:tasks.workunit.client.0.smithi199.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1252: test_mon_osd:  grep EBUSY
2017-06-02T11:07:20.303 INFO:tasks.workunit.client.0.smithi199.stdout:Error EBUSY: osd.0 is still up; must be down before removal.
2017-06-02T11:07:20.305 INFO:tasks.workunit.client.0.smithi199.stderr:///home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1254: test_mon_osd:  ceph osd ls
2017-06-02T11:07:20.355 INFO:tasks.workunit.client.0.smithi199.stderr:2017-06-02 11:07:20.359337 7f9964c4b700 -1 WARNING: all dangerous and experimental features are enabled.
2017-06-02T11:07:20.360 INFO:tasks.workunit.client.0.smithi199.stderr:2017-06-02 11:07:20.364141 7f9964c4b700 -1 WARNING: all dangerous and experimental features are enabled.
2017-06-02T11:07:20.469 INFO:tasks.workunit.client.0.smithi199.stderr://home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1254: test_mon_osd:  echo 0 1 2
2017-06-02T11:07:20.469 INFO:tasks.workunit.client.0.smithi199.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1254: test_mon_osd:  local 'old_osds=0 1 2'
2017-06-02T11:07:20.469 INFO:tasks.workunit.client.0.smithi199.stderr://home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1255: test_mon_osd:  ceph osd create
2017-06-02T11:07:20.530 INFO:tasks.workunit.client.0.smithi199.stderr:2017-06-02 11:07:20.533839 7f0fa925a700 -1 WARNING: all dangerous and experimental features are enabled.
2017-06-02T11:07:20.537 INFO:tasks.workunit.client.0.smithi199.stderr:2017-06-02 11:07:20.541474 7f0fa925a700 -1 WARNING: all dangerous and experimental features are enabled.
2017-06-02T11:07:22.813 INFO:tasks.workunit.client.0.smithi199.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1255: test_mon_osd:  id=3
2017-06-02T11:07:22.813 INFO:tasks.workunit.client.0.smithi199.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1256: test_mon_osd:  ceph osd find 3
2017-06-02T11:07:22.906 INFO:tasks.workunit.client.0.smithi199.stderr:2017-06-02 11:07:22.909738 7f15ab725700 -1 WARNING: all dangerous and experimental features are enabled.
2017-06-02T11:07:22.916 INFO:tasks.workunit.client.0.smithi199.stderr:2017-06-02 11:07:22.919893 7f15ab725700 -1 WARNING: all dangerous and experimental features are enabled.
2017-06-02T11:07:23.122 INFO:tasks.workunit.client.0.smithi199.stderr:Error ENOENT: osd.3 does not exist
2017-06-02T11:07:23.131 INFO:tasks.workunit.client.0.smithi199.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1: test_mon_osd:  rm -fr /tmp/cephtool.mab
2017-06-02T11:07:23.134 INFO:tasks.workunit:Stopping ['cephtool'] on client.0...

/a/sage-2017-06-02_08:36:31-rados-wip-sage-testing2-distro-basic-smithi/1255680

fi
# fn=$(mktemp -p $TEMP_DIR)
fn=$(mktemp -p /tmp/muh-test)

This comment has been minimized.

@tchaikov

tchaikov Jun 3, 2017

Contributor
mktemp: failed to create file via template '/tmp/muh-test/tmp.XXXXXXXXXX': No such file or directory

it does not work under some circumstances:

  • /tmp/muh-test does not exist at all
  • some other tests might want to remove /tmp/muh-test without notice.

why not use mktemp $TEMP_DIR/muh-test.XXX ?

This comment has been minimized.

@jecluis

jecluis Jun 3, 2017

Member

...
i knew i should have staged the patches in the morning instead at ~3am.

That was not supposed to go in. No wonder it was succeeding locally and failing somewhere else. Thanks @tchaikov .

@liewegas

This comment has been minimized.

Member

liewegas commented Jun 5, 2017

rebase please

t="all"
fi
fn=$(mktemp -p $TEMP_DIR)

This comment has been minimized.

@tchaikov

tchaikov Jun 5, 2017

Contributor

this is not portable. probably we could use

fn=$(mktemp $TEMP_DIR/secret.XXX)

?

liewegas and others added some commits May 5, 2017

mon/Paxos: add plug/unplug
Simple mechanism to prevent and immediate commit (until the caller does
more stuff, presumably).

Signed-off-by: Sage Weil <sage@redhat.com>
mon/PaxosService: clean up dispatch()
No functional change, just un-nesting these ifs.

Signed-off-by: Sage Weil <sage@redhat.com>
mon/PaxosService: add force_immediate_propose mechanism
Add an easy mechanism for a prepare method to force an immediate proposal.
(Otherwise they have to ensure that a future call to should_propose
returns true with delay==0.0, which is a bit more work for them.)

Signed-off-by: Sage Weil <sage@redhat.com>
mon: add `osd destroy`
This new command will remove a given osd's auth keys, along with any
daemon-private and dm-crypt info contained in the config-key store,
without actually removing the osd from crush.

Instead of removing the osd from the osdmap, we mark it as `destroyed`,
thus preventing competing, racing commands from recreating the osd.

Signed-off-by: Joao Eduardo Luis <joao@suse.de>
mon/OSDMonitor: spin-off osd and crush rm functions
Signed-off-by: Joao Eduardo Luis <joao@suse.de>
mon: add 'osd purge'
Signed-off-by: Joao Eduardo Luis <joao@suse.de>
mon/AuthMon: spin-off some functions for later reuse
Signed-off-by: Joao Eduardo Luis <joao@suse.de>
mon/OSDMonitor: rework `osd create`
We're splitting the chunk of code from `prepare_command_impl()` into two
functions: one will validate if we can perform the command, and the
other will perform the update.

This will allow us to reuse the code in `osd new`.

Signed-off-by: Joao Eduardo Luis <joao@suse.de>
mon/OSDMonitor: make them includes beautiful
These were annoying me.

Signed-off-by: Joao Eduardo Luis <joao@suse.de>
mon: add `osd new`
Signed-off-by: Joao Eduardo Luis <joao@suse.de>
doc/man: document new mon commands
Includes brief description for `ceph osd new`, `ceph osd destroy` and
`ceph osd purge`.

Signed-off-by: Joao Eduardo Luis <joao@suse.de>
mon/OSDMonitor: have `create` behavior on `new`
`osd new` will eventually replace `osd create`, and so we want to keep
as close of a behavior as possible. Some concessions needed to be made
however, specifically in terms of what the command arguments mean for
one command and the other.

For instance, and we think this is possibly the most contentious
decision, while specifying the `id` is absolutely optional, once it has
been specified we will take one of the following approaches:

 1. the osd is destroyed, and as such we will recreate the replacement
    osd.

 2. the osd exists and is not destroyed, thus we will assume it is an
    attempt at an idempotent operation; if not idempotent, fail because
    we then assume the osd was meant to be destroyed but it is not.

A commit should follow this patch series documenting the new commands.

Signed-off-by: Joao Eduardo Luis <joao@suse.de>
qa/workunits: adjust expected return on cephtool
We changed the behavior of `osd create` to return EEXIST instead of
EINVAL if an uuid exists with a different `id` than the one the user
provides.

Signed-off-by: Joao Eduardo Luis <joao@suse.de>
qa/workunits: add cephtool test for new commands
Tests `ceph osd new`, `ceph osd destroy` and `ceph osd purge`.

Signed-off-by: Joao Eduardo Luis <joao@suse.de>
mon/OSDMonitor: make `destroy` and `purge` idempotent
Signed-off-by: Joao Eduardo Luis <joao@suse.de>
@jecluis

This comment has been minimized.

Member

jecluis commented Jun 5, 2017

@liewegas rebased and repushed

@liewegas liewegas added the needs-qa label Jun 5, 2017

dismissing my review, as comment addressed and i have not reviewed it completely.

@liewegas liewegas merged commit daa5126 into ceph:master Jun 6, 2017

3 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
default Build finished.
Details

@jecluis jecluis deleted the jecluis:wip-mon-osd-replacement branch Jun 9, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment