New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ceph-disk: support bluestore #7218
Conversation
@dachary Could you do me a favor of explaining me what DNM represents? Sorry for my poor english:-) |
@xiexingguo It means Do Not Merge, it is set generally when you have some work in progress code and want to receive early feedback |
@theanalyst 👍 Got it. Thank you so very much. |
|
|
@liewegas the refactor is complete (to the extent necessary for ceph-disk prepare). It took a few iterations but I like what I have now. The bluestore support is a draft and I stumbled into a problem: the fsid is assumed to be a sign that --mkfs has run. I worked that around with "bluestore: do not fail on a prepared device" but it has other undesirable side effects. How should we deal with this ? |
with a rebase on the latest master, the bluestore integration tests pass. running them one last time to be sure. Ready for your review @liewegas ;-) |
|
This is awesome! Just three comments:
|
@liewegas I'll address your comments, thanks for the quick review :-). There are zillions things I'd like to simplify now but I tried to keep to the minimum. Unless you think it's important I'd rather keep the want/need for this iteration. I'm semi-worried getting rid of it will create a few additional minor issues that will add delays. |
Yeah, sounds good!
|
@liewegas when you suggest to use type to identify the OSD, I assume it's in the context of activate where the initial proposal was to use the "osd objectstore" configure value instead. I'll change that to use the type instead, it's indeed more sensible. |
Right.
For create, we should either take an explicit arg (--type bluestore or
whatever), or use the config value, or default to filestore.
|
@liewegas I grouped the changes mainly in 0c10294 except for the s/Default/Filestore/ which was done by amending "ceph-disk: refactor prepare". It passes the ceph-disk suite as shown below.
|
One small comment on the bluestore change. Otherwise, I think this is ready for merge! |
@liewegas fixed at 56f0809 and tests are running.
|
@liewegas tests passed |
Is bluestore_block_size used by ceph-disk? I don't think it should be
mentioned as it's really for debugging/testing purposes. (Might even be
best to rename it to avoid confusion.)
Otherwise, looks good.
|
Some backends don't use it, almost all clusters use the default path, and there'll always be a symlink. Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
Signed-off-by: Loic Dachary <loic@dachary.org>
Signed-off-by: Loic Dachary <loic@dachary.org>
Signed-off-by: Loic Dachary <loic@dachary.org>
Signed-off-by: Loic Dachary <loic@dachary.org>
Signed-off-by: Loic Dachary <loic@dachary.org>
Signed-off-by: Loic Dachary <loic@dachary.org>
When make all runs in the ceph-detect-init module, it does a "setup.py build" which is not used. Replace it with a python setup.py install in a virtualenv so that tests can add the virtualenv/bin to their PATH and call ceph-detect-init from sources as they would if it was installed. Part of run-tox.sh is moved to tools/setup-virtualenv.sh so that it can be re-used by ceph-disk and other python modules. Signed-off-by: Loic Dachary <loic@dachary.org>
Because ceph-disk unit tests were not run as part of make check, part of the most recent changes broke them. This is a batch fix to sanitize the situation. Since it is now run with make check, that won't happen again. Signed-off-by: Loic Dachary <loic@dachary.org>
Refactor the test / virtualenv setup in the same way it was done for ceph-detect-init. All shell tests use ceph-helpers.sh which is modified to add ceph-disk / ceph-detect-init virtualenv/bin to the PATH to ensure the source version is used even if ceph is installed. See "ceph-detect-init: make all must setup.py install" Signed-off-by: Loic Dachary <loic@dachary.org>
Signed-off-by: Loic Dachary <loic@dachary.org>
To help with the prepare refactor. Signed-off-by: Loic Dachary <loic@dachary.org>
Because some variables are global in ceph-disk, tests that modify them interact with each other in non-predictable ways. This will go away eventually but requires a significant refactor. Workaround by running one py.test per test file. Signed-off-by: Loic Dachary <loic@dachary.org>
Remove unused argument. Signed-off-by: Loic Dachary <loic@dachary.org>
Signed-off-by: Loic Dachary <loic@dachary.org>
The helper function no longer has journal hardcoded. Signed-off-by: Loic Dachary <loic@dachary.org>
The logic / code path is only modified to the extent necessary for the refactor. The Prepare class roughly replaces the prepare_main function but also handles the prepare subcommand argument parsing. It creates the data and journal objects and delegate the actual work to them via the prepare() method. The Prepare class assumes that preparing an OSD consists on the following phases: * optionally prepare auxiliary devices, such as the journal * prepare a data directory or device * populate the data directory with fsid etc. and optionally symbolic links to the auxiliary devices The PrepareDefault class is derived from Prepare and implements the current model where there only is one auxiliary device, the journal. The PrepareJournal class implements the *journal* functions and is based on a generic class, PrepareSpace which handles the allocation of an auxiliary device. The only journal specific feature is left to the PrepareJournal class: querying the OSD to figure out if a journal is wanted or not. The OSD data directory is prepared via the PrepareData class. It creates a file system if necessary (i.e. if a device) and populate the data directory. Further preparation is then delegated to the auxiliary devices (i.e. adding a symlink to the device for a journal). There was some code paths related dmcrypt / multipath devices in the prepare functions, although it is orthogonal. A class tree for Devices was created to isolate that. Although that was the primary reason for adding a new class tree, two other aspects have also been moved there: ptypes and partition creation. The ptypes are organized into a data structure with a few helpers in the hope it will be easier to maintain. All references to the *_UUID variables have been updated. The creation of a partition is delegated to sgdisk and a wrapper helps reduce the code redundancy. The ptype of a given partition depends on the type of the device (is it dmcrypt'ed or a multipath device ?). It is best implemented by derivation so the prepare function does not need to be concerned about how the ptype of a partition is determined. Many functions could be refactored into a Device class and its derivatives, but that was not done to minimize the size of the refactor. Device knows how to create a partition and figure out the ptype tobe DevicePartition a regular device partition DevicePartitionMultipath a partition of a multipath device DevicePartitionCrypt base class for luks/plain dmcrypt, can map/unmap DevicePartitionCryptPlain knows how to setup dmcrypt plain DevicePartitionCryptLuks knows how to setup dmcrypt plain The CryptHelpers class is introduced to factorize the code snippets that were duplicated in various places but that do not really belong because they are convenience wrappers to figure out: * if dmrypt should be used * the keysize * the dmcrypt type (plain or luks) Signed-off-by: Loic Dachary <loic@dachary.org>
Only support the block file for now. It is handled the same as the journal, only with a different name (block) and it's own set of ptypes depending on multipath or dmcrypt. Signed-off-by: Loic Dachary <loic@dachary.org>
Only support the block file for now. The refactoring consist of replacing main_activate_journal with main_activate_space and a name argument (block, journal). More work will be needed to support multiple auxiliary devices (block.wal etc). But the goal is to minimize the change because this commit is part of a series of commits focusing on refactoring prepare, not the entire ceph-disk codebase. Signed-off-by: Loic Dachary <loic@dachary.org>
Copy paste the journal code and s/journal/block/ More work will be needed to support multiple auxiliary devices (block.wal etc). But the goal is to minimize the change because this commit is part of a series of commits focusing on refactoring prepare, not the entire ceph-disk codebase. Signed-off-by: Loic Dachary <loic@dachary.org>
The objectstore journal and the bluestore block auxiliary device are handled in the same way. Each occurrence of journal in the code is replaced with a variable. A few helpers are added to the Ptype class to factorize the most common lookups but the code logic is unmodified with one exception: the more_osd_info previously added a journal_uuid entry regarless. If there was no journal_uuid file, it would be None. It is changed to only add the {block,journal}_uuid entry if the corresponding file exist. Signed-off-by: Loic Dachary <loic@dachary.org>
It is straightforward because it entirely relies on information collected by ceph-disk list which has full support for bluestore. It loops on all possible auxiliary devices (as found in Spaces.NAMES) and does the associated deactivate / destruction which is merely about handling dmcrypt map / unmap. Signed-off-by: Loic Dachary <loic@dachary.org>
Signed-off-by: Loic Dachary <loic@dachary.org>
http://tracker.ceph.com/issues/13422 made it so ceph-osd won't start unless the pidfile can be created successfully. The default location being the current directory, ceph-osd must explicitly be told to write in a directory where it has write permissions. Signed-off-by: Loic Dachary <loic@dachary.org>
The type file in the OSD bluestore data exists and contains the bluestore string. ceph-disk activate should use it instead of the "osd objectstore" configuration value. It is better in case the configuration file changes between prepare and activate. The fsid file cannot be used by bluestore to signify that ceph-osd --mkfs has completed successfully because it is pre-populated by ceph-disk. Introduce the mkfs_done file, dedicated to this, instead of overloading an existing file. Signed-off-by: Sage Weil <sage@redhat.com> Signed-off-by: Loic Dachary <loic@dachary.org>
Proposed merge message: ceph-disk: add support for bluestore When preparing a device or a directory, the flag --bluestore can be set to select the bluestore experimental object store. It is recommended to also set the following configuration variables: osd objectstore=bluestore, enable experimental unrecoverable data corrupting features=* . The bluestore partition is set with a type specific to bluestore and new udev rules were added to run ceph-disk activate at boot time. By default two partitions are created for a bluestore device, in the same way a filestore is split between a data and a journal partition. The bluestore data partition is small and contains administrative information. The bluestore block partition uses the rest of the disk and contains the OSD data. Multipath devices, dmcrypt (plain or luks) are also supported for bluestore. |
|
👍
|
@liewegas ceph-disk passes but I think it deserves a rados suite to be sure. |
k, i'll get it in the next round. Thanks! |
ceph-disk: support bluestore Reviewed-by: Sage Weil <sage@redhat.com>
No description provided.