ceph-disk: support creating block.db and block.wal with customized size for bluestore #10135

Merged

merged 2 commits into ceph:master on Sep 22, 2016


@david-z
Member
david-z commented Jul 5, 2016 edited

With this PR, ceph-disk can now create block.db and block.wal partitions with customized sizes for bluestore.

For example:

ceph.conf has:

[global]
bluestore fsck on mount = true
bluestore block db size = 67108864
bluestore block wal size = 134217728
bluestore block size = 5368709120
osd objectstore = bluestore
Then run:

sudo ceph-disk prepare --bluestore /dev/sdb

This will create 4 partitions on sdb as follows, then 'block.db' will symlink to partition 2, 'block.wal' will symlink to partition 3, 'block' will symlink to partition 4.

sudo sgdisk -p /dev/sdb
...
Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048          206847   100.0 MiB   FFFF  ceph data
   2          206848          337919   64.0 MiB    FFFF  ceph block.db
   3          337920          600063   128.0 MiB   FFFF  ceph block.wal
   4          600064        11085823   5.0 GiB     FFFF  ceph block
sudo ceph-disk prepare --bluestore /dev/sdb --block.db /dev/sdc --block.wal /dev/sdc

This will create 2 partitions on sdb and 2 partitions on sdc, then 'block.db' will symlink to partition 1 of sdc, 'block.wal' will symlink to partition 2 of sdc, 'block' will symlink to partition 2 of sdb.

sudo ceph-disk prepare --bluestore /dev/sdb --block.db /dev/sdc1 --block.wal /dev/sdd1

This will create 2 partitions on sdb if sdc1 and sdd1 already exist; 'block.db' will symlink to sdc1, 'block.wal' will symlink to sdd1, and 'block' will symlink to partition 2 of sdb.
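As a quick sanity check, the byte values in ceph.conf map directly to the MiB sizes reported by sgdisk above; a minimal sketch of that conversion (plain arithmetic, not code from this PR):

# Sketch: the bluestore size options (bytes) divided by 1 MiB (1048576 bytes)
# give the partition sizes shown by `sgdisk -p` in the first example.
sizes_bytes = {
    'block.db': 67108864,    # bluestore block db size
    'block.wal': 134217728,  # bluestore block wal size
    'block': 5368709120,     # bluestore block size
}
for name, size in sizes_bytes.items():
    print('%-9s %d MiB' % (name, size // 1048576))
# block.db 64 MiB, block.wal 128 MiB, block 5120 MiB (= 5.0 GiB)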

Signed-off-by: Zhi Zhang zhangz.david@outlook.com

@david-z
Member
david-z commented Jul 5, 2016

@liewegas @dachary Could you guys please take a look? Thanks.

@dachary dachary self-assigned this Jul 5, 2016
@dachary dachary and 1 other commented on an outdated diff Jul 5, 2016
src/ceph-disk/ceph_disk/main.py
@@ -1498,7 +1532,23 @@ def create_partition(self, uuid, name, size=0, num=0):
if num == 0:
num = get_free_partition_index(dev=self.path)
if size > 0:
- new = '--new={num}:0:+{size}M'.format(num=num, size=size)
@dachary
dachary Jul 5, 2016 Member

could you make this valuable change into a separate commit ?

@david-z
david-z Jul 5, 2016 Member

Of course. Should I make it a separate commit in this PR, or open a new PR?

@dachary
dachary Jul 5, 2016 Member

as you like

@dachary
Member
dachary commented Jul 5, 2016

This looks good and simple. One minor detail: wal/db should not be positional arguments, otherwise it will be both difficult to document and confusing when related to non-bluestore argument parsing.

With that in mind, you should also update qa/workunits/ceph-disk/ceph-disk-test.py to verify that works as expected.

@david-z
Member
david-z commented Jul 5, 2016

This looks good and simple. One minor detail: wal/db should not be positional arguments, otherwise it will be both difficult to document and confusing when related to non-bluestore argument parsing.

Yes, agreed. Will change them to be optional args.

With that in mind, you should also update qa/workunits/ceph-disk/ceph-disk-test.py to verify that works as expected.

Will do, thanks.
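For illustration, optional (rather than positional) db/wal arguments could look roughly like this; a sketch with a plain argparse parser, not the actual PrepareSpace.parser code:

import argparse

# Sketch only: --block.db/--block.wal as optional arguments instead of
# positionals. The real ceph-disk code builds these through PrepareSpace.parser().
parser = argparse.ArgumentParser(add_help=False)
parser.add_argument('data', help='path to the OSD data device')
parser.add_argument('--block.db', metavar='BLOCK.DB',
                    help='device or file to use for bluestore block.db')
parser.add_argument('--block.wal', metavar='BLOCK.WAL',
                    help='device or file to use for bluestore block.wal')

args = parser.parse_args(['/dev/sdb', '--block.db', '/dev/sdc'])
# the dest contains a dot, so it is read with getattr() instead of args.block.db
print(getattr(args, 'block.db'))  # /dev/sdc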

@david-z
Member
david-z commented Jul 6, 2016

@dachary Please review the latest changes based on the previous discussion. I will update ceph-disk-test.py in another PR soon.

@dachary
Member
dachary commented Jul 6, 2016

@david-z please include the update of ceph-disk-test.py in this PR so that the tests related to these commits are found in the same pull request.

@dachary dachary and 1 other commented on an outdated diff Jul 6, 2016
src/ceph-disk/ceph_disk/main.py
@@ -56,6 +56,16 @@
'ready': 'cafecafe-9b03-4f30-b4c6-b4b80ceff106',
'tobe': 'cafecafe-9b03-4f30-b4c6-b4b80ceff106',
},
+ 'block.db': {
+ # identical because creating a block is atomic
+ 'ready': 'cafecafe-9b03-4f30-b4c6-b4b80ceff106',
+ 'tobe': 'cafecafe-9b03-4f30-b4c6-b4b80ceff106',
@dachary
dachary Jul 6, 2016 Member

the ready/tobe uuid must be unique; you cannot re-use the uuid of block

@david-z
david-z Jul 7, 2016 Member

Updated

@tchaikov tchaikov and 1 other commented on an outdated diff Jul 7, 2016
src/ceph-disk/ceph_disk/main.py
@@ -58,13 +58,13 @@
},
'block.db': {
# identical because creating a block is atomic
- 'ready': 'cafecafe-9b03-4f30-b4c6-b4b80ceff106',
- 'tobe': 'cafecafe-9b03-4f30-b4c6-b4b80ceff106',
+ 'ready': '30cd0809-c2b2-499c-8879-2d6b78529876',
@tchaikov
tchaikov Jul 7, 2016 Contributor

better off squashing this commit into 5cba75a.

@david-z
david-z Jul 7, 2016 Member

Ok, will do.

@david-z
david-z Jul 7, 2016 Member

Updated

@david-z
Member
david-z commented Jul 8, 2016 edited

@dachary I am trying the test case "test_deactivate_reactivate_osd" in ceph-disk-test.py, using the original ceph-disk from 10.2.0 and testing filestore. It always fails at c.check_osd_status(osd_uuid, 'journal').

================================================================================= FAILURES =================================================================================
_______________________________________________________________ TestCephDisk.test_deactivate_reactivate_osd ________________________________________________________________

self = <ceph-disk-test.TestCephDisk object at 0x1f5d8d0>

    def test_deactivate_reactivate_osd(self):
        c = CephDisk()
        disk = c.unused_disks()[0]
        osd_uuid = str(uuid.uuid1())
        c.sh("ceph-disk --verbose zap " + disk)
        c.sh("ceph-disk --verbose prepare --osd-uuid " + osd_uuid +
             " " + disk)
        c.wait_for_osd_up(osd_uuid)
        device = json.loads(c.sh("ceph-disk list --format json " + disk))[0]
        assert len(device['partitions']) == 2
>       c.check_osd_status(osd_uuid, 'journal')

ceph-disk-test.py:259: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <ceph-disk-test.CephDisk instance at 0x1f63050>, uuid = '5eecfede-44be-11e6-a315-e8bdd1f96413', space_name = 'journal'

    def check_osd_status(self, uuid, space_name=None):
        data_partition = self.get_osd_partition(uuid)
        assert data_partition['type'] == 'data'
        assert data_partition['state'] == 'active'
        if space_name is not None:
>           space_partition = self.get_space_partition(space_name, uuid)
ceph-disk-test.py:217: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <ceph-disk-test.CephDisk instance at 0x1f63050>, name = 'journal', uuid = '5eecfede-44be-11e6-a315-e8bdd1f96413'

    def get_space_partition(self, name, uuid):
        data_partition = self.get_osd_partition(uuid)
>       space_dev = data_partition[name + '_dev']
E       KeyError: 'journal_dev'

I manually checked the command "ceph-disk list --format json" and looked through the ceph-disk script. It seems there is no key with a "_dev" suffix there.

{"path": "/dev/sdb", "partitions": [{"dmcrypt": {}, "uuid": "5eecfede-44be-11e6-a315-e8bdd1f96413", "mount": "/var/lib/ceph/osd/ceph-9", "ptype": "4fbd7e29-9d25-41b8-afd0-062c0ceff05d", "is_partition": true, "cluster": "ceph", "state": "active", "fs_type": "xfs", "ceph_fsid": "2836b54a-5f47-49fc-9067-5c59b78164d6", "path": "/dev/sdb1", "type": "data", "whoami": "9", "journal_uuid": "75622827-d1c4-4571-aa65-0dfa8302270b"}, {"dmcrypt": {}, "uuid": null, "ptype": null, "is_partition": true, "path": "/dev/sdb2", "type": "other"}]}

Could you please take a quick look? Is this failure related to my environment or to an out-of-date ceph-disk-test.py? Thanks a lot.

@dachary
Member
dachary commented Jul 8, 2016

@david-z this is because there is an error somewhere else; the code is still using the _dev suffix. You should use the ceph-disk-test.py from master, though.

I suggest you start from def test_activate_bluestore(self): instead, as it already sets up the context you need for bluestore. If that works for you with no code modification, you're good. I assume you set up a virtual machine with three disks attached, or that you're using a dedicated bare metal machine with three disks, right?

@david-z
Member
david-z commented Jul 8, 2016

@dachary thanks for the reply. I am using a normal physical server with 12 disks. I think I may have found the root cause of this failure in ceph-disk, but let me confirm it and then get back to you for review.

@david-z
Member
david-z commented Jul 11, 2016

@dachary ceph-disk-test.py has been updated. test_activate_bluestore works without code modification, so I added another 2 checks to this test case, and I also added 2 new test cases for bluestore.

All the test cases pass in my environment, but since our environments differ, please help review and run these test cases in yours. Thanks.
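For context, one of the added bluestore cases roughly follows this shape (a hedged sketch reusing the CephDisk helpers visible elsewhere in ceph-disk-test.py; the real tests, which appear in later logs as test_activate_bluestore_seperated_block_db_wal and test_activate_bluestore_reuse_db_wal_partition, may differ in detail):

import json
import uuid

def test_activate_bluestore_separated_db_wal_sketch():
    # Hedged sketch: block.db and block.wal go to a second disk, so the data
    # disk ends up with 2 partitions and the second disk with 2 more.
    c = CephDisk()  # helper class from ceph-disk-test.py
    disks = c.unused_disks()
    data_disk, db_wal_disk = disks[0], disks[1]
    osd_uuid = str(uuid.uuid1())
    c.sh("ceph-disk --verbose zap " + data_disk)
    c.sh("ceph-disk --verbose zap " + db_wal_disk)
    c.conf['global']['osd objectstore'] = 'bluestore'
    c.save_conf()
    c.sh("ceph-disk --verbose prepare --bluestore --osd-uuid " + osd_uuid +
         " --block.db " + db_wal_disk + " --block.wal " + db_wal_disk +
         " " + data_disk)
    c.wait_for_osd_up(osd_uuid)
    device = json.loads(c.sh("ceph-disk list --format json " + data_disk))[0]
    assert len(device['partitions']) == 2
    c.check_osd_status(osd_uuid, 'block.db')
    c.check_osd_status(osd_uuid, 'block.wal')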

@dachary dachary and 1 other commented on an outdated diff Jul 11, 2016
src/ceph-disk/ceph_disk/main.py
@@ -74,6 +84,14 @@
'ready': 'cafecafe-9b03-4f30-b4c6-35865ceff106',
'tobe': '89c57f98-2fe5-4dc0-89c1-35865ceff2be',
},
+ 'block.db': {
+ 'ready': '166418da-c469-4022-adf4-b30afd37f176',
+ 'tobe': '7521c784-4626-4260-bc8d-ba77a0f5ff97',
+ },
+ 'block.wal': {
+ 'ready': '86a32090-3647-40b9-bbbd-38d8c573aa86',
+ 'tobe': '92dad30f-175b-4d40-a5b0-5c0a258b4d28',
@dachary
dachary Jul 11, 2016 Member

s/d28/2be/ because it's cute and convenient to quickly figure out what is tobe (2be)... :-)

@david-z
david-z Jul 11, 2016 Member

Oh, got it :)

@david-z
david-z Jul 11, 2016 Member

@dachary I see the last 3 chars of some other tobe uuids are still not 2be, for example [regular][journal][tobe], [regular][block][tobe], [mpath][journal][tobe], [mpath][block][tobe], [mpath][osd][tobe]. Do you want me to change them all this time?

@dachary
dachary Jul 11, 2016 Member

Unfortunately I only noticed the cute 2be at the end after adding the [mpath]*[tobe] ones. And now they can't be changed, otherwise it might impact partitions with an existing tobe UUID.

@david-z
david-z Jul 12, 2016 Member

OK, I only changed the newly added block.db and block.wal tobe uuids.

@dachary dachary commented on an outdated diff Jul 11, 2016
src/ceph-disk/ceph_disk/main.py
@@ -74,6 +84,14 @@
'ready': 'cafecafe-9b03-4f30-b4c6-35865ceff106',
'tobe': '89c57f98-2fe5-4dc0-89c1-35865ceff2be',
},
+ 'block.db': {
+ 'ready': '166418da-c469-4022-adf4-b30afd37f176',
+ 'tobe': '7521c784-4626-4260-bc8d-ba77a0f5ff97',
@dachary
dachary Jul 11, 2016 Member

s/d28/2be/ because it's cute and convenient to quickly figure out what is tobe (2be)... :-)

@dachary dachary commented on an outdated diff Jul 11, 2016
src/ceph-disk/ceph_disk/main.py
@@ -56,6 +56,16 @@
'ready': 'cafecafe-9b03-4f30-b4c6-b4b80ceff106',
'tobe': 'cafecafe-9b03-4f30-b4c6-b4b80ceff106',
},
+ 'block.db': {
+ # identical because creating a block is atomic
+ 'ready': '30cd0809-c2b2-499c-8879-2d6b78529876',
+ 'tobe': '30cd0809-c2b2-499c-8879-2d6b78529876',
+ },
+ 'block.wal': {
+ # identical because creating a block is atomic
+ 'ready': '5ce17fce-4087-4169-b7ff-056cc58473f9',
+ 'tobe': '5ce17fce-4087-4169-b7ff-056cc58473f9',
@dachary
dachary Jul 11, 2016 Member

s/d28/2be/ because it's cute and convenient to quickly figure out what is tobe (2be)... :-)

@dachary dachary commented on an outdated diff Jul 11, 2016
src/ceph-disk/ceph_disk/main.py
@@ -56,6 +56,16 @@
'ready': 'cafecafe-9b03-4f30-b4c6-b4b80ceff106',
'tobe': 'cafecafe-9b03-4f30-b4c6-b4b80ceff106',
},
+ 'block.db': {
+ # identical because creating a block is atomic
+ 'ready': '30cd0809-c2b2-499c-8879-2d6b78529876',
+ 'tobe': '30cd0809-c2b2-499c-8879-2d6b78529876',
@dachary
dachary Jul 11, 2016 Member

s/d28/2be/ because it's cute and convenient to quickly figure out what is tobe (2be)... :-)

@dachary dachary and 1 other commented on an outdated diff Jul 11, 2016
src/ceph-disk/ceph_disk/main.py
+ )
+
+ if block_size is None:
+ return 64 # MB, default value
+ else:
+ return int(block_size) / 1048576 # MB
+
+ def desired_partition_number(self):
+ num = 1
+ if getattr(self.args, "block.db") == getattr(self.args, "data"):
+ num += 1
+ return num
+
+ @staticmethod
+ def parser():
+ parser = PrepareSpace.parser('block.db', positional = False)
@dachary
dachary Jul 11, 2016 Member

You should:

parser = argparse.ArgumentParser(add_help=False)

instead and that will save you the trouble of modifying PrepareSpace.parser to add the positional argument

@david-z
david-z Jul 11, 2016 Member

If I do this, I can't reuse the common part of PrepareSpace.parser for the 3 other optional arguments: --%s-uuid, --%s-file, --%s-dev. PrepareBluestoreBlock, PrepareBluestoreBlockDB and PrepareBluestoreBlockWAL all reuse this part.

What do you think?

@dachary
dachary Jul 11, 2016 Member

Oh, you're right, my bad.

@dachary dachary and 1 other commented on an outdated diff Jul 11, 2016
src/ceph-disk/ceph_disk/main.py
else:
- num = 0
+ return int(block_size) / 1048576 # MB
+
+ def desired_partition_number(self):
@dachary
dachary Jul 11, 2016 Member

Although the logic to assign the partition number looks good, it's a little tricky to read (and maintain). What about setting a fixed partition number for block (2), block.db (3), block.wal (4)?

@david-z
david-z Jul 12, 2016 Member

OK, this will be easier to read.

@david-z
Member
david-z commented Jul 12, 2016

@dachary Pls review the latest changes. Thanks.

@dachary dachary and 1 other commented on an outdated diff Jul 12, 2016
src/ceph-disk/ceph_disk/main.py
else:
- num = 0
- return num
+ return int(block_size) / 1048576 # MB
+
+ def desired_partition_number(self):
+ # FIXME: use a fixed partition number for now
@dachary
dachary Jul 12, 2016 edited Member

I think it's just simpler and more predictable from a user point of view. Do you see a reason why it should be fixed later on? Initially I had the if self.args.block == self.args.data: because it mimics what is done with journals. So the question really is: does it make sense for bluestore to have self.args.block* on a separate device? If so, and if there can be more than one such partition, the partition numbers for all of these need to be dynamic. If that's a legitimate use case, we must figure out a way to express that if the block{,.wal,.db} partition is colocated on a device where other partitions are also located, then num = 0 so that it picks the first available partition.

@david-z
david-z Jul 12, 2016 Member

Thanks for raising these questions; I agree with you. In my view, we can't always predict use cases. One OSD's block{.wal,.db} could be on a separate device, e.g. an SSD, even colocated with other OSDs' block{.wal,.db}, while that OSD's data and block stay on an HDD. Or all the partitions of one OSD could be on a single SSD. We are actually testing both OSD partition layouts for bluestore now to compare performance and cost.

So you are right, and I also think we should set num = 0 for the block{,.wal,.db} partition to let it pick the first available partition. Do you agree? If yes, I will update accordingly.

Thanks.

@dachary
dachary Jul 12, 2016 Member

we should set num = 0 for block{,.wal,.db} partition to let it pick first available partition.

That sounds right. How about this: for each of block, block.wal and block.db, if it is on the same device as args.data, use a fixed partition number; if it is different, use num = 0 so that it picks the next available partition.

@david-z
david-z Jul 12, 2016 Member

Yeah, great!
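In code, the agreed rule might look roughly like this (a sketch only; the fixed numbers follow the colocated layout shown in the PR description, and the actual patch implements this as methods on the prepare classes):

# Sketch of the agreed rule: a fixed partition number when the space is
# colocated with the data device (1 data, 2 block.db, 3 block.wal, 4 block),
# otherwise 0 so that create_partition() picks the first free partition index.
FIXED_PARTITION_NUMBERS = {'block.db': 2, 'block.wal': 3, 'block': 4}

def desired_partition_number(space_name, space_dev, data_dev):
    if space_dev == data_dev:
        return FIXED_PARTITION_NUMBERS[space_name]
    return 0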

@david-z
Member
david-z commented Jul 12, 2016

@dachary Pls review the latest changes. Thanks.

@tchaikov
Contributor

retest this please.

@david-z
Member
david-z commented Jul 28, 2016 edited

@tchaikov Could you please be more specific? I don't know what I should retest. Do I need to retest ceph-disk against the latest master branch because there are major changes to bluestore? Or is something wrong with the test cases I added?

Thanks.

@tchaikov
Contributor

@david-z "retest this please" is a command for the jenkins plugin triggering a build job.

../src/ceph-disk/tests/test_prepare.py:147: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <ceph_disk.main.Device object at 0x7f07bc65f890>, uuid = 'UUID'
name = 'journal', size = 200, num = 1

    def create_partition(self, uuid, name, size=0, num=0):
        ptype = self.ptype_tobe_for_name(name)
        if num == 0:
            num = get_free_partition_index(dev=self.path)
        if size > 0:
            # Obtain the sector number with the current alignment
            # correction applied as the actual partition start point.
            # This could create the partition whose size is exactly
            # equal to 'size' passed in.
            beg, err, ret = command(
                [
                    'sgdisk',
                    '-F',
                    self.path
                ],
            )
            LOG.debug("stderr " + err)
>           assert ret == 0
E           assert 2 == 0

../src/ceph-disk/ceph_disk/main.py:1549: AssertionError
----------------------------- Captured stderr call -----------------------------
ptype_tobe_for_name: name = journal
command: Running command: /sbin/sgdisk -F /dev/wholedisk
create_partition: stderr Problem opening /dev/wholedisk for reading! Error is 2.
The specified file does not exist!

Could you check the failing test and run it locally? See https://jenkins.ceph.com/job/ceph-pull-requests/9660/consoleFull#231334942c19247c4-fcb7-4c61-9a5d-7e2b9731c678.

@david-z
Member
david-z commented Jul 28, 2016 edited

@tchaikov I have checked this failure; it is caused by the following:

https://github.com/ceph/ceph/blob/master/src/ceph-disk/tests/test_prepare.py#L139 sets a hard-coded device path "/dev/wholedisk", which doesn't seem to exist on the test machine, so sgdisk fails.

Sorry, my local environment also doesn't have this device path and I have never seen such a path before. Could you please tell me what this device path is? Is it a fake file, and should it be created before running the test?

Thanks.

@david-z
Member
david-z commented Aug 1, 2016

@tchaikov Pls let me know if you want me to update the test_prepare script. Thanks.

@tchaikov
Contributor
tchaikov commented Aug 1, 2016

Pls let me know if you want me to update the test_prepare script. Thanks.

I don't think you should or need to do so. /dev/wholedisk is a path for testing, and the methods called in that test are mocked, so it does not need to exist on your system. You might want to fix your change instead and get the tests to pass.
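For illustration, the pattern behind those unit tests is roughly this (a sketch; command() and command_check_call() do exist in ceph_disk/main.py, but the exact patch targets and fixtures used by test_prepare.py may differ):

from unittest import mock  # on python2 this is the external "mock" package

from ceph_disk import main

def fake_command(*args, **kwargs):
    # pretend the sgdisk invocation succeeded and printed a usable sector
    return ('2048\n', '', 0)

# Anything under test that shells out through main.command() or
# main.command_check_call() now sees the fake results instead of touching a
# real /dev/wholedisk.
with mock.patch.object(main, 'command', fake_command), \
     mock.patch.object(main, 'command_check_call', lambda *a, **kw: 0):
    pass  # call the code under test here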

@david-z
Member
david-z commented Aug 1, 2016

@tchaikov I made some small changes to relax the strict check on the sgdisk return value. It is now backward-compatible with the original behaviour before my changes. Could you please help trigger Jenkins? Thanks.
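The adjustment is roughly of this shape (a hedged sketch of the intent, with a hypothetical helper name: keep the old behaviour whenever sgdisk -F is not usable; not necessarily the exact commit):

def build_sgdisk_new_arg(path, num, size_mb, command):
    # Sketch: use the first aligned sector reported by `sgdisk -F` when it is
    # available, otherwise fall back to the original '0' start (let sgdisk
    # choose), which preserves the pre-change behaviour.
    out, err, ret = command(['sgdisk', '-F', path])
    if ret == 0 and out.strip().isdigit():
        return '--new={num}:{start}:+{size}M'.format(
            num=num, start=out.strip(), size=size_mb)
    return '--new={num}:0:+{size}M'.format(num=num, size=size_mb)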

@tchaikov
Contributor
tchaikov commented Aug 1, 2016

It's triggered automatically by new commits in the PR.

@david-z
Member
david-z commented Aug 1, 2016

OK, I see. :)

@david-z
Member
david-z commented Aug 1, 2016 edited

@tchaikov The related ceph-disk test cases pass now. The current failure seems unrelated to my changes. Please help take a look. Thanks.

7/145 Test   #3: ceph_objectstore_tool.py ................***Failed   97.43 sec
vstarting....DONE
Wait for health_ok...DONE
Created Replicated pool #1
Created Erasure coded pool #2
Creating 4 objects in replicated pool
2016-08-01 08:29:49.480157 7f1c37fd89c0 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-08-01 08:29:49.480475 7f1c37fd89c0 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-08-01 08:29:49.482585 7f1c37fd89c0 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-08-01 08:29:49.482679 7f1c37fd89c0 -1 asok(0x24ffbb0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: The UNIX domain socket path /home/jenkins-build/build/workspace/ceph-pull-requests/build/ceph_objectstore_tool_dir/out/client.admin.26089.asok is too long! The maximum length on this system is 107
106/145 Test  #13: run-tox-ceph-disk .......................***Failed  158.71 sec
flake8 create: /home/jenkins-build/build/workspace/ceph-pull-requests/src/ceph-disk/.tox/flake8
flake8 installdeps: --use-wheel, --find-links=file:///home/jenkins-build/build/workspace/ceph-pull-requests/src/ceph-disk/wheelhouse, -r/home/jenkins-build/build/workspace/ceph-pull-requests/src/ceph-disk/requirements.txt, -r/home/jenkins-build/build/workspace/ceph-pull-requests/src/ceph-disk/test-requirements.txt, ../ceph-detect-init
flake8 develop-inst: /home/jenkins-build/build/workspace/ceph-pull-requests/src/ceph-disk
flake8 installed: ceph-detect-init==1.0.1,-e git+https://github.com/ceph/ceph.git@59aa36519cfe1aa6a844e4fbe9030b9033d7f1ae#egg=ceph_disk&subdirectory=src/ceph-disk,configobj==5.0.6,configparser==3.5.0,coverage==4.2,discover==0.4.0,enum34==1.1.6,extras==1.0.0,fixtures==3.0.0,flake8==3.0.3,funcsigs==1.0.2,linecache2==1.0.0,mccabe==0.5.2,mock==2.0.0,pbr==1.10.0,pluggy==0.3.1,py==1.4.31,pycodestyle==2.0.0,pyflakes==1.2.3,pytest==2.9.2,python-mimeparse==1.5.2,python-subunit==1.2.0,six==1.10.0,testrepository==0.0.20,testtools==2.2.0,tox==2.3.1,traceback2==1.4.0,unittest2==1.1.0,virtualenv==15.0.2
flake8 runtests: PYTHONHASHSEED='3448820667'
flake8 runtests: commands[0] | flake8 --ignore=H105,H405 ceph_disk tests
...
ERROR: InvocationError: '/home/jenkins-build/build/workspace/ceph-pull-requests/src/ceph-disk/.tox/flake8/bin/flake8 --ignore=H105,H405 ceph_disk tests'
___________________________________ summary ____________________________________
ERROR:   flake8: commands failed
  py27: commands succeeded
...
The following tests FAILED:
      3 - ceph_objectstore_tool.py (Failed)
     13 - run-tox-ceph-disk (Failed)
@david-z
Member
david-z commented Aug 3, 2016 edited

ping @tchaikov @dachary pls help to take a look at these failures and let me know if there is anything I can do. Thanks! :-)

@tchaikov
Contributor
tchaikov commented Aug 3, 2016

The current failure seems unrelated to my changes. Please help take a look. Thanks.

please run the tox test with and without your changes, to see if your changes contribute to the failure of run-tox-ceph-disk.

pls help to take a look at these failures and let me know if there is anything I can do. Thanks! :-)

again,

  1. run the tox tests
  2. fix the failures, in this case, they are errors reported by flake8
flake8 runtests: PYTHONHASHSEED='3448820667'
flake8 runtests: commands[0] | flake8 --ignore=H105,H405 ceph_disk tests
ceph_disk/main.py:1537:66: W291 trailing whitespace
ceph_disk/main.py:1539:68: W291 trailing whitespace
ceph_disk/main.py:1550:80: E501 line too long (80 > 79 characters)
ceph_disk/main.py:1962:32: E251 unexpected spaces around keyword / parameter equals
ceph_disk/main.py:1962:34: E251 unexpected spaces around keyword / parameter equals
ceph_disk/main.py:2190:20: E251 unexpected spaces around keyword / parameter equals
ceph_disk/main.py:2190:22: E251 unexpected spaces around keyword / parameter equals
ceph_disk/main.py:2191:21: E251 unexpected spaces around keyword / parameter equals
ceph_disk/main.py:2191:23: E251 unexpected spaces around keyword / parameter equals
ceph_disk/main.py:2197:45: E261 at least two spaces before inline comment
ceph_disk/main.py:2219:20: E251 unexpected spaces around keyword / parameter equals
ceph_disk/main.py:2219:22: E251 unexpected spaces around keyword / parameter equals
ceph_disk/main.py:2220:21: E251 unexpected spaces around keyword / parameter equals
ceph_disk/main.py:2220:23: E251 unexpected spaces around keyword / parameter equals
ceph_disk/main.py:2224:22: E261 at least two spaces before inline comment
ceph_disk/main.py:2226:45: E261 at least two spaces before inline comment
ceph_disk/main.py:2237:60: E251 unexpected spaces around keyword / parameter equals
ceph_disk/main.py:2237:62: E251 unexpected spaces around keyword / parameter equals
ceph_disk/main.py:2254:20: E251 unexpected spaces around keyword / parameter equals
ceph_disk/main.py:2254:22: E251 unexpected spaces around keyword / parameter equals
ceph_disk/main.py:2255:21: E251 unexpected spaces around keyword / parameter equals
ceph_disk/main.py:2255:23: E251 unexpected spaces around keyword / parameter equals
ceph_disk/main.py:2259:23: E261 at least two spaces before inline comment
ceph_disk/main.py:2261:45: E261 at least two spaces before inline comment
ceph_disk/main.py:2272:61: E251 unexpected spaces around keyword / parameter equals
ceph_disk/main.py:2272:63: E251 unexpected spaces around keyword / parameter equals
ERROR: InvocationError: '/home/jenkins-build/build/workspace/ceph-pull-requests/src/ceph-disk/.tox/flake8/bin/flake8 --ignore=H105,H405 ceph_disk tests'
@david-z
Member
david-z commented Aug 3, 2016

@tchaikov Thanks for showing me the failure reason. I have fixed the code style issues and all checks pass now.

@david-z
Member
david-z commented Aug 5, 2016

ping @tchaikov @dachary Is this PR good to merge? Thanks. :)

@dachary dachary added the needs-qa label Aug 5, 2016
@dachary
Member
dachary commented Aug 5, 2016

LGTM, needs a ceph-disk suite run to confirm

@tchaikov
Contributor

@dachary the tests failed, but they seem to also fail on master.

in kchai-2016-08-18_15:39:58-ceph-disk-wip-kefu-testing2---basic-mira/372693/teuthology.log

2016-08-18T22:59:27.794 INFO:tasks.workunit.client.0.mira030.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py:219: Exception
2016-08-18T22:59:27.795 INFO:tasks.workunit.client.0.mira030.stdout:========================= 23 failed in 4289.11 seconds =========================

kchai-2016-08-18_15:39:58-ceph-disk-wip-kefu-testing2---basic-mira/372693/remote/mira030/log/ceph-osd.0.log.gz

2016-08-18 21:45:59.975583 7fd60f332800  0 set uid:gid to 167:167 (ceph:ceph)
2016-08-18 21:45:59.975599 7fd60f332800  0 ceph version v11.0.0-1614-g6cca572 (6cca5729ede0e6a20796822841bbf41853e124f6), process ceph-osd, pid 5404
2016-08-18 21:45:59.975815 7fd60f332800 -1 ESC[0;31m ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-0: (2) No such file or directoryESC[0m

this issue is tracked at http://tracker.ceph.com/issues/17078.

@dachary
Member
dachary commented Aug 23, 2016

@tchaikov once #10825 and #10824 are merged, ceph-disk passes on master.

@dachary
Member
dachary commented Aug 25, 2016

The ceph-disk suite fails with the following. Looks like it's not too difficult to fix.

http://qa-proxy.ceph.com/teuthology/kchai-2016-08-25_02:27:35-ceph-disk-wip-kefu-testing---basic-vps/384411/teuthology.log

2016-08-25T02:53:57.955 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_deactivate_reactivate_osd PASSED
2016-08-25T02:53:57.955 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_destroy_osd_by_id PASSED
2016-08-25T02:53:57.955 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_destroy_osd_by_dev_path PASSED
2016-08-25T02:53:57.955 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_deactivate_reactivate_dmcrypt_plain PASSED
2016-08-25T02:53:57.955 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_deactivate_reactivate_dmcrypt_luks PASSED
2016-08-25T02:53:57.955 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_activate_dmcrypt_plain_no_lockbox PASSED
2016-08-25T02:53:57.956 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_activate_dmcrypt_luks_no_lockbox PASSED
2016-08-25T02:53:57.956 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_activate_dmcrypt_luks_with_lockbox PASSED
2016-08-25T02:53:57.956 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_activate_lockbox PASSED
2016-08-25T02:53:57.956 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_trigger_dmcrypt_journal_lockbox PASSED
2016-08-25T02:53:57.956 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_trigger_dmcrypt_data_lockbox PASSED
2016-08-25T02:53:57.956 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_trigger_dmcrypt_lockbox PASSED
2016-08-25T02:53:57.956 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_activate_no_journal PASSED
2016-08-25T02:53:57.956 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_activate_with_journal_dev_no_symlink PASSED
2016-08-25T02:53:57.956 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_activate_bluestore FAILED
2016-08-25T02:53:57.956 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_activate_bluestore_seperated_block_db_wal PASSED
2016-08-25T02:53:57.956 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_activate_bluestore_reuse_db_wal_partition FAILED
2016-08-25T02:53:57.956 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_activate_with_journal_dev_is_symlink PASSED
2016-08-25T02:53:57.957 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_activate_separated_journal FAILED
2016-08-25T02:53:57.957 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_activate_separated_journal_dev_is_symlink FAILED
2016-08-25T02:53:57.957 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_activate_two_separated_journal FAILED
2016-08-25T02:53:57.957 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_activate_reuse_journal FAILED
2016-08-25T02:53:57.957 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_activate_multipath PASSED
2016-08-25T02:53:57.957 INFO:tasks.workunit.client.0.vpm043.stdout:
2016-08-25T02:53:57.957 INFO:tasks.workunit.client.0.vpm043.stdout:=================================== FAILURES ===================================
2016-08-25T02:53:57.957 INFO:tasks.workunit.client.0.vpm043.stdout:_____________________ TestCephDisk.test_activate_bluestore _____________________
2016-08-25T02:53:57.957 INFO:tasks.workunit.client.0.vpm043.stdout:
2016-08-25T02:53:57.957 INFO:tasks.workunit.client.0.vpm043.stdout:self = 
2016-08-25T02:53:57.957 INFO:tasks.workunit.client.0.vpm043.stdout:
2016-08-25T02:53:57.957 INFO:tasks.workunit.client.0.vpm043.stdout:    def test_activate_bluestore(self):
2016-08-25T02:53:57.957 INFO:tasks.workunit.client.0.vpm043.stdout:        c = CephDisk()
2016-08-25T02:53:57.958 INFO:tasks.workunit.client.0.vpm043.stdout:        disk = c.unused_disks()[0]
2016-08-25T02:53:57.958 INFO:tasks.workunit.client.0.vpm043.stdout:        osd_uuid = str(uuid.uuid1())
2016-08-25T02:53:57.958 INFO:tasks.workunit.client.0.vpm043.stdout:        c.sh("ceph-disk --verbose zap " + disk)
2016-08-25T02:53:57.958 INFO:tasks.workunit.client.0.vpm043.stdout:        c.conf['global']['osd objectstore'] = 'bluestore'
2016-08-25T02:53:57.958 INFO:tasks.workunit.client.0.vpm043.stdout:        c.save_conf()
2016-08-25T02:53:57.958 INFO:tasks.workunit.client.0.vpm043.stdout:        c.sh("ceph-disk --verbose prepare --bluestore --osd-uuid " + osd_uuid +
2016-08-25T02:53:57.958 INFO:tasks.workunit.client.0.vpm043.stdout:             " " + disk)
2016-08-25T02:53:57.958 INFO:tasks.workunit.client.0.vpm043.stdout:        c.wait_for_osd_up(osd_uuid)
2016-08-25T02:53:57.958 INFO:tasks.workunit.client.0.vpm043.stdout:        device = json.loads(c.sh("ceph-disk list --format json " + disk))[0]
2016-08-25T02:53:57.958 INFO:tasks.workunit.client.0.vpm043.stdout:>       assert len(device['partitions']) == 2
2016-08-25T02:53:57.958 INFO:tasks.workunit.client.0.vpm043.stdout:E       assert 4 == 2
2016-08-25T02:53:57.958 INFO:tasks.workunit.client.0.vpm043.stdout:E        +  where 4 = len([{'block.db_dev': '/dev/vdb3', 'block.db_uuid': 'ebdc5410-d558-445c-9362-e29ff09337c6', 'block.wal_dev': '/dev/vdb4', ...ath': '/dev/vdb3', ...}, {'block.wal_for': '/dev/vdb1', 'dmcrypt': {}, 'is_partition': True, 'path': '/dev/vdb4', ...}])
@david-z
Member
david-z commented Aug 26, 2016

@dachary I have updated the test cases according to the failure. Please take a look. Thanks.
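Roughly, the expectation has to follow the new default layout, where block, block.db and block.wal are all colocated on the data disk; a hedged sketch of the adjusted check (hypothetical helper, not the exact diff):

import json

def check_colocated_bluestore_layout(c, disk, osd_uuid):
    # Hedged sketch: the colocated bluestore layout now yields 4 partitions
    # (data, block.db, block.wal, block) instead of 2. `c` is the CephDisk
    # helper from ceph-disk-test.py.
    device = json.loads(c.sh("ceph-disk list --format json " + disk))[0]
    assert len(device['partitions']) == 4
    for space in ('block', 'block.db', 'block.wal'):
        c.check_osd_status(osd_uuid, space)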

@dachary
Member
dachary commented Aug 26, 2016

@david-z could you please add this commit https://github.com/ceph/ceph/pull/10825/commits to your branch so that we can run tests more easily? It will have to be removed before merge, but we need it right now; otherwise the bluestore tests fail.

@david-z
Member
david-z commented Aug 26, 2016 edited

@dachary Sure, I have added this commit. Please let me know when this PR is ready to merge; I will then remove this commit from my branch before you actually merge it.

@dachary
Member
dachary commented Aug 29, 2016

@david-z the fix for the fsck error was merged and the workaround is no longer necessary. Would you be so kind as to remove it and rebase against master ?

@david-z
Member
david-z commented Aug 30, 2016

@dachary I have done what you suggested. Please take another look. Thanks.

@dachary
Member
dachary commented Aug 31, 2016

pushed to gitbuilders under wip-pr-10135

@dachary
Member
dachary commented Sep 1, 2016 edited

teuthology-suite --verbose --suite-branch master --ceph wip-pr-10135 --suite ceph-disk --filter centos_7 --machine-type vps --email loic@dachary.org

@david-z
Member
david-z commented Sep 2, 2016

@dachary Is this test failure caused by the error below?

2016-09-01T12:37:31.891 INFO:teuthology.orchestra.run.vpm179.stderr:rmdir: failed to remove ‘/home/ubuntu/cephtest’: Directory not empty
2016-09-01T12:37:31.892 ERROR:teuthology.run_tasks:Manager failed: internal.base
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 139, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/internal.py", line 54, in base
    wait=False,
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 426, in wait
    proc.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 166, in wait
    label=self.label)
CommandFailedError: Command failed on vpm179 with status 1: 'find /home/ubuntu/cephtest -ls ; rmdir -- /home/ubuntu/cephtest
@david-z
Member
david-z commented Sep 12, 2016

@dachary I didn't see anything in the last test failure related to this PR. Could you please take a look and let me know if there is anything I can do? Thanks.

@dachary
Member
dachary commented Sep 12, 2016

@david-z the following fails because it does not have enough disks to work with. This means that a previous test did not zap the disks as expected.

2016-09-01T12:35:59.995 INFO:tasks.workunit.client.0.vpm179.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_activate_two_separated_journal FAILED
2016-09-01T12:35:59.995 INFO:tasks.workunit.client.0.vpm179.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_activate_reuse_journal PASSED
2016-09-01T12:35:59.995 INFO:tasks.workunit.client.0.vpm179.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py::TestCephDisk::test_activate_multipath PASSED
2016-09-01T12:35:59.995 INFO:tasks.workunit.client.0.vpm179.stdout:
2016-09-01T12:35:59.995 INFO:tasks.workunit.client.0.vpm179.stdout:=================================== FAILURES ===================================
2016-09-01T12:35:59.995 INFO:tasks.workunit.client.0.vpm179.stdout:_______________ TestCephDisk.test_activate_two_separated_journal _______________
2016-09-01T12:35:59.995 INFO:tasks.workunit.client.0.vpm179.stdout:
2016-09-01T12:35:59.996 INFO:tasks.workunit.client.0.vpm179.stdout:self = 
2016-09-01T12:35:59.996 INFO:tasks.workunit.client.0.vpm179.stdout:
2016-09-01T12:35:59.996 INFO:tasks.workunit.client.0.vpm179.stdout:    def test_activate_two_separated_journal(self):
2016-09-01T12:35:59.996 INFO:tasks.workunit.client.0.vpm179.stdout:        c = CephDisk()
2016-09-01T12:35:59.996 INFO:tasks.workunit.client.0.vpm179.stdout:        disks = c.unused_disks()
2016-09-01T12:35:59.996 INFO:tasks.workunit.client.0.vpm179.stdout:        data_disk = disks[0]
2016-09-01T12:35:59.996 INFO:tasks.workunit.client.0.vpm179.stdout:        other_data_disk = disks[1]
2016-09-01T12:35:59.996 INFO:tasks.workunit.client.0.vpm179.stdout:>       journal_disk = disks[2]
2016-09-01T12:35:59.996 INFO:tasks.workunit.client.0.vpm179.stdout:E       IndexError: list index out of range
2016-09-01T12:35:59.996 INFO:tasks.workunit.client.0.vpm179.stdout:
2016-09-01T12:35:59.996 INFO:tasks.workunit.client.0.vpm179.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py:621: IndexError
2016-09-01T12:35:59.996 INFO:tasks.workunit.client.0.vpm179.stdout:==================== 1 failed, 22 passed in 961.17 seconds =====================
@dachary
Member
dachary commented Sep 12, 2016 edited

Running the ceph-disk suite again to assert this is not a transient error:

with the same error as above

@david-z
Member
david-z commented Sep 13, 2016

the following fails because it does not have enough disks to work with. This means that a previous test did not zap the disks as expected.

@dachary Thanks for the explanation. But unfortunately the same error happened again this time.

@dachary
Member
dachary commented Sep 13, 2016

@david-z since the error happens consistently, it probably means the test you added does not free all the devices it should

@david-z
Member
david-z commented Sep 14, 2016

@dachary Thanks for the advice. I updated some test cases to zap the disks after each test finishes. Could you please trigger the ceph-disk suite test again?
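The clean-up is conceptually simple (a hedged sketch with a hypothetical helper; the actual change in ceph-disk-test.py may be structured differently):

def zap_used_disks(c, disks):
    # Sketch: once a test is done (and its OSDs destroyed), zap every disk it
    # touched so the next test finds them in unused_disks() again.
    # `c` is the CephDisk helper from ceph-disk-test.py.
    for disk in disks:
        c.sh("ceph-disk --verbose zap " + disk)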

@dachary
Member
dachary commented Sep 14, 2016

pushed to gitbuilders under wip-pr-10135

@dachary
Member
dachary commented Sep 16, 2016 edited

teuthology-suite --verbose --suite-branch master --ceph wip-pr-10135 --suite ceph-disk --filter centos_7 --machine-type vps --email loic@dachary.org

@david-z
Member
david-z commented Sep 18, 2016

@dachary I noticed the last test run also failed because of the test case "test_activate_two_separated_journal".

From the log, it seems creating the journal partition on /dev/vdd failed. But I ran this test case in my environment with 10.2.2 and it passed. I see this test case runs after "test_activate_multipath"; shall we zap the disks in "test_activate_two_separated_journal" anyway before creating the OSD?

Thanks.

2016-09-16T15:20:56.017 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:command: Running command: /usr/sbin/sgdisk -F /dev/vdd
2016-09-16T15:20:56.027 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:create_partition: stderr
2016-09-16T15:20:56.028 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:create_partition: Creating journal partition num 1 size 100 on /dev/vdd
2016-09-16T15:20:56.028 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:command_check_call: Running command: /usr/sbin/sgdisk --new=1:Creating new GPT entries.
2016-09-16T15:20:56.028 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:2048:+100M --change-name=1:ceph journal --partition-guid=1:77de82ad-78a2-426f-9971-703262f4accb --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/vdd
2016-09-16T15:20:56.046 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:Creating new GPT entries.
2016-09-16T15:20:56.046 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:Could not create partition 1 from 0 to 204799
2016-09-16T15:20:56.046 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:Unable to set partition 1's name to 'ceph journal'!
2016-09-16T15:20:56.046 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:Could not change partition 1's type code to 45b0969e-9b03-4f30-b4c6-b4b80ceff106!
2016-09-16T15:20:56.047 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:Error encountered; not saving changes.
2016-09-16T15:20:56.147 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:Traceback (most recent call last):
2016-09-16T15:20:56.147 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:File "/usr/sbin/ceph-disk", line 9, in <module>
2016-09-16T15:20:56.147 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
2016-09-16T15:20:56.147 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5178, in run
2016-09-16T15:20:56.147 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:main(sys.argv[1:])
2016-09-16T15:20:56.147 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5129, in main
2016-09-16T15:20:56.148 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:args.func(args)
2016-09-16T15:20:56.148 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1830, in main
2016-09-16T15:20:56.148 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:Prepare.factory(args).prepare()
2016-09-16T15:20:56.148 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1818, in prepare
2016-09-16T15:20:56.148 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:self.prepare_locked()
2016-09-16T15:20:56.148 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1850, in prepare_locked
2016-09-16T15:20:56.148 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:self.data.prepare(self.journal)
2016-09-16T15:20:56.148 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2587, in prepare
2016-09-16T15:20:56.148 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:self.prepare_device(*to_prepare_list)
2016-09-16T15:20:56.149 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2763, in prepare_device
2016-09-16T15:20:56.158 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:to_prepare.prepare()
2016-09-16T15:20:56.158 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2026, in prepare
2016-09-16T15:20:56.159 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:self.prepare_device()
2016-09-16T15:20:56.159 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2116, in prepare_device
2016-09-16T15:20:56.159 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:num=num)
2016-09-16T15:20:56.159 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1578, in create_partition
2016-09-16T15:20:56.159 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:self.path,
2016-09-16T15:20:56.159 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 474, in command_check_call
2016-09-16T15:20:56.159 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:return subprocess.check_call(arguments)
2016-09-16T15:20:56.159 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
2016-09-16T15:20:56.160 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:raise CalledProcessError(retcode, cmd)
2016-09-16T15:20:56.160 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:subprocess.CalledProcessError: Command '['/usr/sbin/sgdisk', '--new=1:Creating new GPT entries.\n2048:+100M', '--change-name=1:ceph journal', '--partition-guid=1:77de82ad-78a2-426f-9971-703262f4accb', '--typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106', '--mbrtogpt', '--', '/dev/vdd']' returned non-zero exit status 4
2016-09-16T15:22:13.966 INFO:tasks.workunit.client.0.vpm043.stdout:=================================== FAILURES ===================================
2016-09-16T15:22:13.966 INFO:tasks.workunit.client.0.vpm043.stdout:_______________ TestCephDisk.test_activate_two_separated_journal _______________
2016-09-16T15:22:13.966 INFO:tasks.workunit.client.0.vpm043.stdout:
2016-09-16T15:22:13.966 INFO:tasks.workunit.client.0.vpm043.stdout:self = <ceph-disk-test.TestCephDisk object at 0x28068d0>
2016-09-16T15:22:13.966 INFO:tasks.workunit.client.0.vpm043.stdout:
2016-09-16T15:22:13.966 INFO:tasks.workunit.client.0.vpm043.stdout:    def test_activate_two_separated_journal(self):
2016-09-16T15:22:13.966 INFO:tasks.workunit.client.0.vpm043.stdout:        c = CephDisk()
2016-09-16T15:22:13.966 INFO:tasks.workunit.client.0.vpm043.stdout:        disks = c.unused_disks()
2016-09-16T15:22:13.966 INFO:tasks.workunit.client.0.vpm043.stdout:        data_disk = disks[0]
2016-09-16T15:22:13.967 INFO:tasks.workunit.client.0.vpm043.stdout:        other_data_disk = disks[1]
2016-09-16T15:22:13.967 INFO:tasks.workunit.client.0.vpm043.stdout:        journal_disk = disks[2]
2016-09-16T15:22:13.967 INFO:tasks.workunit.client.0.vpm043.stdout:>       osd_uuid = self.activate_separated_journal(data_disk, journal_disk)
2016-09-16T15:22:13.967 INFO:tasks.workunit.client.0.vpm043.stdout:
2016-09-16T15:22:13.967 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py:624:
2016-09-16T15:22:13.967 INFO:tasks.workunit.client.0.vpm043.stdout:_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2016-09-16T15:22:13.967 INFO:tasks.workunit.client.0.vpm043.stdout:../../../workunit.client.0/ceph-disk/ceph-disk-test.py:603: in activate_separated_journal
2016-09-16T15:22:13.967 INFO:tasks.workunit.client.0.vpm043.stdout:    " " + data_disk + " " + journal_disk)
2016-09-16T15:22:13.967 INFO:tasks.workunit.client.0.vpm043.stdout:_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2016-09-16T15:22:13.968 INFO:tasks.workunit.client.0.vpm043.stdout:
2016-09-16T15:22:13.968 INFO:tasks.workunit.client.0.vpm043.stdout:command = 'ceph-disk --verbose prepare --osd-uuid 29bcf1e8-7c21-11e6-b88d-5254002abbaa /dev/vdb /dev/vdd'
2016-09-16T15:22:13.968 INFO:tasks.workunit.client.0.vpm043.stdout:
2016-09-16T15:22:13.968 INFO:tasks.workunit.client.0.vpm043.stdout:    @staticmethod
2016-09-16T15:22:13.968 INFO:tasks.workunit.client.0.vpm043.stdout:    def sh(command):
2016-09-16T15:22:13.968 INFO:tasks.workunit.client.0.vpm043.stdout:        LOG.debug(":sh: " + command)
2016-09-16T15:22:13.968 INFO:tasks.workunit.client.0.vpm043.stdout:        proc = subprocess.Popen(
2016-09-16T15:22:13.968 INFO:tasks.workunit.client.0.vpm043.stdout:            args=command,
2016-09-16T15:22:13.968 INFO:tasks.workunit.client.0.vpm043.stdout:            stdout=subprocess.PIPE,
2016-09-16T15:22:13.968 INFO:tasks.workunit.client.0.vpm043.stdout:            stderr=subprocess.STDOUT,
2016-09-16T15:22:13.969 INFO:tasks.workunit.client.0.vpm043.stdout:            shell=True,
2016-09-16T15:22:13.969 INFO:tasks.workunit.client.0.vpm043.stdout:            bufsize=1)
2016-09-16T15:22:13.969 INFO:tasks.workunit.client.0.vpm043.stdout:        lines = []
2016-09-16T15:22:13.969 INFO:tasks.workunit.client.0.vpm043.stdout:        with proc.stdout:
2016-09-16T15:22:13.969 INFO:tasks.workunit.client.0.vpm043.stdout:            for line in iter(proc.stdout.readline, b''):
2016-09-16T15:22:13.969 INFO:tasks.workunit.client.0.vpm043.stdout:                line = line.decode('utf-8')
2016-09-16T15:22:13.969 INFO:tasks.workunit.client.0.vpm043.stdout:                if 'dangerous and experimental' in line:
2016-09-16T15:22:13.969 INFO:tasks.workunit.client.0.vpm043.stdout:                    LOG.debug('SKIP dangerous and experimental')
2016-09-16T15:22:13.969 INFO:tasks.workunit.client.0.vpm043.stdout:                    continue
2016-09-16T15:22:13.969 INFO:tasks.workunit.client.0.vpm043.stdout:                lines.append(line)
2016-09-16T15:22:13.970 INFO:tasks.workunit.client.0.vpm043.stdout:                LOG.debug(line.strip().encode('ascii', 'ignore'))
2016-09-16T15:22:13.970 INFO:tasks.workunit.client.0.vpm043.stdout:        if proc.wait() != 0:
2016-09-16T15:22:13.970 INFO:tasks.workunit.client.0.vpm043.stdout:            raise subprocess.CalledProcessError(
2016-09-16T15:22:13.970 INFO:tasks.workunit.client.0.vpm043.stdout:                returncode=proc.returncode,
2016-09-16T15:22:13.970 INFO:tasks.workunit.client.0.vpm043.stdout:>               cmd=command
2016-09-16T15:22:13.970 INFO:tasks.workunit.client.0.vpm043.stdout:            )
2016-09-16T15:22:13.970 INFO:tasks.workunit.client.0.vpm043.stdout:E           CalledProcessError: Command 'ceph-disk --verbose prepare --osd-uuid 29bcf1e8-7c21-11e6-b88d-5254002abbaa /dev/vdb /dev/vdd' returned non-zero exit status 1
@dachary
Member
dachary commented Sep 19, 2016 edited

I'll set up an environment to try and fix this.

teuthology-suite --verbose --suite-branch wip-ceph-disk --ceph wip-pr-10135 --suite ceph-disk --filter centos_7 --machine-type vps --email loic@dachary.org

@dachary
Member
dachary commented Sep 19, 2016
2016-09-16T15:20:56.028 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:command_check_call: Running command: /usr/sbin/sgdisk --new=1:Creating new GPT entries.
2016-09-16T15:20:56.028 INFO:tasks.workunit.client.0.vpm043.stderr:DEBUG:CephDisk:2048:+100M --change-name=1:ceph journal --partition-guid=1:77de82ad-78a2-426f-9971-703262f4accb --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/vdd

The --new=1:Creating new GPT entries. ... is bogus and happens consistently. You should be able to reproduce it by running all the tests in sequence. This is because 3546193 uses too much of the output from sgdisk -F and should be fixed.
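A possible shape of the fix, for reference (a sketch with a hypothetical helper, assuming the intent is to take only the sector number from the last line of the sgdisk -F output; not the actual commit):

def first_aligned_sector(sgdisk_f_output):
    # Sketch: `sgdisk -F` may print extra lines such as
    # 'Creating new GPT entries.' before the sector number, so only the last
    # non-empty line should be used.
    lines = [l.strip() for l in sgdisk_f_output.splitlines() if l.strip()]
    if lines and lines[-1].isdigit():
        return int(lines[-1])
    raise ValueError('unexpected sgdisk -F output: %r' % sgdisk_f_output)

# e.g. on a freshly zapped disk:
assert first_aligned_sector('Creating new GPT entries.\n2048\n') == 2048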

@dachary
Member
dachary commented Sep 19, 2016

I was able to pass all tests without 3546193. Unless there is a compelling reason to include this commit, you can remove it so we can merge this pull request, and then try to figure out how to fix the implementation of 3546193 later.

Zhi Zhang added some commits Jul 12, 2016
Zhi Zhang ceph-disk: support creating block.db and block.wal with customized size for bluestore

Signed-off-by: Zhi Zhang <zhangz.david@outlook.com>
4c1cd4a
Zhi Zhang ceph-disk: update/add ceph-disk test cases for bluestore
Signed-off-by: Zhi Zhang <zhangz.david@outlook.com>
67b11b0
@david-z
Member
david-z commented Sep 20, 2016

@dachary Thanks, I have removed commit 3546193. Please take a look.

The reason I committed 3546193 was to prepare for another change to ceph-disk later. Bluestore requires 4K block alignment, but sometimes ceph-disk won't get such a partition on a whole disk, so an OSD with bluestore can't start. What I am thinking about is calculating the partition size with 4K block alignment based on either bluestore_block_XXX_size or the default value.

Anyway, we can discuss this further later. :)
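To illustrate the idea (a sketch of the proposal only, with a hypothetical helper; not code from this PR):

def align_down_to_4k(size_bytes):
    # Sketch: round a requested bluestore partition size (from
    # bluestore_block_*_size or the default) down to a 4096-byte boundary so
    # the resulting partition size is 4K-aligned.
    return (size_bytes // 4096) * 4096

assert align_down_to_4k(67108864) == 67108864  # already aligned (64 MiB)
assert align_down_to_4k(67110000) == 67108864  # trimmed down to the 4K boundary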

@dachary dachary added the needs-qa label Sep 22, 2016
@dachary
Member
dachary commented Sep 22, 2016

I think it's good to merge now. As a precaution it would be good to have it included in a qa run (maybe from @liewegas?). I don't expect any surprises, but better safe than sorry :-) Great work!

@dachary
Member
dachary commented Sep 22, 2016

Nothing but the ceph-disk suite uses ceph-disk (ceph-qa-suite/tasks/ceph.py runs ceph-osd --mkfs directly), so there is no need to run more suites.

@dachary dachary removed the needs-qa label Sep 22, 2016
@dachary dachary merged commit eb968f8 into ceph:master Sep 22, 2016

2 checks passed

Signed-off-by: all commits in this PR are signed
default: Build finished.