
nautilus: mgr/fs/volumes misc fixes #36167

Merged
merged 7 commits into ceph:nautilus from the nautilus-mgr-fs-volumes-misc branch on Jul 27, 2020

Conversation

ajarr
Contributor

@ajarr ajarr commented Jul 18, 2020

@ajarr ajarr added the cephfs (Ceph File System) label Jul 18, 2020
@ajarr ajarr requested review from kotreshhr and batrick July 18, 2020 13:32
@smithfarm
Contributor

Python Unit-test Failure

FAIL: test_pool_create (tasks.mgr.dashboard.test_pool.PoolTest)

2020-07-18 13:27:50,157.157 INFO:__main__:----------------------------------------------------------------------
2020-07-18 13:27:50,157.157 INFO:__main__:Traceback (most recent call last):
2020-07-18 13:27:50,157.157 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/mgr/dashboard/test_pool.py", line 267, in test_pool_create
2020-07-18 13:27:50,157.157 INFO:__main__:    self.assertIn(conf, new_pool['configuration'])
2020-07-18 13:27:50,157.157 INFO:__main__:AssertionError: {'name': 'rbd_qos_bps_limit', 'source': 1, 'value': '2048'} not found in []
2020-07-18 13:27:50,157.157 INFO:__main__:
2020-07-18 13:27:50,157.157 INFO:__main__:----------------------------------------------------------------------
2020-07-18 13:27:50,157.157 INFO:__main__:Ran 82 tests in 1001.796s
2020-07-18 13:27:50,158.158 INFO:__main__:
2020-07-18 13:27:50,158.158 INFO:__main__:

@smithfarm
Contributor

jenkins test dashboard backend

@smithfarm smithfarm added this to the nautilus milestone Jul 18, 2020
@smithfarm smithfarm added the nautilus-batch-1 (nautilus point releases), needs-doc, and needs-qa labels and removed the needs-doc label Jul 18, 2020
@smithfarm
Contributor

@ajarr There are a couple more that you might consider including here:

@smithfarm
Contributor

smithfarm commented Jul 18, 2020

FAIL: test_selftest_cluster_log (tasks.mgr.test_module_selftest.TestModuleSelftest)

2020-07-18 16:45:02,758.758 INFO:__main__:----------------------------------------------------------------------
2020-07-18 16:45:02,759.759 INFO:__main__:Traceback (most recent call last):
2020-07-18 16:45:02,759.759 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/mgr/test_module_selftest.py", line 320, in test_selftest_cluster_log
2020-07-18 16:45:02,759.759 INFO:__main__:    priority, message)
2020-07-18 16:45:02,759.759 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/ceph_test_case.py", line 97, in __exit__
2020-07-18 16:45:02,760.760 INFO:__main__:    raise AssertionError("Expected log message not found: '{0}'".format(expected_pattern))
2020-07-18 16:45:02,760.760 INFO:__main__:AssertionError: Expected log message not found: '[WRN] foo bar warning'
2020-07-18 16:45:02,760.760 INFO:__main__:
2020-07-18 16:45:02,760.760 INFO:__main__:----------------------------------------------------------------------
2020-07-18 16:45:02,761.761 INFO:__main__:Ran 197 tests in 3474.817s
2020-07-18 16:45:02,761.761 INFO:__main__:
2020-07-18 16:45:02,761.761 INFO:__main__:

@ajarr
Contributor Author

ajarr commented Jul 18, 2020

@ajarr There are a couple more that you might consider including here:

Thanks, @smithfarm. The fixes for the following two issues are in different parts of the code base, so I preferred to handle them as separate PRs.

Even though the conflicts are minor, I've asked the PR author to backport it.

The PR is here: #36180

@smithfarm
Contributor

jenkins test dashboard backend

@ajarr
Contributor Author

ajarr commented Jul 20, 2020

@kotreshhr can you quickly take a look at this PR?

Contributor

@kotreshhr kotreshhr left a comment

lgtm

@ajarr
Contributor Author

ajarr commented Jul 20, 2020

@@ -98,9 +98,24 @@ def delete_fs_volume(self, volname, confirm):
"that is what you want, re-issue the command followed by " \
"--yes-i-really-mean-it.".format(volname)

ret, out, err = self.mgr.check_mon_command({
Contributor Author

@kotreshhr tests are failing now with:

AttributeError: 'Module' object has no attribute 'check_mon_command'

check_mon_command is not in nautilus. Is it OK just to use mon_command instead?

Contributor

@kotreshhr kotreshhr Jul 22, 2020

Yes, mon_command can be used. The only difference between them is that check_mon_command raises if ret != 0, so we can add that validation here in the code.

Contributor Author

As discussed, that'd raise an unhandled exception. If ret != 0, I just return the output of mon_command().
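For reference, a minimal sketch of the shape this change takes, assuming the nautilus MgrModule.mon_command() interface that returns a (ret, out, err) tuple; it mirrors the hunk quoted above but is not the exact patch:

# Sketch only: mon_command() never raises, so the caller checks ret itself,
# which is the validation check_mon_command() would otherwise do by raising.
ret, out, err = self.mgr.mon_command({
    'prefix': 'config get',
    'who': 'mon',                    # the 'who' target is revisited later in this thread
    'key': 'mon_allow_pool_delete',
    'format': 'json',
})
if ret != 0:
    # Hand the error straight back to the caller of the volume command
    # instead of letting an unhandled exception escape the mgr module.
    return ret, out, err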

Contributor

lgtm

Contributor

@kotreshhr kotreshhr left a comment

lgtm

@ajarr
Contributor Author

ajarr commented Jul 23, 2020

I now see that all the jobs in the above run fail at test_volume_create() when trying to remove the volume created by the test.

I see this in the mgr log, /a/rraja-2020-07-22_15:24:27-fs-ajarr-nautilus-testing-2020-07-22-distro-basic-smithi/5249500/remote/smithi169/log/ceph-mgr.x.log.gz

2020-07-22 18:55:00.182 7f9ea47dd700 10 mgr.server _allowed_command  client.admin capable
2020-07-22 18:55:00.182 7f9ea47dd700  0 log_channel(audit) log [DBG] : from='client.18473 -' entity='client.admin' cmd=[{"yes-i-really-mean-it": "--yes-i-really-mean-it", "prefix": "fs volume rm", "vol_name": "volume_906", "target": ["mgr", ""]}]: dispatch
2020-07-22 18:55:00.182 7f9ea47dd700 10 mgr.server _handle_command passing through 4
2020-07-22 18:55:00.182 7f9ea4fde700 20 mgr Gil Switched to new thread state 0x56015b9b10d0
2020-07-22 18:55:00.182 7f9ea4fde700  1 -- 172.21.15.169:0/12022 --> [v2:172.21.15.169:3300/0,v1:172.21.15.169:6789/0] -- mon_command({"prefix": "config get", "who": "mon", "key": "mon_allow_pool_delete", "format": "json"} v 0) v1 -- 0x56015fd65400 con 0x560158f68d80
2020-07-22 18:55:00.182 7f9eb56c5700  1 -- 172.21.15.169:0/12022 <== mon.1 v2:172.21.15.169:3300/0 11406 ==== mon_command_ack([{"prefix": "config get", "who": "mon", "key": "mon_allow_pool_delete", "format": "json"}]=-22 unrecognized entity 'mon' v146) v1 ==== 147+0+0 (crc 0 0 0) 0x5601589bf180 con 0x560158f68d80
2020-07-22 18:55:00.182 7f9ea2fda700  1 -- 172.21.15.169:0/12022 --> [v2:172.21.15.169:3300/0,v1:172.21.15.169:6789/0] -- mon_get_version(what=osdmap handle=4) v1 -- 0x560159c9a000 con 0x560158f68d80
2020-07-22 18:55:00.182 7f9eb56c5700  1 -- 172.21.15.169:0/12022 <== mon.1 v2:172.21.15.169:3300/0 11407 ==== mon_get_version_reply(handle=4 version=711) v2 ==== 24+0+0 (crc 0 0 0) 0x5601581f44c0 con 0x560158f68d80
2020-07-22 18:55:00.182 7f9eb3ec2700 10 MonCommandCompletion::finish()
2020-07-22 18:55:00.182 7f9eb3ec2700 20 mgr Gil Switched to new thread state 0x56015b3bc580
2020-07-22 18:55:00.182 7f9eb3ec2700 20 mgr ~Gil Destroying new thread state 0x56015b3bc580
2020-07-22 18:55:00.182 7f9eb3ec2700 10 mgr notify_all notify_all: notify_all command
2020-07-22 18:55:00.182 7f9ea4fde700 20 mgr[volumes] mon_command: 'config get' -> -22 in 0.001s
2020-07-22 18:55:00.182 7f9ea4fde700 20 mgr ~Gil Destroying new thread state 0x56015b9b10d0
2020-07-22 18:55:00.182 7f9ea4fde700 -1 mgr.server reply reply (22) Invalid argument unrecognized entity 'mon'
2020-07-22 18:55:00.182 7f9ea4fde700  1 -- [v2:172.21.15.169:6800/12022,v1:172.21.15.169:6801/12022] --> 172.21.15.161:0/2540246774 -- command_reply(tid 0: -22 unrecognized entity 'mon') v1 -- 0x56015b90a960 con 0x56016061c480

And in the mon log, /a/rraja-2020-07-22_15:24:27-fs-ajarr-nautilus-testing-2020-07-22-distro-basic-smithi/5249500/remote/smithi169/log/ceph-mon.b.log.gz

2020-07-22 18:55:00.182 7fd19c149700  1 -- [v2:172.21.15.169:3300/0,v1:172.21.15.169:6789/0] <== mgr.4113 172.21.15.169:0/12022 7904 ==== mon_command({"prefix": "config get", "who": "mon", "key": "mon_allow_pool_delete", "format": "json"} v 0) v1 ==== 130+0+0 (crc 0 0 0) 0x55d4042e2800 con 0x55d401db0d00
2020-07-22 18:55:00.182 7fd19c149700 20 mon.b@1(peon) e1 _ms_dispatch existing session 0x55d40234e6c0 for mgr.4113
2020-07-22 18:55:00.182 7fd19c149700 20 mon.b@1(peon) e1  caps allow profile mgr
2020-07-22 18:55:00.182 7fd19c149700  0 mon.b@1(peon) e1 handle_command mon_command({"prefix": "config get", "who": "mon", "key": "mon_allow_pool_delete", "format": "json"} v 0) v1
2020-07-22 18:55:00.182 7fd19c149700 20 is_capable service=config command=config get read addr 172.21.15.169:0/12022 on cap allow profile mgr
2020-07-22 18:55:00.182 7fd19c149700 20  allow so far , doing grant allow profile mgr
2020-07-22 18:55:00.182 7fd19c149700 20  match
2020-07-22 18:55:00.182 7fd19c149700 10 mon.b@1(peon) e1 _allowed_command capable
2020-07-22 18:55:00.182 7fd19c149700  0 log_channel(audit) log [DBG] : from='mgr.4113 172.21.15.169:0/12022' entity='mgr.x' cmd=[{"prefix": "config get", "who": "mon", "key": "mon_allow_pool_delete", "format": "json"}]: dispatch
2020-07-22 18:55:00.182 7fd19c149700  1 -- [v2:172.21.15.169:3300/0,v1:172.21.15.169:6789/0] --> [v2:172.21.15.169:3300/0,v1:172.21.15.169:6789/0] -- log(1 entries from seq 771 at 2020-07-22 18:55:00.187022) v1 -- 0x55d4025efb00 con 0x55d40101a400
2020-07-22 18:55:00.182 7fd19c149700 10 mon.b@1(peon).paxosservice(config 1..146) dispatch 0x55d4042e2800 mon_command({"prefix": "config get", "who": "mon", "key": "mon_allow_pool_delete", "format": "json"} v 0) v1 from mgr.4113 172.21.15.169:0/12022 con 0x55d401db0d00
2020-07-22 18:55:00.182 7fd19c149700  5 mon.b@1(peon).paxos(paxos active c 6025..6773) is_readable = 1 - now=2020-07-22 18:55:00.187101 lease_expire=2020-07-22 18:55:04.535168 has v0 lc 6773
2020-07-22 18:55:00.182 7fd19c149700  2 mon.b@1(peon) e1 send_reply 0x55d4040e5570 0x55d4059e6240 mon_command_ack([{"prefix": "config get", "who": "mon", "key": "mon_allow_pool_delete", "format": "json"}]=-22 unrecognized entity 'mon' v146) v1
2020-07-22 18:55:00.182 7fd19c149700  1 -- [v2:172.21.15.169:3300/0,v1:172.21.15.169:6789/0] --> 172.21.15.169:0/12022 -- mon_command_ack([{"prefix": "config get", "who": "mon", "key": "mon_allow_pool_delete", "format": "json"}]=-22 unrecognized entity 'mon' v146) v1 -- 0x55d4059e6240 con 0x55d401db0d00

@@ -98,12 +98,14 @@ def delete_fs_volume(self, volname, confirm):
"that is what you want, re-issue the command followed by " \
"--yes-i-really-mean-it.".format(volname)

ret, out, err = self.mgr.check_mon_command({
ret, out, err = self.mgr.mon_command({
Contributor Author

@ajarr ajarr Jul 23, 2020

From the test failure, it looks like the error could be due to this line.

Contributor

I tested it locally. The config command does not accept 'mon'; it expects a specific mon, like mon.0.
It works with 'mon.*', though, so please change it and it should work.
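In isolation, the suggested change is just the 'who' target of the command dict (a sketch, not the exact patch):

# Sketch of the suggested fix: the nautilus mons reject who='mon' for
# 'config get' with -EINVAL (the "-22 unrecognized entity 'mon'" seen in
# the logs above) but accept 'mon.*'.
ret, out, err = self.mgr.mon_command({
    'prefix': 'config get',
    'who': 'mon.*',                  # was 'mon'
    'key': 'mon_allow_pool_delete',
    'format': 'json',
})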

@ajarr ajarr force-pushed the nautilus-mgr-fs-volumes-misc branch from 0174573 to 6a7efb1 Compare July 24, 2020 09:21
@ajarr
Contributor Author

ajarr commented Jul 24, 2020

@ajarr
Contributor Author

ajarr commented Jul 24, 2020

I see this error now here,
http://pulpito.front.sepia.ceph.com/rraja-2020-07-24_11:33:52-fs-ajarr-nautilus-testing-2020-07-24-distro-basic-smithi/5253495/teuthology.log

2020-07-24T14:10:24.831 INFO:tasks.cephfs_test_runner:======================================================================
2020-07-24T14:10:24.832 INFO:tasks.cephfs_test_runner:ERROR: test_volume_rm_arbitrary_pool_removal (tasks.cephfs.test_volumes.TestVolumes)
2020-07-24T14:10:24.832 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2020-07-24T14:10:24.832 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2020-07-24T14:10:24.833 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_ajarr-nautilus-testing-2020-07-24/qa/tasks/cephfs/test_volumes.py", line 325, in test_volume_rm_arbitrary_pool_removal
2020-07-24T14:10:24.833 INFO:tasks.cephfs_test_runner:    vol_status = json.loads(self._fs_cmd("status", self.volname, "--format=json-pretty"))
2020-07-24T14:10:24.834 INFO:tasks.cephfs_test_runner:  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
2020-07-24T14:10:24.834 INFO:tasks.cephfs_test_runner:    return _default_decoder.decode(s)
2020-07-24T14:10:24.834 INFO:tasks.cephfs_test_runner:  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
2020-07-24T14:10:24.834 INFO:tasks.cephfs_test_runner:    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
2020-07-24T14:10:24.835 INFO:tasks.cephfs_test_runner:  File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
2020-07-24T14:10:24.835 INFO:tasks.cephfs_test_runner:    raise JSONDecodeError("Expecting value", s, err.value) from None
2020-07-24T14:10:24.835 INFO:tasks.cephfs_test_runner:json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)
2020-07-24T14:10:24.836 INFO:tasks.cephfs_test_runner:

And this earlier in teuthology.log,

2020-07-24T13:32:19.990 INFO:tasks.ceph.mgr.y.smithi154.stderr:Exception in thread puregejob.2:
2020-07-24T13:32:19.990 INFO:tasks.ceph.mgr.y.smithi154.stderr:Traceback (most recent call last):
2020-07-24T13:32:19.991 INFO:tasks.ceph.mgr.y.smithi154.stderr:  File "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
2020-07-24T13:32:19.991 INFO:tasks.ceph.mgr.y.smithi154.stderr:    self.run()
2020-07-24T13:32:19.991 INFO:tasks.ceph.mgr.y.smithi154.stderr:  File "/usr/share/ceph/mgr/volumes/fs/async_job.py", line 61, in run
2020-07-24T13:32:19.991 INFO:tasks.ceph.mgr.y.smithi154.stderr:    self.async_job.unregister_async_job(vol_job[0], vol_job[1], thread_id)
2020-07-24T13:32:19.991 INFO:tasks.ceph.mgr.y.smithi154.stderr:  File "/usr/share/ceph/mgr/volumes/fs/async_job.py", line 152, in unregister_async_job
2020-07-24T13:32:19.992 INFO:tasks.ceph.mgr.y.smithi154.stderr:    log.debug("unregistering async job {0}.{1} from thread {2}".format(volname, job, thread_id))
2020-07-24T13:32:19.992 INFO:tasks.ceph.mgr.y.smithi154.stderr:  File "/usr/lib64/python2.7/logging/__init__.py", line 1137, in debug
2020-07-24T13:32:19.992 INFO:tasks.ceph.mgr.y.smithi154.stderr:    self._log(DEBUG, msg, args, **kwargs)
2020-07-24T13:32:19.992 INFO:tasks.ceph.mgr.y.smithi154.stderr:  File "/usr/lib64/python2.7/logging/__init__.py", line 1268, in _log
2020-07-24T13:32:19.992 INFO:tasks.ceph.mgr.y.smithi154.stderr:    self.handle(record)
2020-07-24T13:32:19.993 INFO:tasks.ceph.mgr.y.smithi154.stderr:  File "/usr/lib64/python2.7/logging/__init__.py", line 1278, in handle
2020-07-24T13:32:19.993 INFO:tasks.ceph.mgr.y.smithi154.stderr:    self.callHandlers(record)
2020-07-24T13:32:19.993 INFO:tasks.ceph.mgr.y.smithi154.stderr:  File "/usr/lib64/python2.7/logging/__init__.py", line 1318, in callHandlers
2020-07-24T13:32:19.993 INFO:tasks.ceph.mgr.y.smithi154.stderr:    hdlr.handle(record)
2020-07-24T13:32:19.993 INFO:tasks.ceph.mgr.y.smithi154.stderr:  File "/usr/lib64/python2.7/logging/__init__.py", line 749, in handle
2020-07-24T13:32:19.994 INFO:tasks.ceph.mgr.y.smithi154.stderr:    self.emit(record)
2020-07-24T13:32:19.994 INFO:tasks.ceph.mgr.y.smithi154.stderr:  File "/usr/share/ceph/mgr/mgr_module.py", line 65, in emit
2020-07-24T13:32:19.994 INFO:tasks.ceph.mgr.y.smithi154.stderr:    self._module._ceph_log(ceph_level, self.format(record))
2020-07-24T13:32:19.994 INFO:tasks.ceph.mgr.y.smithi154.stderr:TypeError: must be string without null bytes, not str
2020-07-24T13:32:19.994 INFO:tasks.ceph.mgr.y.smithi154.stderr:

@kotreshhr
Contributor

I see this error now here,
http://pulpito.front.sepia.ceph.com/rraja-2020-07-24_11:33:52-fs-ajarr-nautilus-testing-2020-07-24-distro-basic-smithi/5253495/teuthology.log

2020-07-24T14:10:24.831 INFO:tasks.cephfs_test_runner:======================================================================
2020-07-24T14:10:24.832 INFO:tasks.cephfs_test_runner:ERROR: test_volume_rm_arbitrary_pool_removal (tasks.cephfs.test_volumes.TestVolumes)
2020-07-24T14:10:24.832 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2020-07-24T14:10:24.832 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2020-07-24T14:10:24.833 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_ajarr-nautilus-testing-2020-07-24/qa/tasks/cephfs/test_volumes.py", line 325, in test_volume_rm_arbitrary_pool_removal
2020-07-24T14:10:24.833 INFO:tasks.cephfs_test_runner:    vol_status = json.loads(self._fs_cmd("status", self.volname, "--format=json-pretty"))
2020-07-24T14:10:24.834 INFO:tasks.cephfs_test_runner:  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
2020-07-24T14:10:24.834 INFO:tasks.cephfs_test_runner:    return _default_decoder.decode(s)
2020-07-24T14:10:24.834 INFO:tasks.cephfs_test_runner:  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
2020-07-24T14:10:24.834 INFO:tasks.cephfs_test_runner:    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
2020-07-24T14:10:24.835 INFO:tasks.cephfs_test_runner:  File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
2020-07-24T14:10:24.835 INFO:tasks.cephfs_test_runner:    raise JSONDecodeError("Expecting value", s, err.value) from None
2020-07-24T14:10:24.835 INFO:tasks.cephfs_test_runner:json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)
2020-07-24T14:10:24.836 INFO:tasks.cephfs_test_runner:

This is because the patch adding JSON support to the 'ceph fs status' command is missing. I think this needs the following two patches.

  1. 4da6381
  2. 138117f

And this earlier in teuthology.log,

2020-07-24T13:32:19.990 INFO:tasks.ceph.mgr.y.smithi154.stderr:Exception in thread puregejob.2:
2020-07-24T13:32:19.990 INFO:tasks.ceph.mgr.y.smithi154.stderr:Traceback (most recent call last):
2020-07-24T13:32:19.991 INFO:tasks.ceph.mgr.y.smithi154.stderr:  File "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
2020-07-24T13:32:19.991 INFO:tasks.ceph.mgr.y.smithi154.stderr:    self.run()
2020-07-24T13:32:19.991 INFO:tasks.ceph.mgr.y.smithi154.stderr:  File "/usr/share/ceph/mgr/volumes/fs/async_job.py", line 61, in run
2020-07-24T13:32:19.991 INFO:tasks.ceph.mgr.y.smithi154.stderr:    self.async_job.unregister_async_job(vol_job[0], vol_job[1], thread_id)
2020-07-24T13:32:19.991 INFO:tasks.ceph.mgr.y.smithi154.stderr:  File "/usr/share/ceph/mgr/volumes/fs/async_job.py", line 152, in unregister_async_job
2020-07-24T13:32:19.992 INFO:tasks.ceph.mgr.y.smithi154.stderr:    log.debug("unregistering async job {0}.{1} from thread {2}".format(volname, job, thread_id))
2020-07-24T13:32:19.992 INFO:tasks.ceph.mgr.y.smithi154.stderr:  File "/usr/lib64/python2.7/logging/__init__.py", line 1137, in debug
2020-07-24T13:32:19.992 INFO:tasks.ceph.mgr.y.smithi154.stderr:    self._log(DEBUG, msg, args, **kwargs)
2020-07-24T13:32:19.992 INFO:tasks.ceph.mgr.y.smithi154.stderr:  File "/usr/lib64/python2.7/logging/__init__.py", line 1268, in _log
2020-07-24T13:32:19.992 INFO:tasks.ceph.mgr.y.smithi154.stderr:    self.handle(record)
2020-07-24T13:32:19.993 INFO:tasks.ceph.mgr.y.smithi154.stderr:  File "/usr/lib64/python2.7/logging/__init__.py", line 1278, in handle
2020-07-24T13:32:19.993 INFO:tasks.ceph.mgr.y.smithi154.stderr:    self.callHandlers(record)
2020-07-24T13:32:19.993 INFO:tasks.ceph.mgr.y.smithi154.stderr:  File "/usr/lib64/python2.7/logging/__init__.py", line 1318, in callHandlers
2020-07-24T13:32:19.993 INFO:tasks.ceph.mgr.y.smithi154.stderr:    hdlr.handle(record)
2020-07-24T13:32:19.993 INFO:tasks.ceph.mgr.y.smithi154.stderr:  File "/usr/lib64/python2.7/logging/__init__.py", line 749, in handle
2020-07-24T13:32:19.994 INFO:tasks.ceph.mgr.y.smithi154.stderr:    self.emit(record)
2020-07-24T13:32:19.994 INFO:tasks.ceph.mgr.y.smithi154.stderr:  File "/usr/share/ceph/mgr/mgr_module.py", line 65, in emit
2020-07-24T13:32:19.994 INFO:tasks.ceph.mgr.y.smithi154.stderr:    self._module._ceph_log(ceph_level, self.format(record))
2020-07-24T13:32:19.994 INFO:tasks.ceph.mgr.y.smithi154.stderr:TypeError: must be string without null bytes, not str
2020-07-24T13:32:19.994 INFO:tasks.ceph.mgr.y.smithi154.stderr:

Not sure about this.

@kotreshhr
Contributor

I verified that both tests pass locally after pulling in 4da6381 and 138117f


2020-07-26 23:13:44,886.886 INFO:__main__:test_volume_rm_when_mon_delete_pool_false (tasks.cephfs.test_volumes.TestVolumes) ... ok
2020-07-26 23:13:44,887.887 INFO:__main__:Stopped test: test_volume_rm_when_mon_delete_pool_false (tasks.cephfs.test_volumes.TestVolumes) in 21.876021s
2020-07-26 23:13:44,887.887 INFO:__main__:
2020-07-26 23:13:44,888.888 INFO:__main__:----------------------------------------------------------------------
2020-07-26 23:13:44,888.888 INFO:__main__:Ran 1 test in 21.877s
2020-07-26 23:13:44,888.888 INFO:__main__:
2020-07-26 23:13:44,888.888 INFO:__main__:OK


2020-07-26 23:14:47,771.771 INFO:__main__:test_volume_rm_arbitrary_pool_removal (tasks.cephfs.test_volumes.TestVolumes) ... ok
2020-07-26 23:14:47,772.772 INFO:__main__:Stopped test: test_volume_rm_arbitrary_pool_removal (tasks.cephfs.test_volumes.TestVolumes) in 18.754503s
2020-07-26 23:14:47,772.772 INFO:__main__:
2020-07-26 23:14:47,773.773 INFO:__main__:----------------------------------------------------------------------
2020-07-26 23:14:47,773.773 INFO:__main__:Ran 1 test in 18.755s
2020-07-26 23:14:47,773.773 INFO:__main__:
2020-07-26 23:14:47,773.773 INFO:__main__:OK

@ajarr
Contributor Author

ajarr commented Jul 27, 2020

I verified that both tests pass locally after pulling in 4da6381 and 138117f

Thanks, Kotresh. But testing 4da6381 would require running the rados/mgr suite. Can we avoid that?

@kotreshhr
Contributor

I verified that both tests pass locally after pulling in 4da6381 and 138117f

Thanks, Kotresh. But testing 4da6381 would require running the rados/mgr suite. Can we avoid that?

Yes, we can work around it by using 'ceph osd pool ls' with the 'detail' option to validate whether any pool has the volume name as metadata. I have sent you the patch. Please apply it and run the test. Hopefully, there are no further surprises this time :)
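A hedged sketch of what that workaround could look like in the test; the helper name and the JSON field access are illustrative, not the exact patch that was sent:

import json

# Illustrative only: check whether any OSD pool still carries the volume
# name in its CephFS application metadata, using 'ceph osd pool ls detail'
# instead of parsing 'ceph fs status' (whose JSON output needs the missing
# backport). run_ceph_cmd stands in for the qa helper that returns the
# stdout of a 'ceph ...' command.
def volume_pools_exist(run_ceph_cmd, volname):
    pools = json.loads(run_ceph_cmd("osd", "pool", "ls", "detail", "--format=json"))
    for pool in pools:
        cephfs_md = pool.get("application_metadata", {}).get("cephfs", {})
        if volname in cephfs_md.values():
            return True
    return False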

kotreshhr and others added 7 commits July 27, 2020 18:05
During volume deletion, the associated pools are not always
removed. The pools are removed only if the volume was created
using the mgr plugin, and not if it was created with custom osd
pools. This is because the mgr plugin generates pool names with
a specific pattern, and both volume create and delete rely on
it. This patch fixes the issue by identifying the pools of the
volume without relying on the pattern.

Fixes: https://tracker.ceph.com/issues/45910
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit d07ea8d)

Conflicts:
	src/pybind/mgr/volumes/fs/operations/volume.py:
- In nautilus, fs volume create doesn't have placement arg
	src/pybind/mgr/volumes/fs/volume.py:
- In nautilus, VolumeClient code not moved to mgr_util.py
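A hedged sketch of the approach the commit message above describes: resolve a volume's pools from the FSMap and OSDMap exposed to mgr modules instead of reconstructing pool names from a naming pattern. The field access follows the standard mgr module get() dumps, but this is not the backported code itself.

# Sketch only: look the volume up in the FSMap and translate its pool ids
# to names through the OSDMap.
def get_pool_names(mgr, volname):
    fs_map = mgr.get('fs_map')
    osd_map = mgr.get('osd_map')
    names = {p['pool']: p['pool_name'] for p in osd_map['pools']}
    for fs in fs_map['filesystems']:
        mdsmap = fs['mdsmap']
        if mdsmap['fs_name'] == volname:
            metadata_pool = names[mdsmap['metadata_pool']]
            data_pools = [names[p] for p in mdsmap['data_pools']]
            return metadata_pool, data_pools
    return None, []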
This provides a generic framework for modifying Ceph configuration
in tests through the monitors rather than via the asok interface or
local ceph.conf changes. Any changes are reverted during test teardown.

A future patch will convert existing tests manipulating the local
ceph.conf or admin socket.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 8729281)
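As a rough illustration of the framework the commit message above describes (not the actual helper that was backported): apply options through the monitors with 'ceph config set', remember what was touched, and revert it all during teardown.

# Sketch only: run_ceph_cmd stands in for whatever the test harness uses
# to invoke 'ceph ...' against the cluster.
class ConfigChanger(object):
    def __init__(self, run_ceph_cmd):
        self.run_ceph_cmd = run_ceph_cmd
        self._changed = []

    def config_set(self, section, key, value):
        # Change the option cluster-wide via the monitors, not via asok
        # or a local ceph.conf edit.
        self.run_ceph_cmd('config', 'set', section, key, str(value))
        self._changed.append((section, key))

    def revert_all(self):
        # Called from the test's tearDown() so later tests see defaults.
        for section, key in reversed(self._changed):
            self.run_ceph_cmd('config', 'rm', section, key)
        self._changed = []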
Volume deletion wasn't validating the mon_allow_pool_delete config
before destroying volume metadata. Hence, when mon_allow_pool_delete
was set to false, it deleted the metadata but failed to delete the
pools, resulting in an inconsistent state. This patch validates the
config before going ahead with the deletion.

Fixes: https://tracker.ceph.com/issues/45662
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit e770bb9)
The loop logic would bail out as soon as it saw a file system that
did not match the volume it was looking for.

Fixes: https://tracker.ceph.com/issues/46277
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit be74a81)
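The bug class described in the commit message above, in miniature (names are illustrative, not the actual nautilus code):

# Buggy: the early 'return False' gives up after inspecting only the
# first file system in the map.
def volume_exists_buggy(fs_map, volname):
    for fs in fs_map['filesystems']:
        if fs['mdsmap']['fs_name'] == volname:
            return True
        return False
    return False

# Fixed: only report 'not found' after checking every file system.
def volume_exists_fixed(fs_map, volname):
    return any(fs['mdsmap']['fs_name'] == volname
               for fs in fs_map['filesystems'])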
... for python versions earlier than 3.5.

Signed-off-by: Ramana Raja <rraja@redhat.com>
... instead of 'check_mon_command', which is not in nautilus and is
not compatible with python2 and early versions of py3.

Signed-off-by: Ramana Raja <rraja@redhat.com>
…status'

Signed-off-by: Kotresh HR <khiremat@redhat.com>
@ajarr ajarr force-pushed the nautilus-mgr-fs-volumes-misc branch from 6a7efb1 to 3116a25 Compare July 27, 2020 13:26
@ajarr
Contributor Author

ajarr commented Jul 27, 2020

@kotreshhr do we still need 138117f ?

@kotreshhr
Contributor

@kotreshhr do we still need 138117f ?

Nope, we don't need it for this now.

@ajarr ajarr merged commit c0d6614 into ceph:nautilus Jul 27, 2020
@ajarr
Contributor Author

ajarr commented Jul 27, 2020

Failures were unrelated.

@neha-ojha
Member

neha-ojha commented Jul 29, 2020

Looks like #33325 is a follow-on fix for 8729281 based on #33325 (comment). Was it backported to nautilus with 9f323e0?

@batrick
Member

batrick commented Jul 29, 2020

No. @ajarr, please create a follow-up backport for #33325.

@yuriw
Contributor

yuriw commented Jul 29, 2020

@yuriw
Contributor

yuriw commented Jul 30, 2020

see #36377

@ajarr
Contributor Author

ajarr commented Jul 31, 2020

@neha-ojha @batrick @yuriw thanks! I missed that it was modifying ceph_test_case and needed rados suite testing.
