qa: move ceph-helpers-based make check tests to qa/standalone; run via teuthology #16513

Merged: 11 commits into ceph:master from liewegas:wip-standalone on Jul 25, 2017

Conversation

@liewegas
Member

liewegas commented Jul 23, 2017

  • make check is now short (3m is the slowest task, shec)
  • those tests moved to qa/standalone, with readme
  • ceph_objecstore_tool.py disabled for now
  • lots of little fixes to make the tests actually work!

@liewegas liewegas requested a review from tchaikov Jul 23, 2017

@liewegas liewegas added this to the luminous milestone Jul 23, 2017

@liewegas


Member

liewegas commented Jul 23, 2017

These are reliably triggering a mon bug with the osdmap osd_state bits via osd-reuse-id.sh. I'm not sure why it's not triggering when running locally, but either way it's unrelated to this change, so I'm happy to merge this (and grow a teuthology failure) since it's an urgent bug anyway. See http://tracker.ceph.com/issues/20751

@wjwithagen


Contributor

wjwithagen commented Jul 24, 2017

@liewegas @tchaikov
I see a lot of tests moving out.
It would be nice/handy if they moved to something like: LongTest.
FreeBSD will not have Teuthology anytime soon, but I do like/need these tests to get at least a bit of QA.

@tchaikov


Contributor

tchaikov commented Jul 24, 2017

@liewegas aside from the commits for debugging, the changes LGTM. Also, I checked the failure of crush-choose-args.sh; it seems it was caused by 48a9023:

#1  0x00007f5405c1fa26 in handle_fatal_signal(int) ()
#2  <signal handler called>
#3  0x00007f54019a61d7 in raise () from /lib64/libc.so.6
#4  0x00007f54019a78c8 in abort () from /lib64/libc.so.6
#5  0x00007f5405978814 in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) ()
#6  0x00007f5405616252 in OSDMap::get_uuid(int) const [clone .part.164] ()
#7  0x00007f5405a60923 in OSDMap::dump(ceph::Formatter*) const ()
#8  0x00007f54058afae4 in OSDMonitor::encode_pending(std::shared_ptr<MonitorDBStore::Transaction>) ()

So probably we can have another round of the rados suite after it is removed?

@tchaikov


Contributor

tchaikov commented Jul 24, 2017

@wjwithagen We can add them back with a launcher for those tests, but we don't get that for free: it means some more work. I can help explore possible approaches to fix it in the future, but it's not my first priority right now.

@@ -381,7 +388,7 @@ function test_kill_daemons() {
 # @param ... can be any option valid for ceph-mon
 # @return 0 on success, 1 on error
 #
-function run_mon_no_pool() {
+function run_mon() {


@tchaikov

tchaikov Jul 24, 2017

Contributor

This change breaks ceph-disk/tests/ceph-disk.sh, because that script expects run_mon() to create the rbd pool for it. See test_activate_dir() and read_write().
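
For illustration, a minimal sketch of the kind of follow-up fix this implies; the function body below is illustrative, not the actual ceph-disk.sh code. Callers that relied on run_mon() creating the pool would now create it themselves:

function test_activate_dir() {
    local dir=$1
    run_mon $dir a || return 1
    run_mgr $dir x || return 1
    create_rbd_pool || return 1   # previously done implicitly by run_mon()
    # ... rest of the test unchanged ...
}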

@wjwithagen


Contributor

wjwithagen commented Jul 24, 2017

@tchaikov @liewegas
Otherwise I have to put porting Teuthology at the top of my list, and I've already broken my fingers on it twice. So I'd rather do it from the tree; even if that needs some work, it would allow me to run things from Jenkins as well.

What problems do we run into if we make it a separate target in CMake?
Would we need a teuthology "emulator"?

@liewegas


Member

liewegas commented Jul 24, 2017

@wjwithagen you can still run the tests; they just don't run automatically with make check. Maybe a wrapper script like run-all-standalone.sh would be in order? It can pass all the right env args. I'll add it.
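
For illustration, a rough sketch of what such a wrapper could look like; the paths, env vars, and loop below are assumptions for the sketch, not the final qa/run-standalone.sh:

#!/usr/bin/env bash
# Sketch: run every standalone script from the build directory,
# exporting the paths the tests need (exact variables are an assumption).
set -e
cd "$(dirname "$0")/../build"
export PATH="$PWD/bin:$PATH"
export LD_LIBRARY_PATH="$PWD/lib"
for t in ../qa/standalone/*/*.sh; do
    echo "=== $t ==="
    bash "$t" || exit 1
done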

liewegas added some commits Jul 20, 2017

qa/tasks/workunit: allow alt basedir
Instead of 'qa/workunits' allow something like 'qa/standalone'.

Signed-off-by: Sage Weil <sage@redhat.com>
unittest_compression: speed it up
~4m -> ~25s

Signed-off-by: Sage Weil <sage@redhat.com>
deb,rpm: require socat for ceph-test
Used by qa/standalone/mon/mon-bind.sh.

Signed-off-by: Sage Weil <sage@redhat.com>
ceph: drop --admin-socket warning/error on ceph cli
'ceph daemon' has existed for a long time; this has outlived its
usefulness.

Signed-off-by: Sage Weil <sage@redhat.com>
@wjwithagen


Contributor

wjwithagen commented Jul 24, 2017

@liewegas
That would be great. I'd stick that in my Jenkins after the regular tests.
Slicing them into smaller bits would help shorten the runtime, but that is not what this PR is about.

@dzafman


Member

dzafman commented Jul 24, 2017

@liewegas No yaml for osd-scrub-repair?

liewegas added some commits Jul 21, 2017

qa: move ceph-helpers and misc src/test/*.sh tests to qa/standalone
- stop running via make check
- add teuthology yamls to run them
- disable ceph_objecstore_tool.py for now (too slow for make check, and
we can't use vstart in teuthology via a package install)
- drop cephtool tests since those are already covered by other teuthology
tests
- leave a handful of (fast!) ceph-helpers tests for make check for minimal
integration tests.

Signed-off-by: Sage Weil <sage@redhat.com>
qa/standalone/ceph-helpers: factor rbd pool create out of run_mon
Signed-off-by: Sage Weil <sage@redhat.com>
qa/standalone/scrub: separate scrub/repair tests from rest of osd/
They are slow.  Run them separately.

Signed-off-by: Sage Weil <sage@redhat.com>
mds/MDSMap: init mds_features
This can lead to ceph-dencoder reencode failures.  It doesn't matter too
much in the real world since nobody decodes these ancient mds_info_t
structs.

Signed-off-by: Sage Weil <sage@redhat.com>
qa/run-standalone.sh: helper to run all standalone tests
Nothing fancy, but documents how these are run.

Signed-off-by: Sage Weil <sage@redhat.com>
@liewegas


Member

liewegas commented Jul 25, 2017

@tchaikov this one is pissing me off. I can't run the ceph-disk test locally because I get

7: py27 installed: alabaster==0.7.8,ansible==2.2.0.0,Babel==2.3.4,backports.ssl-match-hostname==3.5.0.1,Beaker==1.5.4,beautifulsoup4==4.5.1,ceph-detect-init==1.0.1,-e git+git@github.com:ceph/ceph@7c157863a80b1310b6cf85a432aaeb427a1b4f1e#egg=ceph_disk&subdirectory=src/ceph-disk,cffi==1.7.0,chardet==2.3.0,CherryPy==3.5.0,click==6.7,configobj==5.0.6,configparser==3.5.0,coverage==4.4.1,cryptography==1.5.3,cssselect==0.9.2,custodia==0.1.0,Cython==0.24.1,decorator==4.0.10,discover==0.4.0,dnspython==1.15.0,dockerfile-parse==0.0.5,docutils==0.12,ecdsa==0.13,enum34==1.0.4,extras==1.0.0,fedpkg==1.25,file-magic==0.3.0,fixtures==3.0.0,flake8==3.0.4,Flask==0.11.1,funcsigs==1.0.2,futures==3.0.4,gitdb==0.6.4,GitPython==1.0.1,gssapi==1.2.0,html5lib==0.999,httplib2==0.9.2,idna==2.0,imagesize==0.7.1,iniparse==0.4,ipaclient==4.4.2,ipaddress==1.0.16,ipalib==4.4.2,ipaplatform==4.4.2,ipapython==4.4.2,IPy==0.81,itsdangerous==0.24,Jinja2==2.8,jwcrypto==0.3.2,kerberos==1.2.5,kitchen==1.2.4,langtable==0.0.36,linecache2==1.0.0,lockfile==0.11.0,logutils==0.3.3,lxml==3.6.4,M2Crypto==0.25.1,Mako==1.0.6.dev0,MarkupSafe==0.23,mccabe==0.5.3,mercurial==3.8.1,mock==2.0.0,munch==2.0.4,netaddr==0.7.18,netifaces==0.10.4,nose==1.3.7,offtrac==0.1.0,openlmi==0.6.0,ordered-set==2.0.0,osbs-client==0.32,packagedb-cli==2.13,paramiko==2.0.0,Paste==2.0.3,pbr==1.10.0,pecan==1.1.2,Pillow==3.4.2,pluggy==0.3.1,ply==3.8,prettytable==0.7.2,py==1.4.31,pyasn1==0.1.9,pyasn1-modules==0.0.8,pycodestyle==2.0.0,pycparser==2.14,pycrypto==2.6.1,pycryptopp==0.6.0.1206569328141510525648634803928199668821045408958,pycurl==7.43.0,pyenchant==1.6.8,pyflakes==1.2.3,Pygments==2.1.3,pygobject==3.22.0,pygpgme==0.3,pykickstart==2.32,pyliblzma==0.5.3,pyOpenSSL==16.0.0,pyparsing==2.1.5,pyparted==3.10.7,PySocks==1.5.6,pytest==2.9.2,python-augeas==0.5.0,python-bugzilla==1.2.2,python-dateutil==2.6.0,python-dmidecode==3.12.2,python-fedora==0.8.0,python-keyczar==0.71rc0,python-ldap==2.4.25,python-mimeparse==1.6.0,python-nss==1.0.0,python-subunit==1.2.0,python-yubico==1.3.2,pytz==2016.6.1,pyudev==0.21.0,pyusb==1.0.0,pywbem==0.9.0,pyxattr==0.5.3,PyXB==1.2.4,PyYAML==3.11,qrcode==5.1,requests==2.10.0,requests-kerberos==0.10.0,rpkg==1.46,rpm-python==4.13.0,simplegeneric==0.8.1,simplejson==3.5.3,singledispatch==3.4.0.3,six==1.10.0,slip==0.6.4,slip.dbus==0.6.4,smmap==0.9.0,snowballstemmer==1.2.1,Sphinx==1.4.8,sphinx-rtd-theme==0.1.9,SSSDConfig==1.14.2,systemd-python==232,Tempita==0.5.1,testrepository==0.0.20,testtools==2.3.0,tox==2.3.1,traceback2==1.4.0,typing==3.5.2.2,unittest2==1.1.0,urlgrabber==3.10.1,urllib3==1.15.1,virtualenv==15.0.3,waitress==0.9.0,WebOb==1.6.1,WebTest==2.0.23,Werkzeug==0.11.10,yum-langpacks==0.4.5,yum-metadata-parser==1.1.4
7: py27 runtests: PYTHONHASHSEED='2459449752'
7: py27 runtests: commands[0] | coverage run --append --source=ceph_disk /nvm/src/ceph3/src/ceph-disk/.tox/py27/bin/py.test -vv /nvm/src/ceph3/src/ceph-disk/tests/test_main.py
7: No file to run: '/nvm/src/ceph3/src/ceph-disk/.tox/py27/bin/py.test'
7: ERROR: InvocationError: '/nvm/src/ceph3/src/ceph-disk/.tox/py27/bin/coverage run --append --source=ceph_disk /nvm/src/ceph3/src/ceph-disk/.tox/py27/bin/py.test -vv /nvm/src/ceph3/src/ceph-disk/tests/test_main.py'
7: ___________________________________ summary ____________________________________
7:   flake8: commands succeeded
7: ERROR:   py27: commands failed
1/1 Test #7: run-tox-ceph-disk ................***Failed    9.90 sec

and this time when I pushed it I got a totally different set of errors.. :/
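
(If the venv is just stale, one generic tox workaround is to force the environment to be rebuilt; the config path below is taken from the log above, and this may well not be the root cause here:)

# recreate the py27 tox environment (-r), in case .tox lost its py.test
tox -c src/ceph-disk/tox.ini -r -e py27
# or simply remove the cached env and re-run the test
rm -rf src/ceph-disk/.tox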

@tchaikov


Contributor

tchaikov commented Jul 25, 2017

@liewegas ack. lemme take a look.

@@ -294,6 +294,7 @@ function test_reuse_osd_id() {
     run_mon $dir a || return 1
     run_mgr $dir x || return 1
     create_rbd_pool
+    ceph osd pool create bar 128


@tchaikov

tchaikov Jul 25, 2017

Contributor

@liewegas why would you want to create more pgs here?
With the following patch, the ceph-disk.sh test passes:

diff --git a/src/ceph-disk/tests/ceph-disk.sh b/src/ceph-disk/tests/ceph-disk.sh
index 92ff163f6a..0995e8dd25 100755
--- a/src/ceph-disk/tests/ceph-disk.sh
+++ b/src/ceph-disk/tests/ceph-disk.sh
@@ -294,7 +294,7 @@ function test_reuse_osd_id() {
     run_mon $dir a || return 1
     run_mgr $dir x || return 1
     create_rbd_pool
-    ceph osd pool create bar 128
+    # ceph osd pool create bar 128

     test_activate $dir $dir/dir1 --osd-uuid $(uuidgen) || return 1

@@ -309,7 +309,7 @@ function test_reuse_osd_id() {
     #
     # make sure the OSD is in use by the PGs
     #
-    wait_osd_id_used_by_pgs $osd_id 6 || return 1
+    wait_osd_id_used_by_pgs $osd_id $PG_NUM || return 1
     read_write $dir SOMETHING || return 1

     #

@wjwithagen

wjwithagen Jul 25, 2017

Contributor

@liewegas
ceph-disk has a set of rather convoluted and intricate ways to find out where its components and modules are, but it only assumes that it is either in a normally installed environment (/usr/bin or /usr/local/bin) or in ./bin when running from the build tree.
Otherwise you need to set some vars, like:

export PYTHONPATH=./pybind:/usr/srcs/Ceph/work/ceph/src/pybind:/usr/srcs/Ceph/work/ceph/build/lib/cython_modules/lib.2:
export LD_LIBRARY_PATH=/usr/srcs/Ceph/work/ceph/build/lib
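
For illustration, a possible invocation from a cmake build directory with those variables set; the paths are per-checkout assumptions, not something this PR prescribes:

cd ceph/build
export PYTHONPATH=../src/pybind:$PWD/lib/cython_modules/lib.2
export LD_LIBRARY_PATH=$PWD/lib
export PATH=$PWD/bin:$PATH
../src/ceph-disk/tests/ceph-disk.sh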
ceph-disk/tests/ceph-disk.sh: wait for right number of pgs
Signed-off-by: Sage Weil <sage@redhat.com>

@tchaikov tchaikov added the needs-qa label Jul 25, 2017

@tchaikov


Contributor

tchaikov commented Jul 25, 2017

LGTM as long as the rados suite passes. And I will test run-standalone.sh locally.

@liewegas liewegas merged commit d3e1c6c into ceph:master Jul 25, 2017

3 of 4 checks passed

make check (arm64): running make check
Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
make check: make check succeeded

@liewegas liewegas deleted the liewegas:wip-standalone branch Jul 25, 2017

@wjwithagen


Contributor

wjwithagen commented Jul 25, 2017

@liewegas @tchaikov
Nice PR; it reduces the basic build-and-test to 37 minutes (using ccache) versus 1.5 hours.

@dillaman


@wjwithagen


Contributor

wjwithagen commented Jul 26, 2017

@dillaman
Not good....
But I always have a serious problem finding the exact spot where teuthology decides that it has received a "fatal" error, let alone finding the qa code that is actually responsible for the problem.

@tchaikov


Contributor

tchaikov commented Jul 26, 2017

@dillaman @liewegas should be addressed by #16598

@liewegas


Member

liewegas commented Jul 26, 2017

@xiexingguo


Member

xiexingguo commented Jul 28, 2017

> LGTM as long as the rados suite passes. And I will test run-standalone.sh locally.

@tchaikov @liewegas Is there any way to run run-standalone.sh via vstart or ctest? I have a relevant issue, #16623, to figure out.

@tchaikov


Contributor

tchaikov commented Jul 28, 2017

@xiexingguo see #16646

cd build && ../qa/run-standalone.sh
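
If you only want a single script rather than the whole set, something along these lines should also work from the build dir; the environment variables are my reading of how ceph-helpers.sh locates the binaries, so treat them as assumptions:

cd build
# assumed: ceph-helpers.sh picks up CEPH_ROOT/CEPH_BIN/CEPH_LIB from the env
CEPH_ROOT=.. CEPH_BIN=bin CEPH_LIB=lib \
    PATH=$PWD/bin:$PATH ../qa/standalone/mon/mon-bind.sh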
@xiexingguo


Member

xiexingguo commented Jul 28, 2017

> @xiexingguo see #16646

Great!
Thanks, kefu, I will give it a try.
