test: make check using cmake #10116

Merged: 9 commits, Jul 7, 2016

Contributor

@tchaikov tchaikov commented Jul 4, 2016

to test the "make check" with cmake, see #10111

@tchaikov tchaikov force-pushed the wip-cmake-make-check branch 8 times, most recently from eb42592 to 5a3bb18 Compare July 5, 2016 14:27
@tchaikov tchaikov changed the title [DNM] cmake make check cmake make check Jul 5, 2016
@tchaikov tchaikov changed the title cmake make check test: make check using cmake Jul 5, 2016
@tchaikov tchaikov assigned liewegas and alimaredia and unassigned liewegas and alimaredia Jul 5, 2016
@tchaikov tchaikov changed the title test: make check using cmake [DNM] test: make check using cmake Jul 5, 2016
@alimaredia
Contributor

@tchaikov is this still DNM? Earlier today we talked about a PR that was ready to merge, and I feel like I might have mixed up this PR and #10016 since the numbers are so similar.

@tchaikov
Contributor Author

tchaikov commented Jul 6, 2016

timeout tests

  • test-ceph-helpers.sh
  • ceph_objectstore_tool.py

failed/segfault test

  • test_async_compressor

@tchaikov tchaikov force-pushed the wip-cmake-make-check branch 2 times, most recently from 0413d65 to f0f89ee Compare July 6, 2016 12:09
@tchaikov tchaikov changed the title [DNM] test: make check using cmake test: make check using cmake Jul 6, 2016
@tchaikov
Contributor Author

tchaikov commented Jul 6, 2016

@alimaredia i think it's ready to merge despite the failures listed above; the autotools build also times out.

the only mystery is test_async_compressor.

@tchaikov
Contributor Author

tchaikov commented Jul 6, 2016

changelog

  • rebase against master
  • remove ALL from 'ceph-disk' and 'ceph-detect-init'
  • run "make check" in run-make-check.sh to prepare for running "ctest".

@tchaikov
Contributor Author

tchaikov commented Jul 6, 2016

seems we practically ran "make check" twice in https://jenkins.ceph.com/job/ceph-pull-requests/8638/console. in the first pass, all tests passed. in the second pass, 142/144 passed.

changelog

  • just run "CTEST_OUTPUT_ON_FAILURE=1 make check"
  • drop "ctest" call.

@tchaikov
Contributor Author

tchaikov commented Jul 7, 2016

retest this please.

Signed-off-by: Kefu Chai <kchai@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
create temp directory and files in $TMPDIR. before this change the temp
directory was hard-wired to /tmp. we'd better respect the env variable
$TMPDIR, so it is more consistent, and easier to clean up if needed.

Signed-off-by: Kefu Chai <kchai@redhat.com>
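The $TMPDIR change described in the commit message above can be sketched as follows. This is only an illustration under stated assumptions: the helper name `make_test_dir` and the `ceph-test` prefix are hypothetical and not taken from the Ceph tree.

```shell
#!/bin/sh
# Hypothetical helper (not from the Ceph tree): create a scratch directory
# under $TMPDIR, falling back to /tmp only when TMPDIR is unset or empty.
make_test_dir() {
    mktemp -d "${TMPDIR:-/tmp}/ceph-test.XXXXXX"
}

dir=$(make_test_dir) || exit 1
# Remove the scratch directory when the script exits.
trap 'rm -rf "$dir"' EXIT
echo "scratch dir: $dir"
```

Keeping every temporary file under one directory derived from $TMPDIR means a single `rm -rf` is enough for cleanup.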
we should use the cmake function of add_ceph_test() to add
osd-copy-from.sh as a test. then we won't miss any env variables.
w/o this change, $CEPH_BUILD_VIRTUALENV is not passed to
osd-copy-from.sh.

Signed-off-by: Kefu Chai <kchai@redhat.com>
"rados -p rbd put foo rados" does not work if "rados" is not in current
path. so change it to "rados -p rbd put foo $(which rados)"

Signed-off-by: Kefu Chai <kchai@redhat.com>
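The fix above uses the full path of a binary as the file to upload, so the command no longer depends on the working directory. A minimal sketch of the same idea, assuming no Ceph installation is available: `ls` stands in for `rados`, and POSIX `command -v` is used in place of `which`.

```shell
#!/bin/sh
# Resolve a command name to its full path so it can be used as an input file
# from any working directory. `ls` is a stand-in for `rados` (assumption).
payload=$(command -v ls) || exit 1
[ -f "$payload" ] && echo "payload file: $payload"
```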
This reverts commit 19c0731.

Signed-off-by: Kefu Chai <kchai@redhat.com>
"make all" does not offer "ceph-disk" and "ceph-detect-init" for
testing. as they are solely used for testing purpose. instead, these two
python command line packages are installed by the "install" target. so
we need to use "make check" to 1) prepare the test dependencies 2)
launch ctest to perform the test.

Signed-off-by: Kefu Chai <kchai@redhat.com>
@tchaikov
Contributor Author

tchaikov commented Jul 7, 2016

changelog

  • TEST_PARALLEL_LEVEL=$(get_processors) make check

@tchaikov tchaikov force-pushed the wip-cmake-make-check branch 3 times, most recently from 8e7199e to 79027eb Compare July 7, 2016 11:27
@wjwithagen
Contributor

@tchaikov
Perhaps obvious, but another good place to save time is to disable the bench tests in:
unittest_bufferlist (106 sec)
unittest_bufferlist.sh (111 sec)

other time killers are:
unittest_erasure_code_shec_all (405 sec)
erasure-decode-non-regression.sh (236 sec)

@tchaikov
Contributor Author

tchaikov commented Jul 7, 2016

@wjwithagen i am trying to bisect the timeout failure, not to reduce the running time of the unit tests. maybe we can do that later.

@wjwithagen
Contributor

@tchaikov
Anything I can do to help?

@wjwithagen
Contributor

ceph-helpers has a few tests that do things like:
ceph --connect-timeout 60 status
you might want to cut that time, or even rewrite these tests.

@tchaikov
Contributor Author

tchaikov commented Jul 7, 2016

@wjwithagen there are surely some places we can improve. but i just want to fix the jenkins "make check" this time. if you could reproduce the timeout/failure, that would be great.

Signed-off-by: Kefu Chai <kchai@redhat.com>
@liewegas
Member

liewegas commented Jul 7, 2016

Why is make check faster? Is it doing less? Or is ctest in parallel breaking because of tests interfering?

@liewegas
Member

liewegas commented Jul 7, 2016

In any case, I have no problems with merging this now and fixing it up later, since it appears as though make check on autotools is also broken right now.

@tchaikov
Contributor Author

tchaikov commented Jul 7, 2016

Or is ctest in parallel breaking because of tests interfering?

yes, that's why "make check" with a parallel level of 1 (the default value) is faster: it simply does not time out. anyway, i put the parallelism back into "make check" by adding the "CTEST_PARALLEL_LEVEL" env var.
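A minimal sketch of how the parallel level might be passed to "make check" through the environment. `count_processors` is a hypothetical stand-in for the `get_processors` helper mentioned earlier in this thread, whose real implementation may differ. The final command is printed rather than executed so the sketch runs outside a build tree.

```shell
#!/bin/sh
# Hypothetical stand-in for get_processors; the real helper may differ.
count_processors() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    elif getconf _NPROCESSORS_ONLN >/dev/null 2>&1; then
        getconf _NPROCESSORS_ONLN
    else
        echo 1    # conservative fallback: run tests serially
    fi
}

njobs=$(count_processors)
# CTEST_PARALLEL_LEVEL and CTEST_OUTPUT_ON_FAILURE are environment variables
# read by ctest; a "check" target that launches ctest inherits them.
echo "CTEST_PARALLEL_LEVEL=$njobs CTEST_OUTPUT_ON_FAILURE=1 make check"
```

Setting the level through the environment keeps the default at 1 when the variable is absent, which matches the serial behaviour described above.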

@tchaikov
Contributor Author

tchaikov commented Jul 7, 2016

so three failures:

  • qa/workunits/ceph-helpers.sh: after disabling test_pg_scrub(), it does not time out anymore. // autotools build also suffers from this.
  • ceph_objectstore_tool.py: times out. not sure why. // autotools build also suffers from this.
  • unittest_async_compressor: segfault in "AsyncCompressorTest.SyntheticTest"

@alimaredia alimaredia merged commit 3ed34b5 into ceph:master Jul 7, 2016
@wjwithagen
Contributor

@tchaikov
(Did not get around to posting this, since we had guests.)
I see what you mean, my large cmake pull also gets terminated after 2 hours and 1 minute.
Whereas when I run it here, it does not even come close to that.

And running ctest -j$(nproc) did not help, because you get errors?
My parallel runtime is actually determined by the runtime of test-ceph-helpers.sh.

Looking at the time distribution (on FreeBSD) I have:

real    1m8.570s user   0m3.683s Completed test: test_kill_daemons
real    1m8.541s user   0m3.740s Completed test: test_kill_daemon
real    0m20.751s user  0m8.724s Completed test: test_objectstore_tool
real    0m18.850s user  0m8.174s Completed test: test_pg_scrub
real    0m18.797s user  0m7.962s Completed test: test_repair
real    0m18.570s user  0m7.736s Completed test: test_wait_for_scrub
real    0m17.268s user  0m8.351s Completed test: test_get_not_primary
real    0m16.298s user  0m7.836s Completed test: test_get_osds
real    0m15.268s user  0m6.992s Completed test: test_wait_for_clean
real    0m15.174s user  0m7.957s Completed test: test_run_osd
real    0m13.851s user  0m6.713s Completed test: test_is_clean
real    0m12.756s user  0m5.642s Completed test: test_get_last_scrub_stamp
real    0m12.629s user  0m5.866s Completed test: test_get_primary
real    0m12.555s user  0m5.661s Completed test: test_get_num_pgs
real    0m12.549s user  0m5.689s Completed test: test_get_pg
real    0m12.516s user  0m5.787s Completed test: test_get_num_active_clean
real    0m9.601s user   0m5.077s Completed test: test_destroy_osd
real    0m8.764s user   0m4.167s Completed test: test_wait_for_osd
real    0m8.614s user   0m4.145s Completed test: test_run_mon
real    0m7.996s user   0m4.118s Completed test: test_activate_osd
real    0m6.163s user   0m3.206s Completed test: test_get_config
real    0m4.177s user   0m0.010s Completed test: test_wait_background
real    0m3.244s user   0m1.833s Completed test: test_erasure_code_plugin_exists
real    0m2.168s user   0m1.159s Completed test: test_set_config
real    0m2.107s user   0m1.226s Completed test: test_get_is_making_recovery_progress
real    0m1.945s user   0m0.894s Completed test: test_display_logs
real    0m1.066s user   0m0.009s Completed test: test_run_in_background
real    0m0.027s user   0m0.008s Completed test: test_expect_failure
real    0m0.019s user   0m0.002s Completed test: test_setup
real    0m0.013s user   0m0.003s Completed test: test_teardown
Total Test time (real) = 410.92 sec
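To see how much of the total the individual functions account for, the `real` column of a listing like the one above can be summed. A minimal sketch, assuming each line has the form `real XmY.YYYs user ...` and arrives on stdin. The function name `sum_real` is hypothetical, and the two sample lines are copied from the listing.

```shell
#!/bin/sh
# Hypothetical helper: sum the "real" column of time-style lines read from
# stdin, where field 2 looks like 1m8.570s. Prints the total in seconds.
sum_real() {
    awk '$1 == "real" {
        split($2, t, "m")      # t[1] = minutes, t[2] = seconds plus trailing "s"
        sub(/s$/, "", t[2])
        total += t[1] * 60 + t[2]
    }
    END { printf "%.3f\n", total }'
}

printf '%s\n' \
    'real    1m8.570s user   0m3.683s Completed test: test_kill_daemons' \
    'real    0m20.751s user  0m8.724s Completed test: test_objectstore_tool' |
    sum_real
```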

@tchaikov tchaikov deleted the wip-cmake-make-check branch July 8, 2016 12:23
@tchaikov
Contributor Author

tchaikov commented Jul 8, 2016

And running ctest -j$(nproc) did not help, because you get errors?

but i don't get any error with "-jN" locally.

@wjwithagen
Contributor

Okay, so it was only timeout trouble.
I did notice that a serious amount of paging (in) occurred when I was running the tests in parallel.
That made for an extra 10-20% run time on every test.
