
qa: split the osd-scrub-repair tests up #15868

Closed

Conversation

squidboylan
Contributor

It's better to have many small jobs than one large one; the shorter
ones can be run concurrently and are therefore faster.

Signed-off-by: Caleb Boylan calebboylan@gmail.com

@@ -0,0 +1,7 @@
#!/bin/bash

source $(dirname $0)/../detect-build-env-vars.sh
Contributor

you don't need a dedicated sh script to wrap src/test/osd/osd-scrub-repair.sh.

what you need is:

add_ceph_test(osd-scrub-repair.sh-jerasure
  ${CMAKE_CURRENT_SOURCE_DIR}/osd-scrub-repair.sh
    TEST_auto_repair_erasure_coded_appends
    TEST_auto_repair_erasure_coded_overwrites)
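For context, this works because the standalone scripts accept TEST_* function names as arguments and run only those. A simplified sketch of that dispatch, assuming the usual ceph-helpers.sh-style run()/setup/teardown convention rather than the exact osd-scrub-repair.sh code:

function run() {
    local dir=$1
    shift

    # run only the TEST_* functions passed on the command line (which is
    # what add_ceph_test passes through), or all of them if none are given
    local funcs=${@:-$(set | sed -n -e 's/^\(TEST_[0-9a-z_]*\) .*/\1/p')}
    local func
    for func in $funcs ; do
        setup $dir || return 1      # assumed helper: fresh test dir/cluster
        $func $dir || return 1
        teardown $dir || return 1   # assumed helper: tear the cluster down
    done
}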

Contributor Author

Thanks for this! Didn't realize I could do this but it totally makes sense.

@tchaikov tchaikov added the tests label Jun 23, 2017
@wjwithagen
Contributor

Is this going to work in parallel?

I would expect all the tests to start clobbering the same test directory, because vstart is going to create multiple clusters with the same naming. The log files are also going to get mixed up.

And if the tests are really fast, then building/destroying a cluster is a large overhead.
On top of that, the overwrite tests do not (yet) work on FreeBSD, so those tests only vstart/vstop. There is a PR (#15445) to fix that in osd-scrub-repair.sh, but it has not been committed yet.
If you put this fix in, I'd prefer to handle it in the CMakeLists.txt, so that tests that do not do any work are not executed at all.
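As an illustration of that last point only (a sketch with a guessed test name; the FreeBSD check and this CMakeLists.txt change are not part of this PR):

# only register the erasure-coded overwrite job on platforms where it
# actually does work; elsewhere it would just vstart/vstop and exit
if(NOT CMAKE_SYSTEM_NAME STREQUAL "FreeBSD")
  add_ceph_test(osd-scrub-repair-overwrites.sh
    ${CMAKE_CURRENT_SOURCE_DIR}/osd-scrub-repair.sh
    TEST_auto_repair_erasure_coded_overwrites)
endif()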

@squidboylan
Contributor Author

squidboylan commented Jun 23, 2017

Each test will now use its own directory, so they won't conflict. Splitting the tests should not add any more building/destroying of clusters; as I understand it, a new cluster is already created for each test, so this shouldn't add more overhead there.
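For instance (hypothetical naming, not the actual change), the per-job directory could simply be derived from the test function name:

# hypothetical sketch: one working directory and log location per TEST_* function
func=TEST_auto_repair_erasure_coded_appends
dir=td/osd-scrub-repair-$func   # unique per job: parallel runs don't clobber each other's state or logs
mkdir -p $dir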

@liewegas
Member

Need to verify the port numbers won't conflict if there are two instances of the script (w/ different tests) running in parallel...

@@ -32,7 +32,18 @@ function run() {
local dir=$1
shift

export CEPH_MON="127.0.0.1:7107" # git grep '\<7107\>' : there must be only one
# Grab a random port from 7109 - 7200, this should maybe be reworked
Contributor

Ports 7110 and 7111 and maybe others in your 7109 - 7200 range are in use. That's why the comment tells you to grep for any port numbers you want to use. For example:

$ git grep '\<7111\>' ../src
../src/test/osd/osd-copy-from.sh:    export CEPH_MON="127.0.0.1:7111" # git grep '\<7111\>' : there must be only one

I'd like to see a comment line "# git grep '<xxxx>' : there must be only one" for each port number you are using.

Contributor

A port 7111 conflict is one of the failures of the make check that Jenkins ran.

export CEPH_MON="127.0.0.1:7107" # git grep '\<7107\>' : there must be only one
# Grab a random port from 7109 - 7200, this should maybe be reworked
for port in `seq 7109 7200 | shuf` ; do
if ! $(nc -z localhost "$port" > /dev/null); then
Contributor

This seems racy. If two processes run nc on the same port number and don't see each other, they would both launch a mon with the conflicting port number.

Maybe the caller should pass a unique port number for each parallel subtest. There are 38 other tests that set CEPH_MON themselves. Maybe port numbers should be 8000 + (100 * test#) + subtest#. All other tests are using a unique port in the 7000-7999 range. We would fix the 4 tests that use 17108-17110 and 17119.
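For example (illustrative only; the variable names and how the numbers get assigned are assumptions, not an agreed scheme):

# hypothetical deterministic scheme: each script gets a test number and each
# split job a subtest number, so no two parallel jobs can pick the same port
TEST_NUMBER=${TEST_NUMBER:-3}         # assumed to be assigned per script
SUBTEST_NUMBER=${SUBTEST_NUMBER:-1}   # assumed to be assigned per split job
port=$((8000 + 100 * TEST_NUMBER + SUBTEST_NUMBER))
export CEPH_MON=127.0.0.1:$port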

Contributor Author

Yeah, it is racy, thanks for the comments. I like your fix of using a port tied to the test number. Using a fixed port per script causes problems when running things concurrently, but if the port is based on the test #, each test # is unique and that avoids the problem. I assume the test number is probably exposed as an environment variable or something; I'll look into that. Thanks!

@dzafman
Contributor

dzafman commented Jun 29, 2017

@squidboylan This pull request fixes http://tracker.ceph.com/issues/20242

I'd assign it to you, but I don't see your name there.
Please add "Fixes: http://tracker.ceph.com/issues/20242" to the commit comment.

@dzafman
Contributor

dzafman commented Jun 29, 2017

@squidboylan Shouldn't you remove the shell scripts you added earlier, now that you've changed CMakeLists.txt to break the test up that way?

@dzafman
Contributor

dzafman commented Jun 29, 2017

retest this please

@squidboylan
Contributor Author

I've been meaning to take another look at this, but classes have started again, so my time is extremely limited. If someone else wants to take this over to get it done sooner rather than later, they're welcome to. I would like to finish it, but I won't have time for a while.

@dzafman
Contributor

dzafman commented Jul 19, 2017

I'm working on this in my repo.

@dzafman dzafman closed this Jul 19, 2017