
qa: split the osd-scrub-repair tests up #15868

Closed

Conversation

squidboylan
Contributor

It's better to have many small jobs than one large one; the shorter
ones can be run concurrently and are therefore faster.

Signed-off-by: Caleb Boylan calebboylan@gmail.com

@@ -0,0 +1,7 @@
#!/bin/bash

source $(dirname $0)/../detect-build-env-vars.sh
Contributor

you don't need a dedicated sh script to wrap src/test/osd/osd-scrub-repair.sh.

what you need is:

add_ceph_test(osd-scrub-repair.sh-jerasure
  ${CMAKE_CURRENT_SOURCE_DIR}/osd-scrub-repair.sh
    TEST_auto_repair_erasure_coded_appends
    TEST_auto_repair_erasure_coded_overwrites)
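For context, this works because the standalone scripts accept TEST_* function names as arguments and run only those. A simplified sketch of that dispatch, assuming the usual ceph-helpers.sh-style run()/setup/teardown convention rather than the exact osd-scrub-repair.sh code:

function run() {
    local dir=$1
    shift

    # run only the TEST_* functions passed on the command line (which is
    # what add_ceph_test passes through), or all of them if none are given
    local funcs=${@:-$(set | sed -n -e 's/^\(TEST_[0-9a-z_]*\) .*/\1/p')}
    local func
    for func in $funcs ; do
        setup $dir || return 1      # assumed helper: fresh test dir/cluster
        $func $dir || return 1
        teardown $dir || return 1   # assumed helper: tear the cluster down
    done
}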

Contributor Author

Thanks for this! Didn't realize I could do this but it totally makes sense.

@tchaikov tchaikov added the tests label Jun 23, 2017
@wjwithagen
Contributor

Is this going to work in parallel?

I would expect all the tests to start clobbering the same test directory, because vstart is going to create multiple clusters with the same naming. The log files are also going to get mixed up.

And if the tests are really fast, then building/destroying a cluster is a large overhead.
On top of that, the overwrite tests do not (yet) work on FreeBSD, so those tests only vstart/vstop. There is a PR (#15445) to fix that in osd-scrub-repair.sh, but it has not been committed yet.
If you put this fix in, I'd prefer to handle it in the CMakeLists.txt, so that tests that do not do any work are not executed at all.
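As an illustration of that last point only (a sketch with a guessed test name; the FreeBSD check and this CMakeLists.txt change are not part of this PR):

# only register the erasure-coded overwrite job on platforms where it
# actually does work; elsewhere it would just vstart/vstop and exit
if(NOT CMAKE_SYSTEM_NAME STREQUAL "FreeBSD")
  add_ceph_test(osd-scrub-repair-overwrites.sh
    ${CMAKE_CURRENT_SOURCE_DIR}/osd-scrub-repair.sh
    TEST_auto_repair_erasure_coded_overwrites)
endif()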

@squidboylan
Contributor Author

squidboylan commented Jun 23, 2017

Each test will now use its own directory, so they won't conflict. Splitting the tests should not add any more building/destroying of clusters; as I understand it, a new cluster is already created for each test, so this shouldn't add more overhead there.
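For instance (hypothetical naming, not the actual change), the per-job directory could simply be derived from the test function name:

# hypothetical sketch: one working directory and log location per TEST_* function
func=TEST_auto_repair_erasure_coded_appends
dir=td/osd-scrub-repair-$func   # unique per job: parallel runs don't clobber each other's state or logs
mkdir -p $dir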

@liewegas
Member

Need to verify the port numbers won't conflict if there are two instances of the script (w/ different tests) running in parallel...

@@ -32,7 +32,18 @@ function run() {
local dir=$1
shift

export CEPH_MON="127.0.0.1:7107" # git grep '\<7107\>' : there must be only one
# Grab a random port from 7109 - 7200, this should maybe be reworked
Contributor

Ports 7110 and 7111 and maybe others in your 7109 - 7200 range are in use. That's why the comment tells you to grep for any port numbers you want to use. For example:

$ git grep '\<7111\>' ../src
../src/test/osd/osd-copy-from.sh:    export CEPH_MON="127.0.0.1:7111" # git grep '\<7111\>' : there must be only one

I'd like to see a comment line "# git grep '<xxxx>' : there must be only one" for each port number you are using.

Contributor

A port 7111 conflict is one of the failures of the make check that Jenkins ran.

export CEPH_MON="127.0.0.1:7107" # git grep '\<7107\>' : there must be only one
# Grab a random port from 7109 - 7200, this should maybe be reworked
for port in `seq 7109 7200 | shuf` ; do
if ! $(nc -z localhost "$port" > /dev/null); then
Contributor

This seems racy. If two processes run nc on the same port number and don't see each other, they would both launch a mon with the conflicting port number.

Maybe the caller should pass a unique port number for each parallel subtest. There are 38 other tests that set CEPH_MON themselves. Maybe port numbers should be 8000 + (100 * test#) + subtest#. All other tests are using a unique port in the 7000-7999 range. We would fix the 4 tests that use 17108-17110 and 17119.
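For example (illustrative only; the variable names and how the numbers get assigned are assumptions, not an agreed scheme):

# hypothetical deterministic scheme: each script gets a test number and each
# split job a subtest number, so no two parallel jobs can pick the same port
TEST_NUMBER=${TEST_NUMBER:-3}         # assumed to be assigned per script
SUBTEST_NUMBER=${SUBTEST_NUMBER:-1}   # assumed to be assigned per split job
port=$((8000 + 100 * TEST_NUMBER + SUBTEST_NUMBER))
export CEPH_MON=127.0.0.1:$port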

Contributor Author

Yeah, it is racy, thanks for the comments. I like your fix of using a port tied to the test number. Using a fixed port per script causes problems when running things concurrently, but if the port is based on the test #, each test # is unique and that avoids the problem. I assume the test number is probably exposed as an environment variable or something; I'll look into that. Thanks!

@dzafman
Contributor

dzafman commented Jun 29, 2017

@squidboylan This pull request fixes http://tracker.ceph.com/issues/20242

I'd assign it to you, but I don't see your name there.
Please add "Fixes: http://tracker.ceph.com/issues/20242" to the commit comment.

@dzafman
Contributor

dzafman commented Jun 29, 2017

@squidboylan Shouldn't you remove the shell scripts you added earlier, now that you've changed CMakeLists.txt to break the test up that way?

@dzafman
Contributor

dzafman commented Jun 29, 2017

retest this please

@squidboylan
Contributor Author

I've been meaning to take another look at this, but classes have started again, so my time is extremely limited. If someone else wants to take this over to get it done sooner rather than later, they're welcome to. I would like to finish it, but I won't have time for a while.

@dzafman
Contributor

dzafman commented Jul 19, 2017

I'm working on this in my repo.

@dzafman dzafman closed this Jul 19, 2017