New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
qa: split the osd-scrub-repair tests up #15868
Conversation
@@ -0,0 +1,7 @@ | |||
#!/bin/bash | |||
|
|||
source $(dirname $0)/../detect-build-env-vars.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need to have a dedicated sh script for wrapping src/test/osd/osd-scrub-repair.sh
.
what you need is
add_ceph_test(osd-scrub-repair.sh-jerasure
${CMAKE_CURRENT_SOURCE_DIR}/osd-scrub-repair.sh
TEST_auto_repair_erasure_coded_appends
TEST_auto_repair_erasure_coded_overwrites)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! Didn't realize I could do this but it totally makes sense.
Is this going to work in parallel? I would expect all the tests to start clobbering the same test directory? Because vstart is going to create multiple clusters with the same naming? And if the tests are really fast, then building/destroying a cluster is a large overhead. |
Each test will now use it's own directory so they won't conflict, and splitting the tests should not add anymore building/destroying of clusters, as I understand it a new cluster is already made for each new test so this shouldn't be adding more overhead there |
It's better to have more smaller jobs than one large one, the shorter ones can be run concurrently and are therefor faster Signed-off-by: Caleb Boylan calebboylan@gmail.com
4fe29d4
to
ef5d273
Compare
Need to verify the port numbres won't conflict if there are two instances of the script (w/ different tests) running in parallel... |
@@ -32,7 +32,18 @@ function run() { | |||
local dir=$1 | |||
shift | |||
|
|||
export CEPH_MON="127.0.0.1:7107" # git grep '\<7107\>' : there must be only one | |||
# Grab a random port from 7109 - 7200, this should maybe be reworked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ports 7110 and 7111 and maybe others in your 7109 - 7200 range are in use. That's why the comment tells you to grep for any port numbers you want to use. For example:
$ git grep '\<7111\>' ../src
../src/test/osd/osd-copy-from.sh: export CEPH_MON="127.0.0.1:7111" # git grep '\<7111\>' : there must be only one
I'd like to see a comment line "# git grep '<xxxx>' : there must be only one" for each port number you are using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A port 7111 conflict is one of the failures of the make check that Jenkins ran.
export CEPH_MON="127.0.0.1:7107" # git grep '\<7107\>' : there must be only one | ||
# Grab a random port from 7109 - 7200, this should maybe be reworked | ||
for port in `seq 7109 7200 | shuf` ; do | ||
if ! $(nc -z localhost "$port" > /dev/null); then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems racey. If 2 processes run nc on the same port number and don't see each other, they would both launch a mon with the conflicting port number.
Maybe the caller must be pass a unique port number for each parallel subtest. There are 38 other tests that are setting CEPH_MON themselves. Maybe port numbers should be 8000 + (100 * test#) + subtest#. All other tests are using a unique port in the 7000 -7999 range. We would fix the 4 tests that use 17108-17110 and 17119.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah it is racey, thanks for the comments, I like your fix for using a port tied to the test number. Using a fixed port per script has problems when trying to run things concurrently, but if the port is based on the test #, each test # is unique and that avoids having problems. I assume the test number is probably exposed as a environment variable or something, I'll look into that. Thanks!
@squidboylan This pull request fixes http://tracker.ceph.com/issues/20242 I'd assign it to you, but don't see your name there. |
@squidboylan Shouldn't you remove the shell scripts you added before you changed the CMakeLists.txt to break the test up that way? |
retest this please |
I've been meaning to take more looks at this, but classes have started again so my time is extremely limited, if this is something that someone else wants to take over to get it done sooner rather than later, they're welcome to. I would like to finish it but I won't have time for a while. |
I'm working on this in my repo. |
It's better to have more smaller jobs than one large one, the shorter
ones can be run concurrently and are therefor faster
Signed-off-by: Caleb Boylan calebboylan@gmail.com