Fix test-erasure-eio and osd-scrub-repair races (17830) #11926
@dzafman, i will pull your PR in once i figure out why the jenkins "make check" still fails and fix it.
a51edc1 to e9d9a50
4102e3a to afcead2
Signed-off-by: David Zafman <dzafman@redhat.com>
Reduce size of log on timeout by doing a backoff so that we don't log 3000 loops at 1/10 second sleeps. Signed-off-by: David Zafman <dzafman@redhat.com>
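A minimal sketch of the backoff idea in that commit message, not the code from the commit itself. The helper names `backoff_delays` and `wait_until` are hypothetical, and the sketch uses whole seconds with a doubling delay capped at a maximum, which is an assumption; the commit only says that a backoff replaces fixed 1/10 second sleeps.

```shell
# Hypothetical helper: print one sleep delay per line. Delays start at
# $2 seconds, double each step, are capped at $3 seconds, and sum to
# exactly the overall timeout $1.
backoff_delays() {
    local timeout=$1 delay=$2 max_delay=$3
    local total=0
    while [ "$total" -lt "$timeout" ]; do
        # Shorten the last step so the total never exceeds the timeout.
        if [ $((total + delay)) -gt "$timeout" ]; then
            delay=$((timeout - total))
        fi
        echo "$delay"
        total=$((total + delay))
        delay=$((delay * 2))
        if [ "$delay" -gt "$max_delay" ]; then
            delay=$max_delay
        fi
    done
}

# Hypothetical wrapper: retry a command until it succeeds or the
# timeout budget is used up, logging one line per failed attempt.
wait_until() {
    local timeout=$1
    shift
    local delay
    for delay in $(backoff_delays "$timeout" 1 8); do
        "$@" && return 0
        echo "not ready yet, sleeping ${delay}s" >&2
        sleep "$delay"
    done
    # One final attempt after the last sleep.
    "$@"
}
```

With these assumed parameters, `backoff_delays 300 1 8` produces 40 delays, so a 300 second timeout logs 40 lines rather than the 3000 that a fixed 1/10 second sleep would produce.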
Tests use objectstore_tool() which stops and starts OSDs, but may assume consistency of object locations. Signed-off-by: David Zafman <dzafman@redhat.com>
Signed-off-by: David Zafman <dzafman@redhat.com>
998ae49 to bb172fb
@tchaikov My changes in osd-scrub-repair.sh might fix the races.
Jenkins passed, but let's try another attempt. I have a commit that fixes compiler warnings.
@athanatos If you want to move these tests to teuthology we don't have to put this pull request in. It seems to fix the races I know about, but can't be sure there aren't others.
@dzafman Meh, if it's reliable, it's not a particularly high priority to move it to teuthology.
Looks like jenkins failed though.
Signed-off-by: David Zafman <dzafman@redhat.com>
Signed-off-by: David Zafman <dzafman@redhat.com>
Caused by: af720cc Signed-off-by: David Zafman <dzafman@redhat.com>
e11e8a9 to 006ab8d
I fixed something else in osd-scrub-repair. Let's try it again. retest this please
Add comments about the required uniqueness of the port number Signed-off-by: David Zafman <dzafman@redhat.com>
This failed because the test blew through 60 requests in 8 seconds before the scrub repair even started on Jenkins. Signed-off-by: David Zafman <dzafman@redhat.com>
@athanatos @tchaikov Seemingly unrelated unittest_journal failed.
@@ -26,7 +26,7 @@ function run() {
     local dir=$1
     shift

-    export CEPH_MON="127.0.0.1:17109"
+    export CEPH_MON="127.0.0.1:17110" # git grep '\<17110\>' : there must be only one
oh my! good catch!
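The comment added in the diff asks for a manual `git grep '\<17110\>'` to confirm that the port is used only once. A rough sketch of automating that check follows; `count_port_uses` is a hypothetical helper, not part of the Ceph tree, and it uses plain `grep -rw` rather than `git grep` so it also works outside a git checkout.

```shell
# Hypothetical helper: print how many lines under directory $2 mention
# port $1 as a whole word (so 17110 does not match 117110 or 171100).
count_port_uses() {
    local port=$1 dir=$2
    # grep exits non-zero when nothing matches; wc still reports 0.
    # The arithmetic expansion strips any padding some wc versions add.
    echo $(( $(grep -rw -- "$port" "$dir" | wc -l) ))
}
```

A test script could then refuse to start when `count_port_uses 17110 .` prints anything other than 1, which would catch two tests binding the same monitor port.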
@@ -355,6 +355,7 @@ function TEST_list_missing_erasure_coded() {
         [ $i -lt 60 ] || return 1
         matches=$(ceph pg $pg list_missing | egrep "MOBJ0|MOBJ1" | wc -l)
         [ $matches -eq 2 ] && break
+        sleep 1
could use `repair $pg || return 1`, as we do `wait_for_scrub()` in `repair()`. i will post another PR to do this cleanup.
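The hunk above adds `sleep 1` to a loop that gives up after 60 iterations. Without a pause between attempts, the attempt limit is no longer a time limit and can be used up within seconds. A minimal sketch of the same pattern as a reusable function is shown here; `poll_until` is a hypothetical name and is not one of the existing test helpers.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2
# seconds after each failed attempt. Returns 0 on the first success
# and 1 if every attempt fails.
poll_until() {
    local max_attempts=$1 interval=$2
    shift 2
    local i
    for i in $(seq 1 "$max_attempts"); do
        "$@" && return 0
        # The pause is what turns "60 attempts" into roughly "60 seconds".
        sleep "$interval"
    done
    return 1
}
```

In the test above this would correspond to something like `poll_until 60 1 has_two_missing "$pg" || return 1`, where `has_two_missing` is an assumed wrapper around the `ceph pg $pg list_missing` check.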
I want Jenkins to run this. And probably have @tchaikov cherry-pick the test commit to #11914 and just close this pull request.