
osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1 #37483

Merged (2 commits) on Oct 8, 2020

Conversation

@dzafman (Contributor) commented Sep 29, 2020

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

Signed-off-by: David Zafman <dzafman@redhat.com>
@dzafman changed the title from "Wip 46405" to "osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1" on Sep 29, 2020
@dzafman (Contributor, Author) commented Sep 30, 2020

qa/standalone/osd/osd-rep-recov-eio.sh (outdated review thread, resolved)
@batrick added the core label on Oct 1, 2020
@@ -224,6 +228,7 @@ function TEST_rados_repair_warning() {
rados_get $dir $poolname ${objbase}-$i || return 1
done

wait_for_clean
Member:

Has the test been failing for lack of clean just at this stage, or everywhere else?

Contributor (Author):

After a read that needs a repair, recovery is started. I never noticed on my build machine that the query could happen before the recovery is done, but teuthology did hit this race. I added a sleep, then Brad asked if there was some other way to tell if the repair is finished, and I realized that wait_for_clean() would do it. The only other possible race is if recovery were async to the read repair: if the PG was still active+clean after the read, we would have a problem.
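A minimal stand-alone sketch of the polling pattern that the real `wait_for_clean()` helper (in `qa/standalone/ceph-helpers.sh`) relies on. Here `is_clean` is a hypothetical stub standing in for the actual PG-state query the helper performs against the cluster; the names and timeout value are illustrative, not the helper's real implementation:

```shell
#!/bin/sh
# Hypothetical stub: pretend the cluster reports all PGs clean on the
# 3rd poll. The real helper queries PG states via the ceph CLI instead.
attempts=0
is_clean() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]
}

# Poll until the stub reports clean, or give up after $timeout polls.
wait_for_clean_sketch() {
    timeout=10
    waited=0
    until is_clean; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for clean" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "all PGs active+clean after $waited poll(s)"
}

wait_for_clean_sketch
```

The point of using such a wait rather than a fixed sleep is that it returns as soon as recovery finishes, while still bounding how long the test can block before failing.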

Member:

I've just seen the test fail here, so I'm wondering whether we need to add wait_for_clean in all the other places or not; it certainly doesn't hurt.

@neha-ojha (Member):

@dzafman is this ready to merge?

@dzafman (Contributor, Author) commented Oct 8, 2020

@neha-ojha Yes
