rbd-mirror: image live-replay might become stuck #9282

Merged (5 commits) on May 25, 2016

Conversation

dillaman

No description provided.

@trociny
Contributor

trociny commented May 24, 2016

@dillaman observing this:

zhuzha:~/ceph/ceph.upstream/src% RBD_FEATURES=109 ./ceph_test_librbd --gtest_filter=TestJournalReplay.\*
Note: Google Test filter = TestJournalReplay.*
[==========] Running 13 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 13 tests from TestJournalReplay
[ RUN      ] TestJournalReplay.AioDiscardEvent
test/librbd/journal/test_Replay.cc:169: Failure
Value of: current_tag
  Actual: 1
Expected: initial_tag + 2
Which is: 2
[  FAILED  ] TestJournalReplay.AioDiscardEvent (4615 ms)
[ RUN      ] TestJournalReplay.AioWriteEvent
test/librbd/journal/test_Replay.cc:228: Failure
Value of: current_tag
  Actual: 1
Expected: initial_tag + 2
Which is: 2
[  FAILED  ] TestJournalReplay.AioWriteEvent (522 ms)
[ RUN      ] TestJournalReplay.AioFlushEvent
test/librbd/journal/test_Replay.cc:300: Failure
Value of: current_tag
  Actual: 1
Expected: initial_tag + 2
Which is: 2
[  FAILED  ] TestJournalReplay.AioFlushEvent (755 ms)
[ RUN      ] TestJournalReplay.SnapCreate
[       OK ] TestJournalReplay.SnapCreate (2758 ms)
[ RUN      ] TestJournalReplay.SnapProtect
[       OK ] TestJournalReplay.SnapProtect (2152 ms)
[ RUN      ] TestJournalReplay.SnapUnprotect
[       OK ] TestJournalReplay.SnapUnprotect (2271 ms)
[ RUN      ] TestJournalReplay.SnapRename
[       OK ] TestJournalReplay.SnapRename (1183 ms)
[ RUN      ] TestJournalReplay.SnapRollback
[       OK ] TestJournalReplay.SnapRollback (1053 ms)
[ RUN      ] TestJournalReplay.SnapRemove
[       OK ] TestJournalReplay.SnapRemove (4001 ms)
[ RUN      ] TestJournalReplay.Rename
[       OK ] TestJournalReplay.Rename (619 ms)
[ RUN      ] TestJournalReplay.Resize
[       OK ] TestJournalReplay.Resize (553 ms)
[ RUN      ] TestJournalReplay.Flatten
[       OK ] TestJournalReplay.Flatten (1682 ms)
[ RUN      ] TestJournalReplay.ObjectPosition
[       OK ] TestJournalReplay.ObjectPosition (445 ms)
[----------] 13 tests from TestJournalReplay (22610 ms total)

[----------] Global test environment tear-down
[==========] 13 tests from 1 test case ran. (23759 ms total)
[  PASSED  ] 10 tests.
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] TestJournalReplay.AioDiscardEvent
[  FAILED  ] TestJournalReplay.AioWriteEvent
[  FAILED  ] TestJournalReplay.AioFlushEvent

 3 FAILED TESTS

@dillaman
Author

@trociny thanks -- should be better now.

@dillaman
Author

@trociny actually, I just moved the unittest failure. fixing.

Jason Dillaman added 5 commits May 24, 2016 12:58

Clear the refetch required flag while scheduling the watch
and remove the stale object after the watch completes if still
empty. Previously, it was possible for the flag to become
out-of-sync with whether or not it was actually refreshed
and pruned.

Fixes: http://tracker.ceph.com/issues/15993
Signed-off-by: Jason Dillaman <dillaman@redhat.com>

Otherwise the recorded object positions might point to an older
object that doesn't contain the actual entry.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>

It's possible that there might be additional entries to prune in
objects that haven't been prefetched yet. Keep the active tag
to allow these entries to be pruned after they have been loaded.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>

The randomized write sizes of the modified rbd-mirror stress
test result in a lot of journal objects with few entries.
Immediately fetch objects when performing a refetch check prior
to closing an empty object.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
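
For readers following the first commit message, here is a rough, single-threaded sketch of the ordering it describes: the "refetch required" flag is cleared at the moment the watch is scheduled, and the object is dropped after the watch completes only if it is still empty. This is not the code from the pull request. `ReplayModel`, `ObjectState`, and every method name are hypothetical, and real journal entries are stood in for by plain integers.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for one journal object tracked during replay.
struct ObjectState {
  std::vector<int> entries;       // entries not yet replayed
  bool refetch_required = false;  // object must be re-read before pruning
};

// Hypothetical, single-threaded model of the bookkeeping (not Ceph's classes).
class ReplayModel {
public:
  // Record that an apparently empty object needs a refetch check.
  void mark_refetch_required(uint64_t object_num) {
    m_objects[object_num].refetch_required = true;
  }

  // Clear the flag when the watch is scheduled, so the flag cannot drift
  // out of sync with whether a refetch is actually pending.
  // Returns false if no watch was needed.
  bool schedule_watch(uint64_t object_num) {
    auto it = m_objects.find(object_num);
    if (it == m_objects.end() || !it->second.refetch_required) {
      return false;
    }
    it->second.refetch_required = false;
    return true;
  }

  // Watch completion: merge whatever the refetch found, then remove the
  // object only if it is still empty (i.e. stale).
  void handle_watch(uint64_t object_num, const std::vector<int>& fetched) {
    auto it = m_objects.find(object_num);
    if (it == m_objects.end()) {
      return;
    }
    auto& entries = it->second.entries;
    entries.insert(entries.end(), fetched.begin(), fetched.end());
    if (entries.empty()) {
      m_objects.erase(it);
    }
  }

  bool has_object(uint64_t object_num) const {
    return m_objects.count(object_num) != 0;
  }

  bool refetch_required(uint64_t object_num) const {
    auto it = m_objects.find(object_num);
    return it != m_objects.end() && it->second.refetch_required;
  }

private:
  std::map<uint64_t, ObjectState> m_objects;
};
```

In this model, an object whose watch completes with no new entries is removed, while one that picks up entries during the refetch is kept with its flag already cleared, so a second `schedule_watch` call on it is a no-op.
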
@trociny trociny merged commit 20de6de into ceph:master May 25, 2016
@dillaman dillaman deleted the wip-15993 branch May 25, 2016 12:37