src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event") #20933

Merged
merged 1 commit into from Mar 22, 2018

dzafman
Contributor

@dzafman dzafman commented Mar 16, 2018

No description provided.

Signed-off-by: David Zafman <dzafman@redhat.com>
@dzafman
Contributor Author

dzafman commented Mar 19, 2018

I reran a failing test 20 times and it passed every time. Not sure how hard it was to hit this:

http://pulpito.ceph.com/dzafman-2018-03-16_20:00:06-rados-wip-zafman-testing2-distro-basic-smithi/

@dzafman
Contributor Author

dzafman commented Mar 19, 2018

Should we run a subset of the rados/thrash suite?

@dzafman dzafman requested a review from liewegas March 19, 2018 16:38
@xiexingguo
Member

xiexingguo commented Mar 20, 2018

> I reran a failing test 20 times and it passed every time. Not sure how hard it was to hit this

@dzafman It might not be as rare as it seems. I hit the same issue a couple of weeks ago and posted a fix at #20837. It is actually a race between replica-backfill-revoke and primary-backfill-finish. Well, I confess this is a better fix :-)

@neha-ojha
Member

@dzafman It appeared quite frequently for me when I ran rados runs with --subset 3/500. But yes, it was difficult to reproduce in a single test.

Member

@jdurgin jdurgin left a comment

Looks good to me. We should be on the lookout for this recurring in future runs. Running it through a rados subset, as neha was doing, seems good enough for now.

@dzafman
Contributor Author

dzafman commented Mar 21, 2018

http://qa-proxy.ceph.com/teuthology/dzafman-2018-03-21_09:57:19-rados:thrash-wip-zafman-testing2-distro-basic-smithi/

2312022 DEAD Infrastructure
2312125 FAIL scrub stat mismatch
2312148 DEAD Thrasher kept going even after no more client activity

Member

@gregsfortytwo gregsfortytwo left a comment

LGTM

@dzafman dzafman merged commit 79ee949 into ceph:master Mar 22, 2018
@dzafman dzafman deleted the wip-22902 branch March 22, 2018 15:44
5 participants