doc: extend the CephFS troubleshooting guide #10458
If you are experiencing apparent hung operations, the first task is to identify
where the problem is occurring: in the client, the MDS, or the network connecting
them. Start by looking to see if either side has stuck operations, and narrow it
down from there.
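One way to look for stuck operations on the MDS side is to dump its in-flight operations through the admin socket (`ceph daemon mds.<name> dump_ops_in_flight`) and pick out the old ones. The following is a minimal sketch, not part of the patch: the helper name `find_old_ops`, the 30-second threshold, and the sample data are hypothetical, and the JSON field names (`ops`, `age`, `description`) are assumptions about the dump's shape that may differ between releases.

```python
import json


def find_old_ops(dump_text, min_age_seconds=30.0):
    """Return (age, description) pairs for ops at least min_age_seconds old.

    Assumes the dump is a JSON object with an "ops" list whose entries
    carry "age" (seconds) and "description"; adjust if your release differs.
    """
    dump = json.loads(dump_text)
    old = []
    for op in dump.get("ops", []):
        age = float(op.get("age", 0.0))
        if age >= min_age_seconds:
            old.append((age, op.get("description", "<no description>")))
    # Oldest first, since those are the likeliest to be genuinely stuck.
    return sorted(old, reverse=True)


# Made-up sample standing in for real admin socket output.
sample = json.dumps({
    "ops": [
        {"description": "client_request(client.4123:17 getattr)", "age": 120.5},
        {"description": "client_request(client.4123:18 lookup)", "age": 0.2},
    ]
})

for age, description in find_old_ops(sample):
    print(f"{age:.1f}s  {description}")
```

If nothing old shows up on the MDS, the client or the network is the next place to look.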
How does one look for stuck operations? (mostly rhetorical; I'm sure I could puzzle it out, but if a parenthetical makes the difference, adding one or two would be great)
Edit: or maybe this was intended as a header for the following sections. Reading on.
+1 - any pointer for how to diagnose this would be helpful, even if it's a link to another doc.
(Sorry, I commented on an outdated load of the page before dan commented)
Needs rebase after conflict with #10374
If an operation is hung inside the MDS, it will eventually show up in "ceph health",
identifying "slow requests are blocked". It may also identify clients as
"failing to respond" or misbehaving in other ways. If the MDS identifies
specific clients as misbehaving, you should investigate why they are doing so.
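As a rough illustration of the triage described above, the sketch below scans health text (for example, captured from `ceph health detail`) for the two phrases quoted in the paragraph and reports which side to investigate. The helper name `triage_health` and the sample text are hypothetical; real health output is worded differently across releases, so treat the matching as an assumption rather than a stable interface.

```python
# Phrases quoted in the troubleshooting text, mapped to where to look next.
MARKERS = {
    "slow requests are blocked": "MDS",
    "failing to respond": "client",
}


def triage_health(health_text):
    """Return (side, line) pairs for health lines containing a known phrase."""
    findings = []
    for line in health_text.splitlines():
        for phrase, side in MARKERS.items():
            if phrase in line:
                findings.append((side, line.strip()))
    return findings


# Made-up sample; not copied from a real cluster.
sample = """HEALTH_WARN
mds0: 3 slow requests are blocked
mds0: Client host-a failing to respond to capability release
"""

for side, line in triage_health(sample):
    print(f"investigate {side}: {line}")
```

A "client" finding points at the named client as the place to continue; an "MDS" finding suggests inspecting the MDS's in-flight operations.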
How might one investigate that?
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
c6d1874 to 1c5778e
Rebased on top of the conflict and added some inter-doc references and more data to address questions. I think this is probably good now!
Awesome, thanks! One suggestion: using rst refs would be nice as it would result in links between sections when the doc is rendered.
Uh, that's probably true. Are there some examples I can look at @zmc? All I'm aware of are the links to a page that we have on index files.
One example is: http://docs.ceph.com/teuthology/docs/README.html#overview - the source for that is here: https://raw.githubusercontent.com/ceph/teuthology/master/docs/README.rst Edit: referring to the "Introduction for Testers" link
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Build failure appears to be in test code (or a Ceph bug), not my docs. Presumably pulling it into a working base will succeed properly.
👍 thanks @gregsfortytwo!
Add some troubleshooting steps based off of what we've seen in the teuthology lab.