doc: extend the CephFS troubleshooting guide #10458

Merged
merged 2 commits into from Aug 2, 2016

gregsfortytwo
Member

Add some troubleshooting steps based on what we've seen in the teuthology lab.

@gregsfortytwo
Member Author

I'd like some comments from @dmick and @zmc; this is based on situations they've run into.

If you are experiencing apparent hung operations, the first task is to identify
where the problem is occurring: in the client, the MDS, or the network connecting
them. Start by looking to see if either side has stuck operations, and narrow it
down from there.
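As a rough illustration of "looking to see if either side has stuck operations", the sketch below wraps the admin-socket and debugfs queries that can list in-flight requests on each side. The function names are hypothetical, and the MDS name and socket path arguments are placeholders you would supply for your own cluster; this is a minimal sketch, not the procedure the guide itself prescribes.

```shell
#!/bin/sh
# Hypothetical helper names; arguments are placeholders for your own cluster.

# MDS side: list the operations the MDS is currently tracking.
# $1 is the MDS daemon name; run this on the host where that MDS lives.
mds_ops_in_flight() {
    ceph daemon "mds.$1" dump_ops_in_flight
}

# ceph-fuse side: list the MDS requests the client is still waiting on.
# $1 is the path to the client's admin socket (.asok file).
fuse_client_requests() {
    ceph daemon "$1" mds_requests
}

# Kernel client side: outstanding MDS requests are exposed through debugfs.
# Requires root and a mounted debugfs.
kernel_client_requests() {
    cat /sys/kernel/debug/ceph/*/mdsc
}
```

If the client shows requests outstanding but the MDS shows nothing in flight, the network between them becomes the likelier suspect; if both show the same request, the problem is more likely inside the MDS.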
Member

@dmick dmick Jul 28, 2016

How does one look for stuck operations? (mostly rhetorical; I'm sure I could puzzle it out, but if a parenthetical makes the difference, adding one or two would be great)

Edit: or maybe this was intended as a header for the following sections. Reading on.

Member

+1 - any pointer for how to diagnose this would be helpful, even if it's a link to another doc.

@jcsp
Contributor

jcsp commented Jul 29, 2016

Looks good to me, any comments from @dmick @zmc ?

@jcsp
Contributor

jcsp commented Jul 29, 2016

(Sorry, I commented on an outdated load of the page, before Dan commented.)

@jcsp
Contributor

jcsp commented Jul 29, 2016

Needs rebase after conflict with #10374

If an operation is hung inside the MDS, it will eventually show up in "ceph health",
reported as "slow requests are blocked". The health output may also identify clients as
"failing to respond" or misbehaving in other ways. If the MDS identifies
specific clients as misbehaving, you should investigate why they are doing so.
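One way to act on this is to scan the health output for the two phrases quoted above. The sketch below reads health text on standard input and reports which symptom it sees. The function name and the output labels are hypothetical, and it assumes only that the phrases appear verbatim somewhere in the text.

```shell
#!/bin/sh
# Hypothetical helper: reads health text on stdin, prints one label per symptom found.
classify_health() {
    health=$(cat)
    # Phrase the excerpt associates with operations hung inside the MDS.
    case "$health" in
        *"slow requests are blocked"*) echo "slow-requests" ;;
    esac
    # Phrase the excerpt associates with a misbehaving client.
    case "$health" in
        *"failing to respond"*) echo "client-misbehaving" ;;
    esac
}
```

On a live cluster this could be fed with `ceph health detail | classify_health`. A "client-misbehaving" result points the investigation at the named client rather than at the MDS.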
Member

How might one investigate that?

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
@gregsfortytwo
Member Author

Rebased on top of the conflict and added some inter-doc references and more data to address questions. I think this is probably good now!

@zmc
Member

zmc commented Jul 29, 2016

Awesome, thanks! One suggestion: using rst refs would be nice as it would result in links between sections when the doc is rendered.

@gregsfortytwo
Member Author

Uh, that's probably true. Are there some examples I can look at @zmc? All I'm aware of are the links to a page that we have on index files.

@zmc
Member

zmc commented Jul 29, 2016

One example is: http://docs.ceph.com/teuthology/docs/README.html#overview - the source for that is here: https://raw.githubusercontent.com/ceph/teuthology/master/docs/README.rst

Edit: referring to the "Introduction for Testers" link

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
@gregsfortytwo
Member Author

Build failure appears to be in test code (or a Ceph bug), not my docs. Presumably pulling it into a working base will succeed properly.

@zmc
Member

zmc commented Aug 1, 2016

👍 thanks @gregsfortytwo!

@jcsp jcsp merged commit 48cd11f into ceph:master Aug 2, 2016
@gregsfortytwo gregsfortytwo deleted the wip-doc-troubleshooting branch October 4, 2016 22:08
4 participants