Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tools/cephfs: add scan_links command which fixes linkages errors #11446

Merged
merged 1 commit into from Oct 28, 2016

Conversation

ukernel
Copy link
Contributor

@ukernel ukernel commented Oct 12, 2016

Signed-off-by: Yan, Zheng zyan@redhat.com

@ukernel ukernel added bug-fix cephfs Ceph File System labels Oct 12, 2016
@ukernel ukernel changed the title [DNM] tools/cephfs: add scan_links to fix linkages errors tools/cephfs: add scan_links command which fixes linkages errors Oct 13, 2016
@ukernel ukernel force-pushed the wip-cephfs-scan-links branch 2 times, most recently from 49bffd6 to 186c772 Compare October 13, 2016 13:52
Copy link
Member

@gregsfortytwo gregsfortytwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so at a high level this:

  1. iterates through all metadata objects and creates an in-memory map of ino numbers in use.
  2. iterates through all metadata objects a second time and compares them to the in-memory map.
  3. Keeps enough data around to delete "bad" copies (either from un-reintegrated stray entries or incompletely-split/merged dirfrags) and write the inode into the "good" location.

But if we think we can hold it all in-memory, why do we need two disk passes?
Is there a situation in which we expect to need this tool? I know you wrote it in response to a user getting a bunch of un-reintegrated strays, but in that case it was the result of specific behavior patterns and it seems like you'd want to fix it more precisely. (eg, being able to scan a single directory and compare against the stray dir.)

I just don't quite get the usage model where this makes sense.

@@ -34,6 +34,7 @@ void DataScan::usage()
<< " cephfs-data-scan scan_extents [--force-pool] <data pool name>\n"
<< " cephfs-data-scan scan_inodes [--force-pool] [--force-corrupt] <data pool name>\n"
<< "\n"
<< " cephfs-data-scan scan_links\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be prior the extra new-line.

}

if (!valid_ino(dir_ino)) {
dout(10) << "Not a difrag (invalid ino): '" << oid << "'" << dendl;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/difrag/dirfrag

@ukernel
Copy link
Contributor Author

ukernel commented Oct 18, 2016

Two passes design is for reduce memory usage. Scan metadata pool is fast in most case.

This tool is for repair linkage errors, such as incorrect inode link count, duplicated primary dentries. (Similar to pass 3/4 of fsck.ext4 -fv /dev/xxx) The online scrub can't fix this type of error easily.

@ukernel
Copy link
Contributor Author

ukernel commented Oct 18, 2016

I don't know what 'scan a single directory and compare against the stray dir' can do? find bad remote link,force stray reintegration? both can be done easily by scrub

@gregsfortytwo
Copy link
Member

You were suggesting this on the mailing list as a solution for reintegrating strays into their new locations, and I'm not sure this is a very efficient way of doing that.

The other cases aren't very likely but are definitely worth being able to repair. Please write some tests demonstrating this works.

@gregsfortytwo
Copy link
Member

gregsfortytwo commented Oct 19, 2016

Also I'm really not thrilled about the in-memory map of all inodes; that's going to put some serious limits on scalability. But it will do for getting on with (although we should also add a tracker ticket about it).

@gregsfortytwo gregsfortytwo removed their assignment Oct 19, 2016
@ukernel
Copy link
Contributor Author

ukernel commented Oct 19, 2016

On Wed, Oct 19, 2016 at 1:05 PM, Gregory Farnum notifications@github.com
wrote:

You were suggesting this on the mailing list as a solution for
reintegrating strays into their new locations, and I'm not sure this is a
very efficient way of doing that.

No, I didn't suggest the user to use this tool for reintegrating strays.
(The tool does not have this function either). The user's cephfs was
corrupted (there are duplicated primary dentries). I write this tool for
repairing the corruption

The other cases aren't very likely but are definitely worth being able to
repair. Please write some tests demonstrating this works.


You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
#11446 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ACpZr9GqNIR1j6fpPTy_C1zMGEGNi1rmks5q1aUsgaJpZM4KUxmx
.

@gregsfortytwo gregsfortytwo removed their assignment Oct 21, 2016
@gregsfortytwo
Copy link
Member

@ukernel, I will merge this when you add tests. I don't see any. :)

@ukernel
Copy link
Contributor Author

ukernel commented Oct 24, 2016

ceph/ceph-qa-suite#1219

Signed-off-by: Yan, Zheng <zyan@redhat.com>
@jcsp jcsp assigned ukernel and unassigned gregsfortytwo Oct 25, 2016
@ukernel
Copy link
Contributor Author

ukernel commented Oct 25, 2016

Test case fixed (only tested by vstart runner)

@jcsp jcsp merged commit 94f08a2 into ceph:master Oct 28, 2016
@ukernel ukernel deleted the wip-cephfs-scan-links branch January 12, 2017 01:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants