New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: make seeky readdir more efficiency #14317

Merged
merged 1 commit into from Apr 24, 2017

Conversation

Projects
None yet
3 participants
@ukernel
Member

ukernel commented Apr 4, 2017

Current cephfs client uses string to indicate tart position of
readdir. The string is last entry of previous readdir reply.
This approach does not work for seeky readdir because we can
not easily convert the new postion to a string. For seeky readdir,
mds needs to return dentries from the beginning. Client keeps
retrying if the reply does not contain the dentry it wants.

In current version of ceph, mds sorts CDentry in its cache in
hash order. Client also uses dentry hash to compose dir postion.
For seeky readdir, if client passes the hash part of dir postion
to mds. mds can avoid replying useless dentries.

Fixes: http://tracker.ceph.com/issues/19306
Signed-off-by: "Yan, Zheng" zyan@redhat.com

@ukernel ukernel requested review from jcsp and gregsfortytwo Apr 4, 2017

@markhpc markhpc added the performance label Apr 5, 2017

@@ -422,6 +423,7 @@ union ceph_mds_request_args_legacy {
__le32 max_entries; /* how many dentries to grab */
__le32 max_bytes;
__le16 flags;
__le32 offset_hash;

This comment has been minimized.

@jcsp

jcsp Apr 15, 2017

Contributor

Is the idea that this is safe to add because other items in the union already had a greater size, and request structures are written with zeros in old clients, so we will always get a zero here from older clients?

This comment has been minimized.

@ukernel
last_hash = ceph_frag_value(diri->hash_dentry_name(readdir_start));
} else if (flags & CEPH_READDIR_OFFSET_HASH) {
/* mds understands offset_hash */
assert(readdir_offset == 2);

This comment has been minimized.

@jcsp

jcsp Apr 15, 2017

Contributor

Am I right in thinking that this assertion is saying "if readdir_start is empty then readdir_offset should be 2"?

If so (and it has nothing to do with the new OFFSET_HASH flag), then perhaps move the assertion up to after we assign readdir start, and write it as assert(!readdir_start.empty() || readdir_offset == 2) so that it's clearer to read

@@ -7362,6 +7372,8 @@ int Client::_readdir_get_frag(dir_result_t *dirp)
req->head.args.readdir.flags = CEPH_READDIR_REPLY_BITFLAGS;
if (dirp->last_name.length()) {
req->path2.set_path(dirp->last_name.c_str());
} else if (dirp->hash_order()) {
req->head.args.readdir.offset_hash = dirp->offset_high();

This comment has been minimized.

@jcsp

jcsp Apr 15, 2017

Contributor

Is offset_hash getting converted to network byte order somewhere? I guess it should be the same mechanism as the rest of the request header, but I can't see where it's happening

This comment has been minimized.

@ukernel

ukernel Apr 18, 2017

Member

__le32 is short for struct ceph_le32. ceph_le32's constructor does the job

@jcsp

This comment has been minimized.

Contributor

jcsp commented Apr 15, 2017

Since the client only sees the CEPH_READDIR_OFFSET_HASH flag after at least on readdir request, is the idea that now we will do two readdirs (instead of N) for each of these seeks? Would we be able to do it in only one readdir request if we communicated the support for OFFSET_HASH in some other way?

@jcsp

This comment has been minimized.

Contributor

jcsp commented Apr 15, 2017

client: make seeky readdir more efficiency
Current cephfs client uses string to indicate start position of
readdir. The string is last entry of previous readdir reply.
This approach does not work for seeky readdir because we can
not easily convert the new postion to a string. For seeky readdir,
mds needs to return dentries from the beginning. Client keeps
retrying if the reply does not contain the dentry it wants.

In current version of ceph, mds sorts CDentry in its cache in
hash order. Client also uses dentry hash to compose dir postion.
For seeky readdir, if client passes the hash part of dir postion
to mds. mds can avoid replying useless dentries.

Fixes: http://tracker.ceph.com/issues/19306
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
@ukernel

This comment has been minimized.

Member

ukernel commented Apr 18, 2017

We know if HASH_OFFSET is supported from the seekdir position (the position is from previous readdir). No extra readdir request is required

@jcsp

jcsp approved these changes Apr 18, 2017

@jcsp

This comment has been minimized.

Contributor

jcsp commented Apr 18, 2017

I saw some dirfrag.sh failures on latest multimds runs that I want to make sure aren't related to this patch.

@jcsp

This comment has been minimized.

Contributor

jcsp commented Apr 19, 2017

I tried to reproduce those failures on master and couldn't, so I think they could be from the kernel side of this change as they're happening on the kclient runs only.

Here are the failures:

/a/jspray-2017-04-16_16:07:43-multimds-wip-jcsp-testing-20170415b-multimds-testing-basic-smithi ['1032960', '1032880', '1032740', '1032660']

/a/jspray-2017-04-16_22:05:37-multimds-wip-jcsp-testing-20170415b-testing-basic-smithi ['1033920', '1033780', '1034000', '1033700']

@ukernel

This comment has been minimized.

Member

ukernel commented Apr 19, 2017

url of the multimds run?

@jcsp

This comment has been minimized.

@ukernel

This comment has been minimized.

Member

ukernel commented Apr 20, 2017

The failures were caused by slow gathering process of scatter lock. should be fixed by 3b3d653

@jcsp jcsp merged commit ca0b087 into ceph:master Apr 24, 2017

3 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
default Build finished.
Details

@ukernel ukernel deleted the ukernel:wip-19306 branch May 11, 2017

ukernel added a commit to ukernel/ceph that referenced this pull request Sep 12, 2017

doc/cephfs/posix: remove stale information for seekdir
Current cephfs can support seekdir efficiently. The diverge was
fixed by ceph#14317

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment