New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
os/chain_xattr: cope with xfs limit of 254 bytes for inlined xattrs. #4070
Conversation
3100a37
to
64848e1
Compare
@mslovy how does this look? (untested, and really needs some tests in test/objectstore/chain_xattr.cc, if you have the time!) |
FAIL: the output of run-make-check.sh on centos-7 for 0839435 is http://paste2.org/pysp2VEE |
64848e1
to
75dc009
Compare
FAIL: the output of run-make-check.sh on centos-7 for f7d6fb4 is http://paste2.org/4hFvgYgm |
75dc009
to
78e75f4
Compare
added tests, passes make check |
FAIL: the output of run-make-check.sh on centos-7 for 9904310 is http://paste2.org/H0Jk4Gy9 |
78e75f4
to
7dc5cf4
Compare
SUCCESS: the output of run-make-check.sh on centos-7 for d4acd74 is http://paste2.org/MLWfEvGx |
@liewegas reasonable, I agree that it becomes awkward if it breaks into too many 250bytes attrs. Xfs does linear search if the number of xattr items is small. But too many xattr items will result in a BTree Extent Struct (which is also not inline) in order to improve the search speed. I remember the critical point it changes to a BTree Format is 10, but not sure now. So We may set CHAIN_XATTR_SHORT_LEN_THRESHOLD this value smaller (say, 1000 or 750)? |
I wonder whether 254 bytes or 10 items spill out value are configurated or hard-coded? It looks a little inflexible to let hardcoded a spill out value in ceph? maybe we can detect it via xfs command? |
7dc5cf4
to
a480ea5
Compare
changed it to 1000 (4x 250byte stripes) and repushed. i don't think it's a good idea to make this tunable, since the read path explicitly looks for multiples of the stripe length and we will fail to read old ones if the values change. it is annoying that this is working around an XFS thing, though. :/ but, that's what almost everyone is using with FileStore.. |
SUCCESS: the output of run-make-check.sh on centos-7 for 1465a67 is http://paste2.org/gIfBbb5D |
Oh, I think my meaning is whether 254 bytes and 10 items can be configurated in xfs side. So if we can improve 254 to 512 bytes, it should be enough to store ceph internal xattr in one call. |
oh. in xfs the limit is fixed because of the on-disk format. they are |
|
||
r = sys_getxattr(fn, raw_name, (char *)val + pos, chunk_size); | ||
if (i && r == -ENODATA) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ENODATA may be ENOATTR better
BSD, Linux, DARWIN all with return ENOATTR and ENOATTR is defined to be ENODATA in Linux
@athanatos I think we should run this through qa? |
please ignore the last bot failure (network problem) |
error: os/chain_xattr.cc:141:20: 'ENOATTR' was not declared in this scope |
@liewegas It seemed that we need to wrap ENOATTR or ENODATA at lower level:
Linux: ENOATTR is defined to be a synonym for ENODATA in <attr/xattr.h> Since only linux doesn't define ENOATTR and <attr/xattr.h> is included in attr package, maybe we define ENOATTR to ENODATA in common/xattr.c for linux? |
Can we just s/ENODATA/ENOATTR/ everywhere then?
2) return ENODATA, while ENOATTR is not defined: Linux
Linux does #define ENOATTR ENODATA, right?
|
Oh, i see. Yeah, let's do #ifndef ENOATTR define ENOATTR ENODATA#endif in our errno.h? |
Yeah, it looks ok. |
If we wrote an xattr that was a multiple of a chunk, we will try to read the next chunk and get ENODATA. If that happens bail out of the loop and assume we've read the whole thing. Backport: hammer, firefly Signed-off-by: Sage Weil <sage@redhat.com>
XFS has a hard limit of 255 (or 254?) bytes for xattrs to be inlined in the inode due to a single byte for the length in the on-disk format. If we have an xattr that is a bit bigger than that it will get kicked out into a separate extent, but if we stripe it across a few shorter xattrs it can be inlined when the xfs inode is e.g. 2K. Adjust the chain_xattr logic to capture this case. Note that we are doing this unconditionally regardless of fs type, but that is probably okay given that most users use XFS and the cost isn't huge. Reported-by: Ning Yao <yaoning@ruijie.com.cn> Signed-off-by: Sage Weil <sage@redhat.com>
ENOATTR == ENODATA on Linux; other systems return ENOATTR. Signed-off-by: Sage Weil <sage@redhat.com>
...if it isn't already defined. Haomai reported seeing this somewhere. Signed-off-by: Sage Weil <sage@redhat.com>
added the define and rebased |
os/chain_xattr: cope with xfs limit of 254 bytes for inlined xattrs. Reviewed-by: Haomai Wang <haomaiwang@gmail.com> Reviewed-by: Ning Yao <zay11022@gmail.com>
No description provided.