New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rgw_file: fix write error when the write offset overlaps #17809
Conversation
The nfs client writes data to system's page cache, and the page cache will be flushed to rgw_file by nfs-ganesha. The page cache is page size aligned, so the write offset may overlap in a page size. |
@tchaikov, would you please have a look at this? |
@yaozongyou i am not familiar with rgw, thus not qualified to review your change. |
@yaozongyou does this change attempt to address a behavior seen in Linux clients mounting -osync? |
@yaozongyou do you have a simple reproducer from the client side? |
src/rgw/rgw_file.cc
Outdated
@@ -1159,6 +1159,15 @@ namespace rgw { | |||
} | |||
} | |||
|
|||
int overlap = 0; | |||
if ((static_cast<off_t>(off) < f->write_req->real_ofs) && | |||
(static_cast<long>(f->write_req->real_ofs - off) < sysconf(_SC_PAGESIZE))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of course, the client's pagesize (like its mount options) is not knowable, right? if we unconditionally attempt this correction, do we do just as well?
@mattbenjamin a simple reproducer from the client side: |
@mattbenjamin can you reproduce it in your environment? |
@yaozongyou I'll attempt it--if my results are consistent, this should be ok (with the possible caveat about pagesize flagged above) |
@yaozongyou this doesn't error in my environment :( |
@yaozongyou so, we've discussed this here a bit; it's known that there may be kernels/clients that don't conform to our expectation that WRITEs will be strictly sequential even when a) the mount is -osync and b) the application i/o is strictly sequential--but the kernels we've explicitly tested do so. I'd be interested to know which distro and kernel version you're doing this, on what architecture. My inclination is not to apply this or previous suggestions like this, but rather to continue with our work in progress which will allow async and random i/o. Please note that iiuc, your posix write calls are not guaranteed to write all bytes in one invocation--you need to be parepared for short writes. In any case, I will expedite getting that finished up, becausel clearly it is an issue for users. Matt |
@mattbenjamin We are using CentOS Linux release 7.3.1611, the kernel version is 3.10.0-514.el7.x86_64. Export list for localhost: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the evidence presented, I want to merge this change, but I am concerned about the use of sysconf(_pagesize). It seems that this conditional clause an just be removed without risk?
Signed-off-by: Yao Zongyou <yaozongyou@vip.qq.com>
@mattbenjamin I have made some changes about the problem you mentioned. And I have tested in our environment, it works very well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
per discussion, this change has been motivated and appears necessary given its ability to give a correct result for these observed cases, until full support for write at any offset and order is merged
I have hand-verified this change
fix bug in http://tracker.ceph.com/issues/21455
Signed-off-by: Yao Zongyou yaozongyou@vip.qq.com