Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rgw_file: fix write error when the write offset overlaps #17809

Merged
merged 1 commit into from Sep 25, 2017
Merged

rgw_file: fix write error when the write offset overlaps #17809

merged 1 commit into from Sep 25, 2017

Conversation

yaozongyou
Copy link
Contributor

fix bug in http://tracker.ceph.com/issues/21455

Signed-off-by: Yao Zongyou yaozongyou@vip.qq.com

@yaozongyou yaozongyou changed the title rgw_file: fix write error when the write offset overlap rgw_file: fix write error when the write offset overlaps Sep 19, 2017
@yaozongyou
Copy link
Contributor Author

The nfs client writes data to system's page cache, and the page cache will be flushed to rgw_file by nfs-ganesha. The page cache is page size aligned, so the write offset may overlap in a page size.

@yaozongyou
Copy link
Contributor Author

yaozongyou commented Sep 21, 2017

@tchaikov, would you please have a look at this?

@tchaikov
Copy link
Contributor

@yaozongyou i am not familiar with rgw, thus not qualified to review your change.

@mattbenjamin mattbenjamin self-assigned this Sep 21, 2017
@mattbenjamin
Copy link
Contributor

@yaozongyou does this change attempt to address a behavior seen in Linux clients mounting -osync?

@yaozongyou
Copy link
Contributor Author

image
yes, as above image shows, even mounted using -osync, the offset may overlap between two requests.

@mattbenjamin
Copy link
Contributor

@yaozongyou do you have a simple reproducer from the client side?

@@ -1159,6 +1159,15 @@ namespace rgw {
}
}

int overlap = 0;
if ((static_cast<off_t>(off) < f->write_req->real_ofs) &&
(static_cast<long>(f->write_req->real_ofs - off) < sysconf(_SC_PAGESIZE))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course, the client's pagesize (like its mount options) is not knowable, right? if we unconditionally attempt this correction, do we do just as well?

@yaozongyou
Copy link
Contributor Author

@mattbenjamin a simple reproducer from the client side:
https://gist.github.com/yaozongyou/5f5ed0d362ccef97942a09496f4e2a8c

@yaozongyou
Copy link
Contributor Author

@mattbenjamin can you reproduce it in your environment?

@mattbenjamin
Copy link
Contributor

@yaozongyou I'll attempt it--if my results are consistent, this should be ok (with the possible caveat about pagesize flagged above)

@mattbenjamin
Copy link
Contributor

@yaozongyou this doesn't error in my environment :(

@mattbenjamin
Copy link
Contributor

@yaozongyou so, we've discussed this here a bit; it's known that there may be kernels/clients that don't conform to our expectation that WRITEs will be strictly sequential even when a) the mount is -osync and b) the application i/o is strictly sequential--but the kernels we've explicitly tested do so. I'd be interested to know which distro and kernel version you're doing this, on what architecture. My inclination is not to apply this or previous suggestions like this, but rather to continue with our work in progress which will allow async and random i/o. Please note that iiuc, your posix write calls are not guaranteed to write all bytes in one invocation--you need to be parepared for short writes. In any case, I will expedite getting that finished up, becausel clearly it is an issue for users.

Matt

@yaozongyou
Copy link
Contributor Author

@mattbenjamin We are using CentOS Linux release 7.3.1611, the kernel version is 3.10.0-514.el7.x86_64.
We have a script using the system command ftp downloading files from remote ftp server directly into local mounted rgw buckets. The ftp command gives error. After stracing, we make a simple reproducer as above.
We mount using mount -t nfs -o sync 127.0.0.1:/ /mnt and mount -t nfs -osync,vers=4.1 127.0.0.1:/ /mnt. Both of them didn't work.
In /etc/ganesha/ganesha.conf , the Path is the bucket name. the command showmount -e shows as follows:

Export list for localhost:
testbucket (everyone)

Copy link
Contributor

@mattbenjamin mattbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the evidence presented, I want to merge this change, but I am concerned about the use of sysconf(_pagesize). It seems that this conditional clause an just be removed without risk?

Signed-off-by: Yao Zongyou <yaozongyou@vip.qq.com>
@yaozongyou
Copy link
Contributor Author

yaozongyou commented Sep 24, 2017

@mattbenjamin I have made some changes about the problem you mentioned. And I have tested in our environment, it works very well.

Copy link
Contributor

@mattbenjamin mattbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per discussion, this change has been motivated and appears necessary given its ability to give a correct result for these observed cases, until full support for write at any offset and order is merged

I have hand-verified this change

@mattbenjamin mattbenjamin merged commit 37c8748 into ceph:master Sep 25, 2017
@yaozongyou yaozongyou deleted the fix-rgw-file-write-error branch September 26, 2017 02:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants