common: posix_fallocate on ZFS returns EINVAL #20398

Merged
merged 1 commit into from Apr 15, 2018

Contributor

@wjwithagen wjwithagen commented Feb 11, 2018

But even still it would not work on any COW FS.
So reorganised the code to have a common routine
that in the end will allocate a file on disk if needed

FileStore would not build when there was no HAVE_POSIX_FALLOCATE
other than on Apple. With ceph_posix_fallocate FileStore will also
fall back to manually allocating the required file.

Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>

// On FreeBSD ZFS fallocate always fails since it is considered impossible to
// reserve space on a COW filesystem. It returns EINVAL
// Linux in this case already emulates it in glibc
// In which case it is allocated manually, and still that is not a real guarantee
Contributor

what is "it" in this line?

#include <unistd.h>
#include <errno.h>

// On FreeBSD ZFS fallocate always fails since it is considered impossible to
Contributor

add a comma after FreeBSD.


// On FreeBSD ZFS fallocate always fails since it is considered impossible to
// reserve space on a COW filesystem. It returns EINVAL
// Linux in this case already emulates it in glibc
Contributor

what is "it" in this line?

@@ -0,0 +1,65 @@

Contributor

remove empty line.


#ifdef HAVE_POSIX_FALLOCATE
ret = posix_fallocate(fd, offset, len);
ret = -ret;
Contributor

why?

Contributor Author

posix_fallocate returns errors in its return value (0 is OK).
And as far as the original code goes these values were used negated.

}

int ceph_posix_fallocate(int fd, off_t offset, off_t len) {

Contributor

remove empty line

#if !defined(__FREEBSD__)
return ret;
#else
if ( ret != !EINVAL ) {
Contributor

i don't think this is what you intended to put.

@@ -492,6 +492,7 @@ set(libcommon_files
common/HeartbeatMap.cc
common/PluginRegistry.cc
common/ceph_fs.cc
common/ceph_posix_fallocate.c
Contributor

why not add this implementation to compat.cc, and declare the func in compat.h?

Contributor Author

That would be another option, but the reason for taking this route was that this code was more or less replicated in 2 locations, and in the second one the APPLE version was missing.
And since it is a sort of Ceph wrapper around posix_fallocate, I expect it to be more obvious if it stands out on its own.
I'm open to both; please confirm if you want to persist with your suggestion.

ret = -errno;
}
#endif
if (ret < 0) {
Contributor

@tchaikov tchaikov Feb 12, 2018

this is dead code if HAVE_POSIX_FALLOCATE and not FreeBSD, also, the above __FREEBSD__ and __APPLE__ blocks are difficult to read. could you please consider restructuring them?

Contributor Author

Right it is.
However trying to restructure this in the case that we do have HAVE_POSIX_FALLOCATE and __FreeBSD__ running on ZFS things start to get hairy because compile-time and run-time start to interact. But I'll do another attempt.
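
One way to picture the compile-time/run-time interaction mentioned here is a single entry point that decides at build time whether posix_fallocate() exists and at run time whether the filesystem can honour it. This is only a sketch under assumptions: the name `allocate_sketch`, the two callback types, and the negative-error return convention are hypothetical and are not the code that was merged.

```cpp
#include <fcntl.h>
#include <sys/types.h>

// Hypothetical callback types, used only to keep this sketch self-contained.
using fs_probe_fn = bool (*)(int fd);
using manual_alloc_fn = int (*)(int fd, off_t offset, off_t len);

// Returns 0 on success or a negative error number (an assumed convention).
int allocate_sketch(int fd, off_t offset, off_t len,
                    fs_probe_fn cannot_reserve, manual_alloc_fn manual) {
#if defined(HAVE_POSIX_FALLOCATE)
  // Compile time: the platform offers posix_fallocate().
  // Run time: skip it when the filesystem under fd cannot honour it.
  if (!cannot_reserve(fd))
    return -posix_fallocate(fd, offset, len);
#else
  (void)cannot_reserve;  // unused when posix_fallocate() is unavailable
#endif
  // Either there is no posix_fallocate() or the filesystem rejects it.
  return manual(fd, offset, len);
}
```

Keeping the run-time check inside the compile-time branch means the manual path is reachable in every configuration, so it is never dead code.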

@wjwithagen
Contributor Author

@tchaikov
Perhaps this is now more to your liking?

@wjwithagen
Contributor Author

Jenkins retest please

@wjwithagen wjwithagen force-pushed the wip-posix_fallocate branch 2 times, most recently from 302e580 to 855a358 Compare February 21, 2018 23:02
@tchaikov
Contributor

@wjwithagen it does not build.

@tchaikov tchaikov removed the needs-qa label Feb 22, 2018
@wjwithagen wjwithagen force-pushed the wip-posix_fallocate branch 2 times, most recently from bb711cf to 4e4f6f9 Compare February 22, 2018 09:04
@wjwithagen
Contributor Author

@tchaikov
Some Linux <> FreeBSD include files issues
Looks like that is fixed now.

@wjwithagen
Contributor Author

@tchaikov
I set the backport label, since this will have to go into Luminous as well once we have agreed upon the finals of the PR. But I have no clue if I need to do anything else.

#include "fcntl.h"

// The type-value for a ZFS FS in fstatfs.
#define FS_ZFS_TYPE 0xde
Contributor

please define this macro in compat.cc if it's not used elsewhere.

Contributor

@wjwithagen this is not addressed. please remove this line.

@@ -166,4 +170,16 @@
0; })
#endif


#ifdef __cplusplus
Contributor

why bother adding extern "C"? ceph_posix_fallocate() is not exposed as a part of public API, and it is not exposed to any C program.

ret = -errno;
}
return ret;
#endif
Contributor

we might want to keep the fallback code handling the platform where posix_fallocate() is not available? see 75b0f7d. also, please note that FileStore does not compile if posix_fallocate() is not available and the target platform is not macOS, while BlueStore has a fallback to emulate fallocate. your PR changes the behavior. please note it down in the commit message.

Contributor Author

@tchaikov
Yes I noticed that FileStore did not have a fallback, but just blew up building.
So I considered this a bonus.
But I will put this in the commit message

r = ::posix_fallocate(fd, 0, size);
if (r) {
r = ::ceph_posix_fallocate(fd, 0, size);
if (r) {
Contributor

before your change r is the return code of posix_fallocate(), after your change, r is -(-errno). is this what you expect?

Contributor Author

@tchaikov
Good catch, I fumbled that one.

Contributor

@wjwithagen this is not addressed.

@wjwithagen
Contributor Author

@tchaikov
Can you take another look?

@tchaikov tchaikov self-requested a review March 9, 2018 14:36
@wjwithagen
Contributor Author

@tchaikov
This needs to go in before the mimic freeze if possible.
Can you take another look?

#include "fcntl.h"

// The type-value for a ZFS FS in fstatfs.
#define FS_ZFS_TYPE 0xde
Contributor

@wjwithagen this is not addressed. please remove this line.

r = ::posix_fallocate(fd, 0, size);
if (r) {
r = ::ceph_posix_fallocate(fd, 0, size);
if (r) {
Contributor

@wjwithagen this is not addressed.

@wjwithagen
Contributor Author

@tchaikov
grmbl.... Sorry, I was sure that I did "fix" that.
Will fix and ping

return posix_fallocate(fd, offset, len);
}
#elif defined(__APPLE__)
int ret;
Contributor

nit, move definition of ret to where it is used.

// To prevent this the written buffer needs to be loaded with random data.

int manual_fallocate(int fd, off_t offset, off_t len) {
// Try to manually allocate the buffer
Contributor

remove this comment.


int manual_fallocate(int fd, off_t offset, off_t len) {
// Try to manually allocate the buffer
char data[1024*128];
Contributor

move the definition of data to where it is used. also, might want to pre-populate data with something. like

// populate data with random bits.
memset(data, 0x42, sizeof(data));

otherwise the static analyzer will be complaining.

Contributor Author

@tchaikov
I was also contemplating filling it with random data.
In that sense it would really reserve space on a compressing ZFS partition. Otherwise 1024^2 bytes will be reduced to almost nothing, and it is not really a reservation.
But I'll start with this

return errno;
for (off_t off = 0; off < len; off += sizeof(data)) {
if (off + sizeof(data) > len)
r = write(fd, data, len - off);
Contributor

should use safe_write() instead.


int on_zfs(int basedir_fd) {
struct statfs basefs;

Contributor

nit, remove empty line.

}

int on_zfs(int basedir_fd) {
struct statfs basefs;
Contributor

no need to add struct in C++.

Contributor Author

Clang does not seem to agree:

/home/wjw/wip.patch/src/common/compat.cc:44:5: error: must use 'struct' tag to refer to type 'statfs' in this scope
    statfs basefs;
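
The clang error is consistent with C++ name lookup: the system headers declare both a struct named statfs and a function named statfs(), and when a function and a class share a name in one scope the function hides the class name, so the type has to be spelled with the `struct` keyword. A portable sketch of the same situation, using a hypothetical `fsinfo` type and function rather than the real statfs; the 0xde value is the FS_ZFS_TYPE from the quoted diff.

```cpp
// Hypothetical type and function sharing one name, mirroring statfs.
struct fsinfo {
  int type;
};

// The function name hides the struct name from here on.
int fsinfo(int fd, struct fsinfo *out) {
  out->type = (fd >= 0) ? 0xde : 0;
  return (fd >= 0) ? 0 : -1;
}

bool is_type_de(int fd) {
  struct fsinfo info;  // plain "fsinfo info;" would not compile here
  if (fsinfo(fd, &info) < 0)
    return false;
  return info.type == 0xde;
}
```

Dropping the `struct` keyword in `is_type_de` would make the compiler resolve `fsinfo` to the function, which is the situation the error message above describes.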


#ifdef HAVE_POSIX_FALLOCATE
if (on_zfs(fd)) {
// Preallocate the required space manually
Contributor

remove this comment.

@@ -13,6 +13,7 @@
#define CEPH_COMPAT_H

#include "acconfig.h"
#include "fcntl.h"
Contributor

please move this #include to the .cc file.

Contributor Author

@tchaikov
It is already there, so I guess there was a reason why I put it there in the first place.
But I'll remove, and see what happens in FreeBSD/Jenkins.

Contributor Author

@tchaikov
Perhaps you have a suggestion, since this is an error on the Linux/GCC side of things.

/home/jenkins-build/build/workspace/ceph-pull-requests/src/include/compat.h:169:34: error: 'off_t' has not been declared
 int ceph_posix_fallocate(int fd, off_t offset, off_t len);
                                  ^~~~~
/home/jenkins-build/build/workspace/ceph-pull-requests/src/include/compat.h:169:48: error: 'off_t' has not been declared
 int ceph_posix_fallocate(int fd, off_t offset, off_t len);
                                                ^~~~~
src/CMakeFiles/crush_objs.dir/build.make:230: recipe for target 'src/CMakeFiles/crush_objs.dir/crush/CrushLocation.cc.o' failed

Contributor

please #include <sys/types.h> for off_t, see http://pubs.opengroup.org/onlinepubs/009696699/basedefs/sys/types.h.html .

@wjwithagen
Contributor Author

@tchaikov
Other than the struct statfs declaration, all issues should be addressed.

// To prevent this the written buffer needs to be loaded with random data.

int manual_fallocate(int fd, off_t offset, off_t len) {
int r = lseek(fd, offset, SEEK_SET);
Contributor

wrong indent. please refer to the following settings:

// -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*-
// vim: ts=8 sw=2 smarttab

// In which case it is allocated manually, and still that is not a real guarantee
// that a full buffer is allocated on disk, since it could be compressed.
// To prevent this the written buffer needs to be loaded with random data.

Contributor

please drop this empty line.

@wjwithagen wjwithagen force-pushed the wip-posix_fallocate branch 2 times, most recently from 48f77f7 to 459e519 Compare April 11, 2018 12:58
#endif
}

// -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*-
Contributor

normally we put these editor variable settings at the top of a source file.

Contributor Author

@tchaikov
OK, but then it is sort of a mess there at the top.
Vim/Vi will also find it at the bottom. But I'll fix it.

@wjwithagen
Contributor Author

'mmmm, why is the details output of the failing Jenkins run missing?
Something broke?

@tchaikov
Contributor

retest this please.

@@ -15,6 +15,7 @@
#include "acconfig.h"

#if defined(__linux__)
#include <fnctl.h>
Contributor

@tchaikov tchaikov Apr 12, 2018

Contributor Author

@tchaikov
missed that remark

@@ -15,6 +15,7 @@
#include "acconfig.h"

#if defined(__linux__)
#include <sys/types.h>
Contributor

nit, i think we should move this out of #if defined(__linux__) as off_t is part of POSIX standard.

Contributor Author

@tchaikov
Sure
