New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
common: posix_fallocate on ZFS returns EINVAL #20398
Conversation
3d59ad1
to
fd3a22c
Compare
src/common/ceph_posix_fallocate.c
Outdated
// On FreeBSD ZFS fallocate always fails since it is considered impossible to | ||
// reserve space on a COW filesystem. It returns EINVAL | ||
// Linux in this case already emulates it in glibc | ||
// In which case it is allocated manually, and still that is not a real guarantee |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is "it" in this line?
src/common/ceph_posix_fallocate.c
Outdated
#include <unistd.h> | ||
#include <errno.h> | ||
|
||
// On FreeBSD ZFS fallocate always fails since it is considered impossible to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comma after FreeBSD.
src/common/ceph_posix_fallocate.c
Outdated
|
||
// On FreeBSD ZFS fallocate always fails since it is considered impossible to | ||
// reserve space on a COW filesystem. It returns EINVAL | ||
// Linux in this case already emulates it in glibc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is "it" in this line?
src/common/ceph_posix_fallocate.c
Outdated
@@ -0,0 +1,65 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove empty line.
src/common/ceph_posix_fallocate.c
Outdated
|
||
#ifdef HAVE_POSIX_FALLOCATE | ||
ret = posix_fallocate(fd, offset, len); | ||
ret = -ret; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Posix_fallocate returns errors in its return value. (0 is oke)
And as far as the original code goes these values were used negated.
src/common/ceph_posix_fallocate.c
Outdated
} | ||
|
||
int ceph_posix_fallocate(int fd, off_t offset, off_t len) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove empty line
src/common/ceph_posix_fallocate.c
Outdated
#if !defined(__FREEBSD__) | ||
return ret; | ||
#else | ||
if ( ret != !EINVAL ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think this is what you intended to put.
src/CMakeLists.txt
Outdated
@@ -492,6 +492,7 @@ set(libcommon_files | |||
common/HeartbeatMap.cc | |||
common/PluginRegistry.cc | |||
common/ceph_fs.cc | |||
common/ceph_posix_fallocate.c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not add this implementation to compat.cc
, and declare the func in compat.h
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be another option, but reason to take this route was that this code was sort of replicated on 2 locations. Where in the second case the APPLE
version was missing.
And since it is a sort of ceph-wrapper around posix_fallocate I expect it to be more obvious it is stands out on its own.
I'm open for both, please confim if you want to persist with your suggestion.
src/common/ceph_posix_fallocate.c
Outdated
ret = -errno; | ||
} | ||
#endif | ||
if (ret < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is dead code if HAVE_POSIX_FALLOCATE and not FreeBSD, also, the above __FREEBSD__
and __APPLE__
blocks are difficult to read. could you please consider restructuring them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right it is.
However trying to restructure this in the case that we do have HAVE_POSIX_FALLOCATE
and __FreeBSD__
running on ZFS things start to get hairy because compile-time and run-time start to interact. But I'll do another attempt.
fd3a22c
to
aca1799
Compare
c01672a
to
b38b65c
Compare
@tchaikov |
Jenkins retest please |
302e580
to
855a358
Compare
@wjwithagen it does not build. |
bb711cf
to
4e4f6f9
Compare
@tchaikov |
@tchaikov |
src/include/compat.h
Outdated
#include "fcntl.h" | ||
|
||
// The type-value for a ZFS FS in fstatfs. | ||
#define FS_ZFS_TYPE 0xde |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please define this macro in compat.cc if it's not used elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wjwithagen this is not addressed. please remove this line.
src/include/compat.h
Outdated
@@ -166,4 +170,16 @@ | |||
0; }) | |||
#endif | |||
|
|||
|
|||
#ifdef __cplusplus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why bother adding extern "C"
? ceph_posix_fallocate()
is not exposed as a part of public API, and it is not exposed to any C program.
src/common/compat.cc
Outdated
ret = -errno; | ||
} | ||
return ret; | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we might want to keep the fallback code handling the platform where posix_fallocate() is not available? see 75b0f7d. also, please note that FileStore does not compile if posix_fallocate() is not available or it targeting platform is not MacOS. while BlueStore has a fallback to emulate fallocate. your PR changes the behavior. please note it down in the commit message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tchaikov
Yes I noticed that FileStore did not have a fallback, but just blew up building.
So I considered this a bonus.
But I will put this in the commit message
src/os/bluestore/BlueStore.cc
Outdated
r = ::posix_fallocate(fd, 0, size); | ||
if (r) { | ||
r = ::ceph_posix_fallocate(fd, 0, size); | ||
if (r) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before your change r
is the return code of posix_fallocate()
, after your change, r
is -(-errono)
. is this what you expect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tchaikov
Good catch, if fumbled that one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wjwithagen this is not addressed.
4e4f6f9
to
22cd224
Compare
@tchaikov |
@tchaikov |
src/include/compat.h
Outdated
#include "fcntl.h" | ||
|
||
// The type-value for a ZFS FS in fstatfs. | ||
#define FS_ZFS_TYPE 0xde |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wjwithagen this is not addressed. please remove this line.
src/os/bluestore/BlueStore.cc
Outdated
r = ::posix_fallocate(fd, 0, size); | ||
if (r) { | ||
r = ::ceph_posix_fallocate(fd, 0, size); | ||
if (r) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wjwithagen this is not addressed.
@tchaikov |
2ef513f
to
11f6df0
Compare
src/common/compat.cc
Outdated
return posix_fallocate(fd, offset, len); | ||
} | ||
#elif defined(__APPLE__) | ||
int ret; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, move definition of ret
to where it is used.
src/common/compat.cc
Outdated
// To prevent this the written buffer needs to be loaded with random data. | ||
|
||
int manual_fallocate(int fd, off_t offset, off_t len) { | ||
// Try to manually allocate the buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this comment.
src/common/compat.cc
Outdated
|
||
int manual_fallocate(int fd, off_t offset, off_t len) { | ||
// Try to manually allocate the buffer | ||
char data[1024*128]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move the definition of data
to where it is used. also, might want pre-populate data with something. like
// populate data with random bits.
memset(data, 0x42, sizeof(data));
otherwise the static analyzer will be complaining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tchaikov
I was also contemplation in filling it with random data.
In that sense it would really reserve space on a compressing ZFS partition. Otherwise 1024^2 baytes will be reduced to almost nothing, and it is nor really a reservation.
But I'll start with this
src/common/compat.cc
Outdated
return errno; | ||
for (off_t off = 0; off < len; off += sizeof(data)) { | ||
if (off + sizeof(data) > len) | ||
r = write(fd, data, len - off); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should use safe_write()
instead.
src/common/compat.cc
Outdated
|
||
int on_zfs(int basedir_fd) { | ||
struct statfs basefs; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, remove empty line.
src/common/compat.cc
Outdated
} | ||
|
||
int on_zfs(int basedir_fd) { | ||
struct statfs basefs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to add struct
in C++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clang does not seem to agree:
/home/wjw/wip.patch/src/common/compat.cc:44:5: error: must use 'struct' tag to refer to type 'statfs' in this scope
statfs basefs;
src/common/compat.cc
Outdated
|
||
#ifdef HAVE_POSIX_FALLOCATE | ||
if (on_zfs(fd)) { | ||
// Preallocate the required space manually |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this comment.
src/include/compat.h
Outdated
@@ -13,6 +13,7 @@ | |||
#define CEPH_COMPAT_H | |||
|
|||
#include "acconfig.h" | |||
#include "fcntl.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please move this #include
to the .cc file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tchaikov
It is already there, so I guess that there are reason why I put it there first place
But I'll remove, and see what happens in FreeBSD/Jenkins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tchaikov
Perhaps you have a suggestion, since this is an error on the Linux/GCC side of things.
/home/jenkins-build/build/workspace/ceph-pull-requests/src/include/compat.h:169:34: error: 'off_t' has not been declared
int ceph_posix_fallocate(int fd, off_t offset, off_t len);
^~~~~
/home/jenkins-build/build/workspace/ceph-pull-requests/src/include/compat.h:169:48: error: 'off_t' has not been declared
int ceph_posix_fallocate(int fd, off_t offset, off_t len);
^~~~~
src/CMakeFiles/crush_objs.dir/build.make:230: recipe for target 'src/CMakeFiles/crush_objs.dir/crush/CrushLocation.cc.o' failed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please #include <sys/types.h>
for off_t
, see http://pubs.opengroup.org/onlinepubs/009696699/basedefs/sys/types.h.html .
@tchaikov |
11f6df0
to
1b026bb
Compare
src/common/compat.cc
Outdated
// To prevent this the written buffer needs to be loaded with random data. | ||
|
||
int manual_fallocate(int fd, off_t offset, off_t len) { | ||
int r = lseek(fd, offset, SEEK_SET); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrong indent. please refer following settings:
// -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*-
// vim: ts=8 sw=2 smarttab
src/common/compat.cc
Outdated
// In which case it is allocated manually, and still that is not a real guarantee | ||
// that a full buffer is allocated on disk, since it could be compressed. | ||
// To prevent this the written buffer needs to be loaded with random data. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please drop this empty line.
48f77f7
to
459e519
Compare
src/common/compat.cc
Outdated
#endif | ||
} | ||
|
||
// -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
normally we put this editor variable settings at the top of a source file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tchaikov
Oke, but it is sort a mess than there at the top.
Vim/Vi will also find it at the bottom. But I'll fix
'mmmm, why is the details output of the failing Jenkins run missing? |
459e519
to
8436735
Compare
retest this please. |
src/include/compat.h
Outdated
@@ -15,6 +15,7 @@ | |||
#include "acconfig.h" | |||
|
|||
#if defined(__linux__) | |||
#include <fnctl.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tchaikov
missed that remark
8436735
to
5354d54
Compare
src/include/compat.h
Outdated
@@ -15,6 +15,7 @@ | |||
#include "acconfig.h" | |||
|
|||
#if defined(__linux__) | |||
#include <sys/types.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, i think we should move this out of #if defined(__linux__)
as off_t
is part of POSIX standard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tchaikov
Sure
But even still it would not work on any COW FS. So reorganised the code to have a common routine that in the end will allocate a file on disk if needed FileStore would not build when there was no HAVE_POSIX_FALLOCATE other than on Apple. With ceph_posix_fallocate FileStore will also fallback to manually allocating the required file. Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
5354d54
to
1230e69
Compare
But even still it would not work on any COW FS.
So reorganised the code to have a common routine
that in the end will allocate a file on disk if needed
FileStore would not build when there was no HAVE_POSIX_FALLOCATE
other than on Apple. With ceph_posix_fallocate FileStore will also
fallback to manually allocating the required file.
Signed-off-by: Willem Jan Withagen wjw@digiware.nl