Android: SIGBUS due to unaligned access #356
Comments
Since |
@omor1 With work (e.g. you may need to use a define to quiet down some of the false positives due to https://www.spinics.net/lists/fio/msg05514.html ) you might be able to get fio to compile under clang's undefined behaviour sanitizer. That might in turn give some clues... |
@sitsofe I'll take a look at trying that, maybe it will help figure out what exactly is going on. In any case, I also noticed that while |
I think |
I managed to get the undefined sanitizer on x86_64 Linux going with something like
Running this job
complained about unaligned access in |
The structs that go over the wire for client/server runs are packed. But while they are packed, the padding should ensure that the alignment of members is done appropriately. So I don't immediately see why this is happening for you. We have a few checks in libfio.c that check the alignment at compile time to avoid misaligned members. The usual requirement is that a member is aligned to its own size, e.g. a 64-bit type should be aligned on an 8-byte boundary. That should work, as long as the allocated structure is pointer-size aligned as well. Might be worth looking into. x86 doesn't care about alignment, but generally it's best to avoid misaligned data, as it is slower. |
I've just noticed this: https://android.googlesource.com/platform/external/fio/+/365a153cfe8a3601af5d7c6c87679c20e84314e5%5E%21/#F0 . Doesn't seem like a permanent solution and only mentions GCC... |
We're not seeing this because all our current hardware is 64-bit, but if I force fio to build as a 32-bit binary I can reproduce this:
pid: 17607, tid: 17607, name: fio >>> fio <<<
Stack Trace:
The specific instruction that's failing is this:
1a896: f960 079f vld1.32 {d16}, [r0 :64]
where the :64 is a promise that the address is 64-bit aligned (which it obviously isn't from the register dump). And indeed, if you add the missing compile-time asserts, percentile_precision is insufficiently aligned in thread_options (thread_options_pack is probably fine because, since it's packed, the compiler should know it can't optimize accesses). |
OK I've been looking at the warnings generated by the undefined sanitizer and have come up with this: sitsofe@62d673e . It's a bit raw but I'd be curious to see if it changes the problem at all. The weird thing is the |
I think the problem stems from this pattern:
#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>

struct inner {
	uint64_t z;
};

struct middle {
	uint64_t y;
	struct inner i;
	char a;
} __attribute__((packed));

struct outer {
	char x;
	struct middle m;
};

static void store_inner(volatile struct inner *i) {
	i->z = 1;
}

int main(void) {
	struct outer o;
	/* &o.m.i is at offset 9, but store_inner() assumes struct inner alignment */
	store_inner(&o.m.i);
	printf("%" PRIu64 "\n", o.m.i.z);
	return 0;
}
In the above, the packed attribute gets applied to each of |
Fix misaligned access related to struct thread_stat and struct jobs_eta seen when running a build produced by CC=~/clang-3.9/build/bin/clang ./configure --disable-optimizations \ --extra-cflags="-D__compiler_offsetof=__builtin_offsetof \ -fsanitize=undefined" and add to the compile time asserts to make these problems more visible. This should fix axboe#356 . Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Fix unaligned/misaligned accesses related to struct thread_stat and struct jobs_eta seen when running a build produced by CC=~/clang-3.9/build/bin/clang ./configure --disable-optimizations \ --extra-cflags="-D__compiler_offsetof=__builtin_offsetof \ -fsanitize=undefined" and add to the compile time asserts to make these problems more visible. This should fix axboe#356 . Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
@omor1 @enh I've made https://github.com/sitsofe/fio/tree/alignment to try and address some of the unaligned access problems. Do you know if it solves the reported issue? |
@enh I'm glad that this isn't something that's only occurring on my device. I've been running into it with @sitsofe I think it's on the right track, but that's not quite right. I'm trying to get a minimal working example so that we can trace what's going on from there. The situation in question only has two structures, though:
struct thread_stat {
	/* stuff */
	uint64_t clat_percentiles;
	uint64_t percentile_precision;
	/* stuff */
} __attribute__((packed));

struct thread_data {
	/* stuff */
	struct thread_stat ts;
	/* stuff */
}; |
OK, I've figured out what's going on. This isn't a problem inherent in the structures—when I created a Android shared memory has been somewhat broken for a while (#352), but was fixed (for non Android-O) in #353. As it turns out, the shared memory replacements that use ashmem have a couple problems. It stores the size of the shared memory region in the first Note that |
Fixes: axboe#356 ("Android: SIGBUS due to unaligned access") Signed-off-by: Omri Mor <omor1@asu.edu>
@omor1 I was going to say: if you want to do the more complicated thing you could store the offset after you incremented the pointer in a |
@enh could you verify that the fix works on your end and that it doesn't cause problems on AArch64 devices? I only have a Nexus 5 to test with. |
Well, I'm back to report that while jobs start correctly, I'm still running into alignment issues later on—specifically, right after a job finishes. The stack trace indicates that it is line 1797 of backend.c. The job file I'm using is:
I'm also getting the following message: I'm currently trying reverting some of the structure changes, specifically dbf285f, to ensure that it wasn't accidentally introduced there. @axboe, could you reopen the issue since it isn't entirely solved? |
@omor1 does the following patch warn about misalignment with dbf285f in place?
diff --git a/libfio.c b/libfio.c
index da22456..03626f3 100644
--- a/libfio.c
+++ b/libfio.c
@@ -353,6 +353,9 @@ int initialize_fio(char *envp[])
* can run into problems on archs that fault on unaligned fp
* access (ARM).
*/
+ compiletime_assert((offsetof(struct thread_data, ts.io_bytes[DDIR_READ]) % 8) == 0, "ts.io_bytes[DDIR_READ]");
+ compiletime_assert((offsetof(struct thread_data, io_bytes[DDIR_READ]) % 8) == 0, "io_bytes[DDIR_READ]");
+ compiletime_assert((offsetof(struct thread_stat, io_bytes) % 8) == 0, "io_bytes");
compiletime_assert((offsetof(struct thread_data, ts) % sizeof(void *)) == 0, "ts");
compiletime_assert((offsetof(struct thread_stat, percentile_list) % 8) == 0, "stat percentile_list");
compiletime_assert((offsetof(struct thread_stat, total_run_time) % 8) == 0, "total_run_time"); |
On 32-bit x86 Linux, some extra asserts I added triggered:
But an assert on an earlier variable doesn't:
compiletime_assert((offsetof(struct thread_data, last_usec) % 8) == 0, "last_usec");
I'm baffled as to why |
That prints 0 for both arm32 and arm64. BTW, it looks like I gave the wrong line before. It's actually the line after that one which is crashing: the one that assigns to
This time you have :128, promising that r4 is 128-bit aligned. r4 is actually 0xdf03c2e8, which is 64-bit aligned, but not 128-bit aligned. |
@enh does that mean in 1797 td->ts.io_bytes[DDIR_READ] = td->io_bytes[DDIR_READ]; it is actually |
Sorry, I've been busy with other things as well. @enh I had suspected that you had an off-by-one error there, as I had previously found it was the I suspect that fixing the issues caused by dbf285f would require the In any case, I'm not sure that dbf285f is necessary—it seems to be causing more issues than it solves. Also note that, as far as I know, to get the structure properly aligned requires the attribute to be declared on the type, not the variable. |
@omor1 It's true that dbf285f has caused a bunch of problems, although the suspicious thing is that they seem to revolve around ARM Android: I've tried time and again to make the problem happen on a Raspberry Pi 3 with Raspbian (gcc 4.9.2 and clang 3.5.0) but haven't triggered it with fio. I tried to check if
#define _GNU_SOURCE 1
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <inttypes.h>
#include <sys/mman.h>

struct inner {
	uint8_t i0[8];
	uint64_t i1[3];
	uint32_t i2[2];
} __attribute__((packed));

struct outer {
	uint8_t o1[1];
	struct inner i __attribute__((aligned));
	uint8_t o3[2];
	uint64_t o4[3];
};

int main(void) {
	struct outer *o;
	void *p;
	uint8_t *ptr = mmap(NULL, sizeof(struct outer) * 2, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	volatile uint64_t tmp;

	if (ptr == MAP_FAILED) {
		perror("mmap");
		return EXIT_FAILURE;
	}
	tmp = 12;
	/* mmap() returns a page-aligned address; offset it to misalign the struct */
	p = (ptr + 4);
	o = (struct outer *) p;
	o->o4[0] = tmp;
	o->o4[1] = tmp;
	o->o4[2] = tmp;
	o->i.i1[0] = o->o4[0];
	o->i.i1[1] = o->o4[1];
	o->i.i1[2] = o->o4[2];
	printf("Address of p =%16p %% 16=%" PRIuPTR "\n", (void *) &p, (uintptr_t) &p % 16);
	printf("Address of o =%16p %% 16=%" PRIuPTR "\n", (void *) &o, (uintptr_t) &o % 16);
	printf("Address of o1 =%16p %% 16=%" PRIuPTR "\n", (void *) &o->o1, (uintptr_t) &o->o1 % 16);
	printf("Address of i =%16p %% 16=%" PRIuPTR "\n", (void *) &o->i, (uintptr_t) &o->i % 16);
	printf("Address of o->i.i1[0]=%16p %% 16=%" PRIuPTR "\n", (void *) &(o->i.i1[0]), (uintptr_t) &o->i.i1[0] % 16);
	return 0;
} |
@sitsofe as I said, I think it has to do with the (somewhat hacky) System V shared memory wrapper that is used on Android. I suspect that if configured with |
FWIW it looks like 8 is the max alignment of a scalar type on ARM: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0472m/chr1359124981436.html . |
@sitsofe and yet it's clearly getting a 16-byte alignment from somewhere. Maybe Clang supports a 128-bit datatype on ARM somehow? The GCC manual states the following:
|
I assume the compiler presumes the stack is allocated on a 16-byte boundary, so while no scalar actually requires it, if it can deduce that you happen to have that alignment and you need 128 bits, it can generate instructions that operate on 128 bits for a speed gain? |
@enh asked me to look at this issue. I checked on this a bit and found that for almost all platforms, Clang uses 16-bytes when As to why outer.o4 in the snippet a few comments earlier is aligned to 16 bytes, I believe Clang is aggressively translating alignments (forced or deduced) to other struct members. For instance, in the earlier example, outer.o1 also has a 16-byte alignment. As further evidence, consider the following snippet:
In the above example,
|
@pirama-arumuga-nainar Thanks for posting that. OK, so is it the case that if clang notices you are accessing two consecutive 64-bit variables and it "knows" the first variable is aligned to 16 bytes, then it goes on to use 128-bit operations (that might explain where the |
So I was looking at differences between The optimization is actually reasonable, from the point of view of the compiler. Within In short, we violate the compiler's assumptions by misaligning the memory, which makes it angry. |
I've managed to replicate the error on desktop ARM Linux by wrapping
/* In sys/shm.h in fio source directory, found before /usr/include/sys/shm.h */
#ifndef WRAP_SYS_SHM_H
#define WRAP_SYS_SHM_H
#warning "Using <sys/shm.h> wrapper"
#include_next <sys/shm.h>
#include <stdint.h>

/* Header word stored in front of each segment, as the Android wrapper does */
typedef uint64_t wrap_t;

static inline int wrap_shmget(key_t key, size_t size, int shmflg)
{
	return shmget(key, size + sizeof(wrap_t), shmflg);
}

static inline void *wrap_shmat(int shmid, const void *shmaddr, int shmflg)
{
	wrap_t *ptr = shmat(shmid, shmaddr, shmflg);

	if (ptr == (void *) -1)
		return ptr;	/* pass shmat() failure through unchanged */
	*ptr = 0xf10;
	/* Caller gets base + 8: still 8-byte aligned, no longer 16-byte aligned */
	return ptr + 1;
}

static inline int wrap_shmdt(const void *shmaddr)
{
	return shmdt((const wrap_t *) shmaddr - 1);
}

#define shmget wrap_shmget
#define shmat wrap_shmat
#define shmdt wrap_shmdt
#endif
Compiled with |
And as expected, changing |
How about just changing the Android shmat() to pad 2 uint64_t? That'll satisfy the (odd and buggy) 16-byte alignment restriction, and it won't really hurt anything. |
Like the below. I can't believe GitHub doesn't have a nice ad hoc way to share a diff, instead of this web craziness.
|
@axboe that probably works, as likely would changing it to If there's no actual reason to force alignment there, I think it's probably better to let the compiler do its own thing. And if there is a reason for manually aligning the struct member, it's probably best to specify the alignment instead of leaving it up to the compiler (and thus getting these incompatibilities between them); using Side note: @axboe, I think that using a Gist for that works decently, though I don't think it can be embedded into the comment, only linked. Side note 2: I'm thinking it's probably worthwhile to bring up the Clang treatment of |
@axboe As previously mentioned I think the closest you get are gists (https://gist.github.com/ ). For example https://gist.github.com/sitsofe/320e2fbbab3eae854da63cddf7fc5909 .
In the end it was to resolve unrelated alignment warnings from the undefined behavior sanitizer on x86 platforms. I suppose there could be another platform that fio runs on that could suffer from this behaviour but x86 is definitely able to fix things up in hardware (although there's debate as to whether there's a speed hit to doing this). One thing that I have noticed is that the real |
@omor1 I agree that explicit alignment is probably best (I suppose we could use the |
@sitsofe I'm pretty sure that it's page-aligned (i.e. aligned to 4KiB on i386, or additionally 2MiB or 1GiB on x86_64 if hugepages are enabled). |
@omor1 - OK my mistake... |
@sitsofe note in the man page (emphasis mine):
This implies (though isn't outright stated) that the address it gives (with |
Good point on alignment: shmat() will be page-size aligned. Fio already queries the page size, so we should just add a page and align forward a page. Android needs the alignment for storing the mmap() size, so it needs some space there. It only needs 32 bits, but we can waste a page per shm map without worrying too much. Alternatively, we could put the size pad at the end. But I generally dislike doing that, since it's more fragile in terms of bugs. |
I'd rather not waste a page for no reason; it's much simpler to just use explicit alignment, as well as ensuring that different compilers aren't choosing different alignment values (as they currently are doing). I've confirmed that it fixes the problem (using the
diff --git a/fio.h b/fio.h
index 963cf034..6c06a0cd 100644
--- a/fio.h
+++ b/fio.h
@@ -149,7 +149,7 @@ struct thread_data {
unsigned int thread_number;
unsigned int subjob_number;
unsigned int groupid;
- struct thread_stat ts __attribute__ ((aligned));
+ struct thread_stat ts __attribute__ ((aligned(8)));
int client_type;
Based on |
Agree, that is a better fix. I'll get it committed. |
Pushed, please close this issue if resolved. |
@omor1 @enh @pirama-arumuga-nainar thanks for the clarifications and helping to see this one through to the end! I never knew (mis)alignment and attributes (that I previously thought were hints) could trigger such intricate issues... |
@omor1 do you know if this issue has been resolved on real Android (it's fixed for me when I use your |
I just tested, and it appears to work. Thanks for all the help! Tracking this one down took a while—I'm glad we managed to nail it. I'll go ahead and close it now. |
Description
Running fio on Android (on an ARM device) with any job results in SIGBUS being sent to the process. By using ndk-stack, I have determined that it is due to BUS_ADRALN, or an unaligned memory access.
I have tracked this down to line 1354 in init.c. Removing the access still results in SIGBUS, but this time from line 1615 in stat.c. Removing that access results in a working executable.
However, ensuring alignment of the structure by using the aligned(8) attribute did not change anything; it still crashed due to SIGBUS.
Furthermore, accessing the fields separately also works. That is, instead of
td->ts.clat_percentiles = o->clat_percentiles;
using the following:
Without the arithmetic, it fails. I assume that the variable is simply optimized out of existence (even at -O0).
I am truly mystified as to what's going on here.
Environment
Android 6.0.1, Linux 3.4.0 (with numerous device-specific patches, of course), ARMv7a
ndk-r14, Clang 3.8
fio git master