Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Android: SIGBUS due to unaligned access #356

Closed
omor1 opened this issue May 4, 2017 · 60 comments
Closed

Android: SIGBUS due to unaligned access #356

omor1 opened this issue May 4, 2017 · 60 comments

Comments

@omor1
Copy link
Contributor

omor1 commented May 4, 2017

Description

Running fio on Android (on an ARM device) with any job results in SIGBUS being sent to the process. By using ndk-stack, I have determined that it is due to BUS_ADRALN, or an unaligned memory access.
I have tracked this down to line 1354 in init.c. Removing the access still results in SIGBUS, but this time from line 1615 in stat.c. Removing that access results in a working executable.

However, ensuring alignment of the structure by using the aligned(8) attribute did not change anything—it still crashed due to SIGBUS.

Furthermore, accessing the fields separately also works. That is, instead of td->ts.clat_percentiles = o->clat_percentiles;, using the following:

/* arithmetic to prevent `myvar' from being optimized out */
uint64_t myvar = o->clat_percentiles+1;
td->ts.clat_percentiles = myvar-1;

Without the arithmetic, it fails—I assume that the variable is simply optimized out of existence (even at -O0).

I am truly mystified as to what's going on here.

Environment

Android 6.0.1, Linux 3.4.0 (with numerous device-specific patches, of course), ARMv7a
ndk-r14, Clang 3.8
fio git master

@omor1
Copy link
Contributor Author

omor1 commented May 4, 2017

Since struct thread_options *o = &td->o;, I wonder if this somehow has to do with race on reading and writing from the same pointer? Since it's equivalent to td->ts.clat_percentiles = td->o.clat_percentiles;.

@sitsofe
Copy link
Collaborator

sitsofe commented May 4, 2017

@omor1 With work (e.g. you may need to use a define to quiet down some of the false positives due to https://www.spinics.net/lists/fio/msg05514.html ) you might be able to get fio to compile under clang's undefined behaviour sanitizer. That might in turn may give some clues...

@omor1
Copy link
Contributor Author

omor1 commented May 4, 2017

@sitsofe I'll take a look at trying that, maybe it will help figure out what exactly is going on.

In any case, I also noticed that while struct thread_stat is packed, struct thread_data is not. Declaring it with __attribute__((packed)) as well somehow fixes the problem too. This just gets more and more bizarre (struct thread_options is not packed, but struct thread_options_pack is, as its name suggests).

@sitsofe
Copy link
Collaborator

sitsofe commented May 4, 2017

I think thread_options_pack is only used in earnest when fio is running in client/server mode. I can't offer any clues for the other pieces...

@sitsofe
Copy link
Collaborator

sitsofe commented May 4, 2017

I managed to get the undefined sanitizer on x86_64 Linux going with something like

CC=~/clang-3.9/build/bin/clang ./configure --extra-cflags="-D__compiler_offsetof=__builtin_offsetof -fsanitize=undefined"  

Running this job

./fio --ioengine=posixaio --iodepth=32 --size=100M --bs=4k --verify=crc32c --rw=randwrite --name=go

complained about unaligned access in crc/crc32c-intel.c, stat.c and eta.c but was silent in relation to init.c.

@axboe
Copy link
Owner

axboe commented May 4, 2017

The structs that go over the wire for client/server runs are packed. But while they are packed, the padding should ensure that the alignment of members is done appropriately. So I don't immediately see why this is happening for you. We have a few checks in libfio.c that check the alignment at compile time to avoid misaligned members. Usual requirements are aligning to the same size as the type itself. Eg a 64-bit type should be aligned on an 8 byte boundary.

That should work, as long as the allocate structure is size-of-pointer aligned as well. Might be worth looking into. x86 doesn't care about alignment, but generally it's best to avoid misaligned data, as it is slower.

@sitsofe
Copy link
Collaborator

sitsofe commented May 20, 2017

I've just noticed this: https://android.googlesource.com/platform/external/fio/+/365a153cfe8a3601af5d7c6c87679c20e84314e5%5E%21/#F0 . Doesn't seem like a permanent solution and only mentions GCC...

@enh
Copy link
Contributor

enh commented May 20, 2017

we're not seeing this because all our current hardware is 64-bit, but if i force fio to build as a 32-bit binary i can reproduce this:

pid: 17607, tid: 17607, name: fio >>> fio <<<
signal 7 (SIGBUS), code 1 (BUS_ADRALN), fault addr 0xd8eb783c
r0 d8eb783c r1 d8eb7844 r2 00000140 r3 00000021
r4 d8eb7004 r5 00000000 r6 00000000 r7 ff8830b8
r8 d8eb7014 r9 ffffffff sl ee899148 fp 00000001
ip aaffee88 sp ff881018 lr aafbb17f pc aafae896 cpsr 600f0030

Stack Trace:
RELADDR FUNCTION FILE:LINE
0001a896 add_job+2406 external/fio/init.c:1354
0001ce73 parse_cmd_line+3102 external/fio/init.c:2720 (discriminator 3)
0001d977 parse_options+46 external/fio/init.c:2765
00017df1 main+36 external/fio/fio.c:47
00017d45 __libc_init+48 bionic/libc/bionic/libc_init_dynamic.cpp:114
000066c0 _start+96 libgcc2.c:?

the specific instruction that's failing is this:

1a896: f960 079f vld1.32 {d16}, [r0 :64]

where the :64 is a promise that the address is 64-bit aligned (which it obviously isn't from the register dump).

and indeed if you add the missing compile-time asserts, percentile_precision is insufficiently aligned in thread_options (thread_options_pack is probably fine because since it's packed the compiler should know it can't optimize accesses).

@sitsofe
Copy link
Collaborator

sitsofe commented May 20, 2017

OK I've been looking at the warnings generated by the undefined sanitizer and have come up with this: sitsofe@62d673e . It's a bit raw but I'd be curious to see if it changes the problem at all. The weird thing is the __attribute__ ((aligned)) on ts inside struct thread_data but without it ts starts unaligned and the compiletime_asserts on ts's members are meaningless (as they seem to assume it at least starts aligned).

@sitsofe
Copy link
Collaborator

sitsofe commented May 21, 2017

I think the problems stems from this pattern:

#include <stdio.h>
#include <stdint.h>

struct inner {
        uint64_t z;
};

struct middle {
        uint64_t y;
        struct inner i;
        char a;
} __attribute__((packed));

struct outer {
        char x;
        struct middle m;
};

static void store_inner(volatile struct inner *i) {
        i->z = 1;
}

int main(void) {
        struct outer o;

        store_inner(&o.m.i);
        printf("%lu\n", o.m.i.z);

        return 0;
}

In the above, the packed attribute gets applied to each of struct middle's members telling the compiler it doesn't need to generate any special alignments for any of them at layout time so m is unaligned (and by extension y, i and z). When accessed as o.m.i.z in main there's no problem as the compiler is clever enough to know it needs to fix up all the unaligned accesses. The twist comes when store_inner() goes to set the z member because the value of the pointer it was passed is unaligned but this information isn't known inside the function. The compiler goes on to presume no special fixups are required and both the deference of i and the setting of z are done without special care.

sitsofe added a commit to sitsofe/fio that referenced this issue May 21, 2017
Fix misaligned access related to struct thread_stat and struct jobs_eta
seen when running a build produced by
CC=~/clang-3.9/build/bin/clang ./configure --disable-optimizations \
   --extra-cflags="-D__compiler_offsetof=__builtin_offsetof \
   -fsanitize=undefined"

and add to the compile time asserts to make these problem more visible.
This should fix axboe#356 .

Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
sitsofe added a commit to sitsofe/fio that referenced this issue May 21, 2017
Fix misaligned access related to struct thread_stat and struct jobs_eta
seen when running a build produced by
CC=~/clang-3.9/build/bin/clang ./configure --disable-optimizations \
   --extra-cflags="-D__compiler_offsetof=__builtin_offsetof \
   -fsanitize=undefined"

and add to the compile time asserts to make these problem more visible.
This should fix axboe#356 .

Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
sitsofe added a commit to sitsofe/fio that referenced this issue May 21, 2017
Fix unaligned/misaligned accesses related to struct thread_stat and
struct jobs_eta seen when running a build produced by
CC=~/clang-3.9/build/bin/clang ./configure --disable-optimizations \
   --extra-cflags="-D__compiler_offsetof=__builtin_offsetof \
   -fsanitize=undefined"

and add to the compile time asserts to make these problems more visible.
This should fix axboe#356 .

Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
@sitsofe
Copy link
Collaborator

sitsofe commented May 21, 2017

@omor1 @enh I've made https://github.com/sitsofe/fio/tree/alignment to try and address some of the unaligned access problems. Do you know if it solves the reported issue?

@omor1
Copy link
Contributor Author

omor1 commented May 22, 2017

@enh I'm glad that this isn't something that's only occurring on my device. I've been running into it with clat_percentiles, but I can see percentile_precision being a problem too, depending on what the compiler does. Both fail compile-time asserts.

@sitsofe I think it's on the right track, but that's not quite right. I'm trying to get a minimal working example so that we can trace what's going on from there. The situation in question only has two structures though:

struct thread_stat {
    /* stuff */
    uint64_t clat_percentiles;
    uint64_t percentile_precision;
    /* stuff */
}  __attribute__((packed));

struct thread_data {
    /* stuff */
    struct thread_stat ts;
    /* stuff */
};

@omor1
Copy link
Contributor Author

omor1 commented May 22, 2017

OK, I've figured out what's going on.

This isn't a problem inherent in the structures—when I created a struct thread_data on the stack and played around with it, I had no problems. However, fio gets the memory from either the heap (malloc) or from shared memory. When configured with --disable-shm, fio works fine—there must be a problem in the shared memory code.

Android shared memory has been somewhat broken for a while (#352), but was fixed (for non Android-O) in #353. As it turns out, the shared memory replacements that use ashmem have a couple problems. It stores the size of the shared memory region in the first sizeof(size_t) bytes of the shared memory (so that it can later be retrieved and unmapped). Since size_t is 4 bytes on 32-bit platforms, this misaligns the resulting pointer and later causes the SIGBUS. As expected, changing it to a 64-bit integer fixes the problem, though it isn't the nicest solution. Is there something better than just changing size_t to uint64_t?

Note that shmat() maps size + sizeof(uint64_t) despite shmget() having called the ASHMEM_SET_SIZE ioctl with only size. I think this needs to be fixed too, though it would only cause problems when using a large number of threads. I'm not all too familiar with ashmem—is this correct, @enh?

omor1 pushed a commit to omor1/fio that referenced this issue May 22, 2017
Fixes: axboe#356 ("Android: SIGBUS due to unaligned access")

Signed-off-by: Omri Mor <omor1@asu.edu>
omor1 pushed a commit to omor1/fio that referenced this issue May 22, 2017
Fixes: axboe#356 ("Android: SIGBUS due to unaligned access")

Signed-off-by: Omri Mor <omor1@asu.edu>
@sitsofe
Copy link
Collaborator

sitsofe commented May 22, 2017

@omor1 I was going to say: if you want to do the more complicated thing you could store the offset after you incremented the pointer in a uintptr_t, round upwards to the nearest sizeof(void *) and do the reverse (but rounding downwards). It doesn't sound as elegant as your solution though...

sitsofe added a commit to sitsofe/fio that referenced this issue May 22, 2017
Fix unaligned/misaligned accesses related to struct thread_stat and
struct jobs_eta seen when running a build produced by
CC=~/clang-3.9/build/bin/clang ./configure --disable-optimizations \
   --extra-cflags="-D__compiler_offsetof=__builtin_offsetof \
   -fsanitize=undefined"

and add to the compile time asserts to make these problems more visible.
This should fix axboe#356 .

Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
@omor1
Copy link
Contributor Author

omor1 commented May 22, 2017

@enh could you verify that the fix works on your end and that it doesn't cause problems on AArch64 devices? I only have a Nexus 5 to test with.

@omor1
Copy link
Contributor Author

omor1 commented May 23, 2017

Well, I'm back to report that while jobs start correctly, I'm still running into alignment issues later on—specifically, right after a job finishes. The stack trace indicates that it is line 1797 of backend.c.

The job file I'm using is:

[global]
ioengine=sync
size=256m
bssplit=4b/30:1k:40:4k:15:8k:15
thread=1
directory=/sdcard/
loops=3

[rand-read]
rw=randread
stonewall

[rand-write]
rw=randwrite
stonewall

[rand-rw]
rw=randrw
stonewall

I'm also getting the following message: fio: posix_fallocate fails: Operation not supported on transport endpoint. Is posix_fallocate() not supported on Android?

I'm currently trying reverting some of the structure changes, specifically dbf285f, to ensure that it wasn't accidentally introduced there.

@axboe, could you reopen the issue since it isn't entirely solved?

@omor1
Copy link
Contributor Author

omor1 commented May 23, 2017

So reverting dbf285f and 25e1ccc appears to fix the problem for me. @enh, can you confirm that this works when you test it?

I don't know if it's better to revert the protocol version (since there hasn't been a release yet) or bump it again instead.

@axboe axboe reopened this May 23, 2017
@sitsofe
Copy link
Collaborator

sitsofe commented May 23, 2017

@omor1 does the following patch warn about misalignment with dbf285f in place?

diff --git a/libfio.c b/libfio.c
index da22456..03626f3 100644
--- a/libfio.c
+++ b/libfio.c
@@ -353,6 +353,9 @@ int initialize_fio(char *envp[])
         * can run into problems on archs that fault on unaligned fp
         * access (ARM).
         */
+       compiletime_assert((offsetof(struct thread_data, ts.io_bytes[DDIR_READ]) % 8) == 0, "ts.io_bytes[DDIR_READ]");
+       compiletime_assert((offsetof(struct thread_data, io_bytes[DDIR_READ]) % 8) == 0, "io_bytes[DDIR_READ]");
+       compiletime_assert((offsetof(struct thread_stat, io_bytes) % 8) == 0, "io_bytes");
        compiletime_assert((offsetof(struct thread_data, ts) % sizeof(void *)) == 0, "ts");
        compiletime_assert((offsetof(struct thread_stat, percentile_list) % 8) == 0, "stat percentile_list");
        compiletime_assert((offsetof(struct thread_stat, total_run_time) % 8) == 0, "total_run_time");

@sitsofe
Copy link
Collaborator

sitsofe commented May 23, 2017

On a 32 bit x86 Linux some extra asserts I added triggered:

    CC libfio.o
libfio.c:357:2: error: static_assert failed "total_io_size"
        compiletime_assert((offsetof(struct thread_data, total_io_size) % 8) == 0, "total_io_size");
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                           ^              ~~~~~~~~~
libfio.c:359:2: error: static_assert failed "io_bytes[DDIR_READ]"
        compiletime_assert((offsetof(struct thread_data, io_bytes[DDIR_READ]) % 8) == 0, "io_bytes[DDIR_READ]");
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./compiler/compiler.h:39:44: note: expanded from macro 'compiletime_assert'
#define compiletime_assert(condition, msg) _Static_assert(condition, msg)
                                           ^              ~~~~~~~~~
2 errors generated.

But an assert on an earlier variable doesn't:

356         compiletime_assert((offsetof(struct thread_data, last_usec) % 8) == 0, "last_usec");

I'm baffled as to why total_io_size wouldn't be automatically aligned to 8 though. Perhaps it's accessed as two 32 bit integers and thus doesn't need it?

@enh
Copy link
Contributor

enh commented Jun 16, 2017

that prints 0 for both arm32 and arm64.

btw, it looks like i gave the wrong line before. it's actually the line after that that's crashing; the one that assigns to io_bytes. the instruction generated is:

    8f68:       f964 0aef       vld1.64 {d16-d17}, [r4 :128]

this time you have :128, promising that r4 is 128-bit aligned. r4 is actually 0xdf03c2e8, which is 64-bit aligned, but not 128-bit aligned.

@sitsofe
Copy link
Collaborator

sitsofe commented Jun 16, 2017

@enh does that mean in

1797	td->ts.io_bytes[DDIR_READ] = td->io_bytes[DDIR_READ];

it is actually td->io_bytes that is the thing that is only aligned to 64 bits? Does that mean the compiler is expecting the entire outer struct to start on a 128 bit boundary?

@omor1
Copy link
Contributor Author

omor1 commented Jun 17, 2017

Sorry, I've been busy with other things as well.

@enh I had suspected that you had an off-by-one error there, as I had previously found it was the io_bytes line as well.

I suspect that fixing the issues caused by dbf285f would require the shmat() wrapper to return a 16-byte aligned pointer (instead of 8-byte aligned as it does now). It appears that __attribute__((aligned)) makes a 16-byte alignment under Clang for ARM. I wonder if using __attribute__((aligned(8))) instead would also fix it.

In any case, I'm not sure that dbf285f is necessary—it seems to be causing more issues than it solves. Also note that, as far as I know, to get the structure properly aligned requires the attribute to be declared on the type, not the variable.

@sitsofe
Copy link
Collaborator

sitsofe commented Jun 17, 2017

@omor1 It's true that dbf285f has caused a bunch of problems although the suspicious thing is they seem to revolve around ARM Android - I've tried time and again to make the problem happen on a rPI 3 with Rasbian (gcc 4.9.2 and clang 3.5.0) but haven't triggered it with fio.

I tried to check if__attribute__((aligned)) forced 16 byte alignment but the result wasn't clear:

#define _GNU_SOURCE 1
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <inttypes.h>
#include <sys/mman.h>

struct inner {
        uint8_t  i0[8];
        uint64_t i1[3];
        uint32_t i2[2];
} __attribute__((packed));

struct outer {
        uint8_t  o1[1];
        struct inner i __attribute__((aligned));
	uint8_t  o3[2];
	uint64_t o4[3];
};

int main(void) {
        struct outer *o;
	void *p;
	uint8_t *ptr = mmap(NULL, sizeof(struct outer) * 2, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	volatile uint64_t tmp;

	tmp = 12;
	p = (ptr + 4);
        o = (struct outer *) p;
	o->o4[0] = tmp;
	o->o4[1] = tmp;
	o->o4[2] = tmp;
        o->i.i1[0] = o->o4[0];
        o->i.i1[1] = o->o4[1];
        o->i.i1[2] = o->o4[2];

        printf("Address of p         =%16p %% 16=%lu\n", &p, (uintptr_t) &p % 16);
        printf("Address of o         =%16p %% 16=%lu\n", &o, (uintptr_t) &o % 16);
        printf("Address of o1        =%16p %% 16=%lu\n", &o->o1, (uintptr_t) &o->o1 % 16);
        printf("Address of i         =%16p %% 16=%lu\n", &o->i, (uintptr_t) &o->i % 16);
        printf("Address of o->i.i1[0]=%16p %% 16=%lu\n", &(o->i.i1[0]), (uintptr_t) &o->i.i1[0] % 16);

        return 0;
}

@omor1
Copy link
Contributor Author

omor1 commented Jun 17, 2017

@sitsofe as I said, I think it has to do with the (somewhat hacky) System V shared memory wrapper that is used on Android. I suspect that if configured with --disable-shm, dbf285f will work fine. Unfortunately, I'm not sure that there's a better way of wrapping the shared memory interfaces. @enh do you have any ideas?

@sitsofe
Copy link
Collaborator

sitsofe commented Jun 17, 2017

FWIW it looks like 8 is the max alignment of a scalar type on ARM: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0472m/chr1359124981436.html .

@omor1
Copy link
Contributor Author

omor1 commented Jun 17, 2017

@sitsofe and yet it's clearly getting a 16-byte alignment from somewhere. Maybe Clang supports a 128-bit datatype on ARM somehow?

The GCC manual states the following:

Whenever you leave out the alignment factor in an aligned attribute specification, the compiler automatically sets the alignment for the type to the largest alignment that is ever used for any data type on the target machine you are compiling for.

@sitsofe
Copy link
Collaborator

sitsofe commented Jun 17, 2017

I assume the compiler presumes the stack is allocated on a 16 byte boundary so while no scalar actually requires it, if it's possible to deduce that you happen to have it and you find you need 128 bits then you can generate instructions that operate on 128 bits for a speed gain?

@pirama-arumuga-nainar
Copy link

@enh asked me to look at this issue. I checked on this a bit and found that for almost all platforms, Clang uses 16-bytes when __attribute__((aligned)) is specified. r235397 is the patch that added support for this attribute and I couldn't find any subsequent patch that changes this for ARM/AArch64.

As to why outer.o4 in the snippet a few comments earlier is aligned to 16 bytes, I believe Clang is aggressively translating alignments (forced or deduced) to other struct members. For instance, in the earlier example, outer.o1 also has a 16-byte alignment.

As further evidence, consider the following snippet:

#include <inttypes.h>
struct s1 {
  int16_t a[7] __attribute__((aligned));
  int32_t b;
};

struct s2{
  int16_t a[3] __attribute__((aligned(8)));
  int32_t b;
};

struct s3{
  int64_t a __attribute__((aligned(8)));
  int32_t b;
};

struct s4{
  int64_t a;
  int32_t b;
};

int32_t s1_b(struct s1 *p) {
  return p->b;
}

int32_t s2_b(struct s2 *p) {
  return p->b;
}

int32_t s3_b(struct s3 *p) {
  return p->b;
}

int32_t s4_b(struct s4 *p) {
  return p->b;
}

In the above example,

  • s1.b is 16 byte aligned (default for __attribute__((aligned)))
  • s2.b is 8 byte aligned based on s2.a (and s2 itself)
  • s3.b is 8 byte aligned based on s3.a (and s2 itself)
  • s4.b is just 8 byte aligned on ARM because int64 is 8-byte aligned on ARM

@sitsofe
Copy link
Collaborator

sitsofe commented Jun 18, 2017

@pirama-arumuga-nainar Thanks for posting that. OK so __attribute__((aligned)) explains where the belief of 16 byte alignment came from. If someone can test @omor1's suggestion of attribute((aligned(8))) atop of dbf285f we might be able to squeeze by for now...

Is it the case that if clang notices when you are accessing two consecutive 64 bit variables and if "knows" the first variable is aligned to 16 bytes then it goes on to use 128 bit operations (that might explain where the :128 hint arrived from) when available? If so, this is a very aggressive optimization given that you can't know what the base address was unless you allocated the memory yourself...

@omor1
Copy link
Contributor Author

omor1 commented Jun 18, 2017

So I was looking at differences between __attribute__((aligned)) and __attribute__((aligned(__BIGGEST_ALIGNMENT__))). On GCC both resulted in an 8-byte alignment, while Clang, as noted, aligned the former to 16 bytes, but the latter to 8. This seems to be a Clang bug.

The optimization is actually reasonable, from the point of view of the compiler. Within struct thread_data, ts is supposed to be 16-byte aligned. The compiler sees an opportunity to optimize the loading by utilizing the NEON load instruction to load two of the 64-bit array elements into SIMD registers, which, based on its knowledge of the layout of struct thread_data and struct thread_stat, it knows is supposed to be 16-byte aligned. However, the pointer given to struct thread_data is not actually 16-byte aligned (because of the Android shared memory workaround), which throws off the compiler's assumptions.

In short, we violate the compiler's assumptions by misaligning the memory, which makes it angry.

@omor1
Copy link
Contributor Author

omor1 commented Jun 19, 2017

I've managed to replicate the error on desktop ARM Linux by wrapping <sys/shm.h> to do the same thing we do in os/os-android.h:

/* In sys/shm.h in fio source directory, found before /usr/include/sys/shm.h */
#ifndef WRAP_SYS_SHM_H
#define WRAP_SYS_SHM_H

#warning "Using <sys/shm.h> wrapper"

#include_next <sys/shm.h>
#include <stdint.h>

typedef uint64_t wrap_t;

static inline int wrap_shmget(key_t key, size_t size, int shmflg)
{
	return shmget(key, size + sizeof(wrap_t), shmflg);
}

static inline void *wrap_shmat(int shmid, const void *shmaddr, int shmflg)
{
	wrap_t *ptr = shmat(shmid, shmaddr, shmflg);
	*ptr = 0xf10;
	return ptr + 1;
}

static inline int wrap_shmdt(const void *shmaddr)
{
	return shmdt((wrap_t *)shmaddr - 1);
}

#define shmget wrap_shmget
#define shmat wrap_shmat
#define shmdt wrap_shmdt

#endif

Compiled with LDFLAGS="-pie" ./configure --cc="clang" --extra-cflags="-fPIE -O2", using Clang 4.0.0. The same problem does not happen when using GCC—this is exclusively a Clang problem.

@omor1
Copy link
Contributor Author

omor1 commented Jun 19, 2017

And as expected, changing wrap_t to a 16-byte structure (struct { uint64_t _pad[2]; } and removing *ptr = 0xf10) fixes it as well. This really does come down to how Clang treats __attribute__((aligned)).

@axboe
Copy link
Owner

axboe commented Jun 19, 2017

How about just changing the android shmat() to just pad 2 uint64_t? That'll satisfy the (odd and buggy) 16-byte alignment restriction, and it won't really hurt anything.

@axboe
Copy link
Owner

axboe commented Jun 19, 2017

Like the below. I can't believe github doesn't have a nice adhoc way to share a diff, instead of this web craziness.

diff --git a/os/os-android.h b/os/os-android.h
index c56d6827109a..6eeb1e4c39a6 100644
--- a/os/os-android.h
+++ b/os/os-android.h
@@ -98,7 +98,7 @@ static inline int shmget(key_t __key, size_t __size, int __shmflg)
 		goto error;
 
 	/* Stores size in first 8 bytes, allocate extra space */
-	ret = ioctl(fd, ASHMEM_SET_SIZE, __size + sizeof(uint64_t));
+	ret = ioctl(fd, ASHMEM_SET_SIZE, __size + 2 * sizeof(uint64_t));
 	if (ret < 0)
 		goto error;
 
@@ -116,13 +116,13 @@ static inline void *shmat(int __shmid, const void *__shmaddr, int __shmflg)
 	uint64_t *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, __shmid, 0);
 	/* Save size at beginning of buffer, for use with munmap */
 	*ptr = size;
-	return ptr + 1;
+	return ptr + 2;
 }
 
 static inline int shmdt (const void *__shmaddr)
 {
 	/* Find mmap size which we stored at the beginning of the buffer */
-	uint64_t *ptr = (uint64_t *)__shmaddr - 1;
+	uint64_t *ptr = (uint64_t *)__shmaddr - 2;
 	size_t size = *ptr;
 	return munmap(ptr, size);
 }``

@omor1
Copy link
Contributor Author

omor1 commented Jun 19, 2017

@axboe that probably works, as likely would changing it to __attribute__((aligned(8))). Before we do that, I'd like to figure out—what was the reason for adding the alignment to struct thread_stat ts in the first place (in dbf285f)? I thought it was an attempt to solve the SIGBUS issue, which ended up being due to a different problem, or was it to resolve unrelated alignment warnings from the undefined behavior sanitizer?

If there's no actual reason to force alignment there, I think it's probably better to let the compiler do its own thing. And if there is a reason for manually aligning the struct member, it's probably best to specify the alignment instead of leaving it up to the compiler (and thus getting this incompatibilities between them)—using __attribute__((aligned(alignment))) or better, if we allow use of C11 (do we, or is fio sticking to C99/GNU99?), using alignas.

Side note, @axboe I think that using a Gist for that works decently, though I don't think it can be embedded into the comment, only linked.

Side note 2, I'm thinking it's probably worthwhile to bring up the Clang treatment of __attribute__((aligned)) on either their mailing list or filing a bug report—probably the former, and the latter if I get a response indicating that it's a bug (which it probably is, as it deviates unexpectedly from GCC).

@sitsofe
Copy link
Collaborator

sitsofe commented Jun 19, 2017

@axboe As previously mentioned I think the closest you get are gists (https://gist.github.com/ ). For example https://gist.github.com/sitsofe/320e2fbbab3eae854da63cddf7fc5909 .

I thought it was an attempt to solve the SIGBUS issue, which ended up being due to a different problem, or was it to resolve unrelated alignment warnings from the undefined behavior sanitizer?

In the end it was to resolve unrelated alignment warnings from the undefined behavior sanitizer on x86 platforms. I suppose there could be another platform that fio runs on that could suffer from this behaviour but x86 is definitely able to fix things up in hardware (although there's debate as to whether there's a speed hit to doing this).

One thing that I have noticed is that the real shmat() seems to return an address which has stricter alignment than __BIGGEST_ALIGNMENT__. This is only alluded to through references to SHMLBA in the man page - http://man7.org/linux/man-pages/man2/shmop.2.html .

@sitsofe
Copy link
Collaborator

sitsofe commented Jun 19, 2017

@omor1 I agree that explicit alignment is probably best (I suppose we could use the __alignof__(uint64_t) if we're wedded to GCC compatible compilers). We are already trying to manually ensure that the packed struct members have the correct alignment so it doesn't seem too much of stretch to extended that manual work to setting the alignment on the overall struct too.

@omor1
Copy link
Contributor Author

omor1 commented Jun 19, 2017

@sitsofe I'm pretty sure that it's page-aligned (i.e. aligned to 4KiB on i386, or additionally 2MiB or 1GiB on x86_64 if hugepages are enabled).

@sitsofe
Copy link
Collaborator

sitsofe commented Jun 19, 2017

@omor1 - OK my mistake...

@omor1
Copy link
Contributor Author

omor1 commented Jun 19, 2017

@sitsofe note in the man page (emphasis mine):

SHMLBA: Segment low boundary address multiple. When explicitly specifying an attach address in a call to shmat(), the caller should ensure that the address is a multiple of this value. This is necessary on some architectures, in order either to ensure good CPU cache performance or to ensure that different attaches of the same segment have consistent views within the CPU cache. SHMLBA is normally some multiple of the system page size (on many Linux architectures, it is the same as the system page size).

This implies (though isn't outright stated) that the address it gives (with shmaddr == NULL) is at least page-aligned. This will indeed be a stricter alignment than __BIGGEST_ALIGNMENT__ or alignof(max_align_t). I'm pretty sure mmap() (used in the Android wrapper) also gives page-aligned memory.

@axboe
Copy link
Owner

axboe commented Jun 19, 2017

Good point on alignment, shmat() will be page size aligned. Fio already queries the page size, so we should just add a page and align forward a page. Android needs the alignment for storing the mmap() size, so it needs some space there. It only needs 32-bit, but we can waste a page per shm map without worrying too much.

Alternatively, we could put the size pad at the end. But I generally dislike doing that, since it's more fragile in terms of bugs.

@omor1
Copy link
Contributor Author

omor1 commented Jun 19, 2017

I'd rather not waste a page for no reason—it's much simpler to just use explicit alignment, as well as ensuring that different compilers aren't choosing different alignment values (as they currently are doing). I've confirmed that it fixes the problem (using the <sys/shm.h> header above).

diff --git a/fio.h b/fio.h
index 963cf034..6c06a0cd 100644
--- a/fio.h
+++ b/fio.h
@@ -149,7 +149,7 @@ struct thread_data {
        unsigned int thread_number;
        unsigned int subjob_number;
        unsigned int groupid;
-       struct thread_stat ts __attribute__ ((aligned));
+       struct thread_stat ts __attribute__ ((aligned(8)));
 
        int client_type;
 

Based on Makefile, it appears that fio (officially) uses gnu99, though it appears that both GCC and Clang accept _Alignas() in gnu99 mode, which is strange.

@axboe
Copy link
Owner

axboe commented Jun 19, 2017

Agree, that is a better fix. I'll get it committed.

@axboe
Copy link
Owner

axboe commented Jun 19, 2017

Pushed, please close this issue if resolved.

@sitsofe
Copy link
Collaborator

sitsofe commented Jun 20, 2017

@omor1 @enh @pirama-arumuga-nainar thanks for the clarifications and helping to see this one through to the end! I never knew (mis)alignment and attributes (that I previously thought were hints) could trigger such intricate issues...

@sitsofe
Copy link
Collaborator

sitsofe commented Jun 25, 2017

@omor1 do you know if this issue has been resolved on real Android (it's fixed for me when I use your sys/shm.h header on a rPi 3 with clang-3.9) and if so can you close it?

@omor1
Copy link
Contributor Author

omor1 commented Jun 27, 2017

I just tested, and it appears to work. Thanks for all the help! Tracking this one down took a while—I'm glad we managed to nail it. I'll go ahead and close it now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants