version 3.18 sets stonewall incorrectly on s390x #1065
Hi @tdi2004, Could you:
I tried with 84abeef and the following but could not reproduce your issue:
|
Hi @sitsofe ,
The following output line led me to the conclusion that fio now starts the jobs on the various directories sequentially (instead of in parallel, as it did before):
=> almost 60s = 2*30s |
@tdi2004 , I still can't reproduce what you're seeing but I think the job can be minimised further. Let's check a few things:
|
Hi @tdi2004, unfortunately I'm also not able to reproduce the behaviour you're describing. Using git tag fio-3.21 I get the following output:
Runtime is, as you expect, roughly 30s, so the jobs do in fact run in parallel. What I noticed is that your jobs run in different groups (notice the g= in parentheses at the beginning of each line). I have no idea why that is … yet :) Edit: I also tested current master g5c79. Result: "Works for me". (Although that probably doesn't help much.) |
ran a job on /tmp/1 for 5s |
|
Not here. That's the culprit. For me it is 60s. My architecture is s390x - I can't imagine that the forking stuff is architecture-dependent.
|
Ah, I was about to ask about your compiler/architecture. I was expecting Linux/x86_64. @sitsofe |
@hannesweisbach good sleuthing. The type of stonewall is FIO_OPT_STR_SET and the type of exit_what is FIO_OPT_STR too. I wonder if the option parsing is writing over the adjacent variable? |
Compiling with
then running
stopped with the following error:
So that's a starting point... which turned out to be a dead end. |
The issue should be located in |
I guess it has another member alignment or it has less impact because it's little endian. Do you want me to open a pull request? |
Sure, go ahead. Although I'm not a maintainer. |
Fixes: 64402a8 ("Expand choices for exitall") Fixes: axboe#1065 Signed-off-by: André Wild <wild.andre.ae@gmail.com>
@hannesweisbach x86/amd64 is notorious for quietly fixing up alignment issues at a minor speed penalty (https://stackoverflow.com/a/7517370 ). |
@sitsofe Sure, |
@tdi2004 thanks for persevering, @hannesweisbach, @XeS0r thanks for sorting this one! Out of idle curiosity I checked what happened before this fix by running the following
This showed the jobs going into different groups. After the fix in fd56c23 the above no longer causes two different groups to be made and there's no stonewall output when looking through the debug. I had hoped that additional tooling could help detect this case but it doesn't look like it:

```c
#include <stdio.h>
#include <stddef.h>

struct st {
	unsigned long long l1;
	unsigned short s1;
	unsigned short s2;
	unsigned int i1;
} comp = {.l1 = 0, .s1 = 1, .s2 = 2, .i1 = 3};

int main(void) {
	unsigned int *ip;
	void *p;

	/* Point a 32-bit pointer at the 16-bit member s1 */
	p = (char *) &comp + offsetof(struct st, s1);
	ip = p;
	/* Deliberate intra-object overflow: this 32-bit store also overwrites s2 */
	*ip = 444 << 16 | 777;
	printf("Address of .s1: %p .s2: %p\n", (void *) &(comp.s1), (void *) &(comp.s2));
	printf("Value of .l1: %llu .s1: %hu .s2: %hu i1: %u\n", comp.l1, comp.s1,
	       comp.s2, comp.i1);
	return 0;
}
```

https://github.com/google/sanitizers/wiki/AddressSanitizerIntraObjectOverflow suggests only limited cases are catered for when it comes to intra object overflow (e.g. within a struct). Because it also defeats the type system I suspect this would be even tougher to detect statically. |
Jup. I thought I was being smart using uint16_ts … going 32-bit for those options is probably the right call. |
Hi @hannesweisbach,
assume the following jobfile:
with commit 64402a8 a change in behaviour was introduced.
Older fio versions started the fio parent process and one fio child process per directory, so that fio ran on several mountpoints in parallel. This is now broken: fio runs only on the first directory specified:
It shows only one job! The previous version showed five in my case.
Same for pidof: pidof fio shows only 2 PIDs (parent and one child) instead of 6 (parent and five children).
Killing the parent PID shows that fio used only LV1: