version 3.18 sets stonewall incorrectly on s390x #1065

Closed
tdi2004 opened this issue Aug 13, 2020 · 18 comments · Fixed by #1066

tdi2004 commented Aug 13, 2020

Hi @hannesweisbach,
assume the following jobfile:

[global]
blocksize_range=4k-32k
rw=randrw
percentage_random=50
rwmixread=50
iodepth=4
ioengine=libaio
verify=crc32c
verify_fatal=1
verify_dump=1
verify_backlog=100000
time_based
runtime=14d
exitall_on_error=1
numjobs=1

[LV1]
size=133722k
directory=/mnt1/LV1 

[LV2]
size=124315k
directory=/mnt1/LV2

[LV3]
size=124873k
directory=/mnt1/LV3

[LV4]
size=133380k
directory=/mnt1/LV4

[LV5]
size=168411k
directory=/mnt1/LV5

With commit 64402a8 a change in behaviour was introduced.
Older fio versions started one fio parent PID plus one fio child PID per directory, so fio ran on several mountpoints in parallel; this is now broken. fio runs only on the first directory specified:

root@t83lp32:~/fio-fio-3.18# ./fio jobfile.fio 
LV1: (g=0): rw=randrw, bs=(R) 4096B-32.0KiB, (W) 4096B-32.0KiB, (T) 4096B-32.0KiB, ioengine=libaio, iodepth=4
LV2: (g=1): rw=randrw, bs=(R) 4096B-32.0KiB, (W) 4096B-32.0KiB, (T) 4096B-32.0KiB, ioengine=libaio, iodepth=4
LV3: (g=2): rw=randrw, bs=(R) 4096B-32.0KiB, (W) 4096B-32.0KiB, (T) 4096B-32.0KiB, ioengine=libaio, iodepth=4
LV4: (g=3): rw=randrw, bs=(R) 4096B-32.0KiB, (W) 4096B-32.0KiB, (T) 4096B-32.0KiB, ioengine=libaio, iodepth=4
LV5: (g=4): rw=randrw, bs=(R) 4096B-32.0KiB, (W) 4096B-32.0KiB, (T) 4096B-32.0KiB, ioengine=libaio, iodepth=4
fio-3.18
Starting 5 processes
Jobs: 1 (f=1): [m(1),P(4)][0.0%][r=95.0MiB/s,w=48.1MiB/s][r=5726,w=2854 IOPS][eta 69d:23h:59m:46s]    

It shows only one job! Previous versions showed five in my case.
The same goes for pidof: it shows only 2 fio PIDs (parent and one child) instead of 6 (parent and five children).
Killing the parent PID shows in the fio output that only LV1 was used:

fio: terminating on signal 15
Jobs: 0 (f=0)
LV1: (groupid=0, jobs=1): err= 0: pid=647846: Thu Aug 13 17:47:45 2020
  read: IOPS=3206, BW=53.7MiB/s (56.3MB/s)(10.3GiB/196577msec)
    slat (nsec): min=0, max=42050M, avg=257430.22, stdev=73773012.21
    clat (usec): min=6, max=42097k, avg=787.08, stdev=127816.71
     lat (usec): min=18, max=42098k, avg=1044.69, stdev=147600.51
    clat percentiles (usec):
     |  1.00th=[   59],  5.00th=[  135], 10.00th=[  178], 20.00th=[  231],
     | 30.00th=[  277], 40.00th=[  322], 50.00th=[  388], 60.00th=[  461],
     | 70.00th=[  498], 80.00th=[  529], 90.00th=[  562], 95.00th=[  586],
     | 99.00th=[  644], 99.50th=[  668], 99.90th=[  775], 99.95th=[ 6194],
     | 99.99th=[25560]
   bw (  KiB/s): min=10717, max=55065, per=85.28%, avg=46924.46, stdev=5540.73, samples=223
   iops        : min=  647, max= 3191, avg=2741.35, stdev=323.20, samples=223
  write: IOPS=5899, BW=99.1MiB/s (104MB/s)(5422MiB/54706msec); 0 zone resets
    slat (usec): min=11, max=248, avg=51.57, stdev=26.86
    clat (nsec): min=1000, max=30439k, avg=259321.41, stdev=311162.57
     lat (usec): min=14, max=30486, avg=310.93, stdev=312.50
    clat percentiles (usec):
     |  1.00th=[   40],  5.00th=[   95], 10.00th=[  131], 20.00th=[  176],
     | 30.00th=[  210], 40.00th=[  237], 50.00th=[  260], 60.00th=[  281],
     | 70.00th=[  306], 80.00th=[  334], 90.00th=[  371], 95.00th=[  396],
     | 99.00th=[  445], 99.50th=[  465], 99.90th=[  553], 99.95th=[  766],
     | 99.99th=[18220]
   bw (  KiB/s): min=11713, max=57782, per=48.67%, avg=49396.36, stdev=5828.41, samples=223
   iops        : min=  683, max= 3342, avg=2870.70, stdev=339.23, samples=223
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.04%, 20=0.21%, 50=0.63%
  lat (usec)   : 100=2.76%, 250=27.72%, 500=49.29%, 750=19.22%, 1000=0.04%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.02%, 50=0.02%
  lat (msec)   : 2000=0.01%, >=2000=0.01%
  cpu          : usr=14.74%, sys=6.26%, ctx=652295, majf=0, minf=57
  IO depths    : 1=0.1%, 2=0.1%, 4=99.9%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=630291,322712,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=53.7MiB/s (56.3MB/s), 53.7MiB/s-53.7MiB/s (56.3MB/s-56.3MB/s), io=10.3GiB (11.1GB), run=196577-196577msec
  WRITE: bw=99.1MiB/s (104MB/s), 99.1MiB/s-99.1MiB/s (104MB/s-104MB/s), io=5422MiB (5685MB), run=54706-54706msec

Run status group 1 (all jobs):

Run status group 2 (all jobs):

Run status group 3 (all jobs):

Run status group 4 (all jobs):

Disk stats (read/write):
    dm-42: ios=1501888/711229, merge=0/0, ticks=298820/328030, in_queue=626850, util=90.00%, aggrios=752390/356614, aggrmerge=0/0, aggrticks=149070/164570, aggrin_queue=313640, aggrutil=89.56%
    dm-1: ios=752172/355222, merge=0/0, ticks=86620/193730, in_queue=280350, util=89.56%, aggrios=510346/127313, aggrmerge=241826/227891, aggrticks=60705/108070, aggrin_queue=64310, aggrutil=89.04%
    dm-0: ios=510346/127313, merge=241826/227891, ticks=60705/108070, in_queue=64310, util=89.04%, aggrios=127608/31826, aggrmerge=0/0, aggrticks=60663/10801, aggrin_queue=47185, aggrutil=75.46%
  sdag: ios=128399/31784, merge=0/0, ticks=40480/10901, in_queue=26670, util=69.87%
  sdq: ios=126825/31380, merge=0/0, ticks=50803/10817, in_queue=37360, util=75.46%
  sda: ios=126817/31945, merge=0/0, ticks=76774/10718, in_queue=63620, util=69.88%
  sdaw: ios=128392/32198, merge=0/0, ticks=74598/10768, in_queue=61090, util=69.30%
    dm-3: ios=752609/358007, merge=0/0, ticks=211520/135410, in_queue=346930, util=89.31%, aggrios=516773/124292, aggrmerge=235836/233689, aggrticks=144446/44814, aggrin_queue=84340, aggrutil=89.31%
    dm-2: ios=516773/124292, merge=235836/233689, ticks=144446/44814, in_queue=84340, util=89.31%, aggrios=129215/31071, aggrmerge=0/0, aggrticks=65874/10714, aggrin_queue=52195, aggrutil=75.55%
  sdb: ios=128423/31267, merge=0/0, ticks=93050/10612, in_queue=79690, util=75.55%
  sdax: ios=130008/31426, merge=0/0, ticks=79287/10692, in_queue=65580, util=69.38%
  sdah: ios=130003/30966, merge=0/0, ticks=39890/10773, in_queue=25880, util=69.42%
  sdr: ios=128426/30627, merge=0/0, ticks=51272/10781, in_queue=37630, util=75.50%
root@t83lp32:~/fio-fio-3.18# 

sitsofe commented Aug 13, 2020

Hi @tdi2004,

Could you:

  • Use the master version of fio
  • Minimise the job file to the smallest number of lines that still shows the issue (i.e. run with as few jobs as possible (e.g. 2), and remove each global line in turn to see whether it has an impact)?

I tried with 84abeef and the following but could not reproduce your issue:

fio --rwmixread=50 --numjobs=1 --rw=randrw --exitall_on_error=1 --runtime=1d --time_based=1 --size=10M --name=1 --directory=1 --name=2 --directory=2

sitsofe added the needreporterinfo (Waiting on information from the issue reporter) label Aug 13, 2020

tdi2004 commented Aug 13, 2020

Hi @sitsofe ,

  1. Yes, I tried that with 3.19 and 3.21.
  2. No impact:
# ~/fio-fio-3.21/fio --rwmixread=50 --numjobs=1 --exitall_on_error=1 --runtime=30s --time_based=1 --size=10M --name=LV1 --directory=/mnt1/LV1 --name=LV2 --directory=/mnt1/LV2
LV1: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
LV2: (g=1): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.21
Starting 2 processes

The following output line led me to conclude that fio now runs the jobs on the various directories sequentially (instead of in parallel, as it did before):

Jobs: 1 (f=1): [R(1),P(1)][4.9%][r=1270MiB/s][r=325k IOPS][eta 00m:58s]

=> almost 60s = 2*30s
After 60s, each of the two directories had been exercised by fio for 30 seconds.
So parallel execution doesn't work with a command-line invocation either.


sitsofe commented Aug 14, 2020

@tdi2004 ,

I still can't reproduce what you're seeing but I think the job can be minimised further. Let's check a few things:

  • Can you clone the fio source from GitHub using git and then do your build?
  • With the fio you built above, can you reproduce the problem with the following exact job (and if not, what is the bare minimum you need to add to this job to make the issue appear)?
fio --runtime=5 --time_based=1 --size=10M --name=1 --directory=/tmp/1 --name=2 --directory=/tmp/2
  • Add --debug=process --output-format=json to the above and attach the full output that is produced to this issue, for example:
fio --debug=process --output-format=json --runtime=5 --time_based=1 --size=10M --name=1 --directory=/tmp/1 --name=2 --directory=/tmp/2


hannesweisbach commented Aug 14, 2020

Hi @tdi2004,

unfortunately I'm also not able to reproduce the behaviour you're describing. Using git tag fio-3.21 I get the following output:

time fio --rwmixread=50 --numjobs=1 --exitall_on_error=1 --runtime=30s --time_based=1 --size=10M --name=1   --directory=/tmp/1   --name=2 --directory=/tmp/2 --rw=randrw
1: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
2: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1

Runtime is, as you expect, roughly 30s, so the jobs do in fact run in parallel. What I noticed is that your jobs run in different groups (notice the g= in parentheses at the beginning of each line). I have no idea why that is … yet :)

Edit: I also tested current master g5c79. Result: "Works for me". (Although that probably doesn't help much.)


tdi2004 commented Aug 14, 2020

# ~/fio-fio-3.21/fio --runtime=5 --time_based=1 --size=10M --name=1 --directory=/tmp/1 --name=2 --directory=/tmp/2

This ran a job on /tmp/1 for 5s; when that finished, it ran another job on /tmp/2 for 5s.
The jobs were not executed in parallel.


tdi2004 commented Aug 14, 2020

# ~/fio-fio-3.21/fio --debug=process --output-format=json --runtime=5 --time_based=1 --size=10M --name=1 --directory=/tmp/1 --name=2 --directory=/tmp/2 > fio.log
attaching fio.log
fio.log


tdi2004 commented Aug 14, 2020

Runtime is, as you expect, roughly 30s,

Not here. That's the culprit. For me it is 60s. My architecture is s390x; I can't imagine that the forking stuff is architecture-dependent.

time ~/fio-fio-3.21/fio --rwmixread=50 --numjobs=1 --exitall_on_error=1 --runtime=30s --time_based=1 --size=10M --name=1   --directory=/tmp/1   --name=2 --directory=/tmp/2 --rw=randrw --output=/dev/null
1: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
Jobs: 1 (f=1): [_(1),m(1)][100.0%][r=20.1MiB/s,w=20.4MiB/s][r=5152,w=5222 IOPS][eta 00m:00s]
real	1m1.059s
user	0m1.391s
sys	0m5.880s

@hannesweisbach
Contributor

Ah, I was about to ask about your compiler/architecture. I was expecting Linux/x86_64.

@sitsofe
The log shows that stonewall is set, which it shouldn't be. I suspect the issue has to do with cconv, because the value is 16-bit and there might be something funky going on there. 64402a8 changed the offset/alignment of top->stonewall.


sitsofe commented Aug 14, 2020

@hannesweisbach good sleuthing. The type of stonewall is FIO_OPT_STR_SET, and the type of exit_what is FIO_OPT_STR. I wonder if the option parsing is writing over the adjacent variable?

sitsofe changed the title from "version 3.18 does not start processes on more than one directory" to "version 3.18 sets stonewall incorrectly on s390" Aug 14, 2020

sitsofe commented Aug 14, 2020

Compiling with

CC=clang ./configure --disable-optimizations --extra-cflags="-fsanitize=memory -fno-omit-frame-pointer -g" && make -j 8

then running

./fio --debug=all --runtime=5 --time_based=1 --size=10M --name=1 --directory=/tmp/1 --name=2 --directory=/tmp/2

stopped with the following error:

fio: set all debug options
==9921==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x54212c in parse_cmd_line /fio/init.c:2659:35
    #1 0x549cfa in parse_options /fio/init.c:2975:14
    #2 0x8b9b9d in main /fio/fio.c:42:6
    #3 0x7ff3af8cb0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
    #4 0x42698d in _start (/fio/fio+0x42698d)

SUMMARY: MemorySanitizer: use-of-uninitialized-value /fio/init.c:2659:35 in parse_cmd_line
Exiting

So that's a starting point... which turned out to be a dead end.


XeS0r commented Aug 14, 2020

The issue should be located in __handle_option().
It writes through an int pointer when it handles FIO_OPT_STR for the exit_what default value. It could also be that FIO_OPT_STR_SET for stonewall now writes to other fields as well, because it also uses an int pointer. If you also think this is the issue, I can send you the patch I've already written and tested for it.


hannesweisbach commented Aug 14, 2020

@sitsofe @XeS0r

I agree. The call stack is:

__handle_option()
val_store()
td_var()

The latter creates an int * which points to an unsigned short, and val_store() then writes through this pointer of the wrong size.
I have no idea why this isn't an issue on x86/x86_64.
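
To make the mechanism concrete, here is a minimal sketch of the pattern, not fio's actual code: the struct and the opt_var()/val_store_int() helpers below are made-up stand-ins for thread_options, td_var() and val_store().

#include <stdio.h>
#include <stddef.h>

struct opts {
        unsigned int pad;           /* keeps the 16-bit fields int-aligned */
        unsigned short exit_what;   /* 16-bit option field */
        unsigned short stonewall;   /* adjacent 16-bit option field */
};

/* td_var()-style helper: turn a byte offset into a field address. */
static void *opt_var(void *base, size_t offset)
{
        return (char *)base + offset;
}

/* val_store()-style helper: always stores through an int pointer,
 * no matter how wide the destination field really is. */
static void val_store_int(void *dst, unsigned int val)
{
        unsigned int *ip = dst;

        *ip = val;  /* 4-byte write into a 2-byte field: overruns the neighbour */
}

int main(void)
{
        struct opts o = { .pad = 0, .exit_what = 0, .stonewall = 0 };

        /* Store the exit_what default of 1, as the option parser would. */
        val_store_int(opt_var(&o, offsetof(struct opts, exit_what)), 1);

        /* Little-endian (x86_64): exit_what=1 stonewall=0 -- the bug is masked.
         * Big-endian (s390x):     exit_what=0 stonewall=1 -- stonewall set by accident. */
        printf("exit_what=%u stonewall=%u\n", (unsigned)o.exit_what, (unsigned)o.stonewall);
        return 0;
}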


XeS0r commented Aug 14, 2020

I guess x86 either has different member alignment or the bug has less impact there because it's little-endian. Do you want me to open a pull request?

@hannesweisbach
Contributor

Sure, go ahead. Although I'm not a maintainer.

XeS0r added a commit to XeS0r/fio that referenced this issue Aug 14, 2020
Fixes: 64402a8 ("Expand choices for exitall")
Fixes: axboe#1065
Signed-off-by: André Wild <wild.andre.ae@gmail.com>

sitsofe commented Aug 14, 2020

@hannesweisbach x86/amd64 is notorious for quietly fixing up alignment issues at a minor speed penalty (https://stackoverflow.com/a/7517370 ).

tdi2004 changed the title from "version 3.18 sets stonewall incorrectly on s390" to "version 3.18 sets stonewall incorrectly on s390x" Aug 14, 2020

hannesweisbach commented Aug 14, 2020

@sitsofe Sure, int has no alignment requirement on x86, but it's also a write of the wrong size.

axboe closed this as completed in fd56c23 Aug 14, 2020
sitsofe removed the needreporterinfo (Waiting on information from the issue reporter) label Aug 15, 2020

sitsofe commented Aug 15, 2020

@tdi2004 thanks for persevering, @hannesweisbach, @XeS0r thanks for sorting this one!

Out of idle curiosity I checked what happened before this fix by running the following

fio --exit_what=all --ioengine=null --size=1T --name=1 --name=2

This showed the jobs going into different groups, and --debug=process showed stonewall was active. To me this proves @XeS0r's endianness theory from #1065 (comment), because a setting of all inserts a value of -1 (which sets all bits to 1 in an unsigned int) and causes stonewall to become set too due to the overrun. Normally exit_what defaults to 1, and on a big-endian machine like s390x (where the least significant byte ends up at the highest address) the overrun means it would have set exit_what to 0 and stonewall to 1.

After the fix in fd56c23 the above no longer causes two different groups to be made and there's no stonewall output when looking through the debug.

I had hoped that additional tooling could help detect this case but it doesn't look like it:

#include <stdio.h>
#include <stddef.h>

struct st {
        unsigned long long l1;
        unsigned short s1;
        unsigned short s2;
        unsigned int i1;
} comp = {.l1 = 0, .s1 = 1, .s2 = 2, .i1 = 3};


int main(void) {
        unsigned int *ip;
        void *p;

        /* Aim an int pointer at the 16-bit member s1 ... */
        p = (char *) &comp + offsetof(struct st, s1);
        ip = p;
        /* ... and do a 4-byte store: the overrun clobbers s2 as well.
         * Little-endian: s1=777, s2=444; big-endian: s1=444, s2=777. */
        *ip = 444 << 16 | 777;
        printf("Address of .s1: %p .s2: %p\n", (void *) &comp.s1, (void *) &comp.s2);
        printf("Value of .l1: %llu .s1: %u .s2: %u i1: %u\n", comp.l1, comp.s1,
               comp.s2, comp.i1);
        return 0;
}

https://github.com/google/sanitizers/wiki/AddressSanitizerIntraObjectOverflow suggests only limited cases are catered for when it comes to intra object overflow (e.g. within a struct). Because it also defeats the type system I suspect this would be even tougher to detect statically.

@hannesweisbach
Contributor

Yup. I thought I was being smart using uint16_ts … going 32-bit for those options is probably the right call.
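
For illustration, a sketch of that direction, assuming the fix simply widens the two option fields to 32 bits (field names are illustrative; see fd56c23 / #1066 for the actual patch):

#include <stdio.h>
#include <stddef.h>

/* With 32-bit fields, the int-sized store done by the option parser
 * lands entirely inside the intended member. */
struct opts_fixed {
        unsigned int exit_what;
        unsigned int stonewall;
};

int main(void)
{
        struct opts_fixed o = { .exit_what = 0, .stonewall = 0 };
        unsigned int *ip = (unsigned int *)((char *)&o + offsetof(struct opts_fixed, exit_what));

        *ip = 1;  /* 4-byte store now exactly covers the 4-byte field */

        /* Same result on little- and big-endian hosts: exit_what=1 stonewall=0 */
        printf("exit_what=%u stonewall=%u\n", o.exit_what, o.stonewall);
        return 0;
}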

rbates0119 pushed a commit to rbates0119/fio that referenced this issue Oct 28, 2020
Fixes: 64402a8 ("Expand choices for exitall")
Fixes: axboe#1065
Signed-off-by: André Wild <wild.andre.ae@gmail.com>