version 3.18 sets stonewall incorrectly on s390x #1065

Closed
tdi2004 opened this issue Aug 13, 2020 · 18 comments · Fixed by #1066

tdi2004 commented Aug 13, 2020

Hi @hannesweisbach,
assume the following jobfile:

[global]
blocksize_range=4k-32k
rw=randrw
percentage_random=50
rwmixread=50
iodepth=4
ioengine=libaio
verify=crc32c
verify_fatal=1
verify_dump=1
verify_backlog=100000
time_based
runtime=14d
exitall_on_error=1
numjobs=1

[LV1]
size=133722k
directory=/mnt1/LV1 

[LV2]
size=124315k
directory=/mnt1/LV2

[LV3]
size=124873k
directory=/mnt1/LV3

[LV4]
size=133380k
directory=/mnt1/LV4

[LV5]
size=168411k
directory=/mnt1/LV5

With commit 64402a8 a change in behaviour was introduced.
Older fio versions started one fio parent PID plus one fio child PID per directory, so fio ran on several mountpoints in parallel; this is now broken. fio runs only on the first directory specified:

root@t83lp32:~/fio-fio-3.18# ./fio jobfile.fio 
LV1: (g=0): rw=randrw, bs=(R) 4096B-32.0KiB, (W) 4096B-32.0KiB, (T) 4096B-32.0KiB, ioengine=libaio, iodepth=4
LV2: (g=1): rw=randrw, bs=(R) 4096B-32.0KiB, (W) 4096B-32.0KiB, (T) 4096B-32.0KiB, ioengine=libaio, iodepth=4
LV3: (g=2): rw=randrw, bs=(R) 4096B-32.0KiB, (W) 4096B-32.0KiB, (T) 4096B-32.0KiB, ioengine=libaio, iodepth=4
LV4: (g=3): rw=randrw, bs=(R) 4096B-32.0KiB, (W) 4096B-32.0KiB, (T) 4096B-32.0KiB, ioengine=libaio, iodepth=4
LV5: (g=4): rw=randrw, bs=(R) 4096B-32.0KiB, (W) 4096B-32.0KiB, (T) 4096B-32.0KiB, ioengine=libaio, iodepth=4
fio-3.18
Starting 5 processes
Jobs: 1 (f=1): [m(1),P(4)][0.0%][r=95.0MiB/s,w=48.1MiB/s][r=5726,w=2854 IOPS][eta 69d:23h:59m:46s]    

It shows only one job! Previous versions showed five in my case.
The same goes for pidof: it shows only 2 fio PIDs (parent and one child) instead of 6 (parent and five children).
Killing the parent PID shows in the fio output that only LV1 was used:

fio: terminating on signal 15
Jobs: 0 (f=0)
LV1: (groupid=0, jobs=1): err= 0: pid=647846: Thu Aug 13 17:47:45 2020
  read: IOPS=3206, BW=53.7MiB/s (56.3MB/s)(10.3GiB/196577msec)
    slat (nsec): min=0, max=42050M, avg=257430.22, stdev=73773012.21
    clat (usec): min=6, max=42097k, avg=787.08, stdev=127816.71
     lat (usec): min=18, max=42098k, avg=1044.69, stdev=147600.51
    clat percentiles (usec):
     |  1.00th=[   59],  5.00th=[  135], 10.00th=[  178], 20.00th=[  231],
     | 30.00th=[  277], 40.00th=[  322], 50.00th=[  388], 60.00th=[  461],
     | 70.00th=[  498], 80.00th=[  529], 90.00th=[  562], 95.00th=[  586],
     | 99.00th=[  644], 99.50th=[  668], 99.90th=[  775], 99.95th=[ 6194],
     | 99.99th=[25560]
   bw (  KiB/s): min=10717, max=55065, per=85.28%, avg=46924.46, stdev=5540.73, samples=223
   iops        : min=  647, max= 3191, avg=2741.35, stdev=323.20, samples=223
  write: IOPS=5899, BW=99.1MiB/s (104MB/s)(5422MiB/54706msec); 0 zone resets
    slat (usec): min=11, max=248, avg=51.57, stdev=26.86
    clat (nsec): min=1000, max=30439k, avg=259321.41, stdev=311162.57
     lat (usec): min=14, max=30486, avg=310.93, stdev=312.50
    clat percentiles (usec):
     |  1.00th=[   40],  5.00th=[   95], 10.00th=[  131], 20.00th=[  176],
     | 30.00th=[  210], 40.00th=[  237], 50.00th=[  260], 60.00th=[  281],
     | 70.00th=[  306], 80.00th=[  334], 90.00th=[  371], 95.00th=[  396],
     | 99.00th=[  445], 99.50th=[  465], 99.90th=[  553], 99.95th=[  766],
     | 99.99th=[18220]
   bw (  KiB/s): min=11713, max=57782, per=48.67%, avg=49396.36, stdev=5828.41, samples=223
   iops        : min=  683, max= 3342, avg=2870.70, stdev=339.23, samples=223
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.04%, 20=0.21%, 50=0.63%
  lat (usec)   : 100=2.76%, 250=27.72%, 500=49.29%, 750=19.22%, 1000=0.04%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.02%, 50=0.02%
  lat (msec)   : 2000=0.01%, >=2000=0.01%
  cpu          : usr=14.74%, sys=6.26%, ctx=652295, majf=0, minf=57
  IO depths    : 1=0.1%, 2=0.1%, 4=99.9%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=630291,322712,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=53.7MiB/s (56.3MB/s), 53.7MiB/s-53.7MiB/s (56.3MB/s-56.3MB/s), io=10.3GiB (11.1GB), run=196577-196577msec
  WRITE: bw=99.1MiB/s (104MB/s), 99.1MiB/s-99.1MiB/s (104MB/s-104MB/s), io=5422MiB (5685MB), run=54706-54706msec

Run status group 1 (all jobs):

Run status group 2 (all jobs):

Run status group 3 (all jobs):

Run status group 4 (all jobs):

Disk stats (read/write):
    dm-42: ios=1501888/711229, merge=0/0, ticks=298820/328030, in_queue=626850, util=90.00%, aggrios=752390/356614, aggrmerge=0/0, aggrticks=149070/164570, aggrin_queue=313640, aggrutil=89.56%
    dm-1: ios=752172/355222, merge=0/0, ticks=86620/193730, in_queue=280350, util=89.56%, aggrios=510346/127313, aggrmerge=241826/227891, aggrticks=60705/108070, aggrin_queue=64310, aggrutil=89.04%
    dm-0: ios=510346/127313, merge=241826/227891, ticks=60705/108070, in_queue=64310, util=89.04%, aggrios=127608/31826, aggrmerge=0/0, aggrticks=60663/10801, aggrin_queue=47185, aggrutil=75.46%
  sdag: ios=128399/31784, merge=0/0, ticks=40480/10901, in_queue=26670, util=69.87%
  sdq: ios=126825/31380, merge=0/0, ticks=50803/10817, in_queue=37360, util=75.46%
  sda: ios=126817/31945, merge=0/0, ticks=76774/10718, in_queue=63620, util=69.88%
  sdaw: ios=128392/32198, merge=0/0, ticks=74598/10768, in_queue=61090, util=69.30%
    dm-3: ios=752609/358007, merge=0/0, ticks=211520/135410, in_queue=346930, util=89.31%, aggrios=516773/124292, aggrmerge=235836/233689, aggrticks=144446/44814, aggrin_queue=84340, aggrutil=89.31%
    dm-2: ios=516773/124292, merge=235836/233689, ticks=144446/44814, in_queue=84340, util=89.31%, aggrios=129215/31071, aggrmerge=0/0, aggrticks=65874/10714, aggrin_queue=52195, aggrutil=75.55%
  sdb: ios=128423/31267, merge=0/0, ticks=93050/10612, in_queue=79690, util=75.55%
  sdax: ios=130008/31426, merge=0/0, ticks=79287/10692, in_queue=65580, util=69.38%
  sdah: ios=130003/30966, merge=0/0, ticks=39890/10773, in_queue=25880, util=69.42%
  sdr: ios=128426/30627, merge=0/0, ticks=51272/10781, in_queue=37630, util=75.50%
root@t83lp32:~/fio-fio-3.18# 

sitsofe commented Aug 13, 2020

Hi @tdi2004,

Could you:

  • Use the master version of fio
  • Minimise the job file to the smallest number of lines that still shows the issue (i.e. run with as few jobs as possible (e.g. 2), and remove each global line in turn to see whether it has an impact)?

I tried with 84abeef and the following but could not reproduce your issue:

fio --rwmixread=50 --numjobs=1 --rw=randrw --exitall_on_error=1 --runtime=1d --time_based=1 --size=10M --name=1 --directory=1 --name=2 --directory=2

sitsofe added the needreporterinfo (Waiting on information from the issue reporter) label Aug 13, 2020

tdi2004 commented Aug 13, 2020

Hi @sitsofe ,

  1. Yes, I tried that with 3.19 and 3.21.
  2. No impact:
# ~/fio-fio-3.21/fio --rwmixread=50 --numjobs=1 --exitall_on_error=1 --runtime=30s --time_based=1 --size=10M --name=LV1 --directory=/mnt1/LV1 --name=LV2 --directory=/mnt1/LV2
LV1: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
LV2: (g=1): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.21
Starting 2 processes

The following output line led me to conclude that fio now runs the jobs on the various directories sequentially (instead of in parallel, as it did before):

Jobs: 1 (f=1): [R(1),P(1)][4.9%][r=1270MiB/s][r=325k IOPS][eta 00m:58s]

=> almost 60s = 2*30s
After 60s, each of the two directories had been exercised by fio for 30 seconds.
So parallel execution doesn't work with a command-line invocation either.


sitsofe commented Aug 14, 2020

@tdi2004 ,

I still can't reproduce what you're seeing but I think the job can be minimised further. Let's check a few things:

  • Can you clone the fio source from GitHub using git and then do your build?
  • With the fio you built above, can you reproduce the problem with the following exact job (and if not, what is the bare minimum you need to add to this job to make the issue appear)?
fio --runtime=5 --time_based=1 --size=10M --name=1 --directory=/tmp/1 --name=2 --directory=/tmp/2
  • Add --debug=process --output-format=json to the above and attach the full output that is produced to this issue, for example:
fio --debug=process --output-format=json --runtime=5 --time_based=1 --size=10M --name=1 --directory=/tmp/1 --name=2 --directory=/tmp/2


hannesweisbach commented Aug 14, 2020

Hi @tdi2004,

unfortunately I'm also not able to reproduce the behaviour you're describing. Using git tag fio-3.21 I get the following output:

time fio --rwmixread=50 --numjobs=1 --exitall_on_error=1 --runtime=30s --time_based=1 --size=10M --name=1   --directory=/tmp/1   --name=2 --directory=/tmp/2 --rw=randrw
1: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
2: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1

Runtime is, as you expect, roughly 30s, so the jobs do in fact run in parallel. What I noticed is that your jobs run in different groups (notice the g= in parentheses at the beginning of each line). I have no idea why that is … yet :)

Edit: I also tested current master g5c79. Result: "Works for me". (Although that probably doesn't help much.)


tdi2004 commented Aug 14, 2020

# ~/fio-fio-3.21/fio --runtime=5 --time_based=1 --size=10M --name=1 --directory=/tmp/1 --name=2 --directory=/tmp/2

This ran a job on /tmp/1 for 5s; when that finished, it ran another job on /tmp/2 for 5s.
The jobs were not executed in parallel.


tdi2004 commented Aug 14, 2020

# ~/fio-fio-3.21/fio --debug=process --output-format=json --runtime=5 --time_based=1 --size=10M --name=1 --directory=/tmp/1 --name=2 --directory=/tmp/2 > fio.log
attaching fio.log
fio.log


tdi2004 commented Aug 14, 2020

Runtime is, as you expect, roughly 30s,

Not here. That's the culprit. For me it is 60s. My architecture is s390x; I can't imagine that the forking stuff is architecture-dependent.

time ~/fio-fio-3.21/fio --rwmixread=50 --numjobs=1 --exitall_on_error=1 --runtime=30s --time_based=1 --size=10M --name=1   --directory=/tmp/1   --name=2 --directory=/tmp/2 --rw=randrw --output=/dev/null
1: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
Jobs: 1 (f=1): [_(1),m(1)][100.0%][r=20.1MiB/s,w=20.4MiB/s][r=5152,w=5222 IOPS][eta 00m:00s]
real	1m1.059s
user	0m1.391s
sys	0m5.880s

@hannesweisbach
Contributor

Ah, I was about to ask about your compiler/architecture. I was expecting Linux/x86_64.

@sitsofe
The log shows that stonewall is set, which it shouldn't be. I suspect the issue has to do with cconv, because the value is 16-bit and there might be something funky going on there. 64402a8 changed the offset/alignment of top->stonewall.


sitsofe commented Aug 14, 2020

@hannesweisbach good sleuthing. The type of stonewall is FIO_OPT_STR_SET, and the type of exit_what is FIO_OPT_STR. I wonder if the option parsing is writing over the adjacent variable?

sitsofe changed the title from "version 3.18 does not start processes on more than one directory" to "version 3.18 sets stonewall incorrectly on s390" Aug 14, 2020

sitsofe commented Aug 14, 2020

Compiling with

CC=clang ./configure --disable-optimizations --extra-cflags="-fsanitize=memory -fno-omit-frame-pointer -g" && make -j 8

then running

./fio --debug=all --runtime=5 --time_based=1 --size=10M --name=1 --directory=/tmp/1 --name=2 --directory=/tmp/2

stopped with the following error:

fio: set all debug options
==9921==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x54212c in parse_cmd_line /fio/init.c:2659:35
    #1 0x549cfa in parse_options /fio/init.c:2975:14
    #2 0x8b9b9d in main /fio/fio.c:42:6
    #3 0x7ff3af8cb0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
    #4 0x42698d in _start (/fio/fio+0x42698d)

SUMMARY: MemorySanitizer: use-of-uninitialized-value /fio/init.c:2659:35 in parse_cmd_line
Exiting

So that's a starting point... which turned out to be a dead end.


XeS0r commented Aug 14, 2020

The issue should be located in __handle_option().
It writes through an int pointer when it handles FIO_OPT_STR for the exit_what default value. It could also be that FIO_OPT_STR_SET for stonewall now writes to other fields as well, because it also uses an int pointer. If you also think this is the issue, I can send you the patch I've already written and tested for it.


hannesweisbach commented Aug 14, 2020

@sitsofe @XeS0r

I agree. The call stack is:

__handle_option()
val_store()
td_var()

The latter creates an int * which points to an unsigned short, and val_store() then writes through this pointer of the wrong size.
I have no idea why this isn't an issue on x86/x86_64.
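
To make the mechanism concrete, here is a minimal sketch of the pattern, not fio's actual code: the struct and the opt_var()/val_store_int() helpers below are made-up stand-ins for thread_options, td_var() and val_store().

#include <stdio.h>
#include <stddef.h>

struct opts {
        unsigned int pad;           /* keeps the 16-bit fields int-aligned */
        unsigned short exit_what;   /* 16-bit option field */
        unsigned short stonewall;   /* adjacent 16-bit option field */
};

/* td_var()-style helper: turn a byte offset into a field address. */
static void *opt_var(void *base, size_t offset)
{
        return (char *)base + offset;
}

/* val_store()-style helper: always stores through an int pointer,
 * no matter how wide the destination field really is. */
static void val_store_int(void *dst, unsigned int val)
{
        unsigned int *ip = dst;

        *ip = val;  /* 4-byte write into a 2-byte field: overruns the neighbour */
}

int main(void)
{
        struct opts o = { .pad = 0, .exit_what = 0, .stonewall = 0 };

        /* Store the exit_what default of 1, as the option parser would. */
        val_store_int(opt_var(&o, offsetof(struct opts, exit_what)), 1);

        /* Little-endian (x86_64): exit_what=1 stonewall=0 -- the bug is masked.
         * Big-endian (s390x):     exit_what=0 stonewall=1 -- stonewall set by accident. */
        printf("exit_what=%u stonewall=%u\n", (unsigned)o.exit_what, (unsigned)o.stonewall);
        return 0;
}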


XeS0r commented Aug 14, 2020

I guess x86 either has different member alignment or the bug has less impact there because it's little-endian. Do you want me to open a pull request?

@hannesweisbach
Contributor

Sure, go ahead. Although I'm not a maintainer.

XeS0r added a commit to XeS0r/fio that referenced this issue Aug 14, 2020
Fixes: 64402a8 ("Expand choices for exitall")
Fixes: axboe#1065
Signed-off-by: André Wild <wild.andre.ae@gmail.com>

sitsofe commented Aug 14, 2020

@hannesweisbach x86/amd64 is notorious for quietly fixing up alignment issues at a minor speed penalty (https://stackoverflow.com/a/7517370 ).

tdi2004 changed the title from "version 3.18 sets stonewall incorrectly on s390" to "version 3.18 sets stonewall incorrectly on s390x" Aug 14, 2020

hannesweisbach commented Aug 14, 2020

@sitsofe Sure, int has no alignment requirement on x86, but it's also a write of the wrong size.

axboe closed this as completed in fd56c23 Aug 14, 2020
sitsofe removed the needreporterinfo (Waiting on information from the issue reporter) label Aug 15, 2020

sitsofe commented Aug 15, 2020

@tdi2004 thanks for persevering, @hannesweisbach, @XeS0r thanks for sorting this one!

Out of idle curiosity I checked what happened before this fix by running the following

fio --exit_what=all --ioengine=null --size=1T --name=1 --name=2

This showed the jobs going into different groups, and --debug=process showed stonewall was active. To me this proves @XeS0r's endianness theory from #1065 (comment), because a setting of all inserts a value of -1 (which sets all bits to 1 in an unsigned int) and causes stonewall to become set too due to the overrun. Normally exit_what defaults to 1, and on a big-endian machine like s390x (where the least significant byte ends up at the highest address) the overrun means it would have set exit_what to 0 and stonewall to 1.

After the fix in fd56c23 the above no longer causes two different groups to be made and there's no stonewall output when looking through the debug.

I had hoped that additional tooling could help detect this case but it doesn't look like it:

#include <stdio.h>
#include <stddef.h>

struct st {
        unsigned long long l1;
        unsigned short s1;
        unsigned short s2;
        unsigned int i1;
} comp = {.l1 = 0, .s1 = 1, .s2 = 2, .i1 = 3};


int main(void) {
        unsigned int *ip;
        void *p;

        /* Aim an int pointer at the 16-bit member s1 ... */
        p = (char *) &comp + offsetof(struct st, s1);
        ip = p;
        /* ... and do a 4-byte store: the overrun clobbers s2 as well.
         * Little-endian: s1=777, s2=444; big-endian: s1=444, s2=777. */
        *ip = 444 << 16 | 777;
        printf("Address of .s1: %p .s2: %p\n", (void *) &comp.s1, (void *) &comp.s2);
        printf("Value of .l1: %llu .s1: %u .s2: %u i1: %u\n", comp.l1, comp.s1,
               comp.s2, comp.i1);
        return 0;
}

https://github.com/google/sanitizers/wiki/AddressSanitizerIntraObjectOverflow suggests only limited cases are catered for when it comes to intra object overflow (e.g. within a struct). Because it also defeats the type system I suspect this would be even tougher to detect statically.

@hannesweisbach
Contributor

Yup. I thought I was being smart using uint16_ts … going 32-bit for those options is probably the right call.
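
For illustration, a sketch of that direction, assuming the fix simply widens the two option fields to 32 bits (field names are illustrative; see fd56c23 / #1066 for the actual patch):

#include <stdio.h>
#include <stddef.h>

/* With 32-bit fields, the int-sized store done by the option parser
 * lands entirely inside the intended member. */
struct opts_fixed {
        unsigned int exit_what;
        unsigned int stonewall;
};

int main(void)
{
        struct opts_fixed o = { .exit_what = 0, .stonewall = 0 };
        unsigned int *ip = (unsigned int *)((char *)&o + offsetof(struct opts_fixed, exit_what));

        *ip = 1;  /* 4-byte store now exactly covers the 4-byte field */

        /* Same result on little- and big-endian hosts: exit_what=1 stonewall=0 */
        printf("exit_what=%u stonewall=%u\n", o.exit_what, o.stonewall);
        return 0;
}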

rbates0119 pushed a commit to rbates0119/fio that referenced this issue Oct 28, 2020
Fixes: 64402a8 ("Expand choices for exitall")
Fixes: axboe#1065
Signed-off-by: André Wild <wild.andre.ae@gmail.com>