fio hangs #1626

Closed
1 task
microyahoo opened this issue Sep 11, 2023 · 2 comments

@microyahoo

microyahoo commented Sep 11, 2023

Please acknowledge the following before creating a ticket

Description of the bug:

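fio hangs for a long time. The jobs below were started at 17:35-17:36 with --runtime 120s and were still running at 19:20. I'm not sure whether this has anything to do with OOM.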
(screenshot: 20230911-195147)

[root@bd-hdd03-node02 ~]# ps -ef | grep fio
root       74918       1  6 14:31 ?        00:18:15 ./fio-benchmark --output-file out --render-format html --config-file conf.yaml --chart-file chart --dryrun=false
root      635797   74918  2 17:35 ?        00:02:08 fio --name randrw-fdb1c598-2ff2-46a5-8acd-6c054e976447 --filename /dev/sdj --numjobs 64 --time_based --ioengine libaio --bs 4K --rw randrw --direct 1 --group_reporting --iodepth 8 --runtime 120s --output-format json
root      635799   74918  2 17:35 ?        00:02:31 fio --name randrw-700e7392-f94b-4377-aee2-93bb33e9226a --filename /dev/sdh --numjobs 64 --time_based --ioengine libaio --bs 4K --rw randrw --direct 1 --group_reporting --iodepth 8 --runtime 120s --output-format json
root      635800   74918  2 17:35 ?        00:02:46 fio --name randrw-b1c89708-3c57-4b60-9032-65edaf49ad29 --filename /dev/sdp --numjobs 64 --time_based --ioengine libaio --bs 4K --rw randrw --direct 1 --group_reporting --iodepth 8 --runtime 120s --output-format json
root      637821   74918  3 17:36 ?        00:03:35 fio --name randwrite-7a24aac0-793e-4247-9c76-99f4640839d9 --filename /dev/sdq --numjobs 64 --time_based --ioengine libaio --bs 4K --rw randwrite --direct 1 --group_reporting --iodepth 32 --runtime 120s --output-format json
root      637908   74918  2 17:36 ?        00:03:04 fio --name randwrite-d6d7cf8d-428e-4f67-bc23-821efce4842b --filename /dev/sdi --numjobs 64 --time_based --ioengine libaio --bs 4K --rw randwrite --direct 1 --group_reporting --iodepth 32 --runtime 120s --output-format json
root      638185   74918  1 17:36 ?        00:02:02 fio --name randread-c374304c-6c06-40ed-8c92-7fd3dde85518 --filename /dev/sdd --numjobs 64 --time_based --ioengine libaio --bs 4K --rw randread --direct 1 --group_reporting --iodepth 32 --runtime 120s --output-format json
root      638189   74918  2 17:36 ?        00:02:17 fio --name randread-371ca4c6-aa53-4c83-8e36-1cbe56b8947e --filename /dev/sdb --numjobs 64 --time_based --ioengine libaio --bs 4K --rw randread --direct 1 --group_reporting --iodepth 32 --runtime 120s --output-format json
root      639134   74918  2 17:36 ?        00:02:12 fio --name randwrite-f55c0d4a-c2e8-4172-b309-cff3d9f0c656 --filename /dev/sdm --numjobs 64 --time_based --ioengine libaio --bs 4K --rw randwrite --direct 1 --group_reporting --iodepth 32 --runtime 120s --output-format json
root      794155  786882  0 19:20 pts/0    00:00:00 grep --color=auto fio
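
The hang can be read directly off the process list: each fio job was started at 17:35 or 17:36 with --runtime 120s, yet all of them were still alive when ps ran at 19:20. A minimal Python sketch of that check follows. The function name overrunning_fio_jobs, the slack_s parameter, and the assumption that STIME is an HH:MM value from the same day are illustrative and not part of fio or ps. The sample command line is abbreviated from the listing above.

```python
import re
from datetime import datetime


def overrunning_fio_jobs(ps_lines, now_hhmm, slack_s=60):
    """Return (pid, elapsed_s, runtime_s) for fio jobs that outlived --runtime.

    Assumes `ps -ef` columns (UID PID PPID C STIME TTY TIME CMD), an HH:MM
    STIME on the same day as now_hhmm, and a --runtime value in seconds.
    """
    now = datetime.strptime(now_hhmm, "%H:%M")
    late = []
    for line in ps_lines:
        fields = line.split(None, 7)
        # Skip wrappers such as ./fio-benchmark and the grep process itself.
        if len(fields) < 8 or not fields[7].startswith("fio "):
            continue
        match = re.search(r"--runtime[ =](\d+)s?\b", fields[7])
        if not match:
            continue
        runtime_s = int(match.group(1))
        started = datetime.strptime(fields[4], "%H:%M")
        elapsed_s = int((now - started).total_seconds())
        if elapsed_s > runtime_s + slack_s:
            late.append((int(fields[1]), elapsed_s, runtime_s))
    return late


# Abbreviated lines from the ps listing above.
SAMPLE_PS = [
    "root      637908   74918  2 17:36 ?        00:03:04 fio --name randwrite "
    "--filename /dev/sdi --numjobs 64 --time_based --runtime 120s --output-format json",
    "root      794155  786882  0 19:20 pts/0    00:00:00 grep --color=auto fio",
]

print(overrunning_fio_jobs(SAMPLE_PS, "19:20"))
```

For PID 637908 this reports 6240 seconds elapsed against a 120 second runtime, so the job has overrun by well over an hour.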

strace output for one of the fio processes (PID 637908):

[root@bd-hdd03-node02 ~]# strace -f -p 637908
[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0                                                                    
[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)
[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000},  <unfinished ...>
[pid 639308] <... select resumed>)      = 0 (Timeout)                                                                                        
[pid 639308] read(3, 0x7f8ea0cc362f, 1) = -1 EAGAIN (Resource temporarily unavailable)
[pid 639308] read(3, 0x7f8ea0cc362f, 1) = -1 EAGAIN (Resource temporarily unavailable)           
[pid 639308] select(1, [], NULL, [], {tv_sec=0, tv_usec=250000} <unfinished ...>
[pid 637908] <... nanosleep resumed>NULL) = 0                                                                                                
[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)
[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0                                                                               
[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)
[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)
[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)
[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)
[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0                                                                               
[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)
[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0                                                                               
[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)                             
[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0                                                                               
[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)                             
[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0                                                                               
[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)                             
[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)
[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)
[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0                                                                               
[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)                                  
[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0                                                                               
[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)                                  
[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0                                                                               
[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)                                    
[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0                                                                               
[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)
[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)
[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0        
[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)                                            
[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0                                                                               
[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)
[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0                                                                               
[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)                                            
[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0                                                                               
[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)                                            
[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000}, ^Cstrace: Process 637908 detached           
 <detached ...>                                                       
strace: Process 639308 detached 
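
The trace above contains only two threads: PID 637908 alternates a 10 ms nanosleep with a stat of /tmp/fio-dump-status, and PID 639308 loops on select with a 250 ms timeout plus non-blocking reads. No I/O submission calls appear. One likely reason, stated here as an assumption, is that strace -f -p attaches to the threads of the given process and to children created afterwards, so worker processes that were forked before the attach would not show up. Tallying the calls per thread makes the pattern easier to see. The sketch below is illustrative; tally_syscalls and STRACE_CALL are made-up names, and the sample lines are copied from the trace.

```python
import re
from collections import Counter

# Matches the start of a call, e.g. '[pid 637908] stat("/tmp/...'.
# Resumed calls ('<... nanosleep resumed>') are deliberately not counted.
STRACE_CALL = re.compile(r"^\s*\[pid\s+(\d+)\]\s+([a-z_0-9]+)\(")


def tally_syscalls(strace_lines):
    """Count started syscalls per (pid, syscall name)."""
    counts = Counter()
    for line in strace_lines:
        match = STRACE_CALL.match(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


SAMPLE_TRACE = [
    "[pid 637908] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0",
    '[pid 637908] stat("/tmp/fio-dump-status", 0x7ffdf9035820) = -1 ENOENT (No such file or directory)',
    "[pid 639308] read(3, 0x7f8ea0cc362f, 1) = -1 EAGAIN (Resource temporarily unavailable)",
    "[pid 639308] select(1, [], NULL, [], {tv_sec=0, tv_usec=250000} <unfinished ...>",
    "[pid 637908] <... nanosleep resumed>NULL) = 0",
]

for (pid, name), n in sorted(tally_syscalls(SAMPLE_TRACE).items()):
    print(pid, name, n)
```

If the full trace yields only nanosleep, stat, select, and read, the traced process is most likely idling in a polling loop while it waits on its workers, and the workers themselves would need to be traced separately.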

Environment:
CentOS Linux release 8.5.2111
Linux bd-hdd03-node03 4.18.0-348.el8.x86_64 #1 SMP Tue Oct 19 15:14:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

fio version:
fio-3.19

Reproduction steps

@vincentkfu
Collaborator

I'm able to run the following similar job without issue:

vincent@localhost:~$ sudo fio --name=test --filename=/dev/nvme0n1 --numjobs 512 --time_based --ioengine libaio --bs 4K --rw randread --direct 1 --group_reporting --iodepth 32 --runtime 120s
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, (C) 4096B-4096B, ioengine=libaio, iodepth=32
...
fio-3.35-15-gd0da
Starting 512 processes
Jobs: 481 (f=453): [...]
Jobs: 369 (f=183): [...]
Jobs: 176 (f=0): [...][100.0%][r=923MiB/s][r=236k IOPS][eta 00m:00s]
test: (groupid=0, jobs=512): err= 0: pid=792474: Thu Sep 14 10:51:40 2023
  read: IOPS=688k, BW=2689MiB/s (2820MB/s)(315GiB/120102msec)
    slat (nsec): min=1075, max=51861k, avg=16439.93, stdev=209257.38
    clat (usec): min=4, max=204796, avg=23736.71, stdev=20168.59
     lat (usec): min=15, max=204800, avg=23753.15, stdev=20198.67
    clat percentiles (usec):
     |  1.00th=[  1237],  5.00th=[  3458], 10.00th=[  5473], 20.00th=[  8979],
     | 30.00th=[ 12256], 40.00th=[ 15401], 50.00th=[ 19006], 60.00th=[ 22676],
     | 70.00th=[ 27395], 80.00th=[ 33424], 90.00th=[ 44827], 95.00th=[ 60556],
     | 99.00th=[107480], 99.50th=[109577], 99.90th=[112722], 99.95th=[114820],
     | 99.99th=[121111]
   bw (  MiB/s): min=  715, max= 8726, per=99.86%, avg=2685.48, stdev= 1.87, samples=122351
   iops        : min=183168, max=2233869, avg=687356.81, stdev=478.89, samples=122351
  lat (usec)   : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.02%, 250=0.07%
  lat (usec)   : 500=0.15%, 750=0.20%, 1000=0.26%
  lat (msec)   : 2=1.43%, 4=4.15%, 10=16.85%, 20=29.74%, 50=39.48%
  lat (msec)   : 100=5.24%, 250=2.41%
  cpu          : usr=0.71%, sys=1.33%, ctx=77406759, majf=0, minf=17933372
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
   issued rwtcs: total=82684914,0,0,0,0 short=0,0,0,0,0 dropped=0,0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=2689MiB/s (2820MB/s), 2689MiB/s-2689MiB/s (2820MB/s-2820MB/s), io=315GiB (339GB), run=120102-120102msec

Disk stats (read/write):
  nvme0n1: ios=82535567/0, merge=0/0, ticks=1955861148/0, in_queue=1955861148, util=99.28%
vincent@localhost:~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

During the run, free -h reported a maximum memory usage of 69GiB.
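
For comparison, the original report shows eight fio invocations with --numjobs 64 each, which is 512 worker processes in total, the same count as the single --numjobs 512 run here. The two setups still differ: the report spreads its workers over eight HDD devices with mixed I/O depths, while this run targets one NVMe device. A rough Python sketch of the arithmetic follows. The helper names are hypothetical, the command lines are abbreviated to the two relevant flags, and the fallback of 1 for a missing flag reflects fio's documented defaults for numjobs and iodepth.

```python
import re


def _flag(cmd, name, default=1):
    """Read an integer fio flag such as --numjobs from a command line."""
    match = re.search(rf"--{name}[ =](\d+)", cmd)
    return int(match.group(1)) if match else default


def job_totals(cmdlines):
    """Return (worker processes, total queue slots) over fio command lines."""
    workers = 0
    slots = 0
    for cmd in cmdlines:
        numjobs = _flag(cmd, "numjobs")
        iodepth = _flag(cmd, "iodepth")
        workers += numjobs
        slots += numjobs * iodepth
    return workers, slots


# Abbreviated from the report: three randrw jobs at iodepth 8,
# five randwrite/randread jobs at iodepth 32.
REPORTED = ["fio --numjobs 64 --iodepth 8"] * 3 + ["fio --numjobs 64 --iodepth 32"] * 5
# Abbreviated from the comparison run in this comment.
COMPARISON = ["fio --numjobs 512 --iodepth 32"]

print(job_totals(REPORTED))
print(job_totals(COMPARISON))
```

Both setups come to 512 workers. The comparison run allows more outstanding I/O in aggregate (16384 slots against 11776), so a lower load does not by itself explain the hang in the report.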

@microyahoo
Author

Hi @vincentkfu, thanks for your quick response. I'm not sure whether it has anything to do with the OOM killer.
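
One way to test the OOM hypothesis is to look in the kernel log for OOM killer reports that name fio. The Python sketch below scans log lines, for example saved output of dmesg -T or journalctl -k, for "Killed process" entries. The function name killed_by_oom is hypothetical, the exact kernel message wording varies between kernel versions, and the sample log lines are invented for illustration and do not come from the affected host.

```python
import re

# Covers both "Out of memory: Killed process N (comm)" and the older
# two-line form whose second line starts with "Killed process N (comm)".
KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def killed_by_oom(log_lines, comm="fio"):
    """Return PIDs of processes named `comm` reported as killed in the log."""
    pids = []
    for line in log_lines:
        match = KILLED.search(line)
        if match and match.group(2) == comm:
            pids.append(int(match.group(1)))
    return pids


# Invented example lines; PIDs and sizes are placeholders.
SAMPLE_LOG = [
    "[Mon Sep 11 17:40:01 2023] fio invoked oom-killer: gfp_mask=0x0, order=0",
    "[Mon Sep 11 17:40:01 2023] Out of memory: Killed process 1111 (fio) total-vm:1kB",
    "[Mon Sep 11 17:41:07 2023] Out of memory: Killed process 2222 (other) total-vm:1kB",
]

print(killed_by_oom(SAMPLE_LOG))
```

An empty result on the affected host would argue against the OOM killer. A non-empty one would show which PIDs were killed, and those could then be compared with the stuck jobs.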
