fio hangs when doing randwrite with io_uring #1195
Comments
I'm pretty sure the sqpoll handling is broken in fio. Does it work if you remove |
Yes. 100/100 pass rate without this option. |
Can't reproduce this, but I'm running a newer kernel, so who knows... When it's stuck, can you try and do:
where is any PID of fio on the system. If you check top, is fio spinning 100%? If you check top, do you see any io_uring-sq threads? |
Yes
Not visible in top or htop, but I can see it in |
@axboe any suggestions on how to fix or work around that? I could really use some help with that and I'd rather not give up on sqthread_poll option in config file, because it had quite an impact on performance if I recall correctly. |
Those were specific to t/io_uring, it's not used by fio at all. So definitely won't make a difference :-) It might be a kernel issue, I haven't been able to reproduce but I'm also on a drastically newer kernel than you are... |
Hey @axboe how about reproducing it like this? I see that this also reproduces in a virtual machine whose kernel is very similar to mine. Assuming you're using Vagrant for work; if not, let me know and I'll prepare steps with plain QEMU. Put the Vagrantfile somewhere in your FS (it doesn't have to be on NVMe, I'm using a Crucial MX500 for this repro)
Create a backing file for the emulated NVMe drive:
Run the VM
SSH into the VM with
Create the job config file:
And run the test:
In my case it hung on the second attempt, with the same /proc stack as in my previous comment. |
@karlatec I don't think Fedora 32 is supported any more... can you reproduce this with Fedora 34's kernel? |
@sitsofe Fedora34 is available in Beta version only at the moment, and Fedora 32 is still supported. I will try later with F33. |
It's probably no big deal as 32/33/34 all have a 5.11 kernel - https://bodhi.fedoraproject.org/updates/?packages=kernel . Could you try with that? |
@sitsofe I tested with VMs running:
and wasn't able to reproduce the issue with the previously provided reproduction steps. It must have been something in the kernel. I guess we can close this issue. Thanks! |
Thanks for following up @karlatec . |
Description of the bug:
Fio hangs / fails to finish a randwrite job when running with the io_uring engine.
After starting fio, the "ETA" line is displayed (as it should be):
Jobs: 1 (f=1): [w(1)][27.3%][w=524MiB/s][w=134k IOPS][eta 00m:08s]
At some point bandwidth and throughput stats are lost:
Jobs: 1 (f=1): [w(1)][72.7%][eta 00m:05s]
After reaching 100%, the counter resets back to 0%, the "eta" timer shows an abnormally high value, and fio never exits. SIGINT and SIGTERM are ignored, so SIGKILL must be used.
Not reproducible 100% of the time.
I've tried reproducing this with --debug=io,file, but no luck. There are no related messages in dmesg.
Environment:
Fedora 33 5.10.19-200.fc33.x86_64
gcc (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9)
fio 3.25
liburing from DNF repo - 0.7-3.fc33 (also tested with https://github.com/axboe/liburing/releases/tag/liburing-0.7, reproduces as well)
fio version: 3.25
Reproduction steps
Run fio with the following config in a loop, as the reproducibility is not 100%. Something like
for i in $(seq 1 25); do sudo fio config.fio; done
should do the trick.
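Since a hung fio ignores SIGINT and SIGTERM, the plain loop above stalls on the first bad run. A minimal sketch of a variant that stops at the first hang, assuming GNU coreutils `timeout` is available; `run_until_hang` is a hypothetical helper name, and the time limit is whatever you consider "clearly stuck" for your job:

```shell
#!/usr/bin/env bash
# Sketch only: run a command repeatedly and stop at the first iteration
# that exceeds a time limit. run_until_hang is a made-up helper name,
# not part of fio.
#
# Usage: run_until_hang <runs> <limit_seconds> <command> [args...]
run_until_hang() {
    local runs=$1 limit=$2
    shift 2
    local i rc
    for i in $(seq 1 "$runs"); do
        # Send SIGKILL on expiry, because the hung process reportedly
        # ignores SIGINT and SIGTERM.
        timeout --signal=KILL "$limit" "$@" >/dev/null 2>&1
        rc=$?
        # GNU timeout reports 124 for a normal timeout and 137 (128+9)
        # when the command had to be killed with SIGKILL; treat both
        # as "hung".
        if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
            echo "run $i: hung (killed after ${limit}s)"
            return 1
        fi
        echo "run $i: finished (exit $rc)"
    done
    return 0
}
```

From a root shell this could be called as `run_until_hang 25 60 fio config.fio`, where 60 seconds is an arbitrary limit chosen for illustration. Running the whole thing as root (rather than wrapping `sudo fio`) matters, because an unprivileged `timeout` cannot signal a process started through sudo.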