FIO core dump at >> td_io_queue (td=td@entry=0x7fd1ff6b18a0, io_u=io_u@entry=0x201f140) at ioengines.c:342 #1408

Closed
chamarthy opened this issue Jun 11, 2022 · 4 comments

chamarthy commented Jun 11, 2022

Please acknowledge the following before creating a ticket

Description of the bug:
On RHEL 8.6, fio aborts with a core dump when run with the following options:

fio --name=nfs --directory=/mnt/nfs1 --ioengine=libaio --direct=1 --size=1GiB --bsrange=4k-256k --nrfiles=16 --create_on_open=1 --refill_buffers=0 --time_based=1 --runtime=99999 --lockfile=none --file_service_type=random:8 --norandommap --file_append=1 --iodepth=64 --exitall_on_error --numjobs=16 --rw=randrw
...
fio-3.30-48-g26fa
Starting 16 processes
fio: ioengines.c:342: td_io_queue: Assertion `fio_file_open(io_u->file)' failed.
fio: pid=32956, got signal=6
Jobs: 2 (f=4): [_(1),m(1),_(2),K(1),_(4),m(1),_(6)][0.1%][r=86.8MiB/s,w=87.8MiB/s][r=666,w=678 IOPS][eta 01d:03h:45m:08s]
....
....
free(): double free detected in tcache 2
Aborted (core dumped)

Environment:
RHEL8.6
Linux 4.18.0-372.9.1.el8.x86_64 #1 SMP Fri Apr 15 22:12:19 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

fio version:
fio --version
fio-3.30-48-g26fa

Reproduction steps
On an NFSv4 mount, run the following fio job:
fio --name=nfs --directory=/mnt/nfs1 --ioengine=libaio --direct=1 --size=1GiB --bsrange=4k-256k --nrfiles=16 --create_on_open=1 --refill_buffers=0 --time_based=1 --runtime=99999 --lockfile=none --file_service_type=random:8 --norandommap --file_append=1 --iodepth=64 --exitall_on_error --numjobs=16 --rw=randrw

Once the following assertion appears, cancel the job:
fio: ioengines.c:342: td_io_queue: Assertion `fio_file_open(io_u->file)' failed.

Backtrace from core dump:

#0  0x00007fd1fe1eaa4f in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-189.1.el8.x86_64 libaio-0.3.112-1.el8.x86_64 zlib-1.2.11-18.el8_5.x86_64
(gdb) bt
#0  0x00007fd1fe1eaa4f in raise () from /lib64/libc.so.6
#1  0x00007fd1fe1bddb5 in abort () from /lib64/libc.so.6
#2  0x00007fd1fe1bdc89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x00007fd1fe1e33a6 in __assert_fail () from /lib64/libc.so.6
#4  0x000000000041d99b in td_io_queue (td=td@entry=0x7fd1ff6b18a0, io_u=io_u@entry=0x201f140) at ioengines.c:342
#5  0x000000000046d95d in io_u_submit (io_u=0x201f140, td=0x7fd1ff6b18a0) at backend.c:587
#6  do_io (bytes_done=0x7ffd8d0026a0, td=0x7fd1ff6b18a0) at backend.c:1073
#7  thread_main (data=data@entry=0x200f420) at backend.c:1850
#8  0x000000000046fce1 in run_threads (sk_out=sk_out@entry=0x0) at backend.c:2445
#9  0x000000000046fdc9 in fio_backend (sk_out=sk_out@entry=0x0) at backend.c:2593
#10 0x000000000040e28f in main (argc=20, argv=0x7ffd8d00a9e8, envp=<optimized out>) at fio.c:60

chamarthy commented Jun 14, 2022

I enabled debug output and see the following at the end:

io       43973 declare unneeded cache /mnt/nfs1/nfs.7.6: 436694176/62500000
file     43973 get_next_file_rand: 0x7f218b9e7a50
file     43973 get_next_file: 0x7f218b9e7a50 [/mnt/nfs1/nfs.7.6]
file     43973 get file /mnt/nfs1/nfs.7.6, ref=1
io       43973 get_next_offset: offset 444972192 >= size 436694176
io       43973 io_u 0x13ca5c0, failed getting offset
file     43973 put file /mnt/nfs1/nfs.7.6, ref=2
file     43973 put file /mnt/nfs1/nfs.7.6, ref=1
file     43973 fd close /mnt/nfs1/nfs.7.6
file     43973 /mnt/nfs1/nfs.7.6: is done (13 of 16)
file     43973 fd open /mnt/nfs1/nfs.7.14
file     43973 file not found in hash /mnt/nfs1/nfs.7.14
file     43973 get file /mnt/nfs1/nfs.7.14, ref=0
io       43973 declare unneeded cache /mnt/nfs1/nfs.7.14: 437480608/62500000
file     43973 get_next_file_rand: 0x7f218b9e8950
file     43973 get_next_file: 0x7f218b9e8950 [/mnt/nfs1/nfs.7.14]
file     43973 get file /mnt/nfs1/nfs.7.14, ref=1
io       43973 get_next_offset: offset 458951840 >= size 437480608
io       43973 io_u 0x13ca5c0, failed getting offset
file     43973 put file /mnt/nfs1/nfs.7.14, ref=2
file     43973 put file /mnt/nfs1/nfs.7.14, ref=1
file     43973 fd close /mnt/nfs1/nfs.7.14
file     43973 /mnt/nfs1/nfs.7.14: is done (14 of 16)
file     43973 fd open /mnt/nfs1/nfs.7.9
file     43973 file not found in hash /mnt/nfs1/nfs.7.9
file     43973 get file /mnt/nfs1/nfs.7.9, ref=0
io       43973 declare unneeded cache /mnt/nfs1/nfs.7.9: 406318240/62500000
file     43973 get_next_file_rand: 0x7f218b9e7ff0
file     43973 get_next_file: 0x7f218b9e7ff0 [/mnt/nfs1/nfs.7.9]
file     43973 get file /mnt/nfs1/nfs.7.9, ref=1
io       43973 get_next_offset: offset 458382496 >= size 406318240
io       43973 io_u 0x13ca5c0, failed getting offset
file     43973 put file /mnt/nfs1/nfs.7.9, ref=2
file     43973 put file /mnt/nfs1/nfs.7.9, ref=1
file     43973 fd close /mnt/nfs1/nfs.7.9
file     43973 /mnt/nfs1/nfs.7.9: is done (15 of 16)
file     43973 fd open /mnt/nfs1/nfs.7.4
file     43973 file not found in hash /mnt/nfs1/nfs.7.4
file     43973 get file /mnt/nfs1/nfs.7.4, ref=0
io       43973 declare unneeded cache /mnt/nfs1/nfs.7.4: 431164576/62500000
file     43973 get_next_file_rand: 0x7f218b9e7690
file     43973 get_next_file: 0x7f218b9e7690 [/mnt/nfs1/nfs.7.4]
file     43973 get file /mnt/nfs1/nfs.7.4, ref=1
io       43973 get_next_offset: offset 463916192 >= size 431164576
io       43973 io_u 0x13ca5c0, failed getting offset
file     43973 put file /mnt/nfs1/nfs.7.4, ref=2
file     43973 put file /mnt/nfs1/nfs.7.4, ref=1
file     43973 fd close /mnt/nfs1/nfs.7.4
file     43973 /mnt/nfs1/nfs.7.4: is done (16 of 16)
file     43973 get_next_file: nr_open=0, nr_done=16, nr_files=16
io       43973 io_u 0x13ca5c0, setting file failed
io       43973 get_io_u failed
file     43973 close files

chamarthy commented

The issue seems to occur only when "--file_append" is enabled. Maybe fio is seeking to out-of-range offsets?
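
The numbers in the debug output from the previous comment can be checked against this guess. Below is a minimal sketch of the arithmetic in Python. It assumes (this is not confirmed anywhere in the thread) that the first number in each "declare unneeded cache" line is the per-file starting offset and the second is the I/O size. The tuple layout and the helper names are made up for illustration.

```python
# Values copied from the debug output in the previous comment.
# ASSUMED field meanings: (file, start offset, I/O size, failing offset, reported size).
ENTRIES = [
    ("nfs.7.6", 436694176, 62500000, 444972192, 436694176),
    ("nfs.7.14", 437480608, 62500000, 458951840, 437480608),
    ("nfs.7.9", 406318240, 62500000, 458382496, 406318240),
    ("nfs.7.4", 431164576, 62500000, 463916192, 431164576),
]


def relative_offset(start, failing_offset):
    """Offset measured from the assumed per-file starting offset."""
    return failing_offset - start


def summarize(entries):
    """Return (name, relative offset, within I/O size?, at or past reported size?) per file."""
    rows = []
    for name, start, io_size, offset, size in entries:
        rel = relative_offset(start, offset)
        rows.append((name, rel, rel < io_size, offset >= size))
    return rows


if __name__ == "__main__":
    for name, rel, within_io_size, past_end in summarize(ENTRIES):
        print(f"{name}: relative offset {rel}, "
              f"within I/O size: {within_io_size}, at or past size: {past_end}")
```

In all four entries the assumed starting offset equals the size reported by get_next_offset. If that reading is right, every relative offset would land at or past the reported size even though it is well inside the 62500000-byte I/O size, which would fit the "failed getting offset" lines for each file.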

vincentkfu (Collaborator) commented

I had a recent patch that might be to blame:

46ec827

You could try reverting it to see if your issue goes away.

Also please try to simplify your job to the minimum set of parameters required to trigger the problem.
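
One way to go about the simplification is to drop one option at a time and re-run. The Python sketch below only prints candidate command lines and does not run fio. Every option is taken from the original report. The split into "always kept" and "candidate to drop" is an assumption for illustration, with --file_append=1 kept because the reporter says the problem only shows up with it.

```python
import shlex

# Kept in every run (assumed to be the minimum needed to describe the job).
BASE = ["fio", "--name=nfs", "--directory=/mnt/nfs1", "--size=1GiB", "--rw=randrw"]
# Reported as necessary to trigger the failure.
REQUIRED = ["--file_append=1"]
# Remaining options from the report; each one is dropped in turn.
CANDIDATES = [
    "--ioengine=libaio", "--direct=1", "--bsrange=4k-256k", "--nrfiles=16",
    "--create_on_open=1", "--refill_buffers=0", "--time_based=1",
    "--runtime=99999", "--lockfile=none", "--file_service_type=random:8",
    "--norandommap", "--iodepth=64", "--exitall_on_error", "--numjobs=16",
]


def reduced_commands(base, required, candidates):
    """Yield (dropped option, command line) with exactly one candidate removed."""
    for i, dropped in enumerate(candidates):
        kept = candidates[:i] + candidates[i + 1:]
        yield dropped, shlex.join(base + required + kept)


if __name__ == "__main__":
    for dropped, command in reduced_commands(BASE, REQUIRED, CANDIDATES):
        print(f"# without {dropped}")
        print(command)
```

If the assertion still fires with an option removed, that option can be left out of the next round, and the process repeated until nothing more can be dropped.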

vincentkfu (Collaborator) commented

Closing due to lack of response.

vincentkfu closed this as not planned on Jul 23, 2022