
FIO reports verify: bad header rand_seed, in the RW mode when iodepth > 1 #1526

Closed

KuribohG opened this issue Feb 24, 2023 · 3 comments

@KuribohG
Description of the bug:

FIO reports a verify error in RW mode when iodepth > 1.

After debugging, I believe this is the cause: in RW mode, some I/O requests must be requeued when a read/write switch happens while iodepth > 1. The rand_seed field in io_u changes on requeue (which I think is not desirable), but fio still uses the original rand_seed when verifying.

I'm not sure if this problem is related to #1503.

Environment: Ubuntu 20.04

fio version: 3.32

Reproduction steps

[random-readers]
group_reporting
rw=rw
do_verify=1
verify=crc32c
ioengine=vsync
threads=1
numjobs=1
blocksize=128k
size=1024k
nrfiles=1
filename_format=testsync.$jobnum
iodepth=4
iodepth_batch_submit=4
iodepth_batch_complete_min=4
iodepth_batch_complete_max=4

Run fio with the configuration above; the bug reproduces consistently.

@horshack-dpreview
Contributor
@KuribohG Thanks for your submission. I was able to reproduce the issue using the current tip of master. As you discovered, the problem occurs when an I/O request gets requeued. That happens because of iodepth_batch_complete_max: once that many I/Os have been submitted (and are waiting to complete), the next I/O generated results in an internal FIO_Q_BUSY before submission, which causes the I/O to be queued for later submission. When that I/O is later dequeued and executed, the logic in backend.c::do_io() calls verify.c::populate_verify_io_u(), but populate_verify_io_u() was already called once when the I/O was first generated and requeued with FIO_Q_BUSY. This advances the verify logic's random seed an extra time (the seed is generated by verify.c::fill_verify_pattern()), so a later verify read I/O fails because the seeds are out of sync. Here's the relevant source:

fio/backend.c

Lines 703 to 708 in b5904c0

		} else if (io_u->ddir == DDIR_WRITE) {
			io_u->ddir = DDIR_READ;
			io_u->numberio = td->verify_read_issues;
			td->verify_read_issues++;
			populate_verify_io_u(td, io_u);
			break;

fio/verify.c

Lines 50 to 63 in b5904c0

void fill_verify_pattern(struct thread_data *td, void *p, unsigned int len,
			 struct io_u *io_u, uint64_t seed, int use_seed)
{
	struct thread_options *o = &td->o;

	if (!o->verify_pattern_bytes) {
		dprint(FD_VERIFY, "fill random bytes len=%u\n", len);

		if (!use_seed) {
			seed = __rand(&td->verify_state);
			if (sizeof(int) != sizeof(long *))
				seed *= (unsigned long)__rand(&td->verify_state);
		}
		io_u->rand_seed = seed;

The fix will likely be to modify backend.c::do_io() so that it does not call verify.c::populate_verify_io_u() for reissued I/Os.

horshack-dpreview added a commit to horshack-dpreview/fio that referenced this issue Feb 26, 2023
On configurations that can cause I/Os to be internally requeued with
FIO_Q_BUSY, such as '--iodepth_batch_complete_max', and where the workload
has verify enabled, the subsequent verification of the data fails with a
bad verify rand_seed because the pattern for the I/O is generated twice
for the same I/O, causing the seed to be out of sync when the verify is
later performed. The seed is generated twice because do_io() handles the
I/O twice: first when it originates the I/O, and again when it later gets
the same I/O back from get_io_u() after it is pulled from the requeue
list, which is where the first submission landed due to the workload
reaching '--iodepth_batch_complete_max'.

The fix is for do_io() to track when it has generated the verify pattern
for an I/O via a new io_u flag 'IO_U_F_PATTERN_DONE', avoiding a second
call to populate_verify_io_u() when that flag is detected.

Note IO_U_F_PATTERN_DONE is set to 1<<9 instead of the next available
1<<8 because there is already a commit pending that uses 1<<8
(IO_U_F_OVERLAP_LOCK).

Link: axboe#1526

Signed-off-by: Adam Horshack (horshack@live.com)
horshack-dpreview added a commit to horshack-dpreview/fio that referenced this issue Feb 27, 2023
On configurations that can cause I/Os to be internally requeued with
FIO_Q_BUSY, such as '--iodepth_batch_complete_max', and where the workload
has verify enabled, the subsequent verification of the data fails with a
bad verify rand_seed because the pattern for the I/O is generated twice
for the same I/O, causing the seed to be out of sync when the verify is
later performed. The seed is generated twice because do_io() handles the
I/O twice: first when it originates the I/O, and again when it later gets
the same I/O back from get_io_u() after it is pulled from the requeue
list, which is where the first submission landed due to the workload
reaching '--iodepth_batch_complete_max'.

The fix is for do_io() to track when it has generated the verify pattern
for an I/O via a new io_u flag 'IO_U_F_PATTERN_DONE', avoiding a second
call to populate_verify_io_u() when that flag is detected.

Link: axboe#1526

Signed-off-by: Adam Horshack (horshack@live.com)
@horshack-dpreview
Contributor
@KuribohG, the fix for this issue has been merged into master. Can you please verify that it fixes the problem on your config and, if so, close this issue? Thanks.

@KuribohG
Author

KuribohG commented Mar 1, 2023

Thanks! The PR fixed my issue. I will close this.

@KuribohG KuribohG closed this as completed Mar 1, 2023