
FIO reports verify: bad header rand_seed, in the RW mode when iodepth > 1 #1526

Closed

KuribohG opened this issue Feb 24, 2023 · 3 comments

@KuribohG
Description of the bug:

FIO reports a verify error in RW mode when iodepth > 1.

After debugging, I believe this is the cause: in RW mode, some I/O requests must be requeued when a read/write switch happens while iodepth > 1. The rand_seed field in io_u changes on requeue (which I think is not desirable), but fio still uses the original rand_seed when verifying.

I'm not sure if this problem is related to #1503.

Environment: Ubuntu 20.04

fio version: 3.32

Reproduction steps

[random-readers]
group_reporting
rw=rw
do_verify=1
verify=crc32c
ioengine=vsync
threads=1
numjobs=1
blocksize=128k
size=1024k
nrfiles=1
filename_format=testsync.$jobnum
iodepth=4
iodepth_batch_submit=4
iodepth_batch_complete_min=4
iodepth_batch_complete_max=4

Run fio with the configuration above; the bug reproduces consistently.

@horshack-dpreview
Contributor
@KuribohG Thanks for your submission. I was able to reproduce the issue using the current tip of master. As you discovered, the problem occurs when an I/O request gets requeued. That happens because of iodepth_batch_complete_max: once that many I/Os have been submitted (and are waiting to complete), the next I/O generated results in an internal FIO_Q_BUSY before submission, which causes the I/O to be queued for later submission. When that I/O is later dequeued and executed, the logic in backend.c::do_io() calls verify.c::populate_verify_io_u(), but populate_verify_io_u() was already called once when the I/O was first generated and requeued with FIO_Q_BUSY. This advances the verify logic's random seed an extra time (the seed is generated by verify.c::fill_verify_pattern()), so a later verify read I/O fails because the seeds are out of sync. Here's the relevant source:

fio/backend.c

Lines 703 to 708 in b5904c0

		} else if (io_u->ddir == DDIR_WRITE) {
			io_u->ddir = DDIR_READ;
			io_u->numberio = td->verify_read_issues;
			td->verify_read_issues++;
			populate_verify_io_u(td, io_u);
			break;

fio/verify.c

Lines 50 to 63 in b5904c0

void fill_verify_pattern(struct thread_data *td, void *p, unsigned int len,
			 struct io_u *io_u, uint64_t seed, int use_seed)
{
	struct thread_options *o = &td->o;

	if (!o->verify_pattern_bytes) {
		dprint(FD_VERIFY, "fill random bytes len=%u\n", len);

		if (!use_seed) {
			seed = __rand(&td->verify_state);
			if (sizeof(int) != sizeof(long *))
				seed *= (unsigned long)__rand(&td->verify_state);
		}
		io_u->rand_seed = seed;

The fix will likely be to modify backend.c::do_io() so that it does not call verify.c::populate_verify_io_u() for reissued I/Os.

horshack-dpreview added a commit to horshack-dpreview/fio that referenced this issue Feb 26, 2023
On configurations that can cause I/Os to be internally requeued with
FIO_Q_BUSY, such as '--iodepth_batch_complete_max', and where the workload
has verify enabled, the subsequent verification of the data fails with a
bad verify rand_seed because the pattern for the I/O is generated twice
for the same I/O, causing the seed to be out of sync when the verify is
later performed. The seed is generated twice because do_io() handles the
I/O twice: first when it originates the I/O, and again when it later gets
the same I/O back from get_io_u() after it is pulled from the requeue
list, which is where the first submission landed due to the workload
reaching '--iodepth_batch_complete_max'.

The fix is for do_io() to track when it has generated the verify pattern
for an I/O via a new io_u flag 'IO_U_F_PATTERN_DONE', avoiding a second
call to populate_verify_io_u() when that flag is detected.

Note IO_U_F_PATTERN_DONE is set to 1<<9 instead of the next available
1<<8 because there is already a commit pending that uses 1<<8
(IO_U_F_OVERLAP_LOCK).

Link: axboe#1526

Signed-off-by: Adam Horshack (horshack@live.com)
horshack-dpreview added a commit to horshack-dpreview/fio that referenced this issue Feb 27, 2023
On configurations that can cause I/Os to be internally requeued with
FIO_Q_BUSY, such as '--iodepth_batch_complete_max', and where the workload
has verify enabled, the subsequent verification of the data fails with a
bad verify rand_seed because the pattern for the I/O is generated twice
for the same I/O, causing the seed to be out of sync when the verify is
later performed. The seed is generated twice because do_io() handles the
I/O twice: first when it originates the I/O, and again when it later gets
the same I/O back from get_io_u() after it is pulled from the requeue
list, which is where the first submission landed due to the workload
reaching '--iodepth_batch_complete_max'.

The fix is for do_io() to track when it has generated the verify pattern
for an I/O via a new io_u flag 'IO_U_F_PATTERN_DONE', avoiding a second
call to populate_verify_io_u() when that flag is detected.

Link: axboe#1526

Signed-off-by: Adam Horshack (horshack@live.com)
@horshack-dpreview
Contributor
@KuribohG, the fix for this issue has been merged into master. Can you please verify that it fixes the problem on your config and, if so, close this issue? Thanks.

@KuribohG
Author

KuribohG commented Mar 1, 2023

Thanks! The PR fixed my issue. I will close this.

@KuribohG KuribohG closed this as completed Mar 1, 2023