FIO can only run 2 seconds with RandSeed=47827 #1486

Closed

ningqingqing opened this issue Nov 11, 2022 · 5 comments

Comments

@ningqingqing

ningqingqing commented Nov 11, 2022


Description of the bug:
FIO can only run 1-2 seconds if randseed=47827 is set.

Environment: Linux, CentOS

fio version: fio-3.25

Reproduction steps
<We have run two fio test, the only difference between them is we set different randseed value.
In the first fio test, we set randseed=47827, fio can only run 2 seconds
job command as below:
sudo fio --thread --ioengine=libaio --numjobs=1 --direct=1 --filename=/dev/nvme0n1 --name=bs131072_rwrandrw_qd32 --rwmixread=0 --rw=randrw --percentage_random=50 --randrepeat=0 --size=56019152896 --bs=131072 --offset=0 --randseed=47827 --iodepth=32 --max_latency=16000ms --cpus_allowed=0-3
fio output:
47827.txt

In the second fio test, we set randseed=47807 and fio runs normally.
Job command as below:
sudo fio --thread --ioengine=libaio --numjobs=1 --direct=1 --filename=/dev/nvme0n1 --name=bs131072_rwrandrw_qd32 --rwmixread=0 --rw=randrw --percentage_random=50 --randrepeat=0 --size=56019152896 --bs=131072 --offset=0 --randseed=47807 --iodepth=32 --max_latency=16000ms --cpus_allowed=0-3
fio output:
47807.txt

Could you help explain why the randseed parameter affects the test duration, and what this parameter does in a fio test?

@ankit-sam
Contributor

Hi @ningqingqing, I ran both commands and was able to replicate what you observed. With randseed=47827 I saw that fio wrote exactly 1694 MiB of data, which matches the attached log. With randseed=47807 it was able to write the entire 52.2 GiB.

I enabled --debug=io for randseed=47827 and observed that fio fails when the next generated offset falls outside the range defined by the start offset and io_size.
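The full command for that debug run looks like this (the job from the report, with only --debug=io added to trace offset generation):

sudo fio --debug=io --thread --ioengine=libaio --numjobs=1 --direct=1 --filename=/dev/nvme0n1 --name=bs131072_rwrandrw_qd32 --rwmixread=0 --rw=randrw --percentage_random=50 --randrepeat=0 --size=56019152896 --bs=131072 --offset=0 --randseed=47827 --iodepth=32 --max_latency=16000ms --cpus_allowed=0-3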

io 8915 get_next_offset: offset 56019255296 >= io_size 56019152896
io 8915 io_u 0x7f72cc0030c0, failed getting offset
io 8915 io_u 0x7f72cc0030c0, setting file failed
io 8915 get_io_u failed
io 8915 io_u_queued_complete: min=31
io 8915 getevents: 31

This is also observed with randseed=47867 and randseed=47847, but at different points in time.

I also ran fio with percentage_random=50 removed (i.e. a fully random workload) and the issue was not seen with these randseed values. So it seems there is something odd happening with fio's random offset generation only when we mix random and sequential workloads.
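In other words, this variant (the same job with --percentage_random=50 dropped) completes normally even with randseed=47827:

sudo fio --thread --ioengine=libaio --numjobs=1 --direct=1 --filename=/dev/nvme0n1 --name=bs131072_rwrandrw_qd32 --rwmixread=0 --rw=randrw --randrepeat=0 --size=56019152896 --bs=131072 --offset=0 --randseed=47827 --iodepth=32 --max_latency=16000ms --cpus_allowed=0-3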

I will go through this section of code and see if I can find the exact problem.

@ningqingqing
Author

Hi @ankit-sam, thanks for taking the time to replicate this.
I have a question about the transaction file feature; could you help me with that?
When fio fills a drive, it can use the write_iolog parameter to record the issued commands into a transaction file. However, are the commands recorded in this file guaranteed to have actually been sent to the drive? Or is the transaction file prepared in advance, with the recorded commands not necessarily sent to the drive?
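For context, we use it roughly like this (illustrative commands; the iolog path is made up): record the issued I/Os while filling the drive, then replay them later with read_iolog:

sudo fio --thread --ioengine=libaio --numjobs=1 --direct=1 --filename=/dev/nvme0n1 --name=fill --rw=write --bs=131072 --size=56019152896 --iodepth=32 --write_iolog=/tmp/fill_iolog.txt
sudo fio --thread --ioengine=libaio --numjobs=1 --direct=1 --filename=/dev/nvme0n1 --name=replay --read_iolog=/tmp/fill_iolog.txt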

@ankit-sam
Contributor

ankit-sam commented Nov 16, 2022

Hi @ningqingqing, I'm not very familiar with the transaction file feature. Here is a possible fix for this issue; the change is based on fio-3.33:

index 8035f4b7..e49b1b29 100644
--- a/io_u.c
+++ b/io_u.c
@@ -432,8 +432,11 @@ static int get_next_block(struct thread_data *td, struct io_u *io_u,
                                *is_random = false;
                                io_u_set(td, io_u, IO_U_F_BUSY_OK);
                                ret = get_next_seq_offset(td, f, ddir, &offset);
-                               if (ret)
+                               if (ret || offset >= f->io_size) {
                                        ret = get_next_rand_block(td, f, ddir, &b);
+                                       offset = -1ULL;
+                                       *is_random = true;
+                               }
                        }
                } else {
                        *is_random = false;

The actual issue is with sequential offset generation when we have a mix of sequential and random workloads. get_next_seq_offset does a few things:

  • Resets the position back to the start of the file when we reach the end, but only for time-based jobs.
  • Then either returns 1 if we reach the end of the file, in which case fio will generate a random offset,
    or else
  • returns the next sequential offset. In this process we do wrap around, but only if there are holes (i.e. if we are doing an I/O and then skipping some size, such as rw=write:4k).

For the mixed sequential and random workload case, I have observed that get_next_seq_offset sometimes returns an offset that is bigger than f->io_size. The only possible solution I can think of is to fall back to a random block in this case.

Can you please verify the changes and confirm that they don't break any existing functionality? If everything looks fine I can submit a patch.
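A quick way to check (illustrative steps; adjust the patch path and device to your setup) would be to build a patched fio and re-run the originally failing job:

git clone https://github.com/axboe/fio.git && cd fio
git checkout fio-3.33
git apply /path/to/io_u.patch
./configure && make -j"$(nproc)"
sudo ./fio --thread --ioengine=libaio --numjobs=1 --direct=1 --filename=/dev/nvme0n1 --name=bs131072_rwrandrw_qd32 --rwmixread=0 --rw=randrw --percentage_random=50 --randrepeat=0 --size=56019152896 --bs=131072 --offset=0 --randseed=47827 --iodepth=32 --max_latency=16000ms --cpus_allowed=0-3

With the fix the job should transfer the full 52.2 GiB instead of stopping after about 1694 MiB.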

Tagging @vincentkfu for his inputs.

@ningqingqing
Author

Hi @ankit-sam, thank you so much, I will try it later.

@ankit-sam
Contributor

Hi @ningqingqing, you can try the fio option io_size.
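For example (the original job with io_size added; this assumes the goal is to actually transfer the full 52.2 GiB):

sudo fio --thread --ioengine=libaio --numjobs=1 --direct=1 --filename=/dev/nvme0n1 --name=bs131072_rwrandrw_qd32 --rwmixread=0 --rw=randrw --percentage_random=50 --randrepeat=0 --size=56019152896 --io_size=56019152896 --bs=131072 --offset=0 --randseed=47827 --iodepth=32 --max_latency=16000ms --cpus_allowed=0-3

With io_size set, size only defines the region within which I/O happens, while io_size controls how many bytes fio transfers.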

I submitted a patch, but it affects the behaviour when we are operating on multiple files, and I later found that there are a lot of places where size does not work as specified, so modifying the existing code could result in regressions. You can see the discussion on lore:
https://lore.kernel.org/fio/20221118051454.31288-1-ankit.kumar@samsung.com/T/#t

I think the workaround is to update the fio documentation for size.

vincentkfu pushed a commit to vincentkfu/fio that referenced this issue Dec 1, 2022
In a few cases with the fio option size, the number of bytes of data transferred
is actually less than what we specified. This can happen if there are
gaps or holes while doing I/Os, or if we are running a mix of sequential
and random workloads.
Update the documentation to reflect that.

Fixes: axboe#1486

Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
axboe closed this as completed in 942d66c Dec 1, 2022