-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add serialize_overlap option #343
Conversation
I don't understand why we have a need for this? If we are making these kinds of guarantees, then the random map should prevent this from happening. If it doesn't, it's a bug. If the job file uses options that excludes the use of a random map, then that's the issue. What am I missing? |
This patch isn't just about when fio does its own verification it's also to cope with external verification. I'm testing iSCSI storage that really doesn't cope when two overlapping I/Os are submitted together when an I/O log is replayed at full speed. It's been noted that submitting I/O in this way isn't done by programs that have to read it back but it's difficult to prevent this from happening as the logs being replayed don't have synchronization information within them. The storage goes on to create regions where its own (internal) verification points out that one path has one set of data and the other path to it has a different set of data for the same LBA! A similar issue is seen when trying to verify a random workload that is using a zipfian distribution - the last I/O that fio believes should be at a particular point is not actually the write that wound up winning so a verification mismatch occurs. |
That's a good point, there are workloads where we cannot (or don't want to) control blocks, like your zipf example. However, we should limit the checking for the cases where we absolutely HAVE to check. I'll need to think about this for a bit. It seems like a very heavy handed approach, would be nice if we could handle this more elegantly (and with much less overhead). |
@@ -96,6 +96,7 @@ void convert_thread_options_to_cpu(struct thread_options *o, | |||
o->iodepth_batch = le32_to_cpu(top->iodepth_batch); | |||
o->iodepth_batch_complete_min = le32_to_cpu(top->iodepth_batch_complete_min); | |||
o->iodepth_batch_complete_max = le32_to_cpu(top->iodepth_batch_complete_max); | |||
o->serialize_overlap = le32_to_cpu(top->serialize_overlap); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is missing the conversions the other way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be addressed in v2.
* There's no need to check for in-flight overlapping IOs if the job | ||
* isn't changing data or the maximum iodepth is guaranteed to be 1 | ||
*/ | ||
if (o->serialize_overlap && !(td->flags & TD_F_READ_IOLOG) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this, disabling when we don't need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - my hope was to try and turn it off whenever possible just in case people were in the habit of turning it on. Technically if you're replaying an iolog at depth 1 you don't need to go through the checks either so perhaps I need to tidy this further.
/* | ||
* Currently can't check for overlaps in offload mode | ||
*/ | ||
if (o->serialize_overlap && o->io_submit_mode == IO_MODE_OFFLOAD) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's preventing it from being enabled for offloaded submission?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My ingenuity. I couldn't see a way to say "wait for completion" in the offloaded submission case so I took a punt. Can I just goto reap
as other parts of the code do?
47ea422
to
23a126c
Compare
I admit I took the easy way out on this one and essentially force quadratic behaviour on the iodepth and the number of I/Os (but it worked surprisingly well on my depth of 32). Perhaps the better thing to do would be to stuff each in flight I/O into a tree then at least the searching would be sped up. I would say a sparse bitmap is good enough but I originally had code that avoided waiting when both overlapping I/Os were reads (I've since removed) but if you've got multiple overlapping reads in the bitmap and you only want to remove one of them. It also meant if statements at submission AND completion. |
23a126c
to
284059d
Compare
284059d
to
7709172
Compare
The branch has been updated to add a man page entry for |
7709172
to
9b60f7a
Compare
Awaiting a response from Jeff Furlong on these because he was wondering if the combination of these patches (and using the |
9b60f7a
to
ba46d71
Compare
Jeff has come back and says he hasn't seen any issues with this patch so I've updated the commit messages to mention him in a Tested-by line. |
If this isn't set (the default) fio can submit write I/Os that overlap other in-flight I/Os leading to potential data races. For example the following job frequently fails at the verification stage: ./fio --random_distribution=zipf:1.6 --direct=1 --filename \ /tmp/fiofile --ioengine=posixaio --iodepth=32 --size=20M --bs=4k \ --rw=randwrite --verify=crc32c --name=verifyoverlap When serialize_overlap=1 fio avoids creating such races. Thanks to Rachel Lunnon (StorMagic) for helping me debug the initial version of this! Fixes: axboe#335 v2: Fix merge conflict and add missing conversion. v3: Add man page, fix serialize_overlap disabling, improve commit message. Tested-by: Jeff Furlong <jeff.furlong@wdc.com> Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
When running valgrind ./fio --randseed=1 --ioengine=libaio --thread --rw=randrw \ --random_distribution=zipf:1.4 --filename=/tmp/fiofile --io_limit=50M \ --verify=crc32c --name=verifyfree --iodepth=32 --bsrange=512-1M --size=100M valgrind reports: ==29301== Invalid read of size 4 ==29301== at 0x44ADFC: io_completed (io_u.c:1835) ==29301== by 0x44B215: ios_completed (io_u.c:1924) ==29301== by 0x44B683: io_u_queued_complete (io_u.c:1983) ==29301== by 0x46FA6F: wait_for_completions (backend.c:455) ==29301== by 0x471568: do_io (backend.c:1046) ==29301== by 0x474405: thread_main (backend.c:1746) ==29301== by 0x576E6B9: start_thread (pthread_create.c:333) ==29301== by 0x5C8E82C: clone (clone.S:109) ==29301== Address 0x62cf988 is 72 bytes inside a block of size 88 free'd ==29301== at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) When the __ipo is still in-flight we shouldn't free it when it overlaps because it will be used at I/O completion time. Fixes axboe#336 . Tested-by: Jeff Furlong <jeff.furlong@wdc.com> Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Running the following fio jobs unexpectedly reports a verification failure: rm /tmp/tmp.fio; ./fio --iodepth=1 \ --verify=pattern --verify_fatal=1 --size=100M --bsrange=512-128k \ --rw=randwrite --verify_backlog=128 --filename=/tmp/tmp.fio \ --verify_pattern="%o" --name=spuriousmismatch1 rm /tmp/tmp.fio; ./fio --iodepth=1 \ --verify=crc32c --verify_fatal=1 --size=100M --bs=4k \ --rw=randwrite --verify_backlog=20 --filename=/tmp/tmp.fio \ --percentage_random=50 --randseed=86 --name=spuriousmismatch2 In the case of the first job, using a bsrange where the start and end are different can cause random write I/O to overlap an already written region making the original data unverifiable. For the second job, when percentage_random is between 1 and 99 the same offset can be generated multiple times but only the last write to that offset should be verified. Rather than special casing the growing number of random jobs that might generate overlaps while still having a randommap, and given preallocation during layout is the default where possible, just remove the overwrite=0 optimisation thus forcing all random jobs to be checked for overlaps. It is still possible to force the old behaviour by setting verifysort=0. Fixes axboe#335 and axboe#344 . Tested-by: Jeff Furlong <jeff.furlong@wdc.com> Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Code inspection shows there are currently guards in the call sites of log_io_piece() that prevent log_io_piece() being called when td->o.verify == VERIFY_NONE so skip checking it within log_io_piece(). Tested-by: Jeff Furlong <jeff.furlong@wdc.com> Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
ba46d71
to
4a06cb4
Compare
I pulled it it, and added an extra commit to move the overlap checking into a helper, and only call the overlap check if we have > 1 IO in flight, instead of > 0. The latter is because the io_u in question is already accounted. By definition, we need >= 2 IOs to be worried about overlap. |
Good catch - thanks Jens! |
If this isn't set (the default) fio can submit asynchronous I/Os that
overlap leading to potential data races. When this option is set, the
race is avoided by having fio serialize the overlapping submissions.
Thanks to Rachel Lunnon (StorMagic) for helping me debug the initial
version of this!