Add serialize_overlap option #343

sitsofe · 2017-04-26T21:12:27Z

If this isn't set (the default) fio can submit asynchronous I/Os that
overlap leading to potential data races. When this option is set, the
race is avoided by having fio serialize the overlapping submissions.

Thanks to Rachel Lunnon (StorMagic) for helping me debug the initial
version of this!

axboe · 2017-04-26T21:20:38Z

I don't understand why we need this. If we are making these kinds of guarantees, then the random map should prevent this from happening. If it doesn't, it's a bug. If the job file uses options that exclude the use of a random map, then that's the issue.

What am I missing?

sitsofe · 2017-04-26T21:35:45Z

This patch isn't just about when fio does its own verification; it's also there to cope with external verification.

I'm testing iSCSI storage that really doesn't cope when two overlapping I/Os are submitted together when an I/O log is replayed at full speed. It's been noted that submitting I/O in this way isn't done by programs that have to read it back but it's difficult to prevent this from happening as the logs being replayed don't have synchronization information within them. The storage goes on to create regions where its own (internal) verification points out that one path has one set of data and the other path to it has a different set of data for the same LBA!

A similar issue is seen when trying to verify a random workload that is using a zipfian distribution: the last I/O that fio believes should be at a particular point is not actually the write that wound up winning, so a verification mismatch occurs.

axboe · 2017-04-26T21:46:06Z

That's a good point, there are workloads where we cannot (or don't want to) control blocks, like your zipf example. However, we should limit the checking to the cases where we absolutely HAVE to check. I'll need to think about this for a bit. It seems like a very heavy-handed approach; it would be nice if we could handle this more elegantly (and with much less overhead).

axboe · 2017-04-26T21:46:42Z

cconv.c

@@ -96,6 +96,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->iodepth_batch = le32_to_cpu(top->iodepth_batch);
 	o->iodepth_batch_complete_min = le32_to_cpu(top->iodepth_batch_complete_min);
 	o->iodepth_batch_complete_max = le32_to_cpu(top->iodepth_batch_complete_max);
+	o->serialize_overlap = le32_to_cpu(top->serialize_overlap);


This is missing the conversions the other way.

Should be addressed in v2.
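
To illustrate the review comment above: an option that is converted from the wire format to the CPU format also needs the mirror-image assignment in the opposite direction, otherwise its value is lost on the way out. The sketch below is a self-contained illustration only. The struct layouts, `to_le32`/`from_le32`, `wire_to_cpu` and `cpu_to_wire` are hypothetical stand-ins, not fio's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the in-memory and on-wire option structs. */
struct cpu_opts {
	uint32_t iodepth_batch;
	uint32_t serialize_overlap;
};

struct wire_opts {
	uint8_t iodepth_batch[4];
	uint8_t serialize_overlap[4];
};

/* Store a host-order value as four little-endian bytes. */
static void to_le32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v & 0xff);
	out[1] = (uint8_t)((v >> 8) & 0xff);
	out[2] = (uint8_t)((v >> 16) & 0xff);
	out[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read four little-endian bytes back into a host-order value. */
static uint32_t from_le32(const uint8_t in[4])
{
	return (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
	       ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}

/* wire -> cpu: the direction the diff above already covers. */
void wire_to_cpu(struct cpu_opts *o, const struct wire_opts *top)
{
	assert(o && top);
	o->iodepth_batch = from_le32(top->iodepth_batch);
	o->serialize_overlap = from_le32(top->serialize_overlap);
}

/* cpu -> wire: the direction the review says is missing. */
void cpu_to_wire(struct wire_opts *top, const struct cpu_opts *o)
{
	assert(o && top);
	to_le32(top->iodepth_batch, o->iodepth_batch);
	to_le32(top->serialize_overlap, o->serialize_overlap);
}
```

If either assignment is left out, a round trip through both functions silently leaves the option with a stale or uninitialised value on one side.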

axboe · 2017-04-26T21:47:20Z

init.c

+	 * There's no need to check for in-flight overlapping IOs if the job
+	 * isn't changing data or the maximum iodepth is guaranteed to be 1
+	 */
+	if (o->serialize_overlap && !(td->flags & TD_F_READ_IOLOG) &&


I like this, disabling when we don't need it.

Yeah - my hope was to try and turn it off whenever possible just in case people were in the habit of turning it on. Technically if you're replaying an iolog at depth 1 you don't need to go through the checks either so perhaps I need to tidy this further.

axboe · 2017-04-26T21:47:35Z

init.c

+	/*
+	 * Currently can't check for overlaps in offload mode
+	 */
+	if (o->serialize_overlap && o->io_submit_mode == IO_MODE_OFFLOAD) {


What's preventing it from being enabled for offloaded submission?

My ingenuity. I couldn't see a way to say "wait for completion" in the offloaded submission case so I took a punt. Can I just goto reap as other parts of the code do?

sitsofe · 2017-04-26T22:37:33Z

I admit I took the easy way out on this one and essentially forced quadratic behaviour in terms of the iodepth and the number of I/Os (but it worked surprisingly well at my depth of 32). Perhaps the better thing to do would be to stuff each in-flight I/O into a tree so that at least the searching would be sped up. I would say a sparse bitmap is good enough, but I originally had code that avoided waiting when both overlapping I/Os were reads (which I've since removed), and a bitmap gets awkward if you've got multiple overlapping reads in it and only want to remove one of them. It also meant if statements at submission AND completion.
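
The quadratic behaviour described above comes from comparing each new I/O against every I/O still in flight. A minimal sketch of that kind of check follows, assuming a flat array of in-flight requests with non-zero lengths. `struct inflight_io`, `ranges_overlap` and `find_overlap` are hypothetical names used for illustration, not fio's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one in-flight I/O: byte offset and length. */
struct inflight_io {
	uint64_t offset;
	uint64_t len;
};

/*
 * Two half-open ranges [offset, offset + len) overlap when each one
 * starts before the other one ends.
 */
int ranges_overlap(const struct inflight_io *a, const struct inflight_io *b)
{
	return a->offset < b->offset + b->len &&
	       b->offset < a->offset + a->len;
}

/*
 * Linear scan over the in-flight array. Returns the index of the first
 * entry that overlaps `next`, or -1 if there is none. Doing this once
 * per submission costs O(depth) per I/O.
 */
long find_overlap(const struct inflight_io *inflight, size_t depth,
		  const struct inflight_io *next)
{
	assert(next != NULL);
	for (size_t i = 0; i < depth; i++) {
		if (ranges_overlap(&inflight[i], next))
			return (long)i;
	}
	return -1;
}
```

A caller would wait for the I/O at the returned index to complete before submitting `next`. Keeping the in-flight I/Os in a tree keyed by offset, as suggested above, would reduce the per-submission search cost.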

sitsofe · 2017-08-11T23:11:24Z

The branch has been updated to add a man page entry for serialize_overlap and to fix a bug where serialize_overlap would wrongly disable itself (because I thought td_trimwrite meant the job did a trim or write, which turned out to be incorrect). It also adds the patches from pull request #345 (which supersedes the "iolog: sort verify entries when using dynamic blocksizes" commit) and pull request #359.

sitsofe · 2017-08-13T20:25:20Z

Awaiting a response from Jeff Furlong on these because he was wondering if the combination of these patches (and using the serialize_overlap=1 option they introduce) might solve the issue mentioned over on http://www.spinics.net/lists/fio/msg06139.html .

sitsofe · 2017-08-14T03:41:36Z

Jeff has come back and says he hasn't seen any issues with this patch so I've updated the commit messages to mention him in a Tested-by line.

If this isn't set (the default) fio can submit write I/Os that overlap
other in-flight I/Os leading to potential data races. For example the
following job frequently fails at the verification stage:

./fio --random_distribution=zipf:1.6 --direct=1 --filename \
 /tmp/fiofile --ioengine=posixaio --iodepth=32 --size=20M --bs=4k \
 --rw=randwrite --verify=crc32c --name=verifyoverlap

When serialize_overlap=1 fio avoids creating such races.

Thanks to Rachel Lunnon (StorMagic) for helping me debug the initial
version of this!

Fixes: axboe#335

v2: Fix merge conflict and add missing conversion.
v3: Add man page, fix serialize_overlap disabling, improve commit
message.

Tested-by: Jeff Furlong <jeff.furlong@wdc.com>
Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>

When running

valgrind ./fio --randseed=1 --ioengine=libaio --thread --rw=randrw \
 --random_distribution=zipf:1.4 --filename=/tmp/fiofile --io_limit=50M \
 --verify=crc32c --name=verifyfree --iodepth=32 --bsrange=512-1M --size=100M

valgrind reports:

==29301== Invalid read of size 4
==29301==    at 0x44ADFC: io_completed (io_u.c:1835)
==29301==    by 0x44B215: ios_completed (io_u.c:1924)
==29301==    by 0x44B683: io_u_queued_complete (io_u.c:1983)
==29301==    by 0x46FA6F: wait_for_completions (backend.c:455)
==29301==    by 0x471568: do_io (backend.c:1046)
==29301==    by 0x474405: thread_main (backend.c:1746)
==29301==    by 0x576E6B9: start_thread (pthread_create.c:333)
==29301==    by 0x5C8E82C: clone (clone.S:109)
==29301==  Address 0x62cf988 is 72 bytes inside a block of size 88 free'd
==29301==    at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)

When the __ipo is still in-flight we shouldn't free it when it overlaps
because it will be used at I/O completion time.

Fixes axboe#336.

Tested-by: Jeff Furlong <jeff.furlong@wdc.com>
Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>

Running the following fio jobs unexpectedly reports a verification
failure:

rm /tmp/tmp.fio; ./fio --iodepth=1 \
 --verify=pattern --verify_fatal=1 --size=100M --bsrange=512-128k \
 --rw=randwrite --verify_backlog=128 --filename=/tmp/tmp.fio \
 --verify_pattern="%o" --name=spuriousmismatch1

rm /tmp/tmp.fio; ./fio --iodepth=1 \
 --verify=crc32c --verify_fatal=1 --size=100M --bs=4k \
 --rw=randwrite --verify_backlog=20 --filename=/tmp/tmp.fio \
 --percentage_random=50 --randseed=86 --name=spuriousmismatch2

In the case of the first job, using a bsrange where the start and end
are different can cause random write I/O to overlap an already written
region making the original data unverifiable. For the second job, when
percentage_random is between 1 and 99 the same offset can be generated
multiple times but only the last write to that offset should be
verified.

Rather than special casing the growing number of random jobs that might
generate overlaps while still having a randommap, and given
preallocation during layout is the default where possible, just remove
the overwrite=0 optimisation thus forcing all random jobs to be checked
for overlaps. It is still possible to force the old behaviour by
setting verifysort=0.

Fixes axboe#335 and axboe#344.

Tested-by: Jeff Furlong <jeff.furlong@wdc.com>
Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
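
The rule in the commit message above, that only the most recent write to a region remains verifiable, can be sketched as follows. This is a simplified illustration using a small fixed-size array, and it ignores the in-flight case covered by the previous commit. `struct verify_entry`, `struct verify_log`, `MAX_ENTRIES` and `log_write` are hypothetical names, not fio's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 64

/* Hypothetical record of one written region awaiting verification. */
struct verify_entry {
	uint64_t offset;
	uint64_t len;
};

struct verify_log {
	struct verify_entry e[MAX_ENTRIES];
	size_t count;
};

/*
 * Record a completed write. Older entries that overlap the new range are
 * dropped first, because their data has been (at least partly) overwritten
 * and can no longer be verified. Returns 0 on success, -1 if the log is full.
 */
int log_write(struct verify_log *log, uint64_t offset, uint64_t len)
{
	size_t kept = 0;

	assert(log != NULL && len > 0);

	/* Compact the array in place, keeping only non-overlapping entries. */
	for (size_t i = 0; i < log->count; i++) {
		const struct verify_entry *old = &log->e[i];
		int overlaps = old->offset < offset + len &&
			       offset < old->offset + old->len;

		if (!overlaps)
			log->e[kept++] = *old;
	}
	log->count = kept;

	if (log->count == MAX_ENTRIES)
		return -1;

	log->e[log->count].offset = offset;
	log->e[log->count].len = len;
	log->count++;
	return 0;
}
```

With variable block sizes, one new write can invalidate several older entries at once, which is why the check has to compare ranges rather than exact offsets.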

Code inspection shows there are currently guards in the call sites of
log_io_piece() that prevent log_io_piece() being called when
td->o.verify == VERIFY_NONE so skip checking it within log_io_piece().

Tested-by: Jeff Furlong <jeff.furlong@wdc.com>
Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>

axboe · 2017-08-14T14:30:27Z

I pulled it in, and added an extra commit to move the overlap checking into a helper, and only call the overlap check if we have > 1 IO in flight, instead of > 0. The latter is because the io_u in question is already accounted. By definition, we need >= 2 IOs to be worried about overlap.
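
The reasoning behind the `> 1` threshold can be sketched as a small guard. This is an illustration only: `should_check_overlap` and its parameters are hypothetical names, and it assumes, as stated above, that the in-flight count already includes the I/O being submitted.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical guard: cur_depth counts the I/Os in flight *including* the
 * one about to be submitted, so a value of 1 means there is no other I/O
 * it could overlap with and the scan can be skipped.
 */
bool should_check_overlap(bool serialize_overlap, unsigned int cur_depth)
{
	return serialize_overlap && cur_depth > 1;
}
```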

sitsofe · 2017-08-14T14:54:08Z

Good catch - thanks Jens!

axboe reviewed Apr 26, 2017


sitsofe force-pushed the serialize_overlap branch 2 times, most recently from 47ea422 to 23a126c Compare April 26, 2017 22:28

sitsofe mentioned this pull request May 18, 2017

Wrong data corruption error report on NVMe drive ? (verify: bad magic header) #366

Closed

sitsofe force-pushed the serialize_overlap branch from 23a126c to 284059d Compare June 4, 2017 08:39

sitsofe force-pushed the serialize_overlap branch from 284059d to 7709172 Compare August 11, 2017 23:02

sitsofe force-pushed the serialize_overlap branch from 7709172 to 9b60f7a Compare August 13, 2017 20:12

sitsofe force-pushed the serialize_overlap branch from 9b60f7a to ba46d71 Compare August 14, 2017 03:22

sitsofe added 4 commits August 14, 2017 04:43

sitsofe force-pushed the serialize_overlap branch from ba46d71 to 4a06cb4 Compare August 14, 2017 03:44

axboe merged commit 4a06cb4 into axboe:master Aug 14, 2017

This was referenced Aug 14, 2017

iolog: fix use after free when two in-flight writes overlap #359

Closed

iolog: fix spurious verify mismatch when random I/O overlaps #345

Closed

sitsofe deleted the serialize_overlap branch August 14, 2017 15:04
