Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add serialize_overlap option #343

Merged
merged 4 commits into from
Aug 14, 2017
Merged

Add serialize_overlap option #343

merged 4 commits into from
Aug 14, 2017

Conversation

sitsofe
Copy link
Collaborator

@sitsofe sitsofe commented Apr 26, 2017

If this isn't set (the default) fio can submit asynchronous I/Os that
overlap leading to potential data races. When this option is set, the
race is avoided by having fio serialize the overlapping submissions.

Thanks to Rachel Lunnon (StorMagic) for helping me debug the initial
version of this!

@axboe
Copy link
Owner

axboe commented Apr 26, 2017

I don't understand why we have a need for this? If we are making these kinds of guarantees, then the random map should prevent this from happening. If it doesn't, it's a bug. If the job file uses options that excludes the use of a random map, then that's the issue.

What am I missing?

@sitsofe
Copy link
Collaborator Author

sitsofe commented Apr 26, 2017

This patch isn't just about when fio does its own verification it's also to cope with external verification.

I'm testing iSCSI storage that really doesn't cope when two overlapping I/Os are submitted together when an I/O log is replayed at full speed. It's been noted that submitting I/O in this way isn't done by programs that have to read it back but it's difficult to prevent this from happening as the logs being replayed don't have synchronization information within them. The storage goes on to create regions where its own (internal) verification points out that one path has one set of data and the other path to it has a different set of data for the same LBA!

A similar issue is seen when trying to verify a random workload that is using a zipfian distribution - the last I/O that fio believes should be at a particular point is not actually the write that wound up winning so a verification mismatch occurs.

@axboe
Copy link
Owner

axboe commented Apr 26, 2017

That's a good point, there are workloads where we cannot (or don't want to) control blocks, like your zipf example. However, we should limit the checking for the cases where we absolutely HAVE to check. I'll need to think about this for a bit. It seems like a very heavy handed approach, would be nice if we could handle this more elegantly (and with much less overhead).

@@ -96,6 +96,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
o->iodepth_batch = le32_to_cpu(top->iodepth_batch);
o->iodepth_batch_complete_min = le32_to_cpu(top->iodepth_batch_complete_min);
o->iodepth_batch_complete_max = le32_to_cpu(top->iodepth_batch_complete_max);
o->serialize_overlap = le32_to_cpu(top->serialize_overlap);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is missing the conversions the other way

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be addressed in v2.

* There's no need to check for in-flight overlapping IOs if the job
* isn't changing data or the maximum iodepth is guaranteed to be 1
*/
if (o->serialize_overlap && !(td->flags & TD_F_READ_IOLOG) &&
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this, disabling when we don't need it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - my hope was to try and turn it off whenever possible just in case people were in the habit of turning it on. Technically if you're replaying an iolog at depth 1 you don't need to go through the checks either so perhaps I need to tidy this further.

/*
* Currently can't check for overlaps in offload mode
*/
if (o->serialize_overlap && o->io_submit_mode == IO_MODE_OFFLOAD) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's preventing it from being enabled for offloaded submission?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My ingenuity. I couldn't see a way to say "wait for completion" in the offloaded submission case so I took a punt. Can I just goto reap as other parts of the code do?

@sitsofe sitsofe force-pushed the serialize_overlap branch 2 times, most recently from 47ea422 to 23a126c Compare April 26, 2017 22:28
@sitsofe
Copy link
Collaborator Author

sitsofe commented Apr 26, 2017

I admit I took the easy way out on this one and essentially force quadratic behaviour on the iodepth and the number of I/Os (but it worked surprisingly well on my depth of 32). Perhaps the better thing to do would be to stuff each in flight I/O into a tree then at least the searching would be sped up. I would say a sparse bitmap is good enough but I originally had code that avoided waiting when both overlapping I/Os were reads (I've since removed) but if you've got multiple overlapping reads in the bitmap and you only want to remove one of them. It also meant if statements at submission AND completion.

@sitsofe
Copy link
Collaborator Author

sitsofe commented Aug 11, 2017

The branch has been updated to add a man page entry for serialize_overlap, fix a bug where serialize_overlap would disable itself wrongly (because I thought td_trimwrite meant the job did a trim or write which turned out to be incorrect), and it also add the patches in the pull request #345 (which supersedes the "iolog: sort verify entries when using dynamic blocksizes" commit) and the pull request #359 .

@sitsofe
Copy link
Collaborator Author

sitsofe commented Aug 13, 2017

Awaiting a response from Jeff Furlong on these because he was wondering if the combination of these patches (and using the serialize_overlap=1 option they introduce) might solve the issue mentioned over on http://www.spinics.net/lists/fio/msg06139.html .

@sitsofe
Copy link
Collaborator Author

sitsofe commented Aug 14, 2017

Jeff has come back and says he hasn't seen any issues with this patch so I've updated the commit messages to mention him in a Tested-by line.

If this isn't set (the default) fio can submit write I/Os that overlap
other in-flight I/Os leading to potential data races. For example the
following job frequently fails at the verification stage:

./fio --random_distribution=zipf:1.6 --direct=1 --filename \
 /tmp/fiofile --ioengine=posixaio --iodepth=32 --size=20M --bs=4k \
 --rw=randwrite --verify=crc32c --name=verifyoverlap

When serialize_overlap=1 fio avoids creating such races.

Thanks to Rachel Lunnon (StorMagic) for helping me debug the initial
version of this!

Fixes: axboe#335

v2: Fix merge conflict and add missing conversion.
v3: Add man page, fix serialize_overlap disabling, improve commit
    message.

Tested-by: Jeff Furlong <jeff.furlong@wdc.com>
Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
When running
valgrind ./fio --randseed=1 --ioengine=libaio --thread --rw=randrw \
 --random_distribution=zipf:1.4 --filename=/tmp/fiofile --io_limit=50M \
 --verify=crc32c --name=verifyfree --iodepth=32 --bsrange=512-1M --size=100M

valgrind reports:
==29301== Invalid read of size 4
==29301==    at 0x44ADFC: io_completed (io_u.c:1835)
==29301==    by 0x44B215: ios_completed (io_u.c:1924)
==29301==    by 0x44B683: io_u_queued_complete (io_u.c:1983)
==29301==    by 0x46FA6F: wait_for_completions (backend.c:455)
==29301==    by 0x471568: do_io (backend.c:1046)
==29301==    by 0x474405: thread_main (backend.c:1746)
==29301==    by 0x576E6B9: start_thread (pthread_create.c:333)
==29301==    by 0x5C8E82C: clone (clone.S:109)
==29301==  Address 0x62cf988 is 72 bytes inside a block of size 88 free'd
==29301==    at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)

When the __ipo is still in-flight we shouldn't free it when it overlaps
because it will be used at I/O completion time. Fixes
axboe#336 .

Tested-by: Jeff Furlong <jeff.furlong@wdc.com>
Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Running the following fio jobs unexpectedly reports a verification
failure:
rm /tmp/tmp.fio; ./fio --iodepth=1 \
 --verify=pattern --verify_fatal=1 --size=100M --bsrange=512-128k \
 --rw=randwrite --verify_backlog=128 --filename=/tmp/tmp.fio \
 --verify_pattern="%o" --name=spuriousmismatch1

rm /tmp/tmp.fio; ./fio --iodepth=1 \
 --verify=crc32c --verify_fatal=1 --size=100M --bs=4k \
 --rw=randwrite --verify_backlog=20 --filename=/tmp/tmp.fio \
 --percentage_random=50 --randseed=86 --name=spuriousmismatch2

In the case of the first job, using a bsrange where the start and end
are different can cause random write I/O to overlap an already written
region making the original data unverifiable.

For the second job, when percentage_random is between 1 and 99 the same
offset can be generated multiple times but only the last write to that
offset should be verified.

Rather than special casing the growing number of random jobs that might
generate overlaps while still having a randommap, and given preallocation
during layout is the default where possible, just remove the overwrite=0
optimisation thus forcing all random jobs to be checked for overlaps. It
is still possible to force the old behaviour by setting verifysort=0.

Fixes axboe#335 and
axboe#344 .

Tested-by: Jeff Furlong <jeff.furlong@wdc.com>
Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Code inspection shows there are currently guards in the call sites of
log_io_piece() that prevent log_io_piece() being called when
td->o.verify == VERIFY_NONE so skip checking it within log_io_piece().

Tested-by: Jeff Furlong <jeff.furlong@wdc.com>
Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
@axboe axboe merged commit 4a06cb4 into axboe:master Aug 14, 2017
@axboe
Copy link
Owner

axboe commented Aug 14, 2017

I pulled it it, and added an extra commit to move the overlap checking into a helper, and only call the overlap check if we have > 1 IO in flight, instead of > 0. The latter is because the io_u in question is already accounted. By definition, we need >= 2 IOs to be worried about overlap.

@sitsofe
Copy link
Collaborator Author

sitsofe commented Aug 14, 2017

Good catch - thanks Jens!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants