How to optimize FIO testing? #579

Closed
jjpcat opened this issue Apr 11, 2018 · 13 comments

@jjpcat

jjpcat commented Apr 11, 2018

This is my first time posting here, so please excuse me if this is the wrong place.

I have 4 very fast SSDs (Intel P4600, Micron 9200 MAX) in my i7-7700 PC. I am using fio to measure the aggregate 4 kB random read performance, so I start one fio process per SSD, e.g.,

sudo fio --name=dummy --size=50G --runtime=600 --filename=/dev/nvme0n1 --ioengine=libaio --direct=1 --rw=randread -bs=4k** --iodepth=64 --numjobs=8 --group_reporting

The CPU load shoots to 100%, and it's still above 90% when I cut down to testing 3 SSDs. Reducing numjobs helps lower the CPU load, but I notice that it also brings down the IOPS. So I am wondering whether there are any optimizations that would reduce fio's CPU requirements.

Thanks.

@axboe
Owner

axboe commented Apr 12, 2018 via email

@szaydel
Contributor

szaydel commented Apr 12, 2018

@jjpcat: Latency should be examined closely; this sounds very much like I/O wait. A single fio process can drive hundreds of drives with rather low CPU load, so I suspect the problem is elsewhere. For the sake of sanity, are you using a fairly recent version of fio?
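
For illustration, a single fio process can cover several drives with one job file that has one section per device. This is only a minimal sketch (the device paths and parameters are placeholders, mirroring the command in the first post):

; shared defaults for all jobs
[global]
ioengine=libaio
direct=1
rw=randread
bs=4k
iodepth=64
runtime=600
group_reporting

; one job section per drive
[nvme0]
filename=/dev/nvme0n1

[nvme1]
filename=/dev/nvme1n1

Save it as, say, multi-ssd.fio (the file name is arbitrary) and run it with sudo fio multi-ssd.fio; fio starts one job per section and, because of group_reporting, reports the aggregate.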

@jjpcat
Copy link
Author

jjpcat commented Apr 12, 2018

Thanks for the responses.

@szaydel Could you show me how to use a single fio process to drive multiple SSDs?

I am attaching a screenshot from when I was running 3 fio processes at the same time. The 3 SSDs used in this case were 1 Intel P4600 (the one in the lower-left window) and 2 other somewhat slower NVMe SSDs. The average latency is large (3.9 ms to 5.8 ms), but that's expected when iodepth is set to 512. If I set iodepth to 1, the average latency for these 3 SSDs is 93 us to 126 us, which is typical.

So for this test, because two of the SSDs are rather slow, the CPU usage is only 84%.

I am using fio 2.16 running on Ubuntu with kernel 4.13.

My observation with these SSDs is that, to hit max IOPS, I need to set numjobs to 8. IOPS will be about 2% lower if I set numjobs to 4 and significantly lower if numjobs is set to 1.

[screenshot: fio running against 3 SSDs]

@jjpcat
Author

jjpcat commented Apr 12, 2018

@szaydel Is this the right way to drive multiple SSDs with 1 fio? --filename=/dev/nvme0n1:/dev/nvme1n1

I tried this. The IOPS is higher than any individual drive's IOPS, but it's 10-30% lower than the sum of the individual IOPS. Increasing numjobs or iodepth doesn't help. There is an article on the internet saying that the aggregate is limited by the slowest SSD in the group.

With a single fio process driving multiple SSDs, the CPU utilization also shot up, so it doesn't improve CPU usage per aggregate IOPS, which is the problem I am trying to solve.

Thanks.

@szaydel
Contributor

szaydel commented Apr 13, 2018

@jjpcat: I was being fairly generic with my statement, which assumed a filesystem over the drives as opposed to just the individual drives. With regard to IO depth, I am not sure 512 is really sane, but you may have a very specific reason for that number. You might also be hitting throughput limits of the bus. Have you done just a very basic sequential IO test to see what throughput you top out at?
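
For reference, a basic large-block sequential read test along those lines might look something like this (the block size, queue depth, and runtime are only illustrative, and the device path is just an example):

sudo fio --name=seqread --filename=/dev/nvme0n1 --ioengine=libaio --direct=1 --rw=read --bs=1M --iodepth=32 --runtime=60 --time_based

The bandwidth reported by a run like this gives a rough ceiling to compare the random-read numbers against.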

@szaydel
Contributor

szaydel commented Apr 13, 2018

@jjpcat: Install a later version of fio, because 2.16 is quite dated at this point.

@jjpcat
Author

jjpcat commented Apr 16, 2018

@szaydel Updated to FIO 3.5. Still the same.

@sitsofe
Collaborator

sitsofe commented Apr 18, 2018

@jjpcat As this isn't so much an issue in fio as a "How do I?" question, it is better aimed at the fio mailing list. I'll note there have been occasional "What are the go-faster options for fio?" questions on the mailing list in the past (e.g. https://www.spinics.net/lists/fio/msg05451.html ) and there are examples of the jobs people used to reach high IOPS in various places (e.g. https://marc.info/?l=linux-kernel&m=140313968523237&w=2 ).
Some of those options increase IOPS at the expense of CPU, but some will reduce overhead while increasing latency (e.g. the batching options in http://fio.readthedocs.io/en/latest/fio_doc.html#i-o-depth ).
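
As a rough sketch of what those batching options look like in practice (the batch sizes below are purely illustrative, not recommendations):

sudo fio --name=batched --filename=/dev/nvme0n1 --ioengine=libaio --direct=1 --rw=randread --bs=4k --iodepth=64 --iodepth_batch_submit=16 --iodepth_batch_complete_min=16 --iodepth_batch_complete_max=32 --runtime=60 --time_based

Submitting and reaping I/O in batches cuts the number of system calls per I/O, trading a little latency for lower CPU overhead.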

Is this the right way to drive multiple SSDs with 1 fio? --filename=/dev/nvme0n1:/dev/nvme1n1

Sort of. That's more for doing round robin between multiple disks, but this has already veered into a discussion topic rather than an issue.

Here are some hints:

  1. Max IOPS is a tricky business because it rarely models real world workloads. In real life you don't actually want to send the tiniest block sizes all the time because those have the biggest overhead...
  2. Run only one fio with numjobs=1 against one SSD for now (a baseline command is sketched after this list). Doing anything else is just going to confuse matters and create potential confounding factors. Once you've maxed that out, THEN we can move on.
  3. Since your disks are NVMe, do you know what maximum queue depth they support? You may be able to get a hint by looking in /sys/block/<dev>/device/queue_depth. You may also find it in spec sheets, but be aware your controller may limit things too.
  4. When setting options on the command line you need to use -- not just - (e.g. see what you did with -bs=4k**).
  5. Why do you have two ** after 4k?
  6. In the screenshot you attached a huge amount of time was spent in the kernel (50.2 sys). You might want to investigate why.
  7. It's easier for us if you copy/paste your terminal's text into a text box rather than sending a screenshot ;-) .
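
To make hints 2 and 3 concrete, a single-job baseline run and a quick queue-depth check might look like this (the device name, queue depth, and runtime are just examples, and the sysfs attribute may not be present on every device):

sudo fio --name=baseline --filename=/dev/nvme0n1 --ioengine=libaio --direct=1 --rw=randread --bs=4k --numjobs=1 --iodepth=32 --runtime=60 --time_based
cat /sys/block/nvme0n1/device/queue_depth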

As I said I'd strongly recommend taking this to the mailing list...

@sitsofe
Collaborator

sitsofe commented Apr 24, 2018

@jjpcat Any follow up on this?

@jjpcat
Author

jjpcat commented Apr 24, 2018

@sitsofe Thanks for providing that info. I think you have a good point regarding the time spent in the kernel, but that's somewhat beyond my control since I am using the standard Linux NVMe driver. I may try a user-space driver and test again.

Sorry, but I cannot do numjobs=1. None of our competitors do this, and if I use only 1 job, our IOPS numbers will look much worse than theirs.

jjpcat closed this as completed on Apr 24, 2018
@szaydel
Contributor

szaydel commented Apr 25, 2018

@jjpcat, are you trying to represent the real world, or are you really just chasing numbers? If numbers, I think queue depth is quite important. Did you do any straight sequential IO where you try to push as much through as possible? At least that should tell you how far you can push the hardware.

Kernel time is likely due to small IO and the resulting large number of syscalls to get the IO done. And, as far as we can tell, the system seems to be spending a lot of time getting this IO done, which means we are waiting in the kernel.

Just a few thoughts about how I would approach this. First, do straight sequential IO, just reads with a large block size. Next, run the same test with both reads and writes, keeping watch on CPU and kernel time. Start out with effectively QD=1 and increase from there, trying to figure out at which point QD stops mattering. CPU utilization should keep going up with QD. Once you hit a bottleneck, toss another CPU into the mix. If there is no difference, your problem is something else in the system. I am quite certain fio won't be the root cause; something else is going to be your bottleneck.
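
As a sketch of that queue-depth sweep (the block size, depths, runtime, and device path are all just examples):

for qd in 1 2 4 8 16 32 64; do sudo fio --name=seqread-qd$qd --filename=/dev/nvme0n1 --ioengine=libaio --direct=1 --rw=read --bs=128k --iodepth=$qd --runtime=30 --time_based; done

The depth at which bandwidth stops increasing is roughly where the drive (or the bus) saturates for that block size.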

@sitsofe
Collaborator

sitsofe commented Apr 25, 2018

@jjpcat Just for the record, I wasn't saying only do numjobs=1 and never go any further, but rather tune the speed at numjobs=1 and only move on to numjobs=2 etc. once you (and everyone else) are sure that is maxed out. When you're able to submit I/O asynchronously, one job can keep most single disks totally busy, and the fewer threads/processes you have, the less overhead you waste on things like context switching.

Don't forget to look over https://github.com/axboe/fio/blob/master/MORAL-LICENSE if you're going to publish statistics using fio.

@axboe
Owner

axboe commented Apr 25, 2018

I agree with both of you. As a rule of thumb, you need enough threads to get the max performance, and no more. For NVMe on modern boxes, a round figure of ~450K IOPS per core is feasible, so you'll usually find your best performance in the 2-4 thread case. Keep QD as low as possible while still reaching the peak, no more.
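
As a starting point along those lines (the numbers and device path are purely illustrative, to be tuned from there):

sudo fio --name=randread --filename=/dev/nvme0n1 --ioengine=libaio --direct=1 --rw=randread --bs=4k --numjobs=2 --iodepth=32 --runtime=60 --time_based --group_reporting

Increase numjobs or iodepth only while IOPS keeps going up; once it flattens, extra threads and queue depth just burn CPU.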
