How to optimize FIO testing? #579
Comments
Post some actual output from the fio run; 100% CPU doesn't necessarily say very much when you are running 8 processes. To hit peak, one process would generally be enough, so I'd probably start there if I were you. Fio is very optimized as it is; the bottleneck is not going to be fio.
On Apr 11, 2018, at 5:26 PM, jjpcat wrote:
First time posting here. Please excuse me if I'm posting in the wrong place.
I have 4 very fast SSDs (Intel P4600, Micron 9200 MAX) in my i7-7700 PC. I am using fio to try to measure the aggregate 4 kB random read performance, so I start one fio process per SSD, e.g.,
sudo fio --name=dummy --size=50G --runtime=600 --filename=/dev/nvme0n1 --ioengine=libaio --direct=1 --rw=randread --bs=4k --iodepth=64 --numjobs=8 --group_reporting
The CPU load shoots to 100%, and it's still 90%+ when I cut down to testing 3 SSDs. Reducing numjobs helps reduce the CPU load, but I notice that it also brings down the IOPS. So I am wondering whether there are any optimizations I can make to reduce fio's CPU requirement.
Thanks.
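To illustrate the single-process baseline suggested above, a minimal per-drive command could look like the sketch below; the device name, runtime, and queue depth are illustrative placeholders, not values taken from this thread.
# Hypothetical baseline: one process, one job per drive; compare its IOPS against the multi-job runs.
sudo fio --name=baseline --filename=/dev/nvme0n1 --ioengine=libaio --direct=1 --rw=randread --bs=4k --iodepth=64 --numjobs=1 --runtime=60 --time_based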
@jjpcat: Latency should be examined closely; this sounds very much like I/O wait. A single fio process can drive hundreds of drives with rather low CPU load, so I suspect the problem is elsewhere. For the sake of sanity, are you using a fairly recent version of fio?
Thanks for the responses. @szaydel Could you show me how to use a single fio process to drive multiple SSDs? I am attaching a screenshot from when I was running 3 fio processes at the same time. The 3 SSDs used in this case were 1 Intel P4600 (the one in the lower-left window) and 2 other somewhat slower NVMe SSDs. The average latency is large (3.9 ms - 5.8 ms), but that's expected when iodepth is set to 512. If I set iodepth to 1, the average latency for these 3 SSDs is 93 us - 126 us, which is typical. So for this test, because of the 2 rather slow SSDs, the CPU usage is only 84%. I am using fio 2.16 running on Ubuntu with kernel 4.13. My observation with these SSDs is that, to hit max IOPS, I need to set numjobs to 8; IOPS is about 2% lower if I set numjobs to 4 and significantly lower if numjobs is set to 1.
@szaydel Is this the right way to drive multiple SSDs with 1 fio process: --filename=/dev/nvme0n1:/dev/nvme1n1 ? I tried this. The IOPS is higher than any individual drive's IOPS, but it's 10-30% lower than the sum of the individual IOPS, and increasing numjobs or iodepth doesn't help. There is an article on the internet saying that the group is limited by the slowest SSD in it. With a single fio process driving multiple SSDs, the CPU utilization also shot up, so it doesn't improve CPU usage per aggregate IOPS, which is the problem I am trying to solve. Thanks.
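As a point of reference, a single fio process can also address each drive with its own job section in a job file rather than round-robining a colon-separated filename list. The sketch below is illustrative only; the device names and parameters are assumptions, not the configuration used in this thread.
; multi-drive.fio - hypothetical job file, one job per device; run with: sudo fio multi-drive.fio
[global]
ioengine=libaio
direct=1
rw=randread
bs=4k
iodepth=64
runtime=60
time_based

[nvme0]
filename=/dev/nvme0n1

[nvme1]
filename=/dev/nvme1n1
Without group_reporting, each job reports separately, so per-drive IOPS stays visible even though everything runs under one process.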
@jjpcat: I was fairly generic with my statement, which assumed a filesystem over the drives as opposed to just the individual drives. With regard to IO depth, I am not sure 512 is really sane, but you may have a very specific reason for that number. You might also be hitting throughput limits of the bus. Have you done just very basic sequential IO tests to see what throughput you top out at?
@jjpcat: Install a later version of fio; 2.16 is quite dated at this point.
@szaydel Updated to FIO 3.5. Still the same. |
@jjpcat As this isn't so much an issue in fio and more of a "How do I?" question, it is better aimed at the fio mailing list. I'll note there have been occasional "What are the go-faster options for fio?" questions on the mailing list in the past (e.g. https://www.spinics.net/lists/fio/msg05451.html ) and there are examples of the jobs people used to reach high IOPS in various places (e.g. https://marc.info/?l=linux-kernel&m=140313968523237&w=2 ).
Sort of. That's more for doing round robin between multiple disks, but this has already veered into a discussion topic rather than an issue. Here are some hints:
As I said, I'd strongly recommend taking this to the mailing list...
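For context, the "go faster" discussions linked above mostly revolve around trimming fio's own per-I/O bookkeeping. A hedged example of the kind of switches involved is shown below; the values are illustrative and not a recommendation from this thread.
# Illustrative: gtod_reduce trims timestamp/latency bookkeeping, norandommap skips tracking which blocks have been read.
sudo fio --name=lowcpu --filename=/dev/nvme0n1 --ioengine=libaio --direct=1 --rw=randread --bs=4k --iodepth=64 --numjobs=1 --runtime=60 --time_based --gtod_reduce=1 --norandommap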
@jjpcat Any follow up on this? |
@sitsofe Thanks for providing that info. I think you have a good point regarding the time spent in the kernel. That's kind of beyond my control, since I am using the standard Linux NVMe driver; I may try a user-space driver and test again. Sorry, but I cannot do numjobs=1. None of our competitors are doing this, and if I use only 1 job, our IOPS numbers will look much worse than our competitors'.
@jjpcat, are you trying to represent the real world, or are you really just trying to chase numbers? If numbers, I think queue depth is quite important. Did you do any straight sequential IO where you try to get as much pushed through as possible? At least that should tell you how far you can push the hardware. Kernel time is likely due to small IO and the resulting large number of syscalls to get the IO done, and as far as we can tell the system is spending a lot of time getting this IO done, which means we are waiting in the kernel.
Just a few thoughts about how I would approach this. First, do straight sequential IO, just reads with a large block size. Next, same test, do both reads and writes, and keep watch on CPU and kernel time. Start out with effectively QD=1 and increase from there, trying to figure out at which point QD stops mattering; CPU utilization should keep going up with QD. Once you hit a bottleneck, toss another CPU into the mix. If that makes no difference, your problem is something else in the system. I am quite certain fio itself is not your problem.
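A rough sketch of that progression, assuming bash and placeholder device/runtime values, might be:
# Illustrative sweep: large sequential reads, stepping queue depth up until throughput (and CPU) stops scaling.
for qd in 1 2 4 8 16 32; do
    sudo fio --name=seq-qd$qd --filename=/dev/nvme0n1 --ioengine=libaio --direct=1 --rw=read --bs=1M --iodepth=$qd --numjobs=1 --runtime=30 --time_based
done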
@jjpcat Just for the record, I wasn't saying only do numjobs=1. Don't forget to look over https://github.com/axboe/fio/blob/master/MORAL-LICENSE if you're going to publish statistics using fio.
I agree with both of you. As a rule of thumb, you need enough threads to get the max performance, and no more. For NVMe on modern boxes, a round figure of ~450K IOPS per core is feasible, so you'll usually find your best performance in the 2-4 thread case. Make QD as low as possible to reach the peak, no more.
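Translated into a command, that rule of thumb might look something like the following per drive; the job count and depth here are illustrative starting points, not measured optima.
# Illustrative: a few jobs and a modest per-job depth; raise iodepth only until IOPS stops improving.
sudo fio --name=tuned --filename=/dev/nvme0n1 --ioengine=libaio --direct=1 --rw=randread --bs=4k --iodepth=16 --numjobs=4 --runtime=60 --time_based --group_reporting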