All FIO threads spawn on the same cpu core #50
I get 14,000 IOPS when I run that fio command. The fio command does still pin one of the CPU threads at 100%, which I guess is what's limiting me below the 40,000 from my more powerful machine. Results are repeatable for both the fio and the KDiskMark test (and yes, I am making sure to use RND4K Q32 T16 in KDiskMark). CPU speeds are the same between both tests.
@JonMagon any ideas?
output.txt (note: I only did a 12M test here, but I also ran a full 128M test and got the same results)
This disk you're testing is likely external via USB? There are quite a few differences between an SBC and a desktop that would affect the performance between the two systems. But that only covers hardware differences; it appears you also have software differences (OS and benchmarking tool) which can further complicate a fair comparison between two systems.

Usually single-threaded workloads aren't pinned to a specific core. It may appear that one core is under full load, but the workload is often being load-balanced across cores (switching cores), unless you explicitly pinned it.

You should also try to compare the same OS and kernel against the desktop system. Your SBC is running a rather old kernel (4.9, released in Dec 2016); notable changes have been made since then, especially to the disk I/O schedulers from the 5.0 release in early 2019. Or perhaps you can update what your SBC is running. As you're testing against Windows on the desktop for comparison, try testing on the desktop with a distro that has a more recent kernel, and perhaps switch the fio version too. Just in the past year, for example, an Intel Optane Gen2 disk was found to achieve 2.58M IOPS via fio, but with all the new optimizations since, the upcoming 5.15 kernel is presently achieving 3.5M IOPS on the same device.

Additionally, like the kernel, you're using a rather old version of fio. It's probably not relevant, but one user noticed a reporting output difference with a newer release.

Also keep point 3 from this response in mind. From your info so far it's probably unrelated, but when using USB devices (or enclosures), the disk-to-USB bridge chipset sometimes constrains a device's capabilities; for example, some SATA bridges support a much lower queue depth, among other SATA features. USB itself can likewise restrict things in a variety of system-specific ways (USB controller chipset, USB version, power supplied, USB driver, kernel, filesystem driver, etc.).
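On the core-pinning point above, fio itself can control CPU placement, which makes it easy to compare "one busy core" against an explicit spread. This is only a sketch; it assumes Linux with fio installed, and the device path and core IDs are placeholders, not values from the thread:

```shell
# Sketch (assumptions: Linux, fio installed, /dev/sdX is a placeholder).
# fio's own options control which cores the jobs may run on:
#
#   fio --name=spread --filename=/dev/sdX --direct=1 --rw=randread \
#       --bs=4k --iodepth=32 --numjobs=4 \
#       --cpus_allowed=0-3 --cpus_allowed_policy=split
#
# cpus_allowed_policy=split gives each job its own core; without any
# pinning the scheduler is free to migrate jobs between cores, which
# can look like a single core at 100% in a coarse monitor.

# How many cores does the scheduler have to work with on this system?
nproc
```

Comparing the pinned and split runs should show quickly whether a single core really is the bottleneck.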
Consider looking over the end of this response, which investigates similar performance/CPU issues, again noting the kernel as a culprit:
Related to that is this issue comment, noting that the queue depth multiplied by the number of jobs raises the effective queue depth (to 512 in this case). This can contribute to the issue, but is unlikely to be the cause of the difference between the two systems; the block I/O scheduler might be a more relevant difference, or the CPU governor. Note that the kernel can also be built with settings that favor throughput vs responsiveness (lower latency for interactive feedback at the expense of throughput). These variables all contribute towards performance. As for differences between KDiskMark and direct fio, this is further supported by this StackOverflow answer:
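The queue-depth multiplication mentioned above is simple arithmetic, and worth making explicit: with the RND4K Q32 T16 settings from this thread, 32 outstanding I/Os per job across 16 jobs gives the device an effective depth of 512.

```shell
iodepth=32   # per-job queue depth (the Q32 in RND4K Q32 T16)
numjobs=16   # number of parallel jobs/threads (the T16)
# The effective queue depth the device sees is the product:
echo $((iodepth * numjobs))   # prints 512
```

If a bridge chipset or the device caps the usable queue depth well below that, the extra jobs mostly add CPU overhead rather than throughput.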
Another answer on that SO link also mentions fsync time measurements in newer fio versions.

One last likely influence: since this is an external disk shared between Linux and Windows, are you testing against the same filesystem, exFAT or NTFS for example? On your old kernel you're likely using a FUSE userspace driver. These have significant overhead vs in-kernel supported filesystems. Modern kernels have in-kernel support for exFAT these days, with NTFS still a WIP. It's quite likely that this could be causing a big difference for you, as it will slow down the total throughput you can achieve, and IIRC whenever I did such transfers it would use up considerable CPU. This would be the first concern to test and verify, followed by trying newer versions of fio and the kernel.
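A quick way to check whether the filesystem in question is served in-kernel or via FUSE (the overhead case described above). This is a sketch assuming Linux; the mount point in the comment is a placeholder:

```shell
# In-kernel filesystem drivers are registered in /proc/filesystems.
# FUSE-backed mounts instead report a fstype like "fuseblk" or
# "fuse.<name>" when inspected with findmnt/mount.
grep -w ext4 /proc/filesystems || echo "ext4 not registered in-kernel"

# For a specific mount point (placeholder path), inspect its fstype:
# findmnt -no FSTYPE /mnt/external

# And confirm which kernel is actually running:
uname -r
```

If the fstype comes back as `fuseblk` rather than the native name, the FUSE overhead is in play.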
@polarathene thanks for the detailed response. The filesystem is ext4, and I'll likely move to testing on Linux on both the Tegra X1 and my more powerful desktop for a fairer comparison, instead of CrystalDiskMark (which was only used since this tool is somewhat a clone of its functionality). A newer kernel is not an option for this SBC.
Yeah, I only realized at the very end of my response that I had slipped up reading that command; it was matching what KDiskMark is doing internally 😬
They're definitely using the same fio.
You're welcome :) Hopefully one of those suggestions makes a difference!
I wasn't expecting that 😅 How well does that run via Windows? Or are you using it via WSL? Your ext4 driver on Windows is probably newer than the ext4 implementation in the 4.9 kernel, and maybe with different mount settings too.
That's unfortunate :( I did have a look around, and it seemed possible but quite a hassle. Back between 2016-2018 there was more activity around getting newer kernels or customizations built, but there seems to be little after that. The most promising might be this Debian guide, which was also last updated mid-2018, but it mentions working with the upstream/mainline kernel that Debian uses (e.g. 4.14 or 4.16 are mentioned at the bottom). Not sure how many of the issues it points out remain, as that page doesn't look maintained anymore. For the purpose of booting a newer kernel just to run some I/O tests with fio, it might be worth the hassle.

Really surprised that nvidia continues to make releases of L4T but keeps it stuck on the 4.9 kernel :\
I didn't specify this before, but we actually do have a custom kernel (based on the L4T kernel with lots of fixes and patches), and this is running on a Nintendo Switch (Tegra X1 based system). We have mainline running as well, but are stuck with the Nouveau drivers (which are less than ideal given their performance and lack of Vulkan), hence sticking with L4T is the best option right now. I'll see if I can figure out whether KDiskMark is using the system fio (I assume it is, but I just need to check).
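One way to check the system-fio question is from the shell; a sketch under the assumption that a system fio, if present, is simply named `fio` on the PATH:

```shell
# Is a system fio on PATH, and if so which version?
command -v fio >/dev/null && fio --version || echo "no system fio on PATH"

# While a KDiskMark benchmark is actually running, the binary being
# executed can be resolved from the live process (needs a running fio):
# ls -l /proc/"$(pgrep -x fio | head -n1)"/exe
```

If the resolved path points somewhere inside KDiskMark's install directory rather than e.g. /usr/bin/fio, it's bundling its own copy.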
Yeah, I think I had it mounted with WSL2... anyway, I'll be testing on Linux on the same PC in the future. I'll see about trying a newer fio version soon enough, but it probably won't be for a few days.
I think I am observing the same issue. Compared to command-line fio, results barely change in KDiskMark when varying the number of threads in the benchmark. Using openSUSE Tumbleweed (rolling release) with the 5.18.5 kernel, fio 3.29, KDiskMark 2.3.0. I verified this also using the command suggested in the second post: #50 (comment) With it and a WD SN850 NVMe SSD I get 1050 kIOPS for read performance, but with KDiskMark and similar settings only about half this value, with the number of threads set in the program having only very limited influence on the results.
Description:
I am testing this on a relatively low-powered SBC (Tegra X1). All the fio threads spawn on the same CPU core, which pins one core to 100% on the heavy I/O tests (like the 4K reads/writes). This results in much worse performance than the storage drives are capable of (getting 6,600 IOPS on the Tegra X1 and 40,000 IOPS using the same drive on a MUCH more powerful Windows machine). Note: CrystalDiskMark appears to use all CPU cores (even if it doesn't show in Task Manager).
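One way to confirm the single-core behavior described above is to watch per-core utilization while the benchmark runs. A minimal Linux-only sketch using /proc/stat, with interactive alternatives noted in comments:

```shell
# Per-core counters live in /proc/stat as lines "cpu0 ...", "cpu1 ...";
# sampling them twice during a fio run shows which cores are busy.
# Here we just count the cores visible to the kernel:
grep -c '^cpu[0-9]' /proc/stat

# Interactive alternatives while fio is running (not executed here):
# mpstat -P ALL 1                                 # per-core usage, 1s interval
# pidstat -t -p "$(pgrep -x fio | head -n1)" 1    # per-thread CPU usage
```

If `pidstat -t` shows every fio thread on the same processor number, the threads really are landing on one core rather than merely appearing to.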