
Run FIO on iSCSI but can't reach the network speed limit #508

Closed
WisQTuser opened this issue Jan 5, 2018 · 11 comments

Comments

@WisQTuser

We set up tgtd as the iSCSI target service on Linux over a 100Gbps link broken out into 4x 25Gbps through a 100Gbps switch. Measured with fio, client storage performance is limited to roughly 1.5Gbps.

@sitsofe
Collaborator

sitsofe commented Jan 5, 2018

@WisQTuser Hi there!

As phrased this is unactionable by the fio project because your problem is so vague and open ended - there are simply too many variables and not enough information to help. You've said it's slow but why should I believe that's within fio? :-)

For example is your kernel misconfigured? Did you give fio a bad job with settings that limit it? Are your iSCSI initiator settings good enough? Did tgtd or fio run out of CPU? Are you using an old version of fio? Are you using jumbo frames? Do other tools go faster and if so how are they submitting I/O? And so forth. Even if we were to only consider that prior list of questions the fio project would only have time and capacity to help you with one of those questions ("Did you give fio a bad job?") and could only do so if you included the information requested within https://github.com/axboe/fio/blob/fio-3.3/REPORTING-BUGS . Asking someone to debug and tune your entire system is too big a request for a github issue ;-)

I don't know what others will say, but I would suggest that you close this issue until you've narrowed your problem down to something provably within fio itself (and ideally reproducible without tgtd). Until then a better starting point might be the tgtd mailing list over on http://vger.kernel.org/vger-lists.html#stgt . Also note that while tgtd is extremely flexible, it isn't the fastest iSCSI target I've seen...

@WisQTuser
Author

Hi,
Our answers are below, thanks!

  1. Is your kernel misconfigured? No, SUSE 12 SP3 with kernel 4.4.73.
  2. Did you give fio a bad job with settings that limit it? We use the fio command “fio -filename=/dev/sdb -direct=1 -iodepth 64 -thread -rw=read -ioengine=psync -bs=16M -size=100G -numjobs=16 -runtime=1000 -group_reporting -name=mytest”.
  3. Are your iSCSI initiator settings good enough? Yes.
  4. Did tgtd or fio run out of CPU? No, the CPU usage was very low.
  5. Are you using an old version of fio? We tried fio 2.99 and fio 3.10; both show the same problem.
  6. Are you using jumbo frames? No, we use standard frames, but we also tried switching to jumbo frames and still see the same problem.
  7. Do other tools go faster and if so how are they submitting I/O? Raw bandwidth in the 100G to 4x 25Gbps network environment was normal; only the iSCSI target tested with fio shows the bandwidth limitation.

@axboe
Owner

axboe commented Jan 5, 2018

This is a user error, I know plenty of folks that are driving way more, iscsi included. psync and iodepth > 1 doesn't make sense, as the max depth for a sync engine is 1. I'd experiment with libaio if you want higher queue depths per thread, and potentially reducing the massive buffer size.
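For illustration, here is a minimal sketch of a libaio job along the lines Jens suggests (the device path /dev/sdb comes from the job posted earlier; the depth, block size, and job count are illustrative, not prescriptive):

fio --filename=/dev/sdb --direct=1 --rw=read --ioengine=libaio --iodepth=64 --bs=128k --numjobs=4 --runtime=60 --time_based --group_reporting --name=libaio-read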

@szaydel
Contributor

szaydel commented Jan 5, 2018

@WisQTuser - There are all sorts of other issues that could be going on with iSCSI specifically, having nothing to do with fio. What I would ask first is: are you actually seeing completely different results from another tool or tools, or from testing via some other means?

With iSCSI I would start with large sequential I/Os at a depth of 1, with a single job. Figure out whether you can push a single I/O stream, and instead of a block size of 16M, which I am quite sure is not ideal with pretty much any target out there, consider what iSCSI is meant to mimic, namely disks: start with an 8K block, then go to perhaps 64K, then 128K, and maybe up to 1M, to see whether you observe a progression (a sketch of such a sweep follows below).
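A minimal sketch of that sweep, assuming the iSCSI LUN appears on the client as /dev/sdb (device name and runtime are placeholders; run as root):

for bs in 8k 64k 128k 1M; do fio --filename=/dev/sdb --direct=1 --rw=read --bs=$bs --ioengine=psync --numjobs=1 --runtime=30 --time_based --name=seqread-$bs; done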

Of course, whatever is on the other end of the iSCSI target matters. Being on the vendor side of this, I see these sorts of things all the time. Networks are rarely the problem, and fio is even less likely to be the problem. If the local disks on the target are not capable, then no matter what the network can do, you can only go as fast as the disks, or whatever media it is, will allow. At 25Gbps you are already talking over 3GB/s; are you sure your storage is actually capable of that figure, sequentially?

@sitsofe
Collaborator

sitsofe commented Jan 6, 2018

@WisQTuser I can only second the comments people have posted already:

  1. Can you prove the iSCSI target's disk can keep up - what is the output of fio operating directly on the underlying disk of the iSCSI target with nothing else in the way?
  2. Your block size is gigantic (16 megabytes!) which will almost certainly mean you are forcing the iSCSI client kernel to break it up into smaller pieces before passing it on, adding overhead. Interestingly, your large block size would help a bit with a synchronous ioengine (see below) as the kernel will keep multiple broken-up I/Os in flight, BUT even then there is a crossover point where bigger means you actually go slower. Disks sometimes advertise an optimal block size (e.g. /sys/block/<disk>/queue/optimal_io_size as mentioned over in https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-block ) and there's always a maximum block size (/sys/block/<disk>/queue/max_hw_sectors_kb) - see the query sketch after this list.
  3. You chose a synchronous ioengine (psync) so an iodepth greater than 1 doesn't matter - did you see the warning about synchronous ioengines in the documentation for the iodepth option? If you're going for "top speed" with normal disks you almost certainly want to use an asynchronous ioengine like libaio, as suggested by Jens.
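As a quick way to check point 2, those sysfs attributes can be read directly on the client (assuming the iSCSI disk is sdb; substitute your device name):

cat /sys/block/sdb/queue/optimal_io_size    # bytes; 0 means the device reports no optimal size
cat /sys/block/sdb/queue/max_hw_sectors_kb  # KiB; the largest single I/O the device will accept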

At this stage I think it's unlikely to be an issue in fio but here's an experiment: run the following as root on the iSCSI client machine:

modprobe null_blk irqmode=0
fio --filename=/dev/nullb0 --direct=1 --rw=read --bs=127k --stonewall --runtime=10s --time_based --name=test1psync --ioengine=pvsync --name=test2libaio --ioengine=libaio --iodepth=128

and post us back the full fio output.

@WisQTuser
Author

Hi @sitsofe ,
My feedback is below:

  1. Can you prove the iSCSI target's disk can keep up - what is the output of fio operating directly on the underlying disk of the iSCSI target with nothing else in the way?

Ans: After setting up iSCSI with an NVMe SSD on a Windows server (iSCSI target), fio on the client reaches the expected result (matching the NVMe SSD performance of up to 3,100 MB/s through a Mellanox ConnectX-4 100Gbps network adapter), but with the Linux target it can't (always limited to about 1,500 MB/s).

  2. Your block size is gigantic (16 megabytes!) which will almost certainly mean you are forcing the iSCSI client kernel to break it up into smaller pieces before passing it on, adding overhead. Interestingly, your large block size would help a bit with a synchronous ioengine (see below) as the kernel will keep multiple broken-up I/Os in flight BUT even then there is a crossover point where bigger means you actually go slower. Disks sometimes advertise an optimal block size (e.g. /sys/block/<disk>/queue/optimal_io_size as mentioned over in https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-block ) and there's always a maximum block size (/sys/block/<disk>/queue/max_hw_sectors_kb).

Ans: We tried the command you suggested, “fio --filename=/dev/nullb0 --direct=1 --rw=read --bs=127k --stonewall --runtime=10s --time_based --name=test1psync --ioengine=pvsync --name=test2libaio --ioengine=libaio --iodepth=128”, and still have this problem.

@sitsofe
Collaborator

sitsofe commented Jan 9, 2018

@WisQTuser

After setting up iSCSI with an NVMe SSD on a Windows server (iSCSI target), fio on the client reaches the expected result (matching the NVMe SSD performance of up to 3,100 MB/s through a Mellanox ConnectX-4 100Gbps network adapter), but with the Linux target it can't (always limited to about 1,500 MB/s).

OK so you're saying Windows iSCSI target <-> Linux iSCSI client reaches the speeds you expect but Linux tgtd iSCSI target <-> Linux iSCSI client doesn't?

We tried the command you suggested, “fio --filename=/dev/nullb0 --direct=1 --rw=read --bs=127k --stonewall --runtime=10s --time_based --name=test1psync --ioengine=pvsync --name=test2libaio --ioengine=libaio --iodepth=128”, and still have this problem.

That command was designed to be run explicitly against Linux's null block device, NOT your iSCSI target disk:

run the following as root on the iSCSI client machine:

modprobe null_blk irqmode=0
fio --filename=/dev/nullb0 --direct=1 --rw=read --bs=127k --stonewall --runtime=10s --time_based --name=test1psync --ioengine=pvsync --name=test2libaio --ioengine=libaio --iodepth=128

Can you make sure that you are running that fio explicitly against Linux's null block device as above, and please post the full output of running those commands into this issue (i.e. don't summarise it - really copy and paste the output of running those commands into this github issue).

@sitsofe
Collaborator

sitsofe commented Jan 9, 2018

@WisQTuser Just in case it's not clear what I mean by full output: take a look at #509 (comment) . There the reporter not only posts the command they ran (the line that starts with # ) but also the full output of running that command (the lines from job1: down to fio: in this particular case).

@sitsofe
Collaborator

sitsofe commented Jan 10, 2018

@WisQTuser your previous comment seems to suggest that whatever you're seeing isn't down to fio itself. If you're unable to post the full output of the null block device test, could you please close this issue, as it can't be meaningfully taken further here.

@WisQTuser
Author

Hi @sitsofe
Our test result is below, please kindly refer to it. (No matter the number of clients, the server side always stays at 2.7 GBps.)
Regarding the storage of the iSCSI server, we are using an NVMe SSD with speeds up to 3000MBps.

FIO command in Client:
fio --filename=/dev/sdc --direct=1 --rw=read --bs=128k --stonewall --runtime=1000s --time_based --name=test2libaio --ioengine=libaio --iodepth=128

[screenshot "fio33" attached showing the fio output]

@sitsofe
Collaborator

sitsofe commented Jan 11, 2018

@WisQTuser Well (sigh)... the results are for a different device than the one I was asking for, and I was very explicit the last time [emphasis added]:

Can you make sure that you are running that fio explicitly against Linux's null block device and please post the full output of running those commands into this issue

You also left out the answer to this question:

you're saying Windows iSCSI target <-> Linux iSCSI client reaches the speeds you expect but Linux tgtd iSCSI target <-> Linux iSCSI client doesn't?

When loaded, the null block devices start at /dev/nullb0. The reason I wanted you to check them is because the null block device is extremely fast and as such it's a way to measure fio's top speeds submitting I/O to a block device on a given system. For example here's what I get using the null block device on my system:

# modprobe null_blk irqmode=0
# ./fio --filename=/dev/nullb0 --direct=1 --rw=read --bs=127k --stonewall --runtime=10s --time_based --name=test1psync --ioengine=pvsync --name=test2libaio --ioengine=libaio --iodepth=128
test1psync: (g=0): rw=read, bs=(R) 127KiB-127KiB, (W) 127KiB-127KiB, (T) 127KiB-127KiB, ioengine=pvsync, iodepth=1
test2libaio: (g=1): rw=read, bs=(R) 127KiB-127KiB, (W) 127KiB-127KiB, (T) 127KiB-127KiB, ioengine=libaio, iodepth=128
fio-3.3-16-g1023-dirty
Starting 2 processes
Jobs: 1 (f=1): [_(1),R(1)][100.0%][r=37.7GiB/s,w=0KiB/s][r=311k,w=0 IOPS][eta 00m:00s]         
test1psync: (groupid=0, jobs=1): err= 0: pid=7083: Thu Jan 11 06:48:57 2018
   read: IOPS=225k, BW=27.3GiB/s (29.3GB/s)(273GiB/10000msec)
    clat (usec): min=2, max=8481, avg= 3.72, stdev= 6.64
     lat (usec): min=2, max=16046, avg= 3.87, stdev=12.59
    clat percentiles (nsec):
     |  1.00th=[ 2608],  5.00th=[ 2704], 10.00th=[ 2800], 20.00th=[ 2800],
     | 30.00th=[ 2800], 40.00th=[ 2896], 50.00th=[ 2992], 60.00th=[ 3408],
     | 70.00th=[ 3696], 80.00th=[ 4512], 90.00th=[ 5600], 95.00th=[ 6432],
     | 99.00th=[ 7520], 99.50th=[13248], 99.90th=[15040], 99.95th=[16512],
     | 99.99th=[26752]
   bw (  MiB/s): min=11977, max=27521, per=77.92%, avg=21754.46, stdev=5801.18, samples=19
   iops        : min=96573, max=221902, avg=175405.63, stdev=46775.01, samples=19
  lat (usec)   : 4=73.10%, 10=26.32%, 20=0.55%, 50=0.04%, 100=0.01%
  lat (usec)   : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 10=0.01%
  cpu          : usr=12.28%, sys=87.68%, ctx=24, majf=0, minf=42
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=2251110,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
test2libaio: (groupid=1, jobs=1): err= 0: pid=7084: Thu Jan 11 06:48:57 2018
   read: IOPS=284k, BW=34.4GiB/s (36.9GB/s)(344GiB/10001msec)
    slat (nsec): min=1800, max=943700, avg=2485.87, stdev=2334.75
    clat (nsec): min=1700, max=3513.9k, avg=447450.83, stdev=117602.04
     lat (usec): min=3, max=3603, avg=450.10, stdev=118.27
    clat percentiles (usec):
     |  1.00th=[  359],  5.00th=[  359], 10.00th=[  359], 20.00th=[  379],
     | 30.00th=[  383], 40.00th=[  392], 50.00th=[  408], 60.00th=[  433],
     | 70.00th=[  469], 80.00th=[  478], 90.00th=[  562], 95.00th=[  685],
     | 99.00th=[  898], 99.50th=[ 1020], 99.90th=[ 1221], 99.95th=[ 1369],
     | 99.99th=[ 2180]
   bw (  MiB/s): min=16898, max=40374, per=99.48%, avg=35035.84, stdev=6797.59, samples=19
   iops        : min=136250, max=325544, avg=282493.89, stdev=54809.05, samples=19
  lat (usec)   : 2=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
  lat (usec)   : 250=0.01%, 500=84.57%, 750=11.36%, 1000=3.31%
  lat (msec)   : 2=0.75%, 4=0.01%
  cpu          : usr=18.84%, sys=81.12%, ctx=5, majf=0, minf=497
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwt: total=2839863,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: bw=27.3GiB/s (29.3GB/s), 27.3GiB/s-27.3GiB/s (29.3GB/s-29.3GB/s), io=273GiB (293GB), run=10000-10000msec

Run status group 1 (all jobs):
   READ: bw=34.4GiB/s (36.9GB/s), 34.4GiB/s-34.4GiB/s (36.9GB/s-36.9GB/s), io=344GiB (369GB), run=10001-10001msec

Disk stats (read/write):
  nullb0: ios=5069657/0, merge=0/0, ticks=3292/0, in_queue=3160, util=15.62%

So with the pvsync ioengine fio is pushing at least 27 gigabytes per second using a single core, and looking at the cpu line shows we're using near enough 100% CPU. The libaio result is even higher but uses more userspace time. Put another way, if your "disk" can go fast enough, on my system fio can push tens of gigabytes per second while using 20% or less CPU for itself on a single core, even at an I/O depth of 1! So this suggests fio itself is not your bottleneck, but as a last-ditch effort let's analyze what you posted...

Looking at the rightmost terminal in the screenshots you posted we see submission latencies (time to submit the I/O and have the kernel tell us it's queued it up for sending) in the 1-6 microsecond range (which is fine). Unfortunately your completion latencies (time from when the kernel accepted it for queuing until it got a reply back from the underlying disk saying the I/O completed and we then notice the kernel telling us the I/O completed) are in the 1-108 millisecond range with an average of 11.2ms (this is not fine for fast speeds). In short fio is telling you that I/Os to /dev/sdc have a high latency. If I am sending 128 I/Os at once then that means at most I can do 1000/11.2 * 128 ≈ 11429 I/Os per second. So 11429 * (128*1024) / (1024*1024*1024) ≈ 1.4 gigabytes/s tops when your blocks are 128 kilobytes. So your maximum speed is being limited by the (high) latency of your I/Os and NOT by fio. Thus you have demonstrated the problem is NOT within fio.
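For reference, the same ceiling estimate as a quick shell calculation (the 11.2ms average completion latency and the depth of 128 are the figures discussed above; this is just arithmetic, not a measurement):

awk 'BEGIN { qd=128; lat_ms=11.2; bs=128*1024; iops=1000/lat_ms*qd; printf "ceiling: about %.0f IOPS, %.2f GiB/s\n", iops, iops*bs/(1024*1024*1024) }'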

In short your problem is down to the latency of something in or below /dev/sdc being high and because your issue is not in fio this is the wrong venue for a general "why is my disk slow?" question. Sorry @WisQTuser, but I have to recommend you close this issue here and start hunting for why your iSCSI stack has that latency and how you can overcome/workaround it.
