Huge differences of results between FIO V3.0 and V3.14 #805

Closed
Mydeus72 opened this issue Jul 23, 2019 · 14 comments

Comments

@Mydeus72

Hello,
I had been using FIO v3.0 until last week. Because I built that FIO almost two years ago, I downloaded the newer version, v3.14. As you can see in the pictures below, the newer version shows approximately 128 times better performance than the old version. I am using a shell script to call different set-ups of FIO jobs.
In example case I used this set-up:
NAME=_022_write_rand_direct_1M_4K_100 RW=randwrite DIRECT=1 BUFFERED=0 SIZE=1m BS=4k NUMJOBS=100 FSYNC=0 CREATE_FSYNC=0 PRE_READ=0 INVALIDATE=1 SYNC=1 IODEPTH=1 $FIO_PATH/fio $FIO_PATH/mmc_benchmark.fio --max-jobs=200

FIO V3.0
image
FIO V3.14
image

I want to ask whether anyone knows if the old or the newer version has a problem, or where exactly the difference comes from that makes the newer version perform so much better. I ran both on the same VM.

@sitsofe
Collaborator

sitsofe commented Jul 23, 2019

Hi @Mydeus72 ,

There are too many possibilities to say anything meaningful... Could you use git bisect to narrow down the commit that introduced this change?

@sitsofe sitsofe added the needreporterinfo Waiting on information from the issue reporter label Jul 23, 2019
@Mydeus72
Author

I don't know how to do that. I will try to find something about git bisect.

@sitsofe
Collaborator

sitsofe commented Jul 23, 2019

@Mydeus72 Take a look at the comments from #587 (comment) onwards. Some snippets:

[...]
See https://blog.theodo.fr/2017/05/how-to-git-bisect-your-project/ for an example.

[...]
Basically what you do is:

$ git bisect start
$ git bisect good fio-3.0
$ git bisect bad fio-3.14

Please continue to ask if you need more hints with the git bisect. If I had to guess in advance where your problem lies I would imagine fio-3.5 will be "slow" and fio-3.6 will be "fast" but let's see where the bisect gets us...

@Mydeus72
Author

@sitsofe
OK, it looks like I need to have the FIO git repository on my PC. In the meantime I will write here how I use FIO on the VM now.

image

  • Copy it to some folder on the VM and unzip it. Start a terminal.
  • In the terminal, build the FIO binary with the ./configure and make commands.
  • Copy this binary to the same folder as the shell script (benchmark.sh) and the config file (benchmark.fio).
  • Finally, start benchmark.sh with the number of the test I want to run. Example: ./benchmark.sh 22

@Mydeus72
Author

@sitsofe
So bisect shows me just this:
image

@sitsofe
Collaborator

sitsofe commented Jul 23, 2019

@Mydeus72 OK you're at the beginning of your bisection. After you build and test (make clean; make; # do test) the commit you're on (f0d16) you need to say whether it was good (git bisect good) or bad (git bisect bad) and you'll get another commit to build (and then test) with.
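
As a rough sketch of one round of that loop: the helper below is hypothetical (it is not part of fio or git) and only turns a bandwidth figure read off the fio output into the word to pass to git bisect. It assumes "good" means the small fio-3.0 style number and "bad" means the much larger fio-3.14 style number, and that the threshold is something you pick yourself in the same unit as the reported value.

```shell
#!/bin/sh
# Hypothetical helper: decide the bisect verdict from a reported bandwidth.
#   "good" = looks like the fio-3.0 baseline (smaller number)
#   "bad"  = looks like fio-3.14 (much larger number)
# Both arguments must use the same unit (for example KiB/s).
bisect_verdict() {
    reported_bw=$1   # avg bw copied by hand from the fio output
    threshold=$2     # assumed cut-off between the two behaviours
    awk -v bw="$reported_bw" -v t="$threshold" \
        'BEGIN { if (bw > t) print "bad"; else print "good" }'
}

# One manual bisect step, shown as comments because it needs the fio tree:
#   make clean && make
#   ./benchmark.sh 22                      # the reporter's own wrapper script
#   git bisect "$(bisect_verdict <avg bw> <threshold>)"
```

After each verdict git checks out the next commit to build, and the round is repeated until git names the first bad commit.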

@Mydeus72
Author

@sitsofe
So I finished the git bisect and here is the result:
image

And I have a question: which version is good and which is bad?
Is it that before we had the sum of all jobs, or the value for just one of them?

@sitsofe
Collaborator

sitsofe commented Jul 23, 2019

Hmm, your screenshot says 70750d6 is the first bad commit so I'd guess 0483fce is the last good one. You can do a git bisect log to see the choices that the bisection took to get to that commit.

Is it that before we had the sum of all jobs, or the value for just one of them?

Before, when doing group reporting, we just reported sample data as averaged out data for both "types" of stats (regardless of whether they were time related or more of a scalar).

$ git checkout 0483fce; make clean; eatmydata make -j $(nproc)
[...]
$ ./fio --ioengine=null --size=500M --rate=100M --stonewall  --name=single --name=double --numjobs=2 --group_reporting
[...]
single: (groupid=0, jobs=1): err= 0: pid=29385: Fri Jul 26 08:35:58 2019
  read: IOPS=25.6k, BW=100MiB/s (105MB/s)(500MiB/5000msec)
[...]
   bw (  KiB/s): min=78761, max=102496, per=79.56%, avg=81471.56, stdev=7884.43, samples=9
   iops        : min=19690, max=25624, avg=20367.56, stdev=1971.23, samples=9
double: (groupid=1, jobs=2): err= 0: pid=29387: Fri Jul 26 08:35:58 2019
  read: IOPS=51.2k, BW=200MiB/s (210MB/s)(1000MiB/5000msec)
[...]
   bw (  KiB/s): min=102256, max=102464, per=50.00%, avg=102400.17, stdev=92.46, samples=18
   iops        : min=25562, max=25616, avg=25599.78, stdev=23.29, samples=18

Now we sum data which is time based (bw/iops) and only average out things like latencies when grouping:

$ git checkout 70750d6; make clean; eatmydata make -j $(nproc)
[...]
$ ./fio --ioengine=null --size=500M --rate=100M --stonewall  --name=single --name=double --numjobs=2 --group_reporting
[...]
single: (groupid=0, jobs=1): err= 0: pid=32013: Fri Jul 26 08:37:29 2019
  read: IOPS=25.6k, BW=100MiB/s (105MB/s)(500MiB/5000msec)
[...]
   bw (  KiB/s): min=78766, max=102489, per=79.57%, avg=81480.78, stdev=7878.37, samples=9
   iops        : min=19690, max=25622, avg=20369.89, stdev=1969.61, samples=9
[...]
double: (groupid=1, jobs=2): err= 0: pid=32014: Fri Jul 26 08:37:29 2019
  read: IOPS=51.2k, BW=200MiB/s (210MB/s)(1000MiB/5000msec)
[...]
   bw (  KiB/s): min=204511, max=205016, per=100.00%, avg=204797.11, stdev=104.06, samples=18
   iops        : min=51127, max=51254, avg=51199.11, stdev=26.13, samples=18
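
To make the difference between the two outputs concrete, here is a minimal sketch of the two ways of combining per-job bandwidth samples. It is not fio's code. It assumes two jobs that each report exactly 102400 KiB/s (the 100M rate limit from the command above), whereas the real runs show slightly different measured values.

```shell
#!/bin/sh
# Assumed per-job bandwidth samples in KiB/s: two jobs, each limited to 100MiB/s.
per_job_bw="102400 102400"

# Behaviour at 0483fce: the samples are averaged across the jobs in the group.
averaged=$(echo "$per_job_bw" | awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; print s / NF }')

# Behaviour at 70750d6: the samples are summed across the jobs in the group.
summed=$(echo "$per_job_bw" | awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; print s }')

echo "averaged: ${averaged} KiB/s"   # close to the avg=102400.17 line above
echo "summed:   ${summed} KiB/s"     # close to the avg=204797.11 line above
```

With summing, the bw and iops sample lines of a group scale with the number of jobs in it, which is the direction of the jump seen when moving from an old fio to a new one with many jobs and group reporting.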

@Mydeus72
Author

OK, here is the log:
image

In my opinion the problem with the sum of jobs still persists, because it is still there in v3.14, but in the commit message you can see that they tried to make it an average value per job. And that version was FIO v3.12-22-g070750.

@sitsofe
Collaborator

sitsofe commented Jul 26, 2019

@Mydeus72 :

I've updated my earlier comment to correct some mistakes.

In my opinion the problem with the sum of jobs still persists, because it is still there in v3.14, but in the commit message you can see that they tried to make it an average value per job. And that version was FIO v3.12-22-g070750.

Sorry I don't quite follow.

  • Are you saying v3.12-22-g070750 is the first one with the changed behaviour?
  • But you are also saying that "the problem" persists?

Just for reference assuming the bisect went well:

  • Commits equal to or prior to 0483fce should show the old behaviour
  • Commits equal to or greater than 70750d6 should show the new behaviour
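
One way to check which side of that boundary a given build falls on is to ask git whether 70750d6 is an ancestor of it. The function name below is made up for illustration; `git merge-base --is-ancestor` is a standard git command, and the sketch assumes it is run inside a clone of the fio repository.

```shell
#!/bin/sh
# Hypothetical helper: run inside a clone of the fio git repository.
# Prints "new" if the given commit-ish contains 70750d6, otherwise "old"
# (it also prints "old" when git cannot answer, e.g. outside a repository).
reporting_behaviour() {
    commit=${1:-HEAD}
    if git merge-base --is-ancestor 70750d6 "$commit" 2>/dev/null; then
        echo "new"
    else
        echo "old"
    fi
}
```

For example, `reporting_behaviour fio-3.14` and `reporting_behaviour fio-3.0` could be compared. If the bisect went well, the first should report the new behaviour and the second the old one.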

Basically could you restate

  • The description of the problem
  • Your expected result
  • What you actually ended up seeing

(PS: When needed, can you copy and paste text rather than screenshots - it's easier on our end when we have to take values :-) )

@sitsofe
Collaborator

sitsofe commented Aug 5, 2019

@Mydeus72 - did you want to continue with this one and if so could you clarify some of the points above? Thanks!

@Mydeus72
Author

Mydeus72 commented Aug 5, 2019 via email

@Mydeus72
Author

Mydeus72 commented Aug 6, 2019

@sitsofe
Hello, sorry for the delay.
I wrote that the problem still persists because the message of the first bad commit says:
...Before we'd have --> bw = 344 438 kB (for me BAD value) N1
...After this the same looks like --> bw = 1363 kB (for me GOOD value) N2

From the commit message it looks like the problem was solved and that new versions of FIO use N2. But in my experience FIO 3.14 still uses N1.

So I thought that the problem appeared in the past and "was repaired", but is still there.

@sitsofe sitsofe removed the needreporterinfo Waiting on information from the issue reporter label Aug 6, 2019
@sitsofe
Collaborator

sitsofe commented Aug 17, 2019

@Mydeus72 (I notice you closed this but I'm still going to reply just in case)

...Before we'd have --> bw = 344 438 kB (for me BAD value) N1
...After this the same looks like --> bw = 1363 kB (for me GOOD value) N2

One subtlety I failed to point out is that the units in the commit message output for bw change from KiB to MiB (so as in #805 (comment) the value was actually increasing). So that 1363 is actually MiB not KiB...
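
The unit change can be checked with a little arithmetic. The sketch below simply takes the two figures quoted above, treats the first as KiB and the second as MiB as described, and converts both to KiB so they can be compared directly.

```shell
#!/bin/sh
# Figures quoted from the commit message, with the units described above.
before_kib=344438      # "before" value, in KiB
after_mib=1363         # "after" value, in MiB

# 1 MiB = 1024 KiB
after_kib=$((after_mib * 1024))

echo "before: ${before_kib} KiB"
echo "after:  ${after_kib} KiB"

if [ "$after_kib" -gt "$before_kib" ]; then
    echo "the value went up after the commit, not down"
fi
```

So once both numbers are in the same unit the "after" value is roughly four times the "before" value, which fits a change from averaging to summing across jobs.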
