DNM : Race condition in rados bench #5812
Conversation
@Abhishekvrshny the link in the description must be an issue in the "Backport" tracker
@dachary Yes my bad :( . It should have been 12949. Also the name of the branch should have been wip-12949-hammer. Can I change that somehow?
@Abhishekvrshny I see you figured it out ;-)
Force-pushed from 5538727 to be2aab5
You can include fa72ecb and dba8b5b from #5810 and see if it helps. I would do:
and see if the bot compiles ok with that. It's perfectly ok to include commits from other pull requests if needed. They will be silently trimmed by git when merging.
Force-pushed from be2aab5 to 5fb896a
@dachary Very nice introduction, and not just for bug-fixing... Sorry, it's not quite about this PR.
Tab replaced with spaces Signed-off-by: Dmitry Yatsushkevich <dyatsushkevich@mirantis.com> (cherry picked from commit 069d95e)
Generalize vec_stddev by making it a template, so that we can use different kinds of vectors to calculate STDDEV. Signed-off-by: Dmitry Yatsushkevich <dyatsushkevich@mirantis.com> (cherry picked from commit e360bfd)
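The generalization described in that commit message can be sketched roughly as follows. This is a minimal illustration, not Ceph's actual vec_stddev code; the name container_stddev is made up:

```cpp
#include <cassert>
#include <cmath>
#include <deque>
#include <vector>

// Hypothetical sketch: instead of accepting only std::vector<double>, the
// standard-deviation helper is templated on the container type, so a
// std::deque (or any other container of doubles) works too.
template <typename Container>
double container_stddev(const Container& v) {
  if (v.size() < 2)
    return 0.0;
  double mean = 0.0;
  for (double x : v)
    mean += x;
  mean /= v.size();
  double sum_sq = 0.0;
  for (double x : v)
    sum_sq += (x - mean) * (x - mean);
  // sample standard deviation (divide by n - 1)
  return std::sqrt(sum_sq / (v.size() - 1));
}
```

Any container offering size() and iteration over doubles can be passed, which is the point of the template.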
Fix indents in output; change `snprintf' to `setprecision'. Signed-off-by: Dmitry Yatsushkevich <dyatsushkevich@mirantis.com> (cherry picked from commit ddb422f)
Add IOPS metric in result output for write_bench, rand_read_bench and seq_read_bench routines Signed-off-by: Dmitry Yatsushkevich <dyatsushkevich@mirantis.com> (cherry picked from commit 3944264)
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com> (cherry picked from commit e3d62a9)
'trans_size' description from the header: "size of the write/read to perform". But really 'object_size' is used in the write/read operations; 'trans_size' is used only in ObjBencher::status_printer to calculate the current and average bandwidth. As a result, the statistics are wrong when 'trans_size' and 'object_size' are different. Signed-off-by: Dmitry Yatsushkevich <dyatsushkevich@mirantis.com> (cherry picked from commit 31d16e9)
Remove the excess branch 'else {object_size = op_size;}' from 'if(operation != OP_WRITE){...}else{object_size = op_size;}' because the 'object_size' variable is already initialized with 'op_size'. Signed-off-by: Dmitry Yatsushkevich <dyatsushkevich@mirantis.com> (cherry picked from commit 1d1d0aa)
Rename aio_bench argument 'op_size' to 'object_size' to reflect the reality of how it is used. Signed-off-by: Dmitry Yatsushkevich <dyatsushkevich@mirantis.com> (cherry picked from commit a87ac4d)
We never want to accumulate a bandwidth that is zero. A bandwidth of zero means that the object could not be written in one second (or that it's the first time in the loop). Even if writing an object takes a very long time, the bandwidth cannot be zero, it will just be very small. Fix suggested by @dachary Fixes: ceph#7401 Signed-off-by: Dmitry Yatsushkevich <dmitry.yatsushkevich@gmail.com> (cherry picked from commit cd0f2b7)
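The rule this commit message describes (a per-second bandwidth sample of zero means no object completed in that interval, or it is the first loop iteration, so the sample must not drag the average down) can be sketched like this. The function and variable names are hypothetical, not the actual ObjBencher members:

```cpp
#include <cassert>
#include <vector>

// Hedged sketch of the fix: zero bandwidth samples are never accumulated,
// so the running average only reflects intervals in which data actually
// completed. A genuinely slow write yields a small, nonzero sample instead.
void accumulate_bandwidth(std::vector<double>& samples, double bandwidth) {
  if (bandwidth > 0)  // skip zero samples entirely
    samples.push_back(bandwidth);
}
```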
Signed-off-by: Loic Dachary <ldachary@redhat.com> (cherry picked from commit 1ac7279)
ObjBencher::aio_bench sets data.finished to num_objs but it should be set to zero since it reflects the number of objects read / written. That's what the read bench does and the write bench should do the same. It does not cause a problem; it is just confusing. Signed-off-by: Loic Dachary <ldachary@redhat.com> (cherry picked from commit 5c6190d)
Fixes the high CPU usage and corrects rados bench scores on fast SSDs and ramdisks/memstore. For bench run on SSD, on Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz before this patch, times are: write: real 5m0.169s, user 2m33.565s, sys 4m39.791s seq: real 4m28.642s, user 1m35.250s, sys 6m42.948s rand: real 5m0.258s, user 1m19.656s, sys 6m47.145s After this patch: write: real 5m1.162s, user 0m27.788s, sys 3m11.707s seq: real 5m1.149s, user 2m23.278s, sys 4m14.427s rand: real 5m1.021s, user 2m30.514s, sys 4m20.347s Bench run: rados -p ssd bench 300 write|seq|read --no-cleanup Note the increase in user time cpu on seq/read tests, along with decreased sys cpu time; this is because there's additional memcmp() that compares read objects with expected contents. With less time spent memory juggling, more time is spent performing more reads per second, increasing memcmp call count and increasing amount of user cpu time used. Signed-off-by: Piotr Dałek <piotr.dalek@ts.fujitsu.com> (cherry picked from commit b894fc7)
When doing seq and rand read benchmarks using rados bench, a quite large portion of cpu time is consumed by doing object verification. This patch adds an option to disable this verification when it's not needed, in turn giving better cluster utilization. rados -p storage bench 600 rand scores without --no-verification: Total time run: 600.228901 Total reads made: 144982 Read size: 4194304 Bandwidth (MB/sec): 966 Average IOPS: 241 Stddev IOPS: 38 Max IOPS: 909522486 Min IOPS: 0 Average Latency: 0.0662 Max latency: 1.51 Min latency: 0.004 real 10m1.173s user 5m41.162s sys 11m42.961s Same command, but with --no-verify: Total time run: 600.161379 Total reads made: 174142 Read size: 4194304 Bandwidth (MB/sec): 1.16e+03 Average IOPS: 290 Stddev IOPS: 20 Max IOPS: 909522486 Min IOPS: 0 Average Latency: 0.0551 Max latency: 1.12 Min latency: 0.00343 real 10m1.172s user 4m13.792s sys 13m38.556s Note the decreased latencies, increased bandwidth and more reads performed. Signed-off-by: Piotr Dałek <piotr.dalek@ts.fujitsu.com> (cherry picked from commit ca6abca)
This function iterates over all bufferlist internal buffers and calls their invalidate_crc() method. Required for rados bench to work correctly, because it modifies buffers outside the buffer API, invalidating the internal CRC cache in the process. This method clears that cache, so the next call to buffer::get_crc() returns the correct checksum. Signed-off-by: Piotr Dałek <piotr.dalek@ts.fujitsu.com> (cherry picked from commit 55a6f9e)
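A toy model of the idea in that commit message, assuming nothing about Ceph's real bufferlist internals — the types and the stand-in checksum below are invented for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Each buffer lazily caches a checksum of its contents. Writing to the raw
// memory behind the API's back (as rados bench does) makes the cache stale.
struct Buffer {
  std::vector<uint8_t> data;
  bool crc_valid = false;
  uint32_t cached_crc = 0;
  uint32_t crc() {
    if (!crc_valid) {
      cached_crc = 0;
      for (uint8_t x : data)
        cached_crc = cached_crc * 31u + x;  // stand-in checksum, not crc32c
      crc_valid = true;
    }
    return cached_crc;
  }
};

struct BufferList {
  std::vector<Buffer> buffers;
  // Walk every buffer and drop its cached checksum, forcing the next crc()
  // query to recompute from the (possibly modified) bytes.
  void invalidate_crc() {
    for (auto& b : buffers)
      b.crc_valid = false;
  }
};
```

Without invalidate_crc(), a crc() call after an out-of-band write would keep returning the stale cached value.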
Under certain conditions (like bench seq/rand -b 1024 -t 128) it is possible that aio_read reads data into destination buffers before or during memcmp execution, resulting in "[..] is not correct!" errors even if the actual objects are perfectly fine. Also, moved the latency calculation around, so it is no longer affected by memcmp. Signed-off-by: Piotr Dałek <piotr.dalek@ts.fujitsu.com> (cherry picked from commit 9bcf5f0)
Force-pushed from 5fb896a to e298f84
Force-pushed from e298f84 to 8f8d488
@@ -153,8 +153,12 @@ void *ObjBencher::status_printer(void *_bencher) {

int ObjBencher::aio_bench(
  int operation, int secondsToRun,
<<<<<<< HEAD
These lines shouldn't be present in the commits.
@theanalyst my bad. A side-effect of manually merging conflicts.
@Abhishekvrshny no problem... this is a non-trivial merge... and a good learning experience
Force-pushed from 8f8d488 to a71071a
@Abhishekvrshny I suggest you try to get all modifications made to obj_bencher since hammer with:
git cherry-pick -x 069d95eaf49cadaa9a8fa1fa186455944a50ec7d e360bfd7058af01fd03aa9260eacbdc5d631e34b ddb422f40394ec2f4fc4a6a6cff49e43538121a9 39442641fc61c6ba7bde2cde5ceafd9276647f85 e3d62a9d4c6da7bd48f7aefce91db050bdc9a637 31d16e90108a741e92f897e550a8bf1e26392d2d 1d1d0aa01b944e8f9dad96b7f13bce2a581ba758 a87ac4d36b86aa826afb990769831bfcf08ed3fa cd0f2b754dc3cdf1e4f1647dc1cd91d891ce896c 1ac727982213bf676beefe9340d6822193dbb700 5c6190d012785222ba1eebee734aa34f79c5bf27 fe79babc9867a4cdf81a5b83019b9e1bbc86f9ae b894fc7acf7dee7f7ec8c5c280e7a6be41133328 ca6abca 55a6f9e b894fc7 9bcf5f0
It applies cleanly and it's legitimate IMHO because:
Force-pushed from a71071a to d37c990
@dachary This would put a bulk of commits in one backport PR. Is that fine, or should there be separate backport PRs for each of them? Also, if we take all changes since hammer, how do we know there isn't something which is NOT meant to be in hammer (and is probably a feature only for the next release)?
Very good question :-) The only way is to review each commit individually, which I did. I guess we're lucky that this is a simple situation. It could have been a lot more hairy if there were new features.
This is generally not ok and we do that very rarely. IMHO it's ok in this specific case because it's about a tool and not the core, and because the series of changes is quite simple.
Let's first see how it goes with the tests. If the tests are happy and no one objects to this rather extensive backport, then we can worry about cross-referencing this commit in the right places. You're lucky: you got to deal with one of the most complicated cases of backport :-)
@dachary :) . Now that the tests have passed, shall I make a new branch with an appropriate name and send a separate PR, or will this be fine?
I didn't expect it to be so complicated. Sorry, guys!
@branch-predictor it is a good experience (not just for @Abhishekvrshny ;-).
@Abhishekvrshny I think it's worth asking if #5428 should be backported (on the ceph-devel mailing list).
For the record, the mail thread is at: http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/27302
abhishekvrshny: why we need #5428? #5812 relies on it? ah, right there's a change to rest_bench as well, but it got lost once rest_bench was eradicated
abhishekvrshny: this: http://ceph.predictor.org.pl/diff-wip-12947.txt should help
BranchPredictor: Can you raise a hammer-specific PR for 5812 and I would close 5812 itself. I can then schedule an integration test with that.
abhishekvrshny: well, I can... company will kill me for all those git clones. :P
BranchPredictor: it looks like a non-trivial backport to me. I don't want to mess it up :P
abhishekvrshny: no worries, I'll do it.
BranchPredictor: Thanks :)
Replaced by #6791
http://tracker.ceph.com/issues/12949