Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DNM : Race condition in rados bench #5812

Closed
wants to merge 16 commits into from

Conversation

Abhishekvrshny
Copy link

@ghost ghost added bug-fix tools labels Sep 4, 2015
@ghost ghost added this to the hammer milestone Sep 4, 2015
@ghost ghost self-assigned this Sep 4, 2015
@ghost
Copy link

ghost commented Sep 4, 2015

@Abhishekvrshny the link in the description must be an issue in the "Backport" tracker

@Abhishekvrshny
Copy link
Author

@dachary Yes my bad :( . It should have been 12949. Also the name of the branch should have been wip-12949-hammer. Can I change that somehow?

@ghost
Copy link

ghost commented Sep 4, 2015

@Abhishekvrshny I see you figured it out ;-)

@Abhishekvrshny
Copy link
Author

@dachary #5152 was merged into master after #4690 was merged. Can that be the reason for failures? Shall we wait for #5810 to get into hammer first before backporting this particular PR ?

@ghost
Copy link

ghost commented Sep 4, 2015

You can include fa72ecb and dba8b5b from #5810 and see if it helps. I would do:

git reset --hard ceph/hammer
git cherry-pick fa72ecb dba8b5b be2aab5
git push --force

and see if the bot compiles ok with that. It's perfectly ok to include commits from other pull requests if needed. They will be silently trimmed by git when merging.

@shinobu-x
Copy link
Contributor

@dachary very nice introduction not for bug-fix ... Sorry it's not quite about this pr.

Dmitry Yatsushkevich and others added 15 commits September 7, 2015 16:20
Tab replaced with spaces

Signed-off-by: Dmitry Yatsushkevich <dyatsushkevich@mirantis.com>
(cherry picked from commit 069d95e)
Generalize vec_stddev by making them templates, so that we could use different kind of a vector to calculate STDDEV

Signed-off-by: Dmitry Yatsushkevich <dyatsushkevich@mirantis.com>
(cherry picked from commit e360bfd)
fix indents in output
change `snprintf' to `setprecision'

Signed-off-by: Dmitry Yatsushkevich <dyatsushkevich@mirantis.com>
(cherry picked from commit ddb422f)
Add IOPS metric in result output for write_bench, rand_read_bench and seq_read_bench routines

Signed-off-by: Dmitry Yatsushkevich <dyatsushkevich@mirantis.com>
(cherry picked from commit 3944264)
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
(cherry picked from commit e3d62a9)
'trans_size' description from header: "size of the write/read to perform"
But really 'object_size' is used in write/read operations. 'trans_size' is used
only in ObjBencher::status_printer for calc current and average bandwidth.
As result - bad statistics in case 'trans_size' and and 'object_size' are different.

Signed-off-by: Dmitry Yatsushkevich <dyatsushkevich@mirantis.com>
(cherry picked from commit 31d16e9)
Remove excess branch 'else {object_size = op_size;}' from
'if(operation != OP_WRITE){...}else{object_size = op_size;}' because
'object_size' variable already initialized with 'op_size'

Signed-off-by: Dmitry Yatsushkevich <dyatsushkevich@mirantis.com>
(cherry picked from commit 1d1d0aa)
Rename aio_bench argument 'op_size' to 'object_size' to reflect the reality of
how it is used.

Signed-off-by: Dmitry Yatsushkevich <dyatsushkevich@mirantis.com>
(cherry picked from commit a87ac4d)
We never want to accumulate a bandwidth that is zero. A bandwidth of
zero means that the object could not be written in one second (or that
it's the first time in the loop). Even if writing an object takes a very
long time, the bandwidth cannot be zero, it will just be very small.

Fix suggested by @dachary
Fixes: ceph#7401

Signed-off-by: Dmitry Yatsushkevich <dmitry.yatsushkevich@gmail.com>
(cherry picked from commit cd0f2b7)
Signed-off-by: Loic Dachary <ldachary@redhat.com>
(cherry picked from commit 1ac7279)
ObjBencher::aio_bench sets data.finished to num_objs but it should be
set to zero since it reflects the number of objects read /
written. That's what the read bench do and the write bench should do the
same. It does not cause a problem, it is just confusing.

Signed-off-by: Loic Dachary <ldachary@redhat.com>
(cherry picked from commit 5c6190d)
It was broken by 1ac7279

Signed-off-by: Loic Dachary <ldachary@redhat.com>
(cherry picked from commit fe79bab)
Fixes the high CPU usage and corrects rados bench scores on fast SSDs
and ramdisks/memstore.
For bench run on SSD, on Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz
before this patch, times are:
write: real 5m0.169s, user 2m33.565s, sys 4m39.791s
seq: real 4m28.642s, user 1m35.250s, sys 6m42.948s
rand: real 5m0.258s, user 1m19.656s, sys 6m47.145s

After this patch:
write: real 5m1.162s, user 0m27.788s, sys 3m11.707s
seq: real 5m1.149s, user 2m23.278s, sys 4m14.427s
rand: real 5m1.021s, user 2m30.514s, sys 4m20.347s

Bench run: rados -p ssd bench 300 write|seq|read --no-cleanup

Note the increase in user time cpu on seq/read tests,
along with decreased sys cpu time; this is because there's
additional memcmp() that compares read objects with expected
contents. With less time spent memory juggling, more time is
spent performing more reads per second, increasing memcmp call
count and increasing amount of user cpu time used.

Signed-off-by: Piotr Dałek <piotr.dalek@ts.fujitsu.com>
(cherry picked from commit b894fc7)
When doing seq and rand read benchmarks using rados bench, a quite large
portion of cpu time is consumed by doing object verification. This patch
adds an option to disable this verification when it's not needed, in turn
giving better cluster utilization. rados -p storage bench 600 rand scores
without --no-verification:

Total time run:       600.228901
Total reads made:     144982
Read size:            4194304
Bandwidth (MB/sec):   966
Average IOPS:         241
Stddev IOPS:          38
Max IOPS:             909522486
Min IOPS:             0
Average Latency:      0.0662
Max latency:          1.51
Min latency:          0.004

real    10m1.173s
user    5m41.162s
sys     11m42.961s

Same command, but with --no-verify:

Total time run:       600.161379
Total reads made:     174142
Read size:            4194304
Bandwidth (MB/sec):   1.16e+03
Average IOPS:         290
Stddev IOPS:          20
Max IOPS:             909522486
Min IOPS:             0
Average Latency:      0.0551
Max latency:          1.12
Min latency:          0.00343

real    10m1.172s
user    4m13.792s
sys     13m38.556s

Note the decreased latencies, increased bandwidth and more reads performed.

Signed-off-by: Piotr Dałek <piotr.dalek@ts.fujitsu.com>
(cherry picked from commit ca6abca)
This function iterates over all bufferlist internal buffers and calls
their invalidate_crc() method. Required for rados bench to work
correctly, because it modifies buffers outside buffer api, invalidating
internal CRC cache in the process - this method clears that cache, so
another call for buffer::get_crc() to return correct checksum.

Signed-off-by: Piotr Dałek <piotr.dalek@ts.fujitsu.com>
(cherry picked from commit 55a6f9e)
Under certain conditions (like bench seq/rand -b 1024 -t 128) it is
possible that aio_read reads data into destination buffers before or
during memcmp execution, resulting in "[..] is not correct!" errors
even if actual objects are perfectly fine.
Also, moved latencty calculation around, so it is no longer affeted
by memcmp.

Signed-off-by: Piotr Dałek <piotr.dalek@ts.fujitsu.com>
(cherry picked from commit 9bcf5f0)
@@ -153,8 +153,12 @@ void *ObjBencher::status_printer(void *_bencher) {

int ObjBencher::aio_bench(
int operation, int secondsToRun,
<<<<<<< HEAD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines shouldn't be present in the commits..

@Abhishekvrshny
Copy link
Author

@theanalyst my bad. The side-effect of manually merging conflicts

@theanalyst
Copy link
Member

@Abhishekvrshny no problems... this is a non trivial merge .. and a good learning experience

@ghost
Copy link

ghost commented Sep 7, 2015

@Abhishekvrshny I suggest you try to get all modifications made to obj_bencher since hammer with

git cherry-pick -x 069d95eaf49cadaa9a8fa1fa186455944a50ec7d e360bfd7058af01fd03aa9260eacbdc5d631e34b ddb422f40394ec2f4fc4a6a6cff49e43538121a9 39442641fc61c6ba7bde2cde5ceafd9276647f85 e3d62a9d4c6da7bd48f7aefce91db050bdc9a637 31d16e90108a741e92f897e550a8bf1e26392d2d 1d1d0aa01b944e8f9dad96b7f13bce2a581ba758 a87ac4d36b86aa826afb990769831bfcf08ed3fa cd0f2b754dc3cdf1e4f1647dc1cd91d891ce896c 1ac727982213bf676beefe9340d6822193dbb700 5c6190d012785222ba1eebee734aa34f79c5bf27 fe79babc9867a4cdf81a5b83019b9e1bbc86f9ae b894fc7acf7dee7f7ec8c5c280e7a6be41133328 ca6abca 55a6f9e b894fc7 9bcf5f0

It applies cleanly and it's legitimate IMHO because:

  • it fixes a few bugs

  • it is limited to the obj_bencher.{cc,h} and rados.cc files

  • a few cleanups that occurred in between bug fixes make it difficult to backport cleanly (when dealing with isolated commits)

  • although it's made of a dozen commit the series is straightforward

    a417478 tools: fix race condition in seq/rand bench
    src/common/obj_bencher.cc | 80 ++++++++++++++++++++++++++++++++++++++++++++------------------------------------
    1 file changed, 44 insertions(+), 36 deletions(-)
    697cfc9 bufferlist: implement bufferlist::invalidate_crc()
    src/common/buffer.cc   |  9 +++++++++
    src/include/buffer.h   |  1 +
    src/test/bufferlist.cc | 40 ++++++++++++++++++++++++++++++++++++++++
    3 files changed, 50 insertions(+)
    8c3d381 tools: add --no-verify option to rados bench
    src/common/obj_bencher.cc | 67 ++++++++++++++++++++++++++++++++++++++++++-------------------------
    src/common/obj_bencher.h  |  6 +++---
    src/tools/rados/rados.cc  | 11 ++++++++++-
    3 files changed, 55 insertions(+), 29 deletions(-)
    ba2bf47 tools: Don't delete, recreate and re-fill buffers in rados bench.
    src/common/obj_bencher.cc | 32 +++++++++++++++++++-------------
    1 file changed, 19 insertions(+), 13 deletions(-)
    056ecdb common: fix ObjBencher::aio_bench signature
    src/common/obj_bencher.cc | 3 +--
    1 file changed, 1 insertion(+), 2 deletions(-)
    036744d common: rados bench data.finished = 0
    src/common/obj_bencher.cc | 3 ++-
    1 file changed, 2 insertions(+), 1 deletion(-)
    193fa52 common: remove unused maxObjectsToCreate
    src/common/obj_bencher.cc | 22 +++++++++-------------
    src/common/obj_bencher.h  |  4 ++--
    src/tools/rados/rados.cc  |  2 +-
    3 files changed, 12 insertions(+), 16 deletions(-)
    b128b78 obj_bencher: does not accumulate bandwidth that is zero
    src/common/obj_bencher.cc | 2 +-
    1 file changed, 1 insertion(+), 1 deletion(-)
    f3d718a obj_bencher: aio_bench - rename op_size to object_size
    src/common/obj_bencher.cc | 3 +--
    1 file changed, 1 insertion(+), 2 deletions(-)
    bd2d94d obj_bencher: remove excess 'object_size = op_size'
    src/common/obj_bencher.cc | 2 --
    1 file changed, 2 deletions(-)
    e1797ca obj_bencher: remove 'trans_size' as obsolete
    src/common/obj_bencher.cc | 5 ++---
    src/common/obj_bencher.h  | 1 -
    2 files changed, 2 insertions(+), 4 deletions(-)
    13b3290 common: make rados bench return correctly errno.
    src/common/obj_bencher.cc | 8 ++++----
    1 file changed, 4 insertions(+), 4 deletions(-)
    2566198 obj_bencher: add IOPS metric calculation
    src/common/obj_bencher.cc | 28 ++++++++++++++++++++++++++++
    src/common/obj_bencher.h  |  3 +++
    2 files changed, 31 insertions(+)
    2188183 obj_bencher: cosmetic display fixes
    src/common/obj_bencher.cc | 30 ++++++++++++------------------
    1 file changed, 12 insertions(+), 18 deletions(-)
    455b7f8 obj_bencher: generalize vec_stddev function
    src/common/obj_bencher.cc | 11 ++++++-----
    1 file changed, 6 insertions(+), 5 deletions(-)
    8813436 obj_bencher: fix indents
    src/common/obj_bencher.cc | 116 ++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------
    1 file changed, 58 insertions(+), 58 deletions(-)
    

@Abhishekvrshny
Copy link
Author

@dachary This would put a bulk of commits in one backport PR. Is that fine? or shall there be separate backport PRs for each of them?

Also if we take all changes since hammer, how do we know if there isn't something which is NOT meant to be in hammer (and is probably a feature only for the next release)?

@ghost
Copy link

ghost commented Sep 7, 2015

Also if we take all changes since hammer, how do we know if there isn't something which is NOT meant to be in hammer (and is probably a feature only for the next release)?

Very good question :-) The only way is to review each commit individually, which I did. I guess we're lucky that this is a simple situation. It may have been a lot more hairy if there were new features.

@ghost
Copy link

ghost commented Sep 7, 2015

This would put a bulk of commits in one backport PR. Is that fine?

This is generally not ok and we do that very rarely. IMHO it's ok in this specific case because it's about a tool and not the core and because the series of change is quite simple.

or shall there be separate backport PRs for each of them?

Let's first see how it goes with tests. If tests are happy and noone objects this rather extensive backport, then we can worry about cross referencing this commit in the right places.

You're lucky : you got to deal with one of the most complicated cases of backport :-)

@Abhishekvrshny
Copy link
Author

@dachary :) . Now that the tests have passed, shall I make a new branch with an appropriate name and send a separate PR, or will this be fine?

@branch-predictor
Copy link
Contributor

I didn't expected that it'll be so complicated. Sorry, guys!
And thanks for the hard work.

@ghost
Copy link

ghost commented Sep 9, 2015

@branch-predictor it is a good experience (not just for @Abhishekvrshny ;-).

@Abhishekvrshny Abhishekvrshny changed the title Race condition in rados bench DNM : Race condition in rados bench Sep 22, 2015
@ghost
Copy link

ghost commented Sep 28, 2015

@Abhishekvrshny I think it's worth asking if #5428 should be backported (on the ceph-devel mailing list).

@ghost
Copy link

ghost commented Dec 1, 2015

For the record, the mail thread is at : http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/27302

@ghost
Copy link

ghost commented Dec 1, 2015

 abhishekvrshny: why we need #5428? #5812 relies on it?
 ah, right
 there's a change to rest_bench as well, but it got lost once rest_bench was eradicated
 abhishekvrshny: this: http://ceph.predictor.org.pl/diff-wip-12947.txt should help
 BranchPredictor: Can you raise a hammer specific PR for 5812 and I would close 5812 itself. I can then schedule an integration test with that.
 abhishekvrshny: well, I can...
 company will kill me for all those git clones. :P
 BranchPredictor: it looks like a non-trivial backport to me. I dont want to mess it up :P 
 abhishekvrshny: no worries, I'll do it.
 BranchPredictor: Thanks :)

@branch-predictor
Copy link
Contributor

125456 < BranchPredictor> abhishekvrshny: why we need #5428? #5812 relies on it?
125545 < abhishekvrshny> loicd: http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/27302
125606 < BranchPredictor> ah, right
125631 < BranchPredictor> there's a change to rest_bench as well, but it got lost once rest_bench was eradicated
125729 < BranchPredictor> abhishekvrshny: this: http://ceph.predictor.org.pl/diff-wip-12947.txt should help
130233 < abhishekvrshny> BranchPredictor: Can you raise a hammer specific PR for 5812 and I would close 5812 itself. I can then schedule an integration test with that.
130305 < BranchPredictor> abhishekvrshny: well, I can...
130423 < abhishekvrshny> BranchPredictor: it looks like a non-trivial backport to me. I dont want to mess it up :P
130444 < BranchPredictor> abhishekvrshny: no worries, I'll do it.
131010 < abhishekvrshny> BranchPredictor: Thanks :)

@ghost
Copy link

ghost commented Dec 3, 2015

replaced by #6791

@ghost ghost closed this Dec 3, 2015
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants