Benchmarking Runs Forever [Question] #618

shivam-maharshi · 2016-02-07T21:50:32Z

I ran YCSB with both parameters, operationcount and maxexecutiontime. According to the documentation, the benchmark should stop as soon as either limit is reached. However, my benchmark never stops, and I periodically see this on my prompt: "Still waiting for thread Thread 1 to complete. Workload status: true". Looking at the TerminatorThread code, it tries to join the benchmarking threads after waiting for the maxexecutiontime period. If it cannot join them on the first attempt, it keeps retrying every 2 seconds, which is why my benchmark runs forever.
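The join-and-retry behaviour described above can be sketched roughly as follows. This is a hypothetical reconstruction based only on the description in this comment (signal a stop, then join each worker and retry on a fixed interval). The class and method names are illustrative and this is not YCSB's actual TerminatorThread code. The point it shows is that `Thread.join(long)` only waits: it returns while the worker is still alive and never stops it, so a worker that is blocked forever keeps the loop spinning.

```java
import java.util.List;

// Hypothetical sketch of a join-and-retry terminator; not YCSB's real TerminatorThread.
class JoinRetryTerminator {
  /**
   * Requests a stop, then joins each worker, retrying every retryMillis until it exits.
   * Returns the total number of "still waiting" retries that were needed.
   */
  static int stopAndJoin(List<Thread> workers, Runnable requestStop, long retryMillis) {
    requestStop.run(); // cooperative signal only; a worker blocked in I/O never checks it
    int retries = 0;
    for (Thread worker : workers) {
      while (worker.isAlive()) {
        try {
          worker.join(retryMillis); // waits at most retryMillis; does NOT stop the thread
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return retries;
        }
        if (worker.isAlive()) {
          retries++;
          System.out.println("Still waiting for thread " + worker.getName() + " to complete.");
        }
      }
    }
    return retries;
  }
}
```

If a worker never finishes, `stopAndJoin` never returns, which matches the symptom reported here.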

I am building a step benchmark, so I need to run one workload for strictly 10 minutes and then start another workload. Is there any way to make sure the benchmark stops within, say, 10 minutes + 30 seconds (a padded stopping time)? Any request that doesn't respond within this time can safely be considered a failure.
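One way to get the "10 minutes + 30 seconds" behaviour is to bound the wait itself instead of retrying forever. The sketch below is an assumption, not something YCSB provides: `PaddedStop.awaitWithGrace` is a hypothetical helper that waits for all workers against one shared deadline (run budget plus grace period) and returns the workers still alive at that point. The caller could then record those as failed and abandon them, for example by running workers as daemon threads and exiting the JVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: wait for workers against one shared deadline (budget + grace).
class PaddedStop {
  /** Returns the workers still alive once budgetMillis + graceMillis has elapsed. */
  static List<Thread> awaitWithGrace(List<Thread> workers, long budgetMillis, long graceMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(budgetMillis + graceMillis);
    for (Thread worker : workers) {
      long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
      if (remainingMillis <= 0) {
        break; // deadline reached; never call join(0), which would wait forever
      }
      try {
        worker.join(remainingMillis); // bounded by the shared deadline, not per thread
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    List<Thread> stuck = new ArrayList<>();
    for (Thread worker : workers) {
      if (worker.isAlive()) {
        stuck.add(worker); // candidates to report as failed operations
      }
    }
    return stuck;
  }
}
```

For the step benchmark described here, it would be called with `budgetMillis = 600_000` and `graceMillis = 30_000`.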

busbey · 2016-02-07T21:53:58Z

Which workload is this happening in? Which JVM are you using? Which OS?

The biggest problem I've had when using a time bound is the lag while waiting for workload D to finish its setup.

shivam-maharshi · 2016-02-07T22:01:17Z

Workload: CoreWorkload. I have customized it to benchmark web services but have not touched the parts that deal with threads (Worker, Client or Terminator).
Java: Java HotSpot 64-Bit Server VM 1.8.0_72
OS: Mac OS X El Capitan 10.11.3

busbey · 2016-02-07T22:12:15Z

Customized the code or customized the workload configuration file?

shivam-maharshi · 2016-02-07T22:17:36Z

I customized the CoreWorkload.java file, since YCSB doesn't support benchmarking web services and I needed many other runtime-configurable parameters. The code is on GitHub here: https://github.com/shivam-maharshi/YCSB4WebServices/blob/master/src/main/java/com/yahoo/ycsb/workloads/CoreWorkload.java. It is not clean code and is only meant for my specific benchmarking purpose.

busbey · 2016-02-07T22:53:58Z

Just to set expectations: it's going to be difficult to figure out whether the issue is in YCSB or in your changes (more so because it looks like you started from the base YCSB plus some changes rather than forking the repo).

Could you explain what the fundamental obstacle is to, e.g., writing a datastore binding for your web service? That would make it easier to isolate issues in the core framework from changes specific to your use case.

I'll try to get a sense of the code this evening.

shivam-maharshi · 2016-02-07T23:12:59Z

Yes, you are right. I had a conversation with Andy Kruth, and I've already forked YCSB and started creating a REST client binding module. That work is in progress.

I didn't write a binding initially because writing it cleanly would have taken more time, which I unfortunately did not have. I also needed the following, which the current framework could not provide:

A configurable Zipfian constant.
Separate URL traces for reads and writes, loaded from a file.
A non-scrambled Zipfian generator for the field length chooser.

So I went with this approach, even though I knew it was wrong. Long story short, if you can get a sense of the code, that would be great. However, I don't think the problem was introduced by my changes to CoreWorkload, since they do not manipulate threads in any way. I hope to open a pull request for the REST client I am working on this week. If we run into the same issue with that module, it will be easier to deal with.

Thanks for your response!

kruthar · 2016-02-08T22:14:04Z

@shivam-maharshi - Can you attach the command you are running, the workload configuration file you are using, and the output? These might help establish the context of the issue you are seeing.

kruthar · 2016-02-23T16:08:28Z

@shivam-maharshi - did you solve your issue?

shivam-maharshi · 2016-02-23T22:46:50Z

@kruthar - I resolved this issue for myself by setting a maximum timeout for individual requests (CRUD operations) in the client binding. The implementation simply runs a timer thread in parallel that stops the operation if it exceeds the given time limit.
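The per-request timeout described above could look roughly like this. It is a sketch under assumptions: `TimedOperation` and its `Outcome` enum are hypothetical and are not part of YCSB's `DB` interface; a real binding would map a timeout onto whatever error status its YCSB version returns. Note that `Future.cancel(true)` only interrupts the task, so an operation blocked in non-interruptible I/O may keep its pool thread busy; the calling worker thread, however, is released on time.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical wrapper that bounds the time a worker thread spends on one operation.
class TimedOperation {
  enum Outcome { OK, ERROR, TIMED_OUT }

  // Daemon threads so an abandoned, stuck operation cannot keep the JVM alive.
  private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
    Thread t = new Thread(r, "timed-op");
    t.setDaemon(true);
    return t;
  });

  Outcome run(Callable<Boolean> operation, long timeoutMillis) {
    Future<Boolean> future = pool.submit(operation);
    try {
      // Treat only an explicit TRUE as success (guards against a null result).
      return Boolean.TRUE.equals(future.get(timeoutMillis, TimeUnit.MILLISECONDS))
          ? Outcome.OK : Outcome.ERROR;
    } catch (TimeoutException e) {
      future.cancel(true); // interrupts the task; blocking socket reads may ignore this
      return Outcome.TIMED_OUT;
    } catch (ExecutionException e) {
      return Outcome.ERROR; // the operation itself threw
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return Outcome.ERROR;
    }
  }

  void shutdown() {
    pool.shutdownNow();
  }
}
```

Compared with a separate timer thread per request, a shared pool plus `Future.get` with a timeout keeps the bookkeeping in one place.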

Coming back to the main point: YCSB benchmarking can run forever even if the "maxexecutiontime" property is set in the workload file. This is because the TerminatorThread only tries to finish the benchmark once "maxexecutiontime" has passed; it does not guarantee it. It only joins the worker (benchmarking) threads to wind up the run and does not interrupt or kill them. So if a worker thread gets stuck waiting for a response (or the next bytes) from the server and never receives it, the terminator thread will keep trying to join it indefinitely and never succeed. This can happen when the DB server does not respond to a request for a long time.

Why hasn't it been reported so far?
It hasn't been reported so far because most clients have a connection or read timeout configured in the client binding. If a worker thread gets stuck, its operation fails automatically once that timeout is reached. Since most (in fact all) clients have timeouts, I feel YCSB's handling of "maxexecutiontime" is fine as it is.
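To illustrate why a read timeout in the binding unblocks a stuck worker, here is a self-contained sketch using plain `java.net` sockets rather than any particular datastore driver. The local server accepts the connection but never replies, and `Socket.setSoTimeout` turns the otherwise indefinite `read()` into a `SocketTimeoutException` that a binding could report as a failed operation. The class name and timeout value are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo: a server that never answers, and a client with a read timeout.
class ReadTimeoutDemo {
  /** Returns true if the read failed with a timeout instead of blocking forever. */
  static boolean readTimesOut(int readTimeoutMillis) {
    InetAddress loopback = InetAddress.getLoopbackAddress();
    // The listen backlog completes the TCP handshake; the server never writes a byte.
    try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
         Socket client = new Socket(loopback, silentServer.getLocalPort())) {
      client.setSoTimeout(readTimeoutMillis); // without this, read() below could block indefinitely
      InputStream in = client.getInputStream();
      try {
        in.read();
        return false; // got data or EOF; not expected here
      } catch (SocketTimeoutException e) {
        return true; // a binding would record this operation as failed
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```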

Please let me know once you've read this. I will close this item, since it is not a bug but a design decision in YCSB.

busbey · 2016-02-23T23:12:49Z

Should we be sending an interrupt to the client threads?

shivam-maharshi · 2016-02-23T23:28:26Z

That can be done, but it can affect the benchmark results if not handled properly. For example, suppose 10 client threads have just sent out requests when "maxexecutiontime" is reached. If we interrupt those threads, how should their operations be handled: counted as failed or as passed? IMO it would make sense not to count those operations in the benchmark results at all.
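A sketch of the policy suggested here: interrupt workers once "maxexecutiontime" is reached and leave the interrupted operations out of the results. Everything in it is hypothetical: `InterruptibleWorker`, its counters, and the use of `Thread.sleep` as a stand-in for an interruptible database call. Completed operations are counted as measured; an operation cut short by the interrupt is only counted as discarded, so it neither passes nor fails. This only helps if the client call actually responds to interruption; blocking socket I/O often does not.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical worker: completed operations are measured, interrupted ones are discarded.
class InterruptibleWorker implements Runnable {
  final AtomicInteger measured = new AtomicInteger();
  final AtomicInteger discarded = new AtomicInteger();
  private final long opMillis;

  InterruptibleWorker(long opMillis) {
    this.opMillis = opMillis;
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        Thread.sleep(opMillis); // stand-in for an interruptible client call
        measured.incrementAndGet(); // only completed operations reach the results
      } catch (InterruptedException e) {
        discarded.incrementAndGet(); // in flight when the run ended: neither pass nor fail
        Thread.currentThread().interrupt(); // restore the flag so the loop exits
      }
    }
  }
}
```

If the interrupt lands between two operations, the loop simply exits and nothing is discarded, so at most one operation per worker is dropped.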

busbey · 2016-02-25T17:13:23Z

Okay, I'm fine closing this as-is. It would probably be worth mentioning the importance of driver timeouts when we get around to writing a datastore binding contribution guide.

busbey · 2016-04-10T18:46:58Z

I just ran into this with the accumulo client while testing 0.8.0-RC3 (#678). Due to a cluster misconfiguration I filled HDFS during a load phase. Since I had client-side buffering on for the accumulo client, the terminator thread waited for shutdown to complete, which waited for the buffered writes to flush, which happily waited for Accumulo to recover.

+ ycsb-accumulo-binding-0.8.0-RC3/bin/ycsb load accumulo -P ycsb-accumulo-binding-0.8.0-RC3/workloads/workloade -p table=ycsb_workloade -cp /etc/accumulo/conf -p accumulo.columnFamily=family -p accumulo.instanceName=accumulo -p accumulo.zooKeepers=YYYYY -p accumulo.username=ycsb -p accumulo.password=XXXXXX -s -p maxexecutiontime=1200 -threads 30 -jvm-args=-Xmx8192m -p recordcount=2147483647 -p insertstart=0 -p insertcount=429496729 -p exportfile=/root/ycsb-load_workloade-accumulo-test-1.gce.cloudera.com-measurements.json -p exporter=com.yahoo.ycsb.measurements.exporter.JSONArrayMeasurementsExporter


real    469m43.223s
user    8m17.917s
sys     5m9.610s

Once I saw the cluster's status I fixed things, which allowed Accumulo to recover, which allowed the final writes to flush, which finally allowed the thread to complete and the client to exit. A bit over my target limit of 20 minutes, though. ;)

2victoria · 2017-12-16T18:05:56Z

I hit the same problem when testing Cassandra with YCSB. How can I solve it?

amgads · 2023-02-16T01:45:53Z

Same problem with MySQL/MariaDB on YCSB 0.17.0.

shivam-maharshi changed the title ~~Benchmarking Never Ending [Question]~~ Benchmarking Runs Forever Question] Feb 7, 2016

shivam-maharshi changed the title ~~Benchmarking Runs Forever Question]~~ Benchmarking Runs Forever [Question] Feb 7, 2016

busbey mentioned this issue Apr 10, 2016

Release version 0.8.0 #678

Closed

risdenk added question improvement labels Jan 19, 2017

busbey closed this as completed Jul 7, 2018

busbey mentioned this issue Dec 2, 2019

Bring back the maxtime option #1377

Closed

jiaweixiao mentioned this issue Jan 1, 2022

Failed to test with YCSB QuickServerLab/QuickCached#7

Open
