A number of questions on details of workloads #548
These settings will affect different databases differently.
Writing this I realized that only some of these properties are listed on the core properties page: https://github.com/brianfrankcooper/YCSB/wiki/Core-Properties. I will open an issue to rectify that. |
For reference, #550 has been created to update the documentation on core properties. |
If we have 1000 records and we want 200 reads, 100 updates, 50 inserts, and 250 scans |
Roughly, yes, assuming you've set the |
With readallfields=true, how can I specify a single field to read? |
You cannot specify which field to read or update. The field that gets read on each read operation, and the field that gets updated on each update operation, are chosen randomly for each operation. |
@kruthar ok |
I think you are asking if you can see the operation latencies in buckets? Yes, you can. Take a look at Core Properties for more details on these two properties https://github.com/brianfrankcooper/YCSB/wiki/Core-Properties.
The default hdrhistogram measurementtype also has useful information such as latency min, max, avg and percentiles. |
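As a rough illustration of what the percentile figures mean, here is a minimal sketch using the plain nearest-rank method. It assumes a simple list of latencies in microseconds (`latencies_us` and `percentile` are hypothetical names); it is not how hdrhistogram computes its values internally, which works from bucketed histograms.

```python
import math

def percentile(latencies_us, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(latencies_us)
    # Rank is 1-based; clamp to 1 so very small percentiles still map to a sample.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical sample: 100 operations with latencies 1..100 microseconds.
latencies_us = list(range(1, 101))
p95 = percentile(latencies_us, 95)
p99 = percentile(latencies_us, 99)
print("95th percentile:", p95, "us")
print("99th percentile:", p99, "us")
```

Read this way, a 95th percentile latency of X means that roughly 95% of the measured operations completed in X or less.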
When running YCSB on a single machine, can we define the number of replicas and measure their impact? |
Do you mean replication of data in your database, or replicating the workload via threads? |
I mean simulating replication of data in YCSB. When we run a workload, don't all threads do that? And with "insertorder", aren't all the records the same? So what difference does it make to how they are read? |
YCSB doesn't directly deal with replication of data. How your database will replicate data depends on how you configure the database. Some YCSB clients do come with write consistency settings which determine how replicated data has to be for an insert or update operation to be considered successful. YCSB threads are just a way to increase load on the database by spinning up multiple YCSB client threads that act against the database at the same time.
|
So if we want replication, we should set up replication and consistency on our database, such as Cassandra? When we use insertorder=hashed, are all keys sorted first and then inserted? |
Another question: when benchmarks mention a number of cores, do they mean only physical cores, or virtual cores too? |
Yes, you should look into how to set up the type of replication you want on the database you are using. Then check the respective YCSB binding README for consistency settings.
They are not sorted per se. When you do your load, YCSB starts at 0 and counts up to whatever number of records you are inserting. Each value (0, 1, 2...) is then hashed and appended to 'user'. The hashed values will not be sorted.
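The key construction described above can be sketched roughly as follows. This is an illustration only: the 64-bit FNV-1a hash here is a stand-in chosen for the example and is not guaranteed to match the hash function YCSB uses, and `build_key` is a hypothetical helper name.

```python
FNV_OFFSET_BASIS_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF

def fnv1a_64(value):
    """Stand-in hash: 64-bit FNV-1a over the 8 little-endian bytes of value."""
    h = FNV_OFFSET_BASIS_64
    for byte in value.to_bytes(8, "little"):
        h ^= byte
        h = (h * FNV_PRIME_64) & MASK_64
    return h

def build_key(sequence_number, hashed=True):
    """Append either the hashed or the raw sequence number to 'user'."""
    suffix = fnv1a_64(sequence_number) if hashed else sequence_number
    return "user" + str(suffix)

# The load phase counts 0, 1, 2, ...; each counter value becomes one key.
for i in range(5):
    print(i, build_key(i, hashed=False), build_key(i, hashed=True))
```

The counter still runs in order; only the resulting key names are scattered, so records are inserted in counting order but their keys are not in sorted order.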
This I can't say for sure; it depends on who is publishing the information and what is written there. |
So when threads do inserts, do they select records randomly? In YCSB, if we want an example with 50% read and 50% insert, what is the use case for this? And what about 50% insert and 50% readmodifywrite? |
If I change the values in workloadd like this, can I use it in YCSB with insert and read (50%, 50%)? Do I need a license for it? |
No. If you are doing
workloada is 50% read, 50% update, which is a common starting point for people. As for specific use cases, the predefined workloads are just starting points; you really need to identify what type of use case you want to simulate, then design a workload around that. The sample workload files have a short description of a possible use case for each one.
It looks like this is just a copy of workloadd with different percentage values? If so, I don't see any issue with it. |
In meteorology, the data is heavy, right? |
You'll really have to do your own research here to see what data use cases look like. |
OK. If I use a CPU that has 4 threads, can I define thread=100 on the load command? I ran load and run like this, but got these messages:
run:
and load:
|
Hi |
Yes, operation count can be larger or smaller than record count.
|
If the operation count can be larger than the record count, how does it run? There are fewer records than operations. |
@Hadi14 - operationcount and recordcount are unrelated. You can have operationcount be larger than recordcount, and you would run that the same way as you normally would. YCSB uses number generators to pick a new key value to perform each operation on. This means that the same key value may be chosen more than once. So, if your operationcount is higher than your recordcount, then certain record keys will be operated on more than once. There should be no issues with having an operationcount higher than the record count. |
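To make the point about repeated keys concrete, here is a small simulation. It assumes a uniform key chooser purely for illustration (YCSB supports other request distributions), and `simulate_key_choices` is a hypothetical helper, not part of YCSB.

```python
import random
from collections import Counter

def simulate_key_choices(recordcount, operationcount, seed=42):
    """Pick one key index per operation and count how often each key is hit."""
    rng = random.Random(seed)
    return Counter(rng.randrange(recordcount) for _ in range(operationcount))

# 1000 records, 5000 operations: more operations than records.
hits = simulate_key_choices(recordcount=1000, operationcount=5000)
repeated = sum(1 for n in hits.values() if n > 1)
print("distinct keys touched:", len(hits))
print("keys touched more than once:", repeated)
print("total operations:", sum(hits.values()))
```

With more operations than records, at least some keys must be chosen repeatedly, which is exactly the situation described above.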
OK. What is the benefit of setting target when what we always want is throughput? |
I'm sorry, I don't understand the question. |
With YCSB we usually want to benchmark in order to measure throughput, |
Ah. So, yes, you may be trying to measure throughput under certain workloads, and so throttling throughput seems counterproductive in your case. But YCSB can also conceivably be used as a constant load generator, in which case you would set your target ops/sec to send to your database. This could be useful to test other things, like how well your database handles prolonged constant load, or maybe failover scenarios under load. |
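The constant-load idea can be sketched with a simple pacing loop. This is a minimal sketch under stated assumptions, not YCSB's actual throttling code: `run_throttled` is a hypothetical helper, and the `clock` and `sleep` parameters exist only so the loop can be driven by a fake clock.

```python
import time

def run_throttled(operation, operationcount, target_ops_per_sec,
                  clock=time.monotonic, sleep=time.sleep):
    """Run operations no faster than target_ops_per_sec; return elapsed seconds."""
    interval = 1.0 / target_ops_per_sec
    start = clock()
    for i in range(operationcount):
        # Operation i is scheduled to start i intervals after the beginning.
        deadline = start + i * interval
        remaining = deadline - clock()
        if remaining > 0:
            sleep(remaining)
        operation()
    return clock() - start

# Five no-op "operations" paced at 100 ops/sec.
elapsed = run_throttled(lambda: None, operationcount=5, target_ops_per_sec=100)
print("elapsed seconds:", round(elapsed, 3))
```

Scheduling against absolute deadlines rather than sleeping a fixed interval after each operation keeps the average rate near the target even when individual operations are slow.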
Thank you. 2- What are 95thPercentileLatency and 99thPercentileLatency? I don't understand them. |
@kruthar Another question: why is the number of read operations not exactly the same as the number of update operations? |
YCSB uses a discrete generator which takes in the different operation proportions you specify and probabilistically chooses which operation should happen next depending on the relative proportions. What this means is that with properties such as:
each operation has a 50-50 chance of being a read or an update. YCSB effectively flips a coin to decide what each operation should be in this case. You are not guaranteed that you will get a perfect 50-50 operation split. It will be close but not exact. |
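The coin-flip behaviour described above can be sketched as a weighted random choice. This is an illustration of the idea, not YCSB's own generator; `choose_operation` and the proportion names are hypothetical.

```python
import random
from collections import Counter

def choose_operation(proportions, rng):
    """Pick one operation name with probability proportional to its weight."""
    total = sum(proportions.values())
    threshold = rng.random() * total
    cumulative = 0.0
    for name, weight in proportions.items():
        cumulative += weight
        if threshold < cumulative:
            return name
    return name  # guard against floating-point rounding at the upper edge

rng = random.Random(7)
proportions = {"READ": 0.5, "UPDATE": 0.5}
counts = Counter(choose_operation(proportions, rng) for _ in range(2000))
print(dict(counts))
```

Each call is an independent draw, so the totals land near 1000 and 1000 but there is nothing forcing them to be exactly equal.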
If we set a parameter, for example recordcount, on the command line, does the command line take priority, or the parameter in the workload file? Does it matter which one comes first or last? |
Yes, the command line takes priority. For questions like this, it doesn't hurt to just go ahead and try it yourself. |
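The precedence rule can be pictured as a simple merge in which command-line values overwrite values from the workload file. The sketch below uses plain dictionaries with made-up values; `effective_properties` is a hypothetical helper, not a YCSB function.

```python
def effective_properties(workload_file_props, command_line_props):
    """Start from the workload file, then let command-line values win."""
    merged = dict(workload_file_props)
    merged.update(command_line_props)
    return merged

# Hypothetical values for illustration.
workload_file_props = {"recordcount": "1000", "operationcount": "1000"}
command_line_props = {"recordcount": "5000"}

props = effective_properties(workload_file_props, command_line_props)
print(props)
```

Properties that appear only in the workload file are kept as they are; only the overlapping ones are replaced.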
Excuse me, I don't completely understand your question, but I ran a benchmark with 1 thread and an operation count of 2000: |
The benchmark with 2 threads has a larger RunTime than the one with 1 thread. Is that normal? |
Yes, that seems possible. It all depends on your workload configuration and the performance of your database. When you add a second thread you are doubling the load on your database. Each thread is going to perform the number of operations you specify in |
OK, so if we set operationcount=1000 and -threads 2, does each thread run 1000 operations separately? Then why, in the following result with 4000 operations, is the total number of operations still 4000? Unless this result is per thread.
|
Are you sure you changed the operation count to 1000? Earlier you said you were working with 2000 operationcount:
|
The operation count is the total for the given client run. The total is split up amongst the number of threads you specify. So with thread = 2 and count = 2k, a total of 2k operations are performed. This still might take longer than with 1 thread and 2k operations because there is overhead to running multiple threads (in YCSB and possibly in the data store driver), and 2 thousand is a small enough number that you may not overcome that overhead in throughput savings. |
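A small sketch of how a total operation count might be divided across threads. How any remainder is distributed is an assumption made for this example, and `split_operations` is a hypothetical helper rather than YCSB code.

```python
def split_operations(operationcount, threads):
    """Divide the total operation count across threads as evenly as possible."""
    base, extra = divmod(operationcount, threads)
    # Assumption for illustration: the first `extra` threads take one more operation.
    return [base + (1 if i < extra else 0) for i in range(threads)]

per_thread = split_operations(operationcount=2000, threads=2)
print("operations per thread:", per_thread)
print("total operations:", sum(per_thread))
```

Whatever the thread count, the per-thread shares add back up to the configured operationcount, which is why the reported total does not grow with the number of threads.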
@kruthar I meant 1000 as an example. |
@busbey explained threaded functionality. At this point I'm not sure what the question is? |
Sorry, when we load and run a workload many times, is the data loaded in a previous run erased? |
No, YCSB doesn't do anything to clean up data already in a datastore, whether from prior YCSB runs or elsewhere. |
I got this error. Is it caused by low memory?
Excuse me. |
You missed the error text. |
If you're having trouble getting Cassandra to run, you should seek help from the Cassandra community. Their user mailing list details are here: |
@busbey OK, thank you. |
Hi, on workload F I got this output:
|
readmodifywrite is actually just a measure of a read operation and a write operation on the same key. All read and update operations report their performance metrics, and readmodifywrite reports its total performance as well. This is why you have readmodifywrite, read, and update metrics. We have already talked about cleanup on this thread: each thread of the workload runs a cleanup operation at the end.
This is another case of probabilistic operation selection. We talked about this a few posts up. Please check my answer to your question about why the number of read and update operations are not exactly the same with 50-50 proportions. |
But wasn't workloada operating on the same key? In this test we have 2000 READ but 989 READ-MODIFY-WRITE and 989 UPDATE (989+989=1978), so what are the other 22 operations? |
As I said, the readmodifywrite operations are actually double counts of the read and update operations that are contained in the readmodifywrite. This means that 989 readmodifywrite operations consist of 989 read operations and 989 update operations. As you can see there are 989 update operations recorded. There are 2000 read operations recorded, this is because there are 989 read operations from readmodifywrite, and 1,011 plain read operations. This is because workloadf is 50% readmodifywrite, and 50% read. 1,011 is very close to 989, but not exact. This is to be expected. So, 989 + 1,011 = 2000 total operations which should be your |
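The double counting explained above can be checked with a few lines of arithmetic. The numbers are the ones quoted in this thread; `reported_counts` is a hypothetical helper written only for this check.

```python
def reported_counts(plain_reads, readmodifywrites):
    """Each readmodifywrite is also counted once as a READ and once as an UPDATE."""
    return {
        "READ": plain_reads + readmodifywrites,
        "UPDATE": readmodifywrites,
        "READ-MODIFY-WRITE": readmodifywrites,
        "total_operations": plain_reads + readmodifywrites,
    }

counts = reported_counts(plain_reads=1011, readmodifywrites=989)
for name, value in counts.items():
    print(name, value)
```

So the 2000 READ entries are not 2000 separate workload operations of type read; 989 of them are the read half of a readmodifywrite.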
Excuse me, so in this workload (read/readmodifywrite 50%/50%) there are two steps: 1- simple read (50%) and 2- readmodifywrite (50%).
Correct?
|
The ordering of the print outs is arbitrary. YCSB performs all the operations in a random order, so there really is no notion of 'step1 and step2'. Each readmodifywrite operation counts one for But as you pointed out there are actually: |
OK, understood, thank you :). |
Closing out since it seems like we've covered everything relevant to what workload parameters mean. |
@kruthar , considering the ordering of the prints is arbitrary, is getting cleanup before the operations valid, or is it a sign there's something wrong? |
Hi all,
Excuse me, I have a number of questions again :(
1- In workloads we have the recordcount argument. What is the difference between it and operationcount? They seem identical.
2- If we set readallfields=false and writeallfields=false, what happens?
3- What are the fieldlengthdistribution, scanlengthdistribution, and hotspotdatafraction arguments?