[SPARK-13131] [SQL] Use best and average time in benchmark #11018
Conversation
rang/filter/aggregate:             Avg Time(ms)    Avg Rate(M/s)    Relative Rate
-------------------------------------------------------------------------------
rang/filter/aggregate codegen=false    12509.22            41.91           1.00 X
rang/filter/aggregate codegen=true       846.38           619.45          14.78 X
The query was changed from .count() to .groupBy().sum().collect(); it was 20X for the previous query.
Test build #50541 has finished for PR 11018 at commit
@davies I think this depends on the error distribution. Let's assume that the measured running time of algorithm A is described by an additive model: a + W, where a is a constant indicating the ideal running time and W is a positive random variable describing system noise/overhead. We assume the same error distribution for algorithm B: b + W. Basically, we want to test which one is smaller (faster), a or b. One common way is to compare the sample mean values of the measured times.

If we agree on this model, which method is better mostly depends on the variance of the sample mean and of the sample min (first order statistic). We know that the variance of the sample mean is of order O(1/n) in the number of runs, while the variance of the sample min depends heavily on the shape of the error distribution.

Certainly we can do many runs, draw the empirical error distribution, and then tell which one is better for this case. But without good knowledge of the error distribution, using the sample mean is a safe bet, because we know its variance is of order O(1/n).
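The additive model above can be simulated. The sketch below (hypothetical code, not from the PR; the half-normal noise, outlier probability, and all constants are assumptions for illustration) draws noisy run times and compares how much the sample mean and the sample min fluctuate across repeated experiments.

```scala
import scala.util.Random

// Hypothetical simulation of the additive model: measured time = ideal + W,
// where W is positive noise with occasional large outliers.
object NoiseModelSketch {
  private val rng = new Random(42)

  // One measured run: ideal time plus half-normal noise plus a rare outlier.
  def noisyRun(idealMs: Double): Double =
    idealMs + math.abs(rng.nextGaussian()) * 5.0 +
      (if (rng.nextDouble() < 0.1) 50.0 else 0.0)

  // One experiment of n runs, summarized by (sample mean, sample min).
  def experiment(idealMs: Double, n: Int): (Double, Double) = {
    val runs = Seq.fill(n)(noisyRun(idealMs))
    (runs.sum / n, runs.min)
  }

  // Population variance of a sample of estimates.
  def variance(xs: Seq[Double]): Double = {
    val m = xs.sum / xs.size
    xs.map(x => (x - m) * (x - m)).sum / xs.size
  }

  def main(args: Array[String]): Unit = {
    val trials = Seq.fill(200)(experiment(100.0, 10))
    println(f"var(mean) = ${variance(trials.map(_._1))}%.3f, " +
            f"var(min) = ${variance(trials.map(_._2))}%.3f")
  }
}
```

Which estimator has the smaller spread depends entirely on the noise distribution chosen here, which is the point of the comment.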
@mengxr Thanks for the details, that makes sense. I ran a few tests; here is the distribution of W (outliers removed, in microseconds).

Because the number we care about most is the relative rate, I ran a few tests on this particular case; the relative rates of the first benchmark (range/filter/aggregate) are listed here. It seems that best time and median time are much better than mean time: the variance of best time (0.21) is a little better than that of median time (0.33), while the variance of mean time is 2.4. I think we should go with best time or median time. cc @rxin @nongli
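The three candidate estimators under discussion can be written down in a few lines. This is a hypothetical helper, not Spark's actual Benchmark code:

```scala
// Hypothetical helper computing the three estimators being compared:
// best (min), median, and mean of a list of per-run times.
object TimingStats {
  def best(timesMs: Seq[Double]): Double = timesMs.min

  def mean(timesMs: Seq[Double]): Double = timesMs.sum / timesMs.size

  // Middle element for odd-sized samples, average of the two middle
  // elements for even-sized samples.
  def median(timesMs: Seq[Double]): Double = {
    val s = timesMs.sorted
    val n = s.size
    if (n % 2 == 1) s(n / 2) else (s(n / 2 - 1) + s(n / 2)) / 2.0
  }
}
```

Unlike the mean, both best and median ignore the magnitude of outlier runs, which is why they look more stable in the histogram above.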
@davies What is the x axis? Runs of the benchmark? Do we know what the variance is per run? (error bars on your graph)
@nongli The x axis is per run (runs of the benchmark). The first chart is a histogram of (run time - best time, in microseconds) for each run within a benchmark.
@davies Thanks for plotting the histogram! The mean estimator is apparently not good here due to outliers. Median seems more stable, and I don't think we have enough numbers to confidently tell whether best time or median is better. Given the current data scale, it seems that median is a reasonable choice.
Sounds good, will go with median.
Test build #50637 has finished for PR 11018 at commit
@@ -62,13 +63,15 @@ private[spark] class Benchmark(
    val firstRate = results.head.avgRate
    // The results are going to be processor specific so it is useful to include that.
    println(Benchmark.getProcessorName())
    printf("%-30s %16s %16s %14s\n", name + ":", "Avg Time(ms)", "Avg Rate(M/s)", "Relative Rate")
    println("-------------------------------------------------------------------------------")
    printf("%-30s %16s %12s %13s %10s\n", name + ":", "median Time(ms)", "Rate(M/s)", "Per Row(ns)",
Can you capitalize the M -> "Median Time"
Will update this when merging.
LGTM
@mengxr Here are four runs for BroadcastHashJoin:
With median time, the improvements are 2.31X, 3.34X, 3.38X, and 2.01X. With best time, they are 2.12X, 1.97X, 2.0X, and 2.0X, which is much more stable than with median time. @mengxr @nongli So I still think we should use best time here. Also, keep only one digit after the decimal point.
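The stability claim can be checked directly on the speedup numbers quoted above. With a small variance helper (hypothetical code, not part of the PR), the median-based speedups (2.31, 3.34, 3.38, 2.01) have variance of roughly 0.37, while the best-time-based speedups (2.12, 1.97, 2.0, 2.0) come in around 0.003:

```scala
// Hypothetical helper: population variance, used to quantify how much
// the reported speedups fluctuate between benchmark runs.
object Stability {
  def variance(xs: Seq[Double]): Double = {
    val m = xs.sum / xs.size
    xs.map(x => (x - m) * (x - m)).sum / xs.size
  }
}
```

On these four runs the best-time speedups vary two orders of magnitude less than the median-time speedups, which is the argument for best time.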
LGTM
Test build #50680 has finished for PR 11018 at commit
Test build #2511 has finished for PR 11018 at commit
Merging this into master.
Test build #50697 has finished for PR 11018 at commit
benchmark.name,
"%5.0f / %4.0f" format (result.bestMs, result.avgMs),
"%10.1f" format result.bestRate,
"%6.1f" format (1000 / result.bestRate),
What does this "Per Row" mean?
Nanoseconds per row.
Should it be per iteration? We may execute multiple rows in one iteration.
Could be, but once we use batch mode, "per iteration" will be worse.
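The arithmetic behind the "Per Row(ns)" column: with a rate given in million rows per second, one row takes 1e9 / (rate * 1e6) = 1000 / rate nanoseconds, which is where the `1000 / result.bestRate` expression in the snippet above comes from. A small sketch (hypothetical helper name):

```scala
// The "Per Row(ns)" conversion: rate is in million rows/second,
// so one row costs 1e9 ns / (rate * 1e6 rows) = 1000 / rate ns.
object PerRow {
  def nsPerRow(rateMRowsPerSec: Double): Double = 1000.0 / rateMRowsPerSec
}
```

For example, the 619.45 M/s rate from the first benchmark table works out to about 1.6 ns per row.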
Best time is more stable than average time. Also added a column for nanoseconds per row (which could be used to estimate the contribution of each component in a query).
Having best time and average time together gives more information (we can see a kind of variance).
Rate, time per row, and relative rate are all calculated using best time.
The result looks like this:
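The measurement approach described in this PR can be sketched as a minimal timing loop. This is a hypothetical simplification, not Spark's actual Benchmark class; names like `MiniBenchmark` and `Result` are invented for illustration:

```scala
// Minimal sketch of measuring both the best and the average wall-clock
// time of a body over several iterations, as the PR description outlines.
object MiniBenchmark {
  final case class Result(bestMs: Double, avgMs: Double)

  def measure(iters: Int)(body: => Unit): Result = {
    val timesMs = (1 to iters).map { _ =>
      val start = System.nanoTime()
      body                                   // run the workload once
      (System.nanoTime() - start) / 1e6      // elapsed time in ms
    }
    Result(timesMs.min, timesMs.sum / timesMs.size)
  }
}
```

Reporting both numbers side by side, as in the `"%5.0f / %4.0f"` format above, lets a reader eyeball the run-to-run variance at no extra cost.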