[WIP] trim_left
investigation
#67
base: main
```
DDSketch/insert-single/1
    time:   [195.24 ns 196.81 ns 198.42 ns]
    thrpt:  [5.0399 Melem/s 5.0811 Melem/s 5.1218 Melem/s]
  change:
    time:   [-12.248% -11.332% -10.298%] (p = 0.00 < 0.05)
    thrpt:  [+11.480% +12.780% +13.957%]
    Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild

DDSketch/insert-single/10
    time:   [926.41 ns 932.76 ns 938.95 ns]
    thrpt:  [10.650 Melem/s 10.721 Melem/s 10.794 Melem/s]
  change:
    time:   [-46.653% -46.185% -45.738%] (p = 0.00 < 0.05)
    thrpt:  [+84.292% +85.823% +87.452%]
    Performance has improved.

DDSketch/insert-single/100
    time:   [9.4161 µs 9.4404 µs 9.4644 µs]
    thrpt:  [10.566 Melem/s 10.593 Melem/s 10.620 Melem/s]
  change:
    time:   [-62.182% -61.795% -61.405%] (p = 0.00 < 0.05)
    thrpt:  [+159.10% +161.75% +164.42%]
    Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

DDSketch/insert-single/1000
    time:   [101.41 µs 101.60 µs 101.79 µs]
    thrpt:  [9.8241 Melem/s 9.8427 Melem/s 9.8612 Melem/s]
  change:
    time:   [-37.791% -37.406% -37.021%] (p = 0.00 < 0.05)
    thrpt:  [+58.783% +59.759% +60.747%]
    Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

DDSketch/insert-single/10000
    time:   [710.74 µs 714.54 µs 720.15 µs]
    thrpt:  [13.886 Melem/s 13.995 Melem/s 14.070 Melem/s]
  change:
    time:   [-13.108% -12.535% -11.676%] (p = 0.00 < 0.05)
    thrpt:  [+13.220% +14.332% +15.086%]
    Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

DDSketch/insert-many/1
    time:   [218.42 ns 220.55 ns 223.97 ns]
    thrpt:  [4.4648 Melem/s 4.5340 Melem/s 4.5782 Melem/s]
  change:
    time:   [-2.4500% -1.5756% -0.5105%] (p = 0.00 < 0.05)
    thrpt:  [+0.5132% +1.6008% +2.5116%]
    Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

DDSketch/insert-many/10
    time:   [718.25 ns 722.67 ns 727.43 ns]
    thrpt:  [13.747 Melem/s 13.838 Melem/s 13.923 Melem/s]
  change:
    time:   [-1.9143% -1.0697% -0.2484%] (p = 0.01 < 0.05)
    thrpt:  [+0.2491% +1.0813% +1.9517%]
    Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) low mild
  1 (1.00%) high mild

DDSketch/insert-many/100
    time:   [2.6682 µs 2.6761 µs 2.6842 µs]
    thrpt:  [37.255 Melem/s 37.367 Melem/s 37.479 Melem/s]
  change:
    time:   [+0.5251% +1.0608% +1.5666%] (p = 0.00 < 0.05)
    thrpt:  [-1.5424% -1.0496% -0.5223%]
    Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

DDSketch/insert-many/1000
    time:   [15.579 µs 15.604 µs 15.628 µs]
    thrpt:  [63.988 Melem/s 64.088 Melem/s 64.187 Melem/s]
  change:
    time:   [+3.4126% +3.7884% +4.1394%] (p = 0.00 < 0.05)
    thrpt:  [-3.9749% -3.6501% -3.3000%]
    Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

DDSketch/insert-many/10000
    time:   [135.18 µs 135.51 µs 135.86 µs]
    thrpt:  [73.604 Melem/s 73.797 Melem/s 73.975 Melem/s]
  change:
    time:   [+3.3188% +3.8789% +4.4002%] (p = 0.00 < 0.05)
    thrpt:  [-4.2147% -3.7340% -3.2122%]
    Performance has regressed.
```
Regression Detector (Saluki)

Regression Detector Results

Run ID: 65964b84-ff45-4b5b-9413-54fce30909cf

Performance changes are noted in the perf column of each table:

No significant changes in experiment optimization goals

Confidence level: 90.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

| perf | experiment | goal | Δ mean % | Δ mean % CI |
|---|---|---|---|---|
| ➖ | uds_dogstatsd_to_api | ingress throughput | -0.00 | [-0.20, +0.20] |
| ➖ | distribution_metrics | memory utilization | -1.19 | [-1.44, -0.93] |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
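
The three criteria above amount to a simple conjunction. Below is a minimal sketch of that decision rule, not the detector's actual code: the `Experiment` struct, its field names, and `is_regression` are hypothetical names introduced for illustration, and the 5.00% tolerance is taken from the first criterion.

```rust
/// One row of a results table. The struct and its field names are
/// illustrative assumptions, not the regression detector's real types.
struct Experiment {
    /// Estimated "Δ mean %".
    delta_mean_pct: f64,
    /// Lower and upper bounds of the 90% "Δ mean % CI".
    ci: (f64, f64),
    /// Whether the experiment's configuration marks it "erratic".
    erratic: bool,
}

/// Effect size tolerance from the first criterion, in percent.
const EFFECT_SIZE_TOLERANCE_PCT: f64 = 5.0;

/// Returns true only when all three criteria hold.
fn is_regression(e: &Experiment) -> bool {
    let big_enough = e.delta_mean_pct.abs() >= EFFECT_SIZE_TOLERANCE_PCT;
    let ci_excludes_zero = e.ci.0 > 0.0 || e.ci.1 < 0.0;
    big_enough && ci_excludes_zero && !e.erratic
}
```

Applied to the `distribution_metrics` row above (Δ mean % of -1.19, CI [-1.44, -0.93]), the interval excludes zero but the change is smaller than 5.00%, so the row is not flagged.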
Regression Detector (DogStatsD)

Regression Detector Results

Run ID: f1c70947-f053-4736-ae75-38d08fb37b0f

Performance changes are noted in the perf column of each table:

No significant changes in experiment optimization goals

Confidence level: 90.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

| perf | experiment | goal | Δ mean % | Δ mean % CI |
|---|---|---|---|---|
| ➖ | uds_dogstatsd_to_api | ingress throughput | -0.00 | [-0.20, +0.20] |
| ➖ | distribution_metrics | memory utilization | -0.06 | [-0.20, +0.08] |
There's something wrong with the 'trim left' function in the agent's sketch implementation, but I was never quite able to pin it down. The commentary in this codebase got me thinking about it again.
Here it is. This shouldn't happen:
Overflowing while at the bin limit causes the bin limit to be ignored.
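
To make the failure mode concrete, here is a toy model of how a left trim can end up exceeding its own limit. This is a sketch under stated assumptions, not the agent's code: the `Bin` layout (an `i16` key with a `u16` count), the name `trim_left_naive`, and the rule that a count too large for `u16` spills into an extra bin are all assumptions made for illustration.

```rust
/// Toy bin: a key plus a 16-bit count (assumed layout, illustration only).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Bin {
    key: i16,
    count: u16,
}

const MAX_COUNT: u32 = u16::MAX as u32;

/// Hypothetical trim: fold the lowest bins into the first kept bin so that
/// at most `limit` bins remain. Any folded count that does not fit in a u16
/// is emitted as an extra full bin, and those extra bins are never counted
/// against `limit`.
fn trim_left_naive(bins: &[Bin], limit: usize) -> Vec<Bin> {
    if limit == 0 || bins.len() <= limit {
        return bins.to_vec();
    }
    let n_remove = bins.len() - limit;
    let mut carried: u32 = 0;
    let mut out = Vec::new();
    for b in &bins[..n_remove] {
        carried += b.count as u32;
        if carried > MAX_COUNT {
            out.push(Bin { key: b.key, count: u16::MAX });
            carried -= MAX_COUNT;
        }
    }
    let kept = &bins[n_remove..];
    let mut first = kept[0];
    carried += first.count as u32;
    if carried > MAX_COUNT {
        out.push(Bin { key: first.key, count: u16::MAX });
        carried -= MAX_COUNT;
    }
    // `carried` is at most MAX_COUNT here, so the cast cannot truncate.
    first.count = carried as u16;
    out.push(first);
    out.extend_from_slice(&kept[1..]);
    out
}

/// Sum of all counts, used to check that trimming loses no samples.
fn total(bins: &[Bin]) -> u64 {
    bins.iter().map(|b| b.count as u64).sum()
}
```

In this model, trimming three bins that each hold `u16::MAX` samples down to a limit of 2 preserves the total count but still returns three bins, because the spilled bin is added after the limit has already been applied. A trim that honours the limit would have to count spilled bins against it as well.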