[WIP] trim_left investigation #67

Draft
wants to merge 8 commits into
base: main

GeorgeHahn
Contributor

There's something wrong with the 'trim left' function in the agent's sketch implementation, but I was never quite able to pin it down. The commentary in this codebase got me thinking about it again.

Here it is. This shouldn't happen:

insert: Value::NFloats(65535 * 5, 0.0),
expected: "0:65535 0:65535 0:65535 0:65535 0:65535",
max_bins: 3,

When a bin's count overflows while the sketch is already at its bin limit, the limit is ignored: the case above ends up with five bins even though max_bins is 3.
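To make the arithmetic behind that test case concrete, here is a minimal sketch. It assumes each bin stores its count in a `u16` (which the repeated `0:65535` entries suggest), and every name in it (`split_into_bins`, `lossless_capacity`, `render`) is hypothetical. It is not the agent's code or the code in this PR.

```rust
// Hypothetical model for illustration only; not the agent's or this PR's code.
// Assumption: a bin's count is a u16, so one bin holds at most 65535 observations.
const MAX_BIN_COUNT: u32 = u16::MAX as u32;

/// Split `n` observations that all map to `key` into bins, filling each bin
/// up to the u16 limit before starting the next one.
fn split_into_bins(key: i16, mut n: u32) -> Vec<(i16, u16)> {
    let mut bins = Vec::new();
    while n > 0 {
        let take = n.min(MAX_BIN_COUNT);
        bins.push((key, take as u16));
        n -= take;
    }
    bins
}

/// Largest number of observations that `max_bins` bins can hold without
/// dropping or saturating any counts.
fn lossless_capacity(max_bins: u32) -> u32 {
    max_bins * MAX_BIN_COUNT
}

/// Render bins in the same "key:count" form used by the test case.
fn render(bins: &[(i16, u16)]) -> String {
    bins.iter()
        .map(|(k, c)| format!("{k}:{c}"))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Under these assumptions, `render(&split_into_bins(0, 65535 * 5))` produces the five-bin string from the test case, while `lossless_capacity(3)` is 196605, well below the 327675 values inserted. So a trim that preserves every count cannot also honor a limit of three bins here; one of the two has to give.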

```
DDSketch/insert-single/1
                        time:   [195.24 ns 196.81 ns 198.42 ns]
                        thrpt:  [5.0399 Melem/s 5.0811 Melem/s 5.1218 Melem/s]
                 change:
                        time:   [-12.248% -11.332% -10.298%] (p = 0.00 < 0.05)
                        thrpt:  [+11.480% +12.780% +13.957%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
DDSketch/insert-single/10
                        time:   [926.41 ns 932.76 ns 938.95 ns]
                        thrpt:  [10.650 Melem/s 10.721 Melem/s 10.794 Melem/s]
                 change:
                        time:   [-46.653% -46.185% -45.738%] (p = 0.00 < 0.05)
                        thrpt:  [+84.292% +85.823% +87.452%]
                        Performance has improved.
DDSketch/insert-single/100
                        time:   [9.4161 µs 9.4404 µs 9.4644 µs]
                        thrpt:  [10.566 Melem/s 10.593 Melem/s 10.620 Melem/s]
                 change:
                        time:   [-62.182% -61.795% -61.405%] (p = 0.00 < 0.05)
                        thrpt:  [+159.10% +161.75% +164.42%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
DDSketch/insert-single/1000
                        time:   [101.41 µs 101.60 µs 101.79 µs]
                        thrpt:  [9.8241 Melem/s 9.8427 Melem/s 9.8612 Melem/s]
                 change:
                        time:   [-37.791% -37.406% -37.021%] (p = 0.00 < 0.05)
                        thrpt:  [+58.783% +59.759% +60.747%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
DDSketch/insert-single/10000
                        time:   [710.74 µs 714.54 µs 720.15 µs]
                        thrpt:  [13.886 Melem/s 13.995 Melem/s 14.070 Melem/s]
                 change:
                        time:   [-13.108% -12.535% -11.676%] (p = 0.00 < 0.05)
                        thrpt:  [+13.220% +14.332% +15.086%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

DDSketch/insert-many/1  time:   [218.42 ns 220.55 ns 223.97 ns]
                        thrpt:  [4.4648 Melem/s 4.5340 Melem/s 4.5782 Melem/s]
                 change:
                        time:   [-2.4500% -1.5756% -0.5105%] (p = 0.00 < 0.05)
                        thrpt:  [+0.5132% +1.6008% +2.5116%]
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe
DDSketch/insert-many/10 time:   [718.25 ns 722.67 ns 727.43 ns]
                        thrpt:  [13.747 Melem/s 13.838 Melem/s 13.923 Melem/s]
                 change:
                        time:   [-1.9143% -1.0697% -0.2484%] (p = 0.01 < 0.05)
                        thrpt:  [+0.2491% +1.0813% +1.9517%]
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) low mild
  1 (1.00%) high mild
DDSketch/insert-many/100
                        time:   [2.6682 µs 2.6761 µs 2.6842 µs]
                        thrpt:  [37.255 Melem/s 37.367 Melem/s 37.479 Melem/s]
                 change:
                        time:   [+0.5251% +1.0608% +1.5666%] (p = 0.00 < 0.05)
                        thrpt:  [-1.5424% -1.0496% -0.5223%]
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe
DDSketch/insert-many/1000
                        time:   [15.579 µs 15.604 µs 15.628 µs]
                        thrpt:  [63.988 Melem/s 64.088 Melem/s 64.187 Melem/s]
                 change:
                        time:   [+3.4126% +3.7884% +4.1394%] (p = 0.00 < 0.05)
                        thrpt:  [-3.9749% -3.6501% -3.3000%]
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
DDSketch/insert-many/10000
                        time:   [135.18 µs 135.51 µs 135.86 µs]
                        thrpt:  [73.604 Melem/s 73.797 Melem/s 73.975 Melem/s]
                 change:
                        time:   [+3.3188% +3.8789% +4.4002%] (p = 0.00 < 0.05)
                        thrpt:  [-4.2147% -3.7340% -3.2122%]
                        Performance has regressed.
```

pr-commenter bot commented May 21, 2024

Regression Detector (Saluki)

Regression Detector Results

Run ID: 65964b84-ff45-4b5b-9413-54fce30909cf
Baseline: e44858f
Comparison: 76fedac

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Fine details of change detection per experiment

| perf | experiment | goal | Δ mean % | Δ mean % CI |
|------|------------|------|----------|-------------|
| ➖ | uds_dogstatsd_to_api | ingress throughput | -0.00 | [-0.20, +0.20] |
| ➖ | distribution_metrics | memory utilization | -1.19 | [-1.44, -0.93] |

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".
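The three criteria above can be sketched as a small predicate. This is an illustrative encoding only: the `Experiment` struct, its field names, and `is_regression` are hypothetical and are not the regression detector's actual code.

```rust
// Hypothetical encoding of the three criteria; not the detector's real API.
struct Experiment {
    /// Estimated Δ mean %, comparison minus baseline.
    delta_mean_pct: f64,
    /// Lower and upper bounds of the "Δ mean % CI".
    ci: (f64, f64),
    /// Whether the experiment's configuration marks it "erratic".
    erratic: bool,
}

/// Returns true only when all three criteria hold.
fn is_regression(e: &Experiment, tolerance_pct: f64) -> bool {
    // 1. The effect size is at least the tolerance.
    let big_enough = e.delta_mean_pct.abs() >= tolerance_pct;
    // 2. The confidence interval lies entirely on one side of zero.
    let ci_excludes_zero = e.ci.0 > 0.0 || e.ci.1 < 0.0;
    // 3. The experiment is not marked erratic.
    big_enough && ci_excludes_zero && !e.erratic
}
```

Applied to the rows reported above with a tolerance of 5.00 (and assuming neither experiment is marked erratic, which the report does not state), distribution_metrics at -1.19 with CI [-1.44, -0.93] satisfies the second criterion but fails the first, so it is not flagged.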

pr-commenter bot commented May 21, 2024

Regression Detector (DogStatsD)

Regression Detector Results

Run ID: f1c70947-f053-4736-ae75-38d08fb37b0f
Baseline: 7.52.0
Comparison: 7.52.1

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Fine details of change detection per experiment

| perf | experiment | goal | Δ mean % | Δ mean % CI |
|------|------------|------|----------|-------------|
| ➖ | uds_dogstatsd_to_api | ingress throughput | -0.00 | [-0.20, +0.20] |
| ➖ | distribution_metrics | memory utilization | -0.06 | [-0.20, +0.08] |

Base automatically changed from georgehahn/faster-single-key-slow-path to main May 23, 2024 13:42
@tobz added labels on Jul 17, 2024: area/core (Core functionality, event model, etc.), type/bug (Bug fixes.), effort/complex (Involves complicated changes that require guidance and careful review.)