[WIP] `trim_left` investigation #67

GeorgeHahn · 2024-05-21T02:17:20Z

There's something wrong with the 'trim left' function in the agent's sketch implementation, but I was never quite able to pin it down. The commentary in this codebase got me thinking about it again.

Here it is. This shouldn't happen:

insert: Value::NFloats(65535 * 5, 0.0),
expected: "0:65535 0:65535 0:65535 0:65535 0:65535",
max_bins: 3,

Overflowing a bin while already at the bin limit causes the bin limit to be ignored: the expected output above has five bins of 65535 each even though max_bins is 3.
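To make the case above concrete, here is a minimal sketch of why the count cannot fit in three bins. It assumes each bin stores its count as a `u16`, which the 65535 cap suggests. `Bin`, `append_overflowing`, and `render` are hypothetical names for illustration, not the actual sketch code.

```rust
// Hypothetical, simplified model of a sketch bin: a key and a u16 count.
// This is an illustration only, not the real implementation.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Bin {
    k: i16,
    n: u16,
}

/// Append `n` samples for key `k`, spilling into additional bins
/// whenever a single bin's count would exceed u16::MAX.
fn append_overflowing(bins: &mut Vec<Bin>, k: i16, mut n: u64) {
    while n > 0 {
        let take = n.min(u64::from(u16::MAX)) as u16;
        bins.push(Bin { k, n: take });
        n -= u64::from(take);
    }
}

/// Render bins in the same "key:count" form the test case uses.
fn render(bins: &[Bin]) -> String {
    bins.iter()
        .map(|b| format!("{}:{}", b.k, b.n))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let max_bins = 3;
    let mut bins = Vec::new();
    // Same input as the test case: 65535 * 5 samples of 0.0 (key 0).
    append_overflowing(&mut bins, 0, 65535 * 5);
    println!("{}", render(&bins));
    println!("bins = {}, max_bins = {}", bins.len(), max_bins);
}
```

Under that assumption, three bins can hold at most 3 * 65535 samples, so any trimming step that merges saturated bins must either exceed `max_bins` or drop counts.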

```
DDSketch/insert-single/1
                        time:   [195.24 ns 196.81 ns 198.42 ns]
                        thrpt:  [5.0399 Melem/s 5.0811 Melem/s 5.1218 Melem/s]
                 change:
                        time:   [-12.248% -11.332% -10.298%] (p = 0.00 < 0.05)
                        thrpt:  [+11.480% +12.780% +13.957%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
DDSketch/insert-single/10
                        time:   [926.41 ns 932.76 ns 938.95 ns]
                        thrpt:  [10.650 Melem/s 10.721 Melem/s 10.794 Melem/s]
                 change:
                        time:   [-46.653% -46.185% -45.738%] (p = 0.00 < 0.05)
                        thrpt:  [+84.292% +85.823% +87.452%]
                        Performance has improved.
DDSketch/insert-single/100
                        time:   [9.4161 µs 9.4404 µs 9.4644 µs]
                        thrpt:  [10.566 Melem/s 10.593 Melem/s 10.620 Melem/s]
                 change:
                        time:   [-62.182% -61.795% -61.405%] (p = 0.00 < 0.05)
                        thrpt:  [+159.10% +161.75% +164.42%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
DDSketch/insert-single/1000
                        time:   [101.41 µs 101.60 µs 101.79 µs]
                        thrpt:  [9.8241 Melem/s 9.8427 Melem/s 9.8612 Melem/s]
                 change:
                        time:   [-37.791% -37.406% -37.021%] (p = 0.00 < 0.05)
                        thrpt:  [+58.783% +59.759% +60.747%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
DDSketch/insert-single/10000
                        time:   [710.74 µs 714.54 µs 720.15 µs]
                        thrpt:  [13.886 Melem/s 13.995 Melem/s 14.070 Melem/s]
                 change:
                        time:   [-13.108% -12.535% -11.676%] (p = 0.00 < 0.05)
                        thrpt:  [+13.220% +14.332% +15.086%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
DDSketch/insert-many/1
                        time:   [218.42 ns 220.55 ns 223.97 ns]
                        thrpt:  [4.4648 Melem/s 4.5340 Melem/s 4.5782 Melem/s]
                 change:
                        time:   [-2.4500% -1.5756% -0.5105%] (p = 0.00 < 0.05)
                        thrpt:  [+0.5132% +1.6008% +2.5116%]
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe
DDSketch/insert-many/10
                        time:   [718.25 ns 722.67 ns 727.43 ns]
                        thrpt:  [13.747 Melem/s 13.838 Melem/s 13.923 Melem/s]
                 change:
                        time:   [-1.9143% -1.0697% -0.2484%] (p = 0.01 < 0.05)
                        thrpt:  [+0.2491% +1.0813% +1.9517%]
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) low mild
  1 (1.00%) high mild
DDSketch/insert-many/100
                        time:   [2.6682 µs 2.6761 µs 2.6842 µs]
                        thrpt:  [37.255 Melem/s 37.367 Melem/s 37.479 Melem/s]
                 change:
                        time:   [+0.5251% +1.0608% +1.5666%] (p = 0.00 < 0.05)
                        thrpt:  [-1.5424% -1.0496% -0.5223%]
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe
DDSketch/insert-many/1000
                        time:   [15.579 µs 15.604 µs 15.628 µs]
                        thrpt:  [63.988 Melem/s 64.088 Melem/s 64.187 Melem/s]
                 change:
                        time:   [+3.4126% +3.7884% +4.1394%] (p = 0.00 < 0.05)
                        thrpt:  [-3.9749% -3.6501% -3.3000%]
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
DDSketch/insert-many/10000
                        time:   [135.18 µs 135.51 µs 135.86 µs]
                        thrpt:  [73.604 Melem/s 73.797 Melem/s 73.975 Melem/s]
                 change:
                        time:   [+3.3188% +3.8789% +4.4002%] (p = 0.00 < 0.05)
                        thrpt:  [-4.2147% -3.7340% -3.2122%]
                        Performance has regressed.
```
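For reading the numbers above, the following sketch shows how the `thrpt` figures appear to relate to the `time` figures: throughput is the element count divided by the elapsed time, and a relative throughput change is the inverse of the relative time change. The helper names are hypothetical, and the assumption that the number after the benchmark name is the element count per iteration is an inference from the output.

```rust
/// Throughput in millions of elements per second, given an element
/// count and an elapsed time in nanoseconds.
/// (elements / ns) is Gelem/s, so multiply by 1000 for Melem/s.
fn melem_per_sec(elements: u64, nanos: f64) -> f64 {
    elements as f64 / nanos * 1e3
}

/// Relative throughput change (percent) implied by a relative time
/// change (percent): throughput scales with 1 / time.
fn thrpt_change_pct(time_change_pct: f64) -> f64 {
    (1.0 / (1.0 + time_change_pct / 100.0) - 1.0) * 100.0
}

fn main() {
    // DDSketch/insert-single/10: median time 932.76 ns, assumed 10 elements.
    println!("{:.3} Melem/s", melem_per_sec(10, 932.76));
    // Same benchmark: median time change of -46.185%.
    println!("{:+.2}%", thrpt_change_pct(-46.185));
}
```

Both values land close to the reported medians (10.721 Melem/s and +85.823%); small differences come from rounding in the printed inputs.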

pr-commenter · 2024-05-21T02:36:41Z

Regression Detector (Saluki)

Regression Detector Results

Run ID: 65964b84-ff45-4b5b-9413-54fce30909cf
Baseline: e44858f
Comparison: 76fedac

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI
➖	uds_dogstatsd_to_api	ingress throughput	-0.00	[-0.20, +0.20]
➖	distribution_metrics	memory utilization	-1.19	[-1.44, -0.93]

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".
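The three criteria above can be sketched as a single predicate. This is an illustration only: `Experiment` and `is_regression` are hypothetical names, not the detector's actual code, and the `erratic` flag is assumed to be `false` for both experiments since the table does not show it.

```rust
/// Hypothetical record for one row of the results table.
struct Experiment {
    name: &'static str,
    delta_mean_pct: f64,
    /// Confidence interval for the delta mean, as (low, high) in percent.
    ci: (f64, f64),
    /// Whether the experiment's configuration marks it "erratic".
    erratic: bool,
}

/// A change is flagged only if all three criteria hold:
/// the effect is large enough, the CI excludes zero, and the
/// experiment is not marked erratic.
fn is_regression(e: &Experiment, tolerance_pct: f64) -> bool {
    let big_enough = e.delta_mean_pct.abs() >= tolerance_pct;
    let ci_excludes_zero = e.ci.0 > 0.0 || e.ci.1 < 0.0;
    big_enough && ci_excludes_zero && !e.erratic
}

fn main() {
    // Rows from the table above; `erratic: false` is an assumption.
    let rows = [
        Experiment { name: "uds_dogstatsd_to_api", delta_mean_pct: -0.00, ci: (-0.20, 0.20), erratic: false },
        Experiment { name: "distribution_metrics", delta_mean_pct: -1.19, ci: (-1.44, -0.93), erratic: false },
    ];
    for e in &rows {
        println!("{}: flagged = {}", e.name, is_regression(e, 5.0));
    }
}
```

Note that `distribution_metrics` has a confidence interval that excludes zero, but its effect size is below the 5.00% tolerance, so it is not flagged.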

pr-commenter · 2024-05-21T02:37:08Z

Regression Detector (DogStatsD)

Regression Detector Results

Run ID: f1c70947-f053-4736-ae75-38d08fb37b0f
Baseline: 7.52.0
Comparison: 7.52.1

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI
➖	uds_dogstatsd_to_api	ingress throughput	-0.00	[-0.20, +0.20]
➖	distribution_metrics	memory utilization	-0.06	[-0.20, +0.08]

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".

GeorgeHahn added 8 commits May 20, 2024 19:29

Add large batched points allocation test

4289572

Add insertion tests

1ba598e

Add an even faster slow path for single point insert

fe75834

clarify comment & add test case

58af76c

clippies in tests

c96e0af

Create a test showing the trimLeft overflow bug

dcb706a

Add a few more test cases

76fedac

Base automatically changed from georgehahn/faster-single-key-slow-path to main May 23, 2024 13:42

tobz mentioned this pull request Jun 7, 2024

conversion to sketch with too many bins vectordotdev/vector#20619

Open

tobz added labels area/core (Core functionality, event model, etc.), type/bug (Bug fixes.), and effort/complex (Involves complicated changes that require guidance and careful review.) Jul 17, 2024
