
Conversation

@crepererum crepererum commented Nov 28, 2022

Which issue does this PR close?

Closes #3940.

Rationale for this change

Ensure that users don't run out of memory while performing group-by operations. This is especially important for servers and multi-tenant systems.

What changes are included in this PR?

Same as #4371 and #4202 but for AggregateStream (used when there are no group keys).
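
For context, here is a minimal sketch of the pattern this and the earlier PRs follow: the stream tracks how large its accumulator state has grown and charges that growth against a shared byte budget, failing instead of allocating past the limit. All names below are illustrative, not the actual DataFusion types touched by this PR.

```rust
// Illustrative sketch only; type and method names are hypothetical and do not
// mirror the real DataFusion memory-manager API used by this PR.
use std::sync::atomic::{AtomicUsize, Ordering};

/// A shared byte budget that stream reservations are charged against.
struct MemoryPool {
    limit: usize,
    used: AtomicUsize,
}

impl MemoryPool {
    /// Try to reserve `bytes` more; fail with a "resources exhausted"-style
    /// error instead of exceeding the configured limit.
    fn try_grow(&self, bytes: usize) -> Result<(), String> {
        let prev = self.used.fetch_add(bytes, Ordering::SeqCst);
        if prev + bytes > self.limit {
            self.used.fetch_sub(bytes, Ordering::SeqCst); // roll back
            return Err(format!(
                "Resources exhausted: would use {} of {} bytes",
                prev + bytes,
                self.limit
            ));
        }
        Ok(())
    }
}

/// The no-group aggregation state re-checks its reservation whenever the
/// accumulators grow while consuming input batches.
struct AggregateState<'a> {
    pool: &'a MemoryPool,
    reserved: usize,
}

impl AggregateState<'_> {
    fn update_reservation(&mut self, state_size: usize) -> Result<(), String> {
        if state_size > self.reserved {
            self.pool.try_grow(state_size - self.reserved)?;
            self.reserved = state_size;
        }
        Ok(())
    }
}
```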

Are these changes tested?

The existing test_oom test was extended. Benchmark results:

❯ cargo bench -p datafusion --bench aggregate_query_sql -- --baseline issue3940e-pre
    Finished bench [optimized] target(s) in 0.09s
     Running benches/aggregate_query_sql.rs (target/release/deps/aggregate_query_sql-6efc67fad0362d05)
aggregate_query_no_group_by 15 12
                        time:   [690.60 µs 691.61 µs 692.75 µs]
                        change: [+0.2423% +0.7005% +1.1322%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe

aggregate_query_no_group_by_min_max_f64
                        time:   [628.35 µs 630.25 µs 632.41 µs]
                        change: [-0.0124% +0.5867% +1.1382%] (p = 0.04 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  4 (4.00%) high mild
  5 (5.00%) high severe

aggregate_query_no_group_by_count_distinct_wide
                        time:   [2.5104 ms 2.5292 ms 2.5484 ms]
                        change: [-0.0360% +0.9941% +2.1057%] (p = 0.07 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Benchmarking aggregate_query_no_group_by_count_distinct_narrow: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.7s, enable flat sampling, or reduce sample count to 50.
aggregate_query_no_group_by_count_distinct_narrow
                        time:   [1.7087 ms 1.7170 ms 1.7250 ms]
                        change: [+0.7899% +1.9667% +3.0347%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

aggregate_query_group_by
                        time:   [2.2304 ms 2.2478 ms 2.2671 ms]
                        change: [-0.4475% +0.6787% +1.7891%] (p = 0.21 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

Benchmarking aggregate_query_group_by_with_filter: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.7s, enable flat sampling, or reduce sample count to 60.
aggregate_query_group_by_with_filter
                        time:   [1.1318 ms 1.1354 ms 1.1395 ms]
                        change: [-0.8011% -0.0882% +0.5275%] (p = 0.81 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  9 (9.00%) high mild
  3 (3.00%) high severe

aggregate_query_group_by_u64 15 12
                        time:   [2.2595 ms 2.2757 ms 2.2925 ms]
                        change: [-0.2947% +0.7316% +1.7738%] (p = 0.15 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild

Benchmarking aggregate_query_group_by_with_filter_u64 15 12: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.8s, enable flat sampling, or reduce sample count to 60.
aggregate_query_group_by_with_filter_u64 15 12
                        time:   [1.1343 ms 1.1370 ms 1.1401 ms]
                        change: [+0.1610% +0.6333% +1.2276%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe

aggregate_query_group_by_u64_multiple_keys
                        time:   [14.649 ms 14.972 ms 15.301 ms]
                        change: [-2.8133% +0.2410% +3.4242%] (p = 0.87 > 0.05)
                        No change in performance detected.

aggregate_query_approx_percentile_cont_on_u64
                        time:   [3.7312 ms 3.7587 ms 3.7860 ms]
                        change: [-1.7540% -0.6290% +0.4997%] (p = 0.26 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

aggregate_query_approx_percentile_cont_on_f32
                        time:   [3.1705 ms 3.1969 ms 3.2240 ms]
                        change: [-1.0292% +0.0460% +1.1440%] (p = 0.94 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

TL;DR: no relevant performance changes!

Are there any user-facing changes?

The no-group aggregation operator can now emit a ResourceExhausted error if it runs out of memory. Note that the error is somewhat nested/wrapped due to #4172.
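
Because of that wrapping, callers often cannot match a single error variant directly; a pragmatic way to detect the condition in tests is to inspect the rendered message. The exact message text below is an assumption, not a stable guarantee.

```rust
/// Illustrative helper: treat any error whose rendered message mentions
/// resource exhaustion as an out-of-memory failure from the aggregation.
/// The "Resources exhausted" text is an assumption about the Display output,
/// not a stable API contract.
fn is_resources_exhausted(err: &impl std::fmt::Display) -> bool {
    err.to_string().contains("Resources exhausted")
}
```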

@github-actions github-actions bot added the core Core DataFusion crate label Nov 28, 2022
@alamb alamb requested a review from yjshen November 28, 2022 17:11

@alamb alamb left a comment

Looks great to me -- thank you @crepererum -- I also plan to test this as part of #4404


-    for (version, aggregates) in [(1, aggregates_v1), (2, aggregates_v2)] {
+    for (version, groups, aggregates) in [
+        (0, groups_none, aggregates_v0),

👍 this is the test coverage 👍
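
The referenced change parameterizes the existing OOM test over an additional "version 0" case with no group-by keys, so the same memory-limit assertion now also covers AggregateStream. Roughly, and purely as an illustration (the real test_oom builds DataFusion plans against a memory-constrained runtime, which is elided and stubbed out here), the loop has this shape:

```rust
// Purely illustrative shape; `run_aggregation_with_tiny_memory_limit` is a
// stand-in for building and executing a plan under a small memory budget.
fn run_aggregation_with_tiny_memory_limit(_groups: &[&str]) -> Result<(), String> {
    Err("Resources exhausted: failed to allocate".to_string())
}

#[test]
fn test_oom_shape() {
    let groups_none: Vec<&str> = vec![];
    let groups_some = vec!["host"];

    // Version 0 has no group keys and therefore exercises AggregateStream;
    // the other versions exercise the grouped aggregation paths.
    for (version, groups) in [(0, groups_none), (1, groups_some)] {
        let err = run_aggregation_with_tiny_memory_limit(&groups)
            .expect_err("should run out of memory");
        assert!(
            err.contains("Resources exhausted"),
            "version {version}: unexpected error: {err}"
        );
    }
}
```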

alamb commented Nov 28, 2022

cc @milenkovicm

@alamb alamb merged commit dd3f72a into apache:master Nov 28, 2022

ursabot commented Nov 28, 2022

Benchmark runs are scheduled for baseline = 0d334cf and contender = dd3f72a. dd3f72a is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

