
GH-40224: [C++] Fix: improve the backpressure handling in the dataset writer #40722

Merged: 1 commit merged into apache:main from fix/dataset-writer-backpressure on Apr 4, 2024

Conversation

@westonpace (Member) commented Mar 21, 2024

Rationale for this change

The dataset writer would fire the resume callback as soon as the underlying dataset writer's queues freed up, even if there were pending tasks. Backpressure is not applied immediately and so a few tasks will always trickle in. If backpressure is pausing and then resuming frequently this can lead to a buildup of pending tasks and uncontrolled memory growth.

What changes are included in this PR?

The resume callback is not called until all pending write tasks have completed.
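A minimal sketch of that gating (BackpressureGate is a hypothetical name; paused_, resume_callback_, and ResumeIfNeeded follow the review discussion below, but this is an illustration of the idea, not the exact Arrow source):

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>

// Sketch only: the resume callback fires once the writer has room again
// AND the queue of pending write tasks has fully drained.
class BackpressureGate {
 public:
  explicit BackpressureGate(std::function<void()> resume_callback)
      : resume_callback_(std::move(resume_callback)) {}

  void Pause() {
    std::lock_guard<std::mutex> lg(mutex_);
    paused_ = true;
  }

  // Called whenever a pending write task finishes or queue room frees up.
  void ResumeIfNeeded(std::size_t pending_tasks) {
    bool needs_resume = false;
    {
      std::lock_guard<std::mutex> lg(mutex_);
      // The old behavior resumed as soon as the writer had room; the fix
      // additionally requires the pending-task queue to be empty.
      if (paused_ && pending_tasks == 0) {
        paused_ = false;
        needs_resume = true;
      }
    }
    if (needs_resume) resume_callback_();
  }

 private:
  std::mutex mutex_;
  bool paused_ = false;
  std::function<void()> resume_callback_;
};
```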

Are these changes tested?

There is quite an extensive set of tests for the dataset writer already and they continue to pass. I ran them on repeat, with and without stress, and did not see any issues.

However, the underlying problem (the dataset writer can have uncontrolled memory growth) is still not tested, as it is quite difficult to test. I was able to reproduce the problem by running the setup described in the issue; with this fix, the repartitioning task completes for me.

Are there any user-facing changes?

No

@westonpace westonpace added the Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. label Mar 21, 2024
@westonpace westonpace added this to the 16.0.0 milestone Mar 21, 2024

⚠️ GitHub issue #40224 has been automatically assigned in GitHub to PR creator.

@thisisnic (Member) commented Mar 25, 2024

Thanks @westonpace! I can confirm that when running this code on the dataset using the query reported in the original issue, everything now works perfectly! 🎉

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting committer review Awaiting committer review labels Mar 25, 2024
@thisisnic (Member):

@pitrou or @bkietz - are you happy for me to merge it?

@@ -277,6 +278,8 @@ class ARROW_EXPORT ThrottledAsyncTaskScheduler : public AsyncTaskScheduler {
/// Allows task to be submitted again. If there is a max_concurrent_cost limit then
/// it will still apply.
virtual void Resume() = 0;
/// Return the number of tasks queued but not yet submitted
virtual std::size_t QueueSize() = 0;
Member: Would this be better as std::size_t QueueSize() const?

Member Author: The implementation uses a std::mutex, so I'd have to mark the mutex mutable, right? Which would you prefer: "mutable mutex" or "non-const accessor"? I don't have a strong preference.

Member: Usually a mutex used this way is marked mutable, but this LGTM either way; I don't have a strong preference either.
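For reference, a minimal sketch of the "mutable mutex" option discussed above (TaskQueue and its members are illustrative names, not the Arrow implementation):

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>

// A const accessor over mutex-protected state: marking the mutex mutable
// lets a const method lock it.
class TaskQueue {
 public:
  std::size_t QueueSize() const {
    std::lock_guard<std::mutex> lg(mutex_);
    return queue_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::deque<std::function<void()>> queue_;
};
```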

paused_ = true;
return has_room.Then([this] { ResumeIfNeeded(); });
} else {
ResumeIfNeeded();
}
Member: So this is because, when has_room finishes while paused_, resume_callback_ is not called?

Member Author: If has_room is finished then we can unpause, provided there are no other tasks, because there is room for another batch.

@mapleFU (Member) commented Apr 2, 2024:

May I ask a stupid question: why is has_room.Then not being called in this scenario? In ThrottledAsyncTaskSchedulerImpl::ContinueTasks(), wouldn't it trigger the callback?

Member: Oh, because it's paused... now I understand. So we don't "resume" enough, which causes new tasks not to be consumed.

Member Author: Here is a picture. In the old behavior, we resume too much.

[diagram: Dataset Backpressure]

@mapleFU (Member) commented Apr 1, 2024:

LGTM as a fix here, but it seems we currently don't understand why this happens?

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting merge Awaiting merge labels Apr 2, 2024
@westonpace (Member Author):

> LGTM as a fix here, but it seems we currently don't understand why this happens?

Thanks for the review. I do understand why this was happening.

  • We can't pause immediately because Acero is push-based. It takes time for the pause signal to travel from sink to source, and more tasks may be scheduled during this time (while other tasks may already be in flight).
  • Previously, we did not finish queued tasks before unpausing. This means we would unpause, let in more tasks, and then quickly pause again. This leads to the number of queued tasks growing without bound.
  • The fix does not unpause until all queued tasks have run.

@mapleFU (Member) commented Apr 2, 2024:

> Previously, we did not finish queued tasks before unpausing. This means we would unpause, let in more tasks, and then quickly pause again. This leads to the number of queued tasks growing without bound.

I now understand why "pause"/"resume" matters, but there is one point I still don't understand. With the throttle and max_concurrent_cost = 1, the max running tasks is 1; wouldn't Release and ContinueTasks() be called after the current pending task (size = 1) finishes?

@westonpace (Member Author):

> I now understand why "pause"/"resume" matters, but there is one point I still don't understand. With the throttle and max_concurrent_cost = 1, the max running tasks is 1; wouldn't Release and ContinueTasks() be called after the current pending task (size = 1) finishes?

A task is "running" even when it is blocked on backpressure. Since the max running tasks is 1, Release/ContinueTasks won't be called until the has_room future has finished. In the meantime, more tasks may have arrived, and since a task was already running and the max is 1, those tasks are put in the queue.

@mapleFU (Member) commented Apr 2, 2024:

So this patch actually makes "resume" stricter in the dataset writer scenario.

@mapleFU (Member) commented Apr 2, 2024:

Will merge this on Friday if there are no negative comments.

@westonpace (Member Author):

> So this patch actually makes "resume" stricter in the dataset writer scenario.

Yes, we want to resume less frequently 👍

> Will merge this on Friday if there are no negative comments.

Thanks

}
}
if (needs_resume) {
paused_ = false;
Member: Should this be done with the mutex acquired, or are all accesses to paused_ done from the same thread?

Member: Well, test-ubuntu-20.04-cpp-thread-sanitizer passed at least.

Member Author: All access to paused_ is done from a single "logical thread": write_tasks_ is a scheduler with a max capacity of 1, so items submitted to it never run in parallel (though they may run on different OS threads).
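A minimal sketch of such a "logical thread" (SerialScheduler is an illustrative name, not the Arrow code): a scheduler that runs at most one task at a time serializes access to state like paused_, even though successive tasks may land on different OS threads:

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Capacity-1 scheduler: at most one task runs at a time, so tasks are
// mutually serialized even across OS threads. State touched only from
// inside tasks then needs no mutex of its own.
class SerialScheduler {
 public:
  void Submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lg(mutex_);
      tasks_.push_back(std::move(task));
      if (running_) return;  // the current drainer will pick this up
      running_ = true;
    }
    Drain();
  }

 private:
  void Drain() {
    for (;;) {
      std::function<void()> task;
      {
        std::lock_guard<std::mutex> lg(mutex_);
        if (tasks_.empty()) {
          running_ = false;
          return;
        }
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();  // runs outside the lock; never concurrently with another task
    }
  }

  std::mutex mutex_;
  std::deque<std::function<void()>> tasks_;
  bool running_ = false;
};
```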

@pitrou (Member) commented Apr 3, 2024:

@github-actions crossbow submit -g cpp

This comment was marked as outdated.

@pitrou pitrou force-pushed the fix/dataset-writer-backpressure branch from 0dc3a3d to fdb625b Compare April 3, 2024 08:34
@pitrou (Member) commented Apr 3, 2024:

@github-actions crossbow submit -g cpp

@pitrou (Member) commented Apr 3, 2024:

I rebased for CI fixes.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Apr 3, 2024
github-actions bot commented Apr 3, 2024:

Revision: fdb625b

Submitted crossbow builds: ursacomputing/crossbow @ actions-af079e8fb6

Task | Status
test-alpine-linux-cpp | GitHub Actions
test-build-cpp-fuzz | GitHub Actions
test-conda-cpp | GitHub Actions
test-conda-cpp-valgrind | Azure
test-cuda-cpp | GitHub Actions
test-debian-12-cpp-amd64 | GitHub Actions
test-debian-12-cpp-i386 | GitHub Actions
test-fedora-39-cpp | GitHub Actions
test-ubuntu-20.04-cpp | GitHub Actions
test-ubuntu-20.04-cpp-bundled | GitHub Actions
test-ubuntu-20.04-cpp-minimal-with-formats | GitHub Actions
test-ubuntu-20.04-cpp-thread-sanitizer | GitHub Actions
test-ubuntu-22.04-cpp | GitHub Actions
test-ubuntu-22.04-cpp-20 | GitHub Actions
test-ubuntu-22.04-cpp-no-threading | GitHub Actions
test-ubuntu-24.04-cpp | GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 | GitHub Actions

@pitrou pitrou changed the title GH-40224: [C++] fix: improve the backpressure handling in the dataset writer GH-40224: [C++] Fix: improve the backpressure handling in the dataset writer Apr 3, 2024
@pitrou (Member) commented Apr 3, 2024:

@ursabot please benchmark

@ursabot commented Apr 3, 2024:

Benchmark runs are scheduled for commit fdb625b. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

Thanks for your patience. Conbench analyzed the 7 benchmarking runs that have been run so far on PR commit fdb625b.

There were 12 benchmark results indicating a performance regression.

The full Conbench report has more details.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Apr 3, 2024
@pitrou pitrou merged commit 640c101 into apache:main Apr 4, 2024
38 of 40 checks passed
@pitrou pitrou removed the awaiting changes Awaiting changes label Apr 4, 2024

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 640c101.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 6 possible false positives for unstable benchmarks that are known to sometimes produce them.

tolleybot pushed a commit to tmct/arrow that referenced this pull request May 2, 2024
vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024
Labels
Component: C++ · Critical Fix (Bugfixes for security vulnerabilities, crashes, or invalid data)

5 participants