@wiedld wiedld commented Oct 6, 2025

Which issue does this PR close?

Drafted/proposed solution for #16904

Rationale for this change

Various changes to make the OOM error messages more readable.
If we agree with the basic approach, then I'll break up this draft into smaller PRs for code review.

What changes are included in this PR?

General changes, NOT having to do with the OOM error stack:

Changes for the OOM consumer stack:

  • add lineage information to each MemoryConsumer (such that we can later on build traces): 987192c
  • new ReportedConsumer which represents a snapshot: 0bc7630
    • reduce lock holding, such that we can use this snapshot in other ways too (maybe realtime tracking?)
  • new ConsumerStackTrace: ab62fcb

Example usage:

  • use the consumer parent/child relationship in ParquetWriter: 3bc6820
  • see the changes in the ParquetWriter OOM error messages, when we enable for TrackConsumersPool::report_top: c7e869f
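The lineage plus snapshot idea above can be sketched with a minimal mock. The names `ReportedConsumer` and `parent_id` follow this PR's commits, but the types below are simplified stand-ins for illustration, not DataFusion's actual API:

```rust
use std::collections::HashMap;

// Simplified stand-in for the snapshot of a consumer's state.
#[derive(Clone)]
pub struct ReportedConsumer {
    pub parent_id: Option<usize>,
    pub name: String,
    pub consumed: usize,
}

/// Walk parent links to build a stack-trace-like lineage, leaf first.
pub fn consumer_stack_trace(
    leaf: usize,
    snapshot: &HashMap<usize, ReportedConsumer>,
) -> Vec<String> {
    let mut trace = Vec::new();
    let mut next = Some(leaf);
    while let Some(id) = next {
        let c = &snapshot[&id];
        trace.push(format!("{}: {} consumed {} B", trace.len(), c.name, c.consumed));
        next = c.parent_id;
    }
    trace
}

fn main() {
    let mut snapshot = HashMap::new();
    for (id, parent_id, name, consumed) in [
        (0, None, "ParquetSink(ParallelWriter)", 64),
        (1, Some(0), "ParquetSink(ParallelColumnWriters)", 128),
        (2, Some(1), "ParquetSink(ArrowColumnWriter(col=1))", 4096),
    ] {
        snapshot.insert(
            id,
            ReportedConsumer { parent_id, name: name.to_string(), consumed },
        );
    }
    // Prints the leaf consumer's lineage up to the root, like the
    // "stack backtrace" sections in the error message below.
    for line in consumer_stack_trace(2, &snapshot) {
        println!("{line}");
    }
}
```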

Are these changes tested?

Yes.

Are there any user-facing changes?

Only nicer error messages.

@github-actions github-actions bot added core Core DataFusion crate execution Related to the execution crate datasource Changes to the datasource crate physical-plan Changes to the physical-plan crate labels Oct 6, 2025
@wiedld wiedld force-pushed the 16904/fine-grain-usage branch from d703964 to cf6f807 Compare October 6, 2025 19:58
@wiedld wiedld changed the title 16904/fine grain usage Add trace of consumers to OOM error messages Oct 6, 2025
@wiedld wiedld force-pushed the 16904/fine-grain-usage branch from cf6f807 to c7e869f Compare October 6, 2025 20:14
Comment on lines +396 to +427
"Resources exhausted: Additional allocation failed for ParquetSink(ArrowColumnWriter(col=1)) with top memory consumers (across reservations) as:
ParquetSink(ArrowColumnWriter(col=8))#ID(can spill: false) consumed x KB, peak x KB:
stack backtrace:
0: ParquetSink(ArrowColumnWriter(col=8))#ID(can spill: false) consumed x KB, peak x KB
1: ParquetSink(ParallelColumnWriters)#ID(can spill: false) consumed x B, peak x B
2: ParquetSink(ParallelWriter)#ID(can spill: false) consumed x B, peak x B
,
ParquetSink(ArrowColumnWriter(col=14))#ID(can spill: false) consumed x KB, peak x KB:
stack backtrace:
0: ParquetSink(ArrowColumnWriter(col=14))#ID(can spill: false) consumed x KB, peak x KB
1: ParquetSink(ParallelColumnWriters)#ID(can spill: false) consumed x B, peak x B
2: ParquetSink(ParallelWriter)#ID(can spill: false) consumed x B, peak x B
,
ParquetSink(ArrowColumnWriter(col=0))#ID(can spill: false) consumed x KB, peak x KB:
stack backtrace:
0: ParquetSink(ArrowColumnWriter(col=0))#ID(can spill: false) consumed x KB, peak x KB
1: ParquetSink(ParallelColumnWriters)#ID(can spill: false) consumed x B, peak x B
2: ParquetSink(ParallelWriter)#ID(can spill: false) consumed x B, peak x B
,
ParquetSink(ArrowColumnWriter(col=2))#ID(can spill: false) consumed x KB, peak x KB:
stack backtrace:
0: ParquetSink(ArrowColumnWriter(col=2))#ID(can spill: false) consumed x KB, peak x KB
1: ParquetSink(ParallelColumnWriters)#ID(can spill: false) consumed x B, peak x B
2: ParquetSink(ParallelWriter)#ID(can spill: false) consumed x B, peak x B
,
ParquetSink(ArrowColumnWriter(col=1))#ID(can spill: false) consumed x KB, peak x KB:
stack backtrace:
0: ParquetSink(ArrowColumnWriter(col=1))#ID(can spill: false) consumed x KB, peak x KB
1: ParquetSink(ParallelColumnWriters)#ID(can spill: false) consumed x B, peak x B
2: ParquetSink(ParallelWriter)#ID(can spill: false) consumed x B, peak x B
.
Error: Failed to allocate additional x KB for ParquetSink(ArrowColumnWriter(col=1)) with x KB already allocated for this reservation - x KB remain available for the total pool",
@wiedld wiedld Oct 6, 2025


This is an example of using the parent/child relationship to build a trace of consumers.

Currently, this approach is limited by the current way that memory reservations work. Meaning, the parent's bytes (consumed & peak) do NOT include the cumulative totals from all the children. If this is desired, we can make this change at report generation time using the snapshot ReportedConsumer (to not hold the lock).
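Report-time aggregation over the snapshot could look like this sketch. The types are illustrative stand-ins (the real `ReportedConsumer` fields may differ), assuming the snapshot is already cloned out from under the pool lock:

```rust
use std::collections::HashMap;

// Mock snapshot entry; the field names are illustrative stand-ins.
pub struct ReportedConsumer {
    pub parent_id: Option<usize>,
    pub consumed: usize,
}

/// Sum a consumer's own bytes plus all of its descendants, computed
/// over an immutable snapshot so no pool lock is held while reporting.
pub fn cumulative_consumed(
    root: usize,
    snapshot: &HashMap<usize, ReportedConsumer>,
) -> usize {
    snapshot[&root].consumed
        + snapshot
            .iter()
            .filter(|(_, c)| c.parent_id == Some(root))
            .map(|(id, _)| cumulative_consumed(*id, snapshot))
            .sum::<usize>()
}

fn main() {
    let snapshot: HashMap<usize, ReportedConsumer> = HashMap::from([
        (0, ReportedConsumer { parent_id: None, consumed: 10 }),
        (1, ReportedConsumer { parent_id: Some(0), consumed: 20 }),
        (2, ReportedConsumer { parent_id: Some(1), consumed: 30 }),
    ]);
    // The parent's total now includes its children: 10 + 20 + 30.
    assert_eq!(cumulative_consumed(0, &snapshot), 60);
}
```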

@wiedld wiedld marked this pull request as ready for review October 6, 2025 21:19
@2010YOUY01

Thank you for this awesome work!

Making memory consumers hierarchical makes sense to me. There is one consideration: the existing memory pool strategies (e.g. FairSpillingPool) were implemented assuming one operator instance (xxStream in impl) has a one-to-one mapping to one MemoryConsumer, and they split the available memory evenly across the available MemoryConsumers. If we want to make it hierarchical, I think we have to update the memory pools -- for instance in FairSpillingPool, split the memory evenly according to top-level consumers, and ignore the child consumers.

Regarding the error message format, IIUC it now displays the lineage of the consumer that triggered the OOM. For instance, if the OOM happened in OperatorX, the error message will look like:

OperatorX - 100M
ParentOfOperatorX - 200M
(root)GrandParentOfOperatorX - 500M

I think this lineage information might not be the most straightforward for troubleshooting. How about just printing the whole picture instead? It would show the top memory consumers, along with all their child consumers, recursively down to the leaves.
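The suggested "whole picture" report could be sketched as a top-down recursive render. Again, a mock with illustrative stand-in types rather than DataFusion's actual API:

```rust
use std::collections::HashMap;

// Simplified stand-in for a consumer snapshot entry.
pub struct ReportedConsumer {
    pub parent_id: Option<usize>,
    pub name: String,
    pub consumed: usize,
}

/// Append `id` and, recursively, all of its children with indentation.
pub fn render_subtree(
    id: usize,
    depth: usize,
    snapshot: &HashMap<usize, ReportedConsumer>,
    out: &mut String,
) {
    let c = &snapshot[&id];
    out.push_str(&format!("{}{} - {} B\n", "  ".repeat(depth), c.name, c.consumed));
    let mut child_ids: Vec<usize> = snapshot
        .iter()
        .filter(|(_, child)| child.parent_id == Some(id))
        .map(|(cid, _)| *cid)
        .collect();
    child_ids.sort(); // deterministic output order
    for cid in child_ids {
        render_subtree(cid, depth + 1, snapshot, out);
    }
}

fn main() {
    let snapshot: HashMap<usize, ReportedConsumer> = HashMap::from([
        (0, ReportedConsumer { parent_id: None, name: "GrandParent".into(), consumed: 500 }),
        (1, ReportedConsumer { parent_id: Some(0), name: "Parent".into(), consumed: 200 }),
        (2, ReportedConsumer { parent_id: Some(1), name: "OperatorX".into(), consumed: 100 }),
    ]);
    let mut out = String::new();
    render_subtree(0, 0, &snapshot, &mut out);
    print!("{out}");
}
```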

Also, here is a related issue, I think we want to make sure the changes are compatible (otherwise choose one to proceed): #17901


comphead commented Oct 7, 2025

@andygrove FYI


comphead commented Oct 7, 2025

Thanks @wiedld, my question is similar to @2010YOUY01's: how to use the lineage? It sounds very interesting.

@comphead comphead left a comment


Thanks @wiedld

pub fn new_with_parent(name: impl Into<String>, parent_id: usize) -> Self {
Self {
name: name.into(),
can_spill: false,

would it be always false? 🤔

@wiedld wiedld Oct 7, 2025


It's meant to be an alternative to MemoryConsumer::new, and it's expected that the caller would switch from:

MemoryConsumer::new("foo").with_can_spill(true)

To have lineage with:

MemoryConsumer::new_with_parent("foo", parent_id).with_can_spill(true)

Although, this does point out that the added MemoryReservation::new_child_reservation should provide a spill config option. It's currently:

/// Create a new [`MemoryReservation`] with a new [`MemoryConsumer`] that
/// is a child of this reservation's consumer.
///
/// This is useful for creating memory consumers with lineage tracking.
pub fn new_child_reservation(&self, name: impl Into<String>) -> MemoryReservation {
MemoryConsumer::new_with_parent(name, self.consumer().id())
.register(&self.registration.pool)
}

I'll go add that now. Thank you.


Added: a77e316


Rather than add a bunch of new APIs for creating new with parent id, how about just adding a method like

fn with_parent_id(mut self, parent_id: ..) -> Self { 
..
}

That mirrors the other fields?

That might also make the defaults less confusing
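Concretely, the suggested builder could mirror `with_can_spill`. A sketch on a simplified stand-in for `MemoryConsumer` (not the actual struct):

```rust
// Mock of the consumer with the fields discussed in this thread.
pub struct MemoryConsumer {
    pub name: String,
    pub can_spill: bool,
    pub parent_id: Option<usize>,
}

impl MemoryConsumer {
    pub fn new(name: impl Into<String>) -> Self {
        Self { name: name.into(), can_spill: false, parent_id: None }
    }

    pub fn with_can_spill(mut self, can_spill: bool) -> Self {
        self.can_spill = can_spill;
        self
    }

    /// Proposed builder that mirrors the other optional fields, so the
    /// defaults stay in `new` and lineage is opt-in.
    pub fn with_parent_id(mut self, parent_id: usize) -> Self {
        self.parent_id = Some(parent_id);
        self
    }
}

fn main() {
    let consumer = MemoryConsumer::new("foo")
        .with_can_spill(true)
        .with_parent_id(42);
    assert_eq!(consumer.parent_id, Some(42));
    assert!(consumer.can_spill);
}
```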

@wiedld

wiedld commented Oct 7, 2025

Thanks @wiedld, my question is similar to @2010YOUY01's: how to use the lineage? It sounds very interesting.

The lineage is created by using child reservations. I made an example commit here, using the parallel parquet writing:
3bc6820

That is why in the following commit, when I switched to using the reservation stack trace (based on lineage) for the OOM error reporting (a.k.a. TrackConsumersPool::report_top), we get the expanded OOM error trace for the parallel parquet writing:
c7e869f

If I started using the child reservations for other physical plan nodes, I would also expect to see a similar change in the OOM error messaging -- to include these traces.
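The way child reservations create lineage can be sketched with a mock. `new_child_reservation` echoes the method added in this PR, but the types below are simplified stand-ins (no pool registration), and `root` is a hypothetical constructor for the example:

```rust
// Simplified stand-in for a reservation that records its parent's name
// instead of registering consumers with a pool.
pub struct MemoryReservation {
    pub name: String,
    pub parent: Option<String>,
}

impl MemoryReservation {
    /// Hypothetical root constructor for this sketch.
    pub fn root(name: &str) -> Self {
        Self { name: name.to_string(), parent: None }
    }

    /// Child reservation whose consumer records this one as its parent,
    /// which is what produces the lineage in the OOM traces.
    pub fn new_child_reservation(&self, name: &str) -> Self {
        Self { name: name.to_string(), parent: Some(self.name.clone()) }
    }
}

fn main() {
    let writer = MemoryReservation::root("ParquetSink(ParallelWriter)");
    let cols = writer.new_child_reservation("ParquetSink(ParallelColumnWriters)");
    let col1 = cols.new_child_reservation("ParquetSink(ArrowColumnWriter(col=1))");
    assert_eq!(col1.parent.as_deref(), Some("ParquetSink(ParallelColumnWriters)"));
}
```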

@wiedld

wiedld commented Oct 7, 2025

I should also clarify: the request (in the issue) was to help better debug OOMing, but also to provide ways to see how the different memory consumers are related (at least in this example, the lineage was apparent to me 🤷🏼).

I chose specific abstractions in hopes that we could repurpose the bits. We have a lineage of reservations, and a new snapshot construct (to perhaps grab snapshots of the locked state during a running query). I then created the stack trace and used it for the OOM error message, as one specific example. Although we could also use it for other memory analysis/visualization tooling. 🤔

@wiedld

wiedld commented Oct 7, 2025

I think this lineage information might not be the most straightforward for troubleshooting. How about just printing the whole picture instead? It would be showing top memory consumers, along with all its child consumers recursively down to the leaf.

@2010YOUY01 - Should I make this change to the OOM messages? Or should we use the snapshots & lineage trees with another debug tool which shows all consumers?

There is one consideration: the existing memory pool strategies (e.g. FairSpillingPool)

The TrackConsumersPool wraps around other pools, and keeps track of successful requests to change reservation sizing. As long as the size requested matches the size allotted (on success), then I believe the TrackConsumersPool would reflect the reservation state? I'm not familiar with the FairSpillingPool, but I think the actual allotment is the requested amount (if it passes the fairness test)? Although I may be missing something.


@alamb alamb left a comment


I think this looks like a nice change -- thank you @wiedld

@2010YOUY01 and @rluvaton perhaps you might also be interested in this one

I left a few API comments, but the overall idea of introducing parent ids to memory reservations seems like a great first step towards more fine grained control

let parallel_options_clone = parallel_options.clone();
let pool = Arc::clone(context.memory_pool());
// Create a reservation for the parallel parquet writing
let reservation = MemoryConsumer::new("ParquetSink(ParallelWriter)")

should this be a new child reservation of the reservation above? Or maybe I am misreading the diff and it already is.


Do you mean the MemoryConsumer::new(format!("ParquetSink[path={path}]")) on line 1273? That one is for the non-parallel writing.

Whereas this reservation here is the root of the parallel path writing. Perhaps I should change naming?

name: String,
can_spill: bool,
id: usize,
parent_id: Option<usize>,

it would be nice to expand out the memory consumer documentation to mention how the parent/child relationship was used


pub fn new_child_reservation(
&self,
name: impl Into<String>,
can_spill: bool,

I am not sure about always passing can_spill here -- I think we can use the existing with_can_spill API instead of forcing a bunch of bool parameters

///
/// This is useful for creating memory consumers with lineage tracking,
/// while dealing with multithreaded scenarios.
pub fn cloned_reservation(&self) -> MemoryReservation {

It is strange to me that clone_with_new_id doesn't also register the reservation 🤔

As now users have two APIs to think about and figure out which to use

Could we unify them somehow (e.g. maybe deprecate clone_with_new_id (as a follow on PR)?)

@wiedld wiedld marked this pull request as draft October 9, 2025 11:50
@wiedld

wiedld commented Oct 9, 2025

Converting to draft, since @alamb's comments have me changing the approach a bit. I'll mark it ready-for-review later.
