Is your feature request related to a problem or challenge?
We see recurring OOM regressions in dataprime as we upgrade DataFusion. They generally boil down to accounting oversights, where memory that should be tracked is allocated untracked. These are often painful to investigate, and because we run in a diskless environment, I'm not certain our use case is common.
Describe the solution you'd like
Add a Cargo feature flag that installs a custom global allocator, enable it during certain tests (SLTs?), and assert that the memory actually used never exceeds the tracked memory by more than a certain percentage (X%). Apply that check to all tests (SLTs?) so that regressions are caught in the future.
We could start with a high X% that produces no failures, then lower it over time: to catch specific regressions as we find them, to uncover existing ones, or to catch new ones before they are merged.
This should catch any case where a DataFusion developer accidentally allocates a large amount of memory, even for a microsecond, without first registering it with MemoryTracker.
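As a rough illustration of the idea, here is a minimal sketch of a counting allocator plus a budget check. Everything named here is hypothetical and not an existing DataFusion API: `CountingAllocator`, the `track-alloc` feature name, and `exceeds_budget` are placeholders, and the tracked figure is assumed to come from whatever the memory accounting reports at the point of the check.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper around the system allocator that counts heap bytes.
pub struct CountingAllocator;

static CURRENT: AtomicUsize = AtomicUsize::new(0); // live bytes right now
static PEAK: AtomicUsize = AtomicUsize::new(0); // high-water mark since last reset

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = System.alloc(layout);
        if !ptr.is_null() {
            let now = CURRENT.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            // Record the peak so that even short-lived spikes are visible.
            PEAK.fetch_max(now, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout);
        CURRENT.fetch_sub(layout.size(), Ordering::Relaxed);
    }
    // The default `realloc` and `alloc_zeroed` go through the two methods above,
    // so the counters stay consistent without overriding them.
}

// Assumed feature name; only installed when the flag is enabled.
#[cfg(feature = "track-alloc")]
#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Bytes currently allocated through `CountingAllocator`.
pub fn current_bytes() -> usize {
    CURRENT.load(Ordering::Relaxed)
}

/// Highest value `current_bytes` has reached since the last `reset_peak`.
pub fn peak_bytes() -> usize {
    PEAK.load(Ordering::Relaxed)
}

/// Restart peak tracking from the current live byte count (e.g. before each test).
pub fn reset_peak() {
    PEAK.store(CURRENT.load(Ordering::Relaxed), Ordering::Relaxed);
}

/// True when `peak_allocated` exceeds `tracked` by more than `slack_pct` percent.
pub fn exceeds_budget(peak_allocated: usize, tracked: usize, slack_pct: usize) -> bool {
    // Widen to u128 so the multiplication cannot overflow.
    let limit = (tracked as u128) * (100 + slack_pct as u128) / 100;
    (peak_allocated as u128) > limit
}
```

A test harness could call `reset_peak()` before a query, run it, and then fail the test if `exceeds_budget(peak_bytes(), tracked, x)` returns true, where `x` is the X% threshold discussed above. In practice the baseline (memory already live before the query) would need to be subtracted, and the comparison is only approximate because allocator overhead and unrelated allocations also count.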
Describe alternatives you've considered
No response
Additional context
No response