[VL] Memory leak for version release 1.4.0 #9316

@j7nhai

Backend

VL (Velox)

Bug description

E20250414 17:02:31.939500   222 VeloxMemoryManager.cc:392] Failed to release Velox memory manager after 43350ms as there are still outstanding memory resources.
E20250414 17:02:31.939989   222 Exceptions.h:66] Line: /data/deploy/gluten/ep/build-velox/build/velox_ep/velox/common/memory/Memory.cpp:149, Function:~MemoryManager, Expression:  pools_.size() != 0 (1 vs 0). There are unexpected alive memory pools allocated by user on memory manager destruction:
Memory Manager[capacity UNLIMITED alignment 64B usedBytes 0B number of pools 4
List of root pools:
__sys_root__ usage 0B reserved 0B peak 0B
    __sys_shared_leaf__0 usage 0B reserved 0B peak 0B
    __sys_tracing__ usage 0B reserved 0B peak 0B
    __sys_spilling__ usage 0B reserved 0B peak 0B
    __sys_caching__ usage 0B reserved 0B peak 0B
root usage 0B reserved 0B peak 3.73GB
    task.Gluten_Stage_1_TID_7976_VTID_56 usage 0B reserved 0B peak 3.73GB
        node.0 usage 0B reserved 0B peak 112.00MB
            op.0.0.0.TableScan usage 0B reserved 0B peak 104.66MB
	refcount 2
Memory Allocator[MALLOC capacity UNLIMITED allocated bytes 3145728 allocated pages 0 mapped pages 0]
ARBITRATOR[GLUTEN] CAPACITY 8388608.00TB numRequests 0 numRunning 0 numSucceded 0 numAborted 0 numFailures 0 numNonReclaimableAttempts 0 reclaimedFreeCapacity 0B reclaimedUsedCapacity 0B maxCapacity 0B freeCapacity 0B freeReservedCapacity 0B], Source: RUNTIME, ErrorCode: INVALID_STATE
terminate called after throwing an instance of 'facebook::velox::VeloxRuntimeError'
  what():  Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: pools_.size() != 0 (1 vs 0). There are unexpected alive memory pools allocated by user on memory manager destruction:
Memory Manager[capacity UNLIMITED alignment 64B usedBytes 0B number of pools 4
List of root pools:
__sys_root__ usage 0B reserved 0B peak 0B
    __sys_shared_leaf__0 usage 0B reserved 0B peak 0B
    __sys_tracing__ usage 0B reserved 0B peak 0B
    __sys_spilling__ usage 0B reserved 0B peak 0B
    __sys_caching__ usage 0B reserved 0B peak 0B
root usage 0B reserved 0B peak 3.73GB
    task.Gluten_Stage_1_TID_7976_VTID_56 usage 0B reserved 0B peak 3.73GB
        node.0 usage 0B reserved 0B peak 112.00MB
            op.0.0.0.TableScan usage 0B reserved 0B peak 104.66MB
	refcount 2
Memory Allocator[MALLOC capacity UNLIMITED allocated bytes 3145728 allocated pages 0 mapped pages 0]
ARBITRATOR[GLUTEN] CAPACITY 8388608.00TB numRequests 0 numRunning 0 numSucceded 0 numAborted 0 numFailures 0 numNonReclaimableAttempts 0 reclaimedFreeCapacity 0B reclaimedUsedCapacity 0B maxCapacity 0B freeCapacity 0B freeReservedCapacity 0B]
Retriable: False
Function: ~MemoryManager
File: /data/deploy/gluten/ep/build-velox/build/velox_ep/velox/common/memory/Memory.cpp
Line: 149
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
# 3  0x0000000001aef77e
# 4  _ZN6gluten18VeloxMemoryManagerD1Ev
# 5  _ZN6gluten18VeloxMemoryManagerD0Ev
# 6  _ZN6gluten13MemoryManager7releaseEPS0_
# 7  Java_org_apache_gluten_memory_NativeMemoryManagerJniWrapper_release
# 8  0x00007f1a157a6da7

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

Labels: bug, triage
