Description
Bug
I cannot free the memory held by a table with `del` and `gc.collect()`. This is a serious problem because I continuously load and process large datasets.
Platform and Versions:
Platform: Linux-5.15.0-57-generic-x86_64-with-glibc2.35
Python: 3.11.3 (main, May 15 2023, 15:45:52) [GCC 11.2.0]
pyarrow: 12.0.0
Data:
https://drive.google.com/file/d/1kigHHHcyx2hWi4xnctG0om3G_X2azcS6/view?usp=drive_link
Code
import gc
import os

import psutil
import pyarrow.feather as feather


def show_memory_info(msg):
    # RSS of this process, plus system-wide totals from psutil.
    info = psutil.virtual_memory()
    print(f'\n{msg} -- current(MB): {psutil.Process(os.getpid()).memory_info().rss / 1024**2:.3f}')
    print(f'{msg} -- total(MB): {info.total / 1024**2:.3f}')
    # Note: info.percent is the system-wide memory usage in percent, not MB.
    print(f'{msg} -- account(MB): {info.percent:.3f}')


def main():
    show_memory_info('before loading:')
    table = feather.read_feather('data.feather')
    show_memory_info('after loading:')
    del table
    gc.collect()
    show_memory_info('after del and collect:')


if __name__ == '__main__':
    main()
    show_memory_info('final:')

Output
before loading: -- current(MB): 67.430
before loading: -- total(MB): 64052.953
before loading: -- account(MB): 2.500
after loading: -- current(MB): 918.852
after loading: -- total(MB): 64052.953
after loading: -- account(MB): 3.800
after del and collect: -- current(MB): 861.238
after del and collect: -- total(MB): 64052.953
after del and collect: -- account(MB): 3.700
final: -- current(MB): 861.238
final: -- total(MB): 64052.953
final: -- account(MB): 3.700
Process finished with exit code 0
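
To make the size of the problem explicit, the short sketch below redoes the arithmetic on the `current(MB)` figures reported above. The variable names are only illustrative; the numbers are copied from the output.

```python
# RSS figures in MB, copied from the "current(MB)" lines of the output above.
rss_before = 67.430      # before loading
rss_loaded = 918.852     # after loading
rss_after_del = 861.238  # after del and collect (unchanged at "final")

gained = rss_loaded - rss_before        # growth caused by loading the file
returned = rss_loaded - rss_after_del   # drop after del + gc.collect()
retained = rss_after_del - rss_before   # growth still held when the script ends

print(f"gained(MB):   {gained:.3f}")
print(f"returned(MB): {returned:.3f}")
print(f"retained(MB): {retained:.3f}")
print(f"share of the growth returned: {returned / gained:.1%}")
```

In other words, loading adds roughly 851 MB to the process RSS, and only about 58 MB of that is given back after `del` and `gc.collect()`.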
Component(s)
Python