7.4.1 Memory Leak #981
Hello all,

Python setup:

os = Windows-10-10.0.19045-SP0
python = 3.11.1 (tags/v3.11.1:a7a450f, Dec 6 2022, 19:58:39) [MSC v.1934 64 bit (AMD64)]
asammdf = 7.4.2
numpy = 1.26.4

Dummy code:

import numpy as np
from asammdf import MDF
class Container:
    def __init__(self):
        self.data = np.empty((1,))
        self.data = np.nan

    # not a @staticmethod, as in the real code some other actions are performed
    def get_data(self, fd):
        # data_base and can_channel are placeholders for the real DBC file and CAN channel
        with MDF(name=fd, memory='minimal') as file:
            data = file.extract_bus_logging(
                database_files={'CAN': [(data_base, can_channel)]}
            ).to_dataframe()[['sig_1', 'sig_2', 'sig_n']].to_numpy(dtype=float)
        return data
fds = ['file_0.mf4', 'file_1.mf4', 'file_n.mf4']
obj = Container()
for f in fds:
    if np.isnan(obj.data).any():
        obj.data = obj.get_data(f)
    else:
        # np.append returns a new array, so the result has to be reassigned
        obj.data = np.append(
            arr=obj.data,
            values=obj.get_data(f),
            axis=0
        )

Description

Each of the .mf4 files is about 900 MiB in size. The data extracted from each of these files and stored to obj.data is much smaller, yet the memory consumed by the process keeps growing from file to file and is never released.

Workaround try: Threading/Multiprocessing

Since it is known that Python does not necessarily release all the memory back to the OS once variables get cleared, I thought to try off-loading the extraction to a short-lived thread:

import concurrent.futures

for f in fds:
    if np.isnan(obj.data).any():
        with concurrent.futures.ThreadPoolExecutor() as executor:
            tmp_data = executor.map(obj.get_data, [f])
            obj.data = next(tmp_data)
        del tmp_data
    else:
        with concurrent.futures.ThreadPoolExecutor() as executor:
            tmp_data = executor.map(obj.get_data, [f])
            # np.append returns a new array, so the result has to be reassigned
            obj.data = np.append(
                arr=obj.data,
                values=next(tmp_data),
                axis=0
            )
        del tmp_data

Unfortunately, even that did not do the trick. The memory leak is unchanged.

Workaround try: copy.deepcopy()

The intention behind utilizing copy.deepcopy() was to detach the extracted data from the open file before it gets closed:

from copy import deepcopy

with MDF(name=fd, memory='minimal') as file:
    data = deepcopy(
        file.extract_bus_logging(
            database_files={'CAN': [(data_base, can_channel)]}
        )
    )

Side note: Executing a deepcopy of the final numpy array instead of the MDF object did not make a difference either:

with MDF(name=fd, memory='minimal') as file:
    data = deepcopy(
        file.extract_bus_logging(
            database_files={'CAN': [(data_base, can_channel)]}
        ).to_dataframe()[['sig_1', 'sig_2', 'sig_n']].to_numpy(dtype=float)
    )
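To put numbers on the growth, here is a minimal measurement sketch that logs the resident set size after every file. It assumes the third-party psutil package is installed; obj, fds and get_data() are the names from the dummy code above.

# Sketch: log the process RSS after each extraction to quantify the leak
import os

import psutil

process = psutil.Process(os.getpid())

for f in fds:
    obj.data = obj.get_data(f)
    rss_mib = process.memory_info().rss / 2**20
    print(f'{f}: RSS = {rss_mib:.1f} MiB')  # keeps growing if the leak is present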
Is this better?

import numpy as np
from asammdf import MDF

class Container:
    def __init__(self):
        self.data = np.empty((1,))
        self.data = np.nan

    # not a @staticmethod, as in the real code some other actions are performed
    def get_data(self, fd):
        # data_base and can_channel are placeholders for the real DBC file and CAN channel
        with MDF(name=fd, memory='minimal') as file:
            data_mdf = file.extract_bus_logging(
                database_files={'CAN': [(data_base, can_channel)]}
            )
            data = data_mdf.to_dataframe()[['sig_1', 'sig_2', 'sig_n']].to_numpy(dtype=float)
            data_mdf.close()
        return data

fds = ['file_0.mf4', 'file_1.mf4', 'file_n.mf4']
obj = Container()
for f in fds:
    if np.isnan(obj.data).any():
        obj.data = obj.get_data(f)
    else:
        # np.append returns a new array, so the result has to be reassigned
        obj.data = np.append(
            arr=obj.data,
            values=obj.get_data(f),
            axis=0
        )
Unfortunately, assigning the numpy data to a new variable and explicitly closing the MDF object returned by extract_bus_logging() does not change anything. The memory leak persists.
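One more low-effort check worth adding at this point, not part of the original report, is forcing a full garbage collection after every file, which rules out reference cycles as the cause. A minimal sketch:

# Sketch: force a full collection after each file; if the RSS still grows,
# the memory is most likely held by a C extension, not by cyclic Python objects
import gc

for f in fds:
    obj.data = obj.get_data(f)
    gc.collect()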
In a desperate attempt to try and bypass the memory leak issue, I tried wrapping the extract_bus_logging() call into a standalone executable that is invoked once per file:

# extract_bus_logging_wrapper.py
# Got turned into a standalone executable utilizing: pyinstaller --onefile extract_bus_logging_wrapper.py
# (pyinstaller v6.6.0)
import argparse
import pickle
import sys

from asammdf import MDF

def arg_parser():
    parser = argparse.ArgumentParser()
    for arg in ['fp', 'bus', 'db', 'ch']:
        parser.add_argument(arg, type=str)
    args = parser.parse_args()
    return args.fp, args.bus, args.db, args.ch

def extract_bus_logging_wrapper(_fp, _bus, _db, _ch):
    _db_dict = {_bus: [(_db, int(_ch))]}
    return pickle.dumps(MDF(_fp).extract_bus_logging(_db_dict).to_dataframe())

if __name__ == '__main__':
    fp, bus, db, ch = arg_parser()
    sys.stdout.buffer.write(extract_bus_logging_wrapper(fp, bus, db, ch))

# main.py
import pickle
import subprocess

try:
    result = pickle.loads(
        subprocess.check_output(
            args=['extract_bus_logging_wrapper.exe', 'log.mf4', 'CAN', 'data_base.dbc', '2']
        )
    )
except subprocess.CalledProcessError as e:
    print(f'Error running extract_bus_logging_wrapper.exe: {e}')
Note: For reference, calling the wrapper function directly in-process looks like this:

result = pickle.loads(extract_bus_logging_wrapper('log.mf4', 'CAN', 'data_base.dbc', '2'))
Why do you think you would get a different result with pyinstaller? It's still Python running.
I hoped that the .exe would get terminated for good once it is done executing, just like manually stopping a run in the IDE, which does release the blocked memory. I am not that deep into the entire pyinstaller game, though.
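The underlying idea is sound in one respect: memory held by a process is always returned to the OS when that process exits. The same isolation is available without pyinstaller by running each extraction in a short-lived worker process. Below is a minimal sketch using only the standard library; it assumes get_data() has been moved to module level (required so it can be pickled for the worker), with data_base and can_channel remaining placeholders as in the dummy code above.

# Sketch: one fresh worker process per file, so any memory leaked during an
# extraction is reclaimed by the OS when that worker exits
import multiprocessing as mp

import numpy as np
from asammdf import MDF

def get_data(fd):
    # module-level variant of Container.get_data so it is picklable;
    # data_base and can_channel are placeholders
    with MDF(name=fd) as file:
        return (
            file.extract_bus_logging(database_files={'CAN': [(data_base, can_channel)]})
            .to_dataframe()[['sig_1', 'sig_2', 'sig_n']]
            .to_numpy(dtype=float)
        )

if __name__ == '__main__':
    fds = ['file_0.mf4', 'file_1.mf4', 'file_n.mf4']
    ctx = mp.get_context('spawn')  # every worker starts from a clean interpreter
    # maxtasksperchild=1 retires the worker after each file, so leaked
    # memory cannot accumulate across iterations
    with ctx.Pool(processes=1, maxtasksperchild=1) as pool:
        chunks = pool.map(get_data, fds)
    data = np.concatenate(chunks, axis=0)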
I am running into the same issue, even when explicitly closing the files after being done with them. Each iteration uses about 200 MB more RAM when using mdf.extract_bus_logging(). tracemalloc points at line 7652 in mdf_v4.py, a snippet that goes directly to the cutils extension:

vals = extract(signal_data, 1, vals - vals[0])

I tried deleting that variable after it is returned and no longer used, but the memory was still not freed.
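For anyone who wants to reproduce such a trace, here is a minimal sketch using the standard-library tracemalloc module; the file and database names are the placeholders used in the snippets above.

# Sketch: capture the allocation sites that are still alive after one full
# extraction cycle
import tracemalloc

from asammdf import MDF

tracemalloc.start(25)  # record up to 25 stack frames per allocation

with MDF('log.mf4') as mdf:
    decoded = mdf.extract_bus_logging({'CAN': [('data_base.dbc', 2)]})
    decoded.close()

# anything still allocated here is a candidate for the leak
for stat in tracemalloc.take_snapshot().statistics('lineno')[:10]:
    print(stat)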
Python version 3.11.7
When opening and closing multiple MF4 files, there is a memory leak: asammdf allocates a lot of memory that is never freed.
Here you can find a tracemalloc output (tracing generates a lot of overhead, so only a few traces are shown).