
TdmsFile.read() leaks memory #199

Closed
Achilles1515 opened this issue May 12, 2020 · 3 comments · Fixed by #201

@Achilles1515 (Contributor) commented May 12, 2020

Something seems fishy with the memory usage of the TdmsFile.read() method. It does what it says (loads the entire TDMS file into memory), but that memory doesn't appear to be released afterward, at least not in any sort of timely fashion.

Take this simple example:

import time

from nptdms import TdmsFile

count = 0
path = "...path to some 66 MB TDMS file"
while count < 10:
    tdms_file = TdmsFile.read(path)
    tdms_file.close()	# pretty sure this does nothing here
    count += 1

time.sleep(10)

After the loop has completed, memory usage of the Python script is 460 MB.
Now, manually clearing the channel._raw_data.data value nets the [more] expected memory usage:

import time

from nptdms import TdmsFile

count = 0
path = "...path to some 66 MB TDMS file"
while count < 10:
    tdms_file = TdmsFile.read(path)
    for group in tdms_file.groups():
        for channel in group.channels():
            channel._raw_data.data = None  # <-- HERE
    tdms_file.close()
    count += 1

time.sleep(10)

Memory usage at end of script = 17 MB

Though, simply accessing the channel data increases memory usage back up to 460 MB (the data is kept in cached properties?).

import time

from nptdms import TdmsFile

count = 0
path = "...path to some 66 MB TDMS file"
while count < 10:
    tdms_file = TdmsFile.read(path)
    for group in tdms_file.groups():
        for channel in group.channels():
            z = channel[:]  # <-- HERE
            channel._raw_data.data = None
    tdms_file.close()
    count += 1

time.sleep(10)

Manually calling gc.collect() immediately after the while loop does free some memory, but the script memory usage is still ~80 MB afterwards (one TDMS file data, I suppose).

How are we supposed to free memory here when using TdmsFile.read()? In all these examples, the tdms_file variable is reassigned inside the while loop, yet the old tdms_file memory is not freed between loop iterations or after the loop exits. My Python scripts consistently run out of memory whenever I use TdmsFile.read() while looping over a number of TDMS files of decent size.

@adamreeve (Owner) commented:

There are some circular references between TdmsFile and TdmsChannel which prevent Python from immediately freeing memory when a TdmsFile goes out of scope. Using weakrefs here might help but could also break some code. I think it should be possible to refactor to fix this and remove the circular references.

Until then, calling gc.collect() is the best option if you need to free memory immediately.
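To see why an explicit gc.collect() helps, here is a self-contained sketch (not npTDMS code, just a stand-in Node class): objects in a reference cycle are not freed by reference counting alone when they go out of scope; they linger until a collector pass runs.

```python
import gc
import weakref

class Node:
    """Stand-in for objects like TdmsFile/TdmsChannel that refer to each other."""
    pass

a = Node()
b = Node()
a.other = b  # a -> b
b.other = a  # b -> a: a reference cycle

probe = weakref.ref(a)  # lets us observe whether `a` has been freed

del a, b
# Reference counting alone cannot free the cycle, so the object lingers...
assert probe() is not None

gc.collect()
# ...until a collector pass detects the unreachable cycle and frees it.
assert probe() is None
```

This is the same mechanism at work with the TdmsFile/TdmsChannel cycle: dropping the last visible reference is not enough; the cycle is only reclaimed by the garbage collector.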

> Manually calling gc.collect() immediately after the while loop does free some memory, but the script memory usage is still ~80 MB afterwards (one TDMS file data, I suppose).

80 MB sounds about right. As you say, calling close doesn't have any effect when using TdmsFile.read; it's there to close file handles when using TdmsFile.open. So the memory used by one file will still be held.

> Though, simply accessing the channel data increases memory usage back up to 460 MB (the data is kept in cached properties?).

Yes, accessing the data causes it to be cached internally, you'd also need to set _cached_prop_data to None if data has been accessed. This is because sometimes scaling is applied to the raw data, and the scaled data is then cached.
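Following that advice, a defensive helper could clear both buffers. Note that _raw_data and _cached_prop_data are the private npTDMS attributes named in this thread, not public API, so this is a workaround sketch that may break between versions:

```python
def clear_channel_data(channel):
    """Best-effort release of a channel's data buffers.

    Relies on npTDMS private attributes mentioned in this thread
    (_raw_data, _cached_prop_data); treat as a workaround, not public API.
    """
    raw_data = getattr(channel, "_raw_data", None)
    if raw_data is not None:
        raw_data.data = None  # drop the raw data buffer
    if hasattr(channel, "_cached_prop_data"):
        channel._cached_prop_data = None  # drop the cached (possibly scaled) data
```

Calling this for every channel before tdms_file goes out of scope mimics the manual clearing in the examples above, while also covering the scaled-data cache.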

adamreeve added a commit that referenced this issue May 19, 2020
Fixes #199. This circular reference meant that when a TdmsFile went out of scope the memory it used wasn't immediately freed but was only freed after a GC collection.
@Achilles1515 (Contributor, Author) commented May 19, 2020

@adamreeve Thanks for the fix. Memory does appear to be freed in a timely fashion as expected.

Though with these new changes, the following code now throws AttributeError: 'NoneType' object has no attribute 'close':

tdms_file = TdmsFile.read(path)
tdms_file.close()

As previously mentioned, the tdms_file.close() doesn't really do anything with TdmsFile.read(). But this is technically a breaking change since it previously didn't throw an exception.

The problem is in the following code:

class TdmsReader(object):
    # ...

    def close(self):
        if self._file_path is not None:
            # File path was provided so we opened the file and
            # should close it.
            self._file.close()
        # Otherwise always remove reference to the file
        self._file = None

With the snippet I provided, self._file_path is not None, but self._file is None.
Can another check be added to ensure self._file is not None?
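A minimal sketch of the guard being suggested, using a stand-in class rather than the real TdmsReader (the class name and file path here are hypothetical):

```python
import io

class TdmsReaderSketch:
    """Stand-in for TdmsReader illustrating the extra None check in close()."""

    def __init__(self, file_path=None, file=None):
        self._file_path = file_path
        self._file = file

    def close(self):
        if self._file_path is not None and self._file is not None:
            # File path was provided, so we opened the file and should close
            # it. The extra `self._file is not None` check makes close() safe
            # to call when the file reference has already been dropped.
            self._file.close()
        # Otherwise always remove the reference to the file
        self._file = None

# close() becomes idempotent:
reader = TdmsReaderSketch(file_path="example.tdms", file=io.BytesIO(b"data"))
reader.close()
reader.close()  # second call is a no-op instead of raising AttributeError
```

With this guard, the breaking case above (file path set, but file reference already None) simply falls through to the final line.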

@adamreeve (Owner) commented:

Ah, sorry about that; it wasn't intentional to change that behaviour. I've pushed a fix for this now.
