
nptdms.TdmsFile crashes when opening large file #19

Closed
crlaugh opened this issue Nov 7, 2014 · 10 comments


crlaugh commented Nov 7, 2014

I have a very large TDMS file (220 MB) that I am trying to open using npTDMS; when I run a script containing only the following lines, Python (or IPython) crashes most spectacularly.

import nptdms
inputFileNameString = "HAC-20141017-093246.tdms"
tdmsFile = nptdms.TdmsFile(inputFileNameString)

I can open smaller files without a problem using npTDMS, and can also open large files in other applications (specifically convertTDMS.m, available on the MathWorks MATLAB File Exchange at http://www.mathworks.com/matlabcentral/fileexchange/44206-converttdms--v10-). The resulting errors are listed below. Please let me know if this is a problem that can be fixed.

I am running Python 2.7.3, 32 bit, on a Windows 7 machine.

Traceback (most recent call last):
File "dataplot.py", line 5, in <module>
myDataFrame = myFunc2.loadTDMSDataFrame(inputFileNameString)
File "C:\Folder\myFunc2.py", line 41, in loadTDMSDataFrame
tdmsFile = nptdms.TdmsFile(inputFileNameString)
File "C:\Python27\lib\site-packages\nptdms\tdms.py", line 148, in __init__
self._read_segments(tdms_file)
File "C:\Python27\lib\site-packages\nptdms\tdms.py", line 160, in _read_segments
previous_segment)
File "C:\Python27\lib\site-packages\nptdms\tdms.py", line 369, in read_metadata
segment_obj._read_metadata(f)
File "C:\Python27\lib\site-packages\nptdms\tdms.py", line 683, in _read_metadata
log.debug("Reading %d properties" % num_properties)
MemoryError


crlaugh commented Nov 7, 2014

I just tried pyTDMS; it is able to open the file successfully, but there is so little documentation for that project that it would be lovely to use npTDMS instead.

adamreeve (Owner) commented:

Are you able to upload the file to somewhere like Dropbox for me to test? I haven't ever really looked at the memory consumption of npTDMS but I would have thought it could open a file that big. Does it work if you use the memmap file support, by passing a memmap_dir parameter to the TdmsFile initializer?
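For reference, passing memmap_dir looks something like this (a sketch; the helper name and default scratch location are assumptions, but memmap_dir is the actual TdmsFile parameter mentioned above, and the scratch directory must remain available while the returned object is in use, since the channel data lives in files there):

```python
import tempfile

def load_tdms_memmapped(path, scratch_dir=None):
    # Store channel data in memory-mapped files under scratch_dir
    # instead of holding it all in RAM. Defaults to the system temp
    # directory; the directory must outlive the returned TdmsFile.
    import nptdms  # imported lazily so the sketch loads without npTDMS installed
    if scratch_dir is None:
        scratch_dir = tempfile.gettempdir()
    return nptdms.TdmsFile(path, memmap_dir=scratch_dir)
```

Usage would then be, e.g., `tdms_file = load_tdms_memmapped("HAC-20141017-093246.tdms")`.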

adamreeve (Owner) commented:

I did a small test using a 12 MB file, and loading that used a maximum of about 25 MB ram, so there's a bit of inefficiency there. Using the memmap_dir option reduced memory usage to 14 MB. I'm not sure how these results would scale to your larger file though, and the memory usage probably depends a lot on the file structure too. I'll do some more detailed profiling and see if I can find a way to reduce the memory usage.
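On a modern Python 3 interpreter, the standard-library tracemalloc module gives a quick way to measure this kind of peak heap usage (the thread itself is on Python 2.7, where a third-party tool such as memory_profiler would be needed instead); a minimal sketch:

```python
import tracemalloc

def peak_heap_mb(func, *args, **kwargs):
    # Run func and return (result, peak Python-heap allocation in MB).
    # Note this tracks only Python-level allocations, not total process
    # memory, so it will underestimate what Task Manager reports.
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / (1024 * 1024)
```

For example, `tdms_file, peak = peak_heap_mb(nptdms.TdmsFile, "data.tdms")` would report the peak heap cost of loading a file.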

adamreeve (Owner) commented:

Out of interest, I tested pyTDMS, and it used about 48 MB when reading my 12 MB file. So I'm surprised npTDMS crashes on your file but pyTDMS works...


crlaugh commented Nov 9, 2014

Thanks for taking a look at this. I've uploaded the problematic data file to the following link:

http://1drv.ms/1zco4jB

Let me know if you have any problems accessing it.


adamreeve added a commit that referenced this issue Nov 10, 2014
Reading some TDMS files creates a lot of these objects, so try to reduce
their memory usage.

See issue #19.
adamreeve (Owner) commented:

I got the file, thanks. Trying to load it with npTDMS used over 60% of my 8 GB of RAM before I killed it. The above commit reduces the memory usage to 645 MB, which is still not great but is a big improvement. For comparison, pyTDMS uses 327 MB.

The problem is that the segments in your TDMS file alternate between three different structures, so each segment has a different set of objects from the previous segment. Because of the way npTDMS reads the file structure, a very large number of objects are allocated to represent every TDMS object in every segment. In many TDMS files the segment structure repeats, so the objects describing the segment structure can be reused.

That commit just reduces the amount of memory required by each of these objects. Ideally I could reduce the number of objects used, but that would require much bigger changes.
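One common way to shrink per-instance memory in CPython (and plausibly the kind of change in that commit, though that is an assumption; the class names below are illustrative, not npTDMS's actual classes) is declaring `__slots__`, so each object stores its attributes in fixed slots instead of a per-instance `__dict__`:

```python
class SegmentObjectPlain:
    # Each instance carries a __dict__, costing hundreds of bytes
    # per object once millions of them are allocated.
    def __init__(self, path, data_type, num_values):
        self.path = path
        self.data_type = data_type
        self.num_values = num_values

class SegmentObjectSlotted:
    # __slots__ removes the per-instance __dict__, a significant
    # saving when a very large number of these objects exist.
    __slots__ = ("path", "data_type", "num_values")

    def __init__(self, path, data_type, num_values):
        self.path = path
        self.data_type = data_type
        self.num_values = num_values
```

The trade-off is that slotted instances cannot gain arbitrary new attributes at runtime, which is usually acceptable for fixed-shape metadata objects like these.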

adamreeve added a commit that referenced this issue Nov 10, 2014
Gives a further small memory saving.

See issue #19.
adamreeve (Owner) commented:

That second commit reduced memory usage a little bit further to 619 MB.


crlaugh commented Nov 10, 2014

Thanks so much for addressing this - it appears to be working much better. If you want to close this issue, that's okay with me (unless I have to do it - I haven't used GitHub much). I'll try converting some other large files this week and will reopen this topic if need be.

Thanks again for fixing it so quickly - it is very much appreciated.

c.


adamreeve (Owner) commented:

Ok thanks, I'll close it for now. The memory usage is still not ideal but hopefully it will be good enough to use with your files now.

adamreeve (Owner) commented:

By the way, there's a TDMS defragment VI in LabVIEW that should clean up your TDMS files and make them much faster to read: http://zone.ni.com/reference/en-XX/help/371361H-01/glang/tdms_defrag/. I'm not sure whether this would be useful for you, but I thought it was worth mentioning.
