
nptdms.TdmsFile crashes when opening large file #19

Closed
crlaugh opened this issue Nov 7, 2014 · 10 comments


crlaugh commented Nov 7, 2014

I have a very large TDMS file (220 MB) that I am trying to open using npTDMS; when I run a script containing only the following lines, Python (or IPython) crashes most spectacularly.

import nptdms
inputFileNameString = "HAC-20141017-093246.tdms"
tdmsFile = nptdms.TdmsFile(inputFileNameString)

I can open smaller files without a problem using npTDMS, and can also open large files in other applications (specifically convertTDMS.m, available on the MathWorks MATLAB File Exchange at http://www.mathworks.com/matlabcentral/fileexchange/44206-converttdms--v10-). The resulting errors are listed below. Please let me know if this is a problem that can be fixed.

I am running Python 2.7.3, 32 bit, on a Windows 7 machine.

Traceback (most recent call last):
File "dataplot.py", line 5, in <module>
myDataFrame = myFunc2.loadTDMSDataFrame(inputFileNameString)
File "C:\Folder\myFunc2.py", line 41, in loadTDMSDataFrame
tdmsFile = nptdms.TdmsFile(inputFileNameString)
File "C:\Python27\lib\site-packages\nptdms\tdms.py", line 148, in __init__
self._read_segments(tdms_file)
File "C:\Python27\lib\site-packages\nptdms\tdms.py", line 160, in _read_segments
previous_segment)
File "C:\Python27\lib\site-packages\nptdms\tdms.py", line 369, in read_metadata
segment_obj._read_metadata(f)
File "C:\Python27\lib\site-packages\nptdms\tdms.py", line 683, in _read_metadata
log.debug("Reading %d properties" % num_properties)
MemoryError


crlaugh commented Nov 7, 2014

I just tried pyTDMS; it is able to open the file successfully, but there is so little documentation for that project that it would be lovely to use npTDMS instead.

adamreeve (Owner) commented:

Are you able to upload the file to somewhere like Dropbox for me to test? I haven't ever really looked at the memory consumption of npTDMS but I would have thought it could open a file that big. Does it work if you use the memmap file support, by passing a memmap_dir parameter to the TdmsFile initializer?
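For reference, passing memmap_dir looks something like this (a sketch; the helper name and default scratch location are assumptions, but memmap_dir is the actual TdmsFile parameter mentioned above, and the scratch directory must remain available while the returned object is in use, since the channel data lives in files there):

```python
import tempfile

def load_tdms_memmapped(path, scratch_dir=None):
    # Store channel data in memory-mapped files under scratch_dir
    # instead of holding it all in RAM. Defaults to the system temp
    # directory; the directory must outlive the returned TdmsFile.
    import nptdms  # imported lazily so the sketch loads without npTDMS installed
    if scratch_dir is None:
        scratch_dir = tempfile.gettempdir()
    return nptdms.TdmsFile(path, memmap_dir=scratch_dir)
```

Usage would then be, e.g., `tdms_file = load_tdms_memmapped("HAC-20141017-093246.tdms")`.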

adamreeve (Owner) commented:

I did a small test using a 12 MB file, and loading that used a maximum of about 25 MB ram, so there's a bit of inefficiency there. Using the memmap_dir option reduced memory usage to 14 MB. I'm not sure how these results would scale to your larger file though, and the memory usage probably depends a lot on the file structure too. I'll do some more detailed profiling and see if I can find a way to reduce the memory usage.
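On a modern Python 3 interpreter, the standard-library tracemalloc module gives a quick way to measure this kind of peak heap usage (the thread itself is on Python 2.7, where a third-party tool such as memory_profiler would be needed instead); a minimal sketch:

```python
import tracemalloc

def peak_heap_mb(func, *args, **kwargs):
    # Run func and return (result, peak Python-heap allocation in MB).
    # Note this tracks only Python-level allocations, not total process
    # memory, so it will underestimate what Task Manager reports.
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / (1024 * 1024)
```

For example, `tdms_file, peak = peak_heap_mb(nptdms.TdmsFile, "data.tdms")` would report the peak heap cost of loading a file.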

adamreeve (Owner) commented:

Out of interest, I tested pyTDMS, and it used about 48 MB when reading my 12 MB file. So I'm surprised npTDMS crashes on your file but pyTDMS works...


crlaugh commented Nov 9, 2014

Thanks for taking a look at this. I've uploaded the problematic data file to the following link:

http://1drv.ms/1zco4jB

Let me know if you have any problems accessing it.


adamreeve added a commit that referenced this issue Nov 10, 2014
Reading some TDMS files creates a lot of these objects, so try to reduce
their memory usage.

See issue #19.
adamreeve (Owner) commented:

I got the file, thanks. Trying to load it with npTDMS used over 60% of my 8 GB of RAM before I killed it. The above commit reduces the memory usage to 645 MB, which is still not great but is a big improvement. For comparison, pyTDMS uses 327 MB.

The problem is that the segments in your TDMS file alternate between three different structures, so each segment has a different set of objects from the previous segment. Because of the way npTDMS reads the file structure, a very large number of objects are allocated to represent every TDMS object in every segment. In many TDMS files the segment structure repeats, so the objects describing the segment structure can be reused.

That commit just reduces the amount of memory required by each of these objects. Ideally I could reduce the number of objects used, but that would require much bigger changes.
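One common way to shrink per-instance memory in CPython (and plausibly the kind of change in that commit, though that is an assumption; the class names below are illustrative, not npTDMS's actual classes) is declaring `__slots__`, so each object stores its attributes in fixed slots instead of a per-instance `__dict__`:

```python
class SegmentObjectPlain:
    # Each instance carries a __dict__, costing hundreds of bytes
    # per object once millions of them are allocated.
    def __init__(self, path, data_type, num_values):
        self.path = path
        self.data_type = data_type
        self.num_values = num_values

class SegmentObjectSlotted:
    # __slots__ removes the per-instance __dict__, a significant
    # saving when a very large number of these objects exist.
    __slots__ = ("path", "data_type", "num_values")

    def __init__(self, path, data_type, num_values):
        self.path = path
        self.data_type = data_type
        self.num_values = num_values
```

The trade-off is that slotted instances cannot gain arbitrary new attributes at runtime, which is usually acceptable for fixed-shape metadata objects like these.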

adamreeve added a commit that referenced this issue Nov 10, 2014
Gives a further small memory saving.

See issue #19.
adamreeve (Owner) commented:

That second commit reduced memory usage a little bit further to 619 MB.


crlaugh commented Nov 10, 2014

Thanks so much for addressing this - it appears to be working much better. If you want to close this issue, that's okay with me (unless I have to do it - I haven't used GitHub much). I'll try converting some other large files this week and will reopen this topic if need be.

Thanks again for fixing it so quickly - it is very much appreciated.

c.


adamreeve (Owner) commented:

Ok thanks, I'll close it for now. The memory usage is still not ideal but hopefully it will be good enough to use with your files now.

adamreeve (Owner) commented:

By the way, there's a TDMS defragment VI in LabVIEW that should clean up your TDMS files and make them much faster to read: http://zone.ni.com/reference/en-XX/help/371361H-01/glang/tdms_defrag/. I'm not sure whether this would be useful for you, but I thought it was worth mentioning.
