ValueError: Data size is not a multiple of the chunk size #40

Closed
Nodd opened this issue Nov 4, 2015 · 18 comments

Comments

@Nodd

Nodd commented Nov 4, 2015

I'm trying to open a 4.4GB tdms file, and I get the following error from tdmsinfo:

Traceback (most recent call last):
  File "/usr/bin/tdmsinfo", line 11, in <module>
    sys.exit(main())
  File "/usr/lib/python3.5/site-packages/nptdms/tdmsinfo.py", line 26, in main
    tdmsfile = tdms.TdmsFile(args.tdms_file)
  File "/usr/lib/python3.5/site-packages/nptdms/tdms.py", line 153, in __init__
    self._read_segments(tdms_file)
  File "/usr/lib/python3.5/site-packages/nptdms/tdms.py", line 166, in _read_segments
    tdms_file, self.objects, previous_segment)
  File "/usr/lib/python3.5/site-packages/nptdms/tdms.py", line 456, in read_metadata
    self.calculate_chunks()
  File "/usr/lib/python3.5/site-packages/nptdms/tdms.py", line 485, in calculate_chunks
    "chunk size %d" % (total_data_size, data_size))
ValueError: Data size 4435200000 is not a multiple of the chunk size 4000256

I saw that there are closed issues with the same error, but I just installed v0.7.1, so I'm using the latest version.
The error appears on multiple files. Those files can be read with a LabView program, so they are valid. Also, LabView didn't crash while writing the files.

I'm on 64-bit Arch Linux with Python 3.5 and 32 GB of RAM.

@Nodd
Author

Nodd commented Nov 4, 2015

With the -d and -p options, there is more information:

DEBUG:nptdms.tdms:Reading segment at 0
DEBUG:nptdms.tdms:Property kTocDAQmxRawData is False
DEBUG:nptdms.tdms:Property kTocRawData is True
DEBUG:nptdms.tdms:Property kTocNewObjList is True
DEBUG:nptdms.tdms:Property kTocInterleavedData is False
DEBUG:nptdms.tdms:Property kTocBigEndian is False
DEBUG:nptdms.tdms:Property kTocMetaData is True
DEBUG:nptdms.tdms:Reading metadata at 28
DEBUG:nptdms.tdms:Creating a new segment object
DEBUG:nptdms.tdms:Reading metadata for object /
DEBUG:nptdms.tdms:Object has no data in this segment
DEBUG:nptdms.tdms:Reading 1 properties
DEBUG:nptdms.tdms:Property name (tdsTypeString): testcase7_manual_smb100_trigger_500MCarrier_CH0
DEBUG:nptdms.tdms:Creating a new segment object
DEBUG:nptdms.tdms:Reading metadata for object /'Group Name'
DEBUG:nptdms.tdms:Object has no data in this segment
DEBUG:nptdms.tdms:Reading 0 properties
DEBUG:nptdms.tdms:Creating a new segment object
DEBUG:nptdms.tdms:Reading metadata for object /'Group Name'/'Channel Name'
DEBUG:nptdms.tdms:Object data type: tdsTypeI16
DEBUG:nptdms.tdms:Object number of values in segment: 2000128
DEBUG:nptdms.tdms:Reading 0 properties
INFO:nptdms.tdms:Read metadata: Took 0.9210000000000051 ms
Traceback (most recent call last):
  File "/usr/bin/tdmsinfo", line 11, in <module>
    sys.exit(main())
  File "/usr/lib/python3.5/site-packages/nptdms/tdmsinfo.py", line 26, in main
    tdmsfile = tdms.TdmsFile(args.tdms_file)
  File "/usr/lib/python3.5/site-packages/nptdms/tdms.py", line 153, in __init__
    self._read_segments(tdms_file)
  File "/usr/lib/python3.5/site-packages/nptdms/tdms.py", line 166, in _read_segments
    tdms_file, self.objects, previous_segment)
  File "/usr/lib/python3.5/site-packages/nptdms/tdms.py", line 456, in read_metadata
    self.calculate_chunks()
  File "/usr/lib/python3.5/site-packages/nptdms/tdms.py", line 485, in calculate_chunks
    "chunk size %d" % (total_data_size, data_size))
ValueError: Data size 4435200000 is not a multiple of the chunk size 4000256
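
The numbers in this output can be checked by hand. The sketch below assumes that one chunk holds exactly the 2000128 values reported for the single channel and that tdsTypeI16 values are 2 bytes wide. The variable names are illustrative and are not taken from npTDMS.

```python
# Values copied from the debug output and the error message above.
values_per_chunk = 2000128     # "Object number of values in segment"
bytes_per_value = 2            # assumption: tdsTypeI16 is a 16-bit integer
total_data_size = 4435200000   # "Data size" in the ValueError message

# Size in bytes of one chunk for this single-channel segment.
chunk_size = values_per_chunk * bytes_per_value

# Number of whole chunks, and the bytes left over after them.
complete_chunks, leftover_bytes = divmod(total_data_size, chunk_size)

print(chunk_size)                         # 4000256
print(complete_chunks)                    # 1108
print(leftover_bytes)                     # 2916352
print(leftover_bytes // bytes_per_value)  # 1458176
```

Under those assumptions the data ends with a partial chunk of 2916352 bytes (1458176 values) rather than a whole number of chunks.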

@Nodd
Author

Nodd commented Nov 4, 2015

I was able to read the file simply by not raising the error:

        if total_data_size % data_size != 0:
            #raise ValueError(
            print(
                "Data size %d is not a multiple of the "
                "chunk size %d" % (total_data_size, data_size))
        #else:
        self.num_chunks = total_data_size // data_size

@adamreeve
Owner

Hmm that's odd. Can you tell if you're missing any data at the beginning or end of the file compared to what LabView reads? The structure of the file looks pretty straightforward, so I'm not sure what's going wrong here. Does this only happen with very large files?

@Nodd
Author

Nodd commented Nov 7, 2015

I only have large files :(
I'm missing some data at the end of the file.

@adamreeve
Owner

Any chance you can upload it somewhere so I can have a look? If you could generate a smaller file that showed the same issue that would be awesome.

@Delicate-aRt

Hi guys,

I have the same issue here:

b = TdmsFile('small_tdms_file.tdms')
Data size 1075200 is not a multiple of the chunk size 2293760

The file is pretty small, about 1 MB. You may download it here.

@adamreeve
Owner

Thanks, I can reproduce the error with that file so hopefully will be able to figure out what's going on. The metadata says this file should have 448 channels (/Measurement/0 to /Measurement/447), each with 1280 single-precision (32 bit) floating point values, but the file size is less than what that would total (the 2293760 byte chunk size) so that can't be right. Do you know how many values should be in each channel?
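
The arithmetic behind those figures can be reproduced as a rough check. The sketch assumes 4 bytes per single-precision value and that every channel holds the same number of values; the variable names are illustrative only.

```python
# Figures quoted from the metadata description and the error message.
channels = 448            # /Measurement/0 to /Measurement/447
values_per_channel = 1280
bytes_per_value = 4       # single-precision (32 bit) float
data_size = 1075200       # "Data size" reported in the error

# Chunk size implied by the metadata.
expected_chunk_size = channels * values_per_channel * bytes_per_value

# Values per channel that actually fit in the available data,
# assuming the data is shared evenly between all channels.
values_present, remainder = divmod(data_size, channels * bytes_per_value)

print(expected_chunk_size)  # 2293760
print(values_present)       # 600
print(remainder)            # 0
```

So the available data is less than half of one chunk, and it divides evenly into 600 values for each of the 448 channels.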

@Delicate-aRt

I was provided only with the file and a kind of custom Python parser that tries to read it.

After reviewing the code and the intermediate results: you are right, there should be 448x1280 values, but in fact there are only 448x600 values.

I have no idea whether there should be more or less. And believe me, I completely understand how weird this sounds to you :)

Should I consider this a corrupted TDMS file? Or is it something that can be handled somehow?

@Nodd
Author

Nodd commented Apr 6, 2016

It looks like the last chunk is not complete if the actual data is smaller than the chunk size. There is no padding to match the chunk size.
I don't know if it's legal, but it looks like it can happen easily.

@adamreeve
Owner

Yeah, it sounds like something that npTDMS should handle, although I couldn't find any mention of this being valid in NI's documentation, or of how exactly to determine which channels have data and how many values are in each channel if the size is less than expected. Hopefully we can just keep reading as normal up until hitting the actual end of the segment.
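
One possible reading of "keep reading as normal until the end of the segment" for non-interleaved data is sketched below. This is only an assumption about how a truncated chunk might be laid out, not documented TDMS behaviour and not the npTDMS implementation. The function partial_chunk_counts and its arguments are hypothetical.

```python
def partial_chunk_counts(channels, available_bytes):
    """Guess how many values each channel has in a truncated final chunk.

    Hypothetical helper. `channels` is a list of
    (name, values_per_chunk, bytes_per_value) tuples in file order.
    Assumes non-interleaved data: each channel's values are stored
    contiguously, one channel after another, and the chunk simply
    stops early.
    """
    counts = {}
    remaining = available_bytes
    for name, values_per_chunk, bytes_per_value in channels:
        full_bytes = values_per_chunk * bytes_per_value
        usable = min(remaining, full_bytes)
        # Only whole values are counted; a trailing fragment is ignored.
        counts[name] = usable // bytes_per_value
        remaining -= usable
    return counts


# Single 16-bit channel, using the sizes from the first error in this issue.
leftover = 4435200000 % 4000256
single = partial_chunk_counts([("channel", 2000128, 2)], leftover)
print(single)  # {'channel': 1458176}

# Made-up two-channel layout: the first channel is complete,
# the second only partially present.
two = partial_chunk_counts([("a", 4, 2), ("b", 4, 4)], 14)
print(two)  # {'a': 4, 'b': 1}
```

With a single channel the answer is unambiguous. With several channels the result depends entirely on the layout assumption above, which is exactly the open question.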

@adamreeve
Owner

I haven't had a lot of time to look into this but have made a bit of progress. The changes in this branch should work for interleaved data; non-interleaved will be a bit more work: https://github.com/adamreeve/npTDMS/tree/not-multiple-chunk-size.

@Delicate-aRt, your file is interleaved but @Nodd's isn't.

@Delicate-aRt

@adamreeve you are right: it is in interleaved mode. Sorry for not providing those details; TDMS is something completely new for me.

I'll check whether it works now and let you know!

@adamreeve
Owner

@Nodd, are you able to test out the latest code on that not-multiple-chunk-size branch and let me know if that works and the data looks correct? I think it should work fine for you as your file only has one channel, but I'm not sure if it would be correct if the same issue happened in a file with multiple channels.

@fgroes

fgroes commented Oct 25, 2016

Hi, I have the same issue. I also fixed it temporarily by commenting out the exception and handling the case where there is only one chunk that is not long enough.

I have now tested your fix and it works for my data. Please merge the changes into the next release.

Unfortunately I can't provide you with the data, because it belongs to a customer of ours.

@adamreeve
Owner

Thanks @fgroes, I've just released 0.8.2 with this fix in it.

@nmgeek
Contributor

nmgeek commented Feb 8, 2019

I saw this same problem today with version 0.13.0. Could the bug have regressed?
Our LabView application can read the file without error. The data is interleaved.

@nmgeek
Contributor

nmgeek commented Feb 8, 2019

It's not a regression. The problem was not fixed in 0.8.2 either ... for my TDMS file.
Here is the output from v0.8.2:

WARNING:nptdms.tdms:Data size 348793088 is not a multiple of the chunk size 2000000. Will attempt to read last chunk
Traceback (most recent call last):
  File "nptdms_test_env/bin/tdmsinfo", line 10, in <module>
    sys.exit(main())
  File "nptdms_test_env/local/lib/python2.7/site-packages/nptdms/tdmsinfo.py", line 26, in main
    tdmsfile = tdms.TdmsFile(args.tdms_file)
  File "nptdms_test_env/local/lib/python2.7/site-packages/nptdms/tdms.py", line 169, in __init__
    self._read_segments(tdms_file)
  File "nptdms_test_env/local/lib/python2.7/site-packages/nptdms/tdms.py", line 199, in _read_segments
    segment.read_raw_data(tdms_file)
  File "nptdms_test_env/local/lib/python2.7/site-packages/nptdms/tdms.py", line 521, in read_raw_data
    self._read_interleaved_numpy(f, data_objects, endianness)
  File "nptdms_test_env/local/lib/python2.7/site-packages/nptdms/tdms.py", line 552, in _read_interleaved_numpy
    combined_data = combined_data.reshape(-1, all_channel_bytes)
ValueError: cannot reshape array of size 793088 into shape (40)
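
The sizes in this traceback suggest why the reshape fails. The sketch below assumes the 40 in the error message is the number of bytes in one interleaved row (all_channel_bytes). The helper split_complete_rows is hypothetical and uses plain bytes rather than NumPy to stay self-contained.

```python
# Sizes taken from the warning and the ValueError above.
total_data_size = 348793088
chunk_size = 2000000
all_channel_bytes = 40  # assumption: bytes per interleaved row

# Bytes in the incomplete last chunk, then complete rows within it.
last_chunk_bytes = total_data_size % chunk_size
complete_rows, stray_bytes = divmod(last_chunk_bytes, all_channel_bytes)

print(last_chunk_bytes)  # 793088
print(complete_rows)     # 19827
print(stray_bytes)       # 8


def split_complete_rows(buffer, row_bytes):
    """Split a byte buffer into whole rows plus any trailing partial row."""
    usable = len(buffer) - len(buffer) % row_bytes
    rows = [buffer[i:i + row_bytes] for i in range(0, usable, row_bytes)]
    return rows, buffer[usable:]


# Small made-up buffer: 10 bytes in rows of 4 leaves 2 stray bytes.
rows, tail = split_complete_rows(bytes(range(10)), 4)
print(len(rows), len(tail))  # 2 2
```

Since 793088 is not a multiple of 40, the last chunk ends 8 bytes into a row, so it cannot be reshaped into whole rows unless the trailing fragment is dropped or handled separately.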

@nmgeek
Contributor

nmgeek commented Feb 8, 2019

The traceback produced by v0.13.0 is a bit different:

WARNING:nptdms.tdms:Data size 348793088 is not a multiple of the chunk size 2000000. Will attempt to read last chunk
Traceback (most recent call last):
  File "/usr/local/bin/tdmsinfo", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/nptdms/tdmsinfo.py", line 26, in main
    tdmsinfo(args.tdms_file, args.properties)
  File "/usr/local/lib/python2.7/dist-packages/nptdms/tdmsinfo.py", line 30, in tdmsinfo
    tdmsfile = tdms.TdmsFile(file)
  File "/usr/local/lib/python2.7/dist-packages/nptdms/tdms.py", line 94, in __init__
    self._read_segments(f)
  File "/usr/local/lib/python2.7/dist-packages/nptdms/tdms.py", line 124, in _read_segments
    segment.read_raw_data(f)
  File "/usr/local/lib/python2.7/dist-packages/nptdms/tdms.py", line 512, in read_raw_data
    self._read_interleaved_numpy(f, data_objects)
  File "/usr/local/lib/python2.7/dist-packages/nptdms/tdms.py", line 546, in _read_interleaved_numpy
    combined_data = combined_data.reshape(-1, all_channel_bytes)
ValueError: total size of new array must be unchanged
