Bug in reading .dat file #124

Closed
drhannah94 opened this issue Feb 7, 2019 · 30 comments

@drhannah94

Python version: 3.6.1

from asammdf import MDF

def merge_mdfs(files):
    return MDF.merge(files).resample(0.1)

def main():
    # Here I have code to read in a series of '.dat' files as a list into the variable 'files'
    merge_file = merge_mdfs(files)

    # The variable 'merge_file' contains channel names with the corrupt data

Code

MDF version

4.7.8

Code snippet

return MDF.merge(files).resample(0.1)

Traceback

This code doesn't produce an error or a traceback

Description

Let me preface by saying that I am relatively new to python and even newer to this library, but I understand the importance of collaboration so I will try my best to help. Since I work for a big company, I can't share exactly everything I am working on, especially the data files I am using. However, I will do my best to supply all available information I can to resolve this issue.

The issue I'm having is that when I read in a series of '.dat' files, sometimes the data gets read in perfectly, but other times the data gets all messed up and values that were not in the original data find their way in.

Example: I am reading in acceleration data from an accelerometer. Another plotting tool my company uses confirms that this data trace has a max of ~6.5 and a min of ~-1.5 (units = m/s^2). When I read in a series of these same files, I get the same max but a min of ~-13 m/s^2. When I look at the data, there are more data points than there should be, and the data doesn't flow the way I would expect (i.e. a lot of repeating values).
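A quick way to quantify the symptoms described here (values outside the known range, runs of repeated samples) is a small check like the sketch below. The function name `summarize_samples` and the sample values are made up for illustration; with real data, the list would come from the channel's samples.

```python
def summarize_samples(samples, expected_min, expected_max):
    """Count samples outside the expected range and find the longest run of repeats."""
    out_of_range = sum(1 for s in samples if not expected_min <= s <= expected_max)
    longest_run = 0
    current_run = 0
    previous = object()  # sentinel that never equals a real sample
    for s in samples:
        current_run = current_run + 1 if s == previous else 1
        longest_run = max(longest_run, current_run)
        previous = s
    return {
        "count": len(samples),
        "min": min(samples),
        "max": max(samples),
        "out_of_range": out_of_range,
        "longest_repeat_run": longest_run,
    }


# Made-up values for illustration only; not taken from the real measurement.
report = summarize_samples([0.1, 6.5, -1.5, -13.0, -13.0, -13.0, 0.2], -1.5, 6.5)
```

Running the same check on a single file and on the merged result would show whether the extra points and repeats only appear after merging.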

Please let me know if anyone needs more information to help solve this issue. I will try my best to supply any additional information requested.

Thanks for supporting this awesome library! :)

@danielhrisca
Owner

Hello @drhannah94 ,
I appreciate your effort; thank you!

I have some questions:

  1. If you open the files individually and plot the problematic channels, do you still see wrong values?
MDF(filename).get(channelname).plot()
  2. What is the MDF version of your files?
print(MDF(filename).version)
  3. What does the problematic channel look like?
mdf = MDF(filename)
for group_index, channel_index in mdf.whereis(channelname):
    print(mdf.groups[group_index]['channels'][channel_index])

@drhannah94
Author

Hey @danielhrisca! Thanks for getting back to me so fast! Here is the output from the information you requested. I also attached a plot of what the actual data should look like.

*Note that the data in the Excel plot is sampled at 250 ms so the x-axis scale is different.

    print(MDF(files[0]).version) # --> 3.00 - version of the original files
    MDF(files[0]).get('Accel_Chasis_X').plot() # --> Figure 1
    merge_file = MDF.merge(files)
    print(merge_file.version) # --> 4.10 - version of the merged files
    merge_file.get('Accel_Chasis_X').plot() # --> Figure 2
    merge_file.resample(0.1)
    merge_file.get('Accel_Chasis_X').plot() # --> Figure 3

    for group_index, channel_index in merge_file.whereis('Accel_Chasis_X'):
        print(merge_file.groups[group_index]['channels'][channel_index])

    # Output from for loop:
    # <Channel (name: Accel_Chasis_X\ES650 / AD/Thermo:1, unit: m/s^2, comment: , address: 0x0,
    # conversion: <ChannelConversion (name: , unit: m/s^2, comment: , formula: , referenced blocks: None, address: 0x0, fields: {'id': b'##CC', 'reserved0': 0, 'block_len': 96, 'links_nr': 4, 'name_addr': 0, 'unit_addr': 0, 'comment_addr': 0, 'inv_conv_addr': 0, 'conversion_type': 1, 'precision': 1, 'flags': 0, 'ref_param_nr': 0, 'val_param_nr': 2, 'min_phy_value': 0.0, 'max_phy_value': 0.0, 'b': -24.608724153475002, 'a': 9.80665e-06})>,
    # source: None,
    # fields: {'id': b'##CN', 'reserved0': 0, 'block_len': 160, 'links_nr': 8, 'next_ch_addr': 0, 'component_addr': 0, 'name_addr': 0, 'source_addr': 0, 'conversion_addr': 0, 'data_block_addr': 0, 'unit_addr': 0, 'comment_addr': 0, 'channel_type': 0, 'sync_type': 0, 'data_type': 2, 'bit_offset': 0, 'byte_offset': 16, 'bit_count': 32, 'flags': 0, 'pos_invalidation_bit': 0, 'precision': 3, 'reserved1': 0, 'attachment_nr': 0, 'min_raw_value': 0, 'max_raw_value': 0, 'lower_limit': 0, 'upper_limit': 0, 'lower_ext_limit': 0, 'upper_ext_limit': 0})>

Figure 1: Output First File Before Merge (Looks correct)
accel_chasis_x plot from first file
Figure 2: Output of Corrupt Channel From Merge
accel_chasis_x resampled 100ms
Figure 3: Output of Corrupt Channel From Merge Resampled at 100ms
accel_chasis_x
Figure 4: Plot of What Acceleration Data is Supposed to Look Like
accel_chasis_x actual plot in excel
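The conversion block printed for the merged file has conversion_type 1 with the fields 'a' and 'b'. Assuming this is the usual linear conversion, physical = a * raw + b (an assumption about how the fields are applied, not something confirmed in this thread), a small sketch shows which raw counts would correspond to the expected maximum and to the spurious minimum:

```python
# Coefficients copied from the ChannelConversion printed for the merged file.
A = 9.80665e-06          # scale factor
B = -24.608724153475002  # offset in m/s^2


def raw_to_physical(raw):
    """Apply a linear conversion, assuming physical = A * raw + B."""
    return A * raw + B


def physical_to_raw(physical):
    """Invert the linear conversion to see which raw count gives a value."""
    return (physical - B) / A


raw_for_max = physical_to_raw(6.5)        # raw count behind the expected maximum
raw_for_bad_min = physical_to_raw(-13.0)  # raw count behind the spurious minimum
```

If the raw counts behind the bad values look like plausible readings of a different channel, that points toward samples being read from the wrong field rather than toward a conversion error.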

@danielhrisca
Owner

danielhrisca commented Feb 7, 2019

What is the output of the snippet?

mdf = MDF(files[0])
group_index, _ = mdf.whereis(channelname)[0]
group = mdf.groups[group_index]
mdf._prepare_record(group)
print(*group['types'], sep='\n')

@danielhrisca
Owner

Please run the snippet from point 3 using one of the original files (version 3.00)

mdf = MDF(files[0])
for group_index, channel_index in mdf.whereis(channelname):
    print(mdf.groups[group_index]['channels'][channel_index])

@drhannah94
Author

I am getting an error when trying to print '*group['types']'

mdf = MDF(files[0])
group_index, _ = mdf.whereis('Accel_Chasis_X')[0]
group = mdf.groups[group_index]
mdf._prepare_record(group)
print(*group['types'], sep='\n') # --> Error Here

Output

File "My Directory", line 247, in merge_mdfs
print(*group['types'], sep='\n')
KeyError: 'types'

mdf = MDF(files[0])
for group_index, channel_index in mdf.whereis('Accel_Chasis_X'):
    print(mdf.groups[group_index]['channels'][channel_index])

Output

Channel (name: Accel_Chasis_X\ES650 / AD/Thermo:1, display name: Accel_Chasis_X\ES650 / AD/Thermo:1, comment: , address: 0x2406, fields: {'id': b'CN', 'block_len': 228, 'next_ch_addr': 0, 'conversion_addr': 5826, 'source_addr': 0, 'ch_depend_addr': 0, 'comment_addr': 9144, 'channel_type': 0, 'short_name': b'Accel_Chasis_X\ES650 / AD/Thermo', 'description': b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'start_offset': 128, 'bit_count': 32, 'data_type': 1, 'range_flag': 0, 'min_raw_value': 0.0, 'max_raw_value': 0.0, 'sampling_rate': 0.001, 'long_name_addr': 2187, 'display_name_addr': 9149, 'aditional_byte_offset': 0})

Channel (name: Accel_Chasis_X\ES650 / AD/Thermo:1, display name: Accel_Chasis_X\ES650 / AD/Thermo:1, comment: , address: 0x2406, fields: {'id': b'CN', 'block_len': 228, 'next_ch_addr': 0, 'conversion_addr': 5826, 'source_addr': 0, 'ch_depend_addr': 0, 'comment_addr': 9144, 'channel_type': 0, 'short_name': b'Accel_Chasis_X\ES650 / AD/Thermo', 'description': b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'start_offset': 128, 'bit_count': 32, 'data_type': 1, 'range_flag': 0, 'min_raw_value': 0.0, 'max_raw_value': 0.0, 'sampling_rate': 0.001, 'long_name_addr': 2187, 'display_name_addr': 9149, 'aditional_byte_offset': 0})

@danielhrisca
Owner

Right, please run the snippet again without the asterisk

print(group["types"])

@drhannah94
Author

For some reason there is no key 'types'.

image

@danielhrisca
Owner

You need to call _prepare_record first

@drhannah94
Author

I'm still calling '_prepare_record' first.

mdf = MDF(files[0])
group_index, _ = mdf.whereis('Accel_Chasis_X')[0]
group = mdf.groups[group_index]
mdf._prepare_record(group)
print(group['types'], sep='\n')

group: Before _prepare_record

image

group: After _prepare_record

image

@danielhrisca
Owner

This should finally be the correct snippet

mdf = MDF(files[0])
group_index, _ = mdf.whereis('Accel_Chasis_X')[0]
group = mdf.groups[group_index]
print(mdf._prepare_record(group))

@drhannah94
Author

Output:

({0: ('time', 0), 1: ('Accel_Vertical_Wheel\ES650 / AD/Thermo:1', 0), 2: ('Accel_Chasis_Y\ES650 / AD/Thermo:1', 0), 3: ('Accel_Chasis_X\ES650 / AD/Thermo:1', 0)}, dtype([('time', '<f8'), ('Accel_Vertical_Wheel\ES650 / AD/Thermo:1', '<i4'), ('Accel_Chasis_Y\ES650 / AD/Thermo:1', '<i4'), ('Accel_Chasis_X\ES650 / AD/Thermo:1', '<i4')]))

@danielhrisca
Owner

There is nothing really interesting in the print.

Let's try this:

mdf = MDF(files[0])
print(mdf.info())

@drhannah94
Author

Output:

{'author': 'rdo9fe', 'department': '', 'project': '', 'subject': '', 'version': '3.00', 'groups': 26, 'group 0': {'cycles': 6, 'comment': 'AC', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel 1': 'name="AC\CAN-Monitoring:1" type=value'}, 'group 1': {'cycles': 2671, 'comment': 'Accelerometer', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel 1': 'name="Accelerometer\CAN-Monitoring:1" type=value'}, 'group 2': {'cycles': 0, 'comment': 'Brake', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel 1': 'name="Brake\CAN-Monitoring:1" type=value'}, 'group 3': {'cycles': 2668, 'comment': 'Brake_Position', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel 1': 'name="Brake_Position\CAN-Monitoring:1" type=value'}, 'group 4': {'cycles': 0, 'comment': 'Message_HS_CAN_2701', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel 1': 'name="TCC\CAN-Monitoring:1" type=value'}, 'group 5': {'cycles': 0, 'comment': 'Message_HS_CAN_33', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel 1': 'name="gear_ratio1\CAN-Monitoring:1" type=value'}, 'group 6': {'cycles': 2668, 'comment': 'Vehicle_Speed', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel 1': 'name="VehicleSpeed_Filtered\CAN-Monitoring:1" type=value'}, 'group 7': {'cycles': 2671, 'comment': 'Wheel_Speed', 'channels count': 5, 'channel 0': 'name="time" type=master', 'channel 1': 'name="Wheel_Speed_RR\CAN-Monitoring:1" type=value', 'channel 2': 'name="Wheel_Speed_FR\CAN-Monitoring:1" type=value', 'channel 3': 'name="Wheel_Speed_FL\CAN-Monitoring:1" type=value', 'channel 4': 'name="Wheel_Speed_RL\CAN-Monitoring:1" type=value'}, 'group 8': {'cycles': 536, 'comment': 'Brake_bit', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel 1': 'name="Brake_bit\CAN-Monitoring:2" type=value'}, 'group 9': {'cycles': 55, 'comment': 'Drivemode', 'channels count': 2, 'channel 0': 'name="time" 
type=master', 'channel 1': 'name="DriveMode\CAN-Monitoring:2" type=value'}, 'group 10': {'cycles': 79, 'comment': 'Gears1', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel 1': 'name="CommandedGear\CAN-Monitoring:2" type=value'}, 'group 11': {'cycles': 5336, 'comment': 'Message_HS_CAN_27', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel 1': 'name="Unmanaged_Torque\CAN-Monitoring:2" type=value'}, 'group 12': {'cycles': 2668, 'comment': 'Message_HS_CAN_29', 'channels count': 3, 'channel 0': 'name="time" type=master', 'channel 1': 'name="EngineSpeed_leading_smooth\CAN-Monitoring:2" type=value', 'channel 2': 'name="DFCO\CAN-Monitoring:2" type=value'}, 'group 13': {'cycles': 5336, 'comment': 'Pedal', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel 1': 'name="Pedal_lagging_sharp\CAN-Monitoring:2" type=value'}, 'group 14': {'cycles': 5336, 'comment': 'Pedal2', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel 1': 'name="Demanded_or_available_acceleration\CAN-Monitoring:2" type=value'}, 'group 15': {'cycles': 536, 'comment': 'Selector_Lever', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel 1': 'name="Selector_Lever\CAN-Monitoring:2" type=value'}, 'group 16': {'cycles': 5336, 'comment': 'Torque_1', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel 1': 'name="Managed_Torque\CAN-Monitoring:2" type=value'}, 'group 17': {'cycles': 373, 'comment': 'Torque_4', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel 1': 'name="Torque_higher\CAN-Monitoring:2" type=value'}, 'group 18': {'cycles': 2668, 'comment': 'Vehicle_Speed', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel 1': 'name="VehicleSpeed_Filtered\CAN-Monitoring:2" type=value'}, 'group 19': {'cycles': 90, 'comment': 'gears', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel 1': 'name="Gears_slow\CAN-Monitoring:2" type=value'}, 'group 20': 
{'cycles': 5336, 'comment': 'm0a0', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel 1': 'name="TurbineSpeed\CAN-Monitoring:2" type=value'}, 'group 21': {'cycles': 536, 'comment': 'vGrp_100ms', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel 1': 'name="INCA_timestamp" type=value'}, 'group 22': {'cycles': 53330, 'comment': '1ms', 'channels count': 4, 'channel 0': 'name="time" type=master', 'channel 1': 'name="Accel_Vertical_Wheel\ES650 / AD/Thermo:1" type=value', 'channel 2': 'name="Accel_Chasis_Y\ES650 / AD/Thermo:1" type=value', 'channel 3': 'name="Accel_Chasis_X\ES650 / AD/Thermo:1" type=value'}, 'group 23': {'cycles': 1, 'comment': 'VGEvent', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel 1': 'name="$EVENT_COMMENTS" type=value'}, 'group 24': {'cycles': 0, 'comment': 'VGPause', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel
1': 'name="$PAUSE_COMMENTS" type=value'}, 'group 25': {'cycles': 0, 'comment': 'VGSnapshot', 'channels count': 2, 'channel 0': 'name="time" type=master', 'channel 1': 'name="$SNAPSHOT" type=value'}}

@danielhrisca
Owner

Please install the Py3 branch and check whether you still see the errors

pip install -I --no-deps https://github.com/danielhrisca/asammdf/archive/Py3.zip

@drhannah94
Author

Py3 Branch Successfully Installed

image

After the install, I am still seeing the same errors.

print(MDF(files[0]).version) # --> 3.00
MDF(files[0]).get('Accel_Chasis_X').plot() # --> Figure 1
merge_file = MDF.concatenate(files)
print(merge_file.version) # --> 4.10
merge_file.get('Accel_Chasis_X').plot() # --> Figure 2
merge_file.resample(0.1)
merge_file.get('Accel_Chasis_X').plot() # --> Figure 3

Figure 1
image

Figure 2
image

Figure 3
image

@danielhrisca
Owner

I understand. Please share the output of print(mdf.info()) for all the original files
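Comparing those info() dictionaries by eye is tedious with 26 groups per file, so a small helper can flag groups whose channel order differs. This is only a sketch: it assumes the dictionary shape printed earlier in this thread ('group N' keys holding 'channel N' strings), the helper names are hypothetical, and the two example dictionaries are shortened stand-ins rather than real output.

```python
import re


def channel_order(info):
    """Map each group key to its ordered list of channel names.

    `info` is assumed to have the shape printed earlier in this thread:
    {'group 0': {'channel 0': 'name="time" type=master', ...}, ...}
    """
    order = {}
    for key, group in info.items():
        # Skip header entries such as 'author', 'version' and 'groups'.
        if not (key.startswith("group ") and isinstance(group, dict)):
            continue
        names = []
        index = 0
        while f"channel {index}" in group:
            text = group[f"channel {index}"]
            match = re.search(r'name="(.*?)"', text)
            names.append(match.group(1) if match else text)
            index += 1
        order[key] = names
    return order


def differing_groups(reference_info, other_info):
    """Return the group keys whose channel order differs between two files."""
    ref, other = channel_order(reference_info), channel_order(other_info)
    return sorted(k for k in ref.keys() | other.keys() if ref.get(k) != other.get(k))


# Shortened, hypothetical stand-ins for two files' info() output.
first = {"groups": 1, "group 22": {
    "channel 0": 'name="time" type=master',
    "channel 1": 'name="Accel_Chasis_X" type=value',
    "channel 2": 'name="Accel_Chasis_Y" type=value'}}
second = {"groups": 1, "group 22": {
    "channel 0": 'name="time" type=master',
    "channel 1": 'name="Accel_Chasis_Y" type=value',
    "channel 2": 'name="Accel_Chasis_X" type=value'}}

differing = differing_groups(first, second)
```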

@drhannah94
Author

So when I was running my code...

for file in files:
    print(MDF(file).info())

I was getting an error here in mdf_v3.py in the 'info' function.

value = self.header[key].decode("latin-1").strip(" \n\t\0") # Line 3058

Traceback:
AttributeError: 'str' object has no attribute 'decode'

I changed it to this to get the output...

try:
    value = self.header[key].decode("latin-1").strip(" \n\t\0")
except AttributeError:
    # The header value is already a str here, so there is nothing to decode
    value = self.header[key].strip(" \n\t\0")
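The same workaround can be written without exception handling by checking the type first. The helper below is a hypothetical sketch, not part of asammdf; it only illustrates handling a value that may arrive as either bytes or str.

```python
def to_text(value, encoding="latin-1"):
    """Return a stripped str whether `value` arrives as bytes or as str."""
    if isinstance(value, (bytes, bytearray)):
        value = value.decode(encoding)
    return value.strip(" \n\t\0")


# Made-up header values: one padded bytes field and one plain string.
from_bytes = to_text(b"abc \x00\x00")
from_str = to_text("abc \n")
```

Checking the type explicitly avoids hiding unrelated errors, which a broad except clause could swallow.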

Output:

Output from all individual files.txt

@danielhrisca
Owner

Thank you @drhannah94 I appreciate your effort

@danielhrisca
Owner

I've found the problem and I'm working on a fix

@danielhrisca
Owner

@drhannah94
please check the Py3 branch. If this works I will include the fix in development as well

@drhannah94
Author

I updated the Py3 branch and then re-ran this code

print(MDF(files[0]).version)
MDF(files[0]).get('Accel_Chasis_X').plot()
merge_file = MDF.concatenate(files)
print(merge_file.version)
merge_file.get('Accel_Chasis_X').plot()
merge_file.resample(0.1)
merge_file.get('Accel_Chasis_X').plot()

At the line...

merge_file = MDF.concatenate(files)

I got an error saying: internal structure of file {22} is different

image

When I look at the structure of the lists channel_names and chans, the values at indexes 1 and 3 are switched.

image

image

@danielhrisca
Owner

Hello @drhannah94 ,

that was indeed the issue; the first file had the channels switched. What does the channel "Accel_Chasis_X" look like if you concatenate the other files and ignore the first one?

I could work on a fallback that checks the channel names and would handle strange cases like your first file, but that would certainly not be as memory and speed efficient as the case where the files have identical structure.
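To illustrate why switched channels corrupt the data, and what a name-based fallback might look like, here is a minimal sketch using only the standard library. It is not asammdf's implementation: the record layout mirrors the dtype printed earlier (one float64 time and three int32 channels), while the shortened channel names, the sample values, and both function names are hypothetical.

```python
import struct

# Record layout: a little-endian float64 time followed by three int32 channels.
RECORD = struct.Struct("<diii")
REFERENCE_ORDER = ["time", "wheel", "chasis_y", "chasis_x"]
SWAPPED_ORDER = ["time", "chasis_x", "chasis_y", "wheel"]  # indexes 1 and 3 switched


def read_by_position(raw, order):
    """Naive read: trust that every file uses the given field order."""
    return dict(zip(order, RECORD.unpack(raw)))


def read_by_name(raw, file_order, reference_order):
    """Fallback: decode with the file's own order, then reorder by name."""
    values = dict(zip(file_order, RECORD.unpack(raw)))
    return {name: values[name] for name in reference_order}


# One record from a file that stores chasis_x before wheel.
raw = RECORD.pack(0.001, 111, 222, 333)  # chasis_x=111, chasis_y=222, wheel=333

naive = read_by_position(raw, REFERENCE_ORDER)              # chasis_x gets wheel's value
fixed = read_by_name(raw, SWAPPED_ORDER, REFERENCE_ORDER)   # chasis_x is correct
```

The positional read is cheaper because it never inspects names, which matches the trade-off mentioned here: the name lookup costs extra work per file but tolerates a different channel order.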

danielhrisca added a commit that referenced this issue Feb 12, 2019
@danielhrisca
Owner

@drhannah94
I've had a go at handling situations where the channels are not in the same order in the channel group. Please try the Py3 branch code.

@drhannah94
Author

@danielhrisca

That definitely did something. I re-ran the same code...

print(MDF(files[0]).version)
MDF(files[0]).get('Accel_Chasis_X').plot()
merge_file = MDF.concatenate(files)
print(merge_file.version)
merge_file.get('Accel_Chasis_X').plot()
merge_file.resample(0.1)
merge_file.get('Accel_Chasis_X').plot()

The output from the following code is...

image

image

image

The graphs look like what I would expect. There are just weird gaps between some of the files.

@drhannah94
Author

Here is what the plot looks like if you ignore the first file.

merge_file = MDF.concatenate(files[1:])
merge_file.get('Accel_Chasis_X').plot()

image

@danielhrisca
Owner

There is probably a problem with the way the recording software has set the start time in the measurement file header.

What does this print?

for f in files:
    print(MDF(f).header.start_time)

A possible fix would be to disable the sync for concatenation:

MDF.concatenate(files, sync=False)

@drhannah94
Author

@danielhrisca
yeah it looks like there are jumps in the start times.

2018-10-25 10:20:16
2018-10-25 11:48:40
2018-10-25 11:50:07
2018-10-25 11:52:04
2018-10-25 11:53:59
2018-10-25 11:55:58
2018-10-25 11:58:06
2018-10-25 12:02:05
2018-10-25 12:06:00
2018-10-25 12:09:25
2018-10-25 12:38:26
2018-10-25 12:42:39
2018-10-25 12:47:57
2018-10-25 12:52:04
2018-10-25 12:55:20
2018-10-25 12:59:04
2018-10-25 13:02:31

Concatenating with sync set to false seems to have done the trick!!

image
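The gaps can be estimated directly from the start times. The sketch below assumes that syncing shifts each file's time axis by the difference between its start time and the first file's start time; that is a reading of the sync option's intent, not a confirmed description of the library's internals. Only the first four start times from the list are used, and the function names are hypothetical.

```python
from datetime import datetime

# First four start times as printed in this thread.
START_TIMES = [
    "2018-10-25 10:20:16",
    "2018-10-25 11:48:40",
    "2018-10-25 11:50:07",
    "2018-10-25 11:52:04",
]


def offsets_from_first(stamps):
    """Seconds between each file's start time and the first file's start time."""
    parsed = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]
    return [(p - parsed[0]).total_seconds() for p in parsed]


def synced_timestamps(local_times, offset):
    """Shift a file's local time axis by its start-time offset (assumed sync behaviour)."""
    return [t + offset for t in local_times]


offsets = offsets_from_first(START_TIMES)
second_file_axis = synced_timestamps([0.0, 0.5], offsets[1])
```

The first file's 1ms group holds 53330 cycles, roughly 53 seconds of data, while the second file starts about 88 minutes later, so a synced plot would show a long empty stretch between them. That would be consistent with the gaps seen before sync was disabled.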

@danielhrisca
Owner

I'm glad this worked

@danielhrisca
Owner

Just a quick suggestion: install pyqtgraph (preferably from the github develop branch pip install https://github.com/pyqtgraph/pyqtgraph/archive/develop.zip) to get nicer interactive plotting

@drhannah94
Author

@danielhrisca Thank you so much for the help! I really appreciate the work you've done for this package!

And thanks for the suggestion. I'll definitely take a look at it. 👍
