bug: some MAVLink messages have CRC errors #1740

Open
1 task done
clydemcqueen opened this issue Jun 2, 2023 · 3 comments
Labels
bug Something isn't working triage Needs triage from developers ui User Interface feature

Comments

@clydemcqueen

Bug description

I have found that some MAVLink2 messages that are generated by BlueOS, sent via mavlink2rest, written to a tlog file by QGC, and then read by pymavlink have bad CRC values. I am not sure exactly where the bug lies, but I thought I'd share my findings so that others can help narrow it down.

Steps to reproduce

I am looking at ~20 tlog files from 3 different BlueROV2 systems:

  • R1, upgraded to Navigator + BlueOS, with a Ping sonar
  • R2, Pixhawk + Companion, with a Ping sonar and WL UGPS
  • R4, Navigator + BlueOS, with a Ping sonar and WL UGPS

I created a Python tool to help me examine BAD_DATA messages, and found a few patterns:

  • Just about every tlog file generated by a system running BlueOS has BAD_DATA messages, in contrast to the Companion system, where I haven't found any (but I don't have a ton of data to look at yet)
  • The most common BAD_DATA messages are 132 (DISTANCE_SENSOR) and 232 (GPS_INPUT).
  • I am also seeing a few 259, 260, 262 and 269 (camera messages).
  • A few appear to come from QGC (sysid==255) but the vast majority appear to come from BlueOS (sysid==1 and compid<>1).

Here is a typical run of the tool I'm using:

$ tlog_bad_data.py -r .
Processing 4 files
-------------------
Results for ./2023_05_04/tlog/2023-05-04 10-20-04.tlog
msg_id 259 count 1
msg_id 260 count 1
msg_id 262 count 1
msg_id 269 count 1
4 BAD_DATA messages, 4 of them were CRC errors
-------------------
Results for ./2023_05_04/tlog/2023-05-04 10-41-19.tlog
msg_id 21 count 4
msg_id 77 count 4
msg_id 132 count 1730
msg_id 232 count 15
msg_id 259 count 3
msg_id 260 count 3
msg_id 262 count 3
msg_id 269 count 3
1765 BAD_DATA messages, 1765 of them were CRC errors
-------------------
Results for ./2023_05_04/tlog/2023-05-04 11-18-48.tlog
msg_id 21 count 2
msg_id 76 count 2
msg_id 77 count 4
msg_id 132 count 1382
msg_id 232 count 83
msg_id 259 count 2
msg_id 260 count 2
msg_id 262 count 2
msg_id 269 count 2
1481 BAD_DATA messages, 1481 of them were CRC errors
-------------------
Results for ./2023_05_04/tlog/2023-05-04 19-02-21.tlog
msg_id 21 count 1
msg_id 77 count 1
msg_id 132 count 3268
msg_id 259 count 2
msg_id 260 count 2
msg_id 262 count 2
msg_id 269 count 2
3278 BAD_DATA messages, 3278 of them were CRC errors

FYI: I am running a patched version of pymavlink. Without this patch, pymavlink gets confused by the BAD_DATA messages and crashes.
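
For readers who want to produce a similar per-message tally, here is a minimal sketch of the counting step. It is not the tlog_bad_data.py tool itself. The function names are hypothetical, and it assumes you already have the raw bytes of each rejected frame (with pymavlink, these would presumably come from the `data` attribute of each BAD_DATA message). The header offsets follow the MAVLink 1 and MAVLink 2 packet layouts.

```python
from collections import Counter

MAVLINK2_MAGIC = 0xFD  # start-of-frame marker for MAVLink 2
MAVLINK1_MAGIC = 0xFE  # start-of-frame marker for MAVLink 1


def msg_id_from_frame(frame: bytes):
    """Return the message ID from a raw MAVLink frame, or None if the
    frame is too short or does not start with a known magic byte."""
    if len(frame) >= 10 and frame[0] == MAVLINK2_MAGIC:
        # MAVLink 2 header: magic, len, incompat, compat, seq, sysid,
        # compid, then a 3-byte little-endian message ID.
        return int.from_bytes(frame[7:10], "little")
    if len(frame) >= 6 and frame[0] == MAVLINK1_MAGIC:
        # MAVLink 1 header: magic, len, seq, sysid, compid, msgid (1 byte).
        return frame[5]
    return None


def tally_bad_frames(frames):
    """Count rejected frames per message ID (hypothetical helper)."""
    counts = Counter()
    for frame in frames:
        msg_id = msg_id_from_frame(frame)
        if msg_id is not None:
            counts[msg_id] += 1
    return counts


# Hand-built example headers (no payload or CRC), for illustration only.
example_frames = [
    bytes([0xFD, 22, 0, 0, 52, 1, 194, 132, 0, 0]),  # DISTANCE_SENSOR (132)
    bytes([0xFD, 22, 0, 0, 53, 1, 194, 132, 0, 0]),  # DISTANCE_SENSOR (132)
    bytes([0xFD, 10, 0, 0, 7, 1, 100, 0x03, 0x01, 0]),  # msg_id 259
]
for msg_id, count in sorted(tally_bad_frames(example_frames).items()):
    print(f"msg_id {msg_id} count {count}")
```

Frames that are too badly damaged to contain a full header are skipped rather than guessed at, so the totals from a sketch like this can be lower than the number of BAD_DATA messages reported.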

Primary pain point(s)

At the moment this just makes log analysis a challenge. (I haven't yet traced these messages through ArduSub to see if they are causing problems in ArduSub.)

Prerequisites

  • I have checked to make sure that a similar request has not already been filed or fixed.
@clydemcqueen clydemcqueen added bug Something isn't working triage Needs triage from developers ui User Interface feature labels Jun 2, 2023
@clydemcqueen
Author

I am tracing this down as time permits. I can reproduce the problem by calling mavlink2rest using curl. E.g., this will generate a "bad" message:

curl --verbose http://127.0.0.1:6040/mavlink -H "accept: application/json" --data \
'{
    "header": {
        "system_id": 1,
        "component_id": 194,
        "sequence": 52
    },
    "message": {
        "type": "DISTANCE_SENSOR",
        "time_boot_ms": 201509,
        "min_distance": 20,
        "max_distance": 5000,
        "current_distance": 2325,
        "mavtype": {"type": "MAV_DISTANCE_SENSOR_ULTRASOUND"},
        "id": 1,
        "orientation": {"type": "MAV_SENSOR_ROTATION_PITCH_270"},
        "covariance": 255,
        "horizontal_fov": 0.52,
        "vertical_fov": 0.52,
        "quaternion": [0, 0, 0, 0],
        "signal_quality": 0
    }
}'

pymavlink is happy to receive and unpack this message live, but it fails when reading the QGC-generated tlog file that contains this message. So I think it's a bug in QGC. If you don't mind, I'll keep this issue open as I continue isolating the problem.

@clydemcqueen
Author

OK, I think I have figured this out. This occurs when you send a MAVLink2 message whose payload ends in zero bytes. For DISTANCE_SENSOR messages, this happens when signal_quality (the last field in the payload) is 0.

Then:

  • mavlink2rest does not truncate the zeros as required. The CRC is correct, but only if the zeros are included.
  • pymavlink will happily parse the message with the trailing zeros and the corresponding CRC.
  • QGC will truncate the message without re-computing the CRC. The truncated message (with the old CRC) is written to the tlog file.
  • pymavlink will crash when reading the tlog file with the truncated messages.
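
The mechanism in the list above can be sketched in a few lines of Python. This is an illustration only, not code from mavlink2rest, QGC, or pymavlink. The helper names are hypothetical, the field values come from the curl example earlier in this thread, and the CRC_EXTRA value of 85 for DISTANCE_SENSOR and the enum values 1 (MAV_DISTANCE_SENSOR_ULTRASOUND) and 25 (MAV_SENSOR_ROTATION_PITCH_270) are assumptions taken from the common message set. The CRC routine is the X.25-style accumulate that MAVLink uses.

```python
import struct


def x25_crc(data: bytes, crc: int = 0xFFFF) -> int:
    """CRC-16/MCRF4XX, the checksum MAVLink uses."""
    for byte in data:
        tmp = (byte ^ crc) & 0xFF
        tmp = (tmp ^ (tmp << 4)) & 0xFF
        crc = ((crc >> 8) ^ (tmp << 8) ^ (tmp << 3) ^ (tmp >> 4)) & 0xFFFF
    return crc


def truncate_payload(payload: bytes) -> bytes:
    """Drop trailing zero bytes, as MAVLink 2 requires before sending.
    The first payload byte is always kept, even if it is zero."""
    end = len(payload)
    while end > 1 and payload[end - 1] == 0:
        end -= 1
    return payload[:end]


def frame_crc(payload, seq, sysid, compid, msgid, crc_extra):
    """CRC over the MAVLink 2 header (without the magic byte), the payload,
    and the per-message CRC_EXTRA byte. Flags are assumed to be zero."""
    header = bytes([len(payload), 0, 0, seq, sysid, compid])
    header += msgid.to_bytes(3, "little")
    return x25_crc(header + payload + bytes([crc_extra]))


DISTANCE_SENSOR_ID = 132
DISTANCE_SENSOR_CRC_EXTRA = 85  # assumption, from the common message set

# Wire order: base fields sorted by size, then extension fields in XML order.
full = struct.pack(
    "<IHHHBBBBff4fB",
    201509, 20, 5000, 2325,   # time_boot_ms, min, max, current distance
    1, 1, 25, 255,            # type, id, orientation, covariance
    0.52, 0.52,               # horizontal_fov, vertical_fov
    0.0, 0.0, 0.0, 0.0,       # quaternion
    0,                        # signal_quality
)
short = truncate_payload(full)

crc_full = frame_crc(full, 52, 1, 194, DISTANCE_SENSOR_ID, DISTANCE_SENSOR_CRC_EXTRA)
crc_short = frame_crc(short, 52, 1, 194, DISTANCE_SENSOR_ID, DISTANCE_SENSOR_CRC_EXTRA)

print(len(full), len(short))  # 39 bytes untruncated, 22 after truncation
print(hex(crc_full), hex(crc_short))
```

The two CRC values are computed over different lengths and different payload bytes, so a frame that carries the truncated payload together with the CRC of the untruncated one will generally fail validation. That is the kind of mismatch described above.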

@clydemcqueen
Author

The QGC bug is actually a pymavlink bug: ArduPilot/pymavlink#237
