UAVCAN 1 on master (2018-06-12) broken #9648

Closed
thomasgubler opened this issue Jun 12, 2018 · 13 comments

Comments

@thomasgubler
Contributor

thomasgubler commented Jun 12, 2018

I have different behaviours when using UAVCAN to control ESCs on CAN1 and CAN2 on a Pixhawk 4.

  • CAN1: everything works fine if we are not connected to QGC. When I launch QGC it requests the UAVCAN parameters and I get a hardfault in the send_uavcan function in mavlink_parameters.cpp. This happens because the _uavcan_open_request_list pointer is initially null. I tried returning early from the function until the pointer is non-null, and then it works fine (after several retries).

  • CAN2: the issue that I have on CAN1 doesn't occur because the uavcan module fails to get the parameters and therefore send_uavcan is never called.
    This warning comes up when I connect to QGC and CAN2 is used:

WARN  [uavcan] UAVCAN command bridge: starting global param list with node 128
WARN  [uavcan] UAVCAN command bridge: couldn't start parameter count: -2
WARN  [uavcan] UAVCAN command bridge: completed param list for node 128

On a Pixracer's CAN1 I don't see either of the two issues.
Next I will test on other NuttX boards.
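
The CAN1 workaround described above amounts to a null check before the request list is dereferenced. A minimal sketch, assuming a simple singly linked request list; `ParamRequest`, `ParamBridge`, `send_next`, and `usage_example` are hypothetical names for illustration and are not the actual code in mavlink_parameters.cpp:

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for one queued UAVCAN parameter request.
struct ParamRequest {
    int node_id;
    int param_index;
    ParamRequest *next;
};

// Hypothetical stand-in for the object that owns the open-request list.
struct ParamBridge {
    ParamRequest *open_request_list = nullptr;  // null until something is queued

    // Returns true if a request was sent, false if there was nothing to send.
    bool send_next()
    {
        if (open_request_list == nullptr) {
            return false;  // guard: nothing queued yet, so do not dereference
        }

        ParamRequest *req = open_request_list;
        open_request_list = req->next;  // pop the head of the list
        std::printf("sending request: node %d, param %d\n", req->node_id, req->param_index);
        return true;
    }
};

// Usage sketch: sending before anything is queued is a harmless no-op.
inline void usage_example()
{
    ParamBridge bridge;
    assert(!bridge.send_next());  // list still empty, returns false without faulting

    ParamRequest req{128, 0, nullptr};
    bridge.open_request_list = &req;
    assert(bridge.send_next());   // one request queued, it is sent
    assert(!bridge.send_next());  // list is empty again
}
```

With a guard like this the caller can simply retry later, which would match the "works after several retries" behaviour reported above.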

@thomasgubler thomasgubler added this to To do in v1.8 Release via automation Jun 12, 2018
@DanielePettenuzzo
Contributor

FYI @LorenzMeier @davids5 @thomasgubler

@davids5
Member

davids5 commented Jun 12, 2018

@DanielePettenuzzo as mentioned here #9534 (comment)

Also in case it is not mentioned anywhere, be advised that FW upgrade and dynamic node ID assignment ONLY will occur on the CAN1 bus. This is by design.

What this means is that the FW server is (because of historical RAM size limitations) a separate task under uavcan on the FMU that is started in the configuration phase and then stopped to conserve RAM. It is responsible for 1) dynamic node ID allocation (think DHCP on CAN), 2) FW upgrade, using the brickproof bootloader on a UAVCAN node to update the FW on the node, and 3) parameter setting on the nodes. This is where the hardfault was happening and why it is only related to CAN1.

@davids5 davids5 closed this as completed Jun 12, 2018
v1.8 Release automation moved this from To do to Done Jun 12, 2018
@DanielePettenuzzo
Contributor

The UAVCAN FW server is not running on CAN2, so the function that causes the hardfault is never called.

@thomasgubler
Contributor Author

CAN1: everything works fine if we are not connected to QGC. When I launch QGC it will request the UAVCAN parameters and I get a hardfault in the send_uavcan function in mavlink_parameters.cpp. This happens because initially the _uavcan_open_request_list pointer is null. I tried returning the function until the pointer is different from null and then it works fine (after several retries).

I think it is not fixed for CAN1; @DanielePettenuzzo still has a fix to push.

@thomasgubler thomasgubler changed the title from "UAVCAN 1 and 2 on master (2018-06-12) broken" to "UAVCAN 1 on master (2018-06-12) broken" Jun 12, 2018
@thomasgubler thomasgubler reopened this Jun 12, 2018
v1.8 Release automation moved this from Done to In progress Jun 12, 2018
@thomasgubler
Contributor Author

Rough ETA I got from Daniele is noon CEST 2018-06-13

@LorenzMeier
Member

@DanielePettenuzzo There is also the PR from David with the UAVCAN update which might address a couple of issues.

@dagar
Member

dagar commented Jun 12, 2018

See #9650

@davids5
Member

davids5 commented Jun 12, 2018

@DanielePettenuzzo - I think this was related to a race that happens because of the F7. It could have been speed, ordering, or having 2 CAN interfaces. The fix is #9652. I found that send_uavcan was called before enque_uavcan_request on the F7. This was not the case on an AUAV X2.1. I am not sure whether it is related to having 2 CAN interfaces, or we would have seen it on FMUv2 w/2MB.
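
The call order described above (a send attempted before any request has been enqueued) becomes harmless if the queue treats an empty list as a normal state. A rough sketch under that assumption; `QueuedRequest`, `RequestQueue`, and `send_order_demo` are hypothetical names, and the code is not taken from #9652:

```cpp
#include <cassert>

// Hypothetical request node; the field names are illustrative only.
struct QueuedRequest {
    int node_id;
    QueuedRequest *next;
};

// Minimal intrusive FIFO. The caller owns the nodes.
class RequestQueue
{
public:
    // Append a request to the end of the queue.
    void enqueue(QueuedRequest *req)
    {
        req->next = nullptr;

        if (_tail == nullptr) {
            _head = req;        // first element
        } else {
            _tail->next = req;  // link behind the current tail
        }

        _tail = req;
    }

    // Return the oldest request, or nullptr if nothing has been queued yet.
    QueuedRequest *dequeue()
    {
        if (_head == nullptr) {
            return nullptr;     // empty queue is a valid state, not an error
        }

        QueuedRequest *req = _head;
        _head = req->next;

        if (_head == nullptr) {
            _tail = nullptr;    // queue became empty
        }

        return req;
    }

private:
    QueuedRequest *_head = nullptr;
    QueuedRequest *_tail = nullptr;
};

// Exercises the problematic order: "send" (dequeue) first, "enqueue" second.
inline void send_order_demo()
{
    RequestQueue queue;
    assert(queue.dequeue() == nullptr);  // send before enqueue: nothing to do

    QueuedRequest first{128, nullptr};
    QueuedRequest second{129, nullptr};
    queue.enqueue(&first);
    queue.enqueue(&second);

    assert(queue.dequeue() == &first);   // FIFO order is preserved
    assert(queue.dequeue() == &second);
    assert(queue.dequeue() == nullptr);  // empty again
}
```

The design point is that the consumer never assumes the producer has already run, so the result does not depend on which of the two is scheduled first.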

@DanielePettenuzzo
Contributor

@davids5 thank you. Today I also tried with a Pixhawk 3 Pro (fmu-v4pro), which has 2 CAN interfaces, and I didn't have this issue. However, with the Pixhack v5 (also fmu-v5) I have the same issues as on the Pixhawk 4.

@davids5
Member

davids5 commented Jun 12, 2018

@DanielePettenuzzo For the record, my assumption above about a race was wrong! The bug has been in the code forever. It was not a race. Dereferencing null on the F4 did not raise an exception, but it does on the F7.

Please cross test the PR as well.

@DanielePettenuzzo
Contributor

@davids5 I will do one more test first thing tomorrow morning, although this is the same change I made today and it was working fine.

@davids5
Member

davids5 commented Jun 12, 2018

@DanielePettenuzzo - thank you! The value add will be testing on the new libuavcan that is now on master.

@DanielePettenuzzo
Contributor

@davids5 perfect. will do.

v1.8 Release automation moved this from In progress to Done Jun 12, 2018