UAVCAN hardfault on "uavcan param list" #14736

Closed
JacobCrabill opened this issue Apr 22, 2020 · 9 comments

@JacobCrabill
Member

Describe the bug
After a few calls to uavcan param <list/get/set>, I encounter a hardfault that leads to a reboot.

To Reproduce
Steps to reproduce the behavior:

  1. Black Cube (px4_fmu-v3_default) running latest PX4 master branch
  2. Power via USB
  3. Attach a Here2 GPS to CAN1 (the "real" CAN1)
  4. Connect to serial console via an FTDI adapter
  5. Call uavcan status to ensure UAVCAN is running. This produces a large number of node-spin and vehicle_air_data errors, which do not appear in v1.10. It doesn't matter whether you set UAVCAN_ENABLE to 0 and manually start it after boot, or set UAVCAN_ENABLE to 1.
  6. Run a command such as uavcan param list <nodeid> a few times
  7. The Cube hardfaults after anywhere from 2 to 5 calls from a fresh boot

Log Files and Screenshots
image

uavcan status error messages:
status errors

The hardfault after some get and set commands:
hardfault occurrence

Reproducing the hardfault with 3 calls to uavcan param list 3:
image

@dagar
Member

dagar commented Apr 23, 2020

This is also a bug.

Screenshot from 2020-04-23 09-24-08

@dagar
Member

dagar commented Apr 23, 2020

This is probably just the command line uavcan param helpers missing locking.
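
To illustrate what "missing locking" could look like in general terms, here is a minimal sketch, assuming a hypothetical SharedParamTable that is touched both by a background node thread and by console commands. None of these names come from PX4 or libuavcan, and this is not the change proposed for this issue; it only shows the pattern of taking a mutex around every access to shared state.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for state shared between a node thread and
// console commands. Not a PX4 or libuavcan type.
class SharedParamTable {
public:
    void set(const std::string &name, int value) {
        std::lock_guard<std::mutex> guard(_lock);  // released when guard leaves scope
        _params[name] = value;
    }

    bool get(const std::string &name, int &value) const {
        std::lock_guard<std::mutex> guard(_lock);
        auto it = _params.find(name);
        if (it == _params.end()) {
            return false;
        }
        value = it->second;
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> guard(_lock);
        return _params.size();
    }

private:
    mutable std::mutex _lock;            // guards _params
    std::map<std::string, int> _params;  // shared state
};

// Simulates several callers writing to the table at the same time.
// Every write goes through set(), so the map is never modified concurrently.
std::size_t run_concurrent_sets(SharedParamTable &table, int threads, int per_thread) {
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&table, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                table.set("p" + std::to_string(t) + "_" + std::to_string(i), i);
            }
        });
    }
    for (auto &w : workers) {
        w.join();
    }
    return table.size();
}
```

Without the lock_guard in set(), concurrent writers could corrupt the map's internal nodes, which is the kind of memory corruption that can surface as a fault far from the actual race.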

@dagar
Member

dagar commented Apr 23, 2020

Working on a fix in #14741.

@JacobCrabill
Member Author

Forgot to update here - the possible fix #14741 does not seem to work. Still unclear what the root cause might be. The suggested solution makes sense (race condition on resource access), but the location of the error has not yet been found.

@dagar
Member

dagar commented Apr 30, 2020

Thanks for the update, I'll try to reproduce locally with a debugger attached.

@JacobCrabill
Member Author

Cool, that would make things much easier for sure. It doesn't always throw a hardfault though; whatever memory corruption is occurring seems to happen in a few possible locations.

dagar moved this from To Do to Blocked in Release 1.11 Blockers on May 6, 2020
@stale

stale bot commented Jul 29, 2020

This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.

stale bot added the stale label on Jul 29, 2020
@amikhalev
Contributor

I experienced this too. I think this may have been a stack overflow, potentially fixed by #15864.
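
For readers unfamiliar with how a stack overflow is usually diagnosed on small targets, here is a minimal sketch of the common "stack painting" idea: fill the stack with a known pattern, let the task run, then count how many words still hold the pattern. It is a host-side simulation on a plain vector; the names kPaint, paint_stack, simulate_usage, and words_unused are hypothetical and do not refer to any PX4 or NuttX API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Arbitrary fill pattern chosen for this sketch.
constexpr std::uint32_t kPaint = 0xDEADBEEF;

// Fill the whole simulated stack with the pattern before the task runs.
void paint_stack(std::vector<std::uint32_t> &stack) {
    for (auto &word : stack) {
        word = kPaint;
    }
}

// Simulate a task that uses `words` words of a downward-growing stack:
// usage starts at the highest index and moves toward index 0.
void simulate_usage(std::vector<std::uint32_t> &stack, std::size_t words) {
    for (std::size_t i = 0; i < words && i < stack.size(); ++i) {
        stack[stack.size() - 1 - i] = 0;
    }
}

// Count words at the low end that still hold the pattern.
// A result of 0 means the task has used (or overrun) its entire stack.
std::size_t words_unused(const std::vector<std::uint32_t> &stack) {
    std::size_t n = 0;
    while (n < stack.size() && stack[n] == kPaint) {
        ++n;
    }
    return n;
}
```

A remaining margin of zero words suggests the task needs a larger stack, and an overrun into neighbouring memory would explain faults that appear in varying locations.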

stale bot removed the stale label on Oct 16, 2020
@dagar
Member

dagar commented Oct 17, 2020

Should be fixed by master, please re-open if that's not the case.

dagar closed this as completed on Oct 17, 2020
Release 1.11 Blockers automation moved this from Blocked to Done on Oct 17, 2020