FW hardfault on beta #4530

Closed
kd0aij opened this issue May 13, 2016 · 46 comments

@kd0aij
Contributor

kd0aij commented May 13, 2016

Observed on the bench while setting up for a test flight.

https://drive.google.com/open?id=0Bw3digSMQXDuQjR1MHJTWDNnbGM

@dagar
Member

dagar commented May 13, 2016

Is it reproducible? This was probably indoors with no gps?

I've seen an ekf_att_pos_estimator hard fault intermittently with fixedwing HIL when I connect QGC and the ekf finally gets a position. I haven't looked into it, but it might be effectively the same conditions.

What's at 0x0808be22 in your build?

@dagar dagar added the bug label May 13, 2016
@kd0aij
Contributor Author

kd0aij commented May 13, 2016

Occurred at least twice yesterday under a metal-roofed shelter (fairly good GPS signal).
Between that and the mixer issues, it's unflyable. This setup was last tested before winter; has anything changed that would invalidate the custom mixer?

(gdb) info line *0x0808be22
Line 67 of "../src/lib/ecl/attitude_fw/ecl_pitch_controller.cpp"
starts at address 0x808be1e <ECL_PitchController::control_attitude(ECL_ControlData const&)+30>
and ends at 0x808be26 <ECL_PitchController::control_attitude(ECL_ControlData const&)+38>.
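As a rough illustration of what `info line *ADDR` does with a fault address, the sketch below maps an address to its enclosing function by binary search over a sorted symbol table and reports the `+offset` that gdb prints. The table is hypothetical: its two start addresses are back-computed from the gdb output in this thread (0x808be1e minus 30, and 0x8086a36 minus 2), whereas a real table would come from the ELF symbols. It ignores function sizes, so an address past the end of the last function would still be attributed to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per function: start address and name, sorted by address.
 * Hypothetical entries, back-computed from the gdb output in this thread. */
struct sym {
    uint32_t start;
    const char *name;
};

static const struct sym symtab[] = {
    { 0x08086a34u, "uORB::DeviceNode::publish" },
    { 0x0808be00u, "ECL_PitchController::control_attitude" },
};

/* Binary search for the last symbol whose start address is <= addr.
 * Returns NULL if addr lies below the first symbol; otherwise *offset
 * receives addr - start, matching gdb's "<function+offset>" notation. */
static const struct sym *lookup(uint32_t addr, uint32_t *offset)
{
    size_t lo = 0, hi = sizeof symtab / sizeof symtab[0];
    assert(offset != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (symtab[mid].start <= addr)
            lo = mid + 1;   /* candidate; look for a later one */
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;
    *offset = addr - symtab[lo - 1].start;
    return &symtab[lo - 1];
}
```

With this table, 0x0808be22 resolves to `ECL_PitchController::control_attitude` at offset 34 (0x22).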

@kd0aij
Contributor Author

kd0aij commented May 13, 2016

The crash is consistently reproducible on the bench by booting, then faking GPS at the console:
gps stop
gps start -f

It crashes immediately after starting gps.

@kd0aij
Contributor Author

kd0aij commented May 14, 2016

To answer my own question, there was a bugfix here on 12 March: f7ac1f0,
which also needed to be applied to my custom mixer.

@dagar What would be the best way to warn other users who might be using custom mixers?

@kd0aij kd0aij self-assigned this May 14, 2016
@dagar
Member

dagar commented May 14, 2016

We're so tight on memory that a few extra lines in your mixer are this much of a problem? I've never looked at the mixer code, but at the very least we could add a conservative length check.
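A minimal sketch of the conservative length check suggested here, assuming the mixer text is copied into a fixed-size buffer before parsing. The loader's real buffer size is not given in this thread, so `MIXER_TEXT_MAX` and `load_mixer_text` are hypothetical names and values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit: the real mixer buffer size is not stated here. */
#define MIXER_TEXT_MAX 2048u

/* Copy mixer text into a fixed-size buffer, refusing (rather than
 * overrunning the buffer) when the text plus its terminating NUL does
 * not fit. Returns 0 on success, -1 if the text is too long. */
static int load_mixer_text(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    size_t len = strlen(src);
    if (len + 1 > dst_size) {
        dst[0] = '\0';  /* leave the destination empty and well-defined */
        return -1;
    }
    memcpy(dst, src, len + 1);
    return 0;
}
```

Returning an error instead of truncating is a deliberate choice in this sketch: a silently truncated mixer could leave some outputs unconfigured.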

@kd0aij
Contributor Author

kd0aij commented May 14, 2016

The description of that commit says "unused lines... clobbering RAM". It seems like that should crash the IO processor, though, not the FMU.

@kd0aij
Contributor Author

kd0aij commented May 14, 2016

@AndreasAntener I guess I shouldn't have called f7ac1f0 a "bugfix", since it changes the input files, not the parser (or whatever was blowing RAM). Is it also necessary to "strip" the comment lines from a custom mixer to avoid corrupting RAM? I'd be willing to spend some time on this if it will provide better protection from crashes due to bad mixer files, especially since custom mixer files are the only mechanism for adjusting servo travel and endpoints.

@kd0aij
Contributor Author

kd0aij commented May 14, 2016

@dagar Apparently, the hardfault is not related to the mixer issues. I'm still seeing it when GPS is either lost or re-acquired (can't tell which):

Processes: 20 total, 2 running, 18 sleeping
CPU usage: 52.33% tasks, 0.62% sched, 47.05% idle
Uptime: 1176.962s total, 587.252s idle

 PID COMMAND                   CPU(ms) CPU(%)  USED/STACK PRIO(BASE) STATE 
   0 Idle Task                  587251 47.049     0/    0   0 (  0)  READY 
   1 hpwork                      44003  3.649   836/ 1592 192 (192)  w:sem 
   2 lpwork                       5960  0.543   572/ 1592  50 ( 50)  READY 
   3 init                         1713  0.000  1236/ 2496 100 (100)  w:sem 
 219 top                           143  2.562  1252/ 1696 100 (100)  RUN   
  87 gps                          3014  0.232   836/ 1192 220 (220)  w:sem 
  89 dataman                        22  0.000   652/ 1192  90 ( 90)  w:sem 
 113 sensors                     33574  2.872  1732/ 1992 250 (250)  w:sem 
 115 commander                   21952  1.785  3100/ 3592 140 (140)  w:sig 
 116 commander_low_prio            547  0.000   772/ 2992  50 ( 50)  w:sem 
 122 px4io                       39554  3.260  1036/ 1392 240 (240)  w:sem 
 132 mavlink_if0                 10728  0.854  2332/ 2792 100 (100)  w:sig 
 133 mavlink_rcv_if0               328  0.000  1188/ 2096 175 (175)  w:sem 
 166 mavlink_if1                 34865  2.950  2316/ 2792 100 (100)  w:sig 
 167 mavlink_rcv_if1                90  0.000   828/ 2096 175 (175)  w:sem 
 171 sdlog2                      13022  1.086  2156/ 3392 177 (177)  w:sem 
 201 ekf_att_pos_estimator      317154 27.251  3556/ 4592 235 (235)  w:sem 
 204 fw_att_control              29259  2.562   980/ 1296 250 (250)  w:sem 
 206 fw_pos_control_l1            5530  0.465   812/ 1392 250 (250)  w:sem 
 210 navigator                   27338  2.251   796/ 1496 105 (105)  w:sem 
nsh> WARN  navigator timed out
Assertion failed at file:armv7-m/up_hardfault.c line: 184 task: ekf_att_pos_estimator
sp:     200036a0
IRQ stack:
  base: 200036ec
  size: 000002e8
200036a0: 080d7f54 000000b8 0809b349 00000010 200033f0 00000003 00000000 0809fd29
200036c0: 0809fd15 080ab34f e000ed24 080a8f1d 00000000 2001b638 409a9034 2001adb0
200036e0: 00000001 0809fc99 2001cd54 00000000 00000000 10000010 00000001 00000000
sp:     2001ce28
User stack:
  base: 2001cfa0
  size: 000011f4
2001ce20: 60000010 2001b638 2001b638 409a9034 2001adb0 0808851f 00000000 2001ce8c
2001ce40: c05a4644 2001b518 666b655b 6572205d 4c203a66 39332041 3035372e 4f4c2c32
2001ce60: 30312d20 39302e35 412c3937 3120544c 2e303037 40003530 64060cff 00000000
2001ce80: 40000000 0805afa1 080cea24 89000000 bc01a36e 4043e005 20c140a7 c05a4644
2001cea0: 40000000 409a9034 c05a4644 20c140a7 c05a4644 bc01a36e 4043e005 256a047c
2001cec0: 3fe63360 2001adb0 2001cfb0 2001adb0 2001b5a8 0805b523 00000001 2001b5b0
2001cee0: 0000260b bd4ccccc 3ca3d70a 3ca3d70a 2001cf0f 00000000 3c23d70a 00000000
2001cf00: 00000000 00000000 00000000 00000000 00000000 000f423f 00000000 0805da79
2001cf20: 2001b130 2001b1c8 2001ae38 ffffffff 000182b8 00000000 00000000 00000000
2001cf40: 00000000 00000000 00000009 00000000 08090001 2001b85c 0000000c 00000000
2001cf60: 00000101 2001b88c 00000000 00000000 00000000 00000000 00000000 00000000
2001cf80: 00000000 00000000 00000000 00000000 00000000 0809c829 00000000 00000000
R0: 080d3c70 18826b30 2001ce40 2001ce40 2001b638 409a9034 2001adb0 00000001
R8: 20000134 00000000 20017ee0 2001b1c8 00000007 2001ce28 0808851f 0808c392
xPSR: 61000000 BASEPRI: 00000000 CONTROL: 00000000
EXC_RETURN: ffffffe9
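The PC and LR in a dump like this are what get resolved with `info line` in gdb. Judging from the dumps in this thread, the `R8:` line lists R8 through R12 followed by SP, LR and PC: its SP matches the `sp:` printed for the user stack, and its last word is the address looked up in gdb further down. The sketch below pulls those three values out of that line; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Registers from the "R8:" line of a NuttX ARMv7-M hardfault dump.
 * Layout assumed from the dumps in this thread: R8 R9 R10 R11 R12 SP LR PC. */
struct fault_regs {
    uint32_t r[8];   /* r[0]=R8 ... r[4]=R12, r[5]=SP, r[6]=LR, r[7]=PC */
};

/* Parse one "R8: ..." line. Returns 0 on success, -1 if the line is not
 * the "R8:" prefix followed by eight hex words. */
static int parse_r8_line(const char *line, struct fault_regs *out)
{
    unsigned int v[8];
    assert(line != NULL && out != NULL);
    int n = sscanf(line, " R8: %x %x %x %x %x %x %x %x",
                   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7]);
    if (n != 8)
        return -1;
    for (int i = 0; i < 8; i++)
        out->r[i] = (uint32_t)v[i];
    return 0;
}

static uint32_t fault_sp(const struct fault_regs *f) { return f->r[5]; }
static uint32_t fault_lr(const struct fault_regs *f) { return f->r[6]; }
static uint32_t fault_pc(const struct fault_regs *f) { return f->r[7]; }
```

The PC can go straight into `info line *0x...`. Bit 0 of LR is the Thumb-state bit, so clear it (0x0808851f becomes 0x0808851e) before looking up the return address.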

@dagar
Member

dagar commented May 14, 2016

For the mixers in ROMFS/px4fmu_common/mixers they're stripped when copied
to the build directory.

To sidetrack this bug further, it would be great to have a tool to help
adjust the mixer. I had a plane with a servo for each stab and I gave up
trying to use the mixer to center them and just used a servo programmer.
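A rough sketch of that stripping step in C, for stripping a custom mixer on the target or in a small tool. It assumes the mixer parser only acts on lines that begin with an uppercase letter immediately followed by `:` and ignores everything else; that rule is an assumption here, not something stated in this thread. The function names are hypothetical, and the copy refuses to write past the destination buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if the line (len bytes, not NUL-terminated) looks like a mixer
 * directive: an uppercase letter immediately followed by ':' (assumed rule). */
static int is_directive(const char *line, size_t len)
{
    return len >= 2 && isupper((unsigned char)line[0]) && line[1] == ':';
}

/* Copy only directive lines from src into dst, each ending in '\n'.
 * Returns the stripped length, or -1 (with dst set to "") if dst,
 * dst_size bytes including the NUL terminator, is too small. */
static long strip_mixer_comments(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;
    assert(src != NULL && dst != NULL && dst_size > 0);
    while (*src != '\0') {
        const char *nl = strchr(src, '\n');
        size_t len = nl ? (size_t)(nl - src) : strlen(src);
        if (is_directive(src, len)) {
            if (out + len + 2 > dst_size) {   /* line + '\n' + NUL */
                dst[0] = '\0';
                return -1;
            }
            memcpy(dst + out, src, len);
            out += len;
            dst[out++] = '\n';
        }
        src += len + (nl ? 1 : 0);   /* skip the line and its newline */
    }
    dst[out] = '\0';
    return (long)out;
}
```

Stripping shrinks the text the parser has to hold, but it is no substitute for a bounds check in the loader itself.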


@dagar
Member

dagar commented May 14, 2016

I'm fairly certain this is the same bug I get with fixed wing HIL fairly
often. Including the same "navigator timed out" due to global_pos.
Something to do with the timing of when ekf_att_pos_estimator finally gets
GPS. The next time it happens in my setup I'll dig into it.


@kd0aij
Contributor Author

kd0aij commented May 14, 2016

Do you see it as a significant risk? Or do you still have manual control available with the FMU crashed?

@kd0aij
Contributor Author

kd0aij commented May 14, 2016

Just verified that the IO coprocessor is still working after the hardfault, and manual RC control is OK. Except my gimbal doesn't work after removing those lines from the main mixer. Will try putting them back now.

@kd0aij
Contributor Author

kd0aij commented May 14, 2016

EKF crash fixed by 237bdfd.

Closing, but note that there are unanswered questions about custom mixers.

@kd0aij
Contributor Author

kd0aij commented May 15, 2016

Reopening: observed another hardfault on the bench while trying to get RC failsafe configured for OBC rules.
This was at commit 237bdfd, which fixed the previous crash:

nsh> INFO  data link #1 lost
INFO  home: 39.7501808, -105.0979006, 1712.44
WARN  navigator timed out
WARN  navigator timed out
WAAssertion failed at file:armv7-m/up_hardfault.c line: 184 task: ekf_att_pos_estimator
sp:     20003f20
IRQ stack:
  base: 20003f6c
  size: 000002e8
20003f20: 080d0f74 000000b8 08095f3d 00000010 20003c78 00000003 00000000 0809a951
20003f40: 0809a93d 080a495b e000ed24 080a26d5 00000000 2001dc18 080c8728 000182b8
20003f60: 20017dd0 0809a8c1 2001ef94 00000000 00000000 10000010 00000001 00000000
sp:     2001f068
User stack:
  base: 2001f3e0
  size: 000011f4
2001f060: 80000010 2001dc18 2001dc18 080c8728 000182b8 08083187 2001e088 2001f0cc
2001f080: 2001f0c4 0809fecf 666b655b 65686320 205d6b63 65637865 76697373 79672065
2001f0a0: 6f206f72 65736666 00007374 08098167 080c88ac 2001f0c4 080290ac 0805b4d3
2001f0c0: 080c90ac 0805b203 080c88bc 080c90ac 00000000 00000000 00000000 080c9054
2001f0e0: 080c9070 080c9090 080c90ac 080c90c4 080c90ec 080c9104 37ee0bc6 00000000
2001f100: bf7d737d b4c17710 bdec1201 00000000 20020a00 2001f808 20020480 00000001
2001f120: 200205e0 2001f804 080c8700 424ec574 2001f2f8 20017dd0 2001f318 20017dec
2001f140: 2001d4ac 0808c5e7 080c8700 00000000 3f800000 00000000 080c8710 ba3ccbfc
2001f160: b8c90cd5 00000000 01010101 00000000 01010001 001fc931 001fc931 001ff8ff
2001f180: 3f7c4a65 3d494eec 3cecd6fd 3e23940c 3fd9dec6 c0cfe829 3c5be11b 4098a089
2001f1a0: c24da7c0 c4d12daa 35362ef3 b6dd2095 35455d86 391a7eca 00000000 00000000
2001f1c0: 3ea7f229 3d3ded25 3ec1a4ec 00000000 00000000 00000000 00000000 00000000
2001f1e0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
2001f200: 00000016 00000000 00000000 00000001 00000000 00000000 0214ee80 00000007
2001f220: 00000080 0130df00 00000000 080a7e9b 080ab036 21000200 3a8aae6d c05a464d
2001f240: 33930ac2 3eda22a7 3d4cb265 bd70d60c 3d89b6b1 bead5792 3da5a83e 3f6ffd88
2001f260: 3e0b5617 3b44a123 b92af034 3f7d70a4 48fe7f60 48fe7f40 20000010 2001d3c0
2001f280: 60000000 3fd08e0c 7d000000 2001d3c0 2001f324 00000001 60000000 080ad6f1
2001f2a0: 080a7759 2001d3c0 20012710 2001c4d0 0000260b 2001dc17 2001c4d0 0000260b
2001f2c0: 2001d848 08086dbf 08086d85 080cdb38 0000260b 080ae8a1 080ae88d 00000003
2001f2e0: 2001dc17 0809b48d 2001d3c0 2001db78 2001db38 0805d05f 2001dc08 2001c610
2001f300: 00000000 00000000 00000000 0000260b 7ce5f12c 00000000 2001dbe0 2001db90
2001f320: 0000260b 3e9f1700 bc64ae9b 00000000 3c23d70a 00000000 00000000 000f423f
2001f340: 00000000 000182b8 20017dd0 2001d3c0 2001db88 00000000 2001d7c0 0805d9f5
2001f360: 2001d730 2001d7c0 2001d448 ffffffff 000182b8 00000000 00000000 00000000
2001f380: 00000000 00000000 00000009 00000000 08090001 2001de3c 0000000c 00000000
2001f3a0: 00000101 2001de6c 00000000 00000000 00000000 00000000 00000000 00000000
2001f3c0: 00000000 00000000 00000000 00000000 00000000 08097459 00000000 00000000
R0: 20000cf4 b2c4d793 2001f080 2001f080 2001dc18 080c8728 000182b8 20017dd0
R8: 2001d3c0 2001db88 2001d3c0 00000004 2001f0a0 2001f068 08083187 08086a36
xPSR: 61000000 BASEPRI: 00000000 CONTROL: 00000000
EXC_RETURN: ffffffe9
sercon: Registering CDC/ACM serial driver
sercon: Successfully registered the CDC/ACM serial driver
param: selected parameter default file /fs/mtd_params
rgbled on I2C bus 1 at 0x55 (bus: 100 KHz, max: 100 KHz)
 1:   SYS_USE_IO: match
 2101: + SYS_AUTOSTART: match
[i] Custom: /fs/microsd/etc/config.txt 
px4io: CRCs match
dataman: Power on restart, data manager file '/fs/microsd/dataman' size is 103090 bytes
MS5611_SPI on SPI bus 1 at 3 (20000 KHz)
bst: no devices found
adc init done
ver hwcmp match: PX4FMU_V2
HMC5883_I2C on I2C bus 1 at 0x1e (bus: 100 KHz, max: 400 KHz)
lis3mdl: no device on bus 2
LIS3MDL bad ID: 3chmc5883: no device on bus 1
mpu6000: no device on this bus
MPU6000 on SPI bus 1 at 4 (1000 KHz)
L3GD20 on SPI bus 1 at 1 (11000 KHz)
LSM303D on SPI bus 1 at 2 (11000 KHz)
Airspeed on I2C bus 1 at 0x28 (bus: 100 KHz, max: 100 KHz)
nsh: sf10a: command not found
px4io default PWM output device
mavlink_if0: mode: 0, data rate: 1200 B/s on /dev/ttyS1 @ 57600B
mavlink_if0: offboard mission init: ERROR
 157600:   SYS_COMPANION: match
mavlink_if1: mode: 3, data rate: 1000 B/s on /dev/ttyS2 @ 57600B
ver hwcmp match: PX4FMU_V2
px4flow [166:100]
px4flow: scanning I2C buses for device..
mavlink_if2: mode: 5, data rate: 800000 B/s on /dev/ttyACM0 @ 57600B
FIXED WING 
[i] Mixer: /fs/microsd/etc/mixers/fpv_skywalker_stripped.mix on /dev/pwm_output0 
pwm: reading disarmed value of zero, disabling disarmed PWM
[i] Mixer: /etc/mixers/pass.aux.mix on /dev/pwm_output1 

param: Error: Parameter EKF2_REC_RPL not found.
 0:   SYS_LOGGER: match
WARN  log buffer size: 12288 bytes
[i] Addons script: /fs/microsd/etc/extras.txt 
device: /dev/pwm_output0
channel 1: 1000 us (default rate: 50 Hz failsafe: 1000, disarmed: 0 us, min: 1000 us, max: 2000 us)
channel 2: 1500 us (default rate: 50 Hz failsafe: 1500, disarmed: 0 us, min: 1000 us, max: 2000 us)
channel 3: 1500 us (default rate: 50 Hz failsafe: 1500, disarmed: 0 us, min: 1000 us, max: 2000 us)
channel 4: 1500 us (default rate: 50 Hz failsafe: 1500, disarmed: 0 us, min: 1000 us, max: 2000 us)
channel 5: 1500 us (default rate: 50 Hz failsafe: 1500, disarmed: 0 us, min: 1000 us, max: 2000 us)
channel 6: 1500 us (default rate: 50 Hz failsafe: 1500, disarmed: 0 us, min: 1000 us, max: 2000 us)
channel 7: 0 us (default rate: 50 Hz failsafe: 0, disarmed: 0 us, min: 1000 us, max: 2000 us)
channel 8: 0 us (default rate: 50 Hz failsafe: 0, disarmed: 0 us, min: 1000 us, max: 2000 us)
channel group 0: channels 1 2
channel group 1: channels 5 6 7 8
channel group 2: channels 3 4

NuttShell (NSH)
nsh> mavlink_if1: Disabling hardware flow control
ver git
FW git-hash: 237bdfdb61ea335df7c2baca0588fd7fa3e26c6d
FW version: v1.3.0rc3-261-g237bdfd (130)
OS version: 6.27 (627)
nsh> WARN  Not ready to fly: Sensors not set up correctly
INFO  home: 39.7501782, -105.0978645, 1712.38
gps stop
WARN  GPS module lost
WARN  exiting
nsh> gps start -f
nsh> 


(gdb) info line *0x08086a36
Line 311 of "../src/modules/uORB/uORBDevices_nuttx.cpp"
   starts at address 0x8086a36 <uORB::DeviceNode::publish(orb_metadata const*, void*, void const*)+2>
   and ends at 0x8086a3a <uORB::DeviceNode::publish(orb_metadata const*, void*, void const*)+6>.

@kd0aij
Contributor Author

kd0aij commented May 16, 2016

WARN  navigator timed out
WAAssertion failed at file:armv7-m/up_hardfault.c line: 184 task: ekf_att_pos_estimator
sp:     20003f20
IRQ stack:
  base: 20003f6c
  size: 000002e8
20003f20: 080d0f74 000000b8 08095f3d 00000010 20003c78 00000003 00000000 0809a951
20003f40: 0809a93d 080a495b e000ed24 080a26d5 00000000 2001d5e8 080c8728 000182b8
20003f60: 2001d910 0809a8c1 2001ef54 00000000 00000000 10000010 00000001 00000000
sp:     2001f028
User stack:
  base: 2001f3a0
  size: 000011f4
2001f020: 80000010 2001d5e8 2001d5e8 080c8728 000182b8 08083187 2001de38 2001f08c
2001f040: 2001f084 0809fecf 666b655b 65686320 205d6b63 65637865 76697373 79672065
2001f060: 6f206f72 65736666 00007374 08098167 080c88ac 2001f084 080290ac 0805b4d3
2001f080: 080c90ac 0805b203 080c88bc 080c90ac 00000000 00000000 00000000 080c9054
2001f0a0: 080c9070 080c9090 080c90ac 080c90c4 080c90ec 080c9104 b8626fdf 00000000
2001f0c0: bf785e28 37112113 be35f194 00000000 200209f4 2001f87c 20020524 00000001
2001f0e0: 200205d4 2001f874 080c8700 4260b2c6 2001f2b8 2001d910 2001f2d8 2001d92c
2001f100: 2001ce7c 0808c5e7 080c8700 00000000 3f800000 00000000 080c8710 3847a31c
2001f120: 3a568491 00000000 01010101 00000000 01010001 009a7abb 009a7abb 009a7abb
2001f140: 3f17638a 3e031457 bcbfd63e 3f4bbbf5 befd5634 4010e52e c0775e1f 402a8c42
2001f160: bd635604 c4d36d8d b514e1d3 b6935818 b7446881 3be536a7 00000000 00000000
2001f180: 3e2d1cd3 3cc3c4ce 3f059ed8 00000000 00000000 00000000 00000000 00000000
2001f1a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
2001f1c0: 00000016 00000000 00000000 00000001 00000002 2001b9a0 2001d1e0 ffffffe9
2001f1e0: 3f0769e1 ba15e0db ba15e0d8 00000000 00000000 00000000 00000000 00000000
2001f200: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
2001f220: 4a19ee4c 00267b93 00000000 4a19ee4c 00000000 08089807 08089826 91000000
2001f240: 80000000 3fd324fd 7d000000 2001cd90 2001f2e4 00000001 80000000 080ad6f1
2001f260: 080a7759 2001cd90 20012780 20022780 0000260b 2001d5e7 20022780 0000260b
2001f280: 2001d218 08086dbf 08086d85 080cdb38 0000260b 080ae8a1 080ae88d 00000003
2001f2a0: 2001d5e7 0809b48d 2001cd90 2001d548 2001d508 0805d05f 2001d5d8 200228c0
2001f2c0: 00000000 00000000 00000000 0000260b 5b6f7bb3 00000002 2001d5b0 2001d560
2001f2e0: 0000260b 3da8d500 bdc129bb 00000000 3c23d70a 00000000 00000000 000f423f
2001f300: 00000000 000182b8 2001d910 2001cd90 2001d558 00000000 2001d190 0805d9f5
2001f320: 2001d100 2001d190 2001ce18 ffffffff 000182b8 00000000 19186b38 00000000
2001f340: 00000000 3e7edfc7 00000009 00000000 08090001 2001dbec 0000000c 00000000
2001f360: 00000101 2001dc1c 00000000 00000000 00000000 00000000 00000000 00000000
2001f380: 00000000 00000000 00000000 00000000 00000000 08097459 00000000 00000000
R0: 20000cf4 584490fc 2001f040 2001f040 2001d5e8 080c8728 000182b8 2001d910
R8: 2001cd90 2001d558 2001cd90 00000004 2001f060 2001f028 08083187 08086a36
xPSR: 61000000 BASEPRI: 00000000 CONTROL: 00000000
EXC_RETURN: ffffffe9


@kd0aij
Contributor Author

kd0aij commented May 16, 2016

@LorenzMeier I guess the two (identical?) hardfaults above are also due to RAM corruption in EKF, but I don't see a good way to track it down. Looks like a subscription handle was trashed. Is there a way to monitor stack limits?
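One common way to monitor stack limits is stack painting: fill the stack with a known pattern when the task is created, then later scan for the deepest word that was overwritten. The sketch below simulates that on an ordinary array. The pattern value and function names are assumptions, and the `USED/STACK` column in `top` above is presumably derived from a similar scan, though this thread does not confirm it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill pattern for unused stack words (assumed value). */
#define STACK_PAINT 0xdeadbeefu

/* Paint every word of a not-yet-used stack region. */
static void stack_paint(uint32_t *stack, size_t words)
{
    assert(stack != NULL);
    for (size_t i = 0; i < words; i++)
        stack[i] = STACK_PAINT;
}

/* On ARM Cortex-M the stack grows downward, from the highest address
 * toward stack[0]. Scanning up from the bottom, the first word that no
 * longer holds the pattern marks the deepest point ever reached, so the
 * high-water mark in bytes is everything from there to the top. */
static size_t stack_high_water_bytes(const uint32_t *stack, size_t words)
{
    assert(stack != NULL);
    size_t i = 0;
    while (i < words && stack[i] == STACK_PAINT)
        i++;
    return (words - i) * sizeof(uint32_t);
}
```

A reading equal to the full stack size is the warning sign: the task reached, or went past, the bottom of its stack. Painting only yields a high-water mark after the fact (and can read slightly low if real data happens to equal the pattern); it cannot trap the overflow at the moment it happens.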

@dagar
Member

dagar commented May 16, 2016

@kd0aij
Contributor Author

kd0aij commented May 16, 2016

Thanks, I was just trying to build with that; is that the only file that needs to change?

@dagar
Member

dagar commented May 16, 2016

If you're using a Pixhawk you'll likely exceed the size limit and need to disable several modules. mpu9250 and uavcan are good candidates.
I would also throw away the entire build directory.

@kd0aij
Contributor Author

kd0aij commented May 16, 2016

I'm getting a hardfault early in the boot process:

sercon: Registering CDC/ACM serial driver
sercon: Successfully registered the CDC/ACM serial driver
Assertion failed at file:irq_unexpectedisr.c line: 86 task: mtd
sp:     20003150
IRQ stack:
  base: 200031bc
  size: 000002e8
20003140: 20003150 100071a8 000003fc 0807c2cd 20003150 100071a8 200031bc 0807c2e1
20003160: 00000056 080b26b3 1000700c 0807c511 08087b79 0807c3b1 00000010 0807581d
20003180: 08075801 0808aa89 20002ec8 00000002 1000700c 08087ba5 00000000 00000002
200031a0: 08027987 20001824 10007100 00000002 00000000 0807c511 1000700c 00000000
sp:     100070e0
User stack:
  base: 100071a8
  size: 000003fc
100070e0: 00000002 10004610 20001824 08027987 ffffffff ffffffff ffffffff ffffffff
10007100: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
10007120: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
10007140: 00000000 00000002 10004610 1000412c 00000000 00000000 00000000 10006df0
10007160: 00000000 08027d75 00000000 00000000 00000000 00000000 00000000 00000000
10007180: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 080776c5
100071a0: 00000000 00000000 ffffffff ffffffff ffffffff ffffffff 00008e40 00000410
R0: 0807bea9 00000004 80000000 e000ed04 00000002 08027987 20001824 10007100
R8: 00000002 00000000 200000fc 20001821 00000000 100070e0 0807beb7 0807c99c
xPSR: 81000000 BASEPRI: 00000000 CONTROL: 00000000
EXC_RETURN: ffffffe9

I probably don't have the right compiler switches set. How do I look at the actual compiler commands generated by CMake?

@dagar
Member

dagar commented May 16, 2016

From the actual build directory

cd build_px4fmu-v2_default

then

ninja -v -j1

or

make -j1 VERBOSE=1

It's entirely possible this is going to find lots of little potential issues that haven't bitten us yet. If that's the case once we get a hardware test harness back in the builds (#4532) we should have a variation with STACKCHECK enabled.

@kd0aij
Contributor Author

kd0aij commented May 16, 2016

It looks like these switches are missing (from nsh/Make.defs):

# enable precise stack overflow tracking
ifeq ($(CONFIG_ARMV7M_STACKCHECK),y)
INSTRUMENTATIONDEFINES   = -finstrument-functions -ffixed-r10
endif
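For context on what those switches do: `-finstrument-functions` makes GCC call `__cyg_profile_func_enter` and `__cyg_profile_func_exit` around every instrumented function, and `-ffixed-r10` stops the compiler from allocating r10, presumably so the NuttX port can keep the stack limit there. The host-side sketch below shows the idea with a plain global standing in for r10 and a counter instead of a hardfault; `stack_guard_set_limit` and `stack_guard_violations` are hypothetical helpers, and the GCC builtin `__builtin_frame_address(0)` approximates the stack pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the limit the NuttX port presumably keeps in r10. */
static uintptr_t stack_limit;          /* lowest address the stack may reach */
static unsigned long stack_violations;

/* The guard functions must not be instrumented, or the hooks would recurse. */
__attribute__((no_instrument_function))
void stack_guard_set_limit(uintptr_t limit)
{
    stack_limit = limit;
    stack_violations = 0;
}

/* Called by GCC on entry to every function built with -finstrument-functions. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    /* Approximate the current stack pointer with this frame's address. */
    uintptr_t sp = (uintptr_t)__builtin_frame_address(0);
    /* The stack grows downward, so dropping below the limit is an overflow.
     * A real guard would fault here instead of counting. */
    if (stack_limit != 0 && sp < stack_limit)
        stack_violations++;
}

/* Called by GCC on function exit; nothing to check in this sketch. */
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
}

__attribute__((no_instrument_function))
unsigned long stack_guard_violations(void)
{
    return stack_violations;
}
```

Because the hook runs on every call, this also shows why instrumented builds grow in code size and CPU load.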

@kd0aij
Contributor Author

kd0aij commented May 18, 2016

@dagar I've had no luck so far trying to get those compiler switches applied. Can you give any advice on how to accomplish it?

@dagar
Member

dagar commented May 18, 2016

I can give it a try tonight.
What's an easy way to reproduce? Current master, fixed wing model, and restarting gps as you described above?

@kd0aij
Contributor Author

kd0aij commented May 18, 2016

This last fault isn't occurring very often, and I don't know what triggers it. But I was talking about building with stack checking; neither I nor Google search is up to the task of figuring out why the build doesn't have those compiler switches specified.

@dagar
Member

dagar commented May 18, 2016

I was going to try both. For stack check I'm guessing we just need to get the instrumentation flags into the rest of the PX4 cmake build.

@kd0aij
Contributor Author

kd0aij commented May 18, 2016

Thanks, nothing I tried had any effect on the cmake build.

@davids5
Member

davids5 commented May 19, 2016

@kd0aij, @dagar
Here is a PR (on nuttx_v3) that will get the instrumentation flags in correctly. Just be aware that code size and CPU usage are going to balloon.

@dagar
Member

dagar commented May 19, 2016

Thanks @davids5 that seems to be working.

@kd0aij
Contributor Author

kd0aij commented May 19, 2016

@davids5 Thanks, I was able to understand the patch.
@dagar I still get a fault during startup; are you getting any further?

sercon: Registering CDC/ACM serial driver
sercon: Successfully registered the CDC/ACM serial driver
param: selected parameter default file /fs/mtd_params
rgbled on I2C bus 1 at 0x55 (bus: 100 KHz, max: 100 KHz)
 1:   SYS_USE_IO: match
 2101: + SYS_AUTOSTART: match
[i] Custom: /fs/microsd/etc/config.txt 
px4io: CRCs match
dataman: Power on restart, data manager file '/fs/microsd/dataman' size is 103090 bytes
MS5611_SPI on SPI bus 1 at 3 (20000 KHz)
Assertion failed at file:irq_unexpectedisr.c line: 86 task: ms5611
sp:     20003140
IRQ stack:
  base: 200031ac
  size: 000002e8
20003140: 20003140 1000ad18 200031ac 080994b9 00000056 080d233b 1000a954 080996e9
20003160: 080a4d6d 08099589 00000010 080929f5 080929d9 080a7c7d 20002eb8 00000002
20003180: 1000a954 080a4d99 00000000 1000aa8c 080a1abb 0000002f 080d0b70 00000040
200031a0: 1000ab00 080996e9 1000a954 00000000 00000000 10000010 00000001 00000000
sp:     1000aa28
User stack:
  base: 1000ad18
  size: 000003fc
1000aa20: 00000010 00000000 080d0b70 1000aa8c 1000aac4 080a1abb 00000000 00000000
1000aa40: 080928c5 00000004 1000a9ec 0809616d 00000000 08093d45 080928c8 21000000
1000aa60: 00000000 00000000 1000aac4 08085557 08096139 080d0b70 00000040 1000ab00
1000aa80: 10004620 0809616d 1000aac4 080a1e49 080a9341 00000000 1000ab00 0000003f
1000aaa0: 0809397d 080ba58a 0808540b 00000000 1000ab00 20000dec 20000dec 08085557
1000aac0: 080d0b70 080bdced 080f85e8 00000000 00002600 1000ab9c 10003f30 1000789c
1000aae0: 1000ab9c 0808540b 10003f40 00000000 0809397d 080b23e7 20002eb8 08093b23
1000ab00: 10003b00 08093b29 08006101 08093583 10003f40 00000000 08006101 08093b29
1000ab20: 080b2511 080b27ef 080060e9 10003f30 10007860 08006101 10003f30 00000000
1000ab40: 0809a84d 080b23c1 1000789c 080853d9 1000ab9c 00002600 10004620 080b23e7
1000ab60: 080b2511 080b23c1 0808453d 1000ab9c 00002600 00000000 0808465f 0809a84d
1000ab80: 080b2511 20000dec 080845e3 00000003 0000004b 0808453d 3fc85a7a 20000dec
1000aba0: 10004110 0000004b 4054af4b 1000ac80 10004110 20000dec 00000001 080845e3
1000abc0: 08000b03 08084f9b 10007f90 00000020 080b1da9 00000001 1000ac40 080151a3
1000abe0: 1000a960 00000000 00000000 10003d70 080149ad 08014be5 10007f90 0807f95f
1000ac00: 10004110 08083e2b 1000ac80 20000dec 10004620 0000004b 00000000 0808465f
1000ac20: 00000001 10004110 0000004b 00000000 10007f90 00000020 00000002 080151cf
1000ac40: 08083e01 10004110 1000ac80 20000dec 0000004b 08083e2b 0000004b 1000ac80
1000ac60: 10004110 10004060 00000000 08014be5 10004110 00000008 00000000 080151cf
1000ac80: 002025f0 00000000 00000000 00000000 444ed8f5 44d19480 41bc0000 00000000
1000aca0: 20000050 20000050 10004060 10003ff0 08014c7d 08014be5 b3600804 6ddfbe3b
1000acc0: 7be462b8 62f46e78 00000000 20000050 00000000 00000003 2000008c 08014c7d
1000ace0: 00000002 00000003 10006dd0 10003cac 00000003 0801506f 00000000 00000000
1000ad00: 00000000 00000000 00000000 0809489d 00000000 00000000 ffffffff 00000000
R0: 080a1e49 00000004 80000000 e000ed04 1000aa8c 080a1abb 0000002f 080d0b70
R8: 00000040 1000ab00 1000a960 10004620 080b24e1 1000aa28 080a1e59 08099b7c
xPSR: 81000000 BASEPRI: 00000000 CONTROL: 00000000
EXC_RETURN: ffffffe9

@dagar
Member

dagar commented May 19, 2016

I started fixing all the stack sizes in the stackcheck branch on master - https://github.com/PX4/Firmware/tree/stackcheck

I had to strip down px4fmu-v2_default to just the fixed-wing controllers and estimator to get it building. At the moment there's still a weird hard fault when the sensors main exits. I don't know if this is getting us closer to fixing your issue, but the number of issues is concerning. We should think about either keeping stack check on for px4fmu-v4 or having a special _stackcheck build that's auto tested.

@davids5 is there a way to get a good stack trace where the hard fault occurs?

@kd0aij
Contributor Author

kd0aij commented May 20, 2016

Observed another hardfault on FW in ekf_att_pos_estimator context (local branch rebased on recent master):

ver git
FW git-hash: 288733fbe939427d4e47962d5074938025877444
FW version: v1.3.0rc3-323-g288733f (130)
OS version: 6.27 (627)

sercon: Registering CDC/ACM serial driver
sercon: Successfully registered the CDC/ACM serial driver
param: selected parameter default file /fs/mtd_params
rgbled on I2C bus 1 at 0x55 (bus: 100 KHz, max: 100 KHz)
 1:   SYS_USE_IO: match
 2101: + SYS_AUTOSTART: match
[i] Custom: /fs/microsd/etc/config.txt 
px4io: CRCs match
dataman: Power on restart, data manager file '/fs/microsd/dataman' size is 103090 bytes
MS5611_SPI on SPI bus 1 at 3 (20000 KHz)
bst: no devices found
adc init done
ver hwcmp match: PX4FMU_V2
HMC5883_I2C on I2C bus 1 at 0x1e (bus: 100 KHz, max: 400 KHz)
lis3mdl: no device on bus 2
LIS3MDL bad ID: 3chmc5883: no device on bus 1
mpu6000: no device on this bus
MPU6000 on SPI bus 1 at 4 (1000 KHz)
L3GD20 on SPI bus 1 at 1 (11000 KHz)
LSM303D on SPI bus 1 at 2 (11000 KHz)
Airspeed on I2C bus 1 at 0x28 (bus: 100 KHz, max: 100 KHz)
nsh: sf10a: command not found
sensors: mag cal status changed 
px4io default PWM output device
mavlink_if0: mode: 0, data rate: 1200 B/s on /dev/ttyS1 @ 57600B
mavlink_if0: offboard mission init: ERROR
 157600:   SYS_COMPANION: match
mavlink_if1: mode: 3, data rate: 1000 B/s on /dev/ttyS2 @ 57600B
ver hwcmp match: PX4FMU_V2
px4flow [166:100]
px4flow: scanning I2C buses for device..
mavlink_if2: mode: 5, data rate: 800000 B/s on /dev/ttyACM0 @ 57600B
param: Error: Parameter EKF2_REC_RPL not found.
 0:   SYS_LOGGER: match
WARN  log buffer size: 12288 bytes
FIXED WING 
[i] Mixer: /fs/microsd/etc/mixers/fpv_skywalker_stripped.mix on /dev/pwm_output0 
pwm: reading disarmed value of zero, disabling disarmed PWM
[i] Mixer: /etc/mixers/pass.aux.mix on /dev/pwm_output1 

[i] Addons script: /fs/microsd/etc/extras.txt 
device: /dev/pwm_output0
channel 1: 1000 us (default rate: 50 Hz failsafe: 1000, disarmed: 0 us, min: 1000 us, max: 2000 us)
channel 2: 1500 us (default rate: 50 Hz failsafe: 1500, disarmed: 0 us, min: 1000 us, max: 2000 us)
channel 3: 1500 us (default rate: 50 Hz failsafe: 1500, disarmed: 0 us, min: 1000 us, max: 2000 us)
channel 4: 1500 us (default rate: 50 Hz failsafe: 1500, disarmed: 0 us, min: 1000 us, max: 2000 us)
channel 5: 1500 us (default rate: 50 Hz failsafe: 1500, disarmed: 0 us, min: 1000 us, max: 2000 us)
channel 6: 1500 us (default rate: 50 Hz failsafe: 1500, disarmed: 0 us, min: 1000 us, max: 2000 us)
channel 7: 0 us (default rate: 50 Hz failsafe: 0, disarmed: 0 us, min: 1000 us, max: 2000 us)
channel 8: 0 us (default rate: 50 Hz failsafe: 0, disarmed: 0 us, min: 1000 us, max: 2000 us)
channel group 0: channels 1 2
channel group 1: channels 5 6 7 8
channel group 2: channels 3 4

NuttShell (NSH)
nsh> mavlink_if1: Disabling hardware flow control
commander_low_prio: settings saved.
INFO  data link #0 lost
INFO  home: 39.7502281, -105.0979193, 1711.91
WARN  no GPS - navigator timed out
WARN  no GPS - navigator timed out
WARN  no GPS - navigator timed out
WAAssertion failed at file:armv7-m/up_hardfault.c line: 184 task: ekf_att_pos_estimator
sp:     20003f80
IRQ stack:
  base: 20003fcc
  size: 000002e8
20003f80: 080d2894 000000b8 0809715d 00000010 20003cd0 00000003 00000000 0809bb71
20003fa0: 0809bb5d 080a5b7b e000ed24 080a38f5 00000000 2001f758 080ca048 000182b8
20003fc0: 2001de90 0809bae1 20020e54 00000000 00000000 10000010 00000001 00000000
sp:     20020f28
User stack:
  base: 200212a0
  size: 000011f4
20020f20: 80000010 2001f758 2001f758 080ca048 000182b8 080843a7 2001fd68 20020f8c
20020f40: 20020f84 080a10ef 666b655b 65686320 205d6b63 65637865 76697373 79672065
20020f60: 6f206f72 65736666 00007374 08099387 080ca1cc 20020f84 0802a9cc 0805c4f3
20020f80: 080ca9cc 0805c223 080ca1dc 080ca9cc 00000000 00000000 00000000 080ca974
20020fa0: 080ca990 080ca9b0 080ca9cc 080ca9e4 080caa0c 080caa24 381b9551 00000000
20020fc0: bf7d1bc1 b5117cc0 bd8e0957 00000000 00000002 20022484 20021cd0 20022460
20020fe0: 200224d8 20021b3c 080ca020 420e4fe7 200211b8 2001de90 200211d8 2001deac
20021000: 2001efec 0808d807 080ca020 00000000 3f800000 00000000 080ca030 3a23165e
20021020: b8d952af 00000000 01010101 00000000 01010001 0039e3ef 0039e3ef 003a6165
20021040: 3f58fccf 3d770a73 3d232834 3f0692d4 bf8ae002 c08289f9 bc24b7be c13a9a9d
20021060: c242b19e c4d3a700 b55080b0 b6d3f91a b55f352f 39c30271 00000000 00000000
20021080: 3e3c2ef0 3cd4d016 3f046ad1 00000000 00000000 00000000 00000000 00000000
200210a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
200210c0: 00000016 00000000 00000000 00000001 00000000 2001dd50 2001f350 ffffffe9
200210e0: 4003507f 200211b8 00000000 2001ef00 200211e4 2001f338 2001f388 2001f6e0
20021100: 2001f700 00000003 2001f350 ffffffe9 3b830a0b 3c23d70a 00000000 00000000
20021120: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
20021140: 00000000 00000000 00000000 00000000 00000000 00000000 e40c71af 00000000
20021160: 00000000 0805e75f 20012c70 20024680 0000260b 2001f757 20024680 0000260b
20021180: 2001f388 08087fdf 08087fa5 080cf458 0000260b 080afac1 080afaad 00000003
200211a0: 2001f757 0809c6ad 2001ef00 2001f6b8 2001f678 0805e07f 2001f748 200247c0
200211c0: 00000000 00000000 00000000 0000260b e40c857b 00000000 2001f720 2001f6d0
200211e0: 0000260b 3dd6ac00 bda718e6 00000000 3c23d70a 00000000 00000000 000f423f
20021200: 00000000 000182b8 2001de90 2001ef00 2001f6c8 00000000 2001f300 0805ea15
20021220: 2001f270 2001f300 2001ef88 ffffffff 000182b8 00000000 00000000 00000000
20021240: 00000000 00000000 00000009 00000000 08090001 2001fb1c 0000000c 00000000
20021260: 00000101 2001fb4c 00000000 00000000 00000000 00000000 00000000 00000000
20021280: 00000000 00000000 00000000 00000000 00000000 08098679 00000000 00000000
R0: 20000f48 130872ed 20020f40 20020f40 2001f758 080ca048 000182b8 2001de90
R8: 2001ef00 2001f6c8 2001ef00 00000004 20020f60 20020f28 080843a7 08087c56
xPSR: 61000000 BASEPRI: 00000000 CONTROL: 00000000
EXC_RETURN: ffffffe9
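Several words in the user stack dump, starting at 0x20020f48, fall in the printable ASCII range. Cortex-M is little-endian, so each 32-bit word holds four characters, lowest byte first. The helper below is hypothetical, written only to illustrate decoding such words:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Decodes 32-bit words from a little-endian (Cortex-M) memory dump as text:
// bytes are taken lowest first, non-printable bytes become '.', and decoding
// stops at the first NUL byte.
std::string words_to_ascii(const std::vector<std::uint32_t>& words) {
    std::string out;
    for (std::uint32_t w : words) {
        for (int i = 0; i < 4; ++i) {
            const unsigned byte = (w >> (8 * i)) & 0xffu;
            if (byte == 0) {
                return out;
            }
            out += (byte >= 0x20 && byte < 0x7f) ? static_cast<char>(byte) : '.';
        }
    }
    return out;
}
```

Applied to the nine words from 0x20020f48 through 0x20020f68, it yields "[ekf check] excessive gyro offsets". That suggests a log message was on the stack when the fault hit publish(), though this is an inference from the dump rather than something confirmed in the thread.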

(gdb) info line *0x08087c56
Line 311 of "../src/modules/uORB/uORBDevices_nuttx.cpp" starts at address 0x8087c56 <uORB::DeviceNode::publish(orb_metadata const*, void*, void const*)+2>
   and ends at 0x8087c5a <uORB::DeviceNode::publish(orb_metadata const*, void*, void const*)+6>.
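The PC passed to gdb is the last word of the `R8:` line of the dump (R15). R13 on that same line matches the `sp: 20020f28` reported for the user stack. The `<function+offset>` part of gdb's answer comes from finding the nearest symbol that starts at or below the address. Below is a minimal sketch of that lookup. Its start addresses are back-computed from the gdb output in this issue (0x8087c56 is publish()+2, and 0x808be1e is control_attitude()+30), not read from the ELF symbol table as a real tool would do:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Function start address -> name. A real table would be read from the ELF
// symbol table; this one is filled in by the caller.
using SymbolTable = std::map<std::uint32_t, std::string>;

// Returns "name+offset" for the nearest symbol starting at or below addr,
// or "??" if addr lies below every known symbol. Symbol sizes are ignored.
std::string symbolize(const SymbolTable& syms, std::uint32_t addr) {
    auto it = syms.upper_bound(addr);  // first symbol starting above addr
    if (it == syms.begin()) {
        return "??";
    }
    --it;  // nearest symbol starting at or below addr
    return it->second + "+" + std::to_string(addr - it->first);
}
```

Unlike gdb, this sketch ignores symbol sizes, so an address past the end of the last known function is still attributed to it.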

@LorenzMeier
Member

@davids5 @bkueng I would appreciate it if you could have a look at this one as well. I've already spent some time digging into it, but I'm not yet sure what is going on.

@davids5
Member

davids5 commented May 20, 2016

@LorenzMeier

master's version of the PR is wrong - it is missing the second half of the changes

beta does not have the PR

@kd0aij

Please push the rebased version you are working on so I can pull and test. Also, please tell me what config you are building, the value of SYS_AUTOSTART, and what it is connected to and how.

Also, any tips on how to get to the crash?

@dagar
Member

dagar commented May 21, 2016

@davids5 fixed here #4593

@kd0aij I haven't been able to reproduce this. I tried fixed wing in HIL and on the bench.

@davids5
Member

davids5 commented May 21, 2016

@LorenzMeier, @kd0aij, @dagar @bkueng

If the build and testing for stack checking were done before #4593, they should be done again.

Any mismatch between NuttX's CONFIG_ARMV7M_STACKCHECK and PX4's notion of the instrumentation setting is highly unstable, to say the least.
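One way to make such a mismatch fail at build time rather than at run time is a compile-time guard. This is a sketch under two assumptions: that NuttX's generated config header defines CONFIG_ARMV7M_STACKCHECK only when the option is enabled, and that the PX4 build were changed to define a marker macro (the hypothetical PX4_INSTRUMENTED_STACKCHECK) whenever it compiles PX4 sources with the stack-check instrumentation flags:

```cpp
#include <cassert>

// Assumption: NuttX's generated config header defines this only when the
// option is enabled.
#if defined(CONFIG_ARMV7M_STACKCHECK)
constexpr bool kNuttxStackCheck = true;
#else
constexpr bool kNuttxStackCheck = false;
#endif

// Hypothetical marker: assume the PX4 build adds -DPX4_INSTRUMENTED_STACKCHECK
// whenever it compiles PX4 sources with the stack-check instrumentation flags.
#if defined(PX4_INSTRUMENTED_STACKCHECK)
constexpr bool kPx4Instrumented = true;
#else
constexpr bool kPx4Instrumented = false;
#endif

// Refuse to build a firmware in which only one side is instrumented.
static_assert(kNuttxStackCheck == kPx4Instrumented,
              "CONFIG_ARMV7M_STACKCHECK and PX4 stack-check instrumentation disagree");

constexpr bool stackcheck_config_consistent() {
    return kNuttxStackCheck == kPx4Instrumented;
}
```

Built with neither macro defined, as in a plain host build, the guard passes. Defining only one of them stops the build at the static_assert. In practice such a guard would need to sit in a header that every PX4 translation unit includes after the NuttX config header.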

@dagar
Member

dagar commented May 21, 2016

@davids5 I think it's okay. It looks like the stack check instrument_flags were lost in a bad cherry-pick 5da9e7e#diff-23a5ada890263edf0da49d40d9c58f8c which happened after the stack checking. I verified a few of the recent stack changes with #4593.

I should emphasize again that this was not at all exhaustive and I only checked a FW config at startup. We should still run through FW, MC, VTOL with stack check enabled and actual usage.

@kd0aij
Contributor Author

kd0aij commented May 21, 2016

@davids5 I created the PR for which my faulting branch was intended: #4594

The fault occurred with a px4fmu-v2_default build, SYS_AUTOSTART=2101.
I think it was initially connected to QGC, but after shutting QGC down, I left it powered via USB on the bench. I noticed some time later, possibly several hours later, that it had crashed.

@LorenzMeier
Copy link
Member

@kd0aij Can you please upload the binary next time this happens?

@LorenzMeier
Member

LorenzMeier commented May 21, 2016

Hunted down and fixed in fe60a43. Testing now.

@LorenzMeier
Member

Confirmed to fix it. @kd0aij Please let me know if any issues remain and please try to test.

@kd0aij
Contributor Author

kd0aij commented May 21, 2016

running beta now

@kd0aij
Contributor Author

kd0aij commented May 21, 2016

@LorenzMeier Regarding the use of an uninitialized pointer as an ORB topic handle, the result in the publish method would be to overwrite an arbitrary memory location, not "stack smashing", if I understand the term correctly. Perhaps the ORB should be maintaining a list of valid handles in order to perform run-time validation, since that seems to be the only way to guarantee that it doesn't trash memory.
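The run-time validation suggested above could look roughly like the following. It is a hypothetical sketch, not the uORB API. Node stands in for a topic's device node, and the registry refuses to write through any handle it did not issue itself:

```cpp
#include <cassert>
#include <mutex>
#include <unordered_set>

// Hypothetical stand-in for a topic's device node.
struct Node {
    int last_value = 0;
};

// Hypothetical registry, not uORB's API: it remembers every handle it has
// issued, and publish() refuses to write through any other pointer.
class HandleRegistry {
public:
    HandleRegistry() = default;
    HandleRegistry(const HandleRegistry&) = delete;
    HandleRegistry& operator=(const HandleRegistry&) = delete;

    ~HandleRegistry() {
        for (Node* n : live_) {
            delete n;
        }
    }

    // Creates a node and records its address as a valid handle.
    Node* advertise() {
        std::lock_guard<std::mutex> lock(mutex_);
        Node* n = new Node();
        live_.insert(n);
        return n;
    }

    // Returns false instead of writing through a handle that was never
    // issued (nullptr or a stray pointer).
    bool publish(Node* handle, int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (live_.find(handle) == live_.end()) {
            return false;
        }
        handle->last_value = value;
        return true;
    }

private:
    std::mutex mutex_;
    std::unordered_set<Node*> live_;
};
```

This turns a write through a garbage handle into a rejected call, at the cost of a lock and a hash lookup on every publish. It still cannot catch a garbage value that happens to equal a live handle. On NuttX a fixed-size table would probably be preferable to heap-allocated std::unordered_set nodes.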

@dagar
Member

dagar commented May 21, 2016

I realize this isn't helpful now, but out of curiosity I built and ran beta in the SITL Gazebo simulation with AddressSanitizer enabled at our typical optimization level (-Os), and it caught the problem immediately. Another entry for the testing wishlist...

pxh> ekf_att_pos_estimator start
ERROR px4_task_spawn_cmd: failed to set name of thread 34 0

INFO  TONE_SET_ALARM 9
INFO  .
pxh> WARN  Not ready to fly: Sensors not set up correctly
INFO  filter ref off: baro_alt:  -0.2550
ASAN:SIGSEGV
=================================================================
==24597==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x00000055d8ee bp 0x2b240a9e0850 sp 0x2b240a9e0820 T23)
#0 0x55d8ed in uORB::DeviceNode::publish(orb_metadata const*, void*, void const*) /home/dagar/git/Firmware/build_posix_sitl_default/../src/modules/uORB/uORBDevices_posix.cpp:329:6
#1 0x7dbfbc in mavlink_vasprintf /home/dagar/git/Firmware/build_posix_sitl_default/../src/modules/systemlib/mavlink_log.c:77:3
#2 0x5f04b4 in AttitudePositionEstimatorEKF::initReferencePosition(unsigned long, bool, double, double, float, float) /home/dagar/git/Firmware/build_posix_sitl_default/../src/modules/ekf_att_pos_estimator/ekf_att_pos_estimator_main.cpp:788:3
#3 0x5f5645 in AttitudePositionEstimatorEKF::initializeGPS() /home/dagar/git/Firmware/build_posix_sitl_default/../src/modules/ekf_att_pos_estimator/ekf_att_pos_estimator_main.cpp:822:2
#4 0x5f128a in AttitudePositionEstimatorEKF::task_main() /home/dagar/git/Firmware/build_posix_sitl_default/../src/modules/ekf_att_pos_estimator/ekf_att_pos_estimator_main.cpp:707:7
#5 0x50cf08 in entry_adapter(void*) /home/dagar/git/Firmware/build_posix_sitl_default/../src/platforms/posix/px4_layer/px4_posix_tasks.cpp:104:2
#6 0x2b2404255181 in start_thread /build/eglibc-3GlaMS/eglibc-2.19/nptl/pthread_create.c:312
#7 0x2b24051a047c in clone /build/eglibc-3GlaMS/eglibc-2.19/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:111
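The trace runs from initReferencePosition() through mavlink_vasprintf() into DeviceNode::publish(). The thread does not show the diff of fe60a43, but kd0aij's comment describes an uninitialized ORB topic handle. The following is a simplified, hypothetical model of why that is fatal with a lazy-advertise logging helper. The helper treats any non-null handle as already advertised, so garbage in the handle goes straight to the write. LogTopic, advert_t, send_log and Estimator are illustrative names, not PX4 code:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a topic node; real code uses orb_advert_t with
// orb_advertise()/orb_publish(), which are not reproduced here.
struct LogTopic {
    std::string last_message;
};
using advert_t = LogTopic*;

// Simplified lazy-advertise helper (an assumption, not PX4's code): a null
// handle triggers "advertising", which here just binds the handle to `topic`;
// any non-null handle is trusted and written through.
void send_log(advert_t* pub, LogTopic& topic, const std::string& msg) {
    if (*pub == nullptr) {
        *pub = &topic;  // first use: advertise
    }
    (*pub)->last_message = msg;  // publish through whatever *pub holds
}

class Estimator {
public:
    void init_reference_position(LogTopic& topic) {
        send_log(&_log_pub, topic, "reference position initialized");
    }
    advert_t log_pub() const { return _log_pub; }

private:
    // Without this initializer, _log_pub would hold indeterminate bits, and
    // send_log() would skip advertising and write to an arbitrary address.
    advert_t _log_pub = nullptr;
};
```

Giving the handle member a default initializer (or setting it in the constructor) is what makes the helper's null check meaningful.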

@dagar dagar mentioned this issue May 21, 2016
dagar added a commit to dagar/PX4-Firmware that referenced this issue May 21, 2016
@kd0aij
Contributor Author

kd0aij commented May 21, 2016

@dagar Is an "address sanitizer" able to catch all uses of uninitialized pointers? I assume the answer is no, otherwise the compiler would already be doing it.

@dagar
Member

dagar commented May 21, 2016

AddressSanitizer (ASan) adds instrumentation, but you need to actually run the code (http://clang.llvm.org/docs/AddressSanitizer.html).

The tool can detect the following types of bugs:

  • Out-of-bounds accesses to heap, stack and globals
  • Use-after-free
  • Use-after-return (to some extent)
  • Double-free, invalid free
  • Memory leaks (experimental)

Typical slowdown introduced by AddressSanitizer is 2x.

ASan happened to make this bug obvious (just running the right SITL simulation in gdb would have too), but MemorySanitizer (MSan) is actually what we'd want for catching uninitialized reads.
http://clang.llvm.org/docs/MemorySanitizer.html
