Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UAVCAN external mag use #2912

Closed
mhkabir opened this issue Sep 26, 2015 · 32 comments
Closed

UAVCAN external mag use #2912

mhkabir opened this issue Sep 26, 2015 · 32 comments
Assignees
Labels

Comments

@mhkabir
Copy link
Member

mhkabir commented Sep 26, 2015

The compass auto-switching seemed to be working on the delta angles pull without any timing hacks, but now again only the internal one is being used. This needs to be vetted and fixed.
The UAVCAN mag gets instance ID 1 currently, but has a higher prio, so should be used instead of the internal.

@pavel-kirienko Some help on debugging this appreciated.

@pavel-kirienko
Copy link
Member

Should be fixed by #2914

@mhkabir
Copy link
Member Author

mhkabir commented Sep 27, 2015

Ok, issue isn't related to timestamping it seems. I'm beginning to thing that Brian's test on #2818 and mine were somehow false positives and it worked because of boot timing artifacts. Or mayhe this is a a regression.

attitude_estimator_q: mag status:
...
INFO  sensor #1:
INFO    val:   0.4045, lp:   0.4040 mean dev:   0.0000 RMS:   0.0003
INFO    val:  -0.0166, lp:  -0.0156 mean dev:   0.0000 RMS:   0.0005
INFO    val:   0.2360, lp:   0.2358 mean dev:  -0.0000 RMS:   0.0073
INFO  sensor #2:
INFO    val:   0.4045, lp:   0.4042 mean dev:   0.0000 RMS:   0.0003
INFO    val:  -0.0166, lp:  -0.0161 mean dev:  -0.0000 RMS:   0.0005
INFO    val:   0.2360, lp:   0.2359 mean dev:  -0.0000 RMS:   0.0073
  • The internal one #0 is used in the estimator only. The uavcan one isn't being used by default.
  • Stopping the lsm303d driver "testerror" shows fallback to external. SENSOR FAILSAFE!"
attitude_estimator_q: mag status:
INFO  validator: best: 1, prev best: 0, failsafe: YES (# 12)

INFO  sensor #0:
INFO    val:   0.4067, lp:   0.4068 mean dev:  -0.0000 RMS:   0.0010
INFO    val:  -0.0409, lp:  -0.0408 mean dev:  -0.0000 RMS:   0.0009
INFO    val:   0.2849, lp:   0.2840 mean dev:  -0.0000 RMS:   0.0012
INFO  sensor #1:
INFO    val:   0.4045, lp:   0.4040 mean dev:   0.0000 RMS:   0.0003
INFO    val:  -0.0166, lp:  -0.0156 mean dev:   0.0000 RMS:   0.0005
INFO    val:   0.2360, lp:   0.2358 mean dev:  -0.0000 RMS:   0.0073
INFO  sensor #2:
INFO    val:   0.4045, lp:   0.4042 mean dev:   0.0000 RMS:   0.0003
INFO    val:  -0.0166, lp:  -0.0161 mean dev:  -0.0000 RMS:   0.0005
INFO    val:   0.2360, lp:   0.2359 mean dev:  -0.0000 RMS:   0.0073

@mhkabir
Copy link
Member Author

mhkabir commented Sep 27, 2015

@pavel-kirienko

EDIT : Nevermind. I added some debugging (and formatting fixes :P ) to the voting class, the prios are OK. UAVCAN is higher than internal.

attitude_estimator_q: mag status:

INFO  validator: best: 0, prev best: 0, failsafe: NO (# 0)

INFO  sensor #0:
INFO    priority : 50
INFO    val:   0.3810, lp:   0.3809 mean dev:  -0.0000 RMS:   0.0010
INFO    val:  -0.1397, lp:  -0.1398 mean dev:   0.0000 RMS:   0.0008
INFO    val:   0.2767, lp:   0.2768 mean dev:   0.0000 RMS:   0.0012

INFO  sensor #1:
INFO    priority : 75
INFO    val:   0.3778, lp:   0.3779 mean dev:  -0.0000 RMS:   0.0002
INFO    val:  -0.1233, lp:  -0.1233 mean dev:   0.0000 RMS:   0.0003
INFO    val:   0.2333, lp:   0.2333 mean dev:  -0.0000 RMS:   0.0004

Still don't get why:

  1. Three mags show up (1,2 are duplicates) even though there are only 2 mag instances (0-internal and 1-uavcan)
  2. The validator isn't setting UAVCAN to it's "best" sensor when it is added later.

@mhkabir
Copy link
Member Author

mhkabir commented Sep 27, 2015

@LorenzMeier Any ideas on the above? I'm trying to debug it in any case.

@mhkabir
Copy link
Member Author

mhkabir commented Sep 27, 2015

Things seem to be quite messed up. How many magnetometers do we have onboard anyway? :O Booting without UAVCAN gives me this directly :

attitude_estimator_q: mag status:
INFO  validator: best: 1, prev best: 0, failsafe: YES (# 5)
INFO  sensor #0:
INFO    priority: 100
INFO    val:   0.3995, lp:   0.3995 mean dev:   0.0000 RMS:   0.0003
INFO    val:  -0.1543, lp:  -0.1543 mean dev:   0.0000 RMS:   0.0002
INFO    val:   0.2610, lp:   0.2610 mean dev:   0.0000 RMS:   0.0007
INFO  
INFO  sensor #1:
INFO    priority: 50
INFO    val:   0.3829, lp:   0.3827 mean dev:   0.0000 RMS:   0.0010
INFO    val:  -0.1448, lp:  -0.1447 mean dev:  -0.0000 RMS:   0.0009
INFO    val:   0.2762, lp:   0.2745 mean dev:  -0.0000 RMS:   0.0012
INFO  
INFO  sensor #2:
INFO    no data

And what the hell do we have here?

nsh> free
             total       used       free    largest
Mem:        230000     187504  268520716  268489740

When a sensor failsafe is triggerred, it seems that we have a memory(?) issue as well. No NSH commands work:

nsh> attitude_estimator_q: command not found

@pavel-kirienko
Copy link
Member

Um, you're running out of memory.

I'm going to try to reproduce it tomorrow again, with the timestamping fix.

@pavel-kirienko
Copy link
Member

Booted with UAVCAN sensors disconnected:

attitude_estimator_q: mag status:
INFO  validator: best: 0, prev best: 0, failsafe: NO (# 0)
INFO  sensor #0:

INFO    priority: 50

INFO    val:   0.0201, lp:   0.0202 mean dev:  -0.0000 RMS:   0.0006

INFO    val:  -0.1301, lp:  -0.1297 mean dev:   0.0000 RMS:   0.0012

INFO    val:   0.3862, lp:   0.3861 mean dev:   0.0000 RMS:   0.0007

INFO  sensor #1:

INFO    no data

INFO  sensor #2:

INFO    no data

attitude_estimator_q: running

So far so good.

Connecting Zubax GNSS while PX4 is running:

attitude_estimator_q: mag status:
INFO  validator: best: 0, prev best: 0, failsafe: NO (# 0)
INFO  sensor #0:

INFO    priority: 50

INFO    val:   0.0132, lp:   0.0136 mean dev:  -0.0000 RMS:   0.0007

INFO    val:  -0.1322, lp:  -0.1324 mean dev:  -0.0000 RMS:   0.0016

INFO    val:   0.3874, lp:   0.3876 mean dev:   0.0000 RMS:   0.0007

INFO  sensor #1:

INFO    priority: 75

INFO    val:  -0.2091, lp:  -0.2092 mean dev:  -0.0000 RMS:   0.0001

INFO    val:   0.0488, lp:   0.0488 mean dev:   0.0000 RMS:   0.0001

INFO    val:   0.4819, lp:   0.4819 mean dev:  -0.0000 RMS:   0.0001

INFO  sensor #2:

INFO    priority: 75

INFO    val:  -0.2119, lp:  -0.2105 mean dev:  -0.0000 RMS:   0.0001

INFO    val:   0.0506, lp:   0.0505 mean dev:   0.0000 RMS:   0.0001

INFO    val:   0.4838, lp:   0.4837 mean dev:   0.0000 RMS:   0.0001

attitude_estimator_q: running

Wat?

Ok, rebooting PX4 with Zubax GNSS still connected:

attitude_estimator_q: mag status:
INFO  validator: best: 0, prev best: 0, failsafe: NO (# 0)
INFO  sensor #0:

INFO    priority: 50

INFO    val:   0.0139, lp:   0.0138 mean dev:   0.0000 RMS:   0.0006

INFO    val:  -0.1322, lp:  -0.1322 mean dev:  -0.0000 RMS:   0.0012

INFO    val:   0.3880, lp:   0.3880 mean dev:  -0.0000 RMS:   0.0007

INFO  sensor #1:

INFO    priority: 75

INFO    val:  -0.2099, lp:  -0.2099 mean dev:  -0.0000 RMS:   0.0005

INFO    val:   0.0515, lp:   0.0515 mean dev:   0.0000 RMS:   0.0004

INFO    val:   0.4802, lp:   0.4801 mean dev:  -0.0000 RMS:   0.0004

INFO  sensor #2:

INFO    no data

The phantom magnetometer disappeared again.

This is what uavcan status tells me:

nsh> uavcan status
ESC actuators control groups: sub: 0 / req: 0 / fds: 2
ESC mixer: NONE
Sensor 'gnss':
RX errors: 0, receiver node id: 125

Sensor 'mag':
devname: /dev/mag
channel 0: node id 125 --> class instance 1
channel 1: empty
channel 2: empty
channel 3: empty
channel 4: empty

Sensor 'baro':
devname: /dev/baro
channel 0: empty
channel 1: empty
channel 2: empty
channel 3: empty
channel 4: empty

Where did the third magnetometer come from?
Why the priority of mags 1 and 2 is 75, whereas it is configured as ORB_PRIO_HIGH, which is 100?
Why does the estimator prefer a sensor with priority 50 instead of 75?

@LorenzMeier
Copy link
Member

75 is ORB_PRIO_DEFAULT, so something is not going right when those are advertised or evaluated.

@LorenzMeier
Copy link
Member

Please re-try with master. I've improved the failover logic, in particular for cases where a higher priority sensor becomes available later.

@davids5
Copy link
Member

davids5 commented Oct 3, 2015

Is the fw server running? Could there be 2 node ids, one at boot, then switched to a new one based on the uuId byte ordering or parameter stored fo node Id?

@pavel-kirienko
Copy link
Member

David, no, since the bootloader doesn't publish sensor msgs
On Oct 3, 2015 4:32 PM, "David Sidrane" notifications@github.com wrote:

Is the fw server running? Could there be 2 node ids, one at boot, then
switched to a new one based on the uuId byte ordering or parameter stored
fo node Id?


Reply to this email directly or view it on GitHub
#2912 (comment).

@pavel-kirienko
Copy link
Member

System should have one mag

Doesn't seem to be the case - see my posts from yesterday.

https://github.com/PX4/Firmware/blob/b067a0c09438575dc42601b156aa426b26bf07d2/src/lib/ecl/validation/data_validator_group.cpp#L111-L113
If there is an issue with the priority handling, its there

Seems to be unrelated - see below.

Check if it advertises twice

Checked, no it doesn't advertise twice.

However, I found something weirder that I don't understand.

1. Stack size

When dealing with weird issues I check stack usage, which I did this time. I noticed that top reports that the UAVCAN thread has only 90 bytes of stack free, which may lead to undetectable stack overflows, so I increased it by 1.5K (just to be on the safe side), and peak stack usage increased by ~20 bytes as well, under the same testing conditions. It didn't resolve any of the known issues though. Nevertheless, I think it makes sense to enlarge the stack by at least 300 bytes (@davids5 is going to like it).

2. Advertisement logic

Either it's flawed or I'm missing something.

We have this code in UavcanCDevSensorBridgeBase which tries to acquire a class instance for a given UAVCAN sensor node, see https://github.com/PX4/Firmware/blob/master/src/modules/uavcan/sensors/sensor_bridge.cpp#L110. Once a class instance is acquired, it will be used as a suffix for device name, e.g. /obj/sensor_magX in place of the X. First problem: the code that does this acquiring doesn't seem to influence actual ORB advertisement in any way - follow the code path and you'll see that they are completely unrelated. The issue seems to be introduced with this change.

Then, the call to orb_advertise_multi() eventually brings us to uORB::DeviceMaster::ioctl(). It fails to init() a new DeviceNode because /obj/sensor_mag0 already exists, so it deletes it and claims the existing node instead. Priority seems to be disregarded completely, or I don't see where and how it is passed to the claimed node.

Also, delete node caused my GDB to complain about missing vtable, which normally hints to memory corruption, but I couldn't find any issues with that manually, so it could be caused by the IDE trying to update fields of a deleted object in the Variables view - I'm not sure.

Any suggestions?

@LorenzMeier
Copy link
Member

You need to untangle the class instance from the ORB publication. They are entirely unrelated.

The issue is potentially here:
https://github.com/PX4/Firmware/blob/master/src/modules/uavcan/sensors/sensor_bridge.cpp#L123

&channel->orb_instance

Has to be a valid pointer:
https://github.com/PX4/Firmware/blob/master/src/modules/uORB/uORBDevices_nuttx.cpp#L593

The logic then tries to find the first free uORB group ID and allocates it. It should never try to allocate an existing one.

@pavel-kirienko
Copy link
Member

@LorenzMeier

You need to untangle the class instance from the ORB publication. They are entirely unrelated.

They are untangled. I just compared the logic of uORB advertisements in UAVCAN with the HMC5883 driver, seems identical.

Has to be a valid pointer:

It is a valid pointer.

So the issue is elsewhere, apparently.

@pavel-kirienko
Copy link
Member

@LorenzMeier I would appreciate if you could have a look at it. I checked with a debugger that the UAVCAN driver advertises the mag topic only once, yet the attitude estimator sees two sensors.

@mhkabir
Copy link
Member Author

mhkabir commented Oct 10, 2015

@pavel-kirienko Can you try checking the status output when UAVCAN is enabled, but with the cable unplugged? I remember something was weird, which might help us.
I don't have the uavcan MAV right now with me to test.

@pavel-kirienko
Copy link
Member

Looks sane to me (cable unplugged):

nsh> uavcan status
uavcan: application not running
nsh> uavcan start
uavcan: Node ID 1, bitrate 1000000
uavcan: SW version vcs_commit: 0x11a42b0b
uavcan: sensor bridge 'gnss' init ok
uavcan: sensor bridge 'mag' init ok
uavcan: sensor bridge 'baro' init ok
nsh> uavcan start fw
nsh> attitude_estimator_q status
attitude_estimator_q: gyro status:
INFO  validator: best: 0, prev best: 0, failsafe: NO (# 0)
INFO  sensor #0, prio: 100
INFO    priority: 100

INFO    val:  -0.0022, lp:  -0.0022 mean dev:   0.0000 RMS:   0.0011 conf:   1.0000
INFO    val:  -0.0047, lp:  -0.0049 mean dev:   0.0000 RMS:   0.0011 conf:   1.0000
INFO    val:   0.0126, lp:   0.0126 mean dev:   0.0000 RMS:   0.0005 conf:   1.0000
INFO  sensor #1, prio: 75
INFO    priority: 75

INFO    val:   0.0035, lp:   0.0074 mean dev:   0.0000 RMS:   0.0048 conf:   1.0000
INFO    val:   0.0099, lp:   0.0086 mean dev:   0.0000 RMS:   0.0075 conf:   1.0000
INFO    val:  -0.0013, lp:   0.0003 mean dev:  -0.0000 RMS:   0.0029 conf:   1.0000
attitude_estimator_q: accel status:
INFO  validator: best: 0, prev best: 0, failsafe: NO (# 0)
INFO  sensor #0, prio: 100
INFO    priority: 100

INFO    val:  -0.3575, lp:  -0.3588 mean dev:   0.0000 RMS:   0.0069 conf:   1.0000
INFO    val:   0.1259, lp:   0.1217 mean dev:   0.0000 RMS:   0.0069 conf:   1.0000
INFO    val:  -9.8629, lp:  -9.8600 mean dev:  -0.0000 RMS:   0.0131 conf:   1.0000
INFO  sensor #1, prio: 75
INFO    priority: 75

INFO    val:  -0.3626, lp:  -0.3782 mean dev:  -0.0000 RMS:   0.0203 conf:   1.0000
INFO    val:   0.1893, lp:   0.1935 mean dev:  -0.0000 RMS:   0.0172 conf:   1.0000
INFO    val:  -9.9017, lp:  -9.9288 mean dev:   0.0000 RMS:   0.0301 conf:   1.0000
attitude_estimator_q: mag status:
INFO  validator: best: 0, prev best: 0, failsafe: NO (# 0)
INFO  sensor #0, prio: 50
INFO    priority: 50

INFO    val:  -0.3459, lp:  -0.3460 mean dev:   0.0000 RMS:   0.0015 conf:   1.0000
INFO    val:   1.0389, lp:   1.0405 mean dev:  -0.0000 RMS:   0.0024 conf:   1.0000
INFO    val:   0.5118, lp:   0.5113 mean dev:   0.0000 RMS:   0.0009 conf:   1.0000
attitude_estimator_q: running

@mhkabir
Copy link
Member Author

mhkabir commented Oct 10, 2015

Hmm. Can you try with UAVCAN startup at boot? UAVCAN_ENABLED 2
Also, did you tryout this pull? : #2918

If it works for now, we can track down the phantom mag later.

@pavel-kirienko
Copy link
Member

Can you try with UAVCAN startup at boot?

With UAVCAN started at boot, cable unplugged, still one mag:

nsh> attitude_estimator_q status
attitude_estimator_q: gyro status:
INFO  validator: best: 0, prev best: 0, failsafe: NO (# 0)
INFO  sensor #0, prio: 100
INFO    val:  -0.0025, lp:  -0.0012 mean dev:  -0.0000 RMS:   0.0015 conf:   1.0000
INFO    val:  -0.0053, lp:  -0.0047 mean dev:  -0.0000 RMS:   0.0011 conf:   1.0000
INFO    val:   0.0154, lp:   0.0154 mean dev:  -0.0000 RMS:   0.0005 conf:   1.0000
INFO  sensor #1, prio: 75
INFO    val:   0.0034, lp:   0.0079 mean dev:  -0.0000 RMS:   0.0054 conf:   1.0000
INFO    val:  -0.0020, lp:  -0.0006 mean dev:  -0.0000 RMS:   0.0080 conf:   1.0000
INFO    val:  -0.0020, lp:  -0.0023 mean dev:   0.0000 RMS:   0.0030 conf:   1.0000
attitude_estimator_q: accel status:
INFO  validator: best: 0, prev best: 0, failsafe: NO (# 0)
INFO  sensor #0, prio: 100
INFO    val:  -0.3348, lp:  -0.3336 mean dev:  -0.0000 RMS:   0.0069 conf:   1.0000
INFO    val:   0.1040, lp:   0.1012 mean dev:   0.0000 RMS:   0.0068 conf:   1.0000
INFO    val:  -9.7792, lp:  -9.7841 mean dev:   0.0000 RMS:   0.0117 conf:   1.0000
INFO  sensor #1, prio: 75
INFO    val:  -0.2869, lp:  -0.3072 mean dev:   0.0000 RMS:   0.0207 conf:   1.0000
INFO    val:   0.0754, lp:   0.0678 mean dev:  -0.0000 RMS:   0.0173 conf:   1.0000
INFO    val:  -9.8411, lp:  -9.8289 mean dev:  -0.0000 RMS:   0.0313 conf:   1.0000
attitude_estimator_q: mag status:
INFO  validator: best: 0, prev best: 0, failsafe: NO (# 0)
INFO  sensor #0, prio: 50
INFO    val:  -0.4028, lp:  -0.4028 mean dev:   0.0000 RMS:   0.0013 conf:   1.0000
INFO    val:   1.1798, lp:   1.1807 mean dev:  -0.0000 RMS:   0.0026 conf:   1.0000
INFO    val:   0.4452, lp:   0.4450 mean dev:  -0.0000 RMS:   0.0010 conf:   1.0000
attitude_estimator_q: running

Also, did you tryout this pull? : #2918

Yes, doesn't help:

nsh> attitude_estimator_q status
attitude_estimator_q: gyro status:
INFO  validator: best: 0, prev best: 0, failsafe: NO (# 0)
INFO  sensor #0, prio: 100
INFO    val:  -0.0026, lp:  -0.0019 mean dev:   0.0000 RMS:   0.0006 conf:   1.0000
INFO    val:  -0.0042, lp:  -0.0041 mean dev:   0.0000 RMS:   0.0006 conf:   1.0000
INFO    val:   0.0137, lp:   0.0140 mean dev:  -0.0000 RMS:   0.0005 conf:   1.0000
INFO  sensor #1, prio: 75
INFO    val:   0.0055, lp:   0.0023 mean dev:  -0.0000 RMS:   0.0047 conf:   1.0000
INFO    val:  -0.0073, lp:  -0.0053 mean dev:  -0.0000 RMS:   0.0076 conf:   1.0000
INFO    val:  -0.0028, lp:  -0.0011 mean dev:  -0.0000 RMS:   0.0031 conf:   1.0000
attitude_estimator_q: accel status:
INFO  validator: best: 0, prev best: 0, failsafe: NO (# 0)
INFO  sensor #0, prio: 100
INFO    val:  -0.5097, lp:  -0.5113 mean dev:   0.0000 RMS:   0.0067 conf:   1.0000
INFO    val:   0.3201, lp:   0.3159 mean dev:   0.0000 RMS:   0.0067 conf:   1.0000
INFO    val:  -9.8050, lp:  -9.8082 mean dev:  -0.0000 RMS:   0.0119 conf:   1.0000
INFO  sensor #1, prio: 75
INFO    val:  -0.4967, lp:  -0.4718 mean dev:  -0.0000 RMS:   0.0199 conf:   1.0000
INFO    val:   0.3577, lp:   0.3489 mean dev:   0.0000 RMS:   0.0166 conf:   1.0000
INFO    val:  -9.9595, lp:  -9.9534 mean dev:  -0.0000 RMS:   0.0297 conf:   1.0000
attitude_estimator_q: mag status:
INFO  validator: best: 0, prev best: 0, failsafe: NO (# 0)
INFO  sensor #0, prio: 50
INFO    val:  -0.4109, lp:  -0.4111 mean dev:   0.0000 RMS:   0.0012 conf:   1.0000
INFO    val:   1.1172, lp:   1.1181 mean dev:   0.0000 RMS:   0.0020 conf:   1.0000
INFO    val:   0.4852, lp:   0.4854 mean dev:  -0.0000 RMS:   0.0009 conf:   1.0000
attitude_estimator_q: running
nsh> uavcan start fw
nsh> uavcan_mag adding channel 124...
uavcan_mag channel 124 class instance 1 ok
uavcan: GNSS receiver node ID: 124
position_estimator_inav: init ref: lat=55.6046045, lon=37.7156301, alt=222.8660
commander: home: lat = 55.6046042, lon = 37.7156303, alt = 222.86
commander: mission ok
attitude_estimator_q status
attitude_estimator_q: gyro status:
INFO  validator: best: 0, prev best: 0, failsafe: NO (# 0)
INFO  sensor #0, prio: 100
INFO    val:  -0.0012, lp:  -0.0008 mean dev:   0.0000 RMS:   0.0006 conf:   1.0000
INFO    val:  -0.0034, lp:  -0.0040 mean dev:   0.0000 RMS:   0.0006 conf:   1.0000
INFO    val:   0.0150, lp:   0.0152 mean dev:   0.0000 RMS:   0.0006 conf:   1.0000
INFO  sensor #1, prio: 75
INFO    val:   0.0090, lp:   0.0086 mean dev:  -0.0000 RMS:   0.0047 conf:   1.0000
INFO    val:   0.0111, lp:   0.0125 mean dev:   0.0000 RMS:   0.0077 conf:   1.0000
INFO    val:  -0.0008, lp:  -0.0007 mean dev:  -0.0000 RMS:   0.0032 conf:   1.0000
attitude_estimator_q: accel status:
INFO  validator: best: 0, prev best: 0, failsafe: NO (# 0)
INFO  sensor #0, prio: 100
INFO    val:  -0.5085, lp:  -0.5141 mean dev:   0.0000 RMS:   0.0069 conf:   1.0000
INFO    val:   0.3092, lp:   0.3087 mean dev:   0.0000 RMS:   0.0067 conf:   1.0000
INFO    val:  -9.7710, lp:  -9.7778 mean dev:   0.0000 RMS:   0.0117 conf:   1.0000
INFO  sensor #1, prio: 75
INFO    val:  -0.4786, lp:  -0.4809 mean dev:  -0.0000 RMS:   0.0199 conf:   1.0000
INFO    val:   0.3447, lp:   0.3509 mean dev:   0.0000 RMS:   0.0166 conf:   1.0000
INFO    val:  -9.9586, lp:  -9.9612 mean dev:  -0.0000 RMS:   0.0293 conf:   1.0000
attitude_estimator_q: mag status:
INFO  validator: best: 1, prev best: 0, failsafe: NO (# 0)
INFO  sensor #0, prio: 50
INFO    val:  -0.4124, lp:  -0.4115 mean dev:   0.0000 RMS:   0.0013 conf:   1.0000
INFO    val:   1.1149, lp:   1.1152 mean dev:  -0.0000 RMS:   0.0020 conf:   1.0000
INFO    val:   0.4876, lp:   0.4875 mean dev:   0.0000 RMS:   0.0009 conf:   1.0000
INFO  sensor #1, prio: 75
INFO    val:  -0.1311, lp:  -0.1311 mean dev:   0.0000 RMS:   0.0004 conf:   1.0000
INFO    val:  -0.1273, lp:  -0.1273 mean dev:  -0.0000 RMS:   0.0002 conf:   1.0000
INFO    val:   0.4509, lp:   0.4509 mean dev:  -0.0000 RMS:   0.0002 conf:   1.0000
INFO  sensor #2, prio: 75
INFO    val:  -0.1338, lp:  -0.1338 mean dev:   0.0000 RMS:   0.0004 conf:   1.0000
INFO    val:  -0.1264, lp:  -0.1265 mean dev:  -0.0000 RMS:   0.0002 conf:   1.0000
INFO    val:   0.4528, lp:   0.4523 mean dev:   0.0000 RMS:   0.0002 conf:   1.0000
attitude_estimator_q: running

Shall we escalate the issue to uORB or sensor management?

@LorenzMeier
Copy link
Member

@pavel-kirienko Does your phantom mag change values, or is it frozen at one value? It seems different but very close to the 2nd mag, so the publication has to come from UAVCAN.

@pavel-kirienko
Copy link
Member

Does your phantom mag change values, or is it frozen at one value?

Yes it does change values.

@mhkabir
Copy link
Member Author

mhkabir commented Oct 10, 2015

@pavel-kirienko #2918 gets the mag into a working state atleast, irrespective of the phantom magnetometer.

@pavel-kirienko
Copy link
Member

@LorenzMeier it would help a lot if you or someone else familiar with internals of uORB could have a look at this issue (related #3122). It looks like the problem is caused by uORB rather than UAVCAN driver (see my descriptions above). Thanks in advance.

@LorenzMeier LorenzMeier added this to the Release v1.1.0 milestone Nov 7, 2015
@LorenzMeier LorenzMeier self-assigned this Nov 7, 2015
@LorenzMeier LorenzMeier added the bug label Nov 7, 2015
@LorenzMeier
Copy link
Member

Could you please repro with current master, just so we're sure we're not chasing a ghost? I will try to come up with a unit test catching this.

@mhkabir
Copy link
Member Author

mhkabir commented Nov 7, 2015

Yep 100% repro. Just tried.

@LorenzMeier
Copy link
Member

Thanks. Will dig into this.

@mhkabir
Copy link
Member Author

mhkabir commented Nov 12, 2015

@LorenzMeier @pavel-kirienko
I had a in-depth look into the voter class today, and I noticed the reason why we keep going into failsafe.

It keeps switching between mags 0(internal) and 1 (uavcan) continiously. Some debugging shows
that :

  1. First it takes the mag 1 as best (this is good)
  2. Then it switches down to mag 0 (this is a failsafe event), even though mag 1 has greater priority.
INFO  precheck confidence :   0.0000
INFO  precheck prio : 75
INFO  new confidence :   1.0000
INFO  new prio : 50
telem> Mag failure!
telem> SENSOR FAILSAFE! RETURN TO LAND IMMEDIATELY

So - our UAVCAN mag confidence keeps dropping to 0 for some reason. I added debugging, which shows that we fail here :
https://github.com/PX4/Firmware/blob/master/src/lib/ecl/validation/data_validator.cpp#L124-L127
3. Then again goes back to 1 as best (not failsafe)

2 and 3 keep happening continuously, with some delay in-between.

@mhkabir
Copy link
Member Author

mhkabir commented Nov 12, 2015

Woohoo. Gotcha - we got the exact same sensor value 100 times in a row.
That's what causes the failsafe.

@pavel-kirienko Your game now, I guess.
@LorenzMeier It could also be that the same mag values are pushed into sensor_combined even when no new values are available.

Increasing the threshold to 200 instead of 100 'fixes' it.

@pavel-kirienko
Copy link
Member

@mhkabir I can't repro this. See the log: ftp://pkir.net/px4_uavcan_mag_samples.txt.

I have added this to your branch to print the mag values:

diff --git a/src/modules/uavcan/sensors/mag.cpp b/src/modules/uavcan/sensors/mag.cpp
index e184ff1..d2a3926 100644
--- a/src/modules/uavcan/sensors/mag.cpp
+++ b/src/modules/uavcan/sensors/mag.cpp
@@ -160,5 +160,6 @@ void UavcanMagnetometerBridge::mag_sub_cb(const uavcan::ReceivedDataStructure<ua
        _report.z = (msg.magnetic_field_ga[2] - _scale.z_offset) * _scale.z_scale;
        unlock();

+       DEVICE_LOG("V %.6f %.6f %.6f", double(_report.x), double(_report.y), double(_report.z));
        publish(msg.getSrcNodeID().get(), &_report);
 }

It can be seen in the log that the sensor values do not get anywhere close to 100 successive identical values prior to failure.

@mhkabir
Copy link
Member Author

mhkabir commented Nov 12, 2015

I put in debug printfs here:

  1. https://github.com/PX4/Firmware/blob/master/src/lib/ecl/validation/data_validator.cpp#L126
  2. https://github.com/PX4/Firmware/blob/master/src/lib/ecl/validation/data_validator_group.cpp#L157

Whenever the switching was occurring, I can clearly see the validator dropping the confidence to zero, trigerring the first printf, and then the second one. Please see if you can repro this from inside the validator.

It may be so that the sensors app (since it paces the sensor_combined publication at gyro rate) is pushing the same data multiple times into the message.

@mhkabir
Copy link
Member Author

mhkabir commented Nov 14, 2015

This is almost completely fixed by :

  1. Commander : Effectively track failure reporting and handle hotplug sensors #3171
  2. Improved sensor failsafe reporting  #3183
  3. Zubax GNSS update to increase magnetometer data rate (@pavel-kirienko)

These, once done gives us complete and working UAVCAN sensor support (related : #3158) just in time for the v1.1.0 release :)

The apparent uORB bug still remains which causes a phantom magnetometer to be reported, but this does not affect performance.

@LorenzMeier
Copy link
Member

I'm closing this for now. Looks like we need to set up our own vehicle with UAVCAN to debug this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants