Copter: Bad GPS Health Too Aggressive #13459

Open
cglusky opened this issue Feb 2, 2020 · 16 comments

Comments

@cglusky

cglusky commented Feb 2, 2020

Is your feature request related to a problem? Please describe.
User gets a CRT: Bad GPS Health message if GPA Delta goes above 200ms. It's a bit scary and perhaps a bit aggressive if it just happens on occasion. I think to most users it would mean land as soon as practical with a switch to a non GPS assist mode.

This is happening on a PixRacer setup with mRo Purple GPS. It also has every serial port stuffed and full logging enabled. But it seems to get worse with more satellites so it could be similar to:

https://discuss.cubepilot.org/t/here-gps-bad-gps-health/999

So likely a serial bus a bit busy. But based on how things are trending with flow and lidar and other serial sensors, I don't think that is going to get better.

Describe the solution you'd like
Filter GPA Delta to reduce triggering the message. Say if it happens three times in three seconds then consider it Bad Health. Not sure what a good filter would look like so that's just a guess.
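A minimal sketch of the filter guessed at above, assuming the delta is simply the gap between successive GPS message arrival times. The function names and the constants `MIN_EVENTS` and `WINDOW_MS` are hypothetical and only encode the "three times in three seconds" suggestion; this is not how ArduPilot implements the check.

```python
from typing import List

DELTA_LIMIT_MS = 200   # threshold quoted in this issue
MIN_EVENTS = 3         # assumption: "three times ..."
WINDOW_MS = 3000       # assumption: "... in three seconds"


def slow_event_times(arrival_ms: List[int]) -> List[int]:
    """Arrival times whose gap from the previous message exceeds the limit."""
    return [
        cur
        for prev, cur in zip(arrival_ms, arrival_ms[1:])
        if cur - prev > DELTA_LIMIT_MS
    ]


def bad_health(arrival_ms: List[int]) -> bool:
    """True if MIN_EVENTS slow gaps fall within any WINDOW_MS span."""
    slow = slow_event_times(arrival_ms)
    for i in range(len(slow) - MIN_EVENTS + 1):
        if slow[i + MIN_EVENTS - 1] - slow[i] <= WINDOW_MS:
            return True
    return False
```

With this rule a single 350ms blip in an otherwise steady 5Hz stream would not raise the message, while three such blips in quick succession would.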

Describe alternatives you've considered
Per the cube pilot post you could change elevation mask in GPS so you are not seeing sats that are likely not aiding anyway. Assuming the delay is on the GPS side.

Platform
[ ] All
[ ] AntennaTracker
[x] Copter
[ ] Plane
[ ] Rover
[ ] Submarine

@Naterater
Contributor

AGREED. The Here+ GNSS unit does this at least 3x in an hour in my experience, and the message relayed to the GCS causes major concern when in reality there is really not a problem.

@WickedShell
Contributor

WickedShell commented Feb 3, 2020

I'm going to strongly disagree with this. If you are exceeding this value the EKF is rejecting the data from position fusion, which means you are falling back to a no-GPS mode in the EKF. In that mode your GPS is clearly unhealthy.

There are a couple of ways people typically get themselves into this:

  • First, requesting triple constellation on an M8N GPS unit. The GPS is only specced as being able to maintain 5Hz with dual constellation; with triple constellation it just can't keep up. As an aside, some mRo units are known to ship in this configuration, which is problematic; I'm unsure if they still ship in this configuration or not. EDIT: This was corrected about a year ago, so my comments on the mRo side were in error.
  • Requesting 10Hz updates with dual constellation on an M8N. Again, the M8N is only able to do 5Hz with dual constellation (i.e. GPS + GLONASS). The M8Q can do 10Hz in this situation.
  • Enabling raw logging, which in some situations can saturate the link, which can introduce delay and jitter. Since the M8 units can't do this I'm happy to rule that out in this case.

The serial bus being busy is something I think we can rule out. A serial link is dedicated to each device; it's not a multidrop network. The only way the serial devices should be able to interfere with each other is if processing the data off a link takes too long, and if you are getting delays of 25+ ms from processing serial data buffers then we have other serious issues to look at, and the GPS warning isn't the root problem.

Looking at the posted screenshots on discuss, I'd guess that the GPS is trying to process too many constellations and running out of processing power, and that's causing your jitter to rise once you cross a threshold number of SVs. The other one that comes to mind is that the Here2 has a processor sitting in between the ArduPilot and uBlox chips, and there may be some weird condition inducing jitter there. But given that it appears to correspond with the number of SVs, I'd guess it's too many constellations or too high an update rate.

I'll tag this as a devcall topic, but I'm pretty opposed to making this any higher. (I could see possibly raising the threshold by 5ms, but I'd really like to be able to rule out the 4Hz GPS units more generally which is why I haven't done that).

@cglusky
Author

cglusky commented Feb 3, 2020

In my case they are mRo units. I will have to plug them into uCenter to see how they are configured.

Based on what I am seeing in my logs it's a single blip going to about 300 or 350ms. Does that mean my GPS is unhealthy or just temporarily busy?

Perhaps a message saying GPS Slow Response or similar and if you get too many of those in a certain timeframe then it's not healthy?

@Pedals2Paddles
Contributor

As far as initializing, I don't think reporting that as unhealthy is a good thing since that is false. It's not unhealthy, it's initializing. BUT, we therefore also do not know if it is healthy. So calling it healthy while initializing is also not accurate. Not Unhealthy != Healthy.

@Naterater
Contributor

I'm going to rule out misconfiguration unless Here+ units using default parameters (5Hz) yield consistent missing events. Remember I said maybe 3 times per hour. Not consistently every few seconds. This isn't an initializing issue IMO, it's once they are operational. Big red messages about GPS health due to a single missing event are annoying. That's 0.016% of messages at 5Hz if they happen once every 20 minutes. Is a single missing event reason for a nasty message that causes the user major concern?
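For reference, the arithmetic behind that percentage can be checked with a short calculation. It assumes exactly one missed message per 20-minute interval at a steady 5Hz; the variable names are illustrative only.

```python
RATE_HZ = 5          # default update rate mentioned above
INTERVAL_MIN = 20    # one missed message every 20 minutes (3 per hour)

# Messages expected in one interval: 5 per second * 60 s * 20 min
messages_per_interval = RATE_HZ * 60 * INTERVAL_MIN

# One missed message out of that total, as a percentage
missed_pct = 100.0 / messages_per_interval

print(f"{messages_per_interval} messages, {missed_pct:.4f}% missed")
```

This gives 6000 messages per interval and roughly 0.0167% missed, in line with the figure quoted above.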

@rmackay9
Contributor

rmackay9 commented Feb 4, 2020

On the dev call we agreed that we could/should redesign the filter used for reporting to only report an issue if there are at least two lost messages within 30 seconds. The missing GPS message should also be recorded as a counter in the PM (?) message.

Also we'd like to see a log of a message where the GPS is generally good but there is an occasional loss of a GPS message.
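A rough sketch of what the redesigned reporting filter could look like, assuming the caller already detects each lost message and supplies a timestamp in seconds. The class `GpsLossFilter`, its method names, and the `total_lost` counter are hypothetical; the real change would live in ArduPilot's C++ GPS code and may look quite different.

```python
from collections import deque


class GpsLossFilter:
    """Report bad health only after `min_losses` lost messages within `window_s`."""

    def __init__(self, min_losses: int = 2, window_s: float = 30.0):
        self.min_losses = min_losses
        self.window_s = window_s
        self.recent = deque()   # timestamps of losses still inside the window
        self.total_lost = 0     # lifetime counter, e.g. for a log message

    def record_loss(self, now_s: float) -> None:
        """Call once for every lost or late GPS message."""
        self.total_lost += 1
        self.recent.append(now_s)

    def unhealthy(self, now_s: float) -> bool:
        """True while at least `min_losses` losses lie within the window."""
        # Drop losses that have aged out of the window
        while self.recent and now_s - self.recent[0] > self.window_s:
            self.recent.popleft()
        return len(self.recent) >= self.min_losses
```

Usage would be along the lines of calling `record_loss(t)` whenever a message is missed and polling `unhealthy(t)` when building the status report; an isolated loss every 20 minutes would then only increment the counter.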

@tridge
Contributor

tridge commented Feb 4, 2020

we really need a log showing this issue

@cglusky
Author

cglusky commented Feb 4, 2020

@tridge This should be a good sample. Just testing loiter and got Bad GPS Health via yaapu/frsky telem so switched to stabilize and landed.

https://drive.google.com/open?id=1RC-M0FBgtdqDJqpKhHscGgKUyyGLAtYf

At least I think that's one of the flights in question. Sorry, have three new devFrames on the bench I have been testing the last month and it's all a bit of a fog at this point. I can more than likely reproduce with a fresh flight if needed.

The reason it stuck out to me is it was flying great in loiter and the Bad GPS Health popped up and caused me to switch to stabilize and land. Although looking at that log it looks like I decided to bang it around a bit in stabilize before I landed.

@WickedShell
Contributor

WickedShell commented Feb 4, 2020

@cglusky Your log is interesting, it's not a single bad reading, it's actually 2 in a row that are slow. Interestingly this actually corresponds to a jump/inconsistency in the GPS data output:

[Three plots from the log, each titled Figure_1; images not preserved]

This would imply to me that the error is actually inside the GPS unit, and you actually did get bad data for this time. I can't see anything else yet that would explain this, but I'll keep looking.

@cglusky
Author

cglusky commented Feb 4, 2020

Thanks for having a look @WickedShell - Very interesting. One of my goals for 2020 is to become better at analyzing logs.

Before I posted this feature request I did consider abstracting it to the entire alerting system. Figured it was a bit much. But I think it is worth noting as it is already well documented:

https://www.faa.gov/documentLibrary/media/Advisory_Circular/AC_25.1322-1.pdf

Specifically, my Bad GPS Health alert felt a bit binary when there are typically different levels of alerts - Warnings, Cautions and Advisory given as feedback in aviation systems. The challenge is finding the right balance which would obviously require some judgment from devs and feedback from the wider user community.

Having messages pop up that people start to ignore is perhaps just as dangerous as no message at all, as human factors cause most aviation accidents:

https://www.faa.gov/data_research/research/med_humanfacs/oamtechreports/2000s/media/200618.pdf

@WickedShell
Contributor

Part of the problem here is that the MAVLink messaging only supports a healthy/not healthy light.

Sticking with the manned aviation example though, it's actually typical to have warning lights on the panel that come on and, if they persist (or are coupled with any other abnormalities), become a land-immediately item, but aren't always land immediately. The way the information is actually displayed to you also matters a lot, as it can make it harder to tell how bad it is. Some of the more popular GCSs will continue to show you the warning for 10 seconds after it's cleared, so you have time to read it, while mine will print a warning start/stop with a timestamp, but the actual warning indicator itself will go out the moment the warning isn't valid anymore. This latter one makes it much easier to assess intermittent warnings like this.

@Naterater
Contributor

This conversation about "BAD" things is now on two topics. Users are tired of these "BAD" warning messages that ultimately aren't really that bad. #13457 is another similar topic with similar discussion.

@cglusky
Author

cglusky commented Feb 5, 2020

Just doing a bit of homework. It does appear there are some issues with how MAVLink handles alerting.

First, some of the messages got tied to an RFC for syslog.

https://tools.ietf.org/html/rfc5424
https://mavlink.io/en/messages/common.html#MAV_SEVERITY

That RFC could map to aviation standards.

And as @WickedShell pointed out the health status is binary which does not help us much...
https://mavlink.io/en/messages/common.html#SYS_STATUS
https://mavlink.io/en/messages/common.html#MAV_SYS_STATUS_SENSOR

And CAN nodes appear to have their own mapping...
https://mavlink.io/en/messages/common.html#UAVCAN_NODE_HEALTH

Feels like the foundation is there but it just needs to be standardized. Easy for me to say.
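To make the idea of mapping the syslog-style severities onto aviation alert levels concrete, here is one possible sketch. The numeric values follow the MAV_SEVERITY enum, which mirrors RFC 5424. The grouping into Warning, Caution, and Advisory is purely an assumption for illustration and is not defined by MAVLink, ArduPilot, or the FAA circular; `alert_level` is a hypothetical helper.

```python
from enum import IntEnum


class MavSeverity(IntEnum):
    """MAV_SEVERITY values (same numbering as RFC 5424 syslog severities)."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFO = 6
    DEBUG = 7


def alert_level(severity: MavSeverity) -> str:
    """Hypothetical grouping of severities into aviation-style alert levels."""
    if severity <= MavSeverity.CRITICAL:
        return "Warning"    # assumption: immediate awareness and response
    if severity <= MavSeverity.WARNING:
        return "Caution"    # assumption: awareness now, response may follow
    if severity <= MavSeverity.INFO:
        return "Advisory"   # assumption: awareness only
    return "None"           # debug output is not shown to the pilot
```

Under this grouping a critical-severity message such as the "CRT: Bad GPS Health" text would land in the top tier, which is exactly the kind of choice a standardized mapping would need to revisit.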

@Sekilsgs2

Hi.

On 4.1 I always get this warning: about 70 errors in 30 minutes.
4.0.7 doesn't have this problem!

What I found: on 4.0.7, AP shows these warnings when I download logs from dataflash. I think this is because downloading data from flash blocks other threads. In 4.1 the optimisations seem very poor: many threads can't run with proper latency, and maybe some IRQs are lost or have the wrong priority. I think this is the main cause of the many internal errors in 4.1.

@andyp1per
Collaborator

@Sekilsgs2 what board is this on?

@Sekilsgs2

Sekilsgs2 commented Jul 22, 2021

what board is this on?

Mamba F405 MK2. Officially there is only 4.1 for it, but I compiled 4.0.7 myself and am using it now, because on 4.1 b5 I had a crash: the FC rebooted in flight.
