Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decode high duty-cycle CRSF frames using frame marker rather than timeouts #26183

Merged
merged 3 commits into from Feb 22, 2024

Conversation

andyp1per
Copy link
Collaborator

@andyp1per andyp1per commented Feb 10, 2024

This fixes an issue with CRSF and ELRS where fast update rates and/or high CPU load can lead to radio failsafes due to scheduling delays preventing full parsing of frames and then the re-synchronization taking too long. The approach is to look for the CRSF frame marker (0xC8) and parse from that point. If parsing fails then the next frame marker is searched for and intermediate bytes are dropped. This means that re-synchronization should take place in a single frame.

Copy link
Contributor

@tridge tridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like it should work, but it seems a bit awkward.
might be easier if you had a union on Frame with a uint8_t array?

libraries/AP_RCProtocol/AP_RCProtocol_CRSF.cpp Outdated Show resolved Hide resolved
@andyp1per
Copy link
Collaborator Author

might be easier if you had a union on Frame with a uint8_t array?

Maybe, but that then makes it a very large change as all the references to _frame have to be updated. I think it would be better to do that as a separate NFC PR.

@andyp1per
Copy link
Collaborator Author

Flew this on the copter that was having multiple RC failsafes - clean as a whistle, no issues at all and flew great. I think this is a good solution and should definitely go in 4.5. Should be possibly considered for 4.4 after further user testing.

@andyp1per andyp1per changed the title AP_RCProtocol: decode high duty-cycle CRSF frames using frame marker rather than timeouts Decode high duty-cycle CRSF frames using frame marker rather than timeouts Feb 11, 2024
Copy link
Contributor

@tridge tridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would prefer the union/memmove approach, it is just much clear

libraries/AP_RCProtocol/AP_RCProtocol_CRSF.cpp Outdated Show resolved Hide resolved
@timtuxworth
Copy link
Contributor

timtuxworth commented Feb 13, 2024

@andyp1per I think you might only be using Crossfire, should this be tested with Express LRS (which also uses CRSF)?

@andyp1per
Copy link
Collaborator Author

Tested on ELRS at 250Hz 1:2 and 500Hz 1:2 without issue

@andyp1per
Copy link
Collaborator Author

i would prefer the union/memmove approach, it is just much clear

Upon reflection I don't agree with this. If I was going to do a union I was only going to do that to make the maths a bit easier, I wasn't planning on doing a memmove. The problem is if we do a memmove we still have to process the bytes to see if the new start makes any sense or not. The new code would then have to have a separate parser to look at frames rather than bytes and we still need to do byte-based processing in the happy path. We still don't know when we have got a frame until we have finally read the CRC and validated it. We don't even know that we have enough bytes to process - in fact quite likely that we don't since that is the general problem that cause the failsafes in the first place. memmove also adds in the overhead of the copy as well as processing the data. So although I agree its a little hard to get your head around at first I think the approach I have implemented is actually the most efficient and the best for the kind of data stream we are dealing with.

@tridge tridge removed the DevCallEU label Feb 14, 2024
@tridge
Copy link
Contributor

tridge commented Feb 14, 2024

@andyp1per valgrind is not happy with the decoder:
image

@tridge
Copy link
Contributor

tridge commented Feb 14, 2024

The problem is if we do a memmove we still have to process the bytes to see if the new start makes any sense or not

no, you don't. The basis of a union based decoder is this:

  • a function which decodes a frame buffer. No shifts, no reading inside the fn, just takes an array of bytes and returns success/fail and the results on success
  • an outside fn which creates does the reading, filling in a union byte array. When it has the right header and length it calls the frame decoder. If it succeeds then we have a frame done and can zero the input buffer. On fail it uses memchr to find a header byte. If no header byte it zeros the buffer. If it finds a header byte it does a memmove and then sets the length of the input buffer to the bytes starting at the header it found. It then is in a state just as if the bad input bytes never existed

@tridge
Copy link
Contributor

tridge commented Feb 15, 2024

@andyp1per note that to use valgrind, you typically build for the linux target with:

  • ./waf configure --board linux --debug
  • ./waf --target examples/RCProtocolTest
  • valgrind --soname-synonyms=somalloc=nouserintercepts -q build/linux/examples/RCProtocolTest

@andyp1per
Copy link
Collaborator Author

@andyp1per valgrind is not happy with the decoder: image

I am unable to get valgrind to complain, would be good to understand what else I might need to do.

@tridge
Copy link
Contributor

tridge commented Feb 15, 2024

@andyp1per can you have a look at this branch? I've added a commit that reworks the parser using the more conventional memchr/memmove approach for re-sync. I think it makes the code clearer
https://github.com/tridge/ardupilot/commits/pr-high-duty-rc/

@tridge
Copy link
Contributor

tridge commented Feb 16, 2024

@andyp1per I've updated that branch some more to fix the test code under SITL and moved the crc calculation to be done once we have the full frame, which makes the skip to next frame work properly

@tridge
Copy link
Contributor

tridge commented Feb 16, 2024

@andyp1per note that the CRSF4 test fails with your PR as-is

@andyp1per andyp1per force-pushed the pr-high-duty-rc branch 2 times, most recently from 7f9f01b to 30a7138 Compare February 17, 2024 16:27
@andyp1per
Copy link
Collaborator Author

@tridge I have incorporated your changes into my PR, fixed up, reviewed and tested. Seems to be working well - thanks.

@tridge
Copy link
Contributor

tridge commented Feb 17, 2024

@peterbarker would you mind checking the parser for any logic errors that could cause a crash?

@peterbarker
Copy link
Contributor

@andyp1per can we remove AP_SerialProtocol_CRSF?

It's been broken for 18 months and nobody noticed.

We could save bytes and code complexity by removing it.

9b8ea84 was the commit which killed it AFAICS.

@tridge
Copy link
Contributor

tridge commented Feb 19, 2024

@andyp1per to expand on the comment from peter, the AP_RCProtocol_CRSF::update() function calls process_byte() with invalid arguments (passes in micros as the byte). The if (_uart) section of update() is only used when SERIALn_PROTOCOL is set to CRSF as opposed to RC. I don't see the point of setting it to CRSF, what is that meant to do?

@andyp1per
Copy link
Collaborator Author

andyp1per commented Feb 19, 2024

I'll fix. Most of the TBS kit supports direct CRSF control without RC frames - I have many. This was the original implementation and I don't want to lose it.

@andyp1per
Copy link
Collaborator Author

Fixed and tested

libraries/AP_RCProtocol/AP_RCProtocol_CRSF.cpp Outdated Show resolved Hide resolved
@@ -299,7 +349,7 @@ void AP_RCProtocol_CRSF::update(void)
for (uint8_t i = 0; i < n; i++) {
int16_t b = _uart->read();
if (b >= 0) {
_process_byte(AP_HAL::micros(), uint8_t(b));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you explain what the "standalone" mode is for? It seems to be setup by setting SERIALn_PROTOCOL to CRSF instead of RCIN, but I don't see how it fits in. It has clearly been broken for a long time so I wonder if it is really needed at all?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty much all TBS kit supports CRSF control without RC packets. e.g. like SmartAudio for VTX or I have a TBS VTX that supports OSD natively via CRSF - the standalone mode allows you to support these devices. The format is pretty much the same, its just how you integrate that is not - you can't do it via the RC protocol because that essentially assumes that you have data always coming in that you need to reply to. These devices are genuine full-duplex so the FC is often initiating the comms. This was the original CRSF support, I have several copters using it I just haven't flown them for a while - I'm not planning on losing it.

@tridge tridge force-pushed the pr-high-duty-rc branch 2 times, most recently from a63276d to a85fbc8 Compare February 21, 2024 03:24
@tridge
Copy link
Contributor

tridge commented Feb 21, 2024

@andyp1per I've force pushed a fix for the trailing bytes error

@tridge tridge removed the DevCallEU label Feb 21, 2024
fixed RCProtocolTest on SITL and make it pass/fail with an exit code

Co-authored-by: Andrew Tridgell <andrew@tridgell.net>
@andyp1per
Copy link
Collaborator Author

andyp1per commented Feb 21, 2024

Failsafe's not current working - will need to investigate

@andyp1per
Copy link
Collaborator Author

Squashed, cherry-picked @peterbarker's commit in as a separate commit and re-tested. All looks good. Happy for this to go in now @tridge

@tridge tridge merged commit b19f8ed into ArduPilot:master Feb 22, 2024
92 checks passed
@andyp1per andyp1per deleted the pr-high-duty-rc branch February 22, 2024 09:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: 4.5.0-beta3
Development

Successfully merging this pull request may close these issues.

None yet

5 participants