Sudden LQ drop and RX loss at strong RSSI-dBm #934

Closed
0crap opened this issue Sep 27, 2021 · 18 comments
Labels
bug 🐛 Something isn't working

@0crap

0crap commented Sep 27, 2021

As discussed on Discord, here is a new issue for the record.

On two different occasions we encountered a sudden LQ drop and RX loss while at close range with a strong RSSI (dBm).
On both occasions my buddy and I were in the air together. (Only one of us was affected at that moment.)
The LQ drop goes in steps to zero and the link gets disconnected.
In both cases the link reestablished itself with an LQ of 100.

On the next flight, with a fresh LiPo, all is fine and we can fly together 4 km out without issues.
We both fly an AR.Wing 900 with iNav 3.0.2 installed.
Both have an HM EP1 installed with ExpressLRS version 1.1.0.
(The EP1's installed do NOT have the SDG regulator.)

Our 2.4GHz gear:
I'm using a DIY 2400 TX (JR-Bay)
My buddy uses the HM ES24TX (JR-Bay)
We both have a Taranis QX7S with OpenTX v2.3.14 installed. (Both have the TBS inverter mod installed.)

We also often fly our ExpressLRS quads together, running Betaflight v4.2.9, without issues. (Both HM EP1's.)

Below are the DVR recordings of the events.

https://streamable.com/l9rva7
https://streamable.com/t3mk1y

If there is any more info needed, just ask.

@0crap 0crap added the bug 🐛 Something isn't working label Sep 27, 2021
@CapnBry
Member

CapnBry commented Sep 27, 2021

Excellent, thanks for the post with all the details in one place. This has all the symptoms of a nonce sync slip, although I do not understand how it can happen not just once (which I never see) but over and over. I am going to update my long-run testing procedure to roughly the same LQ/RSSI conditions and watch specifically for this behavior.
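Why a slip hurts even at very strong RSSI: both ends use the nonce to decide which FHSS channel each packet goes out on, so an RX that is off by one nonce is sometimes simply listening on the wrong frequency. The following is a minimal sketch, assuming a shared hop sequence (here a shuffled list of 40 channels) and a hop every 4 packets; `NUM_CHANNELS`, `HOP_INTERVAL` and the helper functions are hypothetical and not taken from the ExpressLRS source.

```python
import random

# Hypothetical values for illustration only; not ExpressLRS parameters.
NUM_CHANNELS = 40
HOP_INTERVAL = 4  # packets sent on one channel before hopping (assumed)


def make_hop_sequence(seed: int) -> list[int]:
    """Shuffled channel list that TX and RX are assumed to share via a common seed."""
    channels = list(range(NUM_CHANNELS))
    random.Random(seed).shuffle(channels)
    return channels


def channel_for_nonce(seq: list[int], nonce: int) -> int:
    """Channel for the packet with this nonce: one hop every HOP_INTERVAL packets."""
    return seq[(nonce // HOP_INTERVAL) % len(seq)]


def fraction_on_wrong_channel(seq: list[int], slip: int, packets: int = 1000) -> float:
    """Fraction of packets for which an RX lagging the TX by `slip` nonces
    is tuned to a different channel than the one the TX transmits on."""
    misses = 0
    for tx_nonce in range(slip, slip + packets):
        rx_nonce = tx_nonce - slip
        if channel_for_nonce(seq, tx_nonce) != channel_for_nonce(seq, rx_nonce):
            misses += 1
    return misses / packets


seq = make_hop_sequence(seed=1)
for slip in (0, 1, 2):
    print(slip, fraction_on_wrong_channel(seq, slip))
```

Under these assumptions a slip of 1 puts a quarter of the packets on the wrong channel and a slip of 2 puts half of them there, no matter how strong the signal is.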

@CapnBry
Member

CapnBry commented Sep 28, 2021

Here was the test I've run (all on master):

  • Pair 1: Q X7 with invertermod OpenTX 2.3.14 + SIYI FM30 (no antenna) 10mW 150Hz 1:8 / BetaFPV RX + Betaflight 4.2. Transmitter 10m away with RSSI around -70, LQ 90-100. Quad was "armed", to lock us out if the connection ever failsafes.
  • Pair 2: FR Mini as TX 10mW 150Hz 1:8 / EP2. Transmitter sitting right next to Pair 1's receiver for maximum interference.

Test was run for 18 hours. No LQ less than 75 was ever experienced (that's where the warning level was set) but I personally never saw anything less than 90 LQ in OpenTX telemetry.
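For reference when reading figures such as "LQ 90-100" and the warning level of 75: the sketch below assumes LQ is the percentage of the last 100 expected packet slots that arrived intact. `LinkQuality` is a hypothetical helper, not the receiver's actual implementation.

```python
from collections import deque


class LinkQuality:
    """Hypothetical LQ tracker: percent of the last `window` packet slots received."""

    def __init__(self, window: int = 100) -> None:
        # Oldest slot falls off automatically once `window` results are stored.
        self.slots: deque = deque(maxlen=window)

    def record(self, received: bool) -> None:
        """Log one expected packet slot as received (True) or missed (False)."""
        self.slots.append(received)

    def value(self) -> int:
        """Current LQ in percent; 0 before any slot has been recorded."""
        if not self.slots:
            return 0
        return round(100 * sum(self.slots) / len(self.slots))


lq = LinkQuality()
for i in range(100):
    lq.record(i % 4 != 0)  # lose every fourth packet
print(lq.value())
```

Losing every fourth packet settles the value at exactly 75, the warning level used in the test above.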

Possible conclusions:

  • We don't have this issue on master
  • We don't have this issue on STM32 transmitters paired with ESP receivers
  • I'm just unlucky

I do not have an ESP32-based Team2.4 TX to test with so I can't think of anything else I can try.

@0crap
Author

0crap commented Sep 28, 2021

If it makes any difference, my TX is always set at 250Hz, TLM 1:32.
Advanced telemetry is enabled and 23 sensors are recorded.
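To compare this setup with the 150Hz 1:8 test: with a packet rate R and a telemetry ratio 1:N, one slot in every N carries telemetry, so packets are 1/R apart and telemetry arrives R/N times per second. `link_timing` is a hypothetical helper for that arithmetic.

```python
def link_timing(rate_hz: float, tlm_ratio: int) -> dict:
    """Packet interval and downlink telemetry rate for a packet rate and a 1:N ratio."""
    interval_us = 1_000_000 / rate_hz          # time between packet slots
    tlm_per_s = rate_hz / tlm_ratio            # one telemetry slot every N packets
    tlm_period_ms = 1000 * tlm_ratio / rate_hz  # time between telemetry slots
    return {"interval_us": interval_us, "tlm_per_s": tlm_per_s, "tlm_period_ms": tlm_period_ms}


print(link_timing(250, 32))
print(link_timing(150, 8))
```

At 250Hz 1:32 that is a 4000 us packet interval and 7.8125 telemetry packets per second (one every 128 ms) shared by all 23 recorded sensors, versus 18.75 per second at 150Hz 1:8.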

Probably not related, but can you flash iNav 3.0 on that FC?
We got unlucky twice when iNav 3.0 was in use. ¯\_(ツ)_/¯

@CapnBry
Member

CapnBry commented Sep 28, 2021

I wouldn't think the FC would come into play at all, since we're reporting the receiver LQ and I can't think of anything the FC can do to influence our received LQ. You're not running any sort of Lua script for telemetry are you?

@0crap
Author

0crap commented Sep 28, 2021

.... You're not running any sort of Lua script for telemetry are you?

Definitely not; I banned Yaapu and the like a long time ago.
I know the QX7S is very resource limited...

@amdnikos

Are there differences between master and 1.1.0? I have pretty much the same symptoms on R9. I could give master a try and retest the interference scenario.

@0crap
Author

0crap commented Sep 29, 2021

Are there differences on master vs 1.1.0? I have pretty much same symptoms on r9. I could give a try in master and retest the interference thing.

Fair question. I hope Capn is willing to run his test one more night on v1.1.0,
since that is the version on which we experienced this issue.

@dragnea

dragnea commented Sep 29, 2021

@0crap you're not the only one.
https://www.youtube.com/watch?v=q0t9hCwVdJc&ab_channel=DragneaMihai
At minute 7 is where I lost the quad.
Details are in the video description. This was on ELRS 0.9, so the problem goes back at least that far.

@0crap
Author

0crap commented Sep 29, 2021

Ohhh no! That is no fun to lose a quad like that.
Also at a very strong signal -64dBm.
You are running R9 900MHz gear from what I read in your video description.

Anyway, thx for sharing.

@dragnea

dragnea commented Sep 29, 2021

I did not recover it, even with a professional scuba diver; the water was so milky... GoPro 9, FR7 quad.

Yes, it's R9 868MHz. I later had the same issue on a wing: it failsafed 40 m from home, lost control, entered RTH and started landing, but I had time to regain control because... the only way to regain control was to restart the radio.

So you have plenty of signal, but the data stays stuck and you get the failsafe. I've posted in the Facebook group too over the past months... (my conclusion, sort of: if it works for most people, why bother fixing this... and it still claims "victims"). In the end, maybe off-topic, I bought Crossfire and got over the pain, simple and easy.
I hope this bug will be fixed someday.

@0crap
Author

0crap commented Sep 29, 2021

..... I've posted on the facebook group too, past months... (my sort of conclusion: if it works for the most of people, why bother to fix this... and it still makes "victims"). In the end, maybe offtopic, I bought crossfire and I got over the pain, simple and easy. I hope this bug will be fixed someday

I think the devs take every issue seriously.
The problem is that they are not always aware of an issue if they don't encounter it personally.
Posting on social media is fine, but it often gets missed because it's not possible to check all of it.

If there are issues, GitHub is the best place to let the devs know.
So I can only recommend to everyone: use GitHub to report any issues you encounter.
Little effort, and in the end we all benefit! 👍
Just my take on it.

@CapnBry
Member

CapnBry commented Sep 29, 2021

In 9 hours of 1.1.0 testing with the same setup except 250Hz 1:32, I ran into one nonce desync, similar to what you experienced. One happens, then a few seconds later, it happens again. Unfortunately, it was off by enough that it never got a sync packet again, so I couldn't tell in which direction it had shifted.

It stayed at <50 LQ until I finally rebooted the TX, which connected right back up. I'm not sure what I can do to make this happen more frequently so I can narrow down where it is happening, so I am open to suggestions. I've spent a couple of hours staring at the receiver phase lock code and slapping in some debugging to verify it is working correctly. The only thing I can think of is that it somehow phase-shifts a whole interval in one period, but that shouldn't be possible given the constraints, and I've even got code in there to spit out debug output if the value seems too large.
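To make the "whole interval in one period" argument concrete: a receiver phase lock measures how early or late each packet arrives relative to its own timer and nudges the timer by part of that error. The sketch below is a generic proportional loop, assumed for illustration and not the ExpressLRS receiver code; `PhaseLockSketch`, the 0.25 gain, the 50 us step clamp and the 100 us warning threshold are all hypothetical.

```python
class PhaseLockSketch:
    """Hypothetical RX phase lock: a proportional loop with a clamped step.

    Not the ExpressLRS algorithm; gain, clamp and warning threshold are assumptions.
    """

    def __init__(self, interval_us: float, gain: float = 0.25,
                 max_step_us: float = 50.0, warn_limit_us: float = 100.0) -> None:
        if max_step_us >= interval_us:
            raise ValueError("step clamp must be smaller than one packet interval")
        self.gain = gain                    # fraction of each measured error corrected
        self.max_step_us = max_step_us      # no single packet may move the timer further
        self.warn_limit_us = warn_limit_us  # errors beyond this are recorded as suspicious
        self.warnings: list = []

    def correction(self, error_us: float) -> float:
        """error_us: packet arrival time minus the RX timer tick it was expected at."""
        if abs(error_us) > self.warn_limit_us:
            self.warnings.append(error_us)
        step = self.gain * error_us
        return max(-self.max_step_us, min(self.max_step_us, step))


def simulate(initial_phase_us: float, packets: int, lock: PhaseLockSketch) -> float:
    """Remaining RX phase error after `packets` consecutive received packets."""
    phase = initial_phase_us  # how far the RX tick lags the actual packet arrival
    for _ in range(packets):
        error = -phase        # the packet arrives `phase` us before the RX tick
        phase += lock.correction(error)
    return phase


lock = PhaseLockSketch(interval_us=4000.0)
print(simulate(300.0, 50, lock), len(lock.warnings))
```

Starting 300 us off, the clamp limits the first corrections to 50 us each, the first five packets are flagged as suspicious, and the error then shrinks geometrically. Because the clamp is below one packet interval, no single packet can move the RX timer by a whole interval in this model.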

EDIT: and I got another one. RRRRRRT.R_RRR_RRR_ So in all three cases it happened during an FHSS interval that included telemetry and also had a CRC error on the RX. The TX was one nonce ahead of the RX, and the RX did not have any phase offset applied >100us. Maybe we're starting to narrow down where this is happening.
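Why a one-nonce offset shows up around telemetry: if each side decides from its own nonce whether a slot is uplink (TX sends) or telemetry (RX sends), an offset makes them disagree on some slots. Minimal sketch, assuming a simple rule where nonces divisible by the telemetry ratio are telemetry slots; `is_telemetry_slot` and `role_conflicts` are hypothetical, not ExpressLRS functions.

```python
def is_telemetry_slot(nonce: int, tlm_ratio: int) -> bool:
    """Hypothetical rule: one slot in every `tlm_ratio` is reserved for RX->TX telemetry."""
    return nonce % tlm_ratio == 0


def role_conflicts(tx_nonce: int, rx_nonce: int, tlm_ratio: int, slots: int) -> list:
    """TX nonces, over `slots` consecutive packets, where TX and RX disagree about
    whether the slot is uplink (TX sends) or telemetry (RX sends)."""
    conflicts = []
    for i in range(slots):
        tx_expects_tlm = is_telemetry_slot(tx_nonce + i, tlm_ratio)
        rx_sends_tlm = is_telemetry_slot(rx_nonce + i, tlm_ratio)
        if tx_expects_tlm != rx_sends_tlm:
            conflicts.append(tx_nonce + i)
    return conflicts


# TX one nonce ahead of the RX, 1:32 telemetry, two telemetry periods
print(role_conflicts(tx_nonce=1, rx_nonce=0, tlm_ratio=32, slots=64))
```

With the TX one nonce ahead, each telemetry period has one slot where both sides transmit (e.g. TX nonce 33, where the RX sends telemetry while the TX sends RC data) and one where both listen (TX nonce 32). The sketch does not explain how the slip starts; it only illustrates why a telemetry slot is where a one-nonce offset becomes visible.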

@0crap
Copy link
Author

0crap commented Sep 30, 2021

Nice catch!

I would continue with a second test.

Do exactly the same thing for at least 12 hours, but with the interference-generating radio switched off.

That would answer the question: can it slip like this on its own?

@SunjunKim
Contributor

SunjunKim commented Sep 30, 2021

I'm not knowledgeable about how the ELRS nonce works in detail, but I wonder about a basic thing:
What happens if the TX/RX timers develop some offset and drift, producing a nonce offset of, plausibly, ±1?
Is there any recovery algorithm for handling a de-synced nonce with an offset of just one?

@CapnBry
Member

CapnBry commented Sep 30, 2021

I would continue with a second test.

Oh look at this guy thinking I have only done two tests and not continuously tried to make it fail 😋 The problem with testing to see if something doesn't affect it is that it isn't something that happens very often. Even 12h of it not happening is a significant result.

What happens if the timers on TX/RX are making some offsets and drifts, and making nonce offset, plausibly +-1?

The timers are synced by a phase lock, so they can't drift apart unless there is a lack of packets on the RX to help keep its timer synced. It takes hundreds of missing packets to drift enough that they're no longer on the same nonce, not just one or two. They're resynced by sync packets, which you can see happening in one of 0crap's videos. This only works on a slip of 1 in 1.0, or up to 3 in 2.0. It shouldn't ever get out of sync, though; the sync packet isn't there to fix this, it is there to help the RX establish a connection or resync after a long stretch of missing packets.
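The slip tolerance mentioned here (1 in 1.0, up to 3 in 2.0) can be pictured as a window check on the nonce carried by a sync packet. A minimal sketch, assuming an 8-bit nonce that wraps around; `NONCE_MODULUS`, `nonce_slip` and `apply_sync` are hypothetical and not taken from the ExpressLRS source.

```python
NONCE_MODULUS = 256  # assumption: the nonce is an 8-bit counter that wraps around


def nonce_slip(tx_nonce: int, rx_nonce: int) -> int:
    """Signed TX - RX nonce difference, wrapped into the range -128..127."""
    diff = (tx_nonce - rx_nonce) % NONCE_MODULUS
    return diff - NONCE_MODULUS if diff >= NONCE_MODULUS // 2 else diff


def apply_sync(rx_nonce: int, sync_nonce: int, max_slip: int) -> tuple:
    """Hypothetical resync rule: adopt the sync packet's nonce only within +/-max_slip.

    Returns the RX nonce to use next and whether the correction was accepted.
    """
    slip = nonce_slip(sync_nonce, rx_nonce)
    if abs(slip) <= max_slip:
        return sync_nonce % NONCE_MODULUS, True
    return rx_nonce, False


print(apply_sync(10, 12, max_slip=1))  # slip of 2 outside a 1-nonce window
print(apply_sync(10, 12, max_slip=3))  # accepted with a 3-nonce window
```

In this model a sync packet two nonces away is ignored with `max_slip=1` but accepted with `max_slip=3`, and the wraparound handling keeps a slip across 255 to 0 counted as 1.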

@0crap
Author

0crap commented Sep 30, 2021

I would continue with a second test.

Oh look at this guy thinking I have only done two tests and not continuously tried to make it fail 😋 The problem with testing to see if something doesn't affect it is that it isn't something that happens very often. Even 12h of it not happening is a significant result.

Shame on me, thx for the effort! :-)

@SantiCO19

I do not have an ESP32-based Team2.4 TX to test with so I can't think of anything else I can try.

I know you squashed this bug already, but if you still need an ESP based 2.4 TX, I'd be happy to have one sent to you.

@0crap
Author

0crap commented Oct 6, 2021

Well, let's close this one, gents! 💯
The fix is merged into master and the 1.1.x-maintenance branch for everyone to try.
Or just wait for the upcoming release.

Big thanks to Capn for smashing this one and all others involved! 🥇
