Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[next] Compare the 3 sync methods in ofdm-processor.cpp (freqsyncMethod) #247

Closed
AlbrechtL opened this issue May 1, 2018 · 46 comments
Closed
Assignees
Labels
Projects
Milestone

Comments

@AlbrechtL
Copy link
Owner

There are 3 sync methods in ofdm-processor.cpp (freqsyncMethod). Compare!

@AlbrechtL AlbrechtL created this issue from a note in Tasks (To do) May 1, 2018
@AlbrechtL AlbrechtL added the task label May 1, 2018
@AlbrechtL
Copy link
Owner Author

@mpbraendli Please describe how to enable/disable the different methods.

@mpbraendli
Copy link
Collaborator

It is set through the last argument of the ofdmProcessor constructor: https://github.com/AlbrechtL/welle.io/blob/next/src/backend/radio-receiver.cpp#L54

Currently fixed to the value 3, it is used in https://github.com/AlbrechtL/welle.io/blob/next/src/backend/ofdm/ofdm-processor.cpp#L493

I'm not sure it is necessary to present this to the user, but it would be good to understand how these methods compare.

@mpbraendli
Copy link
Collaborator

All three methods behave badly in presence of two signal components (two peak in CIR), when the first peak is weaker than the second. The receiver locks onto the stronger one, but should lock on the earlier one instead. Locking on the second leads to increased inter-symbol interference and can kill reception.

@mpbraendli
Copy link
Collaborator

See #256

@mpbraendli
Copy link
Collaborator

I've added the ability to chose the sync method in the settings page. The default seems to be the best option in general.

@AlbrechtL do we close this ticket?

@mpbraendli mpbraendli moved this from To do to In progress in Tasks Jul 26, 2018
@AlbrechtL
Copy link
Owner Author

Thanks for the GUI options.
I moved the options into the expert settings in 2ebb201. Is it necessary to save which options is selected and restore them after a restart?
I think also we should document these options somewhere. Wiki or welle.io web site?

@mpbraendli
Copy link
Collaborator

It would be nice to keep these settings permanent and store them.

I wrote https://github.com/AlbrechtL/welle.io/wiki/Backend-Implementation-Remarks with some more details.

@AlbrechtL
Copy link
Owner Author

Thanks for the docs. I can also make the settings permanent.

@mpbraendli
Copy link
Collaborator

Ok, I'll let you close the ticket as soon as you think it's ok to do so.

@AlbrechtL
Copy link
Owner Author

The settings are permanent now (ce3f8c1).

But I discovered an issue with the new frequency synchronization. After some time the receiver lost the lock - even with a good signal. At the moment I'm working with rtl-sdr dongles without an TCXO. In this case the frequency offset (> 4 kHz) is shifting permanently.

@mpbraendli
Copy link
Collaborator

So this is with coarse correction enabled but with the new FFT window placement?

Is the old FFT window placement fine?

@AlbrechtL
Copy link
Owner Author

AlbrechtL commented Aug 16, 2018

I discovered the issues a little bit more in detail and I found 3 major problems. All test are applied with enabled coarse correction (PatternOfZeros), new FFT window and a rtl-sdr without a TCXO (3.5 kHz offset in warm state). First of all I let the synchronization succeed successfully so that a station is played. After some time the following issues occurred (sometimes).

  1. The FIC CRC is OK but there are a lot of frame errors (25) while the reception is OK (no multipath, SNR 15)
  2. Multipath situation: The receiver locks to the correct first peak but from one time to another it locks to the wrong peak and stays there (no playing)
  3. The coarse correction is OK (3.5 kHz) but from one time to another it runs wrong to around 500 Hz and stays there (no playing).

I discovered some of these issues also in the old qt-dab and welle.io 1.0 implementation. Therefor I added a timer to check the sync status after some time and I resented the backend if somethings goes wrong. In the current next branch I removed this timer because it was just a workaround to get welle.io 1.0 working.

@mpbraendli
Copy link
Collaborator

The tricky part with these changes is that there is no right nor wrong thing to do, there is "relatively better" and "relatively worse", in average or on specific cases.

I tried to put together some meaningful test metrics in welle-cli with the -t option, so that we can run the receiver against recordings and be able to objectively compare versions/settings. But there is more to do:

  • define more meaningful metrics
  • automate the execution of the tests against the recordings we have
  • take time to look at and understand the results

If you're able to make an IQ recording that exhibits bad behaviour (especially locking to an invalid state), that would be really good, because it would be easy to take it into the tests.

I believe it is the backend's responsibility to get out of a locked state, and having a "restart" timer in the frontend feels like a hack. I'll think about a way to do this in the backend...

@AlbrechtL
Copy link
Owner Author

If you're able to make an IQ recording that exhibits bad behaviour (especially locking to an invalid state), that would be really good, because it would be easy to take it into the tests.

That's a plan. I'm thinking about to add an IQ recording including some sort of ring buffer recording functionality. For example, the IQ samples of the last 5 minutes (5 min * 60 s * 4 MB = 1.2 GB) will be stored inside the RAM and if the user discovers an issue he or she just hit a button or key and the data will be saved into a file. The time span will be also flexible.

I believe it is the backend's responsibility to get out of a locked state, and having a "restart" timer in the frontend feels like a hack. I'll think about a way to do this in the backend...

Great! That's also what I'm thinking.

@mpbraendli
Copy link
Collaborator

That would work. 5 minutes is already a long time, both for file sizes and for the duration of the tests.

Keep in mind the recording format should depend on the device you're using, it's not always u8.

@AlbrechtL
Copy link
Owner Author

AlbrechtL commented Nov 15, 2018

In 10a6bdd I added a simple RAW recorder. The very interesting discovery is, even if the backend lost the lock the recording is fine. So losing of the lock must be something else. I saw this only on slow CPUs where welle.io needs around 70 % of the CPU power.

Can it be that IQ ring buffer overflows and then we are losing the time synchronization because of the lost data?

On my Windows 10 testing machine with a Intel Z3736F CPU it can happen that Windows is doing updates or something else in the background and most likely the lock losing happens then.

@andimik
Copy link
Contributor

andimik commented Nov 15, 2018

You mean the recorded file plays better than the live signal?

@AlbrechtL
Copy link
Owner Author

Yes, but on a different CPU.

@AlbrechtL
Copy link
Owner Author

Here is a test RAW file: https://transfer.sh/YN7yp/sample_sync_lost.iq
Near the end of the recording the sync gets lost. On the computer where I created the recording it never got the sync back.

@andimik
Copy link
Contributor

andimik commented Nov 18, 2018

So the recording was made with the new function in welle.io?

The file has the same problem with qt-dab and qirx. Sync lost.

@AlbrechtL
Copy link
Owner Author

So the recording was made with the new function in welle.io?

Yes

@mpbraendli Can you give a recommendation how to proceed with the sync lost / resync issue?

@mpbraendli
Copy link
Collaborator

I didn't follow the discussion so closely, but the hypothesis that the buffer between input and backend can lose samples and break sync sounds plausible.
One possible improvement would be to drop 96ms worth of data if there is an overflow. Then you lose a complete transmission frame, and can keep lock onto the phase reference symbol.

@AlbrechtL
Copy link
Owner Author

I added an overflow detection in e62e39c. First I would like to see if the overflow is the origin of the sync lost.

@AlbrechtL
Copy link
Owner Author

@mpbraendli Do you know an alternative ring buffer implementation? I don't trust the current implementation.

@mpbraendli
Copy link
Collaborator

No I don't know one in particular. I also don't like the implementation because lockless designs are more difficult to prove to be correct.

@AlbrechtL
Copy link
Owner Author

What do you think about https://github.com/dhess/c-ringbuf?

@AlbrechtL
Copy link
Owner Author

Finally I caught the bug! A ring buffer overflow was not an issue.

The issue was in line 364. If the first sync was successfully then ficHandler.syncReached() got true but never got false unless a channel change.

if (!disableCoarseCorrector and !ficHandler.syncReached()) {
int correction = processPRS(ofdmBuffer.data());
if (correction != 100) {
coarseCorrector += correction * params.carrierDiff;
if (abs (coarseCorrector) > kHz(35))
coarseCorrector = 0;
}
}

In 7b1765f I changed ficHandler.syncReached() to use the FIC CRC. If the sync is not OK then the CRC is false and the coarse offset calculation is starting again.

As default I enabled the coarse frequency correction and the PRS correlation as algorithm in the GUI version.

I think we can close this issue now.

@mpbraendli
Copy link
Collaborator

Good observation! I mostly run with disabled coarse corrector, and have not noticed this behaviour.

(The pedantic guy inside me would argue that syncReached is not an appropriate name anymore, but that's not very important :-D )

@AlbrechtL
Copy link
Owner Author

The pedantic guy inside me would argue that syncReached is not an appropriate name anymore, but that's not very important :-D

That's why I changed it into getIsCrcValid() ;-)

if (!disableCoarseCorrector and !ficHandler.getIsCrcValid()) {

@AlbrechtL
Copy link
Owner Author

Reopened because there is still an instability inside coarse corrector.

@AlbrechtL
Copy link
Owner Author

It seems that there is an instability inside coarse corrector. With all three coarse correctors there are drop outs (the drop out counts are different).
Another observation is that the signal is not so bad so I'm wondering why the sync is getting lost generally.

IQ file: https://transfer.sh/S11OT/20181226_Paderborn_5C.7z (link valid until 10. Jan. 2019)

CorrelatePRS
correlateprs
Drop outs: 9

GetMiddle
getmiddel
Drop outs: 7

PatternOfZeros
patternofzeros
Drop outs: 5

@mpbraendli Do you have an idea whats going wrong?
@andimik Can you try the IQ file with the other DAB SDRs?

@AlbrechtL
Copy link
Owner Author

To analyze the sync issue I added a waterfall plot.
waterfall

Just from the waterfall plot I could not see why the sync is getting lost.

@mpbraendli
Copy link
Collaborator

Interesting view! If you plot the CIR over time in the same way (as a waterfall), you will see more easily where sync gets lost. It will however be a view of the effect, not the causes.
It looks like that: https://twitter.com/mpbraendli/status/1002088951488962561

@AlbrechtL
Copy link
Owner Author

In 7e77191 I added the waterfall plot also for the CIR.

bildschirmfoto von 2019-02-01 20-59-32

What can be the cause?

@mpbraendli
Copy link
Collaborator

Great, that's a nice visualisation! Attention: x axis isn't in Hz, it's in samples (i.e. it's time, not frequency)

In order to understand the cause it would be good to have IQ files which reliably trigger this behaviour.

@mpbraendli
Copy link
Collaborator

I replaced the new FFT placement algorithm in ccc87b5 that applies a window on the peaks and uses a threshold. (Active when threshold == -1) I still need to test it in heavy multi-path scenarios.

Maybe that approach is better than the "find N peaks". (Active when threshold == 0)

@AlbrechtL do you still see the lock loss happening in your scenario?

@AlbrechtL
Copy link
Owner Author

Thanks for looking into it.

IQ file: https://transfer.sh/S11OT/20181226_Paderborn_5C.7z (link valid until 10. Jan. 2019)

Unfortunately this link is not valid anymore. I uploaded the IQ file again.
https://transfer.sh/SVdu0/20181226_Paderborn_5C.7z (link valid until 19. Feb. 2019)

@AlbrechtL
Copy link
Owner Author

I tested the three new options FFT window placement algorithm StrongestPeak, EarliestPeakWithBinning and ThresholdBeforePeak with ac62932.

Used Settings

  • File: 20181226_Paderborn_5C.iq
  • Enabled coarse corrector
  • Coarse corrector algorithm: CorrelatePRS

StrongestPeak
strongestpeak

  • Total 6 drop outs (1 not shown)
  • Sometimes long re-sync activities (high [...] after xxx frames value)

EarliestPeakWithBinning
earliestpeakwithbinning

  • Total 9 drop outs
  • Fast re-sync

ThresholdBeforePeak
thresholdbeforepeak

  • Total 4 drop outs
  • Missing SyncOnPhase messages
  • Sometimes extrem long re-sync activities (high [...] after xxx frames value)

Which is the algorithm that you mentioned in ?

Maybe that approach is better than the "find N peaks". (Active when threshold == 0)

@mpbraendli
Copy link
Collaborator

Before the change, threshold == 0 was "the new algorithm", which is now called EarliestPeakWithBinning. The "old algorithm" is StrongestPeak.

I will try to understand the significance of the SyncOnPhase failed message.

@mpbraendli
Copy link
Collaborator

I have slightly improved ThresholdBeforePeak, because the lack of SyncOnPhase failed messages showed that it did not properly signal failure of peak detection.

I used your Paderborn recording to test. It's interesting because it has a large enough frequency offset so that it requires the coarse offset. This makes it more complex because of the interplay between coarse offset corrector and PRS detection.

@gvanem
Copy link
Contributor

gvanem commented Feb 14, 2019

I tried to verify this, but I get a Not Found for https://transfer.sh/YN7yp/sample_sync_lost.iq.

Please try to create a link for it here.

@AlbrechtL
Copy link
Owner Author

https://transfer.sh/SVdu0/20181226_Paderborn_5C.7z (link valid until 19. Feb. 2019)

Please use this one.

@AlbrechtL
Copy link
Owner Author

This makes it more complex because of the interplay between coarse offset corrector and PRS detection.

@mpbraendli With cheap rtl-sdrs without the TCXO you see such high frequency offsets.

@AlbrechtL
Copy link
Owner Author

Any news here?
I still have the sync issue during my daily use (frequency shift > 1 kHz).

@AlbrechtL
Copy link
Owner Author

From my point of view the sync issue is improved with commit 54f66c9.
@mpbraendli Thanks!

@mpbraendli
Copy link
Collaborator

Good to hear it does improve the behaviour! Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Tasks
  
In progress
Development

No branches or pull requests

4 participants