[next] Compare the 3 sync methods in ofdm-processor.cpp (freqsyncMethod) #247

AlbrechtL · 2018-05-01T07:58:10Z

There are 3 sync methods in ofdm-processor.cpp (freqsyncMethod). Compare!

AlbrechtL · 2018-05-01T08:00:16Z

@mpbraendli Please describe how to enable/disable the different methods.

mpbraendli · 2018-05-01T09:07:00Z

It is set through the last argument of the ofdmProcessor constructor: https://github.com/AlbrechtL/welle.io/blob/next/src/backend/radio-receiver.cpp#L54

Currently fixed to the value 3, it is used in https://github.com/AlbrechtL/welle.io/blob/next/src/backend/ofdm/ofdm-processor.cpp#L493

I'm not sure it is necessary to present this to the user, but it would be good to understand how these methods compare.

mpbraendli · 2018-05-07T13:04:13Z

All three methods behave badly in the presence of two signal components (two peaks in the CIR) when the first peak is weaker than the second. The receiver locks onto the stronger one, but should lock onto the earlier one instead. Locking onto the second leads to increased inter-symbol interference and can kill reception.
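To make the difference concrete, here is a minimal sketch of the two strategies on a CIR magnitude profile. It is not the welle.io code: the function names and the `ratio` parameter are hypothetical and only illustrate "lock on the strongest peak" versus "lock on the earliest significant peak".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the strongest tap in a CIR magnitude profile.
std::size_t strongestPeak(const std::vector<float>& cirMag)
{
    assert(!cirMag.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < cirMag.size(); i++) {
        if (cirMag[i] > cirMag[best]) {
            best = i;
        }
    }
    return best;
}

// Index of the earliest tap whose magnitude reaches `ratio` times the
// strongest tap. `ratio` is a hypothetical tuning parameter in (0, 1].
std::size_t earliestSignificantPeak(const std::vector<float>& cirMag, float ratio)
{
    const std::size_t strongest = strongestPeak(cirMag);
    const float threshold = ratio * cirMag[strongest];
    for (std::size_t i = 0; i < cirMag.size(); i++) {
        if (cirMag[i] >= threshold) {
            return i;
        }
    }
    return strongest; // only reached if ratio > 1
}
```

With a weak early component and a strong late one, `strongestPeak` returns the late tap while `earliestSignificantPeak` returns the early one, provided `ratio` is low enough to include it. Choosing `ratio` is the hard part: too low and noise is mistaken for a path.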

mpbraendli · 2018-05-07T13:11:09Z

See #256

mpbraendli · 2018-07-26T14:47:14Z

I've added the ability to choose the sync method in the settings page. The default seems to be the best option in general.

@AlbrechtL do we close this ticket?

AlbrechtL · 2018-08-03T21:49:13Z

Thanks for the GUI options.
I moved the options into the expert settings in 2ebb201. Is it necessary to save which option is selected and restore it after a restart?
I also think we should document these options somewhere. Wiki or welle.io web site?

mpbraendli · 2018-08-06T07:13:05Z

It would be nice to keep these settings permanent and store them.

I wrote https://github.com/AlbrechtL/welle.io/wiki/Backend-Implementation-Remarks with some more details.

AlbrechtL · 2018-08-06T20:40:12Z

Thanks for the docs. I can also make the settings permanent.

mpbraendli · 2018-08-07T14:36:25Z

Ok, I'll let you close the ticket as soon as you think it's ok to do so.

AlbrechtL · 2018-08-07T22:47:24Z

The settings are permanent now (ce3f8c1).

But I discovered an issue with the new frequency synchronization. After some time the receiver loses the lock, even with a good signal. At the moment I'm working with rtl-sdr dongles without a TCXO. In this case the frequency offset (> 4 kHz) keeps drifting.

mpbraendli · 2018-08-08T05:33:22Z

So this is with coarse correction enabled but with the new FFT window placement?

Is the old FFT window placement fine?

AlbrechtL · 2018-08-16T14:12:01Z

I investigated the issues in more detail and found 3 major problems. All tests were run with coarse correction enabled (PatternOfZeros), the new FFT window and an rtl-sdr without a TCXO (3.5 kHz offset when warm). First I let the synchronization succeed so that a station was playing. After some time the following issues occurred (sometimes).

The FIC CRC is OK but there are a lot of frame errors (25) even though the reception is OK (no multipath, SNR 15).
Multipath situation: the receiver locks to the correct first peak, but from time to time it locks to the wrong peak and stays there (no playing).
The coarse correction is OK (3.5 kHz), but from time to time it drifts to a wrong value of around 500 Hz and stays there (no playing).

I also saw some of these issues in the old qt-dab and welle.io 1.0 implementations. Therefore I added a timer that checked the sync status after some time and reset the backend if something went wrong. In the current next branch I removed this timer because it was just a workaround to get welle.io 1.0 working.

mpbraendli · 2018-08-17T14:20:50Z

The tricky part with these changes is that there is no right or wrong thing to do; there is only "relatively better" and "relatively worse", on average or in specific cases.

I tried to put together some meaningful test metrics in welle-cli with the -t option, so that we can run the receiver against recordings and be able to objectively compare versions/settings. But there is more to do:

define more meaningful metrics
automate the execution of the tests against the recordings we have
take time to look at and understand the results

If you're able to make an IQ recording that exhibits bad behaviour (especially locking to an invalid state), that would be really good, because it would be easy to take it into the tests.

I believe it is the backend's responsibility to get out of a locked state, and having a "restart" timer in the frontend feels like a hack. I'll think about a way to do this in the backend...

AlbrechtL · 2018-08-18T07:13:57Z

> If you're able to make an IQ recording that exhibits bad behaviour (especially locking to an invalid state), that would be really good, because it would be easy to take it into the tests.

That's a plan. I'm thinking about adding an IQ recording with some sort of ring buffer functionality. For example, the IQ samples of the last 5 minutes (5 min * 60 s * 4 MB/s = 1.2 GB) would be kept in RAM, and if the user notices an issue he or she just hits a button or key and the data is saved to a file. The time span would also be configurable.
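A minimal sketch of that idea, assuming u8 IQ data at 2.048 MS/s (2 bytes per complex sample, so about 4 MB/s). The class and function names are hypothetical and this is not the recorder that ended up in welle.io:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Keeps only the most recent `capacity` bytes of raw IQ data.
class IqHistory {
public:
    explicit IqHistory(std::size_t capacity) : buf_(capacity)
    {
        assert(capacity > 0);
    }

    // Append new bytes, overwriting the oldest ones once the buffer is full.
    void push(const std::uint8_t* data, std::size_t len)
    {
        for (std::size_t i = 0; i < len; i++) {
            buf_[head_] = data[i];
            head_ = (head_ + 1) % buf_.size();
            if (size_ < buf_.size()) {
                size_++;
            }
        }
    }

    // Copy of the stored bytes in chronological order (oldest first),
    // ready to be written to a file when the user asks for it.
    std::vector<std::uint8_t> snapshot() const
    {
        std::vector<std::uint8_t> out;
        out.reserve(size_);
        const std::size_t start = (head_ + buf_.size() - size_) % buf_.size();
        for (std::size_t i = 0; i < size_; i++) {
            out.push_back(buf_[(start + i) % buf_.size()]);
        }
        return out;
    }

private:
    std::vector<std::uint8_t> buf_;
    std::size_t head_ = 0; // next write position
    std::size_t size_ = 0; // number of valid bytes
};

// Bytes needed for `seconds` of u8 IQ data (assumed: 2 bytes per sample).
constexpr std::size_t bytesForSeconds(std::size_t seconds,
                                      std::size_t sampleRate = 2048000)
{
    return seconds * sampleRate * 2;
}
```

`bytesForSeconds(300)` gives 1 228 800 000 bytes, which matches the 1.2 GB estimate. A real recorder would copy in blocks rather than byte by byte and would need locking between the input thread and the save action.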

> I believe it is the backend's responsibility to get out of a locked state, and having a "restart" timer in the frontend feels like a hack. I'll think about a way to do this in the backend...

Great! That's also what I'm thinking.

mpbraendli · 2018-08-19T06:49:05Z

That would work. 5 minutes is already a long time, both for file sizes and for the duration of the tests.

Keep in mind the recording format should depend on the device you're using, it's not always u8.
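As an illustration of why the format matters, this is roughly what reading a u8 recording involves. It is a sketch under the assumption of interleaved unsigned 8-bit I/Q bytes (as delivered by rtl-sdr dongles) and a common offset/scale convention of 128; other devices deliver signed 16-bit or float samples and need a different conversion. The function name is hypothetical.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert interleaved unsigned 8-bit I/Q bytes to complex floats in [-1, 1).
std::vector<std::complex<float>> u8ToComplex(const std::vector<std::uint8_t>& raw)
{
    assert(raw.size() % 2 == 0); // I and Q must come in pairs
    std::vector<std::complex<float>> out;
    out.reserve(raw.size() / 2);
    for (std::size_t i = 0; i + 1 < raw.size(); i += 2) {
        // Assumed convention: remove the 128 offset, then scale by 1/128.
        const float re = (static_cast<float>(raw[i]) - 128.0f) / 128.0f;
        const float im = (static_cast<float>(raw[i + 1]) - 128.0f) / 128.0f;
        out.emplace_back(re, im);
    }
    return out;
}
```

If a file recorded in one format is read back with another conversion, the backend sees garbage, so the format has to be stored with the recording or implied by its file name.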

AlbrechtL · 2018-11-15T19:12:31Z

In 10a6bdd I added a simple RAW recorder. The very interesting discovery is that even when the backend loses the lock, the recording is fine. So the lock loss must be caused by something else. I saw this only on slow CPUs where welle.io needs around 70 % of the CPU power.

Can it be that the IQ ring buffer overflows and we then lose the time synchronization because of the lost data?

On my Windows 10 test machine with an Intel Z3736F CPU, Windows sometimes runs updates or something else in the background, and that is most likely when the lock is lost.

andimik · 2018-11-15T21:24:46Z

You mean the recorded file plays better than the live signal?

AlbrechtL · 2018-11-16T07:07:14Z

Yes, but on a different CPU.

AlbrechtL · 2018-11-18T16:37:12Z

Here is a test RAW file: https://transfer.sh/YN7yp/sample_sync_lost.iq
Near the end of the recording the sync gets lost. On the computer where I created the recording it never got the sync back.

andimik · 2018-11-18T19:56:39Z

So the recording was made with the new function in welle.io?

The file has the same problem with qt-dab and qirx. Sync lost.

AlbrechtL · 2018-11-19T18:44:48Z

> So the recording was made with the new function in welle.io?

Yes

@mpbraendli Can you give a recommendation how to proceed with the sync lost / resync issue?

mpbraendli · 2018-11-20T09:54:44Z

I didn't follow the discussion so closely, but the hypothesis that the buffer between input and backend can lose samples and break sync sounds plausible.
One possible improvement would be to drop 96ms worth of data if there is an overflow. Then you lose a complete transmission frame, and can keep lock onto the phase reference symbol.
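A sketch of that idea, assuming the usual 2.048 MS/s input rate, so that 96 ms corresponds to 196608 samples. The names are hypothetical and a `std::deque` stands in for the real buffer between input and backend:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

constexpr std::size_t kSampleRate = 2048000;                    // assumed input rate in Hz
constexpr std::size_t kFrameSamples = kSampleRate * 96 / 1000;  // 96 ms = 196608 samples

// Number of samples to discard so that at least `excess` samples are removed
// while the stream stays aligned to transmission frame boundaries.
constexpr std::size_t samplesToDrop(std::size_t excess)
{
    return ((excess + kFrameSamples - 1) / kFrameSamples) * kFrameSamples;
}

// Drop whole frames from the front of `fifo`; returns how many samples went.
template <typename Sample>
std::size_t dropWholeFrames(std::deque<Sample>& fifo, std::size_t excess)
{
    std::size_t n = samplesToDrop(excess);
    if (n > fifo.size()) {
        // Not enough data buffered: drop as many complete frames as we have.
        n = (fifo.size() / kFrameSamples) * kFrameSamples;
    }
    assert(n <= fifo.size());
    fifo.erase(fifo.begin(), fifo.begin() + static_cast<std::ptrdiff_t>(n));
    return n;
}
```

Because the number of dropped samples is always a multiple of the frame length, the next phase reference symbol should show up at the same position the receiver already expects, apart from clock drift during the gap.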

AlbrechtL · 2018-11-20T20:55:21Z

I added overflow detection in e62e39c. First I would like to see whether the overflow is the cause of the sync loss.

AlbrechtL · 2018-11-21T21:29:35Z

@mpbraendli Do you know an alternative ring buffer implementation? I don't trust the current implementation.

mpbraendli · 2018-11-22T08:35:01Z

No I don't know one in particular. I also don't like the implementation because lockless designs are more difficult to prove to be correct.

AlbrechtL · 2018-11-22T20:00:34Z

What do you think about https://github.com/dhess/c-ringbuf?

AlbrechtL · 2018-11-25T20:10:02Z

Finally I caught the bug! A ring buffer overflow was not an issue.

The issue was in line 364. Once the first sync had succeeded, ficHandler.syncReached() became true and never returned to false until the channel was changed.

welle.io/src/backend/ofdm-processor.cpp

Lines 364 to 371 in ae2e668

    
    if (!disableCoarseCorrector and !ficHandler.syncReached()) {
        int correction = processPRS(ofdmBuffer.data());
        if (correction != 100) {
            coarseCorrector += correction * params.carrierDiff;
            if (abs (coarseCorrector) > kHz(35))
                coarseCorrector = 0;
        }
    }

In 7b1765f I changed ficHandler.syncReached() to use the FIC CRC. If the sync is not OK, the CRC check fails and the coarse offset calculation starts again.
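The effect of the change can be shown with a toy model. This is only a sketch of the gating condition, not the welle.io classes: `framesWithCoarseCorrection` and its parameters are made up for illustration.

```cpp
#include <vector>

// Each element of `crcOk` is the FIC CRC result for one frame.
// Returns for how many frames the coarse corrector would have run.
//   latchSync == true : old behaviour, "sync reached" is set once and kept
//   latchSync == false: new behaviour, the flag follows the latest CRC result
int framesWithCoarseCorrection(const std::vector<bool>& crcOk, bool latchSync)
{
    bool syncLatched = false; // set on first good CRC, never cleared
    bool crcValid = false;    // CRC result of the previous frame
    int active = 0;
    for (bool ok : crcOk) {
        const bool locked = latchSync ? syncLatched : crcValid;
        if (!locked) {
            active++; // coarse correction runs for this frame
        }
        if (ok) {
            syncLatched = true;
        }
        crcValid = ok;
    }
    return active;
}
```

For the CRC sequence bad, good, good, bad, bad, good the latched variant runs the coarse corrector only for the first two frames and then never again, while the CRC-based variant runs it again after the sync is lost in the fourth frame.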

By default I enabled the coarse frequency correction, with PRS correlation as the algorithm, in the GUI version.

I think we can close this issue now.

mpbraendli · 2018-11-26T08:14:46Z

Good observation! I mostly run with disabled coarse corrector, and have not noticed this behaviour.

(The pedantic guy inside me would argue that syncReached is not an appropriate name anymore, but that's not very important :-D )

AlbrechtL · 2018-11-28T18:29:51Z

> The pedantic guy inside me would argue that syncReached is not an appropriate name anymore, but that's not very important :-D

That's why I changed it into getIsCrcValid() ;-)

welle.io/src/backend/ofdm-processor.cpp

Line 364 in 7b1765f

if (!disableCoarseCorrector and !ficHandler.getIsCrcValid()) {

AlbrechtL · 2018-12-27T11:19:16Z

Reopened because there is still an instability inside the coarse corrector.

AlbrechtL · 2018-12-27T13:33:41Z

It seems that there is an instability inside the coarse corrector. With all three coarse correctors there are drop outs (the drop out counts differ).
Another observation is that the signal is not that bad, so I'm wondering why the sync gets lost at all.

IQ file: https://transfer.sh/S11OT/20181226_Paderborn_5C.7z (link valid until 10. Jan. 2019)

CorrelatePRS

Drop outs: 9

GetMiddle

Drop outs: 7

PatternOfZeros

Drop outs: 5

@mpbraendli Do you have an idea what's going wrong?
@andimik Can you try the IQ file with the other DAB SDRs?

AlbrechtL · 2019-01-16T19:44:59Z

To analyze the sync issue I added a waterfall plot.

Just from the waterfall plot I could not see why the sync is getting lost.

mpbraendli · 2019-01-18T07:26:51Z

Interesting view! If you plot the CIR over time in the same way (as a waterfall), you will see more easily where sync gets lost. It will however be a view of the effect, not the causes.
It looks like that: https://twitter.com/mpbraendli/status/1002088951488962561

AlbrechtL · 2019-02-01T20:06:01Z

In 7e77191 I added the waterfall plot also for the CIR.

What can be the cause?

mpbraendli · 2019-02-01T22:30:19Z

Great, that's a nice visualisation! Attention: x axis isn't in Hz, it's in samples (i.e. it's time, not frequency)

In order to understand the cause it would be good to have IQ files which reliably trigger this behaviour.

mpbraendli · 2019-02-05T15:25:41Z

I replaced the new FFT placement algorithm in ccc87b5 that applies a window on the peaks and uses a threshold. (Active when threshold == -1) I still need to test it in heavy multi-path scenarios.

Maybe that approach is better than the "find N peaks". (Active when threshold == 0)

@AlbrechtL do you still see the lock loss happening in your scenario?

AlbrechtL · 2019-02-05T19:49:00Z

Thanks for looking into it.

> IQ file: https://transfer.sh/S11OT/20181226_Paderborn_5C.7z (link valid until 10. Jan. 2019)

Unfortunately this link is not valid anymore. I uploaded the IQ file again.
https://transfer.sh/SVdu0/20181226_Paderborn_5C.7z (link valid until 19. Feb. 2019)

AlbrechtL · 2019-02-12T18:59:44Z

I tested the three new FFT window placement algorithm options, StrongestPeak, EarliestPeakWithBinning and ThresholdBeforePeak, with ac62932.

Used Settings

File: 20181226_Paderborn_5C.iq
Enabled coarse corrector
Coarse corrector algorithm: CorrelatePRS

StrongestPeak

Total 6 drop outs (1 not shown)
Sometimes long re-sync activities (high [...] after xxx frames value)

EarliestPeakWithBinning

Total 9 drop outs
Fast re-sync

ThresholdBeforePeak

Total 4 drop outs
Missing SyncOnPhase messages
Sometimes extremely long re-sync activities (high [...] after xxx frames value)

Which algorithm did you mean in this remark?

> Maybe that approach is better than the "find N peaks". (Active when threshold == 0)

mpbraendli · 2019-02-13T10:10:20Z

Before the change, threshold == 0 was "the new algorithm", which is now called EarliestPeakWithBinning. The "old algorithm" is StrongestPeak.

I will try to understand the significance of the SyncOnPhase failed message.

mpbraendli · 2019-02-13T14:38:58Z

I have slightly improved ThresholdBeforePeak, because the lack of SyncOnPhase failed messages showed that it did not properly signal failure of peak detection.

I used your Paderborn recording to test. It's interesting because it has a large enough frequency offset so that it requires the coarse offset. This makes it more complex because of the interplay between coarse offset corrector and PRS detection.

gvanem · 2019-02-14T07:45:00Z

I tried to verify this, but I get a Not Found for https://transfer.sh/YN7yp/sample_sync_lost.iq.

Please try to create a link for it here.

AlbrechtL · 2019-02-14T09:33:09Z

https://transfer.sh/SVdu0/20181226_Paderborn_5C.7z (link valid until 19. Feb. 2019)

Please use this one.

AlbrechtL · 2019-02-17T09:54:50Z

> This makes it more complex because of the interplay between coarse offset corrector and PRS detection.

@mpbraendli With cheap rtl-sdrs without the TCXO you see such high frequency offsets.

AlbrechtL · 2019-03-06T19:07:58Z

Any news here?
I still have the sync issue during my daily use (frequency shift > 1 kHz).

AlbrechtL · 2019-05-03T18:19:16Z

From my point of view the sync issue is improved with commit 54f66c9.
@mpbraendli Thanks!

mpbraendli · 2019-05-03T18:47:50Z

Good to hear it does improve the behaviour! Thanks

AlbrechtL created this issue from a note in Tasks (To do) May 1, 2018

AlbrechtL added the task label May 1, 2018

mpbraendli assigned AlbrechtL Jul 26, 2018

mpbraendli moved this from To do to In progress in Tasks Jul 26, 2018

AlbrechtL closed this as completed Nov 25, 2018

AlbrechtL referenced this issue Nov 29, 2018

Add experimental ring buffer based RAW recording

10a6bdd

AlbrechtL reopened this Dec 27, 2018

AlbrechtL added this to the Version 2.0 milestone Dec 29, 2018

AlbrechtL mentioned this issue Jan 8, 2019

[macOS] ringbuffer is using deprecated API on macOS 10.12 #336

Open

andimik mentioned this issue Feb 13, 2019

[next] Rawfile: stations not found #348

Closed

AlbrechtL closed this as completed May 3, 2019
