Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Added group receiver drop log warning #1761

Merged

Conversation

maxsharabayko
Copy link
Collaborator

@maxsharabayko maxsharabayko commented Jan 25, 2021

Added log message in case a packet is dropped due to a kangaroo link.

TODO:

  • A packet can also be dropped in some other cases to be identified, which is also not shown in stats.
    Somewhere related to this code block:
if (seqdiff > 1)
{
    HLOGC(grlog.Debug,
            log << "@" << id << " %" << mctrl.pktseq << " #" << mctrl.msgno << " AHEAD base=%"
                << m_RcvBaseSeqNo);
    p->packet.assign(buf, buf + stat);
    p->mctrl = mctrl;
    break; // Don't read from that socket anymore.
}

Fixes #1760

@maxsharabayko maxsharabayko added the [core] Area: Changes in SRT library core label Jan 25, 2021
@maxsharabayko maxsharabayko added this to the v1.4.3 milestone Jan 25, 2021
@maxsharabayko maxsharabayko self-assigned this Jan 25, 2021
@maxsharabayko
Copy link
Collaborator Author

srt-xtransmit.exe receive srt://:4200 srt://:4201 --enable-metrics -v
13:18:32.820020 [I] SRC URL: srt://:4200
13:18:32.820604 [I] SRC URL: srt://:4201
13:18:32.822032 [T] SOCKET::SRT_GROUP Creating a group of listeners
13:18:32.825036 [T] SOCKET::SRT_GROUP Created listener 0xD681AD on :4200
13:18:32.826792 [T] SOCKET::SRT_GROUP Created listener 0xD681AC on :4201
13:18:34.873934 [T] SOCKET::SRT_GROUP Accepted member socket 0xD681AB, remote IP 192.168.1.115:63374
13:18:34.883597 [I] SOCKET::SRT_GROUP Accepted connection sock 0x40D681AA
13:18:34.886273 [T] STATS: Added socket 1087799722.
13:18:34.911351 [T] SOCKET::SRT_GROUP Member socket 0xD681AB, BALANCING weight = 25
13:18:35.031192 [T] SOCKET::SRT_GROUP Accepted member socket 0xD681A9, remote IP 192.168.1.115:63114
13:18:35.065258 [T] SOCKET::SRT_GROUP Member socket 0xD681A9, BALANCING weight = 75
13:18:36.101000/T7476!W:SRT.gr: @14057897 %1459377964 #24 AHEAD base=%1459377962
13:18:36.102000/T7476!W:SRT.gr: group/recv @14057899: SPURIOUS epoll, ignoring
13:18:36.102000/T7476!W:SRT.br: @14057897 seqno %1459377964 base seqno %1459377964 (seqdiff 0)
13:18:36.103000/T7476!W:SRT.br: @14057899 seqno %1459377964 base seqno %1459377964 (seqdiff 0)
13:18:36.119000/T7476!W:SRT.gr: group/recv @14057897: SPURIOUS epoll, ignoring
13:18:36.120000/T7476!W:SRT.br: @14057897 seqno %1459377965 base seqno %1459377965 (seqdiff 0)
13:18:36.121000/T7476!W:SRT.br: @14057899 seqno %1459377965 base seqno %1459377965 (seqdiff 0)
13:18:36.136000/T7476!W:SRT.gr: group/recv @14057897: SPURIOUS epoll, ignoring
13:18:36.137000/T7476!W:SRT.br: @14057897 seqno %1459377967 base seqno %1459377967 (seqdiff 0)
13:18:36.137000/T7476!W:SRT.br: @14057899 seqno %1459377967 base seqno %1459377967 (seqdiff 0)
13:18:36.142000/T7476!W:SRT.gr: group/recv @14057899: SPURIOUS epoll, ignoring
13:18:36.142000/T7476!W:SRT.br: @14057897 seqno %1459377968 base seqno %1459377968 (seqdiff 0)
13:18:36.142000/T7476!W:SRT.br: @14057899 seqno %1459377968 base seqno %1459377968 (seqdiff 0)
13:18:36.193000/T7476!W:SRT.gr: group/recv @14057897: SPURIOUS epoll, ignoring
13:18:36.193000/T7476!W:SRT.br: @14057899 seqno %1459377972 base seqno %1459377972 (seqdiff 0)
13:18:36.224000/T7476!W:SRT.gr: group/recv @14057897: SPURIOUS epoll, ignoring
13:18:36.224000/T7476!W:SRT.br: @14057897 seqno %1459377975 base seqno %1459377975 (seqdiff 0)
13:18:36.225000/T7476!W:SRT.br: @14057899 seqno %1459377975 base seqno %1459377975 (seqdiff 0)
13:18:36.255000/T7476!W:SRT.gr: group/recv @14057899: SPURIOUS epoll, ignoring
13:18:36.255000/T7476!W:SRT.br: @14057897 seqno %1459377978 base seqno %1459377978 (seqdiff 0)
13:18:36.256000/T7476!W:SRT.br: @14057899 seqno %1459377978 base seqno %1459377978 (seqdiff 0)
13:18:36.382000/T7476!W:SRT.gr: group/recv @14057897: SPURIOUS epoll, ignoring
13:18:36.383000/T7476!W:SRT.br: @14057897 seqno %1459377990 base seqno %1459377990 (seqdiff 0)
13:18:36.384000/T7476!W:SRT.br: @14057899 seqno %1459377990 base seqno %1459377990 (seqdiff 0)
13:18:36.474000/T7476!W:SRT.gr: group/recv @14057897: SPURIOUS epoll, ignoring
13:18:36.475000/T7476!W:SRT.br: @14057897 seqno %1459377999 base seqno %1459377999 (seqdiff 0)
13:18:36.476000/T7476!W:SRT.br: @14057899 seqno %1459377999 base seqno %1459377999 (seqdiff 0)
13:18:36.582000/T7476!W:SRT.gr: group/recv @14057899: SPURIOUS epoll, ignoring
13:18:36.583000/T7476!W:SRT.br: @14057897 seqno %1459378009 base seqno %1459378009 (seqdiff 0)
13:18:36.584000/T7476!W:SRT.br: @14057899 seqno %1459378009 base seqno %1459378009 (seqdiff 0)
13:18:36.629000/T7476!W:SRT.gr: group/recv @14057897: SPURIOUS epoll, ignoring
13:18:36.632000/T7476!W:SRT.br: @14057897 seqno %1459378014 base seqno %1459378014 (seqdiff 0)
13:18:36.636000/T7476!W:SRT.br: @14057899 seqno %1459378014 base seqno %1459378014 (seqdiff 0)
13:18:36.720000/T7476!W:SRT.gr: group/recv @14057897: SPURIOUS epoll, ignoring
13:18:36.720000/T7476!W:SRT.br: @14057897 seqno %1459378022 base seqno %1459378022 (seqdiff 0)
13:18:36.720000/T7476!W:SRT.br: @14057899 seqno %1459378022 base seqno %1459378022 (seqdiff 0)
13:18:36.906814 [I] RECEIVE Latency, us (min/max/avg): 889673/912948/897177. Jitter: 6054us. Pkts: rcvd 99, reordered 0 (dist 0), lost 1

@ethouris
Copy link
Collaborator

There could be a small problem in the algorithm of the receiver function, although it will have to be improved anyway with the common buffer.

Currently it relies strictly on the TSBPD signoff for every member link. When a packet is then ready to play (note that that's the reason why clocks on all member links must be perfectly synchronized), it is reported in epoll as ready for extraction. In case when this packet is ready to play, but it's not the direct successor of the previously delivered packet, this situation is considered as "kangaroo". It is simply stated as impossible that even if both links are lacking a packet (let's say, it was simultanouesly lost on both links and one of them has it recovered) and any link is reporting a packet next-to-loss as ready to play, then this packet even if delivered later on the other link, it will be forgotten. This situation is theoretically possible in the following case (let's state parallel sending, so this applies to both broadcast and backup):

  • 9 A=delivered B=delivered (skipped)
  • 10 both lost and recovered
  • 11 A=delivered, B=lost
  • 12 A=delivered, B=... recovered 10 and 11, delivering 10, 11 and 12

If this happens, packet 10, as having been dropped by link A, and not recovered by link B until the time to play for 11 has come (it did, but only just before the time for 12 has come), packet 10 will not be delivered to the output even though it was successfully recovered by link B.

A straightforward algorithm for common buffer would do exactly the same. But with the common buffer this algorithm can be improved by giving some extra allowed delay in case when the latest received sequence on link B is before the packet that was lost and covered on link A. Of course this tolerance cannot be too big as the packet not delivered on time should be forgot anyway. Also I don't think this case is such a big problem - in this example the packet 10 had only an extra good luck to have been delivered because it did it before packet 12, but if packet 11 was delivered on link B in this case, packet 10 would be dropped anyway.

@maxsharabayko
Copy link
Collaborator Author

maxsharabayko commented Feb 2, 2021

The untracked group drop (that does not get into the stats and is not covered by the log message added in this PR) was due to a bug in CUDTGroup::checkPacketAhead(). The bug is to be fixed (#1783). This PR is ready to go.

CUDTGroup::ReadPos* CUDTGroup::checkPacketAhead()
{
    // (...)
        const int seqdiff = CSeqNo::seqcmp(a.mctrl.pktseq, m_RcvBaseSeqNo);
        if (seqdiff == 1)
        {
            // The very next packet. Return it.
            // XXX SETTING THIS ONE IS PROBABLY A BUG.
            m_RcvBaseSeqNo = a.mctrl.pktseq;

@maxsharabayko maxsharabayko marked this pull request as ready for review February 2, 2021 15:52
@maxsharabayko maxsharabayko added the Type: Maintenance Work required to maintain or clean up the code label Feb 2, 2021
srtcore/group.cpp Outdated Show resolved Hide resolved
@maxsharabayko maxsharabayko merged commit 2e9c1c7 into Haivision:master Feb 5, 2021
@maxsharabayko maxsharabayko deleted the hotfix/group-rcv-drop-log branch February 5, 2021 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core Type: Maintenance Work required to maintain or clean up the code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FR] Connection Bonding: Add TSBPD RCV-DROPPED log for the group
2 participants