Live mode NAKREPORT is triggered too quickly after loss detection report #701

ethouris · 2019-05-22T16:37:07Z

The NAKREPORT functionality should send the LOSSREPORT periodically, with the first periodic report sent only after the expected retransmission has failed to arrive. However, it looks like in every case the first of those periodic loss reports is sent immediately after the "detection-based" loss report (the very first one). That makes the periodic report useless, yet the sender still retransmits the reported packet in response to it. This may result in unnecessary, excessive retransmissions.

Early investigation suggests a problem with how the time of the next periodic NAK report is set when a new loss report is generated just before a periodic NAK report, scheduled because of a previous loss report, is about to be sent. In other words, two losses are reported at a time distance very close to the periodic NAK report interval. Possibly a time should be recorded for each loss report, so that every loss range is first checked for whether it is "too early" to include it in the NAKREPORT, while the other ranges are still reported.

jeandube · 2019-05-22T19:02:25Z

If I remember correctly, the NAK report period is based on the RTT. If your test case is with close peers, chances are high that the immediate NAK and LOSSREPORT collide.

maxsharabayko · 2019-07-03T14:39:12Z

The minimum NAK interval is NAKInt_min = 300 ms, and it is set at the very beginning of streaming (in CUDT::open()). PR #745 updates the timer after the connection is established (in CUDT::setupCC()).

NAK interval is updated after sending a periodic loss report. The new value is
NAKInt = RTT + 4 × RTTVar.
This value is passed to the Congestion Control (CC) module.

File CC can update the value based on the reported receiving speed R_rcv (packets per second) and the length of the loss list LOSS_len:
NAKInt = RTT + 4 × RTTVar + LOSS_len × 10⁶ / R_rcv.

Live CC will update the value by dividing it by 2:
NAKInt = (RTT + 4 × RTTVar) / 2.

The result is then bounded from below by the minimum value: NAKInt = max(NAKInt, NAKInt_min).

The NAK time is updated only after sending a periodic NAK report. This means that even if a loss report was already sent, a periodic report can be triggered immediately and send the same loss report again.
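
The interval rules described in this comment can be summarized in a short sketch. This is illustrative Python, not SRT code: the function name `nak_interval_us`, its parameters, and the `mode` strings are hypothetical, and all times are assumed to be in microseconds.

```python
NAK_INT_MIN_US = 300_000  # NAKInt_min = 300 ms, as quoted in this comment


def nak_interval_us(rtt_us, rtt_var_us, mode, loss_len=0, rcv_rate_pps=0):
    """Return the periodic NAK interval in microseconds (illustrative only)."""
    base = rtt_us + 4 * rtt_var_us
    if mode == "file" and rcv_rate_pps > 0:
        # File CC: add the time needed to receive LOSS_len packets at R_rcv.
        # Skipping this when the rate is unknown (0) is an assumption.
        base += loss_len * 1_000_000 / rcv_rate_pps
    elif mode == "live":
        # Live CC: halve the base interval.
        base /= 2
    # Never go below the minimum interval.
    return max(base, NAK_INT_MIN_US)
```

Under these assumptions, a live link with RTT = 100 ms and RTTVar = 10 ms gives (100 + 40) / 2 = 70 ms, which the floor raises to 300 ms. The sketch only computes the interval; it does not address when the timer is re-armed, which is the actual subject of this issue.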

ethouris · 2019-11-04T11:10:39Z

On second thought, the possibility of always sending the loss report twice might be an interesting feature. When we have a probability of losing a packet 20%, which means 80% probability of delivery, first retransmission increases it to 88%, second one to 96%, which is "almost certain", at the expense of using twice the overhead that the necessary retransmission alone would take. This might be added as an option after this issue is fixed.

maxsharabayko · 2019-11-04T19:15:31Z

> When we have a probability of losing a packet 20%, which means 80% probability of delivery, first retransmission increases it to 88%, second one to 96%

The first send plus one retransmission gives a delivery probability of 0.8 + 0.2 × 0.8 = 0.96.
A second retransmission raises it to 1 − 0.2³ = 0.992.
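
The arithmetic generalizes to any number of retransmissions. A minimal sketch, assuming each transmission is lost independently with the same probability (the function name `delivery_probability` is only for illustration):

```python
def delivery_probability(loss_rate, retransmissions):
    """Probability that at least one of (1 + retransmissions) sends arrives.

    Assumes every transmission is lost independently with `loss_rate`.
    """
    return 1 - loss_rate ** (retransmissions + 1)


# Original send only, then one and two retransmissions at 20% loss.
for n in range(3):
    print(n, round(delivery_probability(0.2, n), 3))
```

Real networks often lose packets in bursts, so the independence assumption makes these figures optimistic.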

ethouris · 2019-12-12T14:35:05Z

The following could be done to improve this:

1. Every loss must have the time of its first notice recorded, so that periodic retransmission requests are tied to the particular loss.
2. The check for whether a NAK report is needed should happen in every checkTimers(), and whether a particular loss is included should depend on its initial time.
3. After a retransmitted packet is received, a "lite ACK" should be sent, with limitations (that is, not too soon after the previous ACK and not too shortly before the next "fat ACK").
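
The first two points can be sketched as a per-loss timestamp table. This is a hypothetical illustration in Python, not SRT code: the class `LossTracker` and its methods are invented for the example, times are in microseconds, and sequence number wraparound is ignored. The table stores the time of the most recent report for each loss, which starts out as the time of first notice.

```python
class LossTracker:
    """Tracks, per lost sequence number, when it was last reported."""

    def __init__(self, nak_interval_us):
        self.nak_interval_us = nak_interval_us
        self.last_report_us = {}  # seqno -> time of the most recent report

    def on_loss_detected(self, seqno, now_us):
        # The detection-based loss report goes out right away; remember when.
        self.last_report_us[seqno] = now_us

    def on_packet_received(self, seqno):
        # The retransmission arrived, so the loss needs no further reports.
        self.last_report_us.pop(seqno, None)

    def due_for_periodic_report(self, now_us):
        """Called on every timer check; returns losses old enough to re-report."""
        due = sorted(s for s, t in self.last_report_us.items()
                     if now_us - t >= self.nak_interval_us)
        for s in due:
            self.last_report_us[s] = now_us  # restart the clock for this loss
        return due
```

With a 300 ms interval, a loss detected at t = 290 ms is left out of a periodic report sent at t = 300 ms and only becomes due at t = 590 ms, while a loss detected at t = 0 is reported at t = 300 ms as expected.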

maxsharabayko · 2019-12-12T15:19:59Z

From the internal report SAS-258.

Some improvements are required to reduce the overhead of periodic NAK reports.

1. The sender may have a timeout for the next retransmission of a lost packet. The timeout value might be (RTT + 10 ms). If the sender hasn't received an acknowledgement of the lost packet by then, it should consider retransmitting it either immediately or upon reception of the next NAK report.
2. The receiver can be blocked from acknowledging further DATA packets. During this blockade period, it might still consider sending an ACK with the same sequence number, thus letting the sender know that the next packet has not yet been received.
3. The receiver may send a lite ACK packet upon reception of the previously missing packet that was blocking further ACKs. This way it may inform the sender within one RTT (instead of RTT + 10 ms) that the packet was received.
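
The sender-side timeout from the first point could look roughly like the following. This is a sketch under stated assumptions, not the SRT implementation: `SenderLossList` and its methods are hypothetical names, times are in microseconds, and sequence number wraparound is ignored.

```python
REXMIT_MARGIN_US = 10_000  # the 10 ms margin suggested in the report


class SenderLossList:
    """Remembers when each packet was last retransmitted."""

    def __init__(self):
        self.last_sent_us = {}  # seqno -> time of the last retransmission

    def on_retransmit(self, seqno, now_us):
        self.last_sent_us[seqno] = now_us

    def on_ack(self, ack_seqno):
        # Assumed semantics: the ACK covers every packet before ack_seqno.
        for s in [s for s in self.last_sent_us if s < ack_seqno]:
            del self.last_sent_us[s]

    def should_retransmit(self, seqno, now_us, rtt_us):
        """True if a reported loss was not retransmitted too recently."""
        last = self.last_sent_us.get(seqno)
        if last is None:
            return True  # never retransmitted so far
        return now_us - last >= rtt_us + REXMIT_MARGIN_US
```

A repeated NAK that arrives within RTT + 10 ms of the previous retransmission is ignored here, because the earlier copy may still be in flight or its acknowledgement may not have arrived yet.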

ethouris · 2019-12-12T18:01:41Z

What you described in point 1 is exactly FASTREXMIT. It is intentionally turned off when NAKREPORT is in use, because NAKREPORT is considered efficient enough.

We need to decide what is more important, or even better, provide options that allow users to decide what is more important for them: whether they can accept extra overhead in order to maximize reliability, or they need as small overhead as possible and accept the reliability this setting provides.

If we need more reliability, then of course packets should be stubbornly retransmitted, but then the receiver should send ACKs more quickly in order to update the sender with the "already received packets" information. We might also revive my earlier idea of an "ACK bitmap": together with the ACK, an additional number is sent that defines the fate of the next 32 packets following the ACK-ed one, so that packets that follow a loss but were received won't be retransmitted further. An important factor in this solution is not only the RTT but also the RTT variance, or possibly another value settable by an option, so that a false NAK report isn't sent too early when the RTT often diverges much from the average.
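
The "ACK bitmap" idea could be encoded as follows. This is only an illustration of the concept: the function names and the bit layout (bit i describes the packet at `ack_seqno + 1 + i`) are assumptions, no such field exists in the protocol as discussed here, and sequence number wraparound is ignored.

```python
def build_ack_bitmap(ack_seqno, received, width=32):
    """Receiver side: bit i is set if packet ack_seqno + 1 + i was received."""
    bitmap = 0
    for i in range(width):
        if ack_seqno + 1 + i in received:
            bitmap |= 1 << i
    return bitmap


def received_after_ack(ack_seqno, bitmap, width=32):
    """Sender side: sequence numbers the bitmap marks as already received."""
    return [ack_seqno + 1 + i for i in range(width) if (bitmap >> i) & 1]
```

For example, if the ACK points at 100 and packets 101, 102 and 104 have arrived, the bitmap is `0b1011`, and the sender can drop those three packets from its retransmission candidates while still treating 103 as missing.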

If we need the least overhead, then care must be taken that packets are retransmitted only if the sender is absolutely certain (made so by the receiver) that the receiver didn't get the retransmitted packet, so that the number of uselessly retransmitted packets is minimized. This would, however, come at the expense of a decreased probability that twice-lost packets are retransmitted fast enough, and usually higher reliability comes with a bigger latency penalty.

Point 2 is, AFAIK, already implemented; there's even a comment saying that it does it the "TCP way". It probably works only in file mode, though, and if I'm not mistaken it's what triggers the LATEREXMIT method.

Point 3 is the same as my point 3 above, and yes, it's a good idea, unless it happens too often. It can't then be sent after every retransmitted packet, but with some reasonable "ionization wearoff time" it would be good.
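
The lite ACK limitation mentioned in both lists (not too soon after the previous ACK, not too shortly before the next full ACK) might be sketched like this. Everything here is hypothetical: the class `LiteAckLimiter`, its parameters, and the example values are illustrative and not taken from SRT. Times are in microseconds.

```python
class LiteAckLimiter:
    """Decides whether a lite ACK may be sent at a given moment."""

    def __init__(self, min_gap_us, full_ack_period_us):
        self.min_gap_us = min_gap_us              # the "wearoff" time
        self.full_ack_period_us = full_ack_period_us
        self.last_ack_us = None                   # last ACK of any kind
        self.next_full_ack_us = full_ack_period_us

    def on_full_ack_sent(self, now_us):
        self.last_ack_us = now_us
        self.next_full_ack_us = now_us + self.full_ack_period_us

    def try_send_lite_ack(self, now_us):
        """Returns True, and records the send, if a lite ACK is allowed now."""
        too_soon = (self.last_ack_us is not None
                    and now_us - self.last_ack_us < self.min_gap_us)
        too_close = self.next_full_ack_us - now_us < self.min_gap_us
        if too_soon or too_close:
            return False
        self.last_ack_us = now_us
        return True
```

With an assumed 2 ms gap and a 10 ms full ACK period, a retransmitted packet arriving 1 ms after a full ACK triggers nothing, one arriving 5 ms after it triggers a lite ACK, and one arriving 9 ms after it is left to the upcoming full ACK.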

maxsharabayko added this to the v.1.3.4 milestone May 29, 2019

maxsharabayko added the [core] Area: Changes in SRT library core label Jul 3, 2019

maxsharabayko modified the milestones: v1.3.4, v1.4.1 Aug 9, 2019

ethouris added Priority: Medium Status: Pending Type: Maintenance Work required to maintain or clean up the code labels Nov 4, 2019

ethouris modified the milestones: v1.4.1, v1.4.2 Nov 4, 2019

maxsharabayko mentioned this issue Nov 14, 2019

[core] Improved periodic NAK report timing #961

Merged

ethouris added this to To do in Development via automation Dec 19, 2019

maxsharabayko moved this from To Do to Backlog in Development Jan 17, 2020

maxsharabayko removed this from Backlog in Development Jan 17, 2020

ethouris removed the Status: Pending label Jul 2, 2020

maxsharabayko mentioned this issue Jul 21, 2020

[core] Added snd loss rexmit time check #1362

Merged

ethouris added this to Backlog in Development via automation Jul 22, 2020

ethouris assigned maxsharabayko Jul 22, 2020

maxsharabayko closed this as completed in #1362 Jul 27, 2020

Development automation moved this from Backlog to Done Jul 27, 2020

mbakholdina modified the milestones: v1.5.0, v1.4.2 Oct 14, 2020
