Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Live mode NAKREPORT is triggered too quickly after loss detection report #701

Closed
ethouris opened this issue May 22, 2019 · 7 comments · Fixed by #1362
Closed

Live mode NAKREPORT is triggered too quickly after loss detection report #701

ethouris opened this issue May 22, 2019 · 7 comments · Fixed by #1362
Assignees
Labels
[core] Area: Changes in SRT library core Priority: Medium Type: Maintenance Work required to maintain or clean up the code
Projects
Milestone

Comments

@ethouris
Copy link
Collaborator

The NAKREPORT functionality should send the LOSSREPORT periodically, first after the expected retransmission didn't happen. However it looks like in every case the first of those periodic loss reports is sent immediately after the "detection-based" lossreport (the first one), which makes this report useless, although still resulting in sending the loss-reported packet. This may result in unnecessary excessive retransmissions.

The early research points that probably there can be a problem with setting the time of the periodic nakreport in a situation when one loss report is generated just before the moment when a periodic nakreport is about to be sent due to a previous lossreport. Or, in other words, there are two losses reported in a time relation very close to a time interval of the periodic nakreport. Possibly there should be introduced a time for loss reports so losses are first checked if it isn't "too early" for particular loss range to be reported in NAKREPORT, but still others should be reported.

@jeandube
Copy link
Collaborator

If I remember well, the NAK report period is based on the RTT. If your test case is with close peers chances are high that immediate NAK and LOSSREPORT collide.

@maxsharabayko maxsharabayko added this to the v.1.3.4 milestone May 29, 2019
@maxsharabayko
Copy link
Collaborator

maxsharabayko commented Jul 3, 2019

Minimum NAK interval is NAKIntmin=300 ms, and it is set at the very beginning of the streaming (in CUDT::open()). PR #745 updates the timer after the connection is established (updated in CUDT::setupCC()).

NAK interval is updated after sending a periodic loss report. The new value is
NAKInt = RTT + 4 × RTTVar.
This value is passed to the Congestion Control (CC) module.

File CC can update the value based on the reported receiving speed Rrcv (packets per second) and the length of the loss list LOSSlen:
NAKInt = RTT + 4 × RTTVar + LOSSlen × 106 / Rrcv.

Live CC will update the value by dividing it by 2::
NAKInt = (RTT + 4 × RTTVar) / 2.

The minimum value is NAKInt = max(NAKInt, NAKIntmin).

NAK time is updated only after sending the periodic NAK report. Meaning that even if a loss report was already sent, a periodic report can be triggered immediately and send the same loss report again.

@maxsharabayko maxsharabayko added the [core] Area: Changes in SRT library core label Jul 3, 2019
@maxsharabayko maxsharabayko modified the milestones: v1.3.4, v1.4.1 Aug 9, 2019
@ethouris ethouris added Priority: Medium Status: Pending Type: Maintenance Work required to maintain or clean up the code labels Nov 4, 2019
@ethouris ethouris modified the milestones: v1.4.1, v1.4.2 Nov 4, 2019
@ethouris
Copy link
Collaborator Author

ethouris commented Nov 4, 2019

On a second thought, the possibility of sending the loss report always twice might be an interesting feature. When we have a probability of losing a packet 20%, which means 80% probability of delivery, first retransmission increases it to 88%, second one to 96%, which is "almost certain" - at the expense of using twice the overhead space as per necessary retransmission. This might be added as an option, after this one is fixed.

@maxsharabayko
Copy link
Collaborator

When we have a probability of losing a packet 20%, which means 80% probability of delivery, first retransmission increases it to 88%, second one to 96%

First send + retransmission gives 0.8+0.2*0.8=0.96 probability to deliver a packet.
Second retransmission gives 0.992.

@ethouris
Copy link
Collaborator Author

These things can be done for improving it:

  1. Every loss must have recorded time of the first notice so that periodic retransmission is focused on particular loss
  2. Checks for a need for NAK report should happen in every checkTimers(), and whether it should be done for particular loss, it should depend on its initial time
  3. After a retransmitted packet was received, the "lite ACK" should be sent, with limitations (that is, not too early towards the previous ACK and not too early before the next "fat ACK").

@maxsharabayko
Copy link
Collaborator

From the internal report SAS-258.

Some improvements are required to reduce the overhead of periodic NAK reports.

  1. Sender may have a timeout for the next retransmission of the lost packet. The timeout value might be (RTT + 10ms). If sender hasn't received the acknowledgement of the lost packet, it should consider to retransmit it either immediately, or upon reception of the next NAK report.
  2. The receiver can be blocked from acknowledging further DATA packets. During this blockade period, it might still consider to send ACK with the same sequence number, thus letting know that the next packet is not yet received.
  3. The receiver may send lite ACK packet upon reception of the previously missing packet, that was blocking further ACKs. This way it may inform sender within one RTT (instead of RTT + 10ms) that the packet was received.

@ethouris
Copy link
Collaborator Author

What you described in #1 is exactly FASTREXMIT. It is intentionally turned off in case when NAKREPORT is working because it's considered efficient enough.

We need to decide what is more important, or even better, provide options that allow users to decide what is more important for them: whether they can accept extra overhead in order to maximize reliability, or they need as small overhead as possible and accept the reliability this setting provides.

If we need more reliability, then of course, packets should be stubbornly retransmitted, but then the receiver should send ACKs quicker in order to update the sender with the "already received packets" information. We might also revive my earlier idea of "ACK bitmap", that is, together with ACK there's sent an additional number that defines the fate of the next 32 packets following the ACK-ed one, so that packets that follow a loss, but were received, won't be further retransmitted. Important thing in this solution is not only RTT, but also RTT variance, or possibly another value settable by an option, so that the false NAK report isn't sent too early in case when RTT happens to often diverge much from the average.

If we need least overhead, then it must be taken care of that packets are retransmitted only if the sender is absolutely certain (it is made so certain by the receiver) that the receiver didn't get this packet retransmitted and minimize the number of uselessly retransmitted packets. This would, however, happen at the expense of decreased probability that twice lost packets will be retransmitted fast enough, and usually higher reliability will come at bigger latency penalty.

The #2 is AFAIK already implemented - even there's a comment that it does it "TCP way". Although probably it works only in file mode and if I'm not mistaken it's what triggers the LATEREXMIT method.

#3 - same as mine point 3 above, and yes, good idea, unless it happens to often. It can't be then sent after every retransmitted packet, but with some reasonable "ionization wearoff time" it would be good.

@ethouris ethouris added this to To do in Development via automation Dec 19, 2019
@maxsharabayko maxsharabayko moved this from To Do to Backlog in Development Jan 17, 2020
@maxsharabayko maxsharabayko removed this from Backlog in Development Jan 17, 2020
@ethouris ethouris added this to Backlog in Development via automation Jul 22, 2020
Development automation moved this from Backlog to Done Jul 27, 2020
@mbakholdina mbakholdina modified the milestones: v1.5.0, v1.4.2 Oct 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core Priority: Medium Type: Maintenance Work required to maintain or clean up the code
Projects
No open projects
Development
  
Done
Development

Successfully merging a pull request may close this issue.

4 participants