LOCAL STORAGE DEPLETED #486

Closed
prabh13singh opened this issue Oct 30, 2018 · 23 comments · Fixed by #1335
Labels: [core] Area: Changes in SRT library core · Type: Bug (indicates an unexpected problem or unintended behavior)

Comments

@prabh13singh (Contributor) commented Oct 30, 2018

I am seeing these errors quite often in the logs on the receiver side, and lots of packets are being dropped:

18:16:46.245466/SRT:RcvQ:workerE: SRT.c: LOCAL STORAGE DEPLETED. Dropping 1 packet: DATA: msg=11813898 seq=1232055391
18:58:02.878448/SRT:RcvQ:workerE: SRT.c: IPE: ACK node overwritten when acknowledging 1095568, ack extracted: 1330856570
20:40:51.114986/SRT:TsbPd D: SRT.c: TLPKTDROP seq 1576924112-1576924113 (1 packets)
20:40:51.115012/SRT:TsbPd*E: SRT.t: %143964518:TSBPD:DROPSEQ: up to seq=1576924113 (2 packets) playable at 20:40:51.111753 delayed 3.240 ms packets_dropped:1

@ethouris (Collaborator) commented Nov 1, 2018

Some of these look like a performance problem.

The first log looks like a very unlikely fallback case, where the unit queue for incoming packets is depleted (its size is hardcoded to 32, but there are multiple queues). This queue keeps all packets as they come in; in theory they should then be quickly processed and passed on, with command packets interpreted and data packets stored in the receiver buffer. From what I have seen in the code, this may happen when the queue buffer is stretched to its limit, but if the dropped packet was a data packet, it still undergoes retransmission.

The second one is something I thought was an IPE (internal program error), but it looks like it may happen during normal processing. SRT (since UDT) features the "ACK response" (ACKACK), which is used to confirm that the sender has received the ACK and removed those packets from the sender buffer (otherwise the ACK would have to be repeated). ACK nodes are stored in a special container so that, when the ACKACK arrives, the node can be extracted and the ACK window updated. I'm not sure about all the possible situations in which an ACK node can disappear (besides it having already been processed by an earlier ACKACK). In any case, the only impact is that the ACK window, which is used to calculate the reception speed and the RTT, does not get updated in this case.
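
To illustrate the mechanism described above, here is a minimal sketch of the idea, not the actual SRT code (the real counterpart is the ACK window in the library core; AckWindow, onAckSent and onAckAck below are hypothetical names):

```cpp
// Minimal sketch of the ACK/ACKACK bookkeeping described above.
// Hypothetical names; illustrative only.
#include <chrono>
#include <cstdint>
#include <map>
#include <optional>

class AckWindow {
public:
    using Clock = std::chrono::steady_clock;

    // Called when the receiver sends an ACK: store the node for later lookup.
    void onAckSent(int32_t ackJournal, int32_t ackSeqNo) {
        nodes_[ackJournal] = {ackSeqNo, Clock::now()};
    }

    // Called when the matching ACKACK arrives: extract the node and return
    // the measured round-trip time. If the node is already gone (processed
    // before, or overwritten), there is nothing to update; this is the
    // "ACK node overwritten" situation from the log.
    std::optional<Clock::duration> onAckAck(int32_t ackJournal) {
        auto it = nodes_.find(ackJournal);
        if (it == nodes_.end())
            return std::nullopt;          // node gone: only stats miss an update
        auto rtt = Clock::now() - it->second.sent;
        nodes_.erase(it);                 // the ACKACK consumes the node
        return rtt;
    }

private:
    struct Node { int32_t ackSeqNo; Clock::time_point sent; };
    std::map<int32_t, Node> nodes_;
};
```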

TLPKTDROP is a mechanism for intentionally dropping a packet when it precedes an already received packet that is ready to play; this is preferred over waiting indefinitely for a retransmission. It is controlled by a flag (SRTO_TLPKTDROP) and is on by default. Increasing the latency obviously increases the chance that a lost packet will be retransmitted in time.

If you also observe drops on the sender (SND-DROPPED in the log), the SRTO_SNDDROPDELAY option may help. Its value is an extra delay (in ms) to wait before the sender drops packets; when set to -1, the sender won't drop packets at all.
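
Both knobs are ordinary socket options in the public API. A minimal sketch, assuming a socket that is not yet connected (error handling reduced to a return code):

```cpp
#include <srt/srt.h>

// Sketch: configure the too-late-packet-drop behaviour on an SRT socket.
// SRTO_TLPKTDROP and SRTO_SNDDROPDELAY are documented SRT options.
int configure_drop(SRTSOCKET sock)
{
    int tlpktdrop = 1;      // on by default anyway; shown here for clarity
    if (srt_setsockopt(sock, 0, SRTO_TLPKTDROP, &tlpktdrop, sizeof tlpktdrop) == SRT_ERROR)
        return -1;

    int snddropdelay = -1;  // -1: the sender never drops packets
    if (srt_setsockopt(sock, 0, SRTO_SNDDROPDELAY, &snddropdelay, sizeof snddropdelay) == SRT_ERROR)
        return -1;

    return 0;
}
```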

@prabh13singh (Contributor, Author)

The only thing I'm really concerned about is TLPKTDROP. Right now I am testing 3-4 instances of the application with 25 Mbps video over the LAN. The stats show an RTT of 0.06 ms, but I am wondering whether increasing the latency to, say, 500 would affect performance when I use just one instance at 25 Mbps or less. Essentially, I want constant values that won't affect performance if I increase or decrease the number of instances or the bit rate of the videos within a certain range.

@ethouris (Collaborator) commented Nov 1, 2018

Latency doesn't change anything in the performance. It's just a matter of how long a packet is kept in the buffer (that is, it affects the time when the "packet sign-off" happens). It may increase the required size of the intermediate receiver buffer, so for some extreme latency values the buffer may need to be enlarged, but I remember people who used 1080p/60fps with 6 seconds of latency and the default receiver buffer size.

An RTT of 0.06 ms seems unlikely to me; even in a local network I can barely achieve an RTT below 1 ms. Anyway, if the network is likely to drop packets, a longer latency may indeed be necessary.
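
If you want to experiment with this, the latency is also just a pre-connect socket option. A minimal sketch for the receiver side:

```cpp
#include <srt/srt.h>

// Sketch: raise the receiver-side latency to 500 ms.
// SRTO_RCVLATENCY must be set before the connection is established.
int set_latency(SRTSOCKET sock)
{
    int latency_ms = 500;
    return srt_setsockopt(sock, 0, SRTO_RCVLATENCY, &latency_ms, sizeof latency_ms);
}
```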

@prabh13singh (Contributor, Author) commented Nov 1, 2018

Thank you for the information. Regarding latency, I was testing some 20 Mbps videos for error recovery, using tc qdisc, with latency values ranging from 120 to 1000; at the higher end of that range the connection was being lost, but I was keeping the default buffer sizes.
Anyway, I have increased the latency and the buffer sizes, am running some tests, and will let you know how it behaves.
As for the RTT, I got those values from the stats. I think running multiple instances may have caused congestion and packet drops.

@prabh13singh (Contributor, Author) commented Nov 2, 2018

Hi @ethouris, I increased the latency to 2000 and oheadbw to 40% and could run 4 instances of 25 Mbps for hours without any packet drops, but when I switched to a 60 Mbps video, it couldn't survive for more than 2 seconds. When I set the latency back to its default value, or increase the buffers to 100000, I can get it working, but there are still some packet drops and nothing is reported in the logs.

@jeandube (Collaborator) commented Nov 2, 2018

@prabh13singh Do the math: you may not have enough buffers to hold 2000 ms of 60 Mbps video. I estimate around 20 Mb is required, while the default is around 10 Mb (SRTO_SNDBUF/SRTO_RCVBUF).

@prabh13singh (Contributor, Author)

Sorry, I don't understand. What is 10 Mb (SRTO_SNDBUF/SRTO_RCVBUF)? The ratio of the buffers?

@jeandube (Collaborator) commented Nov 2, 2018

SRTO_SNDBUF and SRTO_RCVBUF are the configured numbers of send and receive buffers used to handle retransmission. SRTO_MSS is the size of each one (1500), and each buffer holds one packet. Your 60 Mbps video should generate around 6000 packets/second, so with a latency of 2000 ms you need 12000 buffers, and the default is lower than that. By reducing the latency to the default, you need far fewer buffers.
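
Spelling that arithmetic out (a sketch; the exact payload per packet depends on the headers, a common SRT payload being 1316 bytes, i.e. 7 x 188-byte TS packets):

```cpp
#include <cstdio>

// Sketch of the buffer estimate above: how many packet buffers are needed
// to hold one latency window of a stream at a given bitrate.
int main()
{
    const double bitrate_bps   = 60e6;   // 60 Mbps video
    const double payload_bytes = 1316;   // assumed payload per packet
    const double latency_sec   = 2.0;    // SRT latency of 2000 ms

    const double pkts_per_sec   = bitrate_bps / 8.0 / payload_bytes; // ~5700 pkt/s
    const double buffers_needed = pkts_per_sec * latency_sec;        // ~11400 buffers

    std::printf("%.0f pkt/s -> %.0f buffers for %.0f ms of latency\n",
                pkts_per_sec, buffers_needed, latency_sec * 1000);
    return 0;
}
```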

@prabh13singh (Contributor, Author) commented Nov 2, 2018

But where do we set them? We have m_iSndBufSize and m_iRcvBufSize in core.cpp. I am increasing those and increasing the latency to prevent too-late packet drops. Are we also supposed to change m_iUDPSndBufSize and m_iUDPRcvBufSize?
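
For reference, those members are backed by socket options, so patching core.cpp should not be necessary. A sketch with example values sized roughly for the 60 Mbps / 2000 ms case (error checks omitted; note that SRTO_SNDBUF/SRTO_RCVBUF are given in bytes, while SRTO_FC, the flow-control window that caps the receiver buffer, is given in packets):

```cpp
#include <srt/srt.h>

// Sketch: enlarge SRT and UDP buffers via socket options instead of
// editing m_iSndBufSize/m_iRcvBufSize in core.cpp. Example values only.
int enlarge_buffers(SRTSOCKET sock)
{
    int fc = 25600;                 // flow-control window, in packets
    srt_setsockopt(sock, 0, SRTO_FC, &fc, sizeof fc);

    int bufsize = 25600 * 1500;     // SRT send/receive buffers, in bytes
    srt_setsockopt(sock, 0, SRTO_SNDBUF, &bufsize, sizeof bufsize);
    srt_setsockopt(sock, 0, SRTO_RCVBUF, &bufsize, sizeof bufsize);

    int udpbuf = 8 * 1024 * 1024;   // OS-level UDP socket buffers, in bytes
    srt_setsockopt(sock, 0, SRTO_UDP_SNDBUF, &udpbuf, sizeof udpbuf);
    srt_setsockopt(sock, 0, SRTO_UDP_RCVBUF, &udpbuf, sizeof udpbuf);
    return 0;
}
```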

@prabh13singh (Contributor, Author)

How does the number of application instances affect the performance of the video? If I run one or two instances, I see no packet drops, but if I increase the number of instances, I see CC errors on the final UDP output, yet no DROPSEQ-like entries in the logs. All I see in the logs of all of them is "No room to store incoming packet", "IPE: ACK node overwritten when acknowledging" and "LOCAL STORAGE DEPLETED". I can't figure out where the packets are being dropped.

@ethouris (Collaborator)

Please redo the tests on my experimental branch, where I'm attempting to get rid of the CPU abuse in the RcvQ:worker thread:

https://github.com/ethouris/srt/tree/dev-rcvworker-sleeps

@prabh13singh (Contributor, Author)

It can't go above 30 Mbps even if I set maxbw=100000000.

@ethouris (Collaborator)

The question is whether you could do so with the previous version (1.2.0). Note that maxbw is only an option to limit the bandwidth; by default it uses as much as is available.
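
For reference, SRTO_MAXBW is a 64-bit option in bytes per second; a minimal sketch:

```cpp
#include <srt/srt.h>
#include <cstdint>

// Sketch: cap the send bandwidth. SRTO_MAXBW takes bytes per second;
// 0 switches to the input-rate-based mode (SRTO_INPUTBW/SRTO_OHEADBW).
int set_maxbw(SRTSOCKET sock)
{
    int64_t maxbw = 100000000;  // 100 MB/s (800 Mbps), as in the test above
    return srt_setsockopt(sock, 0, SRTO_MAXBW, &maxbw, sizeof maxbw);
}
```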

@prabh13singh (Contributor, Author)

Yes, by also increasing buffer sizes.

@ethouris (Collaborator)

OK, then please try to gather some information about the characteristics of the network you used for testing (RTT, available overall bandwidth), as well as about the stream you are trying to send (video parameters). We'll try to recreate this environment in our lab.

@ethouris added the labels "[core] Area: Changes in SRT library core", "Status: Revision Needed" and "Type: Question" on Aug 13, 2019
@mbakholdina added this to the Parking Lot milestone on Feb 20, 2020
@amiller-isp

Is this issue still being considered? We are also seeing errors like:

/SRT:RcvQ:worker: LOCAL STORAGE DEPLETED. Dropping 1 packet: TARGET=565733709 DATA: msg=63890127 seq=1948012857 size=1316 flags: PB_SOLO ORD_RELAXED EK_ODD SN_ORIGINAL
/SRT:RcvQ:worker: LOCAL STORAGE DEPLETED. Dropping 1 packet: TARGET=981811147 CONTROL: ackack size=4

They occur very sporadically, but we would like to get to the root cause. I'd be happy to provide more detail, etc. We are currently running SRT 1.4.1.

@maxsharabayko (Collaborator)

Hi @amiller-isp
This issue is in the parking lot at the moment.
But if you can help us dig deeper into it, we would appreciate the effort.

"LOCAL STORAGE DEPLETED" happens when there is no available memory unit (in the pre-allocated pool) to place an incoming packet into.
The memory is managed by the CRcvQueue::m_UnitQueue object. The initial number of memory units for storing incoming packets is 32, and it is increased dynamically when more units are needed.

This message means that the dynamic increase has failed.
The normal reason would be that there is simply not enough memory in the system,
but it could also be an error in the growth logic.
Unfortunately, there were no logs that could point to the reason.

I've prepared this branch with additional logs: branch.
Could you please repeat your experiment with that version and share the log you get in case of this "storage depleted" error?
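
A minimal sketch of that grow-on-demand idea (hypothetical names; SRT's real counterpart is CUnitQueue). "LOCAL STORAGE DEPLETED" corresponds to the growth step failing:

```cpp
#include <cstddef>
#include <list>
#include <new>
#include <vector>

// Hypothetical sketch of a grow-on-demand unit pool, as described above.
struct Unit { char payload[1500]; bool busy = false; };

class UnitPool {
public:
    explicit UnitPool(std::size_t initial = 32) { grow(initial); }

    // Return a free unit, growing the pool when none is available.
    // Returns nullptr when growth fails: the "storage depleted" case.
    Unit* getFreeUnit() {
        for (auto& block : blocks_)
            for (auto& u : block)
                if (!u.busy) { u.busy = true; return &u; }
        try { grow(blocks_.back().size()); }              // allocate another block
        catch (const std::bad_alloc&) { return nullptr; } // growth failed
        Unit& u = blocks_.back().front();
        u.busy = true;
        return &u;
    }

    void release(Unit* u) { u->busy = false; }

private:
    void grow(std::size_t n) { blocks_.emplace_back(n); }
    std::list<std::vector<Unit>> blocks_;  // a list keeps unit addresses stable
};
```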

@maxsharabayko self-assigned this on May 8, 2020
@mbakholdina added this to Needs Triage in GitHub Issues via automation on May 8, 2020
@mbakholdina moved this from Needs Triage to In progress in GitHub Issues on May 8, 2020
@maxsharabayko added the "Type: Bug" label and removed the "Status: Unclear" and "Type: Question" labels on May 25, 2020
@maxsharabayko (Collaborator)

There is a synchronization issue in accessing CUnitQueue from the receiver.
One thread calls CRcvBuffer::addData(..), which calls m_pUnitQueue->makeUnitGood(unit).

At the same time, another thread may call CRcvBuffer::readMsg(..), which eventually calls m_pUnitQueue->makeUnitFree(u).

The readMsg thread locks m_RecvLock, while the addData thread does not. Thus both threads can try to modify the same CUnitQueue counter of free units simultaneously, which leads to a wrong value and a mismatch between the actual number of available units and the counted one.
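
The race reduces to two threads doing an unguarded read-modify-write on the same counter. A standalone sketch (not the SRT code) of the problem, with a std::atomic counter shown alongside as one conventional fix; the racy plain-int version is undefined behavior in C++ and is included only to illustrate the lost-update effect:

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

int free_units_plain = 1000;               // racy: ++/-- are read-modify-write
std::atomic<int> free_units_atomic{1000};  // one conventional fix

int main()
{
    auto taker = [] {       // stands in for addData() -> makeUnitGood()
        for (int i = 0; i < 100000; ++i) { --free_units_plain; --free_units_atomic; }
    };
    auto giver = [] {       // stands in for readMsg() -> makeUnitFree()
        for (int i = 0; i < 100000; ++i) { ++free_units_plain; ++free_units_atomic; }
    };
    std::thread t1(taker), t2(giver);
    t1.join(); t2.join();

    // The plain counter often ends up != 1000 (lost updates, the mismatch
    // described above); the atomic counter is always exactly 1000.
    std::printf("plain: %d  atomic: %d\n", free_units_plain, free_units_atomic.load());
    return 0;
}
```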

@maxsharabayko modified the milestones: Parking Lot, v1.5.0 on May 26, 2020
@amiller-isp

We considered deploying your branch fix, but we would have to deploy it into our production environment, so it would need to be fully validated first. The message is very infrequent; in our environment it appears roughly once per day per hundred channels.

@maxsharabayko (Collaborator)

> We considered deploying your branch fix, but we would have to deploy it into our production environment, so it would need to be fully validated first. The message is very infrequent; in our environment it appears roughly once per day per hundred channels.

The branch I referred to above only has improved logging to track down the issue.
The fix is under development. I will update you once it is ready. :)

@mbakholdina modified the milestones: v1.5.0, Sprint 16 on May 29, 2020
maxsharabayko added commits to maxsharabayko/srt referencing this issue on Jun 9, 2020:
- Covers some issues related to the Local storage depleted issue Haivision#486
- Refers to issue Haivision#486: getting an available unit to receive a packet

maxsharabayko added commits referencing this issue on Jun 11, 2020:
- Covers some issues related to the Local storage depleted issue #486
- Refers to issue #486: getting an available unit to receive a packet

GitHub Issues automation moved this from In progress to Done on Jun 11, 2020
@maxsharabayko (Collaborator)

Hi @amiller-isp @prabh13singh
The latest SRT master contains fixes for this issue (PR #1333 and PR #1335).
Please do not hesitate to reopen this issue if the problem remains.

@amiller-isp

@maxsharabayko
Great! We might be able to pick this up from master. Is there a release planned any time soon? It seems like there have been a lot of commits since 1.4.1.

@maxsharabayko (Collaborator)

@amiller-isp
No definite release date at the moment. There are still things to be done, and the release will depend on some decisions to be discussed soon and on the availability of QA.

@mbakholdina modified the milestones: v1.5.0 - Sprint 16, v1.4.2 on Oct 14, 2020