-
Notifications
You must be signed in to change notification settings - Fork 730
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jitterbuffer fix + basic send PLI implementation #461
Conversation
This commit fixes the jitterbuffer issue that caused severe picture lost issues and adds a simple send PLI mechanism
Codecov Report
@@ Coverage Diff @@
## main #461 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 31 31
Lines 5498 5566 +68
=========================================
+ Hits 5498 5566 +68
Continue to review full report at Codecov.
|
The code has been optimized and some additional bugfixes has been added Introduced a new smart_remove() method that prevents sending corrupt frames to the decoder.
49893a8
to
4644cae
Compare
With the new smart_remove() method I had to modify the tests to take into account the new smart_remove behaviour
Could we please have a clean PR so there is a chance of actually merging it? If you have identified a bug related to packet number wrap-around, please submit a PR which addresses JUST that. The work on PLI seems interesting but incomplete, so this looks like a candidate for a separate PR. |
What exactly do you mean by clean PR exactly? - do you want me to squash all commits into one or are you talking about separating the wrap-around fix from the PLI part? From my perspective they both are a jitterbuffer fix as a properly implemented jitterbuffer should trigger a PLI request each time it is being purged (reset). PS. The PLI implementation I proposed is simple but it is also complete (it addresses all the cases when a PLI should be sent). The only thing missing is the PLI re-transmission mechanism, but that can be a candidate for a separate PR as you said. |
Yes if you could please split the wrap-around fix and the PLI part that would be perfect. Also make sure you remove any "print()" statements and commented lines of code please. |
This commit cleanes the code from print() statements and commented lines
I've cleared the code as you asked. For the split of code i did create a separate branch with just the wrap-around fix, but before switching the branches in the PR I would like to try to convince you that pushing the PR as it is right now is actually a better idea - and here is why:
I hope you do see the reasoning behind my point of view and that you'll agree on merging the fix in it's current form - if not then I'll of course switch the branches in the PR and we'll do it your way. I'm waiting for your final decision, |
tests/test_jitterbuffer.py
Outdated
@@ -114,14 +141,14 @@ def test_add_seq_too_high_discard_four(self): | |||
jbuffer.add(RtpPacket(sequence_number=1, timestamp=1234)) | |||
self.assertEqual(jbuffer._origin, 0) | |||
|
|||
jbuffer.add(RtpPacket(sequence_number=2, timestamp=1234)) | |||
self.assertEqual(jbuffer._origin, 0) | |||
# jbuffer.add(RtpPacket(sequence_number=2, timestamp=1234)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the code instead of commenting it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem I'll remove it - I just thought it's easier to understand what's going on this way.
tests/test_jitterbuffer.py
Outdated
jbuffer = JitterBuffer(capacity=4) | ||
|
||
jbuffer.add(RtpPacket(sequence_number=0, timestamp=1234)) | ||
# jbuffer.add(RtpPacket(sequence_number=1, timestamp=1234)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the commented line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem I'll remove it - I just thought it's easier to understand this way.
src/aiortc/jitterbuffer.py
Outdated
@@ -73,7 +83,7 @@ def _remove_frame(self, sequence_number: int) -> Optional[JitterFrame]: | |||
# check we have prefetched enough | |||
frames += 1 | |||
if frames >= self._prefetch: | |||
self.remove(remove) | |||
self.remove(remove) # this might be a bit faster than smart_remove |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the comment: should we use smart_remove() or not here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we can get rid of the remove() function entirely and replace it with smart_remove(). I haven't done so already as i didn't want to brake the backward compatibility and because smart_remove() checks the dumb_mode flag with each for loop tick (that's why it might be slightly slower). The smart_remove() run with dumb_mode flag set to true behaves exactly as the remove() function. I see 3 options here, we can:
- leave it as it is
- replace the remove() calls with smart_remove(dumb_mode=True) calls everywhere and get rid of remove() method
- rename smart_remove() to remove() and replace the dumb_mode flag with smart_mode flag in the current implementation - then we will have only remove(int) calls or remove(int, smart_mode=True) calls everywhere.
I leave the final decision to you as you now better how each of the changes can impact the rest of aiortc code.
src/aiortc/jitterbuffer.py
Outdated
delta = 0 | ||
misorder = 0 | ||
else: | ||
delta = (packet.sequence_number - self._origin) % self.__max_number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use uint16_add (defined in aiortc.utils) for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so - I believe in python (-2 & 0xFFFF) = 2 while (-2 % 65536) = 65534
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope they are identical:
>>> -2 & 0xFFFF
65534
>>> -2 % 65536
65534
It's good news too, otherwise all our sequence number calculations for what we send would be messed up :)
tests/test_jitterbuffer.py
Outdated
""" | ||
Send an RTCP packet to report picture loss. | ||
""" | ||
print("[INFO][JitterBuffer] PLI sent to media_ssrc:", media_ssrc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This print() statement seems kind of pointless. Could we instead define a function in each test which stores a value which we can assert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well you're right that it does nothing currently - I just thought it looks better then a pass statement. I'm not sure if I get what you want to achieve with the assert method though. For me this would be also pointless, as the async function will be run after the test finishes and we can not run a synchronous method with asyncio.ensure_future() (jitterbufer.py line 44 an 54).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is we are not asserting the function was actually called.
On the whole this PR looks very good, thanks so much for working on this. The one point I'm not very comfortable about is having synchronous code calling into asynchronous code. Is there any other way we could handle this? |
OK - I've added some comments to your review and I'll wait for your answer before pushing the next commit. I see your point with running the asynchronous code from synchronous one - but I haven't found any easy solution to get around it. I did test the code very thoroughly though and confirmed that everything works as intended. If not doing this synchronous/asynchronous mix is really that important to you then I believe the best solution would be to rewrite the jitterbuffer code and make it asynchronous, as all the rtcp send methods I've found are asynchronous as well. |
src/aiortc/jitterbuffer.py
Outdated
self._origin = (self._origin + 1) % self.__max_number | ||
|
||
def smart_remove(self, count: int, dumb_mode: bool = False) -> bool: | ||
# smart_remove makes sure that all packages belonging to the same frame are removed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed this sorry, make this a docstring
There is a very simple way of avoiding the call to the asynchronous method: change the signature of aiortc/src/aiortc/rtcrtpreceiver.py Line 490 in d56e14c
|
Solved async function call from synchronous code, and removed dumb_flag from smart_remove, for "dumb remove" remove should be called from now on.
src/aiortc/jitterbuffer.py
Outdated
@@ -89,4 +99,23 @@ def remove(self, count: int) -> None: | |||
for i in range(count): | |||
pos = self._origin % self._capacity | |||
self._packets[pos] = None | |||
self._origin += 1 | |||
self._origin = (self._origin + 1) % self.__max_number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._origin = uint16_add(self._origin, 1)
src/aiortc/jitterbuffer.py
Outdated
break | ||
timestamp = packet.timestamp | ||
self._packets[pos] = None | ||
self._origin = (self._origin + 1) % self.__max_number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._origin = uint16_add(self._origin, 1)
src/aiortc/jitterbuffer.py
Outdated
@@ -18,35 +19,44 @@ def __init__(self, capacity: int, prefetch: int = 0) -> None: | |||
self._origin: Optional[int] = None | |||
self._packets: List[Optional[RtpPacket]] = [None for i in range(capacity)] | |||
self._prefetch = prefetch | |||
self.__max_number = 65536 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed with the suggested changes below
src/aiortc/jitterbuffer.py
Outdated
self.remove(self.capacity) | ||
self._origin = packet.sequence_number | ||
delta = misorder = 0 | ||
if self._capacity >= 128: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the 128 magic value? Could we have a named constant with a descriptive name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the value set in rtcrtpreceiver.py (line 257) as the jitterbuffer size for video stream. I assume that this jitterbuffer size for video will rather increase in time, while the audio buffer should not reach 128 any time soon. I used the jitterbuffer size (capacity) to make sure that we are actually dealing with video before sending PLI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A viable alternative would be to add a "is_video" flag to the jitterbuffer __init__
method, as currently there is no easy way to check if the jitterbuffer is used for video other then checking the self._capacity
value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_video
sounds like a good idea!
excess = delta - self.capacity + 1 | ||
self.remove(excess) | ||
if self.smart_remove(excess): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does smart_remove
change self._origin if we also change it right afterwards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We change it afterwards only when the buffer has been entirely purged (in such a case the smart_remove returns True). If we would delete the "self._origin = packet.sequence_number" from line 52, in case of situation where the buffer was purged we would and up with a self._origin value that points to a self._packets element that is None and this is a bad thing that would cause the buffer to be 100% filled no matter what (the remove_frame method would terminate on line 72 until the value would be overwritten).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW we have the same situation with remove
in line 40 and 41 of jitterbuffer.py. I've found that the previous code did not address the situation correctly when I've encountered some abnormal local network conditions, that caused a lot of packet drops.
I'll do another iteration with the code to address the suggestions from the newest review tomorrow. I think i should be able to push the next commit tomorrow before 15:00 (CET) |
src/aiortc/rtcrtpreceiver.py
Outdated
pli_flag, encoded_frame = self.__jitter_buffer.add(packet) | ||
# check if the PLI should be sent | ||
if pli_flag: | ||
asyncio.ensure_future(self._send_rtcp_pli(packet.ssrc)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use ensure_future, just await the call to send_rtcp_pli
Small code corections + new test_rtp_missing_video_packet test that takes into account the send PLI method.
Ok I think we've reached a good place and you have obviously taken the time to test this code. I'm going to merge it as-is, we can always have a follow-up PR. I'd actually love to see other PRs from you if you have time. Thanks so much for bringing the coverage back to 100%! |
Ah crap I forgot to squash the commits. The final commit is 42f4e0f |
Hi,
I've prepared a fix that may interest a lot of people having picture lost issues. I discovered that a big part of those issues might be related with a bad implementation of a jitter buffer that is used for collecting RTP packets before they are passed to the appropriate decoding function. You see, each of the RTP packets sent by a browser (sender) has a packet number that is used to properly group the packets into video frames on the receiver side. In theory those packet numbers should be unique, in practice however they are limited by the unsigned short int size which is 65535. Unfortunately the aiortc jitter buffer implementation did not handle the packet numbering return from 65535 to 0 correctly, causing the jitter buffer to be purged and the video signal to be lost or at least largely degraded. The situation repeated itself every 65535 received packets and was complicated by the fact that the aiortc does not support sending PLI (picture lost indicator) messages. Without the PLI messages being sent that video image could remain corrupted indefinitely as it is the video encoder decision if and when to send a new keyframe (in webRTC this usually happens on a rapid scene change or when it is explicitly requested through PLI).
This commit fixes the jitter buffer issue and adds a simple send PLI mechanism to the jitter buffer implementation.