Performance optimization for _split_bitstream
#564
I noticed a lot of time was spent calling Here's with the optimization: Cumulative time per call shows an improvement of 42%! I think this is the most relevant metric since it includes calls to sub-functions.
But total time is also improved by 30%
To me this indicates not that |
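For readers unfamiliar with the two metrics being compared, here is a minimal sketch of how Python's `cProfile` and `pstats` report them. The functions `inner` and `outer` are hypothetical stand-ins, not code from this PR: `tottime` counts only time spent in a function's own body, while `cumtime` also includes time spent in everything it calls.

```python
import cProfile
import io
import pstats


def inner(n: int) -> int:
    # Hypothetical hot function: does the real work.
    return sum(i * i for i in range(n))


def outer(n: int) -> int:
    # Does almost nothing itself, so its tottime is small,
    # but its cumtime includes both calls to inner().
    return inner(n) + inner(n)


profiler = cProfile.Profile()
profiler.enable()
result = outer(50_000)
profiler.disable()

# Render the top entries sorted by cumulative time into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The two `percall` columns in the report are `tottime` and `cumtime` divided by the call count, which is presumably what "cumulative time per call" refers to above.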
Not satisfied with a 40% improvement I dove deeper and got to a 2+ order of magnitude speed increase 🚀 😄 Apparently
After the first commit (roughly the same as the second pic in the first comment): 0.004232s cumulative per call
Now with the second commit: 0.00002629s cumulative per call (and wayyyy down in the list of functions in the profile)
Based on the original measurement of 0.007372s cumulative per call, that's a 99.64338% improvement or 280x faster!
We can squash these commits together when we merge, but I think they're interesting on their own to see the improvement process. |
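The percentages quoted above can be re-derived from the three per-call timings. This is only a sketch of the arithmetic; the helper name `improvement` is made up for illustration.

```python
# Cumulative seconds per call, as quoted in the comments above.
original = 0.007372
first_commit = 0.004232
second_commit = 0.00002629


def improvement(before: float, after: float) -> float:
    """Percentage reduction in time going from `before` to `after`."""
    return (before - after) / before * 100


first_pct = improvement(original, first_commit)
second_pct = improvement(original, second_commit)
speedup = original / second_commit

print(f"first commit:  {first_pct:.1f}% less time per call")
print(f"second commit: {second_pct:.5f}% less time per call")
print(f"overall speedup: {speedup:.0f}x")
```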
You're modifying the behavior of the code here. In H264, a NAL unit can start with either
should read:
Nice improvement by the way. EDIT: Never mind, I misread the patch, it should be safe the way you've written it. |
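For context on the concern raised here: in the H.264 Annex B byte stream format, each NAL unit is preceded by a start code that is either three bytes (`00 00 01`) or four bytes (`00 00 00 01`). The following is a minimal sketch of a splitter that handles both forms. It is not the code from this PR; the function name `split_annexb` is hypothetical, and it assumes NAL payloads do not end in a zero byte.

```python
from typing import Iterator

START_CODE = b"\x00\x00\x01"  # the 4-byte form is this with one extra leading zero


def split_annexb(buf: bytes) -> Iterator[bytes]:
    """Yield NAL unit payloads from an Annex B byte stream (sketch only)."""
    start = buf.find(START_CODE)
    while start != -1:
        nal_start = start + len(START_CODE)
        next_start = buf.find(START_CODE, nal_start)
        if next_start == -1:
            # Last NAL unit runs to the end of the buffer.
            yield buf[nal_start:]
            return
        end = next_start
        # If the next start code is the 4-byte form, drop its leading zero
        # so it is not treated as part of this NAL unit.
        if end > nal_start and buf[end - 1] == 0:
            end -= 1
        yield buf[nal_start:end]
        start = next_start


data = (
    b"\x00\x00\x00\x01\x67\xaa"  # 4-byte start code
    b"\x00\x00\x01\x68\xbb"      # 3-byte start code
    b"\x00\x00\x00\x01\x65\xcc"  # 4-byte start code
)
nals = list(split_annexb(data))
print(nals)
```

Using `bytes.find` keeps the scan inside C code rather than a Python-level byte loop, which is the general kind of change that tends to give large speedups in this sort of parser.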
Haha yeah reading the old code it was really hard to understand the cases it was handling. The resulting code is simple but it's the result of a lot of time getting confused then clarifying :) Thanks for thinking about it! I also went ahead and fixed up the test to actually check if the right packet is output (not just the right number of packets). Looks like it was using literals wrong anyways ( python -m unittest tests.test_h264.H264Test.test_split_bitstream
Thanks!! I'm running this on a Raspberry Pi 3 1.2 and this change alone took my CPU usage from ~80% of a core to 20% when passing through raw 1080p h264 data. It was extremely satisfying. |
This is awesome, thanks for the very detailed description, more readable code and improved tests! |
14a3803 to 6a9b4ef
I fixed the linter error, but the test suite fails. Could you check whether it's the code or the test that's wrong? |
@rprata @johnboiles any chance of fixing the failing test (or the code)? |
Yeah I can, it just might be a couple of days.
|
@johnboiles any news on this, I'd love to merge it? |
Weird, I must have made that last part of the test without actually running it. My bad. Should be good to go now. |
Bump |
Hm, the linter error is your fault, the test against appr.tc is not. |
Is the apprtc test against the live server? Because Google decided to take it down. |
Yeah it was. I've put together a PR which rips out anything related to AppRTC in #623 |
fab1a95 to ffeabce
Codecov Report
@@            Coverage Diff            @@
##              main     #564   +/-   ##
=========================================
  Coverage   100.00%  100.00%
=========================================
  Files           31       31
  Lines         5675     5623   -52
=========================================
- Hits          5675     5623   -52
|
OK well we're almost but not quite there: there is still a branch of code which has no corresponding unit test. Could you fix this please? |
ffeabce to acfa341
I have added the missing unit test, and am merging the PR, thanks so much! |
In profiling performance of #559, I noticed that a ton of time is spent in _split_bitstream. So I set out to speed it up. In my test case (passing through un-transcoded h264 1080p), it's now 280x faster than it was. I think it's also a lot easier to read. Here's what I'm using to profile