
screenshare: add rap processing for screen streams with audio (complements #11622) #11626

Merged
merged 9 commits into bigbluebutton:develop from u23-recsa on Mar 24, 2021

Conversation

@prlanzarin (Member) commented Mar 11, 2021

What does this PR do?

Lifts @germanocaumo's (Mconf) work on making RaP correctly process raw screen sharing webm files with audio tracks in them.
Complements #11622 by adding the full recording capability to it.

Closes Issue(s)

#8632 (combined with #11622)

More

A few things about this:

  • Uses libopus to handle the decoding of the webm audio tracks. I'm almost certain it is enabled in our 2.3 ffmpeg package?
  • There's a resample step in the audio track processing to correct any stream gaps the raw file might have, which alleviates A/V desync on lossy streams
  • The code only kicks in if the webm has an audio track in it, otherwise business as usual (see the sketch after this list)
  • The audio track is mixed with the full audio recording file at the end
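
A minimal sketch of that audio-track guard, assuming an ffprobe-style audioinfo hash; the helper name and the :streams/:codec_type layout are illustrative assumptions, not the PR's exact code:

# Hypothetical helper illustrating the "only kicks in if the webm has an
# audio track" behaviour; the hash layout is modelled on ffprobe output.
def screenshare_has_audio?(audioinfo, filename)
  streams = audioinfo.dig(filename, :streams) || []
  streams.any? { |s| s[:codec_type] == 'audio' }
end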

use libopus decoder and encoder, it's better than built-in ffmpeg/flac
don't mix screenshare audio with mics; it was generating desync with bad audio segments. Encode it together with the video file (TODO: needs adjustments in playback)
@kepstin (Contributor) commented Mar 12, 2021

Hmm. I don't really want to do the audio mixing with an extra step after rendering like this - instead, the render function should be modified to allow multiple parallel audio streams in the edl that it mixes together during the render as a single step. This will speed it up, and also remove a generation of lossy encoding. This would be analogous to how the video rendering code supports multiple "areas".

There's no reason for there to be a separate internal audio format for screensharing audio. We should probably just switch everything to opus.

There are some complications with the volume levels when using amix like this - by default it attenuates both audio sources when mixing, so the result will have noticeably lower volume in the sections where only one of the two audio sources was active. (The ability to disable that behaviour is only present in unreleased git ffmpeg.)
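
A hedged sketch of that single-pass mix, assuming the render already has its audio inputs collected in a list; the variable names (audio_inputs, output) are placeholders, not the EDL render code:

# Mix every audio source inside one ffmpeg invocation during the render,
# instead of a separate post-render amix pass.
ffmpeg_cmd = ['ffmpeg', '-y']
audio_inputs.each { |input| ffmpeg_cmd += ['-i', input] }
# amix attenuates each source by default, so passages where only one source
# is active come out quieter; the switch to disable that was not in released
# ffmpeg at the time of this review.
ffmpeg_cmd += ['-filter_complex', "amix=inputs=#{audio_inputs.length}:duration=longest"]
ffmpeg_cmd += ['-c:a', 'libopus', output]
system(*ffmpeg_cmd)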

@@ -155,6 +185,7 @@ def self.render(edl, output_basename)
if audioinfo[input[:filename]][:format][:format_name] == 'wav'
ffmpeg_cmd += ['-ignore_length', '1']
end
screenshare ? ffmpeg_cmd += ['-c:a', 'libopus'] : nil
kepstin (Contributor):

Why are you using the libopus decoder here instead of ffmpeg's builtin opus decoder? Have you had any specific issues with the builtin decoder causing problems?

@prlanzarin (Member, Author):

Yes. We saw a higher frequency of A/V de-sync occurrences and somewhat worse handling of lossy raw streams.
I don't have any data on that anymore, though.

@prlanzarin (Member, Author):

Can you confirm whether libopus is enabled in the ffmpeg package 2.3 is using?

@kepstin (Contributor) commented Mar 12, 2021:

If you're having problems like that with the internal opus decoder, we should probably switch decoding all opus to use libopus. In any case, selecting the libopus decoder needs to be made conditional on the audioinfo saying that the codec is 'opus'.

I added the libopus decoder to the ffmpeg build a while back, search for "Enabled decoders" in the build log to see what's there: https://launchpadlibrarian.net/500061740/buildlog_ubuntu-bionic-amd64.ffmpeg_7%3A4.2.4-1ubuntu0.1bbb2~18.04_BUILDING.txt.gz
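
If that conditional were added, it could look roughly like the sketch below, reusing the audioinfo hash queried a few lines above in the diff; the :streams and :codec_name fields are assumptions about its layout, not confirmed by the PR:

# Select the libopus decoder only when the input actually carries opus audio.
streams = audioinfo[input[:filename]][:streams] || []
if streams.any? { |s| s[:codec_name] == 'opus' }
  ffmpeg_cmd += ['-c:a', 'libopus']
end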

@@ -22,8 +22,11 @@ module EDL
module Audio
FFMPEG_AEVALSRC = "aevalsrc=s=48000:c=stereo:exprs=0|0"
FFMPEG_AFORMAT = "aformat=sample_fmts=s16:sample_rates=48000:channel_layouts=stereo"
FFMPEG_AFORMAT_SCREENSHARE = "aresample=async=1000,aformat=sample_fmts=s16:sample_rates=48000:channel_layouts=stereo"
kepstin (Contributor):

Thinking about it a bit, we should probably apply the aresample=async=XXX filter to all of the audio inputs. It won't hurt anything and who knows, it might help improve audio sync in a few cases?

@prlanzarin (Member, Author):

> it might help improve audio sync in a few cases

Yeah. While I didn't implement this, I suggested the resample to try to tackle desync issues @germanocaumo was dealing with on streams riddled with gaps due to simulated packet drops. It improved things by a large margin. So it might be a good idea to put it in the other inputs as well, yes.

I'd do it in a separate PR if possible?

kepstin (Contributor):

I'd like this PR to use the same audio format for all audio, there's no reason to special-case the screenshare audio.
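
Concretely, that could mean folding the resample into the one shared constant and dropping the screenshare-only variant, roughly as sketched below (the async=1000 value is carried over from the PR's constant; this is an illustration, not the merged code):

# One audio format for every input, resample included.
FFMPEG_AFORMAT = "aresample=async=1000,aformat=sample_fmts=s16:sample_rates=48000:channel_layouts=stereo"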

kepstin added a commit to kepstin/bigbluebutton that referenced this pull request Mar 18, 2021
This incorporates only the audio desync related changes from bigbluebutton#11626
* Add the aresample filter with async option to fill in timestamp gaps
* Use the libopus decoder for opus audio instead of ffmpeg's builtin decoder
@github-actions

This pull request has conflicts ☹
Please resolve those so we can review the pull request.
Thanks.

inputs.each do |input|
ffmpeg_cmd += ['-i', input]
end
ffmpeg_cmd += ['-filter_complex', "amix"]
kepstin (Contributor):

We should explicitly set the number of inputs to the amix filter here, in case a number of inputs ≠ 2 is provided.

Suggested change:
- ffmpeg_cmd += ['-filter_complex', "amix"]
+ ffmpeg_cmd += ['-filter_complex', "amix=inputs=#{inputs.length}"]

:audio => nil
}

events.xpath('/recording/event[@module="Deskshare" or (@module="bbb-webrtc-sfu" and (@eventname="StartWebRTCDesktopShareEvent" or @eventname="StopWebRTCDesktopShareEvent"))]').each do |event|
kepstin (Contributor):

I don't think the "Deskshare" module (I think that's the old java screensharing?) can ever have audio, so this should only look at the WebRTC events.
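
Narrowing the query to the WebRTC events only would look roughly like this; an illustrative sketch, not the merged change:

# Only the bbb-webrtc-sfu events can carry screenshare audio, so skip the
# legacy "Deskshare" module entirely.
events.xpath('/recording/event[@module="bbb-webrtc-sfu" and (@eventname="StartWebRTCDesktopShareEvent" or @eventname="StopWebRTCDesktopShareEvent")]').each do |event|
  # handle WebRTC deskshare start/stop as before
end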

@prlanzarin marked this pull request as ready for review March 24, 2021 16:55
@antobinary merged commit a950c9a into bigbluebutton:develop Mar 24, 2021
@prlanzarin mentioned this pull request Apr 2, 2021
@prlanzarin deleted the u23-recsa branch October 24, 2021 14:41