Add global compressor to the master audio channel #1831
Looking good so far! Some comments and suggestions.
(Also see the suggestion to move to pure float)
Interesting that float vs double makes such a big difference. Lots of people claim that on modern CPUs the only difference is memory bandwidth, and that there's otherwise no difference in pure processing speed. In fact, some well-regarded DSP authors claim doubles are actually a lot faster than floats, on Intel hardware at least. I'll definitely test it out on my machine and post the results.
So I've done the float vs double comparison on my MacBook, and on this particular machine there isn't much difference. This is more or less in line with what I've read about double vs float calculations on modern CPUs in various places, but clearly, on your machine there is a significant measurable difference, and I'm really curious why that's the case.
The basic consensus on StackOverflow seemed to be that on modern CPUs, once the data is inside the core, there should be either no performance difference between calculations on doubles vs floats, or floats might even be slower because FPUs usually operate natively on the widest supported format, which is doubles (again, we're talking about modern CPUs here). I saw people posting ARM Cortex measurements as well; it was basically the same thing, float and double calculations had the same speed (can't be bothered to look it up again, it was a while ago).
It was also pointed out that memory bandwidth can be a serious limitation when moving massive amounts of data in and out of the CPU, so although performing calculations on doubles imposes no performance penalty, moving more data through the bus can definitely hamper performance overall. However, for real-time audio processing the amount of data to be moved is small change for a modern machine (compared to numerical calculations on huge datasets at max speeds), unless you need to process many hundreds of audio streams, in which case it can of course add up. But again, it could matter for a low-end machine, such as the Pi 4.
In any case, I'm not a DSP expert, so I'm just repeating what I've read 😄 For example, this guy is a very well-regarded DSP developer who wrote one of the best ultra-high-quality offline sample-rate converters, and this is what he says about the subject: https://github.com/avaneev/r8brain-free-src
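For anyone who wants to try this kind of float vs double comparison themselves, here is a minimal sketch of a micro-benchmark. It is not the benchmark used in this PR: `process` is a hypothetical per-sample workload (a one-pole filter plus a soft saturation) standing in for the compressor, and `time_ms` is an illustrative helper name.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <vector>

// Hypothetical stand-in for a per-sample DSP workload: a one-pole low-pass
// filter followed by a soft saturation. This is NOT the compressor code.
template <typename T>
T process(const std::vector<T>& input)
{
	T state = 0;
	T sum   = 0;
	const T coeff = static_cast<T>(0.01);
	for (const T sample : input) {
		state += coeff * (sample - state);
		sum += state / (static_cast<T>(1) + std::abs(state));
	}
	// Returning the sum keeps the optimiser from discarding the loop
	return sum;
}

// Times a single run of process() and returns the elapsed milliseconds.
// The computed value is written to `result` so the caller can inspect it.
template <typename T>
double time_ms(const std::vector<T>& input, T& result)
{
	assert(!input.empty());
	const auto start = std::chrono::steady_clock::now();
	result = process(input);
	const auto end = std::chrono::steady_clock::now();
	return std::chrono::duration<double, std::milli>(end - start).count();
}
```

To compare, fill a `std::vector<float>` and a `std::vector<double>` with the same samples, call `time_ms` on each several times, and look at the best or median run. Build with optimisations enabled, because debug-build timings say very little about release performance.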
Ultimately, happy to change it to floats because, according to your results, the difference can apparently matter on some CPUs. In this particular case both versions are fast enough anyway, but yeah, why not make it a bit faster on some machines if we can. So here are my results:
Edit by kcgen: there's an updated benchmark below (prior versions of this message are in the edit history).
Ooops, so the above measurements are for the debug build... Here are the numbers for the release build:
Doubles
Floats
Very interesting! Yeah; those numbers are looking roughly the same. I pushed a new If anyone else wants to try to check this:
Results from i5-7400, 4 cores / 4 threads, 3.00 / 3.50 GHz
Thanks, @GranMinigun!
Changed the compressor to operate on floats, as discussed. This is ready for the final review @kcgen
Okay, so this is the final version, @kcgen. I tested the compressor behaviour with fixed attack, and it turns out it's perfectly fine for our purposes. I think the variable level-dependent attack time comes more into play in peak-detection mode (which I haven't ported over because we don't need it) when used in conjunction with short release times, e.g. when compressing drum tracks with lots of short transient spikes. But we're basically just using the compressor in RMS mode as an auto-leveler, and very short fixed attack times are actually preferable for those applications. So good catch, it was one of those cases when accidents lead to better end results 😎
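To illustrate what "RMS mode with a fixed attack" can look like, here is a minimal sketch of an RMS level detector with separate, constant attack and release smoothing coefficients. This is a generic textbook-style detector written for illustration, not the code ported in this PR; the struct and member names are hypothetical.

```cpp
#include <cmath>

// Generic RMS level detector with fixed attack and release smoothing.
// Names and structure are illustrative assumptions, not the PR's code.
struct RmsDetector {
	float attack_coeff;  // closer to 0 = faster reaction to rising levels
	float release_coeff; // closer to 1 = slower decay after loud passages
	float mean_square = 0.0f;

	// Feeds one sample and returns the smoothed RMS level estimate
	float process(const float sample)
	{
		const float squared = sample * sample;

		// Pick the coefficient based on whether the level rises or falls
		const float coeff = (squared > mean_square) ? attack_coeff
		                                            : release_coeff;

		// One-pole smoothing of the squared signal
		mean_square = coeff * mean_square + (1.0f - coeff) * squared;
		return std::sqrt(mean_square);
	}
};
```

With a small `attack_coeff` the detector follows sudden level increases almost immediately, while a `release_coeff` near 1 lets the estimate fall back slowly, which matches the auto-leveler use described above.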
All comments down; merging!
This implements #1743.
The compressor is applied to the master output as the final step before converting the float sample stream to 16-bit integers. Without going into audio-engineering territory much, it acts like the compressor you can enable on many AV receivers and TVs that evens out the difference between quiet and loud sounds (sometimes they call this feature "night mode", so you can turn down the volume at night and still hear the quiet parts). Another example is the broadcast compressors used by radio channels to ensure an even level of sound so people can hear the quiet parts in noisy environments such as cars, and to even out the level differences between different songs.
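The ordering described above (gain stage first, then conversion to 16-bit) can be sketched roughly as follows. `GainStage` is a hypothetical placeholder for the compressor, and the sketch assumes float samples normalised so that ±1.0 is full scale, which may not match the mixer's actual internal scaling.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical placeholder for the compressor: a real one would update
// `gain` continuously based on the detected signal level.
struct GainStage {
	float gain = 1.0f;
	float process(const float sample) const { return sample * gain; }
};

// Converts one float sample to 16-bit, hard-clamping anything that is still
// out of range. Assumes +/-1.0 is full scale (an assumption of this sketch).
std::int16_t to_int16(const float sample)
{
	const float scaled = std::clamp(sample, -1.0f, 1.0f) * 32767.0f;
	return static_cast<std::int16_t>(std::lround(scaled));
}

// Runs the gain stage as the last float-domain step, then converts
std::vector<std::int16_t> render_master(const std::vector<float>& mix,
                                        const GainStage& stage)
{
	std::vector<std::int16_t> out;
	out.reserve(mix.size());
	for (const float sample : mix) {
		out.push_back(to_int16(stage.process(sample)));
	}
	return out;
}
```

The point of running the compressor before the conversion is that the clamp in `to_int16` should rarely have anything left to do: overly loud material is turned down smoothly instead of being chopped off at the 16-bit limits.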
It's important to realise that because this is a fully automatic process, it cannot work wonders. The keyword here is damage mitigation -- it should tuck overly loud signals neatly into the normal 16-bit range instead of letting them clip, and it shouldn't affect normal loud audio that is just a little below the clipping range too much. But it will affect it a bit; this is unavoidable (for the record, anything louder than -3dB will be affected progressively more as the volume gets louder). Still, 90%+ of people won't notice any of this, but will benefit from the automatic gain reduction on overly loud parts. The release is set relatively slow (5 seconds), but that's more like a "guideline" to the algorithm as it's effectively dynamic and program-dependent. So the volume will slowly creep back to normal levels after loud segments, and the slow release time ensures that audible "volume pumping" artifacts are minimised.
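For a rough feel of the numbers, here is a small sketch of the standard conversions involved. The -3 dB threshold and the 5-second release come from the description above; the hard-knee curve, the ratio parameter, and the sample rate are assumptions made for illustration, and the actual algorithm adapts its release dynamically rather than using one fixed coefficient.

```cpp
#include <cmath>

// Converts a level in decibels to a linear amplitude factor
float db_to_linear(const float db)
{
	return std::pow(10.0f, db / 20.0f);
}

// Gain change in dB for a simple hard-knee compressor curve: nothing happens
// below the threshold, and the reduction grows with the overshoot above it.
// The hard knee and the ratio are assumptions of this sketch.
float gain_reduction_db(const float level_db, const float threshold_db,
                        const float ratio)
{
	const float over_db = level_db - threshold_db;
	if (over_db <= 0.0f) {
		return 0.0f;
	}
	return -over_db * (1.0f - 1.0f / ratio);
}

// One-pole smoothing coefficient for a nominal time constant in seconds
float time_to_coeff(const float time_s, const float sample_rate_hz)
{
	return std::exp(-1.0f / (time_s * sample_rate_hz));
}
```

For example, `db_to_linear(-3.0f)` is about 0.708, so the compressor starts acting at roughly 71% of full-scale amplitude, and `time_to_coeff(5.0f, 48000.0f)` is extremely close to 1, which is why the gain creeps back so slowly after a loud segment.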
That's about the best we can do without training the users to become amateur audio engineers themselves, and requiring them to tweak the compressor settings for every single song in every single game 😎 For purists, I will add an option to disable the compressor, and I might tweak the settings a bit further later too. But those are small incremental tweaks; I think we should merge this as soon as possible so people can play around with it and test the performance on a Raspberry Pi, etc.
And now, some example audio!
Below is a comparison of the Dune intro & menu music with and without the compressor. It is important to use the floppy version if you want to reproduce my results, as the CD version scales the master volume back to 25% to avoid severe clipping (not by 25%, to 25% of the volume of the floppy version! So you can replicate this by setting `mixer master 25` in the floppy version.) Also, you'll only get clipping in the floppy version when using the Adlib Gold emulation -- it doesn't clip at all with regular OPL2/OPL3. That's because in Adlib Gold mode the game boosts the bass by 15dB via the onboard DSP, which is a lot! So make sure to set `oplmode = opl3gold` when testing this.
You can also go crazy and set the master volume to 200 or even 400! Yep, the compressor will just deal with it, it's that good 😎 You won't hear any severe distortion, but the volume changes will be like a rollercoaster ride sometimes... (this will never happen under normal circumstances, only when the user messes up the mixer settings).
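To put those figures on a common scale, here is a quick back-of-the-envelope conversion. It assumes the mixer percentages act as linear amplitude factors, which is an assumption of this note rather than something stated above.

```math
\begin{aligned}
20 \log_{10}(0.25) &\approx -12.04\ \text{dB} && \text{(master volume at 25\%)} \\
10^{15/20} &\approx 5.62 && \text{(amplitude factor of a 15 dB boost)} \\
20 \log_{10}(2) &\approx +6.02\ \text{dB} && \text{(master volume at 200)} \\
20 \log_{10}(4) &\approx +12.04\ \text{dB} && \text{(master volume at 400)}
\end{aligned}
```

Under that assumption, a 15 dB bass boost multiplies the affected frequencies by more than five times in amplitude, and the CD version's 25% master volume takes about 12 dB back off.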
Dune (Floppy version) - No compressor
dune-no-compressor.mp3.zip
Dune (Floppy version) - Compressor
dune-compressor.mp3.zip