Add global compressor to the master audio channel #1831

Merged: 18 commits, Aug 9, 2022

Member

@johnnovak johnnovak commented Aug 3, 2022

This implements #1743.

The compressor is applied to the master output as the final step before converting the float sample stream to 16-bit integers. Without going too far into audio-engineering territory, it acts like the compressor you can enable on many AV receivers and TVs that evens out the difference between quiet and loud sounds (sometimes they call this feature "night mode", so you can turn down the volume at night and still hear the quiet parts). Another example is the broadcast compressors used by radio stations to ensure an even level of sound so people can hear the quiet parts in noisy environments such as cars, and to even out the level differences between songs.
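To illustrate why that last conversion step matters, here is a minimal sketch of a float to 16-bit conversion. This is not the actual mixer code: the function name is made up, and it assumes the float samples are already expressed in 16-bit units. Anything outside the int16 range has to be clamped, and that clamping is the hard clipping the compressor is meant to prevent.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (not from the PR): converts one float sample, assumed
// to be expressed in 16-bit units, to a signed 16-bit integer.
std::int16_t to_int16_clamped(const float sample)
{
    // NaN has no meaningful integer value, so treat it as a caller error
    assert(!std::isnan(sample));

    constexpr float min_val = std::numeric_limits<std::int16_t>::min();
    constexpr float max_val = std::numeric_limits<std::int16_t>::max();

    // Out-of-range samples are flattened to the limits (hard clipping)
    const float clamped = std::clamp(sample, min_val, max_val);

    return static_cast<std::int16_t>(std::lround(clamped));
}
```

A sample of 40000.0f comes out as 32767 here, which is exactly the kind of flattened peak you hear as distortion.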

It's important to realise that because this is a fully automatic process, it cannot work wonders. The keyword here is damage mitigation -- it should tuck overly loud signals neatly into the normal 16-bit range instead of letting them clip, and it shouldn't affect normal loud audio that is just a little below the clipping range too much. It will affect it a bit, though; this is unavoidable (for the record, anything louder than -3dB will be affected progressively as the volume gets louder). Still, 90%+ of people won't notice any of this, but will benefit from the automatic gain reduction on overly loud parts. The release is set relatively slow (5 seconds), but that's more of a "guideline" to the algorithm, as the release is effectively dynamic and program-dependent. So the volume will slowly creep back to normal levels after loud segments, and the slow release time ensures that audible "volume pumping" artifacts are minimised.
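For readers who haven't met compressors before, the "affected progressively above a threshold" idea can be sketched as a static gain curve. This is a textbook hard-knee gain computer, not the algorithm in this PR; the function name and the 4:1 ratio in the usage note are assumptions for illustration, and only the -3dB figure comes from the description above.

```cpp
#include <cassert>

// Hypothetical hard-knee gain computer (illustration only).
// Returns the gain change in dB (always <= 0) for a given input level.
float gain_reduction_db(const float level_db, const float threshold_db,
                        const float ratio)
{
    assert(ratio >= 1.0f);

    // Below the threshold the signal passes through untouched
    if (level_db <= threshold_db) {
        return 0.0f;
    }
    // Above it, the output rises by only 1/ratio dB per input dB
    const float overshoot_db = level_db - threshold_db;
    return overshoot_db / ratio - overshoot_db;
}
```

With a -3dB threshold and an assumed 4:1 ratio, a signal at +5dB overshoots by 8dB and gets 6dB of gain reduction, while anything at or below -3dB is left alone.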

That's about the best we can do without training the users to become amateur audio engineers themselves, and requiring them to tweak the compressor settings for every single song in every single game 😎 For purists, I will add an option to disable the compressor, and I might tweak the settings a bit further later too. But those are small incremental tweaks; I think we should merge this as soon as possible so people can play around with it and test the performance on a Raspberry Pi, etc.


And now, some example audio!

Below is a comparison of the Dune intro & menu music with and without the compressor. It is important to use the floppy version if you want to reproduce my results, as the CD version scales the master volume back to 25% to avoid severe clipping (not by 25%, to 25% of the volume of the floppy version! So you can replicate this by setting mixer master 25 in the floppy version.) Also, you'll only get clipping in the floppy version when using the Adlib Gold emulation -- it doesn't clip at all with regular OPL2/OPL3. That's because in Adlib Gold mode the game boosts the bass by 15dB via the onboard DSP, which is a lot! So make sure to set oplmode = opl3gold when testing this.
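To put those figures into perspective, here is the standard decibel arithmetic as a small sketch. The helper names are made up, and it assumes the mixer's master percentage is a linear amplitude factor.

```cpp
#include <cassert>
#include <cmath>

// Converts a level change in decibels to a linear amplitude factor
float db_to_gain(const float db)
{
    return std::pow(10.0f, db / 20.0f);
}

// Converts a linear amplitude factor to decibels
float gain_to_db(const float gain)
{
    assert(gain > 0.0f);
    return 20.0f * std::log10(gain);
}
```

Under that assumption, a 15dB boost multiplies the amplitude by roughly 5.6, and scaling the master volume to 25% is a cut of roughly 12dB.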

You can also go crazy and set the master volume to 200 or even 400! Yep, the compressor will just deal with it, it's that good 😎 You won't hear any severe distortion, but the volume changes will be like a rollercoaster ride sometimes... (this will never happen under normal circumstances, only when the user messes up the mixer settings).

Dune (Floppy version) - No compressor

dune-no-compressor

dune-no-compressor.mp3.zip

Dune (Floppy version) - Compressor

dune-compressor

dune-compressor.mp3.zip

@kcgen kcgen added the enhancement New feature or enhancement of existing features label Aug 3, 2022
@kcgen kcgen added this to In progress in 0.79 release via automation Aug 3, 2022
Member

@kcgen kcgen left a comment

Looking good so far! Some comments and suggestions.
(Also see the suggestion to move to pure float)

Review threads (resolved): src/hardware/compressor.cpp (5), src/hardware/compressor.h (2), src/hardware/mixer.cpp (2), vs/dosbox.vcxproj.filters (1)
@johnnovak
Member Author

Curious what kind of numbers you get on your side using that branch.

Interesting that float vs double makes such a big difference. Lots of people claim that on modern CPUs the only difference is in memory bandwidth, and that there's no difference otherwise in pure processing terms. In fact, some well-regarded DSP authors claim doubles are actually a lot faster than floats, on Intel hardware at least. I'll definitely test it out on my machine and post the results.

@johnnovak
Member Author

johnnovak commented Aug 6, 2022

So I've done the float vs double comparison on my MacBook, and on this particular machine there isn't much difference.

This is more or less in line with what I've read about double vs float calculations on modern CPUs in various places, but clearly, on your machine there is a significant measurable difference, and I'm really curious why that's the case. The basic consensus on StackOverflow seemed to be that on modern CPUs, once the data is inside the core, there should be either no performance difference between calculations on doubles vs floats, or floats might even be slower because FPUs usually operate natively on the widest supported format, which is then doubles (again, we're talking about modern CPUs here). I saw people posting ARM Cortex measurements as well; it was basically the same thing, float and double calculations had the same speed (can't be bothered to look it up again, it was a while ago).

It was also pointed out that memory bandwidth can be a serious limitation when moving massive amounts of data in and out of the CPU, so although performing calculations on doubles imposes no performance penalty, moving more data through the bus can definitely hamper performance overall. However, for real-time audio processing the amount of data to be moved is small change for a modern machine (compared to numerical calculations on huge datasets at max speeds), so it shouldn't matter unless you need to process many hundreds of audio streams, in which case it can of course add up. But again, it could matter for a low-end machine, such as the Pi 4.
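A quick back-of-the-envelope sketch of the data rates involved, assuming a single 48 kHz stereo stream (the sample rate and channel count are my assumptions for illustration) and the usual 4-byte float and 8-byte double:

```cpp
#include <cstddef>

// Bytes per second for an uncompressed sample stream of type T
template <typename T>
constexpr std::size_t bytes_per_second(const std::size_t sample_rate_hz,
                                       const std::size_t channels)
{
    return sample_rate_hz * channels * sizeof(T);
}

// Holds on platforms where float is 4 bytes and double is 8 bytes
static_assert(bytes_per_second<float>(48000, 2) == 384000);
static_assert(bytes_per_second<double>(48000, 2) == 768000);
```

So going from floats to doubles adds well under a megabyte per second for one stream, which is tiny next to the memory bandwidth of any recent machine.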

In any case, I'm not a DSP expert, so I'm just repeating what I've read 😄

For example, this guy is a very well regarded DSP guy who wrote one of the best ultra-high quality offline sample-rate converters, and this is what he says about the subject:

https://github.com/avaneev/r8brain-free-src

No explicit code for the "float" type is present in this library, because as practice has shown the "float"-based code performs considerably slower on a modern processor, at least in this library.

Ultimately, happy to change it to floats because apparently the difference can matter on some CPUs, according to your results. In this particular case both versions are fast enough anyway, but yeah, why not make it a bit faster on some machines if we can.


So here are my results:

% sysctl -a | grep machdep.cpu.brand
machdep.cpu.brand: 0
machdep.cpu.brand_string: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz

Edit by kcgen: there's an updated benchmark below (prior versions of this message are in the edit history).

@johnnovak
Member Author

Ooops, so the above measurements are for the debug build... Here are the numbers for the release build:

Doubles

2022-08-06 17:29:15.087 | compressor latency:   1429 us
2022-08-06 17:29:15.088 | compressor latency:   1438 us
2022-08-06 17:29:15.089 | compressor latency:   1277 us
2022-08-06 17:29:15.090 | compressor latency:   1237 us
2022-08-06 17:29:15.091 | compressor latency:   1243 us
2022-08-06 17:29:15.092 | compressor latency:   1378 us
2022-08-06 17:29:15.093 | compressor latency:   1450 us
2022-08-06 17:29:15.094 | compressor latency:   1256 us
2022-08-06 17:29:15.095 | compressor latency:   1444 us
2022-08-06 17:29:15.096 | compressor latency:   1417 us
2022-08-06 17:29:15.097 | compressor latency:   1290 us
2022-08-06 17:29:15.098 | compressor latency:   1234 us
2022-08-06 17:29:15.099 | compressor latency:   1243 us
2022-08-06 17:29:15.100 | compressor latency:   1460 us
2022-08-06 17:29:15.101 | compressor latency:   1446 us
2022-08-06 17:29:15.102 | compressor latency:   1443 us
2022-08-06 17:29:15.103 | compressor latency:   1361 us
2022-08-06 17:29:15.104 | compressor latency:   1411 us
2022-08-06 17:29:15.105 | compressor latency:   1289 us
2022-08-06 17:29:15.106 | compressor latency:   1429 us
2022-08-06 17:29:15.107 | compressor latency:   1282 us
2022-08-06 17:29:15.108 | compressor latency:   1440 us
2022-08-06 17:29:15.109 | compressor latency:   1638 us

Floats

2022-08-06 17:29:46.736 | compressor latency:   1410 us
2022-08-06 17:29:46.737 | compressor latency:   1264 us
2022-08-06 17:29:46.738 | compressor latency:   1283 us
2022-08-06 17:29:46.739 | compressor latency:   1417 us
2022-08-06 17:29:46.740 | compressor latency:   1393 us
2022-08-06 17:29:46.741 | compressor latency:   1513 us
2022-08-06 17:29:46.742 | compressor latency:   1548 us
2022-08-06 17:29:46.743 | compressor latency:   1409 us
2022-08-06 17:29:46.744 | compressor latency:   1523 us
2022-08-06 17:29:46.745 | compressor latency:   1416 us
2022-08-06 17:29:46.746 | compressor latency:   1401 us
2022-08-06 17:29:46.747 | compressor latency:   1425 us
2022-08-06 17:29:46.748 | compressor latency:   1374 us
2022-08-06 17:29:46.749 | compressor latency:   1417 us
2022-08-06 17:29:46.750 | compressor latency:   1406 us
2022-08-06 17:29:46.751 | compressor latency:   1332 us
2022-08-06 17:29:46.752 | compressor latency:   1361 us
2022-08-06 17:29:46.753 | compressor latency:   1309 us
2022-08-06 17:29:46.754 | compressor latency:   1262 us
2022-08-06 17:29:46.755 | compressor latency:   1261 us
2022-08-06 17:29:46.756 | compressor latency:   1300 us
2022-08-06 17:29:46.757 | compressor latency:   1263 us
2022-08-06 17:29:46.758 | compressor latency:   1415 us
2022-08-06 17:29:46.759 | compressor latency:   1340 us
2022-08-06 17:29:46.760 | compressor latency:   1399 us
2022-08-06 17:29:46.761 | compressor latency:   1398 us
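For anyone wanting to collect similar numbers, a timing wrapper along these lines would do. This is only a sketch with a made-up helper name, not the instrumentation that produced the logs above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in microseconds.
template <typename Func>
std::int64_t measure_us(Func&& work)
{
    using clock = std::chrono::steady_clock;

    const auto start = clock::now();
    std::forward<Func>(work)();
    const auto end = clock::now();

    // steady_clock is monotonic, so the elapsed time is never negative
    assert(end >= start);

    return std::chrono::duration_cast<std::chrono::microseconds>(end - start)
            .count();
}
```

Usage would be something like `measure_us([&] { process_block(buffer); })`, where `process_block` stands in for whatever code is being timed. Make sure to measure a release build, as debug builds can give very different numbers.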

@kcgen
Member

kcgen commented Aug 6, 2022

Here are the numbers for the release build

Very interesting! Yeah; those numbers are looking roughly the same.

I pushed a new kc/compressor-float-compare-2 branch based on your updates, and adjusted the benchmark to flip-flop every couple seconds between the compressors.

If anyone else wants to try to check this:

  1. Check out the comparison branch and build it:
    git fetch
    git checkout remotes/origin/kc/compressor-float-compare-2 -f
    meson setup build/release
    meson compile -C build/release
    rel=$PWD/build/release/dosbox
  2. Download and unzip compressor-bench.zip, and run the release inside it:
    unzip compressor-bench.zip
    cd compressor-bench
    $rel
  3. autotype will progress the game; sit back and watch the console log.
  4. Ignore audible volume changes - this is expected because the benchmark flip-flops between compressors.

@kcgen
Member

kcgen commented Aug 6, 2022

Results from Linux, i7-6700K CPU @ 4.00GHz, gcc version 11.2.0 (Ubuntu 11.2.0-19ubuntu1):

2022-08-06_09-50_1

Results from macOS, ARM64 M1 mini, clang version 13.0.0 (clang-1300.0.29.30):

2022-08-06_09-59

@kcgen
Member

kcgen commented Aug 6, 2022

Results from i5-7400, 4 cores / 4 threads, 3.00 / 3.50 GHz

2022-08-06 23:12:24.566 | float  compressor used  8.29 ms
2022-08-06 23:12:29.566 | double compressor used 10.87 ms, 31.1% slower
2022-08-06 23:12:34.566 | float  compressor used  8.17 ms
2022-08-06 23:12:39.566 | double compressor used 10.97 ms, 34.3% slower
2022-08-06 23:12:44.566 | float  compressor used  8.13 ms
2022-08-06 23:12:49.566 | double compressor used 10.99 ms, 35.2% slower
2022-08-06 23:12:54.566 | float  compressor used  8.20 ms
2022-08-06 23:12:59.566 | double compressor used 10.90 ms, 32.9% slower
2022-08-06 23:13:04.566 | float  compressor used  8.15 ms
2022-08-06 23:13:09.566 | double compressor used 10.86 ms, 33.3% slower
2022-08-06 23:13:14.566 | float  compressor used  8.29 ms
2022-08-06 23:13:19.566 | double compressor used 10.93 ms, 31.9% slower
2022-08-06 23:13:24.566 | float  compressor used  9.42 ms
2022-08-06 23:13:29.566 | double compressor used 12.65 ms, 34.3% slower
2022-08-06 23:13:34.566 | float  compressor used  9.29 ms
2022-08-06 23:13:39.566 | double compressor used 11.39 ms, 22.5% slower
2022-08-06 23:13:44.566 | float  compressor used  8.07 ms
2022-08-06 23:13:49.566 | double compressor used 10.78 ms, 33.6% slower
2022-08-06 23:13:54.566 | float  compressor used  8.15 ms
2022-08-06 23:13:59.566 | double compressor used 10.90 ms, 33.7% slower
2022-08-06 23:14:04.566 | float  compressor used  8.17 ms
2022-08-06 23:14:09.566 | double compressor used 11.89 ms, 45.5% slower
2022-08-06 23:14:14.566 | float  compressor used  9.24 ms
2022-08-06 23:14:19.566 | double compressor used 12.03 ms, 30.1% slower

Thanks, @GranMinigun!
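The "slower" percentages in the log look like they are relative to the preceding float measurement. A sketch of that arithmetic (the function name is made up, and this is my reading of the log rather than the benchmark's actual code):

```cpp
#include <cassert>

// Relative slowdown of `candidate_ms` versus `baseline_ms`, in percent
double percent_slower(const double baseline_ms, const double candidate_ms)
{
    assert(baseline_ms > 0.0);
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0;
}
```

Plugging in the first pair, `percent_slower(8.29, 10.87)` gives about 31.1, which matches the first "double" line. Some later pairs are off by a tenth, presumably because the log rounds the timings to two decimals before printing.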

@johnnovak
Member Author

On my MacBook the results are very inconclusive...

image

image

MacPorts clang

image

image

System default gcc

image

image

@johnnovak
Member Author

Changed the compressor to operate on floats, as discussed. This is ready for the final review @kcgen :shipit:

@johnnovak
Member Author

johnnovak commented Aug 7, 2022

...and just for completeness' sake, these are the benchmark results on my Windows 10 box (MSVC 2019, i7 4790k 4.4GHz)

Floats are the clear winner here.

image

@johnnovak johnnovak force-pushed the jn/master-compressor branch 2 times, most recently from 0ebbff2 to 6c9eaee Compare August 7, 2022 06:58
Review threads (resolved): include/mixer.h (1), src/hardware/compressor.cpp (7)
@kcgen
Member

kcgen commented Aug 7, 2022

Added some comments; coming along really nicely!

With this being entirely new code, suggest adding the narrowing checking up top in the cpp:

#include "checks.h"

CHECK_NARROWING();

This helped reveal more conversions to and from doubles:

2022-08-07_10-06
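As a generic illustration of how such float/double round trips creep into float code (this is not taken from the PR, and `halve` is a made-up example), a floating-point literal without the `f` suffix is enough:

```cpp
#include <type_traits>

// A literal without the `f` suffix is a double, so the whole expression is
// computed in double and narrowed back to float on assignment.
static_assert(std::is_same_v<decltype(1.0f * 0.5), double>);

// With the `f` suffix the arithmetic stays in single precision.
static_assert(std::is_same_v<decltype(1.0f * 0.5f), float>);

// Hypothetical example that keeps everything in float
constexpr float halve(const float sample)
{
    return sample * 0.5f;
}
```

Compiler warnings for implicit narrowing conversions flag the first form wherever its result is stored back into a float.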

@johnnovak
Member Author

johnnovak commented Aug 9, 2022

Okay, so this is the final version, @kcgen. I tested the compressor behaviour with fixed attack, and it turns out it's perfectly fine for our purposes. I think the variable level-dependent attack time comes more into play in peak-detection mode (which I haven't ported over because we don't need it) when used in conjunction with short release times, e.g. when compressing drum tracks with lots of short transient spikes. But we're basically just using the compressor in RMS mode as an auto-leveler, and very short fixed attack times are actually preferable for those applications.
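For context, RMS level detection with fixed attack and release times can be sketched as a one-pole smoother over the squared signal. This is a generic textbook follower, not the ported compressor code; the class name and parameters are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical RMS level detector with fixed attack and release times
class RmsFollower {
public:
    RmsFollower(const float sample_rate_hz, const float attack_ms,
                const float release_ms)
            : attack_coeff(time_to_coeff(sample_rate_hz, attack_ms)),
              release_coeff(time_to_coeff(sample_rate_hz, release_ms))
    {}

    // Feeds one sample and returns the current RMS estimate
    float process(const float sample)
    {
        const float squared = sample * sample;

        // Rising level: follow quickly (attack); falling: slowly (release)
        const float coeff = (squared > mean_square) ? attack_coeff
                                                    : release_coeff;

        mean_square = coeff * mean_square + (1.0f - coeff) * squared;
        return std::sqrt(mean_square);
    }

private:
    // One-pole smoothing coefficient for a time constant in milliseconds
    static float time_to_coeff(const float sample_rate_hz, const float time_ms)
    {
        assert(sample_rate_hz > 0.0f && time_ms > 0.0f);
        return std::exp(-1000.0f / (time_ms * sample_rate_hz));
    }

    const float attack_coeff;
    const float release_coeff;
    float mean_square = 0.0f;
};
```

With a short attack and a long release, the detected level jumps up almost immediately when the signal gets loud and then decays slowly, which is the auto-leveler behaviour described above.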

So good catch — it was one of those cases when accidents lead to better end results 😎

Review threads (resolved): src/hardware/compressor.cpp (2)
@kcgen
Member

kcgen commented Aug 9, 2022

All comments done; merging!
Thanks, @johnnovak.

@kcgen kcgen merged commit 4b736f0 into main Aug 9, 2022
0.79 release automation moved this from In progress to Done Aug 9, 2022
@kcgen kcgen deleted the jn/master-compressor branch August 14, 2022 04:08