Issue
As of v0.4.0, exporting seems to take about:
- 10 seconds for a very simple test file with about 1,000 notes (`test.py`, included in the repository);
- 467 seconds (almost 8 minutes!) for the Note Block Megacollab file with 250k+ notes.
See the screenshots below for a `snakeviz` profiling graph of these two operations (the `.prof` files out of `cProfile` are also attached here: nbswave_profile.zip).
This can be made a heck of a lot better.
From the screenshots above, you can see that, when there aren't many notes to place, most of the time is spent loading the sound files. And when the bulk of the operation becomes placing notes, a lot of time is spent in the audio manipulation operations, particularly panning and volume (which, as we'll see, are simply array multiplications). This indicates there are potential optimizations to make both in loading sounds and in the mixing steps themselves.
Reason
Looking at jiaaro/pydub#725, many operations in `pydub` are implemented using the now-deprecated, to-be-removed `audioop` module. Although it requires no external dependencies, it's extremely inefficient -- and, no wonder, it takes up most of the export time.
`nbswave` already bypasses `pydub` in the mixing implementation -- we implement our own here using `numpy` operations, since it's a lot more efficient than the alternative implemented by `pydub` (see my 2021 issue about this: jiaaro/pydub#550).
The audio engine implementation done for the future Python NBS rewrite has also shown that many operations `nbswave` relies on are really slow in `pydub`. As such, the library was entirely replaced in the audio module with other tools. In the next section, we'll briefly discuss those implementations and how they could be brought here to make export performance much better. Most of them leverage `numpy`, which is already a dependency of this package. If we can rely on it enough to bypass `pydub` operations, it may even be possible to remove `pydub` completely from the dependencies of `nbswave`.
Optimizations to make
Loading sounds
- Current solution: `pydub.AudioSegment.from_file`
- Proposed solution: `soundfile` package
- Reason: The former launches an `ffmpeg` subprocess and takes seconds, while the latter calls `libsndfile` via CFFI, which is capable of loading all sounds in a fraction of a second. Implemented here.
Volume
- Current solution: `pydub.AudioSegment.apply_gain` -> `audioop.mul`
- Proposed solution: `numpy`
- Reason: One array multiplication with `numpy` does the trick. Implemented here.
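A minimal sketch of the idea (the dB-based `vol_to_gain` mapping shown here is an assumption for illustration; nbswave defines its own conversion):

```python
import numpy as np

def vol_to_gain(vol: float) -> float:
    """Map a velocity in (0, 1] to a gain in decibels (assumed log taper)."""
    if vol <= 0:
        return -float("inf")
    return 20.0 * np.log10(vol)

def apply_gain(samples: np.ndarray, gain_db: float) -> np.ndarray:
    """Replace audioop.mul with a single vectorized multiplication."""
    factor = 10.0 ** (gain_db / 20.0)  # dB -> linear amplitude factor
    return samples * factor
```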
Panning
- Current solution: `pydub.AudioSegment.pan` -> `audioop.tostereo` and `audioop.mul`
- Proposed solution: `numpy`
- Reason: Requires two array slice multiplications, one for each channel. It's really easy to calculate the gain boost and cut of each channel from the panning value; we've implemented it here.
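The two slice multiplications could look like this (the simple linear gain law below is an assumption for illustration; nbswave derives its own boost/cut curve from the panning value):

```python
import numpy as np

def apply_panning(samples: np.ndarray, pan: float) -> np.ndarray:
    """Pan a stereo signal of shape (frames, 2), pan in [-1, 1].

    Linear law (assumed): full left (-1) mutes the right channel,
    full right (+1) mutes the left, 0 leaves both untouched.
    """
    left_gain = min(1.0, 1.0 - pan)
    right_gain = min(1.0, 1.0 + pan)
    out = samples.astype(np.float32, copy=True)
    out[:, 0] *= left_gain   # one slice multiplication per channel
    out[:, 1] *= right_gain
    return out
```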
Pitch
- Current solution: `pydub.AudioSegment._spawn` -> `audioop.ratecv`
- Proposed solution: `libsamplerate`
- Reason: There are entire libraries dedicated to resampling audio while retaining quality, some with the goal of real-time processing (e.g. OpenAL), others not (e.g. librosa etc.). But `audioop` is miserable at this.
This article presents a comparison between a few of them. In my own research, I've concluded that `resampy` and `samplerate` excel at this. `resampy` uses `scipy` and `numba` to accelerate processing, while `samplerate` wraps the widely-known "Secret Rabbit Code" (libsamplerate), implemented in C, using pybind11 to interface with it directly (meaning: it is FAST). There's also `librosa` with its `resample` function, though its overhead is much larger; and `scipy.signal.resample`, but I'd rather not include the entirety of `scipy` to use one function out of it :D
Here is an implementation using `libsamplerate`, which should be ported here. The implementation prior to this commit used the real-time API to process slices of each playing sound on demand, but our implementation here doesn't need that -- it's literally one function call, no callbacks or any of that monstrosity.
Order of operations
When this package was made, it was assumed that resampling (necessary to apply pitch) would be the most computationally-expensive operation, since it requires running costly signal interpolation filters.
That would most likely be true if the other operations (panning and volume) were optimized as much as they could be, since they consist entirely of basic array multiplications -- but in their current state, they aren't. To take advantage of this (non-)fact, the implementation applies pitch (resampling) first, then caches the result to reuse it when applying panning and velocity. Since those are simple multiplication operations, they weren't expected to take long; alas, here we are.
Here's the bit of code that does this:
Lines 155 to 209 in 8b6f4a1
```python
last_ins = None
last_key = None
last_vol = None
last_pan = None

for note in sorted_notes:
    ins = note.instrument
    key = note.key
    vol = note.velocity
    pan = note.panning

    if ins != last_ins:
        last_key = None
        last_vol = None
        last_pan = None

        try:
            sound1 = self._instruments[note.instrument]
        except KeyError:  # Sound file missing
            if not ignore_missing_instruments:
                custom_ins_id = ins - self._song.header.default_instruments
                instrument_data = self._song.instruments[custom_ins_id]
                ins_name = instrument_data.name
                ins_file = instrument_data.file
                raise MissingInstrumentException(
                    f"The sound file for instrument {ins_name} was not found: {ins_file}"
                )
            else:
                continue

        if sound1 is None:  # Sound file not assigned
            continue

        sound1 = audio.sync(sound1)

    if key != last_key:
        last_vol = None
        last_pan = None
        pitch = audio.key_to_pitch(key)
        sound2 = audio.change_speed(sound1, pitch)

    if vol != last_vol:
        last_pan = None
        gain = audio.vol_to_gain(vol)
        sound3 = sound2.apply_gain(gain)

    if pan != last_pan:
        sound4 = sound3.pan(pan)

    sound = sound4
    last_ins = ins
    last_key = key
    last_vol = vol
    last_pan = pan
```
So the slowness of the panning and gain functions is amplified by this design decision. After implementing the other optimizations, it's wise to check whether these avoidances are working as intended and really reducing the export time (as opposed to applying all operations to all notes). Although, I believe its potential will really shine when resampling becomes the most costly operation, as originally expected.
Summary
All of the operations to be replaced were already implemented in a past version of the NewNBS audio engine, before OpenAL was used. Their respective source code was presented here in each section, so it's only a matter of bringing the implementations here.
Finally, here's the entire history of the `audio.py` module -- it's so precious to see how many iterations we went through just to land on OpenAL at the end!! The good thing is, we can use everything we learned there to make audio processing more efficient here, so it's a win-win :)
With these implementations, I estimate `nbswave` can export up to 60–80% faster than it can now. :)
### Tasks
- [x] Optimize pitch (resampling) via `samplerate` package
- [x] Optimize panning with `numpy`
- [x] Optimize volume (gain) with `numpy`
- [x] Optimize sound loading via `soundfile` package
- [x] Consider removing `pydub` as project dependency