consider non-interleaved audio buffers #128

lnihlen · 2020-05-27T02:02:12Z

The current realtime audio ingestion system imports interleaved data from PortAudio. Each sample frame is assumed to be 2 floats (stereo-only for now), and the frames are uploaded to the GPU as packed 2D vectors in a single image that is as wide as the number of sample frames in the buffer and 1 pixel tall.

This should be fast, in that the CPU only has to copy the data out of the buffer and into GPU memory without doing any per-sample manipulation. But it is inflexible, in that certain channel counts won't work well across all GPUs. For instance, the Vulkan Hardware Database shows that signed 32-bit float formats with one, two, or four components are supported on essentially 100% of hardware, but sampling from three-component formats is unsupported on almost three quarters of the hardware listed. This means that at most we could build a system that can ingest 1, 2, or 4 channels of audio only. Scaling beyond 4 channels would require uploading additional texture images.
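To make the packing constraint concrete, here is a minimal sketch of how an arbitrary channel count could be split across packed images if only 1-, 2-, and 4-component formats can be relied on. The function name `plan_packed_images` is hypothetical and is not part of Scintillator; it only illustrates the arithmetic.

```python
def plan_packed_images(channel_count):
    """Split channel_count into per-image component counts.

    Only 1-, 2-, and 4-component images are used, since 3-component
    float formats are assumed to be unavailable on much hardware.
    Hypothetical helper for illustration only.
    """
    if channel_count < 1:
        raise ValueError("channel_count must be at least 1")
    plan = []
    remaining = channel_count
    # Greedily use the widest supported packing first.
    for width in (4, 2, 1):
        while remaining >= width:
            plan.append(width)
            remaining -= width
    return plan


if __name__ == "__main__":
    for channels in (1, 2, 3, 6, 7):
        print(channels, plan_packed_images(channels))
```

Under these assumptions, 3 channels already need two images (a 2-component and a 1-component one), and every channel count above 4 needs at least two.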

Furthermore, it seems from this proposal on PortAudio that interleaved data may not always be how the underlying hardware provides the data to PortAudio, so the library may be interleaving the data manually.

An alternative would be to upload the samples de-interleaved, as single floats in an image that is as wide as the number of sample frames in the buffer and as tall as the number of channels, with one row per channel.
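As a rough sketch of what that de-interleaved layout looks like, the following turns an interleaved buffer into one row per channel and then flattens it row-major, the way a single-component image of that size would be filled. The names `deinterleave` and `to_planar_image` are made up for illustration; the real code would work on the PortAudio buffer and GPU staging memory directly.

```python
def deinterleave(interleaved, channel_count):
    """Return one list of samples per channel from an interleaved buffer."""
    if channel_count < 1:
        raise ValueError("channel_count must be at least 1")
    if len(interleaved) % channel_count != 0:
        raise ValueError("buffer length is not a whole number of frames")
    # Channel c owns every channel_count-th sample, starting at offset c.
    return [list(interleaved[c::channel_count]) for c in range(channel_count)]


def to_planar_image(interleaved, channel_count):
    """Return (width, height, texels) for a single-float image.

    width is the number of sample frames, height is the channel count,
    and texels is the row-major sample data, one row per channel.
    """
    rows = deinterleave(interleaved, channel_count)
    width = len(rows[0])
    texels = [sample for row in rows for sample in row]
    return width, channel_count, texels


if __name__ == "__main__":
    # Three stereo frames: L0 R0 L1 R1 L2 R2
    stereo = [0.1, -0.1, 0.2, -0.2, 0.3, -0.3]
    print(to_planar_image(stereo, 2))
```

The cost compared to the current approach is the per-sample copy on the CPU, unless the audio backend can hand over non-interleaved buffers to begin with.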

As Scintillator is a video synth it's arguable that audio import, for visualization, doesn't need to be as sophisticated or flexible as SuperCollider. But it's also arguable that Scintillator should be able to consume and do something useful with any audio data that SuperCollider is capable of producing. And it is certainly the case that SuperCollider can produce very high channel count audio output. So it follows that Scintillator should also be able to handle these as inputs.

It might be best to expose to the log what the native API on the other side of PortAudio is providing, or is capable of providing, and offer the Scintillator user the option of requesting either interleaved audio with a fixed set of supported channel counts or de-interleaved audio with an arbitrary number of channels. Perhaps the system by default could choose the one requiring the least processing power. Or perhaps FFmpeg audio decode and audio output will obviate the choice.
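One possible shape for that default, purely as a sketch: prefer the interleaved path only when the native API already delivers interleaved data and the channel count fits a packed format, and fall back to de-interleaved otherwise. The function `choose_layout` and its parameters are hypothetical, and whether the native layout can be queried through PortAudio at all is an assumption here.

```python
def choose_layout(native_interleaved, channel_count, packed_widths=(1, 2, 4)):
    """Pick "interleaved" or "planar" for the upload path.

    native_interleaved: assumed to be known from the host audio API.
    packed_widths: component counts assumed safe to sample on the GPU.
    Hypothetical policy for illustration, not Scintillator behavior.
    """
    if channel_count < 1:
        raise ValueError("channel_count must be at least 1")
    if native_interleaved and channel_count in packed_widths:
        # Straight copy into a packed image, no per-sample work.
        return "interleaved"
    # Either the data arrives per-channel already, or the channel
    # count cannot be packed into one supported format.
    return "planar"


if __name__ == "__main__":
    print(choose_layout(True, 2))
    print(choose_layout(True, 3))
    print(choose_layout(False, 2))
```

A user-facing option could simply override the result of this choice.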

This is probably also worth waiting for some user feedback on, so opinions welcome here!

lnihlen added this to To Do in Media Workstream via automation Jul 29, 2020

lnihlen added the enhancement label (New feature or request) Jul 29, 2020
