
# Decoding audio

## Background

Before you can play audio files via the AudioContext, the audio files need to be converted to the format that the AudioContext uses for internal processing. This format is non-interleaved 32-bit linear PCM: every audio sample is a float value between -1.0 and 1.0, and the samples are stored per channel.
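To illustrate the format, here is a minimal sketch that creates a one-second mono AudioBuffer by hand and fills it with float samples between -1.0 and 1.0 (the 440 Hz sine is just an arbitrary example signal; getChannelData is explained further down):

```javascript
// Sketch: the internal format of the AudioContext, illustrated by creating
// a 1-second mono AudioBuffer by hand and filling it with float samples.
var audioCtx = new AudioContext();
var sampleRate = audioCtx.sampleRate;
// 1 channel, 1 second worth of samples
var buffer = audioCtx.createBuffer(1, sampleRate, sampleRate);
// non-interleaved: one Float32Array per channel
var channel = buffer.getChannelData(0);
for (var i = 0; i < channel.length; i++) {
  // every sample is a float between -1.0 and 1.0; here a 440 Hz sine wave
  channel[i] = Math.sin(2 * Math.PI * 440 * i / sampleRate);
}
```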

Linear PCM, or LPCM, is a form of PCM sampling; more information can be found on this Wikipedia page.

You can convert an audio file to the required format with AudioContext.decodeAudioData(buffer). The buffer argument is the audio file as an ArrayBuffer. If you load the audio file via an XMLHttpRequest, you have to set the response type to arraybuffer. The decoding process produces an instance of AudioBuffer, which is passed to the success callback (newer implementations also return a promise that resolves with the AudioBuffer).
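A minimal sketch of loading and decoding a file; the url audio/loop.wav is just a placeholder for your own file, and the callback form of decodeAudioData is used because it works in all browsers that support the Web Audio API:

```javascript
var audioCtx = new AudioContext();

var request = new XMLHttpRequest();
request.open('GET', 'audio/loop.wav', true);
// the response has to be an ArrayBuffer, otherwise decodeAudioData can't use it
request.responseType = 'arraybuffer';

request.onload = function() {
  audioCtx.decodeAudioData(request.response, function(audioBuffer) {
    // audioBuffer is an instance of AudioBuffer, ready for playback
    console.log(audioBuffer.numberOfChannels, audioBuffer.length, audioBuffer.sampleRate);
  }, function(error) {
    console.error('decoding failed', error);
  });
};
request.send();
```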

An AudioBuffer stores the samples per channel in a Float32Array. You can access the Float32Array of a specific channel with AudioBuffer.getChannelData(index). If your file is mono, the only valid channel index is 0. If your file is stereo, channel index 0 retrieves the samples of the left channel and channel index 1 retrieves the samples of the right channel.
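For example, a small sketch that reads the first decoded sample of each channel (logChannels is a hypothetical helper name):

```javascript
// Sketch: reading the decoded samples per channel from an AudioBuffer.
function logChannels(audioBuffer) {
  var left = audioBuffer.getChannelData(0);   // Float32Array of channel 0
  console.log('first left sample:', left[0]); // a float between -1.0 and 1.0
  if (audioBuffer.numberOfChannels > 1) {
    var right = audioBuffer.getChannelData(1); // channel 1 only exists for stereo
    console.log('first right sample:', right[0]);
  }
}
```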

## Compressed audio vs uncompressed audio

A 16-bit wav file stores the audio samples as interleaved 16-bit PCM samples. Interleaved means that the first sample in the file is the first sample of the left channel and the second sample is the first sample of the right channel; the third sample is the second sample of the left channel, the fourth sample is the second sample of the right channel, and so on. The samples are stored as integer values between -32,768 and 32,767.

Converting a wav file to the format that the AudioContext requires is quite straightforward: the samples of each channel need to be stored in a separate Float32Array of the AudioBuffer, and the value of each sample needs to be remapped to a value between -1.0 and 1.0.
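As a sketch, assuming the raw 16-bit sample data of a stereo wav file (without the header) is already available as an Int16Array, the conversion could look like this:

```javascript
// Sketch: converting interleaved 16-bit stereo samples (as stored in a wav
// file) to the two Float32Arrays that an AudioBuffer uses. The Int16Array
// 'interleaved' is assumed to contain the raw sample data, without the header.
function deinterleave(interleaved) {
  var numFrames = interleaved.length / 2;
  var left = new Float32Array(numFrames);
  var right = new Float32Array(numFrames);
  for (var i = 0; i < numFrames; i++) {
    // remap from [-32768, 32767] to [-1.0, 1.0]
    left[i] = interleaved[i * 2] / 32768;
    right[i] = interleaved[i * 2 + 1] / 32768;
  }
  return [left, right];
}
```

Dividing by 32,768 maps -32,768 exactly to -1.0 and 32,767 to just under 1.0.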

Since a wav file, like an AudioBuffer, stores every single sample uncompressed, the file size of a wav file is of the same order of magnitude as the memory used by the AudioBuffer once the file has been loaded and decoded in your browser. A 16-bit wav roughly doubles in size, because each 16-bit integer sample becomes a 32-bit float.

Converting compressed formats like mp3 and ogg is a bit more complex: whereas a wav file already contains every sample in almost the required format, for mp3 and other compressed formats every sample has to be reconstructed by the decoder. For more information about decoding mp3, see this blog.

Therefore, decoding compressed formats takes much longer than decoding uncompressed formats. In fact, when you decode a wav file hardly any decoding takes place, because a wav file is essentially an AudioBuffer that is organized slightly differently.

In this example you can see how long it takes to decode a file in wav, ogg and mp3 format. As expected, wav decodes the fastest, then ogg, then mp3. Note that the time it takes to decode a compressed file also depends on the encoder that was used and on the encoding settings.
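A small sketch of how such a measurement could be done, assuming the file has already been loaded as an ArrayBuffer as shown earlier:

```javascript
// Sketch: measuring how long decodeAudioData takes for a given ArrayBuffer.
function timeDecode(audioCtx, arrayBuffer) {
  var start = performance.now();
  audioCtx.decodeAudioData(arrayBuffer, function(audioBuffer) {
    console.log('decoded in', (performance.now() - start).toFixed(1), 'ms');
  });
}
```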

You can find more information about audio and the AudioContext on this MDN page.
