ADPCM Loop Optimizations #105

Merged
merged 3 commits into from Oct 23, 2018

mzgoddard
Contributor

@mzgoddard mzgoddard commented Oct 22, 2018

Depends on #104

Optimize ADPCMSoundDecoder's decompression loop by taking advantage of knowing the full size of the waveform up front.

We can precompute and reuse the 1424 possible deltas (89 steps × 16 codes). Looking them up saves a lot of time compared with recomputing a delta for every one of the tens of thousands to millions of samples some files contain.

Knowing the full size of the waveform, we can unroll the current loop so that the block header is decompressed at a fixed point, followed by a loop over the remaining samples in the block. With the if/else block unrolled, we can quickly decode each byte into its two samples.

Performance

| Project - System | Before (seconds) | After #104 (seconds) | After #104 + #105 (seconds) | Improvement |
| --- | --- | --- | --- | --- |
| 130041250 - 2017 MBP 15" Chrome | 0.74 | 0.191 | 0.076 | 873.68% |
| 130041250 - 2017 MBP 15" Firefox | 2.446 | 0.168 | 0.054 | 4429.62% |
| 130041250 - 2017 SMS Chromebook Plus | 2.331 | 0.399 | 0.168 | 1287.49% |
| 130041250 - 2018 RPi B+ | 40 | 2.597 | 1.592 | 2412.56% |

Memory

This doesn't really have an effect on memory.

The extracted children can refer to their parent's typed array views and
buffer, which avoids memory copies that take a lot of time to create and
memory to hold. Some time can also be saved by using the same Uint8Array
for reading both Uint8 values and strings.
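
A minimal sketch of the shared-view idea, assuming a stream class shaped like the constructor excerpt quoted in the review further down. The class name `SharedViewStream` and the methods `extract`, `readUint8`, and `readString` are hypothetical names for illustration, not necessarily the project's API.

```javascript
// Hypothetical stream class: children reuse the parent's buffer and
// Uint8Array view instead of copying bytes into a new buffer.
class SharedViewStream {
    constructor (arrayBuffer, start = 0, end = arrayBuffer.byteLength, {
        _uint8View = new Uint8Array(arrayBuffer)
    } = {}) {
        this.arrayBuffer = arrayBuffer;
        this.start = start;
        this.end = end;
        this._uint8View = _uint8View;
        this.position = start;
    }

    // Return a child stream covering the next `length` bytes. No bytes are
    // copied: the child only gets new start/end offsets.
    extract (length) {
        const child = new SharedViewStream(
            this.arrayBuffer, this.position, this.position + length,
            {_uint8View: this._uint8View}
        );
        this.position += length;
        return child;
    }

    // Read one unsigned byte through the shared view.
    readUint8 () {
        return this._uint8View[this.position++];
    }

    // Read an ASCII string through the same shared view.
    readString (length) {
        let str = '';
        for (let i = 0; i < length; i++) {
            str += String.fromCharCode(this._uint8View[this.position + i]);
        }
        this.position += length;
        return str;
    }
}
```

Because every child holds the same view object, extracting a chunk costs a constant amount of work regardless of the chunk's size.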

The number of samples in the ADPCM data can be known once the data chunk
is extracted and the block size is known. From there the audio buffer can
be created and its channel data passed to the decompress method. A lot
of time is saved by writing to the channel data directly instead of
writing to one array, copying that to another array, and then finally
copying to the channel data. A surprising amount of time is saved by
making one getChannelData call instead of calling it to store each sample.
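
The sample count can be sketched as below. This assumes a mono layout in which every block starts with a 4-byte header that yields one sample and every remaining byte yields two; that layout, and the name `countSamples`, are assumptions for illustration rather than something stated in this thread.

```javascript
// Hypothetical helper: number of decoded samples in `dataLength` bytes of
// ADPCM data split into blocks of `blockSize` bytes.
// Assumes a 4-byte block header (one sample) and two samples per data byte.
const HEADER_BYTES = 4;

function countSamples (dataLength, blockSize) {
    const samplesPerBlock = 1 + (2 * (blockSize - HEADER_BYTES));
    const fullBlocks = Math.floor(dataLength / blockSize);
    let total = fullBlocks * samplesPerBlock;

    // A trailing partial block only contributes if its header is complete.
    const remainder = dataLength % blockSize;
    if (remainder >= HEADER_BYTES) {
        total += 1 + (2 * (remainder - HEADER_BYTES));
    }
    return total;
}
```

With the count known, a buffer of exactly that length can be created with the Web Audio API's `createBuffer`, and a single `getChannelData(0)` call returns the `Float32Array` that the decoder writes into directly.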

There are 1424 possible deltas (89 steps × 16 codes). We can compute
them all once, quickly, and reuse them to save time.
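
A sketch of such a lookup table follows. The names `STEP_TABLE` and `INDEX_TABLE` appear in the review excerpt below, but their values here are the standard IMA ADPCM tables, and the delta arithmetic is the usual IMA formula; both are assumptions about what the decoder uses. `buildDeltaTable` is a hypothetical name.

```javascript
// Standard IMA ADPCM step sizes (89 entries); assumed, not quoted from the PR.
const STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
];

// Standard IMA ADPCM index adjustments, one per 4-bit code (16 entries).
const INDEX_TABLE = [
    -1, -1, -1, -1, 2, 4, 6, 8,
    -1, -1, -1, -1, 2, 4, 6, 8
];

// Hypothetical helper: precompute the delta for every (step index, code)
// pair so the decode loop can do one array lookup per sample.
function buildDeltaTable () {
    const numSteps = STEP_TABLE.length;
    const numCodes = INDEX_TABLE.length;
    const table = new Array(numSteps * numCodes).fill(0);
    for (let index = 0; index < numSteps; index++) {
        const step = STEP_TABLE[index];
        for (let code = 0; code < numCodes; code++) {
            let delta = step >> 3;
            if (code & 4) delta += step;
            if (code & 2) delta += step >> 1;
            if (code & 1) delta += step >> 2;
            // Bit 3 of the code is the sign bit.
            table[(index * numCodes) + code] = (code & 8) ? -delta : delta;
        }
    }
    return table;
}
```

In the decode loop, a delta is then read as `table[(index * 16) + code]` rather than being recomputed with four conditionals per sample.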

Knowing the exact size of the waveform, we can restructure the
decompression loop to take advantage of it. The block header
decompression goes first in the outer while loop, followed by an inner
loop with the two-samples-per-byte decompression unrolled. The first
sample reads a byte from the stream and the second uses that byte's
other 4 bits.
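
The loop shape described above can be sketched roughly as follows. This is an illustration under several assumptions, not the merged code: a 4-byte block header (little-endian 16-bit sample, one index byte, one padding byte), the low 4 bits decoded before the high 4 bits, and the standard IMA tables. For brevity the delta is computed inline in a hypothetical `applyCode` helper instead of being read from a precomputed table.

```javascript
// Standard IMA ADPCM tables; assumed, not quoted from the PR.
const STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
];
const INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8, -1, -1, -1, -1, 2, 4, 6, 8];

// Hypothetical decoder: writes floats straight into `out` (for example the
// Float32Array from getChannelData) and returns how many it wrote.
function decodeAdpcm (bytes, blockSize, out) {
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    let pos = 0; // read position in the compressed bytes
    let i = 0; // write position in the output
    let sample = 0;
    let index = 0;

    // Apply one 4-bit code to the running sample and step index.
    const applyCode = code => {
        const step = STEP_TABLE[index];
        let delta = step >> 3;
        if (code & 4) delta += step;
        if (code & 2) delta += step >> 1;
        if (code & 1) delta += step >> 2;
        if (code & 8) delta = -delta;
        index = Math.min(88, Math.max(0, index + INDEX_TABLE[code]));
        sample = Math.min(32767, Math.max(-32768, sample + delta));
        return sample / 32768;
    };

    while (i < out.length && pos + 4 <= bytes.length) {
        // Block header first: initial sample, step index, padding byte.
        sample = view.getInt16(pos, true);
        index = Math.min(88, view.getUint8(pos + 2));
        pos += 4;
        out[i++] = sample / 32768;

        // Then the rest of the block: two samples per byte, unrolled.
        const blockEnd = Math.min(pos + blockSize - 4, bytes.length);
        while (pos < blockEnd && i < out.length) {
            const byte = bytes[pos++];
            out[i++] = applyCode(byte & 0xF);
            if (i < out.length) out[i++] = applyCode(byte >> 4);
        }
    }
    return i;
}
```

Because `out` is sized in advance, the inner loop never needs to check whether it is at a block boundary; that check happens once per block in the outer loop.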
Collaborator

@ericrosenbaum ericrosenbaum left a comment

LGTM! Just waitin' on Travis.

if (_deltaTable === null) {
    const NUM_STEPS = STEP_TABLE.length;
    const NUM_INDICES = INDEX_TABLE.length;
    _deltaTable = new Array(NUM_STEPS * NUM_INDICES).fill(0);

This comment was marked as abuse.

This comment was marked as abuse.

constructor (
    arrayBuffer, start = 0, end = arrayBuffer.byteLength,
    {
        _uint8View = new Uint8Array(arrayBuffer)

This comment was marked as abuse.

@kchadha
Contributor

kchadha commented Oct 23, 2018

@ericrosenbaum, @mzgoddard, build is passing now.

@ericrosenbaum ericrosenbaum merged commit 0ee5fb9 into scratchfoundation:develop Oct 23, 2018
5 participants