Streaming music with custom loop points and adjusted pitch? #18

juj · 2016-08-18T08:38:55Z

How can one stream music files while a) setting custom loop points to get seamlessly looping audio and b) adjusting (possibly animating) the pitch of the playback?

Currently one can download the music file, fully decode it with .decodeAudioData(), and then use an AudioBufferSourceNode with its .loopStart, .loopEnd and .playbackRate attributes to specify the loop points and the pitch. However, this consumes 100+ MB of memory and takes a very long time to decode, because everything has to be uncompressed in memory for it to work, so it is not a viable solution.
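For reference, a minimal sketch of the fully decoded approach described above. The function names `startLoopedSource` and `decodedBytes` are illustrative and not from the thread, and the size estimate assumes the decoded buffer stores one 32-bit float per sample per channel.

```javascript
// Sketch only: `ctx` is an AudioContext and `buffer` an AudioBuffer that was
// already fully decoded with ctx.decodeAudioData(). Names are hypothetical.
function startLoopedSource(ctx, buffer, { loopStart, loopEnd, playbackRate = 1 }) {
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.loop = true;
  source.loopStart = loopStart; // seconds into the buffer
  source.loopEnd = loopEnd;     // seconds into the buffer
  source.playbackRate.value = playbackRate; // changes speed and pitch together
  source.connect(ctx.destination);
  source.start();
  return source;
}

// Rough size of a decoded buffer, assuming 4 bytes (float32) per sample.
function decodedBytes(durationSeconds, sampleRate, channels) {
  return Math.ceil(durationSeconds * sampleRate) * channels * 4;
}
```

In a page this might be called as `startLoopedSource(ctx, await ctx.decodeAudioData(encoded), { loopStart: 50, loopEnd: 59 })`. Under the float32 assumption, `decodedBytes(300, 44100, 2)` gives 105,840,000 bytes for a five-minute stereo track, which is in line with the "100+ MB" figure.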

If one uses a MediaElementAudioSourceNode, it is possible to pass the compressed music file directly for playback and loop it, but looping is restricted to the whole file from beginning to end, which is not enough. It is also not possible to adjust the pitch of the audio output (in that case .playbackRate performs pitch correction, which is undesirable).
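A minimal sketch of the streaming alternative, to make its limits concrete. The function name `streamWholeFileLoop` is hypothetical; `audioElement` is assumed to be an `<audio>` element whose `src` already points at the compressed file, and starting playback with `audioElement.play()` is left to the caller.

```javascript
// Sketch only: routes an <audio> element into the Web Audio graph.
// The browser decodes the compressed file incrementally during playback.
function streamWholeFileLoop(ctx, audioElement, playbackRate = 1) {
  audioElement.loop = true;                 // loops the whole file only, no custom loop points
  audioElement.playbackRate = playbackRate; // changes speed; pitch correction applies, as noted above
  const node = ctx.createMediaElementSource(audioElement);
  node.connect(ctx.destination);
  return node;
}
```

There is no counterpart to `.loopStart` or `.loopEnd` on the media element, which is the gap this issue is about.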

If an AudioBufferSourceNode could be populated with compressed audio data, it would solve this use case. The Web Audio API would then decompress the AudioBufferSourceNode contents in a streaming manner during playback, without having to decode the whole clip in advance. Would that sound like a feasible addition?

(Note that the interest here is in streamed decompression, not in streamed download: the compressed audio assets have already been fully downloaded to the page.)

rtoy · 2016-08-18T20:00:12Z

Just want to note that this makes sample-accurate playback fairly difficult. Suppose you have a compressed buffer representing, say, 1 min of audio, and then want to start playback 50 ms from now with loopStart and loopEnd at 50 sec and 59 sec. You would somehow have to locate where in the compressed file the 50 sec mark is, and decode all of that in 50 ms or less. Don't know how feasible that would be.

Constantly decoding the file for playback wastes power too---a common MIPS vs memory tradeoff.

joeberkovitz · 2016-09-22T15:42:41Z

This is related to improving decoding, which has already been deferred to v.next.

bjornm · 2017-03-10T09:16:04Z

Just a comment since we've been prototyping this in pure javascript.

At least for ogg/vorbis files, seeking is a pretty fast operation, especially if you have the whole file in memory (a few ms?). Decompression is also fast (a few cpu %).

The trade-off between memory and MIPS is one that at least we (as developers) would like to be able to make. So a MediaElementAudioSourceNode-like node which is both seekable and precise, exposing start(num time), would definitely add value and seems feasible in terms of performance and stability. You could always make seek a promise-based method so that the developer is aware that it might incur an extra delay and therefore require scheduling further ahead.

juj · 2017-09-01T12:10:54Z

I have created a test suite of different audio files and effects that currently are problematic. You can visit https://github.com/juj/audio_test_suite to find it, or http://clb.demon.fi/audio_test_suite/ to check it out live.

rtoy · 2017-09-01T16:58:41Z

Thanks for the tests.

I'm curious to understand what you expect to happen if you have an encoded audio file sampled at 8 kHz and the audio context sample rate is 44.1 kHz. You want the decoded audio buffer to have a sample rate of 8 kHz? And when I put that in an AudioBufferSourceNode, it gets magically upsampled to 44.1 kHz?

The different bit depths are pretty easy to deal with, except if you use copyToChannel or getChannelData. Not sure what should happen then.

padenot · 2017-09-01T17:06:31Z

Well, per spec, the AudioBuffer needs to be resampled to the context rate. This is easy to test: create an OfflineAudioContext with a rate of 8 kHz, call decodeAudioData on this OfflineAudioContext, but play it back on the AudioContext.

We (Gecko) have implemented lazy conversion from int16 to float32 for decodeAudioData: the data is only converted when you call getChannelData, or chunk by chunk when playing it back using AudioBufferSourceNode, which halves the memory use there.
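To illustrate the idea of lazy conversion, here is a small sketch that keeps samples as 16-bit integers and converts only the chunk that is about to be used. This is not Gecko's code; the function name and the scaling by 32768 are assumptions chosen for illustration.

```javascript
// Sketch only: convert `length` samples starting at `start` from int16 PCM
// to float32 in the range [-1, 1). The int16 storage stays untouched.
function int16ChunkToFloat32(samples, start, length) {
  const end = Math.min(start + length, samples.length);
  const out = new Float32Array(Math.max(0, end - start));
  for (let i = 0; i < out.length; i++) {
    out[i] = samples[start + i] / 32768;
  }
  return out;
}
```

The saving comes from the element sizes: an `Int16Array` uses 2 bytes per sample while a `Float32Array` uses 4, so a buffer that is never converted in full takes half the space.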

rtoy · 2017-09-01T19:41:24Z

Paul Adenot wrote: Well, per spec, the AudioBuffer needs to be resampled to the context rate. This is easy to test: create an OfflineAudioContext with a rate of 8 kHz, call decodeAudioData on this OfflineAudioContext, but play it back on the AudioContext.

And this is also quite a nice workaround for getting the audio buffers to have the desired sample rate: Create an OfflineAudioContext with the desired sample rate, decode the file, and use it in the audio context.
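The workaround described above might look roughly like the following sketch. The function name `decodeAtSampleRate` and its injectable `OfflineCtx` parameter are illustrative assumptions; in a browser the default `OfflineAudioContext` constructor would be used.

```javascript
// Sketch only: decode `encoded` (an ArrayBuffer) so that the resulting
// AudioBuffer has `sampleRate` rather than the rate of the playback context.
function decodeAtSampleRate(encoded, sampleRate, channels = 2,
                            OfflineCtx = globalThis.OfflineAudioContext) {
  // The offline context is used only for decoding and is never rendered,
  // so a length of one frame is enough.
  const offline = new OfflineCtx(channels, 1, sampleRate);
  return offline.decodeAudioData(encoded); // resolves to an AudioBuffer
}
```

The returned buffer can then be assigned to an AudioBufferSourceNode created on the real AudioContext, for example `source.buffer = await decodeAtSampleRate(encoded, 8000)`.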


juj · 2017-09-02T09:42:49Z

You want the decoded audio buffer to have a sample rate of 8 kHz?

Yes, that would be good behavior.

And when I put that in an AudioBufferSourceNode, it gets magically upsampled to 44.1 kHz?

My thinking is that when the 8 kHz audio buffer is played back on a 44.1 kHz context, the graph would upsample it during playback to match the context, but would not upsample the input data. That is, if I have 5 minutes of 8 kHz audio, it should not all be resampled up front into a large 5-minute 44.1 kHz buffer when I add it to the graph, but resampled on the fly while being processed, to keep memory usage from exploding.

And this is also quite a nice workaround for getting the audio buffers to have the desired sample rate: Create an OfflineAudioContext with the desired sample rate, decode the file, and use it in the audio context.

This is a nice idea, but it has the problem that one first needs to know the sample rate of the source file, since the API offers no way to say "give me whatever the input file was". In practice one will need to pull in bits of decoder libraries that can at least parse the file headers to extract the sampling rate, or have some "side channel" knowledge where that information has been carried elsewhere (like in the filenames in the test suite).

I think overall the web needs something that has the amazing power of the Web Audio API graph but is, at the same time, extremely mindful of memory usage, so that if one had an 8 kHz/8-bit input audio file, one would not have to take it to 44.1 kHz/32-bit float. The most extreme example in the test suite is the file 8bit_detective_8000hz_8kbs_mono_lame3.99.mp3, which is 29664 bytes on disk but needs 5544576 bytes of memory to play back, a 186.91x blowup. Of course that is an extreme case, but it would be well worth supporting it as well.
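The arithmetic behind these figures can be checked with a short sketch. The helper names are illustrative; the per-second comparison assumes mono audio, 4 bytes per float32 sample and 1 byte per 8-bit sample.

```javascript
// Ratio of decoded size in memory to compressed size on disk.
function blowupFactor(compressedBytes, decodedBytes) {
  return decodedBytes / compressedBytes;
}

// Bytes of PCM needed per second of audio for a given format.
function pcmBytesPerSecond(sampleRate, bytesPerSample, channels = 1) {
  return sampleRate * bytesPerSample * channels;
}

const factor = blowupFactor(29664, 5544576);
const formatRatio = pcmBytesPerSecond(44100, 4) / pcmBytesPerSecond(8000, 1);
console.log(factor.toFixed(2)); // prints "186.91"
console.log(formatRatio);       // prints 22.05
```

Under those assumptions, keeping the decoded data at 8 kHz/8-bit instead of 44.1 kHz/float32 would need about 22 times less memory for the same duration.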

rtoy · 2017-09-02T16:29:54Z

juj wrote: You want the decoded audio buffer to have a sample rate of 8 kHz? Yes, that would be good behavior. And when I put that in an AudioBufferSourceNode, it gets magically upsampled to 44.1 kHz? My thinking is that when the 8kHz audio buffer then is being played back on a context that is 44.1 kHz, the graph would upsample it during playback to match the context, but not upsample the input data, i.e. if I have 5 minutes of 8kHz audio, it should not get up front all resampled to a large 5-minute 44.1 kHz buffer when I add it to graph, but on the fly when being processed, to avoid memory usage from exploding.

Just want to add a bit of history. In the original WebKit implementation, the internal decode function had a parameter to specify whether to resample or not. This was always set to resample. I think Chris Rogers's final decision was that AudioBufferSource should be fast, and resampling on the fly doesn't help with that. In addition, the resampler in decodeAudioData is a very high quality sinc resampler suitable for all sample rates. We can be slow here because decodeAudioData is not part of the rendering graph. In WebKit's AudioBufferSource, very simple linear interpolation is used for resampling. This is terrible if the sample rates differ a lot, but is very fast. Doing a good (or OK) quality resampler in the AudioBufferSource with dynamic loops, loop points and playbackRate is pretty hard to get right.
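For readers unfamiliar with the technique, linear-interpolation resampling can be sketched as below. This is a generic illustration, not WebKit's implementation; the function name is hypothetical, and it ignores loops, loop points and changing playback rates, which is where the difficulty mentioned above comes from.

```javascript
// Sketch only: resample `input` (array of samples at `srcRate`) to `dstRate`
// by linearly interpolating between the two nearest source samples.
function resampleLinear(input, srcRate, dstRate) {
  const outLength = Math.floor(input.length * dstRate / srcRate);
  const out = new Float32Array(outLength);
  const step = srcRate / dstRate; // source samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1); // clamp at the last sample
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}
```

Each output sample costs two reads and a couple of multiplications, which is why the approach is fast, but it applies no low-pass filtering, so quality suffers when the two rates are far apart.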

And this is also quite a nice workaround for getting the audio buffers to have the desired sample rate: Create an OfflineAudioContext with the desired sample rate, decode the file, and use it in the audio context. This is a nice idea, but it has the problem that one will need to first know what the sample rate of the source file is, since it does not allow to "give me whatever the input file was", so in

I was kind of assuming it was "your" application so you know exactly how you've encoded the files. If you're allowing files from any source, then, yeah, you have a problem.

practice one will need to pull in bits of decoder libraries that would at least be able to parse the headers of the files to be able to pull out the sampling rate, or have some "side channel" knowledge where one has somehow carried that information elsewhere (like in the filenames in the test suite). I think overall the web needs something that has the amazing power that Web Audio API graph has, but also at the same time, is extremely mindful about memory usage, so if one had a 8Khz/8-bit samples input audio file, it'd be great to not have to take that to 44.1kHz/32-bit float. The most extreme example in the test suite is that the size of file 8bit_detective_8000hz_8kbs_mono_lame3.99.mp3 is 29664 bytes on disk, but to play it back, one will need 5544576 bytes of memory, a 186.91x blowup. Of course that is extremely exaggerating, but it would be well worth to support this case as well.

Do you have an actual application that uses so much memory that it won't work on, say, a low end Android device with 512 MB of memory? I'd love to see such an application. An actual application is far more convincing to me than simple examples that illustrate the issue that we already knew existed.


juj · 2017-09-03T10:30:29Z

Here are some recent examples, in no particular order:

Dead Trigger 2: 271.21 MB (22.74%) of total application memory usage is wasted on uncompressing audio for looping
Total War Battles: KINGDOM: 110.54 MB (09.03%) of total application memory usage is wasted on uncompressing audio for looping
Angry Bots: 80.79 MB (16.05%) of total application memory usage is wasted on uncompressing audio for looping
AAaaa..!! for the Awesome: 548.24 MB (44.76%) of total application memory usage wasted on uncompressing audio for looping
StrategyGame: 59.57 MB (10.57%) of application memory spent on decompressing background music for seamless looping
EVERYDAYiPLAY Heroes of Paragon demo: 42.65 MB (03.95%) spent on Web Audio, and this is just one level demo with only one music file.
Ski Safari: 186.29 MB (21.53%) of total application memory wasted on uncompressed audio
AdVenture Capitali$t: In main menu already 69.76 MB (11.71%) spent on uncompressed audio
PlatformerGame: 60.19 MB (07.75%) spent on uncompressing Web Audio
Pretty much any HTML5 WebAudio-based game at y8.com that has music, e.g. Zombie Derby 2 is already at 95.29 MB (12.89%) of audio-related memory usage from the seamlessly looping music clip in the main menu alone.

In addition a number of games are using hacks to work around the issue. The two most popular approaches to avoid Web Audio memory usage explosion are:

Candy Crush Jelly Saga: Compiles its own audio codec to WebAssembly and streams audio from there, but this causes audio stuttering when the browser GCs. The upcoming AudioWorklet + SharedArrayBuffer + WebAssembly combination will possibly mitigate this stuttering, but as mentioned in "Remove AudioContext.decodeAudioData() from the web" (web-audio-api#1305), this will provide an easy solution only for Emscripten-based applications, and will not fix audio memory usage on the web at large.
Zen Garden: Seamlessly looping background audio would have cost ~150 MB of application memory, so the developers simply settled on "no looping audio on the web": the audio track was adjusted to fade out and in at the end and is played back via <audio>, i.e. the developers did not get the effect they wanted. Epic Games' Epic Citadel demo also used this approach.

rtoy · 2017-09-05T15:36:56Z

Thanks for the links. This is the kind of information I was looking for.

joeberkovitz · 2017-11-07T23:12:29Z

This also has connections to streamed decoding as per WebAudio/web-audio-api#337

padenot · 2019-06-25T23:04:34Z

We're working on https://discourse.wicg.io/t/webcodecs-proposal/3662 which, along with AudioWorklet and SharedArrayBuffer, will provide everything that is needed to implement this, like a native developer would do it.
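One piece an application would have to write itself in such a setup is the bookkeeping for loop points while decoding in chunks. The sketch below shows only that bookkeeping, under the assumption that positions are integer frame indices; the function name `planLoopedRead` is hypothetical, and the actual decoding and the transfer of samples to the audio thread are not shown.

```javascript
// Sketch only: given the current read position (in frames), work out which
// source frame ranges must be decoded to produce `count` output frames when
// playback wraps from `loopEnd` back to `loopStart`.
function planLoopedRead(position, count, loopStart, loopEnd) {
  if (!(loopEnd > loopStart)) {
    throw new RangeError("loopEnd must be greater than loopStart");
  }
  const ranges = [];
  let pos = position;
  let remaining = count;
  while (remaining > 0) {
    if (pos >= loopEnd) pos = loopStart; // wrap before reading further
    const n = Math.min(remaining, loopEnd - pos);
    ranges.push({ start: pos, end: pos + n }); // end is exclusive
    pos += n;
    remaining -= n;
  }
  return { ranges, nextPosition: pos >= loopEnd ? loopStart : pos };
}
```

For example, `planLoopedRead(90, 30, 20, 100)` yields the ranges 90 to 100 and 20 to 40, with the next read starting at frame 40. Only those ranges need to be decoded, so memory use depends on the chunk size rather than on the length of the track.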

mdjp · 2019-09-17T00:59:14Z

Under consideration for V2, requires further engagement with developers.

padenot · 2020-06-15T14:50:13Z

Virtual F2F:

WebCodecs is a thing now (and being actively worked on at an implementation and standardization level), and pitch shifting is #14 (and needed for other things)
It's better to have a solution by way of composition of those two than to have something too rigid

mdjp transferred this issue from WebAudio/web-audio-api Sep 17, 2019

hoch added this to Under consideration in V2 Sep 17, 2019

padenot closed this as completed Jun 15, 2020

V2 automation moved this from Under consideration to Done Jun 15, 2020

juj mentioned this issue Sep 24, 2021

Decoding mp3/ogg/aac to fix Web Audio API .decodeAudioData() shortcomings? w3c/webcodecs#366

Closed
