
AudioBufferSourceNode: How to interpolate after last sample? #2032

Closed
collares opened this issue Aug 20, 2019 · 12 comments

@collares

Describe the issue
The spec requires interpolation for playhead positions not corresponding to sampled times. Therefore, after each sample, there is an interval of duration 1/buffer.sampleRate corresponding to valid playhead positions which must be interpolated. This is also true for the interval after the last sample, but the spec doesn't specify how to produce the interpolated values for this region (since there is no next point to interpolate with).

This affects existing WPT tests: buffer-resampling.html seems to assume that the value of the last sample should be used for the whole interval, but another option is to interpolate with silence.

Where Is It
https://webaudio.github.io/web-audio-api/#playback-AudioBufferSourceNode -- more precisely, "Sub-sample start offsets or loop points may require additional interpolation between sample frames" does not cover the last interval described above, because it is not between sample frames.

Additional Information
This is not related to looping, and in fact only applies when looping is disabled.
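The ambiguity can be seen in a minimal linear-interpolation sketch (the function `sampleAt` and the hold-last-sample placeholder are mine, not from the spec): every fractional position between two frames is well defined, but the interval after the last frame has no second endpoint to interpolate toward.

```javascript
// Sketch: linear interpolation of a fractional playhead position into an
// AudioBuffer channel. Positions in the interval after the last frame are
// valid playhead positions, but there is no frame i+1 to interpolate with --
// the gap this issue describes.
function sampleAt(channelData, position /* in frames, may be fractional */) {
  const i = Math.floor(position);
  const frac = position - i;
  if (i < 0 || i >= channelData.length) return 0; // outside the buffer
  if (i === channelData.length - 1) {
    // Unspecified region: no frame i+1 exists. Options discussed in this
    // thread include interpolating toward 0 or extrapolating.
    return channelData[i]; // placeholder: hold the last sample value
  }
  return channelData[i] * (1 - frac) + channelData[i + 1] * frac;
}
```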

@rtoy
Member

rtoy commented Aug 22, 2019

Teleconf: Firefox and Chrome pass the buffer-resampling test, so we should spec that behavior.

@hoch hoch moved this from Untriaged to Ready for Editing in V1 Aug 22, 2019
@karlt
Contributor

karlt commented Aug 22, 2019

This affects existing WPT tests: buffer-resampling.html seems to assume that the value of the last sample should be used for the whole interval, but another option is to interpolate with silence.

buffer-resampling.html doesn't expect that the value of the last sample is used for the whole subsequent interval.

It expects that interpolation after the last sample of one buffer is consistent with the interpolation before the first sample of the subsequent (adjacent) buffer.

Interpolation after the last sample should assume that the imaginary next sample in the buffer is zero, just like interpolation before the first sample should assume that the imaginary previous sample in the buffer was zero. i.e. it would be correct to interpolate with silence.
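This suggestion could be sketched as follows (the function name `sampleAtZeroPadded` is mine, and it assumes plain linear interpolation): both buffer edges are treated symmetrically by padding with an imaginary zero sample.

```javascript
// Sketch of the zero-padding idea: treat the imaginary samples at frames -1
// and length as zero, so interpolation is consistent at both buffer edges.
function sampleAtZeroPadded(channelData, position) {
  const i = Math.floor(position);
  const frac = position - i;
  if (i < -1 || i >= channelData.length) return 0;
  // Out-of-range frames read as zero.
  const get = (n) => (n >= 0 && n < channelData.length) ? channelData[n] : 0;
  return get(i) * (1 - frac) + get(i + 1) * frac;
}
```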

@collares
Author

collares commented Aug 22, 2019

I understand where your intuition comes from, but there is an inherent asymmetry: Whenever playback starts, it starts at least from the first sample in the buffer (maybe a bit later if the start time is subsample-accurate). Therefore, there is no situation in which we interpolate with an imaginary previous sample if we are linearly interpolating.

But the plot thickens! I just implemented "use the last sample for the whole subsequent interval" in Servo and this does not make the test pass:

  • Interpolating with silence, I get 10 wrong samples for buffer-resampling.html. The SNR is 20.05dB, below the 37.17dB threshold.
  • Using the last sample value for the interval, I get 5 wrong samples. The SNR is 32.76dB, below the 37.17dB threshold.

Apparently Blink extrapolates from the last two samples (this behavior was added along with buffer-resampling.html in [0]), and Firefox uses an interpolation algorithm that is better than linear (via libspeex). So buffer-resampling.html seems to require a better algorithm for the last interval.

[0] https://chromium.googlesource.com/chromium/src.git/+/feaba58ccedc657f5d4ee23c5b11825de876bf0f%5E%21/#F1

@karlt
Contributor

karlt commented Aug 23, 2019

That asymmetry is a problem when upsampling as in this test (and also if applied to non-sample-aligned start times). The noise threshold of 0.09 seems fairly high, so I wonder if that is involved.

In buffer-resampling.html, the buffer has a sample rate of 8000 Hz while the context is rendering at 48000 Hz.

The first sample in the buffer represents a sinc function with frequency 8000Hz. If that is centered on the start time, then there would be a few significantly non-zero 48000 Hz rendering samples before the start time.
An interpolation that chopped off those samples would not be ideal.

I found https://www.psaudio.com/article/cardinal-sinc/ helpful.
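The band-limited view described above can be illustrated with a small sketch (my own illustration, not from the thread or the spec): each sample contributes a sinc centered at its frame, so a sample at frame 0 has significantly non-zero contributions at rendering times before the start time.

```javascript
// Normalized sinc function.
function sinc(x) {
  return x === 0 ? 1 : Math.sin(Math.PI * x) / (Math.PI * x);
}

// Ideal band-limited reconstruction at a fractional frame position: a sum
// of sinc functions, one centered on each sample frame.
function reconstruct(samples, pos) {
  let sum = 0;
  for (let n = 0; n < samples.length; n++) {
    sum += samples[n] * sinc(pos - n);
  }
  return sum;
}
// Note reconstruct([1], -0.5) is non-zero: the first sample's sinc has
// energy before frame 0, which is what chopping at the start time loses.
```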

@collares
Author

collares commented Aug 23, 2019

The first sample in the buffer represents a sinc function with frequency 8000Hz. If that is centered on the start time, then there would be a few significantly non-zero 48000 Hz rendering samples before the start time.
An interpolation that chopped off those samples would not be ideal.

This is definitely an issue from an audio quality standpoint, but outputting samples before the Node's start time violates lines 95--96 of the playback algorithm (besides being a bit counter-intuitive). This would be impossible to implement if the Node is to start playing immediately, but I agree with you: this should be considered for nodes scheduled for playing in the future. I filed issue #2047 for tracking a specific way of implementing this.

I found https://www.psaudio.com/article/cardinal-sinc/ helpful.

Personally, I agree that linear interpolation is a bad idea due to physical/DSP considerations (and I plan on using libspeex in Servo too once I decide on how to handle the case where the loop length is not a multiple of the "buffer offset per tick"), and I would be happy to see the spec mandating better-than-linear interpolation.

But I feel the issue you raise is mostly orthogonal to the present issue, because the spec strongly suggests linear interpolation is a valid implementation strategy. The spec says "may require additional interpolation between sample frames" (emphasis mine), which in my opinion requires clarification. From reading the spec, it didn't occur to me that linear extrapolation (or better, such as sinc interpolation) would be required instead of just desirable.

@rtoy
Member

rtoy commented Aug 27, 2019

I had to check the code to see what Chrome is doing. See https://cs.chromium.org/chromium/src/third_party/blink/renderer/modules/webaudio/audio_buffer_source_node.cc?rcl=d0788ba8029af2c73443ef598ed5871f1cc44450&l=348

Based on the comment there, it's linearly extrapolating the last two samples to find the output sample. I guess that's kind of reasonable since you don't know what the following value would be since you're at the end of the buffer.
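Based on that reading of the Blink comment, the behavior could look roughly like this (a guess at the approach, not the actual Chromium code; `extrapolatePastEnd` is a made-up name):

```javascript
// Sketch: linearly extrapolate from the last two buffer samples to cover
// the fractional interval past the final frame, continuing the last slope.
function extrapolatePastEnd(channelData, frac /* 0..1 past the last frame */) {
  const n = channelData.length;
  if (n === 0) return 0;
  if (n === 1) return channelData[0]; // no slope available; hold the value
  const last = channelData[n - 1];
  const secondLast = channelData[n - 2];
  return last + (last - secondLast) * frac;
}
```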

@mdjp mdjp added this to the Web Audio V1 milestone Sep 16, 2019
@rtoy
Member

rtoy commented Sep 26, 2019

See also WebAudio/web-audio-api-v2#38.

@padenot
Member

padenot commented Oct 31, 2019

But I feel the issue you raise is mostly orthogonal to the present issue, because the spec strongly suggests linear interpolation is a valid implementation strategy. The spec says "may require additional interpolation between sample frames" (emphasis mine), which in my opinion requires clarification. From reading the spec, it didn't occur to me that linear extrapolation (or better, such as sinc interpolation) would be required instead of just desirable.

AudioWG call: We're going to do a fix for this bit in bold, but it is indeed related to WebAudio/web-audio-api-v2#38, that we'll get clarified in V2.

@rtoy
Member

rtoy commented Oct 31, 2019

A little more info from the call. We'll probably say it's extrapolated, but leave the extrapolation method unspecified.

Simple justification: if you're doing buffer stitching and have ABSNs that are contiguous parts of a large audio source where all the pieces are basically continuous, then extrapolation will produce a value that is close to the next value from the next buffer. If you interpolate between the last sample and zero, the output will probably differ quite a bit from the next buffer value unless it happened to be 0.
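The stitching argument can be checked numerically (a sketch with a made-up ramp signal; the frame values are mine): half a frame past buffer A's last sample, extrapolation lands on the true value of the continuous signal, while interpolating toward zero does not.

```javascript
// One continuous ramp, split across two contiguous ABSN buffers.
const signal = [1, 2, 3, 4, 5, 6];
const bufA = signal.slice(0, 3); // [1, 2, 3]
const bufB = signal.slice(3);    // [4, 5, 6]

const frac = 0.5; // half a frame past bufA's last sample
const last = bufA[bufA.length - 1];
const prev = bufA[bufA.length - 2];

// Extrapolating continues the ramp: 3 + (3 - 2) * 0.5 = 3.5, which matches
// the true signal halfway between bufA's last sample and bufB's first.
const extrapolated = last + (last - prev) * frac;

// Interpolating toward zero drops sharply: 3 * 0.5 = 1.5.
const towardZero = last * (1 - frac) + 0 * frac;
```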

@haywirez

haywirez commented Jun 9, 2020

(Also) related v2 issue: https://github.com/WebAudio/web-audio-api-v2/issues/25

@rtoy
Member

rtoy commented Sep 11, 2020

Let's see how this goes. Assume an AudioBuffer with one channel of length 3 with the same sample rate as the AudioContext. For simplicity, we'll be working with frames, not seconds.

Let interp(n,m) be some function to interpolate between frames n and m of the AudioBuffer. (This isn't specified; it could be simple linear interpolation or a more complicated sinc interpolator.)

Assume the user calls start(0.5). Let out[n] be the output at frame n. Then,

out[0] = 0;  // because we haven't started yet
out[1] = interp(0, 1);
out[2] = interp(1, 2);
out[3] = ?;
out[4] = 0;

I think the above is straightforward. But what about out[3]? We can't do interp(2, 3) because there is no frame 3 in the buffer.

There were two options here:

  1. Interpolate between frame 2 and a value of 0.
  2. Extrapolate from earlier frames.

The conclusion from the teleconf was to extrapolate to produce this output. So frames 1 and 2 (and possibly more) are used to extrapolate an appropriate output value.

Whenever we run out of data but need one more sample we extrapolate from previous values. This includes the case where the sample rates are different or the playbackRate is not 1.

I don't intend to put this much detail into the spec; I think we can just say that if any of the following holds for a non-looping source

  1. the start time is not on a frame boundary
  2. the sample rates differ
  3. the playbackRate is not 1

then the last output value is extrapolated from the last values of the buffer. The extrapolation method is implementation-dependent.
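The worked example above can be sketched end to end (my own sketch, assuming linear interpolation between frames, linear extrapolation from the last two frames for the trailing interval, and a buffer of at least two frames; per the proposal, the actual methods are implementation-dependent):

```javascript
// Render a non-looping buffer started at a fractional frame offset.
function render(buffer, startFrame, outLength) {
  const out = new Array(outLength).fill(0);
  for (let n = 0; n < outLength; n++) {
    if (n < startFrame) continue;       // out[n] = 0: not started yet
    const pos = n - startFrame;         // playhead position in buffer frames
    const i = Math.floor(pos);
    const frac = pos - i;
    if (i >= buffer.length) continue;   // past the end: silence
    if (frac === 0) {
      out[n] = buffer[i];               // exactly on a frame
    } else if (i === buffer.length - 1) {
      // Trailing interval: extrapolate from the last two frames.
      out[n] = buffer[i] + (buffer[i] - buffer[i - 1]) * frac;
    } else {
      out[n] = buffer[i] * (1 - frac) + buffer[i + 1] * frac;
    }
  }
  return out;
}
```

With `render([0, 2, 4], 0.5, 5)` this reproduces the shape of the example: silence before the start, two interpolated frames, one extrapolated frame, then silence.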

rtoy added a commit to rtoy/web-audio-api that referenced this issue Oct 2, 2020
Update `playbackSignal` to mention that extrapolation is used to
compute the output sample when the buffer is not looping and we are at
the end of the buffer but need to output a sample after the end of the
buffer and before the next output sample frame.
@karlt
Contributor

karlt commented Oct 15, 2020

Assume the user calls start(0.5). Let out[n] be the output at frame n. Then,

out[0] = 0; // because we haven't started yet

interp(-1, 0) would usually give a better result here, as an interpolation between the value zero (at frame -1) and the value at frame 0 of the buffer.

The case for this is even stronger with start((n + ε)/F) and 0 < ε ≪ 1, which is a likely scenario given the limited precision of double start times. In this case, using out[n] = 0 would skip the first frame in the buffer. The wish to fabricate an additional sample at the end of the buffer would be motivated by the missing first sample in the subsequent contiguous buffer being stitched.

If, however, the first sample is interpolated with zero, then the last sample can be interpolated with zero, which provides consistent interpolation between contiguous buffers.

If the playback algorithm doesn't support this, then it is not conforming to the stated principles "Sub-sample start offsets or loop points may require additional interpolation between sample frames" and "Resampling of the buffer may be performed arbitrarily by the UA at any desired point to increase the efficiency or quality of the output."
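The consistency claim can be checked with a tiny overlap calculation (the values are mine; it assumes plain linear interpolation and two sample-accurate contiguous one-frame buffers): when both edges interpolate with an imaginary zero sample, the tail of buffer A plus the head of buffer B reconstructs the interior interpolation of the unsplit signal.

```javascript
// Two adjacent samples of a continuous signal, split across two buffers.
const a = 2, b = 6;

// Unsplit buffer [a, b]: linear interp halfway between the two frames.
const unsplit = a * 0.5 + b * 0.5;   // 4

// Split into one-frame buffers A = [a] and B = [b], B started one frame
// later, each interpolating with an imaginary zero beyond its edge:
const tailOfA = a * 0.5 + 0 * 0.5;   // A half a frame past its last sample
const headOfB = 0 * 0.5 + b * 0.5;   // B half a frame before its first sample

// The overlapped sum matches the unsplit interpolation.
```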

@rtoy rtoy closed this as completed in 23502fc Dec 15, 2020
V1 automation moved this from Ready for Editing to Done Dec 15, 2020