When exactly does stop(time) stop? #2452

Open
rtoy opened this issue Sep 13, 2018 · 10 comments

@rtoy
Member

rtoy commented Sep 13, 2018

Suppose you have an offline context with sample rate F and an ABSN (AudioBufferSourceNode) scheduled to stop at time t. Where exactly does the ABSN stop?

If F*t lies between sample frames, it's pretty clear that the last non-zero output happens at the frame just below F*t, i.e., floor(F*t). But what happens if F*t lies exactly on a frame boundary? The spec doesn't really say, but I think the output value at that frame must be 0.

Doing it this way makes stop consistent with ABSN duration, I think. If the ABSN has a duration d, the number of frames is F*d, which is a whole number, so the output is zero at frame F*d. If we implemented stop(d) as defined above, it would be exactly equivalent to letting the ABSN run without calling stop.
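A minimal sketch of the stop rule proposed above, in plain JavaScript so it runs without an AudioContext. The helpers `firstSilentFrame` and `lastPlayedFrame` are hypothetical, and the sketch assumes the source started at time 0. It also ignores the floating-point rounding of `sampleRate * stopTime`, which can push a nominally whole frame count slightly above or below the integer.

```javascript
// Hypothetical helpers, not part of the Web Audio API.
// Proposed rule: output frame k is played only if k / F < t, i.e. k < F * t.
// When F * t is a whole number, frame F * t itself is therefore silent.
function firstSilentFrame(sampleRate, stopTime) {
  return Math.ceil(sampleRate * stopTime);
}

// Last frame that still carries buffer data (assumes the source started at 0).
function lastPlayedFrame(sampleRate, stopTime) {
  return firstSilentFrame(sampleRate, stopTime) - 1;
}

const F = 44100;
// Unaligned stop: F * t is about 22050.25, so the last played frame is floor(F * t).
console.log(lastPlayedFrame(F, 22050.25 / F)); // 22050
// Aligned stop equal to the duration of a 44100-frame buffer (d = 1 s):
// frame F * d = 44100 is silent, exactly as if the buffer had simply run out.
console.log(firstSilentFrame(F, 44100 / F)); // 44100
```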

@karlt
Contributor

karlt commented Sep 15, 2018

When start, stop, and duration are whole numbers of frames, my expectation is
similar about stop being consistent with duration.

Re the boundaries for non-aligned times, I've aimed to avoid the situation
where a frame count converted to double would be at risk of being interpreted
differently according to whether rounding due to loss of precision in double
is up or down.

In the same way that an ABSN output sample at currentTime = 0 describes the
intensity of a band-limited impulse response centred at currentTime = 0, the
zeroth sample frame from an AudioBuffer represents a band-limited impulse
response at the very start of the buffer.

My expectation is that we should aim to centre the band-limited impulse for a
sample from a single-frame buffer (or the zeroth frame from any buffer) on the
start time.

This is simple enough when start time is frame aligned. The band-limited
impulse from the sample is centred on the start time by playing the sample
at the output frame corresponding to the start time.

When the start time is not frame aligned, the sample from the buffer can be
interpolated. Choosing a sample from the buffer to play in an ABSN output
frame (representing a slightly different time) is a zero-order interpolation.
Rounding start time to the nearest output frame provides a better zero-order
interpolation than floor(F*t).
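To make the zero-order interpolation argument concrete, here is a small sketch comparing the output frame that `Math.floor` and `Math.round` would pick for an unaligned start time, and how far (in frames) each choice lands from the requested time. The helper names and the 48000 Hz rate are illustrative assumptions, not anything the spec defines.

```javascript
// Hypothetical helpers comparing two ways of snapping a start time t (seconds)
// to an output frame at sample rate F; neither is claimed to be the spec's rule.
function floorStartFrame(sampleRate, t) {
  return Math.floor(sampleRate * t);
}

function roundStartFrame(sampleRate, t) {
  // Math.round sends exact halves upward; the tie rule is left open here.
  return Math.round(sampleRate * t);
}

// Distance, in frames, between the chosen frame and the requested start time.
function errorInFrames(frame, sampleRate, t) {
  return Math.abs(frame - sampleRate * t);
}

const F = 48000;
const t = 100.75 / F; // three quarters of the way from frame 100 to frame 101

const viaFloor = floorStartFrame(F, t); // 100
const viaRound = roundStartFrame(F, t); // 101
// Rounding plays the first buffer sample closer to the requested time.
console.log(errorInFrames(viaRound, F, t) < errorInFrames(viaFloor, F, t)); // true
```

With rounding, the timing error is never more than half a frame, whereas floor can be off by almost a whole frame.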

"A starting offset, which can expressed with sub-sample precision" implies
that better interpolation methods are preferred but the algorithm chooses a
very basic interpolation of start and stop times (and duration).
https://webaudio.github.io/web-audio-api/#playback-AudioBufferSourceNode

For consistency with start time, an aligned stop time would describe the
centre of an impulse that is not played. Unaligned stop times can be
interpolated consistently with the start time. If one ABSN is starting at the
same time as another is stopping, then the expectation is that the second
could take over seamlessly from the first.
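A sketch of why applying one rounding rule to both start and stop gives a seamless hand-off. It assumes, as a hypothetical model, that a source plays the half-open frame range [round(F*start), round(F*stop)). If source A stops at the same time t that source B starts, the two ranges meet at the same frame with no gap and no overlap, whatever the fractional part of F*t.

```javascript
// Hypothetical model: a source covers the half-open frame range
// [round(F * start), round(F * stop)), using the same rounding for both ends.
function playedFrames(sampleRate, start, stop) {
  return [Math.round(sampleRate * start), Math.round(sampleRate * stop)];
}

// Count how many sources cover each frame in [0, length).
function coverage(ranges, length) {
  const counts = new Array(length).fill(0);
  for (const [begin, end] of ranges) {
    for (let k = Math.max(0, begin); k < Math.min(end, length); k++) counts[k]++;
  }
  return counts;
}

const F = 44100;
const t = 10.3 / F;                   // hand-off time, not frame aligned
const a = playedFrames(F, 0, t);      // source A stops at t
const b = playedFrames(F, t, 20 / F); // source B starts at t
// Every frame from 0 to 19 is covered by exactly one source.
console.log(coverage([a, b], 20).every((c) => c === 1)); // true
```

The property comes from using the same function at both ends, not from rounding specifically; any shared rule would hand off cleanly in this model.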

@rtoy
Member Author

rtoy commented Sep 18, 2018

Sorry, I'm thoroughly confused. How does a band-limited impulse come into play here?

For the start time, I think rounding the time to the nearest frame is wrong. It should "start" exactly where I say, so the frame just before the start time must be 0 and the frame after the start time must not be. We implemented this approach for Chrome's AudioParams and it fixed a huge number of issues. Previously AudioParams would round to the nearest frame, but it's much easier to reason about if params start exactly where the time says. See also #915

@rtoy rtoy self-assigned this Sep 27, 2018
@karlt
Contributor

karlt commented Oct 8, 2018

Consider

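let context = new AudioContext();
// Illustrative values: n is a whole number and epsilon ≪ 1.
let n = 100;
let epsilon = 0.01;
// A one-frame buffer holding 1.0, looped, so the output is 1.0 once playing.
let buffer = new AudioBuffer({length: 1, sampleRate: context.sampleRate});
buffer.getChannelData(0)[0] = 1.0;
let source = new AudioBufferSourceNode(context, {buffer: buffer});
source.loop = true;
source.connect(context.destination);
source.start((n + epsilon) / context.sampleRate);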

Assume n is a whole number and ε ≪ 1.

The samples in the buffer are there to represent a continuous function.
A band limited impulse or sinc function is just a means to interpolate a
series of samples to produce a continuous signal. I guess the precise
interpolation mechanism is not critical here. Playing the buffer involves
interpolation of buffer sample frames and then sampling at output
(AudioContext) frames (with pre-sampling band-limiting as appropriate).

For subsample accuracy in start time, the zeroth sample in the buffer will be
played at time (n + ε)/F. One can imagine buffer sample frames before this
time that are not played and so are equivalent to playing samples of value 0.
The last of these corresponds to a time (n - 1 + ε)/F. The continuous
function represented by the looping buffer is therefore initially 0, takes
the value 0 at (n - 1 + ε)/F and 1 at (n + ε)/F, and is finally 1, with some
transitions along the way. The details of the transition, particularly
between 0 at (n - 1 + ε)/F and 1 at (n + ε)/F, depend on how the samples in
the buffer are interpolated to produce a continuous signal. Let's say this
interpolation is accurate enough that the continuous signal represented by
the looping buffer is like a band-limited Heaviside step function centred
midway between the sample points with values 0 and 1, i.e.,
H(t - (n - 0.5 + ε)/F).

When generating the ABSN output, the continuous signal represented by the
buffer is sampled. The precise output would depend on the band-limiting and
sampling algorithms, but sample frame n would have a value something like
1 - ε.
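The argument above leaves the interpolation method open. As a stand-in for band-limited interpolation, the sketch below swaps in plain linear interpolation (an assumption, chosen only because it is easy to check by hand): buffer sample 0 (value 1) sits at frame position n + ε, an imagined unplayed sample of value 0 sits at n - 1 + ε, and output frame n is read off the line between them. The result, 1 - ε, matches the "something like 1 - ε" estimate.

```javascript
// Linear interpolation stands in for band-limited interpolation here.
// Positions are measured in output frames; y0 and y1 are buffer sample values.
function interpolate(x0, y0, x1, y1, x) {
  return y0 + ((y1 - y0) * (x - x0)) / (x1 - x0);
}

// Output of the looping one-sample buffer (value 1.0) at output frame n, when
// the source starts at (n + eps) / F: the neighbours of frame n are the
// imagined silent sample at n - 1 + eps and the first real sample at n + eps.
function outputAtStartFrame(n, eps) {
  return interpolate(n - 1 + eps, 0, n + eps, 1, n);
}

console.log(outputAtStartFrame(100, 0.25)); // 0.75, i.e. 1 - eps
console.log(outputAtStartFrame(100, 0));    // 1
```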

What is counter-intuitive is that this is not like trimming off the leading
part of the continuous function H() at t = (n + ε)/F. Doing that would
generate something like (1 - ε)/2 at output frame n. If you consider the
ε = 0 case, it is clear that is not what we want. It would represent
cutting off half the first sample from the buffer. In other words, the start
time indicates when the first frame from the buffer is played in full, not
when half the first sample is played.

My point was that rounding the start time to the nearest output frame would
generate output of 1 at sample frame n. Setting the output to 0 at frame
n because it is before the start time would be a much worse approximation.

@rtoy
Member Author

rtoy commented Oct 9, 2018

Thanks for the detailed explanation. I understand what you're saying. However, your argument kind of assumes the ABSN is bandlimited (because you are bandlimiting the step function). But that's not a requirement for an ABSN.

My expectation is that with epsilon > 0, the output at time n/F is zero and at time (n+1)/F it is not zero, with the actual value depending on how the interpolation is done. If epsilon is zero, then I would expect a value of 1 to be output at time n/F, and 0 at time (n-1)/F. This isn't band-limited, but that's not a problem here.
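The expectation above can be written as a rule on frame indices. A minimal sketch, assuming the same looping one-sample buffer of 1.0, so that every frame at or after the start reads 1 regardless of the interpolation method; `exactStartOutput` is a hypothetical helper. Equivalently, the first non-zero frame is `Math.ceil(n + eps)`.

```javascript
// Hypothetical sketch of the rule described above: output frame k (time k / F)
// is 0 if it lies strictly before the start time (n + eps) / F, and otherwise
// plays the looping one-sample buffer, whose value is 1.0 at every offset.
function exactStartOutput(k, n, eps) {
  const startFrame = n + eps; // start time expressed in frames
  return k < startFrame ? 0 : 1;
}

console.log(exactStartOutput(100, 100, 0.25)); // 0 (eps > 0: frame n is silent)
console.log(exactStartOutput(101, 100, 0.25)); // 1 (frame n + 1 is not)
console.log(exactStartOutput(100, 100, 0));    // 1 (eps = 0: frame n plays)
console.log(exactStartOutput(99, 100, 0));     // 0 (frame n - 1 stays silent)
```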

@karlt
Copy link
Contributor

karlt commented Oct 9, 2018

The output of an ABSN is band-limited because it has a finite sample rate (e.g. it cannot precisely represent a sub-sample start time).
There is the option of not band-limiting before sampling, but then the band-limiting inherent in sampling will produce aliasing.

But it is not really the step function that I was choosing to band-limit.
The band-limited step function is just the ideal interpolation of the buffer samples.

I found https://www.psaudio.com/article/cardinal-sinc/ a helpful resource.

@rtoy
Member Author

rtoy commented Nov 30, 2018

I still stand by my original comments in https://github.com/WebAudio/web-audio-api/issues/1749#issue-360102561.

If an ABSN has 44100 samples in it and the sample rate is 44100, the duration is exactly 1 second. The output then has exactly 44100 samples, so if we start the source at time 0, output frame 44099 will have the last sample in the ABSN, and frame 44100 and after will be 0.

So if a stop time lies exactly on a frame boundary, the value at that frame should be 0. To see why, consider an ABSN with 50000 samples on which I call stop(1). Conceptually this is the same as the original ABSN above, so I would expect frame 44100 to have a value of 0. If we don't do this, you'll get a glitch if you start another ABSN at time 1, because the stopped ABSN still contributes a non-zero value at that frame.
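The equivalence and glitch arguments can be checked with a small simulation that needs no AudioContext. `renderSource` is a hypothetical helper modelling an ABSN whose buffer holds all ones; it assumes frame k plays only if ceil(F*start) <= k < ceil(F*stop), i.e. the exact-time rule proposed in this thread for both ends.

```javascript
// Hypothetical model of an ABSN whose buffer holds all ones. Frame k plays if
// it is at or after the start frame, within the buffer, and before the stop
// frame; both frames are ceil(F * time). stopTime undefined means no stop().
function renderSource({ bufferLength, sampleRate, startTime = 0, stopTime, renderLength }) {
  const out = new Float32Array(renderLength);
  const startFrame = Math.ceil(sampleRate * startTime);
  const stopFrame = stopTime === undefined ? Infinity : Math.ceil(sampleRate * stopTime);
  for (let k = startFrame; k < renderLength; k++) {
    if (k - startFrame < bufferLength && k < stopFrame) out[k] = 1;
  }
  return out;
}

const F = 44100;
const natural = renderSource({ bufferLength: 44100, sampleRate: F, renderLength: 88200 });
const stopped = renderSource({ bufferLength: 50000, sampleRate: F, stopTime: 1, renderLength: 88200 });
const next = renderSource({ bufferLength: 44100, sampleRate: F, startTime: 1, renderLength: 88200 });

// The 50000-frame source stopped at t = 1 renders exactly like the 44100-frame one.
console.log(natural.every((v, k) => v === stopped[k])); // true
// A second source started at t = 1 takes over with no overlap and no gap.
console.log(stopped.every((v, k) => v + next[k] === 1)); // true
```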

@hoch hoch removed their assignment Jan 17, 2019
@mdjp mdjp assigned padenot and unassigned rtoy Jan 31, 2019
@rtoy
Member Author

rtoy commented Feb 28, 2019

Teleconf: not important enough to do for v1. Move to v.next.

@mdjp mdjp transferred this issue from WebAudio/web-audio-api Sep 17, 2019
@chrisguttandin
Contributor

I made a quick test and it looks like Chrome and Firefox already do what @rtoy said in the last comment.

const offlineAudioContext = new OfflineAudioContext({ length: 88200, sampleRate: 44100 });
const constantSourceNode = new ConstantSourceNode(offlineAudioContext);

constantSourceNode.start(0);
constantSourceNode.stop(1);

constantSourceNode.connect(offlineAudioContext.destination);

offlineAudioContext
    .startRendering()
    .then((renderedBuffer) => {
        console.log(Array.from(renderedBuffer.getChannelData(0)).slice(44099, 44101));
        // This will log [ 1, 0 ].
    });

@rtoy
Member Author

rtoy commented Oct 21, 2020

TPAC 2020:

Based on https://github.com/WebAudio/web-audio-api-v2/issues/38#issuecomment-642793385, both Chrome and Firefox interpret stop(t) in the same way where the sample at time t is 0. We just need to make this clear in the spec.

@rtoy
Member Author

rtoy commented May 19, 2021

Virtual F2F 2021: https://github.com/WebAudio/web-audio-api-v2/issues/38#issuecomment-713729212 still holds. Now that V1 is basically done, we can start updating the text with these changes.

@mdjp mdjp transferred this issue from WebAudio/web-audio-api-v2 Sep 29, 2021
@mdjp mdjp added this to Untriaged in v.next via automation Sep 29, 2021
@mdjp mdjp moved this from Untriaged to In discussion in v.next Sep 29, 2021