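Problem description
When OffscreenSprite renders an audio clip at a playback rate other than 1x, it internally calls changePCMPlaybackRate, which resamples the PCM data to change the speed. Resampling changes the pitch along with the speed: at 1.5x the audio is not only faster but also higher in pitch, which is incorrect behavior for video editing.
Root cause
OffscreenSprite handles playback rate by changing the PCM sample rate rather than by using a time-stretching algorithm. There is currently no built-in way to decouple playback speed from pitch.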
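To illustrate why resampling shifts pitch, here is a minimal sketch of a naive rate change by linear-interpolation resampling. It is not the actual changePCMPlaybackRate implementation; the function names resampleByRate and estimateFrequency are hypothetical and exist only for this demonstration.

```typescript
// Naive playback-rate change: step through the input `rate` times faster,
// interpolating linearly between neighbouring samples.
function resampleByRate(input: Float32Array, rate: number): Float32Array {
  const outLength = Math.floor(input.length / rate);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * rate;
    const idx = Math.floor(pos);
    const frac = pos - idx;
    const a = input[idx];
    const b = input[Math.min(idx + 1, input.length - 1)];
    output[i] = a + (b - a) * frac;
  }
  return output;
}

// Rough frequency estimate: count upward zero crossings per second.
function estimateFrequency(samples: Float32Array, sampleRate: number): number {
  let crossings = 0;
  for (let i = 1; i < samples.length; i++) {
    if (samples[i - 1] < 0 && samples[i] >= 0) crossings++;
  }
  return crossings / (samples.length / sampleRate);
}

const sampleRate = 48000;
// One second of a 440 Hz sine tone.
const tone = new Float32Array(sampleRate);
for (let i = 0; i < tone.length; i++) {
  tone[i] = Math.sin((2 * Math.PI * 440 * i) / sampleRate);
}

// At 1.5x the result is shorter AND its frequency is scaled by about 1.5.
const fast = resampleByRate(tone, 1.5);
console.log(fast.length, estimateFrequency(tone, sampleRate), estimateFrequency(fast, sampleRate));
```

The output is two thirds as long as the input, and the estimated frequency rises from roughly 440 Hz to roughly 660 Hz. A time-stretching algorithm would shorten the signal while keeping the tone near 440 Hz.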
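Solution: wrap AudioClip with SoundTouch
The core idea is to bypass OffscreenSprite's speed handling and do the audio time stretching externally:
- Create a wrapper class, SpeedAudioClip, that implements the IClip interface
- In tick(time), multiply the time by the rate (time * speed) before passing it to the original AudioClip, so that it reads more or less of the source material
- Run the PCM data that comes back through soundtouch-ts with tempo = speed and pitch = 1, which stretches time without changing pitch
- Set sprite.time.playbackRate to 1 so that OffscreenSprite skips its own resampling logic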
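The time mapping in the steps above can be sketched as plain arithmetic. The helper names toSourceTime and expectedOutputFrames are hypothetical, and the sketch assumes that clip times are expressed in microseconds.

```typescript
// Map a time on the sped-up timeline to a time in the original source.
// Assumption: times are in microseconds.
function toSourceTime(outputTimeUs: number, speed: number): number {
  return outputTimeUs * speed;
}

// The source clip returns `speed` times as many frames per tick, so the
// time-stretched output should contain inputFrames / speed frames.
function expectedOutputFrames(inputFrames: number, speed: number): number {
  return Math.round(inputFrames / speed);
}

// At 1.5x, one second of output consumes 1.5 seconds of source audio.
const sourceTimeUs = toSourceTime(1_000_000, 1.5);
// A tick that pulled 2400 source frames should yield 1600 output frames.
const outputFrames = expectedOutputFrames(2400, 1.5);
console.log(sourceTimeUs, outputFrames);
```

Keeping the output frame count equal to what a 1x clip would produce per tick is what allows the sprite's playbackRate to stay at 1.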
Installing the dependency
npm install soundtouch-ts
# or
pnpm add soundtouch-ts
Full implementation
import type { AudioClip, IClip } from '@webav/av-cliper';
import { SoundTouch } from 'soundtouch-ts';
interface IClipMeta {
  width: number;
  height: number;
  duration: number;
  sampleRate?: number;
  chanCount?: number;
}
export class SpeedAudioClip implements IClip {
  private realClip: AudioClip;
  private speed: number;
  private st: SoundTouch;
  private sampleRate: number;
  private lastRealTime: number = 0;
  private outputBuffer: Float32Array[] = [];
  private bufferedSamples: number = 0;
  readonly ready: Promise<IClipMeta>;
  private _meta: IClipMeta | null = null;
  constructor(realClip: AudioClip, speed: number, sampleRate: number = 48000) {
    this.realClip = realClip;
    this.speed = speed;
    this.sampleRate = sampleRate;
    this.st = new SoundTouch(sampleRate);
    // Compute parameters for this rate (based on SoundTouch's automatic parameter algorithm)
    const { sequenceMs, seekWindowMs, overlapMs } = this.calculateOptimalParams(speed);
    this.st.tdStretch.setParameters(sampleRate, sequenceMs, seekWindowMs, overlapMs);
    this.st.tdStretch.quickSeek = false;
    // tempo changes speed while preserving pitch; pitch stays fixed at 1
    this.st.tempo = speed;
    this.st.pitch = 1;
    this.outputBuffer = [new Float32Array(0), new Float32Array(0)];
    this.ready = realClip.ready.then(async () => {
      const meta = await realClip.meta;
      this._meta = {
        width: meta.width,
        height: meta.height,
        duration: meta.duration / speed, // duration after the speed change
        sampleRate: meta.sampleRate,
        chanCount: meta.chanCount,
      };
      return this._meta;
    });
  }
  /**
   * Compute SoundTouch parameters for a playback rate
   * (ported from the automatic parameter algorithm in SoundTouch C++).
   */
  private calculateOptimalParams(rate: number) {
    const AUTOSEQ_TEMPO_LOW = 0.5;
    const AUTOSEQ_TEMPO_TOP = 2.0;
    const AUTOSEQ_AT_MIN = 125.0;
    const AUTOSEQ_AT_MAX = 50.0;
    const AUTOSEEK_AT_MIN = 25.0;
    const AUTOSEEK_AT_MAX = 15.0;
    const AUTOSEQ_K = (AUTOSEQ_AT_MAX - AUTOSEQ_AT_MIN) / (AUTOSEQ_TEMPO_TOP - AUTOSEQ_TEMPO_LOW);
    const AUTOSEQ_C = AUTOSEQ_AT_MIN - AUTOSEQ_K * AUTOSEQ_TEMPO_LOW;
    const AUTOSEEK_K = (AUTOSEEK_AT_MAX - AUTOSEEK_AT_MIN) / (AUTOSEQ_TEMPO_TOP - AUTOSEQ_TEMPO_LOW);
    const AUTOSEEK_C = AUTOSEEK_AT_MIN - AUTOSEEK_K * AUTOSEQ_TEMPO_LOW;
    // Rates outside 0.5x to 2.0x reuse the parameters of the nearest bound
    const clampedTempo = Math.max(AUTOSEQ_TEMPO_LOW, Math.min(AUTOSEQ_TEMPO_TOP, rate));
    const sequenceMs = Math.max(AUTOSEQ_AT_MAX, Math.min(AUTOSEQ_AT_MIN, AUTOSEQ_C + AUTOSEQ_K * clampedTempo));
    const seekWindowMs = Math.max(AUTOSEEK_AT_MAX, Math.min(AUTOSEEK_AT_MIN, AUTOSEEK_C + AUTOSEEK_K * clampedTempo));
    return { sequenceMs, seekWindowMs, overlapMs: 8 };
  }
  get meta(): IClipMeta {
    if (!this._meta) throw new Error('SpeedAudioClip not ready');
    return this._meta;
  }
  tick = async (time: number): Promise<{ audio: Float32Array[]; state: 'success' | 'done' }> => {
    // Time on the sped-up timeline -> time in the original source
    const realTime = time * this.speed;
    const result = await this.realClip.tick(realTime);
    if (result.state === 'done' || !result.audio || result.audio.length === 0) {
      return result;
    }
    // When the speed is close to 1, return the audio unprocessed
    if (Math.abs(this.speed - 1) < 0.01) return result;
    // On a seek (time moved backwards) or a large forward jump, reset the SoundTouch state
    const timeDiff = realTime - this.lastRealTime;
    if (this.lastRealTime > 0 && (timeDiff < 0 || timeDiff > 1_000_000)) {
      this.st.clear();
      this.outputBuffer = [new Float32Array(0), new Float32Array(0)];
      this.bufferedSamples = 0;
    }
    this.lastRealTime = realTime;
    const audio = result.audio;
    const channelCount = audio.length;
    const inputFrameCount = audio[0].length;
    // Convert to interleaved stereo (SoundTouch only accepts interleaved stereo input)
    const stereoInterleaved = new Float32Array(inputFrameCount * 2);
    for (let i = 0; i < inputFrameCount; i++) {
      stereoInterleaved[i * 2] = audio[0][i];
      stereoInterleaved[i * 2 + 1] = audio[1]?.[i] ?? audio[0][i];
    }
    // Feed the samples to SoundTouch
    this.st.inputBuffer.putSamples(stereoInterleaved, 0, inputFrameCount);
    this.st.process();
    // Take the processed frames, split them into two channels, and append them to the output buffer
    const stOutputCount = this.st.outputBuffer.frameCount;
    if (stOutputCount > 0) {
      const stereoOutput = new Float32Array(stOutputCount * 2);
      this.st.outputBuffer.receiveSamples(stereoOutput, stOutputCount);
      const left = new Float32Array(stOutputCount);
      const right = new Float32Array(stOutputCount);
      for (let i = 0; i < stOutputCount; i++) {
        left[i] = stereoOutput[i * 2];
        right[i] = stereoOutput[i * 2 + 1];
      }
      this.appendToBuffer(left, right);
    }
    // Expected output frames = input frames / speed (the source read was speed times larger)
    const expectedOutputFrames = Math.round(inputFrameCount / this.speed);
    // Return silence while the buffer is still short (SoundTouch warm-up period)
    if (this.bufferedSamples < expectedOutputFrames) {
      return {
        audio: Array.from({ length: channelCount }, () => new Float32Array(expectedOutputFrames)),
        state: 'success',
      };
    }
    const [left, right] = this.consumeFromBuffer(expectedOutputFrames);
    const outputAudio: Float32Array[] = [];
    if (channelCount === 1) {
      // Mono source: average the two channels back into one
      outputAudio[0] = new Float32Array(expectedOutputFrames);
      for (let i = 0; i < expectedOutputFrames; i++) {
        outputAudio[0][i] = (left[i] + right[i]) / 2;
      }
    } else {
      outputAudio[0] = left;
      outputAudio[1] = right;
      // Any channels beyond stereo receive a copy of the right channel
      for (let ch = 2; ch < channelCount; ch++) {
        outputAudio[ch] = new Float32Array(right);
      }
    }
    return { audio: outputAudio, state: 'success' };
  };
  private appendToBuffer(left: Float32Array, right: Float32Array): void {
    const newLeft = new Float32Array(this.bufferedSamples + left.length);
    const newRight = new Float32Array(this.bufferedSamples + right.length);
    newLeft.set(this.outputBuffer[0]);
    newLeft.set(left, this.bufferedSamples);
    newRight.set(this.outputBuffer[1]);
    newRight.set(right, this.bufferedSamples);
    this.outputBuffer[0] = newLeft;
    this.outputBuffer[1] = newRight;
    this.bufferedSamples += left.length;
  }
  private consumeFromBuffer(frameCount: number): [Float32Array, Float32Array] {
    const left = this.outputBuffer[0].slice(0, frameCount);
    const right = this.outputBuffer[1].slice(0, frameCount);
    this.outputBuffer[0] = this.outputBuffer[0].slice(frameCount);
    this.outputBuffer[1] = this.outputBuffer[1].slice(frameCount);
    this.bufferedSamples = Math.max(0, this.bufferedSamples - frameCount);
    return [left, right];
  }
  clone = async () => {
    const clonedReal = await this.realClip.clone();
    return new SpeedAudioClip(clonedReal as AudioClip, this.speed, this.sampleRate) as this;
  };
  split = async (time: number) => {
    const realTime = time * this.speed;
    const [l, r] = await this.realClip.split(realTime);
    return [
      new SpeedAudioClip(l as AudioClip, this.speed, this.sampleRate),
      new SpeedAudioClip(r as AudioClip, this.speed, this.sampleRate),
    ] as [this, this];
  };
  destroy(): void {
    this.realClip.destroy();
    this.st.clear();
    this.outputBuffer = [];
  }
}
/**
 * Factory function: when the speed is close to 1, return the original clip
 * directly to avoid unnecessary processing overhead.
 */
export function createSpeedAudioClip(audioClip: AudioClip, speed: number, sampleRate = 48000): IClip {
  if (Math.abs(speed - 1) < 0.01) return audioClip;
  return new SpeedAudioClip(audioClip, speed, sampleRate);
}
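The channel conversion that tick performs around SoundTouch can be isolated into small pure helpers, which makes it easier to check on its own. This is only a sketch: the names toInterleavedStereo, fromInterleavedStereo, and downmixToMono are hypothetical and are not part of soundtouch-ts or @webav/av-cliper.

```typescript
// Planar channels -> interleaved stereo [L0, R0, L1, R1, ...].
// A mono input is duplicated into both channels.
function toInterleavedStereo(channels: Float32Array[]): Float32Array {
  const left = channels[0];
  const right = channels.length > 1 ? channels[1] : left;
  const out = new Float32Array(left.length * 2);
  for (let i = 0; i < left.length; i++) {
    out[i * 2] = left[i];
    out[i * 2 + 1] = right[i];
  }
  return out;
}

// Interleaved stereo -> separate left and right channels.
function fromInterleavedStereo(interleaved: Float32Array): [Float32Array, Float32Array] {
  const frames = Math.floor(interleaved.length / 2);
  const left = new Float32Array(frames);
  const right = new Float32Array(frames);
  for (let i = 0; i < frames; i++) {
    left[i] = interleaved[i * 2];
    right[i] = interleaved[i * 2 + 1];
  }
  return [left, right];
}

// Average two channels back into one, used when the source was mono.
function downmixToMono(left: Float32Array, right: Float32Array): Float32Array {
  const out = new Float32Array(left.length);
  for (let i = 0; i < left.length; i++) {
    out[i] = (left[i] + right[i]) / 2;
  }
  return out;
}

// Round trip for a mono input: duplicate, split, then average.
const mono = [new Float32Array([1, 2, 3])];
const interleavedMono = toInterleavedStereo(mono);
const [l, r] = fromInterleavedStereo(interleavedMono);
const restored = downmixToMono(l, r);
console.log(Array.from(interleavedMono), Array.from(restored));
```

Because both channels carry the same samples for a mono source, averaging them on the way out restores the original values when no processing happens in between.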
Usage
import { AudioClip, OffscreenSprite } from '@webav/av-cliper';
import { createSpeedAudioClip } from './speed-audio-clip';
// Wrap the original AudioClip before passing it to OffscreenSprite
const audioClip = new AudioClip(audioSource);
const speedClip = createSpeedAudioClip(audioClip, 1.5); // 1.5x speed
const sprite = new OffscreenSprite(speedClip);
// Key step: set the sprite's playbackRate to 1 so that OffscreenSprite no longer applies its own speed change
sprite.time.playbackRate = 1;
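Notes
- Warm-up latency: SoundTouch has an internal warm-up period of a few frames. Until the buffer holds enough samples, tick returns silent frames; this is expected.
- Seek reset: when time moves backwards or jumps forward by a large amount, st.clear() must be called to reset the internal state; otherwise the audio becomes garbled.
- Mono compatibility: SoundTouch only processes interleaved stereo, so mono input is duplicated into both channels and averaged back into a single channel on output.
- Speed range: calculateOptimalParams follows SoundTouch's automatic parameter algorithm and works best between 0.5x and 2.0x; outside that range the parameters are clamped.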
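To make the clamping concrete, the parameter calculation can be pulled out as a standalone function and evaluated at a few rates. The function name autoParams is hypothetical; the constants are the ones used in calculateOptimalParams above.

```typescript
// Standalone copy of the linear mapping in calculateOptimalParams.
function autoParams(rate: number): { sequenceMs: number; seekWindowMs: number } {
  const TEMPO_LOW = 0.5;
  const TEMPO_TOP = 2.0;
  const SEQ_AT_LOW = 125.0; // sequence length (ms) at 0.5x
  const SEQ_AT_TOP = 50.0; // sequence length (ms) at 2.0x
  const SEEK_AT_LOW = 25.0; // seek window (ms) at 0.5x
  const SEEK_AT_TOP = 15.0; // seek window (ms) at 2.0x
  const seqK = (SEQ_AT_TOP - SEQ_AT_LOW) / (TEMPO_TOP - TEMPO_LOW);
  const seqC = SEQ_AT_LOW - seqK * TEMPO_LOW;
  const seekK = (SEEK_AT_TOP - SEEK_AT_LOW) / (TEMPO_TOP - TEMPO_LOW);
  const seekC = SEEK_AT_LOW - seekK * TEMPO_LOW;
  // Rates outside [0.5, 2.0] are clamped before the linear mapping.
  const tempo = Math.max(TEMPO_LOW, Math.min(TEMPO_TOP, rate));
  return {
    sequenceMs: Math.max(SEQ_AT_TOP, Math.min(SEQ_AT_LOW, seqC + seqK * tempo)),
    seekWindowMs: Math.max(SEEK_AT_TOP, Math.min(SEEK_AT_LOW, seekC + seekK * tempo)),
  };
}

for (const rate of [0.25, 0.5, 1.5, 2.0, 3.0]) {
  console.log(rate, autoParams(rate));
}
```

Both values shrink linearly as the rate rises: the sequence length goes from 125 ms at 0.5x to 50 ms at 2.0x (75 ms at 1.5x), and the seek window from 25 ms to 15 ms. A rate of 3.0x gets the same parameters as 2.0x, and 0.25x the same as 0.5x, so quality outside the range is not tuned any further.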
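Result
| Approach | Speed changes | Pitch preserved |
| --- | --- | --- |
| OffscreenSprite native playbackRate | ✅ | ❌ Pitch shifts with speed |
| This approach (SoundTouch wrapper) | ✅ | ✅ Pitch stays unchanged |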