| 1 | +# @soundtouchjs/phase-vocoder-worklet |
| 2 | + |
| 3 | +An `AudioWorklet` integration that uses the phase vocoder time-stretch algorithm from `@soundtouchjs/stretch-phase-vocoder`. It provides `PhaseVocoderNode` as a drop-in replacement for `SoundTouchNode` when smoother time-stretching at extreme ratios is required. |
| 4 | + |
| 5 | +## Installation |
| 6 | + |
| 7 | +```sh |
| 8 | +npm install @soundtouchjs/phase-vocoder-worklet @soundtouchjs/stretch-phase-vocoder |
| 9 | +``` |
| 10 | + |
| 11 | +## When to use this package |
| 12 | + |
| 13 | +- Playback at extreme ratios (< 0.5× or > 2×) where the default WSOLA algorithm produces audible artifacts. |
| 14 | +- Smooth slow-motion or fast-forward effects where "phasiness" is acceptable. |
| 15 | +- Cases where you want the same `AudioWorkletNode` API as `SoundTouchNode` with a different internal stretch algorithm. |
| 16 | + |
| 17 | +For moderate ratios (0.5× to 2×), the default `@soundtouchjs/audio-worklet` typically sounds better. |
| 18 | + |
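The guidance above can be turned into a small decision helper. This is only a sketch: `suggestAlgorithm` and the `StretchAlgorithm` type are hypothetical names, not part of either package, and the 0.5× and 2× thresholds are taken directly from the rule of thumb in this section.

```typescript
// Hypothetical helper, not part of @soundtouchjs: encodes the rule of thumb above.
type StretchAlgorithm = 'wsola' | 'phase-vocoder';

function suggestAlgorithm(playbackRate: number): StretchAlgorithm {
  // Outside 0.5x..2x, WSOLA tends to produce audible artifacts,
  // so the phase vocoder is suggested instead.
  return playbackRate < 0.5 || playbackRate > 2 ? 'phase-vocoder' : 'wsola';
}

console.log(suggestAlgorithm(0.25)); // 'phase-vocoder'
console.log(suggestAlgorithm(1.0)); // 'wsola'
```

Treat the result as a starting point; the better-sounding choice also depends on the material (for example, how transient-heavy it is).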
| 19 | +## Setup |
| 20 | + |
| 21 | +### Resolving `processorUrl` |
| 22 | + |
| 23 | +#### Vite |
| 24 | + |
| 25 | +```ts |
| 26 | +import processorUrl from '@soundtouchjs/phase-vocoder-worklet/processor?url'; |
| 27 | +``` |
| 28 | + |
| 29 | +#### webpack 5 |
| 30 | + |
| 31 | +```ts |
| 32 | +const processorUrl = new URL( |
| 33 | + '@soundtouchjs/phase-vocoder-worklet/processor', |
| 34 | + import.meta.url, |
| 35 | +).href; |
| 36 | +``` |
| 37 | + |
| 38 | +#### Static / CDN |
| 39 | + |
| 40 | +Copy `dist/phase-vocoder-processor.js` to your public directory and reference it by path: |
| 41 | + |
| 42 | +```ts |
| 43 | +const processorUrl = '/phase-vocoder-processor.js'; |
| 44 | +``` |
| 45 | + |
| 46 | +## Usage |
| 47 | + |
| 48 | +```ts |
| 49 | +import { PhaseVocoderNode } from '@soundtouchjs/phase-vocoder-worklet'; |
| 50 | + |
| 51 | +const audioCtx = new AudioContext(); |
| 52 | +await PhaseVocoderNode.register(audioCtx, processorUrl); |
| 53 | + |
| 54 | +const node = new PhaseVocoderNode({ |
| 55 | + context: audioCtx, |
| 56 | + fftSize: 2048, // optional, default 2048 |
| 57 | + overlapFactor: 4, // optional, default 4 |
| 58 | +}); |
| 59 | + |
| 60 | +node.pitch.value = 1.5; // pitch multiplier: raise pitch by 50 % |
| 61 | +node.pitchSemitones.value = 3; // semitone shift, applied in combination with `pitch` |
| 62 | +node.playbackRate.value = 0.5; // half speed |
| 63 | + |
| 64 | +node.connect(audioCtx.destination); |
| 65 | + |
| 66 | +// Connect your source (`sourceNode` stands for any AudioNode you have created): |
| 67 | +sourceNode.connect(node); |
| 68 | +``` |
| 69 | + |
| 70 | +## API |
| 71 | + |
| 72 | +### `PhaseVocoderNode` |
| 73 | + |
| 74 | +Extends `AudioWorkletNode`. Same API as `SoundTouchNode` with additional `fftSize` / `overlapFactor` options. |
| 75 | + |
| 76 | +#### Static members |
| 77 | + |
| 78 | +| Member | Description | |
| 79 | +|--------|-------------| |
| 80 | +| `PhaseVocoderNode.register(context, processorUrl)` | Registers the processor module. Must be called before constructing nodes. | |
| 81 | +| `PhaseVocoderNode.registerStrategyModule(context, moduleUrl)` | Loads an interpolation strategy plugin into worklet scope. | |
| 82 | +| `PhaseVocoderNode.processorName` | The registered processor identifier string. | |
| 83 | + |
| 84 | +#### Constructor options |
| 85 | + |
| 86 | +| Option | Type | Default | Description | |
| 87 | +|--------|------|---------|-------------| |
| 88 | +| `context` | `BaseAudioContext` | required | AudioContext or OfflineAudioContext | |
| 89 | +| `fftSize` | `512 \| 1024 \| 2048 \| 4096` | `2048` | FFT frame size — larger = better frequency resolution, higher latency | |
| 90 | +| `overlapFactor` | `2 \| 4 \| 8` | `4` | Overlap factor — higher = smoother output, more computation | |
| 91 | +| `outputChannelCount` | `1 \| 2` | `2` | Output channel count (set to `1` for mono destinations) | |
| 92 | +| `sampleBufferType` | `'circular' \| 'fifo'` | `'circular'` | Internal buffer strategy | |
| 93 | +| `interpolationStrategy` | `RateTransposerInterpolationStrategy` | `'lanczos'` | Initial rate-transposer interpolation strategy | |
| 94 | + |
| 95 | +#### AudioParams |
| 96 | + |
| 97 | +| Param | Default | Range | Description | |
| 98 | +|-------|---------|-------|-------------| |
| 99 | +| `pitch` | `1.0` | `0.1–8.0` | Pitch multiplier (k-rate) | |
| 100 | +| `pitchSemitones` | `0` | `-24` to `24` | Pitch shift in semitones, combined with `pitch` (k-rate) | |
| 101 | +| `playbackRate` | `1.0` | `0.1–8.0` | Playback rate multiplier — set this to match the source node's `playbackRate` (k-rate) | |
| 102 | + |
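To relate the `pitch` multiplier to `pitchSemitones`, the standard equal-temperament conversion can be sketched as below. The helper names are hypothetical and not exported by this package; how the processor combines the two params internally is not specified here, so this only shows the unit conversion.

```typescript
// Hypothetical helpers (not part of the package): equal-temperament conversion
// between a frequency ratio and a semitone offset.
function semitonesToRatio(semitones: number): number {
  return Math.pow(2, semitones / 12);
}

function ratioToSemitones(ratio: number): number {
  return 12 * Math.log2(ratio);
}

// One octave up doubles the frequency.
console.log(semitonesToRatio(12));
// A multiplier of 1.5 is roughly a perfect fifth (about 7 semitones).
console.log(ratioToSemitones(1.5).toFixed(2));
```

This can help when deciding whether to drive the node through `pitch` or `pitchSemitones` for a given musical interval.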
| 103 | +#### Methods |
| 104 | + |
| 105 | +| Method | Description | |
| 106 | +|--------|-------------| |
| 107 | +| `setInterpolationStrategy(strategy)` | Switches interpolation strategy at runtime. | |
| 108 | +| `setInterpolationStrategyParams(params)` | Updates parameters for the active strategy. | |
| 109 | +| `setStretchParameters(params)` | No-op for the phase vocoder (accepted for API parity with `SoundTouchNode`). | |
| 110 | + |
| 111 | +#### Processor observability |
| 112 | + |
| 113 | +The processor posts metrics every 100 render blocks. Access them via the `metrics` getter or the `metrics` CustomEvent: |
| 114 | + |
| 115 | +```ts |
| 116 | +// Getter |
| 117 | +const m = node.metrics; // ProcessorMetrics | null |
| 118 | + |
| 119 | +// Event |
| 120 | +node.addEventListener('metrics', (e) => { |
| 121 | + const { framesBuffered, underrunCount, blockCount } = (e as CustomEvent<ProcessorMetrics>).detail; |
| 122 | + console.log('underruns:', underrunCount); |
| 123 | +}); |
| 124 | +``` |
| 125 | + |
| 126 | +`ProcessorMetrics` shape: |
| 127 | + |
| 128 | +| Field | Description | |
| 129 | +|-------|-------------| |
| 130 | +| `framesBuffered` | Output frames available at the last render block | |
| 131 | +| `underrunCount` | Cumulative render blocks with fewer output frames than requested | |
| 132 | +| `blockCount` | Total render blocks processed | |
| 133 | +| `timestamp` | `performance.now()` when the metrics arrived on the main thread | |
| 134 | + |
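Because `underrunCount` and `blockCount` are cumulative, a rate over a recent window has to be derived from two snapshots. A minimal sketch, assuming both fields are plain numbers; `MetricsSnapshot` and `underrunRatio` are illustrative names, not exports of this package:

```typescript
// Illustrative subset of the metrics fields described above (assumed to be numbers).
interface MetricsSnapshot {
  underrunCount: number;
  blockCount: number;
}

// Fraction of render blocks that underran between two snapshots.
function underrunRatio(prev: MetricsSnapshot, next: MetricsSnapshot): number {
  const blocks = next.blockCount - prev.blockCount;
  if (blocks <= 0) return 0; // no new blocks (or counters were reset)
  return (next.underrunCount - prev.underrunCount) / blocks;
}

const before: MetricsSnapshot = { underrunCount: 0, blockCount: 100 };
const after: MetricsSnapshot = { underrunCount: 5, blockCount: 200 };
console.log(underrunRatio(before, after)); // 5 underruns over 100 blocks
```

In an application, the two snapshots would typically come from consecutive `metrics` events.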
| 135 | +## Parameters |
| 136 | + |
| 137 | +| Option | Default | Description | |
| 138 | +|--------|---------|-------------| |
| 139 | +| `fftSize` | 2048 | FFT frame size — larger gives better frequency resolution but higher latency (`fftSize` samples) | |
| 140 | +| `overlapFactor` | 4 | Overlap factor: `4` = 75 % overlap (good quality); `8` = 87.5 % overlap (smoother, about 2× the computation of `4`) | |
| 141 | + |
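The figures in the table can be reproduced with a little arithmetic. The sketch below assumes the usual STFT convention that the hop size is `fftSize / overlapFactor` (consistent with the overlap percentages quoted above) and takes the latency of `fftSize` samples from the table; `frameTiming` is a hypothetical helper, not part of the package.

```typescript
// Hypothetical helper: derives hop size, overlap and latency from the two options.
interface FrameTiming {
  hopSize: number; // samples between successive analysis frames (assumed fftSize / overlapFactor)
  overlapPercent: number; // share of each frame shared with the next one
  latencyMs: number; // fftSize samples expressed in milliseconds
}

function frameTiming(fftSize: number, overlapFactor: number, sampleRate: number): FrameTiming {
  return {
    hopSize: fftSize / overlapFactor,
    overlapPercent: (1 - 1 / overlapFactor) * 100,
    latencyMs: (fftSize / sampleRate) * 1000,
  };
}

// Defaults at 48 kHz: hop of 512 samples, 75 % overlap, about 42.7 ms latency.
console.log(frameTiming(2048, 4, 48000));
```

Halving `fftSize` halves the latency figure at the expense of frequency resolution, which is the trade-off the table describes.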
| 142 | +## Trade-offs vs `SoundTouchNode` |
| 143 | + |
| 144 | +| | `SoundTouchNode` (WSOLA) | `PhaseVocoderNode` | |
| 145 | +|--|--|--| |
| 146 | +| Quality at extreme ratios | Artifacts below 0.5× or above 2× | Smooth at all ratios | |
| 147 | +| Transient preservation | Better (time-domain) | Worse (frequency smearing) | |
| 148 | +| Computation | Lower | Higher (FFT per hop) | |
| 149 | +| Startup latency | Lower | `fftSize` samples | |
| 150 | +| Artifacts | Clicks / repeats | "Phasiness" / smearing | |
| 151 | + |
| 152 | +## License |
| 153 | + |
| 154 | +MPL-2.0 — see [LICENSE](../../LICENSE) for details. |