WIP: @remotion/whisper-webgpu: #5267

Open, wants to merge 107 commits into base: main

107 commits
b489acf
initial commit
hunxjunedo Mar 21, 2025
f1a5915
add docs basic
hunxjunedo Mar 23, 2025
da58546
update sidebar
hunxjunedo Mar 23, 2025
e249d8c
minor fixes
hunxjunedo Mar 23, 2025
de4f0d7
Merge branch 'main' into pr/5043
JonnyBurger Mar 24, 2025
dbd9601
fixes
hunxjunedo Mar 24, 2025
b4394a4
Merge branch 'whisper-wasm-docs'
hunxjunedo Mar 24, 2025
3c7caa1
merge
hunxjunedo Mar 24, 2025
79bc9e7
Merge branch 'main' into pr/5043
JonnyBurger Mar 25, 2025
81f6085
enable typescript
JonnyBurger Mar 25, 2025
d658f95
compile whisper.wasm with emscripten
JonnyBurger Mar 25, 2025
25acad1
all fixed except ts in helper.ts
hunxjunedo Mar 25, 2025
2a2f195
add types
hunxjunedo Mar 25, 2025
8ff4b75
update docs
hunxjunedo Mar 25, 2025
2d7cd28
fix
hunxjunedo Mar 25, 2025
c70b370
fix example
hunxjunedo Mar 25, 2025
fc76512
fixes
hunxjunedo Mar 25, 2025
ecaa87b
Merge branch 'main' into main
JonnyBurger Mar 25, 2025
87c1b1b
Merge branch 'main' into pr/5043
JonnyBurger Mar 26, 2025
ab8a6c5
Merge branch 'main' of https://github.com/hunxjunedo/remotion into pr…
JonnyBurger Mar 26, 2025
2790e94
render cards and link module in example
JonnyBurger Mar 26, 2025
bc76e73
unsupport models with bad performance
JonnyBurger Mar 26, 2025
63a8f19
my modifications
JonnyBurger Mar 26, 2025
5424deb
use the function previously useless
hunxjunedo Mar 26, 2025
39c6a0f
export main
JonnyBurger Mar 26, 2025
d5afdd7
Update start-server.ts
JonnyBurger Mar 26, 2025
a99f97d
add example audio
JonnyBurger Mar 26, 2025
ad0ce3f
add example testbed
JonnyBurger Mar 26, 2025
2c6ed62
Update package.json
JonnyBurger Mar 26, 2025
26d3876
Merge branch 'pr/5043' into whisper-modifications
JonnyBurger Mar 26, 2025
7a0aa37
progress
JonnyBurger Mar 26, 2025
3cf8a8e
gooo
JonnyBurger Mar 26, 2025
b2adf7d
refactor
JonnyBurger Mar 26, 2025
8199272
progress
JonnyBurger Mar 26, 2025
be9f1a7
Update transcribe.ts
JonnyBurger Mar 26, 2025
34a21f6
make printer work again
JonnyBurger Mar 26, 2025
e77adc5
language
JonnyBurger Mar 26, 2025
1ea1d77
Merge branch 'main' into pr/5043
JonnyBurger Mar 31, 2025
bfa1ade
some progress
JonnyBurger Mar 31, 2025
546b0f9
this seems to hold up
JonnyBurger Mar 31, 2025
4e5601e
scope storeFS
JonnyBurger Mar 31, 2025
4452564
fix cyclic depdendency
JonnyBurger Mar 31, 2025
e0e215a
big progress
JonnyBurger Mar 31, 2025
ceeb498
get json
JonnyBurger Mar 31, 2025
1ca08af
better
JonnyBurger Mar 31, 2025
beba6e4
remove artificial timeout
JonnyBurger Mar 31, 2025
605e55c
load mod unique
JonnyBurger Apr 1, 2025
1b3e76d
Merge branch 'main' into pr/5043
JonnyBurger Apr 22, 2025
0014acf
Update Avif.tsx
JonnyBurger Apr 22, 2025
8f7c5c1
revert to working
JonnyBurger Apr 22, 2025
a832538
sizes
JonnyBurger Apr 22, 2025
5249b23
cleanup
JonnyBurger Apr 22, 2025
477e72a
get it together
JonnyBurger Apr 22, 2025
ba5519e
nice progress
JonnyBurger Apr 22, 2025
45b0e11
emscripten
JonnyBurger Apr 22, 2025
64454a8
becoming super nice!
JonnyBurger Apr 22, 2025
11676c8
this still works
JonnyBurger Apr 22, 2025
4b72e10
always better
JonnyBurger Apr 22, 2025
7cc87b8
Merge branch 'main' into whisper-wasm
samohovets May 5, 2025
e4748dc
update whisper-wasm package version
samohovets May 5, 2025
5c936cd
add language param and unhardcode english
samohovets May 6, 2025
df2bb54
rename onUpdate to onTranscribedChunks
samohovets May 6, 2025
5cf2999
add logger to see model's output in verbose logs
samohovets May 6, 2025
e9a4503
new `canUseWhisperWasm()` api
samohovets May 6, 2025
a65531a
new `canDownloadModel()` API
samohovets May 6, 2025
f7d633c
a single `canUseWhisperWasm` API for usability checks
samohovets May 7, 2025
ffd7b88
new API: `resampleTo16Khz()`
samohovets May 7, 2025
ef08334
move cross origin isolation check to `canUseWhisperWasm`
samohovets May 7, 2025
04dccd5
update docs
samohovets May 7, 2025
69cd727
onTranscribedChunks -> onTranscriptionChunk
samohovets May 7, 2025
d5fcf7d
invalid model name doesn't mean we can't use whisper in the browser
samohovets May 9, 2025
b2b97ed
AI-refine whisper-wasm testbed
samohovets May 9, 2025
074e935
Merge branch 'main' of https://github.com/remotion-dev/remotion into …
samohovets May 9, 2025
430e992
add experimental package badge
samohovets May 9, 2025
37f11f1
update package version
samohovets May 9, 2025
41e9601
add vanilla example to whisper-wasm
samohovets May 9, 2025
21045f8
document getLoadedModels() API
samohovets May 9, 2025
9903fb9
Merge branch 'main' into main
JonnyBurger May 12, 2025
7777da3
Render cards + 4.0.300 upgrade
JonnyBurger May 12, 2025
0cb59c4
update cards
JonnyBurger May 12, 2025
8d559dc
refactor downloadWhisperModel() and update docs
JonnyBurger May 12, 2025
ae7def4
update
JonnyBurger May 12, 2025
a7b9d12
get the files
JonnyBurger May 12, 2025
cdcc87d
make it work
JonnyBurger May 12, 2025
ec19a34
remove unnecessary stuff
JonnyBurger May 12, 2025
0a25842
progress
JonnyBurger May 12, 2025
1e181ff
this still works
JonnyBurger May 12, 2025
a2a359f
good progress
JonnyBurger May 12, 2025
b7858ab
add back whisper tokenizer
JonnyBurger May 12, 2025
f92961b
progress
JonnyBurger May 13, 2025
34fd47d
nice that works
JonnyBurger May 13, 2025
c132e3f
alright
JonnyBurger May 13, 2025
338e3d7
fft in remotion
JonnyBurger May 13, 2025
c71447e
progress, but does NOT work atm
JonnyBurger May 13, 2025
176e262
awesome
JonnyBurger May 13, 2025
59e55d0
interesting...
JonnyBurger May 13, 2025
0df25b7
alright
JonnyBurger May 13, 2025
ae65ad6
better
JonnyBurger May 13, 2025
a5d9bc3
does not work
JonnyBurger May 13, 2025
5eee73a
awesome
JonnyBurger May 13, 2025
aaf7544
whisper-wasm -> whisper-web
JonnyBurger May 13, 2025
fb7eeba
Revert "whisper-wasm -> whisper-web"
JonnyBurger May 13, 2025
a1ccaac
`@remotion/whisper-wasm`-> `@remotion/whisper-webgpu`
JonnyBurger May 13, 2025
08ac696
cards
JonnyBurger May 13, 2025
f9d7272
logs
JonnyBurger May 13, 2025
0957820
use xenova
JonnyBurger May 13, 2025
58018ad
Merge branch 'main' into whisper-web
JonnyBurger May 13, 2025
1 change: 1 addition & 0 deletions package.json
@@ -14,6 +14,7 @@
"testlambda": "turbo run testlambda --concurrency=1 --no-update-notifier",
"ci": "turbo run make test --concurrency=1 --no-update-notifier",
"watch": "turbo watch make --concurrency=2 --experimental-write-cache --ui=tui",
"watchwhisperwebpu": "turbo watch make --experimental-write-cache --filter='@remotion/whisper-webgpu'",
"watchwebcodecs": "turbo watch make --experimental-write-cache --filter='@remotion/media-parser' --filter='@remotion/webcodecs'",
"watchcore": "turbo watch make --experimental-write-cache --filter='remotion'",
"watchstudio": "turbo watch make --experimental-write-cache --filter='@remotion/studio' --filter='@remotion/studio-server'",
3 changes: 3 additions & 0 deletions packages/.monorepo/builder.ts
@@ -50,6 +50,9 @@ const validateExternal = (external: string[]) => {
if (dep === 'stream' || dep === 'fs' || dep === 'path') {
continue;
}
if (dep.startsWith('.')) {
continue;
}
if (!packageJson.includes(stripEntryPoints(dep))) {
throw new Error(
`External dependency ${stripEntryPoints(dep)} not found in package.json`,
4 changes: 3 additions & 1 deletion packages/STATS.md
@@ -1,4 +1,5 @@
# Download statistics

Monthly downloads of Remotion packages
[![NPM Downloads](https://img.shields.io/npm/dm/@remotion/animated-emoji.svg?style=flat&color=black&label=@remotion/animated-emoji)](https://npmcharts.com/compare/@remotion/animated-emoji?minimal=true)
[![NPM Downloads](https://img.shields.io/npm/dm/@remotion/animation-utils.svg?style=flat&color=black&label=@remotion/animation-utils)](https://npmcharts.com/compare/@remotion/animation-utils?minimal=true)
@@ -54,4 +55,5 @@ Monthly downloads of Remotion packages
[![NPM Downloads](https://img.shields.io/npm/dm/@remotion/three.svg?style=flat&color=black&label=@remotion/three)](https://npmcharts.com/compare/@remotion/three?minimal=true)
[![NPM Downloads](https://img.shields.io/npm/dm/@remotion/transitions.svg?style=flat&color=black&label=@remotion/transitions)](https://npmcharts.com/compare/@remotion/transitions?minimal=true)
[![NPM Downloads](https://img.shields.io/npm/dm/@remotion/webcodecs.svg?style=flat&color=black&label=@remotion/webcodecs)](https://npmcharts.com/compare/@remotion/webcodecs?minimal=true)
[![NPM Downloads](https://img.shields.io/npm/dm/@remotion/zod-types.svg?style=flat&color=black&label=@remotion/zod-types)](https://npmcharts.com/compare/@remotion/zod-types?minimal=true)
[![NPM Downloads](https://img.shields.io/npm/dm/@remotion/whisper-webgpu.svg?style=flat&color=black&label=@remotion/whisper-webgpu)](https://npmcharts.com/compare/@remotion/whisper-webgpu?minimal=true)
[![NPM Downloads](https://img.shields.io/npm/dm/@remotion/zod-types.svg?style=flat&color=black&label=@remotion/zod-types)](https://npmcharts.com/compare/@remotion/zod-types?minimal=true)
1 change: 1 addition & 0 deletions packages/cli/src/list-of-remotion-packages.ts
@@ -76,4 +76,5 @@ export const listOfRemotionPackages = [
'@remotion/openai-whisper',
'@remotion/compositor',
'@remotion/example-videos',
'@remotion/whisper-webgpu',
];
1 change: 1 addition & 0 deletions packages/create-video/src/list-of-remotion-packages.ts
@@ -76,4 +76,5 @@ export const listOfRemotionPackages = [
'@remotion/openai-whisper',
'@remotion/compositor',
'@remotion/example-videos',
'@remotion/whisper-webgpu',
];
101 changes: 101 additions & 0 deletions packages/docs/docs/whisper-webgpu/can-use-whisper-webgpu.mdx
@@ -0,0 +1,101 @@
---
image: /generated/articles-docs-whisper-webgpu-can-use-whisper-webgpu.png
title: canUseWhisperWebGpu()
crumb: '@remotion/whisper-webgpu'
---

:::warning
**Unstable API**: This package is experimental for the moment. As we test it, we might make a few changes to the API.
:::

# canUseWhisperWebGpu()

Checks if the current browser environment supports running `@remotion/whisper-webgpu` with a specified model. This function verifies various browser capabilities like `crossOriginIsolated`, `IndexedDB`, `navigator.storage.estimate()`, and available storage space.

## Example usage

```tsx twoslash
import {canUseWhisperWebGpu, type WhisperWebGpuModel} from '@remotion/whisper-webgpu';
import {useState, useEffect} from 'react';

export default function MyComponent() {
  const [supported, setSupported] = useState<boolean | null>(null);
  const [reason, setReason] = useState<string | undefined>(undefined);

  useEffect(() => {
    const checkSupport = async () => {
      const modelToUse: WhisperWebGpuModel = 'tiny.en'; // Or any other model
      const result = await canUseWhisperWebGpu(modelToUse);
      setSupported(result.supported);
      if (!result.supported) {
        setReason(result.detailedReason ?? result.reason);
      }
    };

    checkSupport();
  }, []);

  if (supported === null) {
    return <p>Checking Whisper WASM support...</p>;
  }

  if (supported) {
    return <p>Whisper WASM is supported!</p>;
  }

  return <p>Whisper WASM is not supported: {reason}</p>;
}
```

## Arguments

### `model`

The Whisper model intended to be used. This is used to check if there's enough storage space for the model.
Possible values: `tiny`, `tiny.en`, `base`, `base.en`, `small`, `small.en`.

Refer to the `WhisperWebGpuModel` type exported by the package for a comprehensive list.

## Return value

A `Promise` that resolves to a `CanUseWhisperWebGpuResult` object with the following properties:

### `supported`

A boolean indicating whether Whisper WASM can be used. `true` if supported, `false` otherwise.

### `reason?`

If `supported` is `false`, this field provides a brief, categorized reason for the lack of support.

Possible values include:

- `window-undefined`: `window` object is not available.
- `not-cross-origin-isolated`: The page is not cross-origin isolated.
- `indexed-db-unavailable`: IndexedDB is not available.
- `navigator-storage-unavailable`: `navigator.storage.estimate()` API is not available.
- `quota-undefined`: Storage quota could not be determined.
- `usage-undefined`: Storage usage could not be determined.
- `not-enough-space`: Insufficient storage space for the specified model.
- `error-estimating-storage`: An error occurred while trying to estimate storage.
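
The `quota-undefined`, `usage-undefined` and `not-enough-space` reasons can be read as the outcomes of a single quota check. The following is a minimal sketch of such a check, assuming an estimate object with optional `quota` and `usage` fields in bytes, as returned by `navigator.storage.estimate()`. The names `checkStorage`, `StorageEstimateLike` and `requiredBytes` are hypothetical and not part of the package, and the package's actual check may differ.

```typescript
// Hypothetical sketch, not the package's implementation.
type StorageCheckFailure = 'quota-undefined' | 'usage-undefined' | 'not-enough-space';

// Mirrors the optional fields of the object returned by navigator.storage.estimate().
interface StorageEstimateLike {
  quota?: number;
  usage?: number;
}

// Returns a reason code if the model would not fit, or null if there is enough space.
function checkStorage(
  estimate: StorageEstimateLike,
  requiredBytes: number,
): StorageCheckFailure | null {
  if (estimate.quota === undefined) {
    return 'quota-undefined';
  }

  if (estimate.usage === undefined) {
    return 'usage-undefined';
  }

  const remainingBytes = estimate.quota - estimate.usage;
  return remainingBytes < requiredBytes ? 'not-enough-space' : null;
}
```

In a browser, this could be called as `checkStorage(await navigator.storage.estimate(), requiredBytes)`, where `requiredBytes` is an assumed size of the model to store.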

### `detailedReason?`

If `supported` is `false`, this field may contain a more detailed, human-readable explanation of why Whisper WASM is not supported.

## Important considerations

- **Cross-Origin Isolation:** `@remotion/whisper-webgpu` requires `SharedArrayBuffer`, which is only available when the page is served with the following HTTP headers:

  - `Cross-Origin-Opener-Policy: same-origin`
  - `Cross-Origin-Embedder-Policy: require-corp`

  Ensure your server is configured to send these headers. See [MDN documentation on `crossOriginIsolated`](https://developer.mozilla.org/en-US/docs/Web/API/Window/crossOriginIsolated) for more details.

- **Browser Compatibility:** While this function checks for necessary APIs, always test on your target browsers as support for WebAssembly, IndexedDB, and storage estimation can vary.
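
The two cross-origin isolation headers listed above have to be set by whatever serves the page. The following is a minimal sketch of a helper that does this, assuming a response object with a `setHeader(name, value)` method (Node's `http.ServerResponse` has one). The names `HeaderSink` and `applyCrossOriginIsolationHeaders` are hypothetical and not part of the package.

```typescript
// Minimal shape of a response object; an assumption for illustration.
interface HeaderSink {
  setHeader(name: string, value: string): void;
}

// The two headers named in the documentation above.
const CROSS_ORIGIN_ISOLATION_HEADERS: Record<string, string> = {
  'Cross-Origin-Opener-Policy': 'same-origin',
  'Cross-Origin-Embedder-Policy': 'require-corp',
};

// Sets both headers on the given response.
function applyCrossOriginIsolationHeaders(res: HeaderSink): void {
  for (const [name, value] of Object.entries(CROSS_ORIGIN_ISOLATION_HEADERS)) {
    res.setHeader(name, value);
  }
}
```

With a Node HTTP server, the helper would be called on each response before the body is sent. Once the page is served this way, `window.crossOriginIsolated` should evaluate to `true` in the browser.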

## See also

- [Source code for `canUseWhisperWebGpu()`](https://github.com/remotion-dev/remotion/blob/main/packages/whisper-webgpu/src/can-use-whisper-webgpu.ts)
- [`@remotion/whisper-webgpu`](/docs/whisper-webgpu)
- [`transcribe()`](/docs/whisper-webgpu/transcribe)
- [`downloadWhisperModel()`](/docs/whisper-webgpu/download-whisper-model)
54 changes: 54 additions & 0 deletions packages/docs/docs/whisper-webgpu/download-whisper-model.mdx
@@ -0,0 +1,54 @@
---
image: /generated/articles-docs-whisper-webgpu-download-whisper-model.png
title: downloadWhisperModel()
crumb: '@remotion/whisper-webgpu'
---

# downloadWhisperModel()

:::warning
**Unstable API**: This package is experimental for the moment. As we test it, we might make a few changes to the API.
:::

Downloads a Whisper model into IndexedDB.

```tsx twoslash title="app.ts"
import {downloadWhisperModel} from '@remotion/whisper-webgpu';

const {alreadyDownloaded} = await downloadWhisperModel({
  model: 'tiny.en',
  onProgress: (progress) => {
    console.log(progress);
  },
});
```

## Options

### `model`

The model to download. Possible values: `tiny`, `tiny.en`, `base`, `base.en`, `small`, `small.en`.

### `onProgress?`

Act upon download progress. This is the function signature:

```tsx twoslash
import {DownloadWhisperModelOnProgress, DownloadWhisperModelProgress} from '@remotion/whisper-webgpu';

const onProgress: DownloadWhisperModelOnProgress = ({progress, totalBytes, downloadedBytes}: DownloadWhisperModelProgress) => {
  console.log({progress, totalBytes, downloadedBytes});
};
```
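
The three fields passed to the callback can be turned into a status line for a UI. The following is a minimal sketch, assuming `progress` is a fraction between 0 and 1 and both byte counts are plain numbers. `DownloadProgressLike` and `formatDownloadProgress` are hypothetical names; the local interface only mirrors the field names shown in the signature above.

```typescript
// Local stand-in for the progress object; field names taken from the signature above.
interface DownloadProgressLike {
  progress: number; // assumed to be a fraction between 0 and 1
  totalBytes: number;
  downloadedBytes: number;
}

// Formats a progress update as a human-readable status line.
function formatDownloadProgress({
  progress,
  totalBytes,
  downloadedBytes,
}: DownloadProgressLike): string {
  const toMb = (bytes: number): string => (bytes / (1024 * 1024)).toFixed(1);
  const percent = Math.round(progress * 100);
  return `${percent}% (${toMb(downloadedBytes)} MB of ${toMb(totalBytes)} MB)`;
}
```

Such a helper could be called from inside `onProgress` to update a status element instead of logging the raw object.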

## Return value

Returns an object with the following properties:

- `alreadyDownloaded`: Whether the model has already been downloaded.

## See also

- [Source code for this function](https://github.com/remotion-dev/remotion/blob/main/packages/whisper-webgpu/src/download-whisper-model.ts)
- [`@remotion/whisper-webgpu`](/docs/whisper-webgpu)
- [`transcribe()`](/docs/whisper-webgpu/transcribe)
15 changes: 15 additions & 0 deletions packages/docs/docs/whisper-webgpu/get-loaded-models.mdx
@@ -0,0 +1,15 @@
---
image: /generated/articles-docs-whisper-webgpu-get-loaded-models.png
title: getLoadedModels()
crumb: '@remotion/whisper-webgpu'
---

:::warning
**Unstable API**: This package is experimental for the moment. As we test it, we might make a few changes to the API.
:::

Returns an array of Whisper models that have already been downloaded and stored in the browser's IndexedDB storage.

## Return value

Returns a Promise that resolves to an array of `WhisperWebGpuModel` strings, representing the models that are already downloaded and available locally.
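
One use for the returned array is deciding which models still need to be downloaded. The following is a minimal sketch that works on plain string arrays; `missingModels` is a hypothetical helper and not part of the package.

```typescript
// Returns the entries of `wanted` that do not appear in `loaded`.
function missingModels(
  loaded: readonly string[],
  wanted: readonly string[],
): string[] {
  const loadedSet = new Set(loaded);
  return wanted.filter((model) => !loadedSet.has(model));
}
```

In an application, `loaded` would be the resolved value of `getLoadedModels()`, and each entry of the result could then be passed to `downloadWhisperModel()`.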
77 changes: 77 additions & 0 deletions packages/docs/docs/whisper-webgpu/index.mdx
@@ -0,0 +1,77 @@
---
image: /generated/articles-docs-whisper-webgpu-index.png
title: '@remotion/whisper-webgpu'
crumb: 'Transcribe audio in browser'
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

:::warning
**Unstable API**: This package is experimental for the moment. As we test it, we might make a few changes to the API.
:::

Similar to [@remotion/install-whisper-cpp](/docs/install-whisper-cpp) but for the browser. Allows you to transcribe audio locally in the browser, with the help of WASM.

import {TableOfContents} from './whisper-webgpu';

<Installation pkg="@remotion/whisper-webgpu" />

## Example usage

:::info
This library is UI-agnostic and can be integrated with any frontend framework.
:::

```tsx twoslash
import {transcribe, canUseWhisperWebGpu, resampleTo16Khz, downloadWhisperModel} from '@remotion/whisper-webgpu';

// HTML:
// <input type="file" accept="audio/*" id="audio-input" />
// <p id="status"></p>

const input = document.getElementById('audio-input') as HTMLInputElement;

input.addEventListener('change', async (e) => {
  const file = input.files?.[0];
  if (!file) return;

  console.log('Processing...');

  const modelToUse = 'tiny.en';

  const {supported, detailedReason} = await canUseWhisperWebGpu(modelToUse);
  if (!supported) {
    throw new Error(`Whisper WASM is not supported in this environment: ${detailedReason}`);
  }

  console.log('Downloading model...');
  await downloadWhisperModel({
    model: modelToUse,
    // The callback receives an object ({progress, totalBytes, downloadedBytes}), not a number
    onProgress: ({progress}) => console.log(`Downloading model (${Math.round(progress * 100)}%)...`),
  });

  console.log('Resampling audio...');
  const channelWaveform = await resampleTo16Khz({
    file,
    onProgress: (p) => console.log(`Resampling audio (${Math.round(p * 100)}%)...`),
  });

  console.log('Transcribing...');
  const {transcription} = await transcribe({
    channelWaveform,
    model: modelToUse,
    onProgress: (p) => console.log(`Transcribing (${Math.round(p * 100)}%)...`),
  });

  console.log(transcription.map((t) => t.text).join(' '));
});
```

## Functions

<TableOfContents />

## License

MIT
50 changes: 50 additions & 0 deletions packages/docs/docs/whisper-webgpu/resample-to-16khz.mdx
@@ -0,0 +1,50 @@
---
image: /generated/articles-docs-whisper-webgpu-resample-to-16khz.png
title: resampleTo16Khz()
crumb: '@remotion/whisper-webgpu'
---

:::warning
**Unstable API**: This package is experimental for the moment. As we test it, we might make a few changes to the API.
:::

# resampleTo16Khz()

Processes an audio `File` or `Blob` by decoding it, converting it to mono, and resampling it to a 16kHz `Float32Array`. This prepares the audio data for use with the [`transcribe()`](/docs/whisper-webgpu/transcribe) function.

This function operates in a browser environment as it relies on the Web Audio API (`AudioContext`, `OfflineAudioContext`) and `FileReader`.

## Arguments

### `file`

The audio `File` or `Blob` object that you want to process. The function will attempt to decode the audio from common formats (e.g., WAV, MP3, Ogg) supported by the browser's Web Audio API.

### `onProgress?`

A callback function that receives progress updates during the resampling process. The `progress` value is a number between 0 and 1, where 0 indicates the start and 1 indicates completion.

### `logLevel?`

Default: `info`

**Type:** `'trace' | 'verbose' | 'info' | 'warn' | 'error'`

Optional. Determines the level of detail for logs printed to the console during the resampling process. Useful for debugging.

## Return value

`Promise<Float32Array>`

This array contains the raw audio waveform data for a single channel (mono), sampled at 16kHz. This output is ready to be passed to the `channelWaveform` argument of the [`transcribe()`](/docs/whisper-webgpu/transcribe) function.

## Behavior notes

- **Browser environment:** This function is intended for use in a browser environment due to its reliance on Web Audio APIs (`AudioContext`, `OfflineAudioContext`) and `FileReader`.
- **Audio decoding:** It uses the browser's built-in audio decoding capabilities. The range of supported audio formats may vary slightly between browsers.
- **Output format:** The output is always a mono `Float32Array` at 16kHz, regardless of the input file's original channel count or sample rate.
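
To illustrate what "mono at 16kHz" means for the output, here is a minimal sketch of the two steps on raw sample arrays: averaging channels into one, then resampling by linear interpolation. This is not the package's implementation, which relies on the Web Audio API as described above. `downmixToMono` and `resampleLinear` are hypothetical names, and plain linear interpolation without a low-pass filter can introduce aliasing when downsampling.

```typescript
const TARGET_SAMPLE_RATE = 16_000;

// Averages all channels into a single mono channel.
// Assumes every channel has the same length as the first one.
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) {
    return new Float32Array(0);
  }

  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }

  return mono;
}

// Resamples by linear interpolation between neighbouring input samples.
function resampleLinear(
  input: Float32Array,
  inputRate: number,
  outputRate: number = TARGET_SAMPLE_RATE,
): Float32Array {
  if (input.length === 0) {
    return new Float32Array(0);
  }

  const outputLength = Math.max(1, Math.round((input.length * outputRate) / inputRate));
  const output = new Float32Array(outputLength);
  const step = inputRate / outputRate;

  for (let i = 0; i < outputLength; i++) {
    const position = i * step;
    // Clamp both neighbours to the last valid index.
    const left = Math.min(Math.floor(position), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const fraction = position - left;
    output[i] = input[left] * (1 - fraction) + input[right] * fraction;
  }

  return output;
}
```

For example, one second of stereo audio at 48kHz (two arrays of 48,000 samples) would become a single array of 16,000 samples after both steps.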

## See also

- [`transcribe()`](/docs/whisper-webgpu/transcribe)
- [`@remotion/whisper-webgpu`](/docs/whisper-webgpu)