example : add WASM example #155

Merged
PABannier merged 17 commits into main from emscripten
Apr 26, 2024
@PABannier
Owner

Close #154

@PABannier PABannier marked this pull request as ready for review April 23, 2024 22:38
@PABannier
Owner Author

PABannier commented Apr 25, 2024

The forward pass is taking a very long time. Generating a token with Bark-small is more than 10x slower than with the native C++ build (65 ms vs. 5 ms for the semantic encoder, for instance). I passed the -O3 flag when compiling with Emscripten. I expected the WASM version to be slower than C++, but not by a factor of 10-20x.

@ggerganov Have you observed something similar with Whisper? Would you happen to have any ideas on speeding up the inference pass?

@ggerganov
Contributor

There are WASM_SIMD implementations only for the matrix multiplication op. If other ops in Bark require significant compute, they might become a bottleneck when using WASM.

Overall, WASM performance is not great. For example, I compared the Whisper small encoder on my M1 Pro using only the CPU (i.e. without Metal and without Accelerate CBLAS), and the C++ version is about 10x faster than the web version:

WHISPER_NO_METAL=1 WHISPER_NO_ACCELERATE=1 make -j
./bench -m models/ggml-small.bin -t 8

whisper_print_timings:   encode time =  1255.23 ms /     1 runs ( 1255.23 ms per run)

The web-version is here: https://whisper.ggerganov.com/bench/

whisper_print_timings:   encode time = 13947.06 ms /     1 runs (13947.06 ms per run)
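
To make the comparison concrete, the slowdown can be computed directly from the two timing lines above. The following is a minimal Python sketch, not part of whisper.cpp or bark.cpp: the helper name `encode_ms_per_run` and the regular expression are illustrative assumptions, and the only data used are the two log lines quoted in this comment.

```python
import re

# Timing lines copied verbatim from the two benchmark runs quoted above.
NATIVE_LOG = "whisper_print_timings:   encode time =  1255.23 ms /     1 runs ( 1255.23 ms per run)"
WASM_LOG = "whisper_print_timings:   encode time = 13947.06 ms /     1 runs (13947.06 ms per run)"


def encode_ms_per_run(line: str) -> float:
    """Hypothetical helper: return the encode time per run, in milliseconds."""
    match = re.search(r"encode time\s*=\s*([\d.]+)\s*ms\s*/\s*(\d+)\s*runs", line)
    if match is None:
        raise ValueError(f"not an encode timing line: {line!r}")
    total_ms = float(match.group(1))
    runs = int(match.group(2))
    return total_ms / runs


# Ratio of WASM encode time to native CPU encode time.
slowdown = encode_ms_per_run(WASM_LOG) / encode_ms_per_run(NATIVE_LOG)
print(f"WASM encode is {slowdown:.1f}x slower than native")
```

With these two lines the ratio comes out at roughly 11x, consistent with the "about 10x" figure above.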

@PABannier
Owner Author

@ggerganov Understood. So using Bark with WASM is probably not the right idea as the model is still very computationally intensive. I'll focus on supporting Metal and cuBLAS then. Thanks!

Development

Successfully merging this pull request may close these issues.

Make a WASM demo to demonstrate bark.cpp efficiency

2 participants