Conversation
The forward pass is taking a very long time. Generating a single token with Bark-small is about 10x slower than the native C++ build (5 ms in C++ vs. 65 ms in WASM for the semantic encoder, for instance). @ggerganov Have you observed something similar with Whisper? Would you happen to have any ideas on speeding up the inference pass?
Overall, WASM performance is not great. For comparison, here is a native whisper.cpp benchmark:

WHISPER_NO_METAL=1 WHISPER_NO_ACCELERATE=1 make -j
./bench -m models/ggml-small.bin -t 8

whisper_print_timings: encode time = 1255.23 ms / 1 runs ( 1255.23 ms per run)

The web version is here: https://whisper.ggerganov.com/bench/
@ggerganov Understood. So running Bark with WASM is probably not the right approach, since the model is still very computationally intensive. I'll focus on supporting Metal and cuBLAS instead. Thanks!
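For the Metal/cuBLAS direction, ggml-based projects typically expose build-time switches. A sketch below, modeled on whisper.cpp's Makefile flags; these are assumptions, and bark.cpp's actual option names may differ (check its README/CMakeLists):

```shell
# Hypothetical build commands based on whisper.cpp's conventions;
# bark.cpp's real flags may be named differently.

# Metal (Apple Silicon): in whisper.cpp, Metal is enabled by default
# on macOS builds, so a plain build picks up the GPU path:
make -j

# cuBLAS (NVIDIA): whisper.cpp uses an env-style Makefile switch:
WHISPER_CUBLAS=1 make -j

# Then re-run the benchmark to compare against the WASM numbers:
./bench -m models/ggml-small.bin -t 8
```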
Close #154