A library for running Whisper v3 inference in Java using DJL (Deep Java Library). The implementation is based on the Hugging Face Python implementation of Whisper large-v3.
It currently runs only on the GPU; GPU inference works in Java out of the box.
Alternatives:
- whisper.cpp to run Whisper with C++
- whisper-jni (a JNI wrapper for whisper.cpp)
First, follow the installation instructions for the DJL PyTorch engine.
For GPU support, you also need to ensure that CUDA is installed on your system and included in the path. The CUDA version must match the PyTorch version of your chosen DJL PyTorch engine. To see which DJL PyTorch engine version supports which PyTorch library version, see the DJL PyTorch engine documentation.
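As a quick sanity check for the "included in the path" requirement, the following minimal sketch looks for the CUDA compiler binary in the directories listed in the PATH environment variable. The class name `PathLookup` and the choice of `nvcc` as the marker file are assumptions made for illustration; they are not part of this library or of DJL. Finding `nvcc` is only a heuristic: a runtime-only CUDA installation may not ship it, and the check says nothing about whether the CUDA version matches your PyTorch engine.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper, not part of whisper-java or DJL.
final class PathLookup {

    /** Returns the first regular file named fileName found in the directories of pathValue. */
    static Optional<Path> findOnPath(String pathValue, String fileName) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            try {
                Path candidate = Paths.get(dir, fileName);
                if (Files.isRegularFile(candidate)) {
                    return Optional.of(candidate);
                }
            } catch (InvalidPathException e) {
                // Skip malformed PATH entries instead of failing the whole lookup.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        boolean windows = System.getProperty("os.name").toLowerCase().contains("win");
        String binary = windows ? "nvcc.exe" : "nvcc"; // assumption: nvcc marks a CUDA toolkit install
        Optional<Path> nvcc = findOnPath(System.getenv("PATH"), binary);
        System.out.println(nvcc.map(p -> "Found CUDA compiler at " + p)
                               .orElse("nvcc not found on PATH"));
    }
}
```

If the binary is found, running it with `--version` shows the installed CUDA toolkit version, which can then be compared against the version required by the chosen engine.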
Add the following to your pom file:
<!--
    Note:
    Maven central has file size restrictions, therefore the libs are currently
    on the DIVISIO repository (because of the large whisper-model dependency).
-->
<repositories>
    <repository>
        <id>DIVISIO</id>
        <url>https://mvn.divis.io/</url>
    </repository>
</repositories>

<!-- Whisper library -->
<dependency>
    <groupId>divisio</groupId>
    <artifactId>whisper-java</artifactId>
    <version>0.1</version>
</dependency>

<!-- Large Whisper model file (might take a bit to download, around 3.3GB) -->
<dependency>
    <groupId>divisio</groupId>
    <artifactId>whisper-model</artifactId>
    <version>0.1</version>
</dependency>

<!-- You will also need a matching DJL PyTorch engine for your system -->
<dependency>
    <groupId>ai.djl.pytorch</groupId>
    <artifactId>pytorch-native-cu121</artifactId>
    <version>2.1.1</version>
    <scope>runtime</scope>
</dependency>
Create a demo class like this:
package divisio.whisper;

import divisio.whisper.token.Whisper3Language;
import divisio.whisper.token.WhisperToken;

import java.util.Arrays;

public class WhisperDemo {

    public static void main(String[] args) throws Exception {
        // Path to the audio file, passed as the first program argument
        final String filePath = args[0];

        // Whisper3 is closed automatically at the end of the try block
        try (Whisper3 whisper = Whisper3.instance()) {
            WhisperResult result = whisper.task()
                    .language(Whisper3Language.AUTO)
                    .transcribe(filePath)
                    .withTimestamps()
                    .execute();

            long[] tokenIds = result.tokens().stream()
                    .mapToLong(WhisperToken::getTokenId)
                    .toArray();
            System.out.println("raw token ids: " + Arrays.toString(tokenIds));
            System.out.println("raw text: " + result.rawText());
            System.out.println("clean: " + result.text());
        }
    }
}
Then start the program with an argument pointing to an audio file, e.g. /path/to/my_audio_file.wav.
Initializing Whisper is expensive, so instances should be reused, e.g. by instantiating them as a Spring bean singleton. Additionally, the first few tasks may take a little longer than usual because of internal warm-ups.
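Outside of a dependency-injection container, the same reuse can be achieved with a small lazy holder. The following is a minimal sketch, not part of this library: `SharedInstance` is a hypothetical generic class that runs a factory once (thread-safely) and hands out the same object afterwards. The demo uses a `StringBuilder` as a stand-in for the expensive model object.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical helper, not part of whisper-java: creates an expensive object once and reuses it.
final class SharedInstance<T> implements Supplier<T> {

    private final Supplier<T> factory;
    private volatile T instance; // volatile is required for safe double-checked locking

    SharedInstance(Supplier<T> factory) {
        this.factory = Objects.requireNonNull(factory, "factory");
    }

    @Override
    public T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    // The factory runs only for the first caller; later callers reuse the result.
                    local = Objects.requireNonNull(factory.get(), "factory returned null");
                    instance = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        // Stand-in for an expensive model; a real application would create the Whisper instance here.
        SharedInstance<StringBuilder> shared = new SharedInstance<>(() -> {
            System.out.println("expensive initialization runs");
            return new StringBuilder("model");
        });
        // Both calls return the same object, so the initialization message appears once.
        System.out.println(shared.get() == shared.get());
    }
}
```

With this library, the factory would presumably be something like `Whisper3::instance`, assuming that call does not throw a checked exception (otherwise it needs to be wrapped). Since the demo above uses `Whisper3` in a try-with-resources block, a shared instance should be closed once when the application shuts down rather than after every task.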
This work is based on the Hugging Face version of Whisper large-v3 (https://huggingface.co/openai/whisper-large-v3/blob/main/README.md) by OpenAI. The model is a traced version of that model; all Java code has been rewritten from scratch, using the original Python code as a reference.
This library is licensed under the Apache 2.0 license (see LICENSE).