Java Native Access (JNA) wrapper for llama.cpp, providing Java bindings to run Large Language Models locally with high performance.
- Direct JNA bindings to llama.cpp native libraries
- Multi-module Maven structure with Java 8 compatibility
- CUDA acceleration support for GPU inference
- Cross-platform compatibility (Windows, Linux, macOS)
- High-level and low-level API options for different use cases
- Example implementations, including the interactive SimpleChat demo
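
To give a sense of what the low-level layer looks like, here is a minimal smoke-test sketch of a direct JNA mapping. The interface below is illustrative (the project's real bindings ship in the `core` module under `com.quasarbyte.llama.cpp.jna`); the three C functions it maps are part of llama.cpp's public API.

```java
import com.sun.jna.Library;
import com.sun.jna.Native;

// Minimal sketch: map a few llama.cpp C functions by name via JNA.
// Illustrative only; the project's real bindings live in the core module.
public class LlamaSmokeTest {

    public interface LlamaLibrary extends Library {
        // Resolves llama.dll / libllama.so from PATH or jna.library.path.
        LlamaLibrary INSTANCE = Native.load("llama", LlamaLibrary.class);

        void llama_backend_init();         // void llama_backend_init(void)
        void llama_backend_free();         // void llama_backend_free(void)
        String llama_print_system_info();  // const char * llama_print_system_info(void)
    }

    public static void main(String[] args) {
        LlamaLibrary llama = LlamaLibrary.INSTANCE;
        llama.llama_backend_init();
        System.out.println(llama.llama_print_system_info());
        llama.llama_backend_free();
    }
}
```

Run it with the llama.cpp binary directory on `PATH` (or pass `-Djna.library.path=C:\opt\llama.cpp-b6527-bin`) so JNA can resolve the native library.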
- JDK 25+ - Download from https://jdk.java.net/25/
- Maven 3.6+ - For building the project
- Git - For cloning the repository
1. Clone the repository:

   ```
   git clone https://github.com/your-org/llama-cpp-jna.git
   cd llama-cpp-jna
   ```

2. Download the llama.cpp binaries from https://github.com/ggml-org/llama.cpp/releases/tag/b6527

3. Set up the binaries (see the Binary Setup section below)

4. Download a model (see the Model Setup section below)

5. Run the example:

   ```
   run-simple-chat.cmd    # Windows
   ./run-simple-chat.sh   # Linux/macOS (coming soon)
   ```
Extract the llama.cpp binaries to `C:\opt\llama.cpp-b6527-bin` (Windows) or `/opt/llama.cpp-b6527-bin` (Linux/macOS).
For CUDA acceleration support, you need files from both archives:
1. Download and extract `llama-b6527-bin-win-cuda-12.4-x64.zip` to `C:\opt\llama.cpp-b6527-bin\`
2. Download and extract `cudart-llama-bin-win-cuda-12.4-x64.zip` and copy these CUDA runtime files to the same directory:
   - `cublas64_12.dll`
   - `cublasLt64_12.dll`
   - `cudart64_12.dll`

**Important:** Both archives must be extracted to the same directory for CUDA compatibility.
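
Before launching, it can help to verify that everything landed in one place. A minimal sketch, assuming the Windows paths above:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sanity check: confirms the llama.cpp and CUDA runtime DLLs share one directory,
// as required for CUDA acceleration. Adjust the path for your setup.
public class CudaSetupCheck {
    public static void main(String[] args) {
        Path binDir = Paths.get("C:\\opt\\llama.cpp-b6527-bin");
        String[] required = {
                "llama.dll", "ggml-cuda.dll",
                "cublas64_12.dll", "cublasLt64_12.dll", "cudart64_12.dll"
        };
        for (String dll : required) {
            boolean present = Files.exists(binDir.resolve(dll));
            System.out.printf("%-20s %s%n", dll, present ? "found" : "MISSING");
        }
    }
}
```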
Visit the GGML Models collection for available models.
Example - Qwen3 8B Model:
- Go to https://huggingface.co/ggml-org/Qwen3-8B-GGUF
- Download `Qwen3-8B-Q8_0.gguf`
- Save it to `C:\opt\models\Qwen3-8B-Q8_0.gguf` (Windows) or `/opt/models/Qwen3-8B-Q8_0.gguf` (Linux/macOS)
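
Every valid GGUF file starts with the four-byte ASCII magic `GGUF`, so a quick integrity check is possible before wiring the model into a run configuration. A minimal sketch:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Checks the GGUF magic bytes ("GGUF") at the start of a downloaded model file.
public class GgufCheck {
    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "C:\\opt\\models\\Qwen3-8B-Q8_0.gguf";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            byte[] magic = new byte[4];
            boolean ok = in.read(magic) == 4
                    && "GGUF".equals(new String(magic, StandardCharsets.US_ASCII));
            System.out.println(ok ? "Valid GGUF magic" : "Not a GGUF file");
        }
    }
}
```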
Windows:

```
# Option 1: Direct execution (compiles and runs)
run-simple-chat.cmd

# Option 2: Using Maven
run-simple-chat-with-maven.cmd
```

Linux/macOS:

```
# Coming soon - bash scripts in development
./run-simple-chat.sh
```
1. Configure environment variables in `llama-cpp-bin.env`:

   ```
   PATH=%PATH%;C:\opt\llama.cpp-b6527-bin
   GGML_BACKEND_PATH=C:\opt\llama.cpp-b6527-bin
   ```

2. Create a run configuration:
   - Name: `SimpleChat`
   - Main class: `com.quasarbyte.llama.cpp.jna.examples.simplechat.SimpleChat`
   - Module: `examples`
   - Program arguments: `-m C:\opt\models\Qwen3-8B-Q8_0.gguf -c 32768 -ngl 100`
   - Working directory: Project root
   - Environment variables: Import from `llama-cpp-bin.env`
| Flag | Description | Example |
|------|-------------|---------|
| `-m` | Path to GGUF model file | `-m C:\opt\models\Qwen3-8B-Q8_0.gguf` |
| `-c` | Context length (tokens) | `-c 32768` |
| `-ngl` | GPU layers (0 for CPU-only) | `-ngl 100` |
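
For illustration, here is a hypothetical parser for these three flags (the actual SimpleChat implementation may handle arguments differently):

```java
// Hypothetical parser for the three flags above; the real SimpleChat may differ.
public class ChatArgs {
    String modelPath;             // -m: path to the GGUF model file
    int contextLength = 4096;     // -c: context length in tokens (assumed default)
    int gpuLayers = 0;            // -ngl: layers offloaded to GPU (0 = CPU-only)

    static ChatArgs parse(String[] args) {
        ChatArgs a = new ChatArgs();
        for (int i = 0; i + 1 < args.length; i += 2) {
            switch (args[i]) {
                case "-m":   a.modelPath = args[i + 1]; break;
                case "-c":   a.contextLength = Integer.parseInt(args[i + 1]); break;
                case "-ngl": a.gpuLayers = Integer.parseInt(args[i + 1]); break;
                default: throw new IllegalArgumentException("Unknown flag: " + args[i]);
            }
        }
        return a;
    }
}
```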
```
llama-cpp-jna/
├── core/                              # Main JNA library bindings
│   └── src/main/java/com/quasarbyte/llama/cpp/jna/
│       ├── library/declaration/       # Native library interfaces
│       │   ├── llama/                 # Core llama.cpp bindings
│       │   ├── ggml/                  # GGML backend bindings
│       │   └── cuda/                  # CUDA acceleration bindings
│       ├── bindings/                  # High-level bindings layer
│       └── model/                     # Data models and DTOs
├── examples/                          # Usage examples
│   └── src/main/java/com/quasarbyte/llama/cpp/jna/examples/
│       ├── simple/                    # Basic usage
│       ├── simplechat/                # Interactive chat
│       └── cuda/                      # CUDA utilities
├── run-simple-chat.cmd                # Windows execution script
├── run-simple-chat-with-maven.cmd     # Windows Maven execution
└── llama-cpp-bin.env                  # Environment configuration
```
```
# Full build with tests
mvn clean install

# Quick build (skip tests)
mvn clean install -DskipTests

# Build specific module
mvn clean install -pl core

# Copy dependencies for examples
mvn dependency:copy-dependencies -DoutputDirectory=examples/target/lib -pl examples
```
The prebuilt Windows binaries for llama.cpp (build `b6527`) are linked against the latest Microsoft Visual C++ Redistributable. When launching through the JVM, the Java distribution may bring its own copy of the MSVC runtime:
- JDK 25+: Ships compatible DLLs that work without changes
- JDK 8–24: Bundle older runtime versions that can cause native loading errors
If using JDK 8–24, either:
- Upgrade to JDK 25+ (recommended)
- Remove/rename the bundled MSVC runtime DLLs in `<java.home>/bin`
- Ensure a matching Visual C++ Redistributable is installed globally
Common failure pattern:

```
llama.dll
├── ggml-cuda.dll
│   ├── cudart64_12.dll, nvcuda.dll, cublas64_12.dll, cublasLt64_12.dll
│   ├── vcruntime140.dll   (from JDK bin - causes conflict)
│   └── msvcp140.dll       (from JDK bin - causes conflict)
```
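
To see whether your JDK ships its own runtime copies, the sketch below lists the usual suspects under `<java.home>/bin`:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Lists MSVC runtime DLLs bundled in <java.home>/bin; on JDK 8-24 these can
// shadow the system-wide Visual C++ Redistributable during native loading.
public class JdkRuntimeDllCheck {
    public static void main(String[] args) throws IOException {
        Path bin = Paths.get(System.getProperty("java.home"), "bin");
        try (Stream<Path> files = Files.list(bin)) {
            files.map(p -> p.getFileName().toString())
                 .filter(n -> n.startsWith("vcruntime") || n.startsWith("msvcp"))
                 .forEach(n -> System.out.println(bin.resolve(n)));
        }
    }
}
```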
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Issues: Report bugs and request features on GitHub Issues
- Discussions: Join the community on GitHub Discussions
- Email: hello@quasarbyte.com
- LinkedIn: Connect with the author
- Business inquiries: https://quasarbyte.com/contact.php