llama-cpp-jna

Java Native Access (JNA) wrapper for llama.cpp, providing Java bindings to run Large Language Models locally with high performance.

Features

  • Direct JNA bindings to llama.cpp native libraries (see the sketch below)
  • Multi-module Maven structure with Java 8 compatibility
  • CUDA acceleration support for GPU inference
  • Cross-platform compatibility (Windows, Linux, macOS)
  • High-level and low-level API options for different use cases
  • Example implementations including SimpleChat interactive demo
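
As a rough illustration of the first bullet, a direct JNA binding maps exported llama.cpp C functions onto a Java interface. The sketch below is illustrative only, not this project's actual API: llama_backend_init, llama_print_system_info, and llama_backend_free are real llama.cpp functions, but the interface and class names here are hypothetical.

import com.sun.jna.Library;
import com.sun.jna.Native;

// Minimal sketch of a direct JNA binding to llama.cpp (hypothetical names,
// not this project's actual API). The mapped native functions exist in llama.cpp's C API.
public class LlamaJnaSketch {

    // Each Java method name must match a symbol exported by the native library.
    public interface LlamaLibrary extends Library {
        void llama_backend_init();         // initialize the ggml/llama backend
        String llama_print_system_info();  // human-readable build/feature info
        void llama_backend_free();         // release backend resources
    }

    public static void main(String[] args) {
        // Resolves llama.dll / libllama.so / libllama.dylib via jna.library.path or PATH.
        LlamaLibrary llama = Native.load("llama", LlamaLibrary.class);
        llama.llama_backend_init();
        System.out.println(llama.llama_print_system_info());
        llama.llama_backend_free();
    }
}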

Quick Start

Prerequisites

  • JDK 25+ - Download from https://jdk.java.net/25/ (JDK 8–24 can work with caveats; see Windows Compatibility Notes below)
  • Maven 3.6+ - For building the project
  • Git - For cloning the repository

Installation

  1. Clone the repository:

    git clone https://github.com/QuasarByte/llama-cpp-jna.git
    cd llama-cpp-jna
  2. Download llama.cpp binaries from https://github.com/ggml-org/llama.cpp/releases/tag/b6527

  3. Setup binaries (see Binary Setup section below)

  4. Download a model (see Model Setup section below)

  5. Run the example:

    run-simple-chat.cmd          # Windows
    ./run-simple-chat.sh         # Linux/macOS (coming soon)

Binary Setup

Basic Setup (CPU Only)

Extract the llama.cpp binaries to C:\opt\llama.cpp-b6527-bin (Windows) or /opt/llama.cpp-b6527-bin (Linux/macOS).

CUDA Setup (GPU Acceleration)

For CUDA acceleration support, you need files from both archives:

  1. Download and extract llama-b6527-bin-win-cuda-12.4-x64.zip to C:\opt\llama.cpp-b6527-bin\
  2. Download and extract cudart-llama-bin-win-cuda-12.4-x64.zip and copy these CUDA runtime files to the same directory:
    • cublas64_12.dll
    • cublasLt64_12.dll
    • cudart64_12.dll

Important: Both archives must be extracted to the same directory for CUDA compatibility.
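
To confirm the CUDA setup before enabling GPU offload, a quick file check against the directory used above can save a confusing load error later. A minimal sketch (paths and class name are illustrative and match the Windows layout described in this guide; adjust as needed):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sanity check: verify the CUDA runtime DLLs ended up next to the llama.cpp binaries.
public class CudaBinaryCheck {
    public static void main(String[] args) {
        Path binDir = Paths.get("C:\\opt\\llama.cpp-b6527-bin");
        String[] required = {"ggml-cuda.dll", "cublas64_12.dll", "cublasLt64_12.dll", "cudart64_12.dll"};
        for (String name : required) {
            Path dll = binDir.resolve(name);
            System.out.println((Files.exists(dll) ? "OK      " : "MISSING ") + dll);
        }
    }
}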

Model Setup

Download Models

Visit the ggml-org collection on Hugging Face (https://huggingface.co/ggml-org) for available GGUF models.

Example - Qwen3 8B Model:

  1. Go to https://huggingface.co/ggml-org/Qwen3-8B-GGUF
  2. Download Qwen3-8B-Q8_0.gguf
  3. Save to C:\opt\models\Qwen3-8B-Q8_0.gguf (Windows) or /opt/models/Qwen3-8B-Q8_0.gguf (Linux/macOS)
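
After downloading, a quick sanity check of the file header helps catch incomplete downloads: every GGUF file starts with the 4-byte ASCII magic "GGUF". A minimal sketch (class name is illustrative):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Sanity check: GGUF files begin with the ASCII magic bytes "GGUF".
// Pass the model path as the first argument, e.g. C:\opt\models\Qwen3-8B-Q8_0.gguf
public class GgufMagicCheck {
    public static void main(String[] args) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(args[0], "r")) {
            byte[] magic = new byte[4];
            file.readFully(magic);
            String header = new String(magic, StandardCharsets.US_ASCII);
            System.out.println("GGUF".equals(header)
                    ? "Looks like a valid GGUF file"
                    : "Unexpected header '" + header + "' - download may be incomplete");
        }
    }
}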

Running Examples

Command Line (Recommended)

Windows:

# Option 1: Direct execution (compiles and runs)
run-simple-chat.cmd

# Option 2: Using Maven
run-simple-chat-with-maven.cmd

Linux/macOS:

# Coming soon - bash scripts in development
./run-simple-chat.sh

IDE Setup (IntelliJ IDEA)

  1. Configure environment variables in llama-cpp-bin.env (a programmatic alternative is sketched after this list):

    PATH=%PATH%;C:\opt\llama.cpp-b6527-bin
    GGML_BACKEND_PATH=C:\opt\llama.cpp-b6527-bin
    
  2. Create run configuration:

    • Name: SimpleChat
    • Main class: com.quasarbyte.llama.cpp.jna.examples.simplechat.SimpleChat
    • Module: examples
    • Program arguments: -m C:\opt\models\Qwen3-8B-Q8_0.gguf -c 32768 -ngl 100
    • Working directory: Project root
    • Environment variables: Import from llama-cpp-bin.env
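
Outside the IDE, roughly the same effect as importing llama-cpp-bin.env can be achieved in code: JNA honours the standard jna.library.path system property when resolving native libraries. The sketch below is illustrative; only the property name and the GGML_BACKEND_PATH variable come from this guide, the rest is an assumption about how a launcher might be wired up.

// Programmatic alternative to importing llama-cpp-bin.env (illustrative sketch).
public class NativePathSetup {
    public static void main(String[] args) {
        // Fall back to the directory used throughout this guide if the variable is unset.
        String binDir = System.getenv()
                .getOrDefault("GGML_BACKEND_PATH", "C:\\opt\\llama.cpp-b6527-bin");

        // Tell JNA where to look for llama.dll and its backends before any Native.load call.
        System.setProperty("jna.library.path", binDir);

        System.out.println("Native library search path: " + binDir);
        // ... continue with SimpleChat-style initialization here ...
    }
}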

Command Line Arguments

Flag   Description                    Example
-m     Path to GGUF model file        -m C:\opt\models\Qwen3-8B-Q8_0.gguf
-c     Context length (tokens)        -c 32768
-ngl   GPU layers (0 for CPU-only)    -ngl 100
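
In llama.cpp itself, -ngl corresponds to the n_gpu_layers model parameter and -c to the context size. Below is a minimal sketch of how these flags might be parsed on the Java side (illustrative only, not the project's actual implementation; the default values are arbitrary):

// Illustrative parsing of the flags listed in the table above.
public class ChatArgs {
    String modelPath;          // -m   path to the GGUF model file
    int contextLength = 4096;  // -c   context length in tokens
    int gpuLayers = 0;         // -ngl layers offloaded to the GPU; 0 keeps everything on the CPU

    static ChatArgs parse(String[] args) {
        ChatArgs parsed = new ChatArgs();
        for (int i = 0; i + 1 < args.length; i += 2) {
            switch (args[i]) {
                case "-m":   parsed.modelPath = args[i + 1]; break;
                case "-c":   parsed.contextLength = Integer.parseInt(args[i + 1]); break;
                case "-ngl": parsed.gpuLayers = Integer.parseInt(args[i + 1]); break;
                default:     throw new IllegalArgumentException("Unknown flag: " + args[i]);
            }
        }
        return parsed;
    }
}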

Project Structure

llama-cpp-jna/
├── core/                           # Main JNA library bindings
│   └── src/main/java/com/quasarbyte/llama/cpp/jna/
│       ├── library/declaration/    # Native library interfaces
│       │   ├── llama/             # Core llama.cpp bindings
│       │   ├── ggml/              # GGML backend bindings
│       │   └── cuda/              # CUDA acceleration bindings
│       ├── bindings/              # High-level bindings layer
│       └── model/                 # Data models and DTOs
├── examples/                       # Usage examples
│   └── src/main/java/com/quasarbyte/llama/cpp/jna/examples/
│       ├── simple/                # Basic usage
│       ├── simplechat/            # Interactive chat
│       └── cuda/                  # CUDA utilities
├── run-simple-chat.cmd            # Windows execution script
├── run-simple-chat-with-maven.cmd # Windows Maven execution
└── llama-cpp-bin.env             # Environment configuration

Building from Source

# Full build with tests
mvn clean install

# Quick build (skip tests)
mvn clean install -DskipTests

# Build specific module
mvn clean install -pl core

# Copy dependencies for examples
mvn dependency:copy-dependencies -DoutputDirectory=examples/target/lib -pl examples

Windows Compatibility Notes

The prebuilt Windows binaries for llama.cpp (build b6527) are linked against the latest Microsoft Visual C++ Redistributable. When they are loaded through the JVM, the Java distribution may bring along its own copy of the MSVC runtime:

  • JDK 25+: Ships compatible DLLs that work without changes
  • JDK 8–24: Bundles older runtime versions that can cause native loading errors

Troubleshooting Runtime Issues

If using JDK 8–24, either:

  1. Upgrade to JDK 25+ (recommended)
  2. Remove/rename bundled MSVC runtime DLLs from <java.home>/bin (a quick diagnostic sketch follows the failure diagram below)
  3. Ensure matching Visual C++ Redistributable is installed globally

Common failure pattern:

llama.dll
├── ggml-cuda.dll
│   ├── cudart64_12.dll, nvcuda.dll, cublas64_12.dll, cublasLt64_12.dll
│   ├── vcruntime140.dll   (from JDK bin - causes conflict)
│   └── msvcp140.dll       (from JDK bin - causes conflict)
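
To check quickly whether your JDK ships its own copies of the conflicting DLLs shown above, list them under <java.home>/bin. A minimal diagnostic sketch (class name is illustrative):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Diagnostic sketch: report whether the running JDK bundles its own MSVC runtime DLLs,
// which can shadow the ones the llama.cpp CUDA build expects.
public class MsvcRuntimeCheck {
    public static void main(String[] args) {
        Path jdkBin = Paths.get(System.getProperty("java.home"), "bin");
        for (String name : new String[]{"vcruntime140.dll", "msvcp140.dll"}) {
            Path dll = jdkBin.resolve(name);
            System.out.println((Files.exists(dll) ? "bundled: " : "absent:  ") + dll);
        }
    }
}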

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
