Add langchain4j-jllama module: in-process LangChain4j adapters by vaiju1981 · Pull Request #284 · bernardladenthin/java-llama.cpp

vaiju1981 · 2026-07-01T00:35:40Z

Summary

Adds a new sibling Maven module langchain4j-jllama that adapts a java-llama.cpp LlamaModel to
LangChain4j's model interfaces in-process over JNI — no HTTP hop, no separate llama-server:

| Adapter | LangChain4j interface | Backing call |
| --- | --- | --- |
| `JllamaChatModel` | `ChatModel` | `LlamaModel.chat(...)` |
| `JllamaStreamingChatModel` | `StreamingChatModel` | `LlamaModel.generateChat(...)` (token streaming) |
| `JllamaEmbeddingModel` | `EmbeddingModel` | `LlamaModel.embed(...)` |
| `JllamaScoringModel` | `ScoringModel` (re-rank) | `LlamaModel.handleRerank(...)` |

Design decisions

- Separate artifact, one-way dependency. The module depends on langchain4j-core:1.17.1, but the
  core net.ladenthin:llama binding gains no langchain4j dependency, so plain java-llama.cpp users
  never pull langchain4j (or its Java 17 floor) transitively.
- Sibling module, not in the root reactor. The native build/release pipeline is left completely
  untouched; the module builds independently against the published core jar. Targets Java 17
  (langchain4j 1.x baseline; the core stays Java 8).
- Borrowed model lifecycle. Every adapter wraps a caller-owned LlamaModel and never loads or
  closes it. One model can back several adapters.

For users who already run the OpenAI-compatible OpenAiCompatServer, langchain4j's
langchain4j-open-ai client already works over HTTP with zero code. This module is for the
in-process path (desktop / Android / embedded, no socket).
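
The borrowed-lifecycle rule can be illustrated with a minimal sketch. `BorrowedModel`, `ChatAdapterSketch`, and `PromptSizeAdapterSketch` are hypothetical stand-ins, not the real `LlamaModel` or the Jllama adapter classes; the sketch only shows the ownership pattern, where the caller opens and closes the model while several adapters share it.

```java
import java.util.Objects;

// Hypothetical stand-in for a caller-owned native model (NOT the real LlamaModel API).
interface BorrowedModel extends AutoCloseable {
    String complete(String prompt);

    @Override
    void close(); // narrowed so callers need not handle a checked exception
}

// Hypothetical adapter: keeps a reference to the model but never loads or closes it.
final class ChatAdapterSketch {
    private final BorrowedModel model;

    ChatAdapterSketch(BorrowedModel model) {
        this.model = Objects.requireNonNull(model, "model");
    }

    String chat(String userMessage) {
        return model.complete(userMessage);
    }
}

// A second hypothetical adapter that shares the same borrowed model.
final class PromptSizeAdapterSketch {
    private final BorrowedModel model;

    PromptSizeAdapterSketch(BorrowedModel model) {
        this.model = Objects.requireNonNull(model, "model");
    }

    int completionLength(String prompt) {
        return model.complete(prompt).length();
    }
}

final class BorrowedLifecycleDemo {
    // Fake model that counts close() calls so ownership is observable.
    static final class CountingModel implements BorrowedModel {
        int closeCalls = 0;

        @Override
        public String complete(String prompt) {
            return "echo: " + prompt;
        }

        @Override
        public void close() {
            closeCalls++;
        }
    }

    // Returns how often the model was closed after two adapters used it.
    static int run() {
        CountingModel model = new CountingModel();
        try (CountingModel owned = model) {
            new ChatAdapterSketch(owned).chat("hi");
            new PromptSizeAdapterSketch(owned).completionLength("hi");
        } // the caller's try-with-resources is the only place close() runs
        return model.closeCalls;
    }

    public static void main(String[] args) {
        System.out.println("close() calls: " + run());
    }
}
```

Because neither adapter implements `AutoCloseable`, there is no way to close the shared model twice by accident; the trade-off is that the caller must keep the model open for as long as any adapter is in use.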

Testing

- Pure message/parameter/response transforms are unit-tested model-free (LangChain4jMappingTest,
  7 tests): role mapping, multimodal→text flattening, sampling-parameter pass-through, finish-reason
  mapping, and rerank index alignment.
- JllamaChatModelIntegrationTest runs a real chat + streaming round-trip and self-skips unless
  -Dnet.ladenthin.llama.model.path=... points at a GGUF (mirrors the existing model-gated tests).
- mvn test: 7 run, 2 skipped, green.
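
Two of the pure transforms listed above, finish-reason mapping and rerank index alignment, can be sketched without a model. This is an illustration under assumptions, not the module's code: the `Finish` enum is a local stand-in for LangChain4j's finish-reason type, `Hit` is a hypothetical shape for a rerank result, and the specific string-to-enum mapping is assumed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class MappingSketch {
    // Local stand-in enum (assumption); the real adapter would use LangChain4j's own type.
    enum Finish { STOP, LENGTH, TOOL_EXECUTION, OTHER }

    // Maps a raw finish-reason string to the stand-in enum; unknown or null values become OTHER.
    static Finish mapFinishReason(String raw) {
        if (raw == null) {
            return Finish.OTHER;
        }
        switch (raw.toLowerCase(Locale.ROOT)) {
            case "stop":
                return Finish.STOP;
            case "length":
                return Finish.LENGTH;
            case "tool_calls":
                return Finish.TOOL_EXECUTION;
            default:
                return Finish.OTHER;
        }
    }

    // Hypothetical rerank result: the index of the input document plus its score.
    static final class Hit {
        final int index;
        final double score;

        Hit(int index, double score) {
            this.index = index;
            this.score = score;
        }
    }

    // Re-orders scores so that position i holds the score of input document i.
    // Documents without a hit keep NaN so a missing score is visible to the caller.
    static double[] alignScores(int inputCount, List<Hit> hits) {
        double[] scores = new double[inputCount];
        Arrays.fill(scores, Double.NaN);
        for (Hit hit : hits) {
            if (hit.index < 0 || hit.index >= inputCount) {
                throw new IllegalArgumentException("rerank index out of range: " + hit.index);
            }
            scores[hit.index] = hit.score;
        }
        return scores;
    }

    public static void main(String[] args) {
        List<Hit> byRelevance = Arrays.asList(new Hit(2, 0.9), new Hit(0, 0.1), new Hit(1, 0.5));
        System.out.println(Arrays.toString(alignScores(3, byRelevance)));
        System.out.println(mapFinishReason("tool_calls"));
    }
}
```

Alignment matters because a reranker may return hits sorted by relevance, while a caller that passed a list of documents expects the i-th score to belong to the i-th document.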

Adapter contracts were verified against langchain4j 1.17.1 source: doChat is the correct override
point, the chat response reports the model's real finish reason (stop/length/tool_calls) and
token usage, and streaming reports failures via onError (the framework does not wrap doChat).
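
The streaming error contract described above can be sketched as follows. `TokenHandler` is a hypothetical callback interface loosely modeled on a streaming response handler, and `StreamingSketch.stream` is an assumed shape, not the adapter's real code. The point is that when nothing upstream wraps the call, the streaming method itself has to catch failures and route them to `onError` instead of letting them escape.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical callback interface (assumption), not LangChain4j's handler type.
interface TokenHandler {
    void onToken(String token);

    void onComplete(String fullText);

    void onError(Throwable error);
}

final class StreamingSketch {
    // Pushes tokens to the handler; any runtime failure while producing or
    // delivering a token is reported through onError rather than thrown.
    static void stream(Iterable<String> tokens, TokenHandler handler) {
        StringBuilder full = new StringBuilder();
        try {
            for (String token : tokens) {
                full.append(token);
                handler.onToken(token);
            }
        } catch (RuntimeException e) {
            handler.onError(e);
            return; // exactly one terminal callback: onError, not onComplete
        }
        handler.onComplete(full.toString());
    }
}

// Test double that records the order of callbacks.
final class RecordingHandler implements TokenHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void onToken(String token) {
        events.add("token:" + token);
    }

    @Override
    public void onComplete(String fullText) {
        events.add("complete:" + fullText);
    }

    @Override
    public void onError(Throwable error) {
        events.add("error:" + error.getMessage());
    }
}

final class StreamingSketchDemo {
    public static void main(String[] args) {
        RecordingHandler handler = new RecordingHandler();
        StreamingSketch.stream(Arrays.asList("Hel", "lo"), handler);
        System.out.println(handler.events);
    }
}
```

Keeping `onComplete` outside the `try` block means an exception thrown by the handler's own completion callback is not reported back to that same handler as an error; whether that is desirable depends on the framework's contract.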

Not yet mapped (documented in the module README)

- Tool calling (ToolSpecification ↔ jllama ToolDefinition), the main follow-up.
- response_format (JSON mode); multimodal user input is flattened to text.
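
As an illustration of what flattening multimodal input to text can look like, here is a minimal sketch. The `Part` type and the choice to drop non-text parts and join the rest with newlines are assumptions made for the example; the module README is the authority on the actual behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FlattenSketch {
    // Hypothetical content part: carries text, or null for a non-text payload such as an image.
    static final class Part {
        final String text;

        private Part(String text) {
            this.text = text;
        }

        static Part text(String value) {
            return new Part(value);
        }

        static Part image() {
            return new Part(null);
        }
    }

    // Keeps only the text parts, in order, joined by newlines (assumed policy).
    static String flatten(List<Part> parts) {
        List<String> texts = new ArrayList<>();
        for (Part part : parts) {
            if (part.text != null) {
                texts.add(part.text);
            }
        }
        return String.join("\n", texts);
    }

    public static void main(String[] args) {
        List<Part> message = Arrays.asList(Part.text("Describe this:"), Part.image(), Part.text("Be brief."));
        System.out.println(flatten(message));
    }
}
```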

Notes for review

- CI is not wired yet. The module isn't part of the reactor, so no existing job builds it; the
  tests above were run locally. Happy to add a small job (mvn -DskipTests install for the core →
  cd langchain4j-jllama && mvn test) if you'd like it gated in CI.
- Placement. It is deliberately standalone, so if you'd prefer this live as a separate repository
  rather than in-tree, that's a clean move; let me know your preference.
- REUSE: all sources carry SPDX headers; the README is registered in REUSE.toml.

Introduce a separate Maven artifact that adapts a java-llama.cpp LlamaModel to LangChain4j's model interfaces over JNI, with no HTTP hop:

- JllamaChatModel -> ChatModel
- JllamaStreamingChatModel -> StreamingChatModel (token streaming)
- JllamaEmbeddingModel -> EmbeddingModel
- JllamaScoringModel -> ScoringModel (rerank; scores aligned by input index)

The adapters borrow a caller-owned LlamaModel and never close it. The module depends on langchain4j-core 1.17.1, but the core net.ladenthin:llama binding gains no langchain4j dependency, so plain users never pull it transitively. It is kept as a sibling module (not part of the root reactor) so the native build and release pipeline stay untouched, and it targets Java 17 to match the langchain4j 1.x baseline.

The pure message/parameter/response transforms are unit-tested model-free; an end-to-end chat and streaming test self-skips when no GGUF is provided. The module README documents usage and the currently unmapped surfaces (tool calling, multimodal user input).

vaiju1981 · 2026-07-01T00:41:57Z

This PR is mainly so that I can integrate jllama as a native model within LangChain4j. It might end up as a separate project.

bernardladenthin · 2026-07-01T10:53:11Z

Hey @vaiju1981,

Give me a few minutes to integrate it properly; after that I may do some additional work. Thanks and all the best!

…Central publish

Cleans up the integration of the merged langchain4j adapters (PR bernardladenthin#284) so the module is built, gated, version-locked and releasable, without touching the native build/release pipeline.

- Rename artifact + directory langchain4j-jllama -> llama-langchain4j so it groups with the core net.ladenthin:llama family (Java package unchanged).
- Pin the core dependency to ${project.version} (drops the drift-prone jllama.version property); a CI guard fails the build if the module version ever diverges from the core version (standalone module can't inherit it from a reactor).
- Add per-artifact release plumbing (sources + javadoc + gpg + Central Publishing) mirroring the core release profile, so the module can deploy to Maven Central at the same version.
- publish.yml: new test-java-llama-langchain4j job (install core Java jar, version-lockstep guard, mvn verify, which builds the javadoc jar so a release-time javadoc break is caught in PR CI). publish-snapshot/publish-release now depend on it and deploy the module alongside the core.
- REUSE.toml + README updated to the new name; CLAUDE.md documents the module, why it is a separate artifact (not a classifier), and the CI/publish wiring.

Verified locally: core Java jar installs, module builds green (7 mapping tests pass, 2 model-backed integration tests self-skip), and the main/sources/javadoc jars all build under doclint=all.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Rt1paYztGJ2AKUuBuAGDXE

vaiju1981 requested a review from bernardladenthin as a code owner July 1, 2026 00:35

vaiju1981 temporarily deployed to startgate July 1, 2026 00:35 — with GitHub Actions Inactive

bernardladenthin merged commit b5ee309 into bernardladenthin:main Jul 1, 2026
40 of 44 checks passed

bernardladenthin mentioned this pull request Jul 1, 2026

feat: llama-langchain4j — rename, CI build/test + Central publish, model-backed tests, upfront model cache #285

Merged

5 tasks

bernardladenthin merged 1 commit into bernardladenthin:main from vaiju1981:langchain4j-jllama
