Add langchain4j-jllama module: in-process LangChain4j adapters #284

Merged
bernardladenthin merged 1 commit into bernardladenthin:main from vaiju1981:langchain4j-jllama
Jul 1, 2026

@vaiju1981

Summary

Adds a new sibling Maven module langchain4j-jllama that adapts a java-llama.cpp LlamaModel to
LangChain4j's model interfaces in-process over JNI — no HTTP hop, no separate llama-server:

| Adapter | LangChain4j interface | Backing call |
|---|---|---|
| JllamaChatModel | ChatModel | LlamaModel.chat(...) |
| JllamaStreamingChatModel | StreamingChatModel | LlamaModel.generateChat(...) (token streaming) |
| JllamaEmbeddingModel | EmbeddingModel | LlamaModel.embed(...) |
| JllamaScoringModel | ScoringModel (re-rank) | LlamaModel.handleRerank(...) |

Design decisions

  • Separate artifact, one-way dependency. The module depends on langchain4j-core:1.17.1, but the
    core net.ladenthin:llama binding gains no langchain4j dependency — plain java-llama.cpp users
    never pull langchain4j (or its Java 17 floor) transitively.
  • Sibling module, not in the root reactor. The native build/release pipeline is left completely
    untouched; the module builds independently against the published core jar. Targets Java 17
    (langchain4j 1.x baseline; the core stays Java 8).
  • Borrowed model lifecycle. Every adapter wraps a caller-owned LlamaModel and never loads or
    closes it. One model can back several adapters.
  • HTTP path is already covered. For users who already run the OpenAI-compatible
    OpenAiCompatServer, langchain4j's langchain4j-open-ai client works over HTTP with no
    adapter code. This module is for the in-process path (desktop / Android / embedded, no socket).
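
The borrowed-lifecycle rule above can be sketched in isolation. This is a minimal illustration, not the module's code: FakeModel, ChatAdapter and EmbeddingAdapter are hypothetical stand-ins invented here, and none of their methods are claimed to match the real LlamaModel or Jllama* signatures. The point is only that the adapters hold a reference, do not implement AutoCloseable, and leave closing to the caller.

```java
import java.util.Objects;

// Hypothetical stand-in for a natively backed model; NOT the real LlamaModel API.
final class FakeModel implements AutoCloseable {
    private boolean closed;

    String complete(String prompt) {
        ensureOpen();
        return "echo: " + prompt;
    }

    float[] embed(String text) {
        ensureOpen();
        return new float[] {text.length()};
    }

    boolean isClosed() {
        return closed;
    }

    private void ensureOpen() {
        if (closed) {
            throw new IllegalStateException("model already closed");
        }
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Hypothetical adapter: borrows the model and is deliberately not AutoCloseable.
final class ChatAdapter {
    private final FakeModel model;

    ChatAdapter(FakeModel model) {
        this.model = Objects.requireNonNull(model, "model");
    }

    String chat(String userMessage) {
        return model.complete(userMessage);
    }
}

// A second hypothetical adapter sharing the same borrowed model.
final class EmbeddingAdapter {
    private final FakeModel model;

    EmbeddingAdapter(FakeModel model) {
        this.model = Objects.requireNonNull(model, "model");
    }

    float[] embed(String text) {
        return model.embed(text);
    }
}

final class BorrowedLifecycleDemo {
    public static void main(String[] args) {
        // The caller owns the model: it opens it and closes it exactly once.
        try (FakeModel model = new FakeModel()) {
            ChatAdapter chat = new ChatAdapter(model);
            EmbeddingAdapter embedding = new EmbeddingAdapter(model);
            System.out.println(chat.chat("hi"));
            System.out.println(embedding.embed("hi").length);
        }
    }
}
```

With this shape, discarding an adapter has no effect on the model, and the try-with-resources block in the caller remains the single place where native resources are released.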

Testing

  • Pure message/parameter/response transforms are unit-tested model-free (LangChain4jMappingTest,
    7 tests): role mapping, multimodal→text flattening, sampling-parameter pass-through, finish-reason
    mapping, and rerank index alignment.
  • JllamaChatModelIntegrationTest runs a real chat + streaming round-trip and self-skips unless
    -Dnet.ladenthin.llama.model.path=... points at a GGUF (mirrors the existing model-gated tests).
  • mvn test: 7 run, 2 skipped, green.
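
As an illustration of what "rerank index alignment" might involve, here is a model-free sketch. It assumes the backend reports (document index, score) pairs in relevance order, which this PR does not state. Ranked and alignByInputIndex are hypothetical names introduced for this example and are not taken from the module.

```java
import java.util.Arrays;
import java.util.List;

final class RerankAlignment {
    // Hypothetical result shape: position of the document in the input list plus its score.
    record Ranked(int index, double score) {}

    // Returns scores ordered like the input documents, regardless of the order of 'ranked'.
    static double[] alignByInputIndex(List<Ranked> ranked, int documentCount) {
        double[] scores = new double[documentCount];
        boolean[] seen = new boolean[documentCount];
        for (Ranked r : ranked) {
            if (r.index() < 0 || r.index() >= documentCount) {
                throw new IllegalArgumentException("index out of range: " + r.index());
            }
            if (seen[r.index()]) {
                throw new IllegalArgumentException("duplicate index: " + r.index());
            }
            scores[r.index()] = r.score();
            seen[r.index()] = true;
        }
        for (int i = 0; i < documentCount; i++) {
            if (!seen[i]) {
                throw new IllegalArgumentException("missing score for document " + i);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        // Results arrive sorted by relevance; the caller wants them in input order.
        List<Ranked> byRelevance =
                List.of(new Ranked(2, 0.91), new Ranked(0, 0.40), new Ranked(1, 0.05));
        System.out.println(Arrays.toString(alignByInputIndex(byRelevance, 3)));
    }
}
```

Because the transform takes plain values, a test for it needs no GGUF file, which is the property the mapping tests rely on.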

Adapter contracts were verified against langchain4j 1.17.1 source: doChat is the correct override
point, the chat response reports the model's real finish reason (stop/length/tool_calls) and
token usage, and streaming reports failures via onError (the framework does not wrap doChat).
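
The onError point can be illustrated with a self-contained sketch. Everything here is a hypothetical stand-in: StreamHandler only loosely mirrors the shape of a streaming callback, TokenSource stands in for native token generation, and the hard-coded "stop" finish reason is a placeholder. The sketch assumes, as the paragraph above states, that nothing upstream catches exceptions thrown from doChat, so the adapter has to route them to the handler itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical callback interface; not the LangChain4j handler type.
interface StreamHandler {
    void onPartial(String token);

    void onComplete(String fullText, String finishReason);

    void onError(Throwable error);
}

// Hypothetical token producer standing in for native generation.
interface TokenSource {
    void generate(String prompt, Consumer<String> sink) throws Exception;
}

final class StreamingAdapterSketch {
    private final TokenSource source;

    StreamingAdapterSketch(TokenSource source) {
        this.source = source;
    }

    void doChat(String prompt, StreamHandler handler) {
        StringBuilder text = new StringBuilder();
        try {
            source.generate(prompt, token -> {
                text.append(token);
                handler.onPartial(token);
            });
        } catch (Exception e) {
            // Nothing above this method is assumed to catch, so report and stop here.
            handler.onError(e);
            return;
        }
        // Placeholder finish reason; a real adapter would take it from the model.
        handler.onComplete(text.toString(), "stop");
    }
}

// Records callback order so the behaviour can be inspected.
final class RecordingHandler implements StreamHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void onPartial(String token) {
        events.add("partial:" + token);
    }

    @Override
    public void onComplete(String fullText, String finishReason) {
        events.add("complete:" + fullText);
    }

    @Override
    public void onError(Throwable error) {
        events.add("error:" + error.getMessage());
    }
}

final class StreamingErrorDemo {
    public static void main(String[] args) {
        RecordingHandler handler = new RecordingHandler();
        TokenSource failing = (prompt, sink) -> {
            sink.accept("a");
            throw new IllegalStateException("boom");
        };
        new StreamingAdapterSketch(failing).doChat("hello", handler);
        System.out.println(handler.events);
    }
}
```

Calling onComplete outside the try block is a deliberate choice in this sketch: an exception thrown by the handler's own completion callback is then not reported back to the same handler as a generation error.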

Not yet mapped (documented in the module README)

  • Tool calling (ToolSpecification ↔ jllama ToolDefinition) — the main follow-up.
  • response_format (JSON mode); multimodal user input is flattened to text.

Notes for review

  • CI is not wired yet. The module isn't part of the reactor, so no existing job builds it; the
    tests above were run locally. Happy to add a small job (mvn -DskipTests install for the core →
    cd langchain4j-jllama && mvn test) if you'd like it gated in CI.
  • Placement. It is deliberately standalone, so if you'd prefer this live as a separate repository
    rather than in-tree, that's a clean move — let me know your preference.
  • REUSE: all sources carry SPDX headers; the README is registered in REUSE.toml.

Introduce a separate Maven artifact that adapts a java-llama.cpp LlamaModel
to LangChain4j's model interfaces over JNI, with no HTTP hop:

- JllamaChatModel          -> ChatModel
- JllamaStreamingChatModel -> StreamingChatModel (token streaming)
- JllamaEmbeddingModel     -> EmbeddingModel
- JllamaScoringModel       -> ScoringModel (rerank; scores aligned by input index)

The adapters borrow a caller-owned LlamaModel and never close it. The module
depends on langchain4j-core 1.17.1, but the core net.ladenthin:llama binding
gains no langchain4j dependency, so plain users never pull it transitively.

It is kept as a sibling module (not part of the root reactor) so the native
build and release pipeline stay untouched, and it targets Java 17 to match the
langchain4j 1.x baseline.

The pure message/parameter/response transforms are unit-tested model-free; an
end-to-end chat and streaming test self-skips when no GGUF is provided. The
module README documents usage and the currently unmapped surfaces (tool
calling, multimodal user input).
@vaiju1981

Author

This PR is just for me to integrate Jllama as a native model within LangChain4j. It might warrant a separate project.

@bernardladenthin

Owner

Hey @vaiju1981,

Give me a few minutes to integrate it properly; after that I may do some additional work. Thanks and best regards!

@bernardladenthin bernardladenthin merged commit b5ee309 into bernardladenthin:main Jul 1, 2026
40 of 44 checks passed
vaiju1981 pushed a commit to vaiju1981/java-llama.cpp that referenced this pull request Jul 1, 2026
…Central publish

Cleans up the integration of the merged langchain4j adapters (PR bernardladenthin#284) so the
module is built, gated, version-locked and releasable — without touching the
native build/release pipeline.

- Rename artifact + directory langchain4j-jllama -> llama-langchain4j so it
  groups with the core net.ladenthin:llama family (Java package unchanged).
- Pin the core dependency to ${project.version} (drops the drift-prone
  jllama.version property); a CI guard fails the build if the module version
  ever diverges from the core version (standalone module can't inherit it from
  a reactor).
- Add per-artifact release plumbing (sources + javadoc + gpg + Central
  Publishing) mirroring the core release profile, so the module can deploy to
  Maven Central at the same version.
- publish.yml: new test-java-llama-langchain4j job (install core Java jar,
  version-lockstep guard, mvn verify — builds the javadoc jar so a release-time
  javadoc break is caught in PR CI). publish-snapshot/publish-release now
  depend on it and deploy the module alongside the core.
- REUSE.toml + README updated to the new name; CLAUDE.md documents the module,
  why it is a separate artifact (not a classifier), and the CI/publish wiring.

Verified locally: core Java jar installs, module builds green (7 mapping tests
pass, 2 model-backed integration tests self-skip), and the main/sources/javadoc
jars all build under doclint=all.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Rt1paYztGJ2AKUuBuAGDXE