Claude/update b8913 compatibility eq8 n8 by bernardladenthin · Pull Request #95 · bernardladenthin/java-llama.cpp

bernardladenthin · 2026-04-24T08:03:04Z

No description provided.

No breaking API changes for this project in this range.

Changes applied:
- CMakeLists.txt: bump GIT_TAG to b8913
- README.md: bump version badge to b8913
- CLAUDE.md: update pinned version; document b8887–b8913 changes
- server.hpp: clamp n_discard to non-negative (mirrors server-task.cpp fix)

Other upstream changes handled automatically (server-chat.cpp, server-common.cpp included directly from llama.cpp source):
- convert_transcriptions_to_chatcmpl gained a tmpls parameter
- parallel_tool_calls now defaults to model capability
- normalize_anthropic_billing_header added for Claude Code prefix caching
- chat_template_kwargs passed through in Anthropic→OAI conversion
- common_chat_prompt_preset / common_chat_get_asr_prompt added (chat.h)
- string_starts_with(char) overload added (common.h)
- LLAMA_ROPE_TYPE_NONE added to mtmd rope-type switch

https://claude.ai/code/session_01JkSSBJ1A1k9tXh54m4oP15

- logit_bias object format: add else-if branch handling {"tok": bias} map format; previously silently dropped if client sent an object instead of an array
- ignore_eos EOG injection: inject -INF logit bias for every EOG token when ignore_eos=true, matching upstream behaviour (b8913 server-task.cpp)
- antiprompt fallback: apply server-configured stop sequences when the request omits the "stop" field; previously defaults.antiprompt was never initialised from params_base

https://claude.ai/code/session_01JkSSBJ1A1k9tXh54m4oP15
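The two logit_bias shapes described above can be sketched as follows. This is an illustration in Java, with plain collections standing in for the parsed JSON; the actual change lives in the C++ server code. The class name `LogitBiasSketch`, the method `parse`, and the `nVocab` bound are hypothetical. The sketch also assumes that object keys are numeric token ids in string form, which the commit message does not state.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the binding: accepts either
//   [[token, bias], ...]   (array format) or
//   {"token": bias, ...}   (object format)
// and returns a token -> bias map.
final class LogitBiasSketch {

    static Map<Integer, Float> parse(Object logitBias, int nVocab) {
        Map<Integer, Float> out = new LinkedHashMap<>();
        if (logitBias instanceof List) {
            // Array format: every element is a two-element [token, bias] pair.
            for (Object element : (List<?>) logitBias) {
                if (!(element instanceof List)) {
                    continue;
                }
                List<?> pair = (List<?>) element;
                if (pair.size() == 2
                        && pair.get(0) instanceof Number
                        && pair.get(1) instanceof Number) {
                    put(out, ((Number) pair.get(0)).intValue(),
                            ((Number) pair.get(1)).floatValue(), nVocab);
                }
            }
        } else if (logitBias instanceof Map) {
            // Object format: the branch that was previously missing.
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) logitBias).entrySet()) {
                if (!(entry.getValue() instanceof Number)) {
                    continue;
                }
                try {
                    int token = Integer.parseInt(String.valueOf(entry.getKey()));
                    put(out, token, ((Number) entry.getValue()).floatValue(), nVocab);
                } catch (NumberFormatException ignored) {
                    // Assumption of this sketch: non-numeric keys are skipped.
                }
            }
        }
        // Any other shape yields an empty map.
        return out;
    }

    // Keeps only token ids inside the vocabulary range [0, nVocab).
    private static void put(Map<Integer, Float> out, int token, float bias, int nVocab) {
        if (token >= 0 && token < nVocab) {
            out.put(token, bias);
        }
    }
}
```

Both `parse(Arrays.asList(Arrays.asList(15043, -1.5)), 32000)` and `parse(Collections.singletonMap("15043", -1.5), 32000)` produce the same single-entry map in this sketch, which is the behaviour the fix is meant to restore for clients that send an object.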

Port all remaining feature-parity items from upstream:

defaults:
- initialise defaults.n_predict and defaults.cache_prompt from params_base
- use defaults.cache_prompt instead of hardcoded true

basic params:
- add max_completion_tokens as OAI alias for max_tokens / n_predict
- add adaptive_target, adaptive_decay, backend_sampling sampling fields

speculative:
- seed speculative struct from defaults before per-field overrides
- add speculative.type via common_speculative_type_from_name
- add speculative.ngram_size_n/m and ngram_min_hits with clamping

grammar:
- inherit defaults.sampling.grammar before parsing request grammar
- add grammar_type == "tool_calls" -> COMMON_GRAMMAR_TYPE_TOOL_CALLS
- remove unnecessary inner block nesting

chat format:
- allow per-request reasoning_format override via common_reasoning_format_from_name
- sync params.sampling.generation_prompt from oaicompat_chat_syntax
- add chat_parser field via oaicompat_chat_syntax.parser.load

reasoning budget:
- add full reasoning_budget_tokens/start_tag/end_tag/message block

https://claude.ai/code/session_01JkSSBJ1A1k9tXh54m4oP15

…upstream

slot_params struct:
- add include_usage, return_progress, n_cmpl, n_cache_reuse fields matching task_params in upstream server-task.h
- fix typo 'mininum' -> 'minimum' in n_indent comment

slot_params::to_json:
- fix std::move on const ref in grammar_triggers loop (silently a copy; remove move)
- add speculative.type, speculative.ngram_size_n/m, speculative.ngram_m_hits, backend_sampling to JSON output

result_timings:
- add cache_n field and emit it first in to_json(), matching upstream

params_from_json_cmpl:
- add defaults.n_cache_reuse = params_base.n_cache_reuse
- add stream_options / include_usage parsing
- add return_progress, n_cmpl (with "n" alias), n_cache_reuse parsing
- align column spacing with upstream for readability
- merge split TODO comment onto one line
- add n_cmpl > n_parallel validation at end of function

https://claude.ai/code/session_01JkSSBJ1A1k9tXh54m4oP15
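The n_cmpl parsing and the closing validation can be illustrated with a small Java sketch. The real code is C++ in params_from_json_cmpl; the class name, method name, and exception type below are hypothetical. The default of 1 and the precedence of "n_cmpl" over "n" are assumptions of this sketch and are not stated in the commit message.

```java
import java.util.Map;

// Hypothetical illustration of resolving the completion count for a request.
final class CompletionCountSketch {

    static int resolveNCmpl(Map<String, Object> request, int nParallel) {
        // "n" is accepted as an alias; this sketch lets "n_cmpl" win when both are set.
        Object raw = request.containsKey("n_cmpl") ? request.get("n_cmpl") : request.get("n");
        // Assumed default when neither key is present or the value is not numeric.
        int nCmpl = (raw instanceof Number) ? ((Number) raw).intValue() : 1;

        // Validation: a request cannot ask for more completions than there are slots.
        if (nCmpl > nParallel) {
            throw new IllegalArgumentException(
                    "n_cmpl (" + nCmpl + ") cannot exceed n_parallel (" + nParallel + ")");
        }
        return nCmpl;
    }
}
```

Rejecting the request up front, rather than clamping silently, makes the mismatch between the client's expectation and the server's slot count visible to the caller.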

- Add n_prompt_tokens_cache to server_slot (field, reset, capture at prompt processing, propagate to result structs and get_timings)
- get_timings() now populates result_timings.cache_n from cache field
- Add include_usage to server_task_result_cmpl_final; wire from slot params so streaming usage chunk is only sent when requested
- Fix to_json_oaicompat_chat_stream(): usage now goes in a separate final chunk with empty choices per OpenAI spec, not in finish chunk
- Add usage_json_oaicompat() helper to server_task_result_cmpl_final; includes prompt_tokens_details.cached_tokens in all OAI responses
- n_prompt_tokens_cache also added to server_task_result_cmpl_partial

https://claude.ai/code/session_01JkSSBJ1A1k9tXh54m4oP15
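The streaming layout described above (a finish chunk, then an optional usage-only chunk with empty choices) can be sketched in Java with maps standing in for JSON objects. The class and method names are hypothetical, and only a subset of the real chunk fields is shown; the actual implementation is the C++ to_json_oaicompat_chat_stream() and usage_json_oaicompat().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the tail of an OpenAI-compatible chat stream.
final class StreamUsageSketch {

    // Usage object including the cached-token detail.
    static Map<String, Object> usage(int promptTokens, int cachedTokens, int completionTokens) {
        Map<String, Object> details = new LinkedHashMap<>();
        details.put("cached_tokens", cachedTokens);

        Map<String, Object> usage = new LinkedHashMap<>();
        usage.put("completion_tokens", completionTokens);
        usage.put("prompt_tokens", promptTokens);
        usage.put("total_tokens", promptTokens + completionTokens);
        usage.put("prompt_tokens_details", details);
        return usage;
    }

    // Returns the closing chunks of a stream: always the finish chunk,
    // plus a usage-only chunk when the client asked for it.
    static List<Map<String, Object>> finalChunks(String finishReason, boolean includeUsage,
                                                 int promptTokens, int cachedTokens,
                                                 int completionTokens) {
        List<Map<String, Object>> chunks = new ArrayList<>();

        Map<String, Object> choice = new LinkedHashMap<>();
        choice.put("index", 0);
        choice.put("delta", Collections.emptyMap());
        choice.put("finish_reason", finishReason);

        Map<String, Object> finish = new LinkedHashMap<>();
        finish.put("object", "chat.completion.chunk");
        finish.put("choices", Collections.singletonList(choice));
        chunks.add(finish); // no "usage" key on the finish chunk

        if (includeUsage) {
            Map<String, Object> usageChunk = new LinkedHashMap<>();
            usageChunk.put("object", "chat.completion.chunk");
            usageChunk.put("choices", Collections.emptyList()); // empty choices
            usageChunk.put("usage", usage(promptTokens, cachedTokens, completionTokens));
            chunks.add(usageChunk);
        }
        return chunks;
    }
}
```

With `includeUsage` false the stream ends after the finish chunk, which matches the intent that the usage chunk is sent only when the request sets stream_options.include_usage.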

- Add result_prompt_progress struct (total, cache, processed, time_ms) with to_json() matching upstream server-task.h
- Add is_progress and progress fields to server_task_result_cmpl_partial
- Update to_json_non_oaicompat(), to_json_oaicompat(), and to_json_oaicompat_chat() in partial result to emit prompt_progress when is_progress is true
- send_partial_response() gains is_progress parameter (default false); when true, populates progress fields and leaves content/tokens empty
- Emit progress in the decode loop for SLOT_STATE_PROCESSING_PROMPT and SLOT_STATE_DONE_PROMPT slots when stream and return_progress are set

https://claude.ai/code/session_01JkSSBJ1A1k9tXh54m4oP15
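A rough Java sketch of the progress-reporting rule above: a progress chunk is produced only while the slot is in a prompt phase and the request set both stream and return_progress. The enum is a simplified, hypothetical stand-in for the C++ slot states (IDLE and GENERATING are placeholders), and the "content" key on the chunk is an assumption about the non-OAI partial-result shape.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of when and how prompt progress is reported.
final class PromptProgressSketch {

    // Simplified stand-in for the server slot states.
    enum SlotState { IDLE, PROCESSING_PROMPT, DONE_PROMPT, GENERATING }

    static boolean shouldEmitProgress(SlotState state, boolean stream, boolean returnProgress) {
        boolean inPromptPhase =
                state == SlotState.PROCESSING_PROMPT || state == SlotState.DONE_PROMPT;
        return stream && returnProgress && inPromptPhase;
    }

    // Partial result carrying progress only: content stays empty.
    static Map<String, Object> progressChunk(int total, int cache, int processed, long timeMs) {
        Map<String, Object> progress = new LinkedHashMap<>();
        progress.put("total", total);
        progress.put("cache", cache);
        progress.put("processed", processed);
        progress.put("time_ms", timeMs);

        Map<String, Object> chunk = new LinkedHashMap<>();
        chunk.put("content", "");
        chunk.put("prompt_progress", progress);
        return chunk;
    }
}
```

A client can compare `processed` against `total` to show how far prompt processing has advanced before the first generated token arrives.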

)

* Enrich open-issues baseline with current-fork status

  Appends a Status in fork subsection to each of the 37 upstream issues with a verdict, file:line evidence, and next steps; adds a Status overview table summarising verdicts across all issues.

* Add deep-dive analysis for likely/partially fixed issues

  Appends a per-issue Deep-dive analysis block to each of the 9 LIKELY FIXED / PARTIALLY FIXED entries, and adds a top-level Deep-dive verdict guide categorising which issues are confirmable from code inspection, which need one targeted JUnit test, and which genuinely require platform-specific runtime reproduction. Updates the Status overview table for #121 (FIXED for 64-bit Android) and #86 (CUDA jar requires libcudart at runtime, not auto-fallback).

* Add verification plan with original-issue research and test sketches

  Fetched verbatim text of the LIKELY FIXED / PARTIALLY FIXED issues from github.com/kherud/java-llama.cpp and appended a Verification plan section with:
  (a) a table of new info extracted from each issue body,
  (b) four concrete JUnit test sketches that would close out #80, #95, #98, #102,
  (c) a non-unit-testable bucket for #34, #50, #86, #103, #121 with the corresponding action (feature, docs, CI matrix),
  (d) a recommended PR sequencing.

  Notable finding: #98's original repro did not call enableEmbedding() at all. The binding never forwarded --embedding to the upstream server-context, so the result_output assertion fired because the embedding pipeline was never initialised. enableEmbedding() now exists in ModelParameters (line 1040), so the fix is essentially code-confirmed; an integration test against nomic-embed-text is optional confirmation.

---------

Co-authored-by: Claude <noreply@anthropic.com>

* test: add JUnit regressions for kherud open issues #80, #95, #98, #102

  Adds four small JUnit tests proposed in the verification plan section of docs/history/49be664_open_issues.md to upgrade the corresponding upstream issues from LIKELY FIXED to FIXED:
  - MemoryManagementTest#testOpenCloseLoopDoesNotLeak (#102): 20-iteration open/close loop; on Linux asserts VmRSS delta < 200 MB. Degenerates to a no-crash smoke test on non-Linux hosts where /proc/self/status is absent.
  - MemoryManagementTest#testOpenCloseWithoutGeneration (#80): 20 open + immediate close without any generation, exercises the half-initialised worker race closed by the double server.terminate() in jllama.cpp.
  - LlamaModelTest#testIteratorTerminatesOnRepetitivePrompt (#95): asserts the iterator terminates within nPredict+1 steps on a deliberately repetitive prompt.
  - LlamaEmbeddingsTest#testNomicEmbedLoads (#98): gated on system property net.ladenthin.llama.nomic.path; reproduces the reporter's batch/ubatch config plus the fix (enableEmbedding()), and asserts a 768-dim vector for nomic-embed-text-v1.5.

  Wires up the optional nomic GGUF download in the linux-x86_64 Java test job in .github/workflows/publish.yml. Other test jobs cleanly self-skip via Assume because the system property is unset.

  Documents the local native-build workflow in CLAUDE.md: per-host output paths, mvn-cmake handoff, optional model handling, and the restricted-network caveat for environments that block huggingface.co.

  https://claude.ai/code/session_01LR7Gw1pyKS7wvxXfZjnxNW

* docs: record #80/#95/#98/#102 regression tests added in 713d426

  Updates docs/history/49be664_open_issues.md to reflect that the four JUnit regression tests called for in the verification plan have been added on this branch:
  - Deep-dive verdict guide now lists each test name and self-skip behaviour next to its issue bullet
  - Per-issue Status blocks for #80, #95, #98, #102 annotated as "LIKELY FIXED -> FIXED on CI green" with the covering test
  - Status overview table rows for the same four issues updated
  - "What the original issues actually contain" feasibility table marks all four as DONE with the commit reference
  - "Concrete test plan" gains a status callout noting the as-shipped implementation matches the sketches
  - "Recommended sequencing" step 1 marked DONE and enumerates what shipped; remaining steps (#86 docs, #103/#34 typed image API, Android emulator CI) carried forward as the next deliverables

  No code or behaviour change, documentation only.

  https://claude.ai/code/session_01LR7Gw1pyKS7wvxXfZjnxNW

---------

Co-authored-by: Claude <noreply@anthropic.com>
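The VmRSS check used by testOpenCloseLoopDoesNotLeak could look roughly like the helper below. This is a sketch only, not the shipped test code: the class and method names are hypothetical, and treating 200 MB as 200 × 1024 kB is an assumption. It reads /proc/self/status on Linux and reports "no measurement" elsewhere, which is what turns the test into a smoke test on other hosts.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper for measuring resident set size around an open/close loop.
final class RssProbeSketch {

    // Assumed budget: 200 MB expressed in kB, the unit used by /proc/self/status.
    static final long LIMIT_KB = 200L * 1024L;

    // Extracts the VmRSS value in kB from the lines of /proc/self/status,
    // or returns -1 when no VmRSS line is present.
    static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                String[] parts = line.substring("VmRSS:".length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1L;
    }

    // Returns the current VmRSS in kB, or -1 where /proc/self/status is unavailable.
    static long currentVmRssKb() {
        Path status = Paths.get("/proc/self/status");
        if (!Files.isReadable(status)) {
            return -1L;
        }
        try {
            return parseVmRssKb(Files.readAllLines(status));
        } catch (IOException e) {
            return -1L;
        }
    }

    // True when the growth stays under the budget, or when nothing could be measured.
    static boolean withinBudget(long beforeKb, long afterKb) {
        if (beforeKb < 0 || afterKb < 0) {
            return true; // no measurement available: behaves as a smoke test
        }
        return afterKb - beforeKb < LIMIT_KB;
    }
}
```

A test would call `currentVmRssKb()` before and after the 20 open/close iterations and assert `withinBudget(before, after)`.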

* docs: mark #80/#95/#98/#102 as FIXED now that PR #185 is merged

  PR #185 (commit cba693c) merged the four regression tests sketched in the 49be664 open-issues verification plan. Update the per-issue blocks, the status overview table, the top-level deep-dive verdict guide, and the recommended-sequencing section to reflect that #80, #95, #98 and #102 are now FIXED (no longer "LIKELY FIXED → FIXED on CI green").

  https://claude.ai/code/session_01R3jVWHsB3zymwAQtj8GT43

* docs: add README "Choosing the right classifier" section

  Closes the documentation gap for issue #86 (does the CUDA jar fall back to CPU?) and the 32-bit Android tail of #121 (armeabi-v7a not published). The new section enumerates the three published classifiers (default CPU, cuda13-linux-x86-64, opencl-android-aarch64), their backends, target platforms, and runtime requirements. It explicitly states that the CUDA JAR is CUDA-only at runtime (it dlopens libcudart.so.13/libcublas.so.13 and has no automatic CPU fallback) and that Android armeabi-v7a is not shipped as a released artifact.

  Updates docs/history/49be664_open_issues.md to mark #86 as FIXED-AS-DOCUMENTED and #121 as FIXED (64-bit) with the 32-bit limitation now documented.

  https://claude.ai/code/session_01R3jVWHsB3zymwAQtj8GT43

---------

Co-authored-by: Claude <noreply@anthropic.com>

claude added 6 commits April 24, 2026 07:11

bernardladenthin merged commit 22e7a38 into master Apr 24, 2026
16 checks passed

bernardladenthin deleted the claude/update-b8913-compatibility-Eq8N8 branch April 24, 2026 11:11

bernardladenthin mentioned this pull request May 22, 2026

docs: add deep-dive analysis and verification plan for open issues #184

Merged

6 tasks

bernardladenthin mentioned this pull request May 22, 2026

Add regression tests for issues #80, #95, #98, #102 #185

Merged

7 tasks

bernardladenthin mentioned this pull request May 22, 2026

Document classifier selection and mark 4 issues FIXED via PR #185 #186

Merged

5 tasks
