Skip to content

Add per-session thread-safety and streaming guard to Session#189

Merged
bernardladenthin merged 16 commits into
mainfrom
claude/stoic-archimedes-brMrb
May 24, 2026
Merged

Add per-session thread-safety and streaming guard to Session#189
bernardladenthin merged 16 commits into
mainfrom
claude/stoic-archimedes-brMrb

Conversation

@bernardladenthin
Copy link
Copy Markdown
Owner

Summary

  • Implement per-session thread-safety by serializing all public Session methods on a private intrinsic lock, ensuring concurrent send(String) calls from multiple threads produce a well-formed transcript with strict user/assistant alternation.
  • Add a "streaming in progress" guard flag that causes send(String), a second stream(String), save(String), and restore(String) to fail-fast with IllegalStateException until commitStreamedReply(String) clears it.
  • Add comprehensive test coverage via SessionConcurrencyTest with four test cases: concurrent sends, stream guard enforcement, commit-without-stream validation, and sequential send sanity check.

Test plan

  • SessionConcurrencyTest added with 4 test cases covering concurrent sends, stream guard behavior, and error conditions
  • All public Session methods now synchronized; existing single-threaded usage remains compatible
  • CI should pass with new tests included

Related issues / PRs

Closes the per-session locking follow-up from PR #188 (§2.6 of the llama-stack-client-kotlin investigation).

Checklist

  • I have read CONTRIBUTING.md and CODE_OF_CONDUCT.md
  • My commits follow Conventional Commits
  • No security-sensitive changes

https://claude.ai/code/session_01FLFea4iGrQU1VutGtLspWW

Every public Session method now serializes on a private intrinsic lock,
and stream(...) sets a streaming-in-progress guard that causes send,
a second stream, save, and restore to fail-fast with IllegalStateException
until commitStreamedReply(...) clears it. send() and stream() roll back the
just-appended user turn if the underlying model call throws, so the
transcript stays in a consistent user/assistant-paired state.

Covered by new SessionConcurrencyTest: concurrent send() alternation,
stream guard, commit-without-stream guard, and sequential sanity check.
The previous ctxSize=512 was too tight for the concurrent-send test:
4 threads x 3 calls accumulate into a single Session transcript and
each send() resubmits the full history, so by ~call 10 the prompt
exceeds 512 tokens and the native side throws LlamaException. Bumping
to 4096 (well under the 16384 trained ctx) keeps the whole 12-turn
transcript well within bounds.
PR #188 added loadModelWithProgress in Java (LlamaModel.java:380) and a
matching JNIEXPORT definition in jllama.cpp (line 747), but forgot to
add the forward declaration to the legacy hand-rolled jllama.h. The
generated header net_ladenthin_llama_LlamaModel.h has it, but jllama.cpp
only #includes jllama.h. Without the prior 'extern "C"' declaration in
scope, the C++ compiler treats the function definition as a normal C++
function and produces a mangled symbol:

  _Z57Java_net_ladenthin_llama_LlamaModel_loadModelWithProgress...

while the JVM looks up the unmangled name and throws

  UnsatisfiedLinkError: 'void net.ladenthin.llama.LlamaModel
    .loadModelWithProgress(java.lang.String[],
                           net.ladenthin.llama.LoadProgressCallback)'

at runtime. Adding the declaration to jllama.h restores C linkage. The
plain loadModel symbol was unaffected because its declaration was
already present.

Verified locally with nm -D libjllama.so:
  before: _Z57Java_..._loadModelWithProgress... (mangled)
  after:  Java_..._loadModelWithProgress (plain C)

This unblocks LoadProgressCallbackTest on all platforms.
… M follow-up)

CancellationToken.cancel() now posts a SERVER_TASK_TYPE_CANCEL message
directly to the upstream server_queue whenever the token is bound to a
live native task, so blocking complete() returns sub-token instead of
waiting for the next token boundary.

Design notes
- New JNI method Java_net_ladenthin_llama_LlamaModel_queueCancel calls
  server_response_reader::stop() (which itself enqueues the CANCEL on
  the mutex-locked server_queue) and DOES NOT erase the reader from
  jctx->readers. The reader stays alive so a concurrent rd->next() on
  the inference thread holds a still-valid pointer; it returns naturally
  once the worker processes the cancel and the slot is released.
- The reader is removed from the map by the normal stop-result path in
  receiveCompletionJson, and its destructor calls stop() again (no-op
  due to the internal cancelled guard).
- The previous mid-token attempt erased the unique_ptr from the readers
  map on the cancelling thread, causing a use-after-free against the
  in-flight rd->next() on the inference thread and a std::system_error
  JVM abort in CI. The new design never touches the unique_ptr from
  the cancelling thread, closing that race.
- CancellationToken keeps a (LlamaModel, taskId) registration that
  complete() installs before the receive loop and clears in finally.
  If cancel() runs before register() lands, the cooperative volatile
  flag still aborts the loop, which then posts queueCancel from the
  inference thread itself.
- Forward declaration of queueCancel added to jllama.h to ensure C
  linkage on symbol export (same lesson as 36d8862).
The original §2.1 scope assumed new JNI plumbing was needed for image
input. On inspection the upstream OAI chat path already handles
{"type":"image_url","image_url":{"url":"data:..."}} blocks and
routes them through the compiled-in mtmd pipeline, and our
handleChatCompletions JNI method forwards the request JSON intact. The
only gap was a typed Java surface to emit the multipart-array form of
the message content.

New API
- ContentPart value type (TEXT / IMAGE_URL) with factories text(),
  imageUrl(), imageBytes(bytes, mime), imageFile(Path). imageFile
  detects png/jpeg/webp/gif from the file extension and produces a
  base64 data URI.
- ChatMessage(String role, List<ContentPart> parts) constructor plus
  ChatMessage.userMultimodal(ContentPart...) factory. getParts() returns
  an unmodifiable list (null for legacy text-only messages); hasParts()
  is the boolean accessor. getContent() concatenates text parts so
  legacy readers stay functional.
- InferenceParameters.setMessages(List<ChatMessage>) overload routing
  through ParameterJsonSerializer.buildMessages(List<ChatMessage>) which
  emits array-form content only when a message has parts.

Backwards compat
- Existing ChatMessage(role, content) constructor unchanged. Existing
  setMessages(String, List<Pair>) unchanged. Existing serializer
  buildMessages(String, List<Pair>) unchanged. All 123 prior tests in
  ChatMessageTest / ParameterJsonSerializerTest / InferenceParametersTest
  still pass.

Tests
- ContentPartTest (14): factory contracts, MIME detection, data URI shape.
- MultimodalMessagesTest (11): hasParts/getParts split, concatText
  behaviour, OAI array-form serialisation, mixed legacy+parts message
  lists, and end-to-end round-trip into the InferenceParameters JSON.
All 25 new tests run without a model (pure JSON shape verification).

Zero new JNI symbols. Mmproj wiring (ModelParameters.setMmproj) already
shipped previously and is untouched.
Adds vision-capable model + matching mmproj + a CC0/PD test image to all
four Java test jobs (Linux x86_64, macOS arm64 with/without Metal,
Windows x86_64) and a model-gated MultimodalIntegrationTest that proves
the typed ChatMessage(role, List<ContentPart>) surface from PR #189
round-trips through the upstream mtmd pipeline end-to-end.

CI changes (.github/workflows/publish.yml)
- New env vars: VISION_MODEL_URL / VISION_MODEL_NAME pointing at
  ggml-org/SmolVLM-500M-Instruct-Q8_0.gguf (smallest reliable vision
  GGUF on community ggml-org), VISION_MMPROJ_URL / _NAME for the
  matching mmproj, VISION_IMAGE_URL / _NAME for a small PD red-apple
  image from Wikimedia Commons.
- Each of the four Java test jobs gains three download steps and three
  -D system properties on the mvn test invocation:
  -Dnet.ladenthin.llama.vision.model / .mmproj / .image.

Validation scripts
- validate-models.sh refactored into validate_gguf() + validate_image()
  helpers with a 'required' vs 'optional' mode. Required models still
  fail-fast; the new vision GGUFs and PD image are validated only when
  present so jobs that skip them keep passing.
- validate-models.bat extended with a parallel OPTIONAL_MODELS loop.

Test (src/test/java/.../MultimodalIntegrationTest.java)
- Self-skips via Assume when any of the three -D paths is unset or its
  file is missing, so local mvn test stays green without the artifacts.
- multimodalRequestProducesNonEmptyReply: builds a ChatMessage.userMultimodal
  with ContentPart.text(...) + ContentPart.imageFile(Paths.get(image)),
  calls chatCompleteText, asserts non-empty reply. Does NOT assert reply
  semantics &#x2014; a 500M model can caption inaccurately and CI must
  not flap on model quality.
- multimodalThenTextOnSameModel: sanity check that a multimodal call
  followed by a text-only call on the same model both succeed (catches
  any parts/legacy split poisoning the inference context).

TestConstants gains PROP_VISION_MODEL_PATH / PROP_VISION_MMPROJ_PATH /
PROP_VISION_IMAGE_PATH so the test reads the system properties via the
same naming pattern as PROP_NOMIC_MODEL_PATH.

Docs
- docs/history/49be664_open_issues.md: #103 and #34 PARTIALLY FIXED ->
  FIXED in the per-issue blocks, the verdict guide, the status overview
  table, the deep-dive table, the cannot-be-closed-by-unit-tests-alone
  table, and the recommended-sequencing list. Bottom-line summary
  updated to reflect that 0 of the original LIKELY/PARTIALLY FIXED items
  remain partially fixed.
- (docs/feature-investigation-llama-stack-client-kotlin.md §2.1 was
  already updated in the PR-189 typed-multimodal-surface commit.)

Verified locally
- mvn test-compile: clean.
- mvn test -Dtest=MultimodalIntegrationTest: SKIPPED (no -D properties
  set; expected self-skip path).
- mvn javadoc:jar: BUILD SUCCESS.
The previous default (Red_Apple.jpg) was CC-BY 2.0, not CC0 / public
domain as the surrounding comments claimed - it requires attribution
that the test scaffold did not supply.

Replace with File:20200601_135745_Flowers_and_Bees.jpg from Wikimedia
Commons by Bernard Ladenthin (the project copyright holder), which the
same author additionally grants under MIT for use in this project.
Fetch via Special:FilePath?width=640 so CI pulls a ~80 KB thumbnail
instead of the multi-MB original phone photo.

Updates
- publish.yml VISION_IMAGE_URL / _NAME swapped; the comment block above
  the env vars now records the original Commons file, the author, the
  Commons license (CC-BY-4.0), and the MIT grant to this project.
- All four 'Download CC0 / public-domain test image' step names
  renamed to 'Download MIT-granted test image (by project author, see
  env block)'.
- validate-models.sh OPTIONAL_IMAGES entry updated to the new filename.
- MultimodalIntegrationTest javadoc adds an Image source section
  recording the Commons URL + author + CC-BY-4.0 / MIT dual grant.
- docs/history/49be664_open_issues.md recommended-sequencing entry no
  longer calls the image CC0 / PD.

Verified
- mvn test -Dtest='ContentPartTest,MultimodalMessagesTest,MultimodalIntegrationTest'
  BUILD SUCCESS: 25 unit tests pass, integration test self-skips
  (no -D system properties locally).
The §2.10 commit (99d1c89) routed CancellationToken.cancel() through
server_response_reader::stop(), which also calls
queue_results.remove_waiting_task_ids(id_tasks). When called from a
thread other than the inference thread (the whole point of immediate
cancel), this races fatally with the inference thread blocked inside
rd->next() -> queue_results.recv_with_timeout(id_tasks, 1s):

  1. cancel() (thread B) removes the task id from waiting_task_ids.
  2. Worker processes the queued CANCEL, releases the slot, posts the
     slot's final stop result via server_response::send().
  3. send() iterates waiting_task_ids to decide whether to enqueue. The
     id is gone, so the result is silently dropped and no notify_all
     fires (server-queue.cpp:319-332).
  4. recv_with_timeout keeps timing out every 1 s. next()'s should_stop
     is hard-coded false, so it loops forever.
  5. The JVM is hung inside receiveCompletionJson. CI surefire never
     returns (observed on Ubuntu and Windows).

The HTTP server path can call stop() safely only because by the time it
runs the HTTP handler has already returned and nothing is recv-ing on
those task ids. Our cancel is asynchronous and breaks that assumption.

Fix
- jllama.cpp Java_net_ladenthin_llama_LlamaModel_queueCancel: post the
  SERVER_TASK_TYPE_CANCEL task directly through the reader's public
  queue_tasks reference. Do NOT call reader->stop(). The waiting task
  id stays in queue_results, so the slot's stop result reaches send(),
  is enqueued, notify_all fires, recv wakes up, next() returns the
  stop result, and the Java receive loop exits naturally on
  out.stop == true.
- LlamaModel.complete(params, token): the cooperative-branch
  queueCancel must NOT break the loop. Post the cancel once (guarded
  by a cancelPosted flag) and keep calling receiveCompletionJson until
  the natural stop result arrives. Breaking immediately would orphan
  the reader in jctx->readers (no one would consume the stop result
  and erase it from the map until LlamaModel.close()).
- CancellationToken javadoc updated to record the post-without-stop
  invariant and the CI-hang regression that motivated it.

Verified locally
- cmake --build build --target jllama: BUILD SUCCESS.
- nm -D libjllama.so shows Java_net_ladenthin_llama_LlamaModel_queueCancel
  exported with plain C linkage (no _Z mangling regression).
- mvn surefire:test on the affected unit suites
  (CancellationTokenTest, ContentPartTest, MultimodalMessagesTest,
  SessionConcurrencyTest): 32 tests pass, 1 skipped (no model).
- mvn javadoc:jar: BUILD SUCCESS.
Wikimedia blocked the CI runner UA on Special:FilePath (403), so the
download step was unreliable. Better to commit the file once and let
mvn test point at the path directly.

Image
- src/test/resources/images/test-image.jpg: a synthetic 256x256 JPEG
  (sky / field / sun / house shapes, ~2.5 KB) generated by Pillow as a
  PLACEHOLDER. The intended replacement is the Wikimedia Commons photo
  File:20200601_135745_Flowers_and_Bees.jpg by Bernard Ladenthin
  (CC-BY-4.0 on Commons; MIT-granted to this project by the same
  author). Drop the photo here as test-image.jpg whenever convenient
  to overwrite; the test will pick it up automatically. See the README
  next to the file for full provenance + override instructions.
- src/test/resources/images/README.md: provenance, license, and
  override instructions.

CI (.github/workflows/publish.yml)
- Removed VISION_IMAGE_URL / VISION_IMAGE_NAME env vars and the four
  'Download MIT-granted test image' steps (one per Java test job).
- New VISION_IMAGE_PATH env var pointing at the committed file.
- All four 'Run tests' steps now pass
  -Dnet.ladenthin.llama.vision.image=${VISION_IMAGE_PATH} instead of
  the previous models/<NAME> path.

Validation (.github/validate-models.sh)
- Removed the validate_image() helper and the OPTIONAL_IMAGES loop;
  the committed JPEG is trusted as-is and not re-validated by the
  CI pre-flight.

Test (src/test/java/.../MultimodalIntegrationTest.java)
- Now falls back to TestConstants.DEFAULT_VISION_IMAGE_PATH when
  -Dnet.ladenthin.llama.vision.image is unset, so the test runs out
  of the box on a fresh checkout (still self-skips when no vision
  GGUF + mmproj is staged).
- TestConstants gains DEFAULT_VISION_IMAGE_PATH and revised javadoc
  on PROP_VISION_IMAGE_PATH.

Verified locally
- mvn test-compile: clean.
- mvn surefire:test on the four affected suites: 32 tests pass, 1
  skipped (no vision model staged locally; the image fallback is
  exercised in setup before the skip fires).
- mvn javadoc:jar: BUILD SUCCESS.
src/test/resources/images/test-image.jpg is now a 960x720 JPEG (~104 KB)
copy of File:20200601_135745_Flowers_and_Bees.jpg from Wikimedia Commons
by Bernard Ladenthin (project copyright holder), captured on an Olympus
digital camera. EXIF copyright tag confirms authorship.

Licensing
- Commons page lists the photo under CC-BY-4.0.
- The same author additionally grants this file under the project's MIT
  license, so no runtime attribution machinery is needed.

Source bytes were fetched from a gist hosted on github.com (the only
host on the environment's network allowlist that the user could stage
the photo through after the earlier ladenthin.net and Wikimedia hits
both 403'd in this sandbox). The gist HTML page embeds the file content
as base64 in <td class="blob-code js-file-line"> cells; decoded to the
real bytes (sha256
19f0e2d7f19b6558900e6fdf2646fbebcc6051bbdb707d8ea569987fd7e84f92).

README in the same directory updated to describe provenance, license,
and override path. The synthetic placeholder commit message from
c6884ab still records the fallback design that produced the previous
file.
The previous CI failure on Windows uploaded only the
*.dumpstream file (surefire's view), leaving us no way to see
the JVM crash log, the native callstack, or which @test method
was running when the abort fired. Linux had similar gaps. Fix
both the JVM-side capture and the upload globs across all four
test jobs.

JVM side (pom.xml)
- Add an explicit maven-surefire-plugin 3.5.4 entry (the same
  version Maven was auto-selecting in CI, pinned now so we do
  not drift independently of plugin-defaults bumps).
- argLine: '@{argLine} -XX:ErrorFile=hs_err_pid%p.log
  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=.' .
  @{argLine} interpolation preserves the jacoco runtime agent
  set by jacoco-maven-plugin:prepare-agent (must not break
  coverage); ErrorFile is workspace-relative so the JVM crash
  log lands next to the basedir, where the CI upload step
  already globs hs_err_pid*.log.
- redirectTestOutputToFile=true so each test class's
  stdout/stderr (including the upstream server's I srv ...
  log lines, which are the only signal of which @test was
  last running before a crash) is captured to
  target/surefire-reports/<class>-output.txt.

CI side (.github/workflows/publish.yml)
- All four test jobs (linux-x86_64, macos-14-metal,
  macos-15-metal, macos-15-no-metal) expand their failure
  upload-artifact path globs to:
    hs_err_pid*.log
    *.hprof                          (heap dumps on OOM)
    core.* (linux only — already)
    target/surefire-reports/*.dump
    target/surefire-reports/*.dumpstream
    target/surefire-reports/*.txt    (per-class summary + the
                                       new -output.txt capture)
    target/surefire-reports/TEST-*.xml
- Windows test job gains an 'Enable WER LocalDumps for
  java.exe' step BEFORE 'Run tests'. Registers
  HKLM:\SOFTWARE\Microsoft\Windows\Windows Error Reporting
  \LocalDumps\java.exe with DumpFolder=workspace\dumps,
  DumpType=2 (MiniDumpWithFullMemory), DumpCount=5. This is
  the Windows analogue of Linux core dumps; without it a
  __fastfail / abort from the JNI layer leaves no native
  callstack at all. Windows upload path globs add
  dumps\*.dmp, *.hprof, *.txt, TEST-*.xml.

Verified
- mvn test -Dtest='ContentPartTest,MultimodalMessagesTest,
  CancellationTokenTest': 31 tests pass with the new argLine
  + redirectTestOutputToFile; jacoco coverage report still
  generated (52 classes analysed). @{argLine} requires the
  full 'mvn test' phase (jacoco prepare-agent runs first);
  'mvn surefire:test' direct invocation breaks because @{...}
  is a phase-only late binding — CI uses the full phase, so
  unaffected.
The two queueCancel attempts (99d1c89 and the 705f980 'fix') both
broke CI:

  1. The first attempt erased the reader unique_ptr from
     jctx->readers on the cancelling thread, racing the inference
     thread's borrowed raw 'rd' pointer inside next(). Observed as
     std::system_error JVM aborts (exit 134) on Linux.
  2. The second attempt left the reader alive and posted CANCEL
     through reader->queue_tasks.post(), expecting the slot's
     release to emit a natural stop result on queue_results.
     Inspection of upstream
     build/_deps/llama.cpp-src/tools/server/server-context.cpp:356-376
     showed server_slot::release() only sets
     state = SLOT_STATE_IDLE and calls reset() — it does NOT post
     any result. The inference thread therefore blocked forever
     inside recv_with_timeout(id_tasks, 1 s) with should_stop
     hard-coded to false. CI hung on LlamaModelTest until the
     workflow was cancelled (the artifact from that run contains
     no LlamaModelTest.txt because surefire writes the per-class
     summary only when the class finishes).

The cooperative path that shipped in PR #188 (ad66e3a + e3b9043)
already achieves one-token-interval cancel latency (~150 ms), which
matches the sub-token goal in practice. A genuine sub-token cancel
would require an upstream change to SERVER_TASK_TYPE_CANCEL's slot
handler so it emits a stop result on queue_results, or a should_stop
predicate plumbed into receiveCompletionJson — the latter has worse
latency than the cooperative path (>= 1 s polling interval).

Revert
- jllama.cpp: remove
  Java_net_ladenthin_llama_LlamaModel_queueCancel; cancelCompletion
  is the only cancel JNI entry point again.
- jllama.h: remove the queueCancel forward declaration.
- LlamaModel.java: remove the 'native void queueCancel(int)'
  declaration; restore complete(params, token) to the original
  cooperative loop that calls cancelCompletion(taskId) and breaks
  on isCancelled.
- CancellationToken.java: restore to just the volatile cancelled
  flag (no registeredModel / registeredTaskId fields, no register /
  unregister methods). Class javadoc records the two failed
  follow-up attempts so the next person does not re-walk this path.

Doc
- §2.10 in docs/feature-investigation-llama-stack-client-kotlin.md
  flipped from SHIPPED back to 'SHIPPED — cooperative only', with a
  detailed postmortem of both failed attempts including the exact
  upstream source lines that demonstrate why the design cannot work
  without upstream changes.

Verified
- cmake --build build --target jllama: BUILD SUCCESS. nm -D shows
  only Java_..._cancelCompletion; queueCancel symbol is gone.
- mvn test -Dtest='CancellationTokenTest,ContentPartTest,
  MultimodalMessagesTest,SessionConcurrencyTest': BUILD SUCCESS,
  31 tests pass.
- mvn javadoc:jar: BUILD SUCCESS.
mac15-no-metal failed testConcurrentSendProducesAlternatingTranscript
at 119.8 s, 200 ms shy of the 120 s @test budget. Not a deadlock —
arithmetic. The synchronized(lock) in Session serializes the
4 threads * 3 calls = 12 sequential send() invocations, and each call
resends the accumulating transcript. By call 12 the prompt is ~360
tokens; at ~150 ms / token prompt-eval on CPU-only macOS Apple Silicon
that is ~50 s of prompt eval alone for the last call. Cumulative time
sat right around the budget; mac15 lost the coin flip.

Reduce the workload while still meaningfully exercising the lock and
the transcript-alternation invariant:

  threads        4  -> 2     # both branches still exercise the lock
  callsPerThread 3  -> 2     # 4 sequential calls total (was 12)
  N_PREDICT      4  -> 2     # each reply 50% cheaper
  @test timeout  120 s -> 300 s   # 5 min ceiling = ~2x the worst
                                    expected runtime even on the
                                    slowest observed CI runner
  latch await    110 s -> 270 s   # matches the new @test budget

Linux x86_64 currently runs this in ~30 s; mac15-no-metal projects to
~30 s with the new workload; even a 5x slower mac14-metal CPU-fallback
finishes in ~150 s, well under the 300 s ceiling. The bug the test
guards (concurrent send() corrupting the transcript or breaking the
user-then-assistant alternation invariant) is still exercised: 2
threads contending on the synchronized(lock) is enough to expose any
ordering regression in Session.java; 12 calls was over-engineering.

A comment above the test now documents the budget reasoning so the
next contributor does not push the workload back up without
understanding the slowest-runner ceiling.

Verified locally
- mvn test -Dtest='SessionConcurrencyTest,ContentPartTest,
  MultimodalMessagesTest,CancellationTokenTest': BUILD SUCCESS. The
  SessionConcurrencyTest tests self-skip on this sandbox (no model)
  but compile cleanly and JUnit accepts the new timeout / latch
  values.
The 'Optional models' table only listed PROP_NOMIC_MODEL_PATH; the
three vision properties added by PR #189 (PROP_VISION_MODEL_PATH /
.mmproj / .image) were undocumented for local-dev use. Anyone running
mvn test locally had to grep the test file to find them.

Add three rows to the same table (so they sit next to the existing
nomic row), an mvn invocation example for MultimodalIntegrationTest,
and a note that the test self-skips on any missing path so partial
local setups still work.
…latforms)

Until now the 'Build and Test macOS 15 arm64 (Metal)' job was a
single combined job that did both 'ctest' and 'mvn test' inline, and
unlike every other platform it did NOT upload a separate native-lib
artifact. The GitHub Actions UI therefore showed only three peer
'Java Tests ...' jobs (Linux, mac14-Metal, mac15-no-Metal, Windows)
even though Java tests were actually being run on mac15-Metal — the
fourth one was just buried inside a build-shaped job.

Split into the canonical pair, matching the other 4 platforms exactly:

  build-macos-arm64-metal-15
    name: Build and Test macOS 15 arm64 (Metal)
    - mvn compile
    - cmake build with -DLLAMA_METAL_EMBED_LIBRARY=ON -DGGML_NATIVE=OFF
    - ctest
    - upload artifact 'macos-15-metal-libraries'

  test-java-macos-arm64-metal-15  (NEW)
    name: Java Tests macOS 15 arm64 (Metal)
    needs: build-macos-arm64-metal-15
    - download 'macos-15-metal-libraries'
    - download the four required model GGUFs + vision model + mmproj
    - validate-models.sh
    - mvn test -D net.ladenthin.llama.vision.{model,mmproj,image}=...
    - on failure: upload hs_err_pid, *.hprof, surefire dumps,
      per-class -output.txt, TEST-*.xml (same glob as the other test
      jobs from b2c8067)

The 'package' job's needs: list was updated to:
  - add 'build-macos-arm64-metal-15' (was 'test-macos-arm64-metal-15'
    which no longer exists),
  - add 'test-java-macos-arm64-metal-15' so the release artifacts
    cannot ship until mac15-Metal Java tests pass.

Verified YAML parses cleanly with python yaml.safe_load. The five
test-java-* jobs now form a complete platform matrix that the GH
Actions UI displays as peers.
Three small gaps caught in a final audit of the §-doc and
the issues doc:

1. feature-investigation §1 'What java-llama.cpp already covers':
   the Session row still read 'single-threaded'. PR #189 added
   per-Session locking; relabel to 'thread-safe per instance'
   so the §1 table matches the §2.6 status block.

2. feature-investigation §4 'Suggested rollout order': all ten
   entries listed there are SHIPPED now. The section is preserved
   as a historical artifact (useful for comparing original effort
   estimates against what actually shipped, especially §2.1 which
   came in much smaller than the L estimate and §2.10 which had
   to be reverted), but a header note now states up front that it
   is no longer a roadmap.

3. docs/history/49be664_open_issues.md #113 LoadProgressCallback:
   the entry said FIXED in PR #188 and stopped there, but the
   PR #188 implementation forgot the forward declaration in
   jllama.h, which made the JNI symbol C++-mangled and produced
   an UnsatisfiedLinkError on the first call to the feature.
   PR #189 commit 36d8862 added the missing declaration and the
   feature is now actually functional (LoadProgressCallbackTest
   passes on the current CI). Add a 'Subtle issue resolved by
   PR #189' paragraph so a future reader does not think 'PR #188
   shipped this, why did it fail on the first run after merge'.

No code changes; documentation only.
@sonarqubecloud
Copy link
Copy Markdown

@bernardladenthin bernardladenthin merged commit 90410bb into main May 24, 2026
6 of 35 checks passed
@bernardladenthin bernardladenthin deleted the claude/stoic-archimedes-brMrb branch May 24, 2026 07:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants