v0.1.7 — Fix use-after-move of LlamaModel
What's Changed
Critical Bug Fix
Fixed a use-after-move of LlamaModel that caused a SIGSEGV when a Llama value crossed a return boundary (e.g. Llama::load returning by value, or a Llama being moved across scopes).
The self-referential &'a LlamaModel field on LlamaContext, papered over by a PhantomData<*mut ()> + transmute in Llama::load, has been replaced with a heap-allocated Box<LlamaModel> and a NonNull<LlamaModel> raw pointer on the context.
Symptoms fixed:
- n_embd reading as 0 in the embedding context
- n_vocab reading stale or zeroed data in logits_ith / sampled_probs_ith
- Random crashes in rerank and any flow that crossed a return boundary with a Llama value
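The address-stability argument behind the fix can be sketched as follows. This is a minimal illustration, not the crate's actual code: `Model`, `Context`, `Owner`, `load`, and `points_at_model` are hypothetical stand-ins for LlamaModel, LlamaContext, Llama, and Llama::load. It only compares addresses and never dereferences the raw pointer; real code that reads through the pointer needs `unsafe` and must guarantee the model outlives the context.

```rust
use std::ptr::NonNull;

// Hypothetical stand-in for LlamaModel.
struct Model {
    n_embd: usize,
}

// Hypothetical stand-in for LlamaContext: no lifetime parameter,
// just a raw pointer to the heap-allocated model.
struct Context {
    model: NonNull<Model>,
}

// Hypothetical stand-in for Llama. `context` is declared before `model`
// so that it is dropped first.
struct Owner {
    context: Context,
    model: Box<Model>,
}

// Returns by value, so the `Owner` is moved out of this function.
fn load(n_embd: usize) -> Owner {
    let model = Box::new(Model { n_embd });
    // The pointer targets the heap allocation, not the stack slot
    // that currently holds the Box.
    let ptr = NonNull::from(&*model);
    Owner {
        context: Context { model: ptr },
        model,
    }
}

// True if the context's pointer still equals the address of the boxed model.
fn points_at_model(owner: &Owner) -> bool {
    std::ptr::eq(
        owner.context.model.as_ptr() as *const Model,
        &*owner.model as *const Model,
    )
}
```

Calling `load(768)` and then moving the result again (for example into a `Box<Owner>`) leaves `points_at_model` true, because moving a `Box` copies the pointer and not the allocation it refers to. A pointer to a model stored inline in `Owner` would carry no such guarantee.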
🔧 Changes
- Llama.model is now Box<LlamaModel> (heap-allocated, stable address)
- LlamaContext.model is now NonNull<LlamaModel> (raw pointer owned by the context)
- Removed the 'a lifetime parameter from LlamaContext
- Fields in Llama reordered to context → model → _backend → _not_send_sync to enforce correct drop order
- Added 9 regression tests covering embeddings, rerank, infill, and streaming APIs
- Bumped version to 0.1.7
- Workspace members now use crates/* glob pattern
- Updated changelog and versioned docs.rs links
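The field reordering matters because Rust drops struct fields in declaration order, so a context declared before the model is destroyed while the model is still alive. The sketch below illustrates that rule with hypothetical types (`Tracked`, `Holder`, and `drop_order` are not part of the crate); each field records its name in a shared log when dropped.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

// Records its name in the shared log when dropped.
struct Tracked {
    name: &'static str,
    log: Log,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Hypothetical mirror of the field order described in the notes.
#[allow(dead_code)]
struct Holder {
    context: Tracked,
    model: Tracked,
    _backend: Tracked,
}

// Builds a Holder, drops it, and returns the order in which fields were dropped.
fn drop_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let track = |name| Tracked { name, log: Rc::clone(&log) };
    let holder = Holder {
        context: track("context"),
        model: track("model"),
        _backend: track("backend"),
    };
    drop(holder);
    let order = log.borrow().clone();
    order
}
```

Here `drop_order()` yields `["context", "model", "backend"]`, matching the declaration order. Swapping the `context` and `model` fields in `Holder` would drop the model first, which is the ordering the release avoids.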
Verification
All 9 new regression tests pass:
- embeddings_seq_returns_768_dim_unit_norm_vector
- embed_called_twice_returns_consistent_dim
- logits_ith_after_decode_reads_n_vocab_floats
- rerank_scores_documents_and_top_match_is_rust
- rerank_empty_documents_returns_empty_vec
- infill_returns_some_content
- infill_called_twice_is_consistent
- streaming_completion_collects_tokens
- streaming_completion_can_stop_early
Migration Notes
No public API breakage. All existing code continues to work without changes.