
v0.1.7 — Fix use-after-move of LlamaModel


DominguesM released this on 16 Jun 12:42
Commit: 8f8aa41

What's Changed

Critical Bug Fix

Fixed a use-after-move of LlamaModel that caused a SIGSEGV whenever a Llama value crossed a return boundary or was otherwise moved (e.g. Llama::load returning by value, or a Llama being moved into another scope).

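LlamaContext previously held a self-referential &'a LlamaModel field that borrowed from the model stored inside the same Llama value, papered over by a PhantomData<*mut ()> + transmute in Llama::load. Whenever the Llama moved, the model moved with it, but the reference kept pointing at the old location. The model is now a heap-allocated Box<LlamaModel>, whose address stays fixed when the owning Llama moves, and the context holds a NonNull<LlamaModel> raw pointer to it.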
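The idea behind the fix can be illustrated with a minimal, self-contained sketch. Model, Context, and Llama below are hypothetical, simplified stand-ins for LlamaModel, LlamaContext, and Llama (no llama.cpp bindings are involved), and the load signature is an assumption. The sketch only shows that a raw pointer to a boxed value survives moves of the struct that owns the Box.

```rust
use std::ptr::NonNull;

// Hypothetical stand-in for LlamaModel: only the metadata the context reads.
struct Model {
    n_embd: usize,
}

// Hypothetical stand-in for LlamaContext: no lifetime parameter,
// just a non-owning raw pointer to the model.
struct Context {
    model: NonNull<Model>,
}

impl Context {
    fn n_embd(&self) -> usize {
        // SAFETY: the owning Llama keeps the Box<Model> alive for as long
        // as this context exists, and drops the context first.
        unsafe { self.model.as_ref().n_embd }
    }
}

// Fields drop in declaration order, so the context goes before the model.
struct Llama {
    context: Context,
    model: Box<Model>,
}

impl Llama {
    // Hypothetical loader; the real Llama::load reads a model file.
    fn load(n_embd: usize) -> Llama {
        let model = Box::new(Model { n_embd });
        // The heap allocation does not move when the Box (or the Llama
        // containing it) is moved, so this pointer stays valid.
        let context = Context { model: NonNull::from(&*model) };
        Llama { context, model }
    }
}

fn main() {
    // load returns by value, so the Llama moves; the boxed model does not.
    let llama = Llama::load(768);
    // A second move, e.g. into another binding or scope.
    let moved = llama;
    let same = std::ptr::eq(moved.context.model.as_ptr() as *const Model, &*moved.model);
    // prints "n_embd = 768, pointer still targets the boxed model: true"
    println!(
        "n_embd = {}, pointer still targets the boxed model: {}",
        moved.context.n_embd(),
        same
    );
}
```

Because the pointer targets heap memory rather than the Llama struct itself, neither the return from load nor the later move invalidates it.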

Symptoms fixed:

  • n_embd reading as 0 in the embedding context
  • n_vocab reading stale or zeroed data in logits_ith / sampled_probs_ith
  • Random crashes in rerank and any flow that crossed a return boundary with a Llama value

🔧 Changes

  • Llama.model is now Box<LlamaModel> (heap-allocated, stable address)
  • LlamaContext.model is now NonNull<LlamaModel> (a non-owning raw pointer to the model owned by Llama)
  • Removed the 'a lifetime parameter from LlamaContext
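  • Fields in Llama reordered to context → model → _backend → _not_send_sync to enforce correct drop order (Rust drops struct fields in declaration order, so the context is freed before the model it points to)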
  • Added 9 regression tests covering the embeddings, logits, rerank, infill, and streaming APIs
  • Bumped version to 0.1.7
  • Workspace members now use crates/* glob pattern
  • Updated changelog and versioned docs.rs links
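The drop-order change relies on a language guarantee: Rust drops struct fields in declaration order. The sketch below lays out a struct in the field order listed above, using a hypothetical Noisy type (not part of the crate) that records its name in a shared log when dropped.

```rust
use std::cell::RefCell;
use std::marker::PhantomData;
use std::rc::Rc;

// Shared log so each field can record when it is dropped.
type Log = Rc<RefCell<Vec<&'static str>>>;

// Hypothetical stand-in for a resource: logs its name on drop.
struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Mirrors the field order described in the release notes.
#[allow(dead_code)]
struct Llama {
    context: Noisy,
    model: Box<Noisy>,
    _backend: Noisy,
    // A raw-pointer marker makes the struct neither Send nor Sync.
    _not_send_sync: PhantomData<*mut ()>,
}

fn main() {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let _llama = Llama {
            context: Noisy { name: "context", log: Rc::clone(&log) },
            model: Box::new(Noisy { name: "model", log: Rc::clone(&log) }),
            _backend: Noisy { name: "backend", log: Rc::clone(&log) },
            _not_send_sync: PhantomData,
        };
    } // _llama goes out of scope here; its fields drop top to bottom
    // prints ["context", "model", "backend"]
    println!("{:?}", log.borrow());
}
```

Declaring context first means it is always torn down while the model it points at is still alive.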

Verification

All 9 new regression tests pass:

  • embeddings_seq_returns_768_dim_unit_norm_vector
  • embed_called_twice_returns_consistent_dim
  • logits_ith_after_decode_reads_n_vocab_floats
  • rerank_scores_documents_and_top_match_is_rust
  • rerank_empty_documents_returns_empty_vec
  • infill_returns_some_content
  • infill_called_twice_is_consistent
  • streaming_completion_collects_tokens
  • streaming_completion_can_stop_early

Migration Notes

No public API breakage. All existing code continues to work without changes.