Release v0.1.7 — Fix use-after-move of LlamaModel · DominguesM/llama-crab

What's Changed

Critical Bug Fix

Fixed a use-after-move of LlamaModel that caused SIGSEGV when Llama crossed a return boundary (e.g. Llama::load returning by value, or Llama being moved across scope).

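LlamaContext previously held a self-referential &'a LlamaModel, papered over by a PhantomData<*mut ()> and a transmute in Llama::load. Whenever Llama was moved, the model moved with it and the context's reference was left pointing at the old location. The model is now heap-allocated in a Box<LlamaModel>, and the context holds a NonNull<LlamaModel> raw pointer to it, so the pointer stays valid when Llama moves.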
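
As a rough illustration of the pattern described above, the following sketch shows why boxing the model keeps a raw pointer valid across moves. It is not the crate's actual code: Model, Context, Wrapper, and the n_embd field are simplified, hypothetical stand-ins for LlamaModel, LlamaContext, and Llama.

```rust
use std::ptr::NonNull;

// Hypothetical stand-in for the real model type.
struct Model {
    n_embd: usize,
}

// Hypothetical stand-in for the context: a raw, non-owning pointer to the model.
struct Context {
    model: NonNull<Model>,
}

impl Context {
    fn n_embd(&self) -> usize {
        // SAFETY: `Wrapper` keeps the boxed model alive for as long as this
        // context exists, and the heap allocation does not move.
        unsafe { self.model.as_ref().n_embd }
    }
}

// Hypothetical stand-in for `Llama`: context declared before the model.
struct Wrapper {
    context: Context,
    model: Box<Model>,
}

impl Wrapper {
    fn load(n_embd: usize) -> Wrapper {
        let model = Box::new(Model { n_embd });
        // The pointer targets the heap allocation, not the `Box` handle.
        let ptr = NonNull::from(&*model);
        Wrapper { context: Context { model: ptr }, model }
    }
}

fn main() {
    // Returned by value, then moved once more into a new location.
    let llama = Wrapper::load(768);
    let before = &*llama.model as *const Model;
    let moved = Box::new(llama);
    let after = &*moved.model as *const Model;

    // Moving the wrapper moves only the `Box` handle; the model stays put.
    assert_eq!(before, after);
    assert_eq!(moved.context.n_embd(), 768);
}
```

With the model stored inline instead of in a Box, the same two moves would relocate the model itself, and a pointer taken inside load would refer to the old location.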

Symptoms fixed:

n_embd reading as 0 in the embedding context
n_vocab reading stale or zeroed data in logits_ith / sampled_probs_ith
Random crashes in rerank and any flow that crossed a return boundary with a Llama value

🔧 Changes

Llama.model is now Box<LlamaModel> (heap-allocated, stable address)
LlamaContext.model is now NonNull<LlamaModel> (raw pointer held by the context; the Box in Llama owns the model)
Removed the 'a lifetime parameter from LlamaContext
Fields in Llama reordered to context → model → _backend → _not_send_sync so that the context is dropped before the model it points to (Rust drops struct fields in declaration order)
Added 9 regression tests covering embeddings, rerank, infill, and streaming APIs
Bumped version to 0.1.7
Workspace members now use crates/* glob pattern
Updated changelog and versioned docs.rs links
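
The field reordering in the list above relies on Rust dropping struct fields in declaration order. A minimal sketch of that rule follows, using hypothetical stand-in types (Noisy, LlamaLike) rather than the crate's real ones, and leaving out the _not_send_sync marker field.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical helper: records its name in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Field order mirrors the one listed in the release notes; types are stand-ins.
#[allow(dead_code)]
struct LlamaLike {
    context: Noisy,
    model: Noisy,
    _backend: Noisy,
}

// Builds a value, drops it, and returns the order in which fields were dropped.
fn drop_order() -> Vec<&'static str> {
    let log: Rc<RefCell<Vec<&'static str>>> = Rc::default();
    let part = |name: &'static str| Noisy { name, log: Rc::clone(&log) };
    let llama = LlamaLike {
        context: part("context"),
        model: part("model"),
        _backend: part("backend"),
    };
    drop(llama);
    let order = log.borrow().clone();
    order
}

fn main() {
    // Fields are dropped top to bottom, so the context goes first.
    assert_eq!(drop_order(), ["context", "model", "backend"]);
}
```

Because the context is declared first, its destructor runs while the model it points to is still alive.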

Verification

All 9 new regression tests pass:

embeddings_seq_returns_768_dim_unit_norm_vector
embed_called_twice_returns_consistent_dim
logits_ith_after_decode_reads_n_vocab_floats
rerank_scores_documents_and_top_match_is_rust
rerank_empty_documents_returns_empty_vec
infill_returns_some_content
infill_called_twice_is_consistent
streaming_completion_collects_tokens
streaming_completion_can_stop_early

Migration Notes

No public API breakage. All existing code continues to work without changes.
