Add Llama-3 / Mistral architecture support #6

## Feature: Llama-3 and Mistral model support

Dendrite currently supports the LLaMA (Llama-1 and Llama-2) and Qwen3 architectures. Llama-3 and Mistral are among the most widely used open-weight models.

### What Changed in Llama-3

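- GQA (Grouped Query Attention): `num_key_value_heads` is smaller than `num_attention_heads` (8 vs 32 in Llama-3 8B), so several query heads share each key/value head
- RoPE: base frequency `rope_theta` raised to 500000 (Llama-2 uses 10000); Llama-3.1 checkpoints additionally set the `rope_scaling` config field
- Tokenizer: SentencePiece → tiktoken-based BPE with a 128K vocabulary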
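To make the RoPE change concrete, here is a minimal sketch that computes the standard inverse frequencies from `rope_theta` and then applies the `rope_type: "llama3"` rescaling that Llama-3.1 configs request through `rope_scaling`. It follows our reading of the Hugging Face `transformers` implementation; `Llama3RopeScaling`, `rope_inv_freq`, and `apply_llama3_scaling` are hypothetical names, not existing Dendrite APIs, and the output should be confirmed by the golden tests.

```rust
use std::f64::consts::PI;

/// Hypothetical mirror of the `rope_scaling` block in a Llama-3.1 `config.json`.
#[derive(Debug, Clone, Copy)]
pub struct Llama3RopeScaling {
    pub factor: f64,
    pub low_freq_factor: f64,
    pub high_freq_factor: f64,
    pub original_max_position_embeddings: f64,
}

/// Standard RoPE inverse frequencies: theta^(-2i/d) for i in 0..d/2.
pub fn rope_inv_freq(head_dim: usize, rope_theta: f64) -> Vec<f64> {
    (0..head_dim / 2)
        .map(|i| 1.0 / rope_theta.powf((2 * i) as f64 / head_dim as f64))
        .collect()
}

/// Applies "llama3"-style frequency rescaling (assumed to match HF `transformers`).
pub fn apply_llama3_scaling(inv_freq: &[f64], s: &Llama3RopeScaling) -> Vec<f64> {
    let low_freq_wavelen = s.original_max_position_embeddings / s.low_freq_factor;
    let high_freq_wavelen = s.original_max_position_embeddings / s.high_freq_factor;
    inv_freq
        .iter()
        .map(|&f| {
            let wavelen = 2.0 * PI / f;
            if wavelen < high_freq_wavelen {
                // High-frequency dimensions: left unchanged.
                f
            } else if wavelen > low_freq_wavelen {
                // Low-frequency dimensions: stretched by `factor`.
                f / s.factor
            } else {
                // In between: interpolate smoothly between scaled and unscaled.
                let smooth = (s.original_max_position_embeddings / wavelen - s.low_freq_factor)
                    / (s.high_freq_factor - s.low_freq_factor);
                (1.0 - smooth) * f / s.factor + smooth * f
            }
        })
        .collect()
}
```

With Llama-3.1's published values (`factor = 8`, `low_freq_factor = 1`, `high_freq_factor = 4`, `original_max_position_embeddings = 8192`), the fastest-rotating dimensions are untouched and only long-wavelength dimensions are stretched. Original Llama-3 configs leave `rope_scaling` null, so only `rope_inv_freq` with `rope_theta = 500000` applies there.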

### Mistral Differences

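- Sliding window attention (SWA): Mistral 7B v0.1 sets `sliding_window = 4096` and applies it in every layer; later releases (v0.2 onward) set it to `null`
- GQA, as in Llama-3 (8 KV heads vs 32 query heads)
- FFN: the same SwiGLU MLP as Llama, with a larger `intermediate_size` (14336)
- Same RoPE as Llama-2 in v0.1 (`rope_theta = 10000`)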
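A minimal sketch of the SWA mask, assuming the convention that each query attends to itself and the previous `window - 1` positions. Implementations disagree by one at the boundary, so the exact comparison should match whichever reference the golden tests use. Passing `None` yields the plain causal mask, which is what Mistral v0.2+ needs. The function name is hypothetical.

```rust
/// Builds a `seq_len x seq_len` mask where `mask[i][j] == true` means
/// query position `i` may attend to key position `j`.
/// `window = None` disables the sliding window (plain causal attention).
pub fn sliding_window_causal_mask(seq_len: usize, window: Option<usize>) -> Vec<Vec<bool>> {
    (0..seq_len)
        .map(|i| {
            (0..seq_len)
                // `j <= i` is checked first, so `i - j` never underflows.
                .map(|j| j <= i && window.map_or(true, |w| i - j < w))
                .collect()
        })
        .collect()
}
```

Because no query looks back more than `window` positions, the KV cache for SWA layers can in principle be a fixed-size rolling buffer, as described in the Mistral 7B paper.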

### Scope

1. `model/llama3.rs` — Llama-3 architecture (extend existing `model/transformer.rs`)
2. `model/mistral.rs` — Mistral architecture
3. Tests: golden-output tests comparing against the reference implementation (Hugging Face `transformers`)
4. Example: `examples/llama3_inference.rs`
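For item 3, a golden-output test needs a tolerance-based comparison, since bit-exact logits across frameworks are unlikely. The sketch below assumes reference logits have already been dumped from `transformers` and loaded as `&[f32]` (the loading step is not shown); the helper names and the `atol` value are hypothetical placeholders to tune per model and dtype.

```rust
/// Largest absolute element-wise difference between two logit vectors.
pub fn max_abs_diff(actual: &[f32], expected: &[f32]) -> f32 {
    assert_eq!(actual.len(), expected.len(), "logit vectors differ in length");
    actual
        .iter()
        .zip(expected)
        .map(|(a, e)| (a - e).abs())
        .fold(0.0, f32::max)
}

/// Index of the largest logit (the greedy next token), or `None` if empty.
pub fn argmax(xs: &[f32]) -> Option<usize> {
    xs.iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}

/// Panics if logits drift beyond `atol` or if greedy decoding would diverge.
pub fn assert_logits_close(actual: &[f32], expected: &[f32], atol: f32) {
    let diff = max_abs_diff(actual, expected);
    assert!(diff <= atol, "max |actual - expected| = {diff} exceeds atol = {atol}");
    assert_eq!(argmax(actual), argmax(expected), "greedy next token differs");
}
```

Checking the argmax in addition to the tolerance catches cases where small numerical drift still flips the generated token.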

### Why This Matters

Most users asking "can Dendrite run model X?" will ask about Llama-3 first. This is a high-visibility gap.

### Complexity

Medium: both architectures are well documented. GQA support may require changes to `KvCacheConfig`, because the cache must be sized by `num_kv_heads`, which no longer equals `num_attention_heads`. A good entry point if you know transformer architectures.
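A sketch of what GQA implies for the cache, using a hypothetical stand-in for `KvCacheConfig` (Dendrite's real fields may differ). It shows the two changes: memory is sized by `num_kv_heads`, and each query head maps to KV head `q_head / group_size`, which is the layout the `transformers` `repeat_kv` helper assumes.

```rust
/// Hypothetical stand-in for Dendrite's `KvCacheConfig`; real field names may differ.
#[derive(Debug, Clone, Copy)]
pub struct KvCacheConfig {
    pub num_layers: usize,
    pub num_attention_heads: usize,
    pub num_kv_heads: usize,
    pub head_dim: usize,
    pub max_seq_len: usize,
    pub bytes_per_elem: usize,
}

impl KvCacheConfig {
    /// Number of query heads sharing one KV head (1 for plain multi-head attention).
    pub fn group_size(&self) -> usize {
        assert!(
            self.num_kv_heads > 0 && self.num_attention_heads % self.num_kv_heads == 0,
            "num_attention_heads must be a multiple of num_kv_heads"
        );
        self.num_attention_heads / self.num_kv_heads
    }

    /// KV head read by query head `q_head` under GQA.
    pub fn kv_head_for(&self, q_head: usize) -> usize {
        q_head / self.group_size()
    }

    /// Bytes for K and V across all layers for one full-length sequence.
    pub fn bytes_per_sequence(&self) -> usize {
        2 * self.num_layers * self.num_kv_heads * self.head_dim * self.max_seq_len * self.bytes_per_elem
    }
}
```

With Llama-3 8B's shapes (32 layers, 32 query heads, 8 KV heads, `head_dim = 128`, 8192 context) in bf16, this gives 1 GiB per sequence, a quarter of what sizing by `num_attention_heads` would allocate.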
