Add Llama-3 / Mistral architecture support #6

@BioInfo

Feature: Llama-3 and Mistral model support

Dendrite currently supports the LLaMA (1/2) and Qwen3 architectures. Llama-3 and Mistral are among the most commonly used open models, and neither is supported yet.

What Changed in Llama-3

  • GQA (Grouped Query Attention): num_kv_heads differs from num_heads at every model size
  • RoPE: larger base frequency (rope_theta = 500000); Llama-3.1 and later also use the rope_scaling config field
  • SentencePiece → tiktoken tokenizer
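As a rough illustration of what GQA means for the attention code, the sketch below maps each query head to the KV head it shares. The `AttentionShape` struct and its method names are hypothetical and are not Dendrite's existing API. The numbers in the usage note (32 query heads, 8 KV heads) come from the published Llama-3-8B config.

```rust
/// Hypothetical description of an attention layer's head layout.
/// Field names are illustrative and are not taken from Dendrite.
#[derive(Debug, Clone, Copy)]
pub struct AttentionShape {
    pub num_attention_heads: usize,
    pub num_kv_heads: usize,
}

impl AttentionShape {
    /// Number of query heads sharing one KV head.
    /// Returns None if the heads do not divide evenly into groups.
    pub fn group_size(&self) -> Option<usize> {
        if self.num_kv_heads == 0 || self.num_attention_heads % self.num_kv_heads != 0 {
            return None;
        }
        Some(self.num_attention_heads / self.num_kv_heads)
    }

    /// Index of the KV head that a given query head reads from.
    /// Consecutive query heads are grouped onto the same KV head.
    pub fn kv_head_for(&self, query_head: usize) -> Option<usize> {
        if query_head >= self.num_attention_heads {
            return None;
        }
        self.group_size().map(|group| query_head / group)
    }
}
```

With 32 query heads and 8 KV heads, the group size is 4, so query heads 4 through 7 all read KV head 1. Setting num_kv_heads equal to num_attention_heads recovers ordinary multi-head attention, which suggests one code path could cover both cases.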

Mistral Differences

  • Sliding window attention (SWA): set by the sliding_window config field (4096 in Mistral-7B-v0.1, unset in later releases)
  • Different FFN interleaving
  • Same RoPE as Llama-2
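To make the sliding-window point concrete, here is a minimal sketch of a causal mask restricted to a window. The function names are hypothetical. The sketch assumes one common convention: a position attends to itself and the `window - 1` positions before it. Reference implementations can differ by one, so the boundary should be checked against the golden outputs.

```rust
/// Returns true if query position `q` may attend to key position `k`
/// under causal sliding-window attention.
/// Assumed convention: the window includes the query position itself.
pub fn swa_allows(q: usize, k: usize, window: usize) -> bool {
    // Causal: never look ahead. Windowed: look back at most `window - 1` steps.
    k <= q && q - k < window
}

/// Builds a full `seq_len x seq_len` boolean mask (row = query, column = key).
/// Illustration only; a real kernel would avoid materializing the mask.
pub fn swa_mask(seq_len: usize, window: usize) -> Vec<Vec<bool>> {
    (0..seq_len)
        .map(|q| (0..seq_len).map(|k| swa_allows(q, k, window)).collect())
        .collect()
}
```

When `window` is at least `seq_len`, the mask reduces to the plain causal mask, so the same routine would also cover checkpoints that leave sliding_window unset.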

Scope

  1. model/llama3.rs — Llama-3 architecture (extend existing model/transformer.rs)
  2. model/mistral.rs — Mistral architecture
  3. Tests: golden output test against reference implementation (HuggingFace transformers)
  4. Example: examples/llama3_inference.rs
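For the golden output test in item 3, one possible shape is an element-wise comparison of logits against values exported from the reference implementation. The helper names and the absolute-tolerance approach are assumptions for illustration. How the reference values are exported and loaded is left open here.

```rust
/// Largest absolute difference between two equally sized slices.
/// Returns None on a length mismatch and NaN if any element is NaN.
pub fn max_abs_diff(actual: &[f32], expected: &[f32]) -> Option<f32> {
    if actual.len() != expected.len() {
        return None;
    }
    let worst = actual
        .iter()
        .zip(expected)
        .map(|(a, e)| (a - e).abs())
        // Propagate NaN explicitly, since f32::max would silently drop it.
        .fold(0.0_f32, |m, d| if m.is_nan() || d.is_nan() { f32::NAN } else { m.max(d) });
    Some(worst)
}

/// True if every element is within `atol` of the reference value.
/// Length mismatches and NaN values count as failures.
pub fn matches_golden(actual: &[f32], expected: &[f32], atol: f32) -> bool {
    match max_abs_diff(actual, expected) {
        Some(worst) => worst <= atol, // NaN <= atol is false
        None => false,
    }
}
```

A test could call `matches_golden(&logits, &reference, atol)` and print `max_abs_diff` on failure. The tolerance would need tuning per dtype, since f16 and bf16 runs are unlikely to match an f32 reference as tightly.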

Why This Matters

Most users asking "can Dendrite run model X?" will ask about Llama-3 first. This is a high-visibility gap.

Complexity

Medium — architecture is well-documented. GQA support may require changes to KvCacheConfig (num_kv_heads != num_attention_heads). Good entry point if you know transformer architectures.
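To show why the KV cache configuration is affected, the sketch below computes per-token cache size from the KV head count instead of the query head count. `KvCacheShape` is a hypothetical stand-in; the actual fields of KvCacheConfig are not described in this issue. The usage numbers (32 layers, 8 KV heads, head dimension 128) come from the published Llama-3-8B config.

```rust
/// Hypothetical stand-in for the cache sizing inputs.
/// This is not Dendrite's KvCacheConfig; field names are assumptions.
#[derive(Debug, Clone, Copy)]
pub struct KvCacheShape {
    pub num_layers: usize,
    pub num_kv_heads: usize,
    pub head_dim: usize,
    pub bytes_per_element: usize,
}

impl KvCacheShape {
    /// Bytes of cache needed per token: one K and one V vector
    /// per KV head per layer.
    pub fn bytes_per_token(&self) -> usize {
        2 * self.num_layers * self.num_kv_heads * self.head_dim * self.bytes_per_element
    }

    /// Bytes of cache needed for a sequence of `seq_len` tokens.
    pub fn bytes_for_sequence(&self, seq_len: usize) -> usize {
        self.bytes_per_token() * seq_len
    }
}
```

For a Llama-3-8B shaped model in f16, sizing by the 8 KV heads gives 131072 bytes per token. Sizing by the 32 query heads would give 524288 bytes, four times too much, which is why a config that assumes the two counts are equal would need to change.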
