
RFC: Support for Inference-Layer Expert Steering (Extension to ail Spec) #5

@AlexChesser

Description


Core Objective:
Develop an extension for the ail specification that allows the runner to influence Mixture-of-Experts (MoE) routing at the inference layer, as theorized in arXiv:2509.09660 (SteerMoE). This moves ail from "Top-Down" prompt orchestration to "Direct Neuromodulation" of the model’s internal reasoning experts.

Technical Context for the Issue:

  • The Problem: Prompting alone is a "leaky" abstraction for controlling model behavior. MoE models often "flicker" between experts, leading to inconsistent reasoning or "alignment faking."
  • The Solution: Implement a steering block in the ail YAML that allows the runner to apply logit-biasing to specific experts during the forward pass.
  • Safety & Compliance: Frame this as a Faithfulness & Determinism feature. Steering should be used to reinforce "Logic" and "Verification" experts, ensuring the model remains grounded in the provided context.

Requested Issue Sections:

  1. Proposed Spec Syntax:

    • Introduce a steering block nested within steps.
    • Include fields for activate, suppress, and intensity.
    • Define a requirement_level (e.g., strict, flexible) to handle non-MoE or unsupported models.
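A hypothetical sketch of what this block could look like in an ail step. Field names follow the RFC text (activate, suppress, intensity, requirement_level); the step name, prompt, intent labels, and intensity scale are all illustrative, not final syntax:

```yaml
steps:
  - name: verify_claims
    prompt: "Check each claim against the provided context."
    steering:
      activate: [reasoning, verification]   # human-readable intents, resolved via expert_map.json
      suppress: [creative_writing]
      intensity: 0.7                        # bias strength; scale (e.g. 0.0-1.0) to be defined
      requirement_level: flexible           # strict | flexible
```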
  2. Expert Mapping Strategy:

    • Propose a sidecar expert_map.json standard. Since Expert IDs are model-specific, ail needs a translation layer that maps human-readable intents (e.g., reasoning, code_generation) to specific expert indices.
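One possible shape for the sidecar file, mapping intents to per-layer expert indices. All model IDs, layer numbers, and expert indices below are made up for illustration; real values would be discovered per model:

```json
{
  "model_id": "example-moe-8x7b",
  "intents": {
    "reasoning":       { "layers": [12, 13], "experts": [3, 5] },
    "code_generation": { "layers": [20],     "experts": [1] },
    "verification":    { "layers": [24, 25], "experts": [2, 6] }
  }
}
```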
  3. Runner Requirements (Rust/C++):

    • Outline the need for hooks in the inference engine (e.g., llama.cpp or candle) to intercept the Gate/Router logits before the top-k selection.
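A minimal Rust sketch of what such a hook might do with the router logits once intercepted. This is not llama.cpp or candle API; the function names and the additive-bias scheme are assumptions for illustration:

```rust
// Hypothetical: bias MoE router logits before top-k expert selection.
// `activate`/`suppress` hold expert indices for the current layer;
// `intensity` is the bias magnitude from the steering block.
fn steer_router_logits(logits: &mut [f32], activate: &[usize], suppress: &[usize], intensity: f32) {
    for &e in activate {
        if let Some(l) = logits.get_mut(e) {
            *l += intensity; // push the gate toward this expert
        }
    }
    for &e in suppress {
        if let Some(l) = logits.get_mut(e) {
            *l -= intensity; // push the gate away from this expert
        }
    }
}

/// Plain top-k over the (possibly steered) logits, as the router would do.
fn top_k(logits: &[f32], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    idx.sort_by(|&a, &b| logits[b].partial_cmp(&logits[a]).unwrap());
    idx.truncate(k);
    idx
}

fn main() {
    let mut logits = vec![0.1, 0.9, 0.5, 0.2];
    // Boost expert 3, suppress expert 1.
    steer_router_logits(&mut logits, &[3], &[1], 2.0);
    println!("{:?}", top_k(&logits, 2)); // steered gate now prefers expert 3
}
```

The real hook would sit between the gate projection and the softmax/top-k inside the engine's MoE layer, which is why engine-side changes are unavoidable.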
  4. Error Handling & Validation:

    • What happens when a strict steering requirement targets a dense (non-MoE) or otherwise unsupported model?
    • Define the "Degraded Mode" in which the runner falls back to standard prompting when steering is unavailable.
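The decision table above could be as small as one match. A sketch under the assumption that `requirement_level` has exactly the two values named in the RFC (type and function names are illustrative):

```rust
// Hypothetical requirement_level handling when the target model
// cannot honor a steering block (e.g. a dense, non-MoE model).
#[derive(Debug, PartialEq)]
enum RequirementLevel { Strict, Flexible }

#[derive(Debug, PartialEq)]
enum SteeringMode {
    Active,         // MoE model with hooks available: apply logit biases
    DegradedPrompt, // fall back to plain prompting and log the downgrade
}

fn resolve_steering(level: RequirementLevel, model_supports_steering: bool)
    -> Result<SteeringMode, String>
{
    match (model_supports_steering, level) {
        (true, _) => Ok(SteeringMode::Active),
        (false, RequirementLevel::Strict) =>
            Err("steering is 'strict' but the model is dense/unsupported".into()),
        (false, RequirementLevel::Flexible) => Ok(SteeringMode::DegradedPrompt),
    }
}

fn main() {
    // A dense model with a flexible requirement degrades to plain prompting.
    let mode = resolve_steering(RequirementLevel::Flexible, false);
    println!("{:?}", mode); // Ok(DegradedPrompt)
}
```

Failing fast on `strict` keeps behavior deterministic: a pipeline that depends on steering never silently runs unsteered.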
  5. Safety Guardrails:

    • Explicitly mention that steering can be used to enforce safety experts in high-stakes loops, providing a verifiable "Neuro-Audit" log of the model's internal state.
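The "Neuro-Audit" log could be a per-step record of the biases requested and the experts the router actually selected. A hypothetical record shape (all field names and values illustrative):

```json
{
  "step": "verify_claims",
  "steering_mode": "active",
  "biases": { "activate": [2, 6], "suppress": [], "intensity": 0.7 },
  "selected_experts": { "layer_24": [2, 6], "layer_25": [6, 1] }
}
```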


Labels

    core: issue applies to the application core, typically around altering YAML grammar or behavior
    enhancement: new feature or request
    question: further information is requested
    spec
