## Overview
This issue tracks the addition of end-to-end TileGym kernel support for Meta's Llama 4 model family.
I am planning to work on this and opening this issue to avoid duplicate efforts.
## Approach
Llama 4 shares the core Llama architecture (RoPE, RMSNorm, SwiGLU MLP, GQA attention),
so the implementation will largely follow the existing Llama 3.1 integration pattern.
Planned steps:
- Add `apply_tilegym_kernel_to_llama4` in `monkey_patch.py`
- Register it in `MODEL_TYPE_TO_APPLY_TILEGYM_FN`
- Handle Llama 4-specific differences (e.g. MoE experts in the Maverick variant)
- Add an end-to-end (E2E) inference test
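The patch-and-register steps above could look roughly like the following. This is a minimal sketch only: the function name and registry follow the pattern described for the existing Llama 3.1 integration, but the actual TileGym kernel entry points (and which modules get swapped) are assumptions to be confirmed against the codebase.

```python
# Sketch of the planned monkey-patch pattern (names assumed, not final).

def apply_tilegym_kernel_to_llama4(
    rope: bool = True,
    rms_norm: bool = True,
    swiglu: bool = True,
):
    """Swap Llama 4 modules for TileGym kernels (illustrative stub).

    In the real implementation each branch would rebind the corresponding
    Transformers module/function, e.g.:
        modeling_llama4.apply_rotary_pos_emb = tilegym_rope
        modeling_llama4.Llama4RMSNorm = TileGymRMSNorm
    """
    applied = []
    if rope:
        applied.append("rope")       # placeholder for the RoPE kernel swap
    if rms_norm:
        applied.append("rms_norm")   # placeholder for the RMSNorm swap
    if swiglu:
        applied.append("swiglu")     # placeholder for the SwiGLU MLP swap
    return applied


# Registry mapping the HF `model_type` string to its patch function,
# mirroring how other model families are assumed to be dispatched.
MODEL_TYPE_TO_APPLY_TILEGYM_FN = {
    "llama4": apply_tilegym_kernel_to_llama4,
}
```

Dispatch would then just be a lookup on `config.model_type`, so the Maverick MoE question above reduces to whether `"llama4"` needs its own entry or can share a patch function with an existing MoE path.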
## Questions / Discussion Points
- Is there a preferred inference framework to target, or is HuggingFace Transformers the expected path?
- Does the Maverick MoE variant require a new dispatch path, or can it reuse the existing MoE infrastructure from DeepSeek V2?
- What is the target GPU for validation?
Happy to discuss the approach before diving in.