## Overview
This issue tracks the addition of end-to-end TileGym kernel support for Meta's Llama 4 model family.
I am planning to work on this and opening this issue to avoid duplicate efforts.
## Approach
Llama 4 shares the core Llama architecture (RoPE, RMSNorm, SwiGLU MLP, GQA attention),
so the implementation will largely follow the existing Llama 3.1 integration pattern.
Planned steps:
- Add `apply_tilegym_kernel_to_llama4` in `monkey_patch.py`
- Register it in `MODEL_TYPE_TO_APPLY_TILEGYM_FN`
- Handle Llama 4-specific differences (e.g. MoE experts in the Maverick variant)
- Add an end-to-end (E2E) inference test
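The patch-and-register steps above could look roughly like the following. This is a minimal sketch only: the function name and registry follow the pattern described for the existing Llama 3.1 integration, but the actual TileGym kernel entry points (and which modules get swapped) are assumptions to be confirmed against the codebase.

```python
# Sketch of the planned monkey-patch pattern (names assumed, not final).

def apply_tilegym_kernel_to_llama4(
    rope: bool = True,
    rms_norm: bool = True,
    swiglu: bool = True,
):
    """Swap Llama 4 modules for TileGym kernels (illustrative stub).

    In the real implementation each branch would rebind the corresponding
    Transformers module/function, e.g.:
        modeling_llama4.apply_rotary_pos_emb = tilegym_rope
        modeling_llama4.Llama4RMSNorm = TileGymRMSNorm
    """
    applied = []
    if rope:
        applied.append("rope")       # placeholder for the RoPE kernel swap
    if rms_norm:
        applied.append("rms_norm")   # placeholder for the RMSNorm swap
    if swiglu:
        applied.append("swiglu")     # placeholder for the SwiGLU MLP swap
    return applied


# Registry mapping the HF `model_type` string to its patch function,
# mirroring how other model families are assumed to be dispatched.
MODEL_TYPE_TO_APPLY_TILEGYM_FN = {
    "llama4": apply_tilegym_kernel_to_llama4,
}
```

Dispatch would then just be a lookup on `config.model_type`, so the Maverick MoE question above reduces to whether `"llama4"` needs its own entry or can share a patch function with an existing MoE path.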
## Questions / Discussion Points
- Is there a preferred inference framework to target, or is HuggingFace Transformers the expected path?
- Does the Maverick MoE variant require a new dispatch path, or can it reuse the existing MoE infrastructure from DeepSeek V2?
- What is the target GPU for validation?
Happy to discuss the approach before diving in.