
Add Gemma 4 text-only inference support to TileGym transformers examples #130

@duoan


Hi TileGym team,

I’d like to propose adding initial Gemma 4 text-only inference support to the modeling/transformers path.

Motivation

TileGym already has a Gemma integration path in infer.py, where "gemma" models are routed through apply_tilegym_kernel_to_gemma3(...) with existing kernel toggles for rope, rms_norm, mlp, and attn.
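Below is a minimal sketch of one way this routing could be extended. It assumes infer.py picks a patch function by substring-matching the model name. `patch_gemma3`, `patch_gemma4`, the `ROUTES` table, and the `gemma-4`/`gemma4` name patterns are all hypothetical stand-ins, not TileGym's actual code. `patch_gemma3` only represents `apply_tilegym_kernel_to_gemma3(...)`, and its real signature is not assumed. The sketch shows one risk: if matching is a plain `"gemma"` substring check, Gemma 4 checkpoints would silently fall through to the Gemma 3 patch, so more specific patterns need to be checked first.

```python
from typing import Callable, Dict, List, Tuple

# Kernel toggles named in infer.py's Gemma path; enabling all by default is an assumption.
DEFAULT_TOGGLES: Dict[str, bool] = {"rope": True, "rms_norm": True, "mlp": True, "attn": True}

PatchResult = Tuple[str, Dict[str, bool]]


def patch_gemma3(**toggles: bool) -> PatchResult:
    # Hypothetical stand-in for apply_tilegym_kernel_to_gemma3(...); it only records
    # which family and toggles would be used instead of patching a model.
    return "gemma3", dict(toggles)


def patch_gemma4(**toggles: bool) -> PatchResult:
    # Hypothetical dedicated Gemma 4 entry point; nothing like this exists in TileGym yet.
    return "gemma4", dict(toggles)


# Checked in order: the more specific Gemma 4 patterns must come before the generic
# "gemma" pattern, otherwise Gemma 4 names would be routed to the Gemma 3 patch.
ROUTES: List[Tuple[str, Callable[..., PatchResult]]] = [
    ("gemma-4", patch_gemma4),
    ("gemma4", patch_gemma4),
    ("gemma", patch_gemma3),
]


def route(model_id: str, **overrides: bool) -> PatchResult:
    """Pick a patch function from the model name and merge per-kernel overrides."""
    unknown = set(overrides) - set(DEFAULT_TOGGLES)
    if unknown:
        raise ValueError(f"unknown kernel toggles: {sorted(unknown)}")
    name = model_id.lower().rsplit("/", 1)[-1]  # drop an org prefix such as "google/"
    toggles = {**DEFAULT_TOGGLES, **overrides}
    for pattern, patch in ROUTES:
        if pattern in name:
            return patch(**toggles)
    raise ValueError(f"no TileGym patch registered for {model_id!r}")
```

With this table, `route("google/gemma-3-4b-it")` resolves to the Gemma 3 patch, and `route("some-org/gemma-4-placeholder", attn=False)` resolves to the Gemma 4 entry point with attention disabled. The Gemma 4 model ID is a placeholder. Extending the existing path instead of adding a dedicated entry point would amount to mapping the Gemma 4 patterns to the same function behind a small adapter.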

The repository also already includes Gemma-3-4B-IT in the supported transformer benchmark examples, so there is a clear precedent for Gemma-family model integration in the current workflow.

In the roadmap, “More LLM models” is explicitly marked as Help Wanted, and the maintainers recommend opening an issue first for significant end-to-end model support work to coordinate scope and avoid duplicate effort.

Gemma 4 also looks like a practical next target: Google positions it as a new Gemma family with strong reasoning and coding capabilities and efficient deployment. Even so, I would keep the first contribution intentionally narrow and leave multimodal support out of the initial PR.

Proposed scope for v1

I propose a small first step:

  • add Gemma 4 text-only support in modeling/transformers
  • reuse the existing Gemma patch path where possible
  • add a Gemma 4 benchmark script
  • validate with the existing profile and kernel coverage report flow

This stays aligned with the current transformer example stack, which already supports benchmarking, Torch profiler output, and Nsight Systems-based cuTile kernel coverage reporting.

Out of scope for the first PR

To keep the first contribution reviewable, I would exclude:

  • multimodal image/audio/video support
  • agent/function-calling workflows
  • any broader API/runtime integration outside the existing modeling/transformers flow

Gemma 4 does advertise multimodal and agentic capabilities, but I think the first upstreamable version should focus on text-generation parity.

Implementation sketch

Tentative plan:

  1. identify a Gemma 4 model variant that fits the existing TileGym benchmark path
  2. verify whether the current Gemma patch path can be reused directly or needs a small Gemma 4-specific adapter
  3. add/update benchmark entry points
  4. run:
    • baseline generation
    • TileGym generation
    • profiler output
    • kernel coverage report
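For the baseline vs. TileGym comparison, here is a minimal sketch of the parity check I have in mind. It assumes both runs use greedy decoding on the same prompt and that each run yields its generated token IDs and wall-clock time. `RunResult`, `first_divergence`, and `compare` are hypothetical helpers, not existing TileGym code. Exact token match is a strict criterion. Fused kernels can change floating-point rounding, so a late divergence may be worth reporting rather than treating as a hard failure.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RunResult:
    token_ids: Sequence[int]  # generated token IDs (greedy decoding assumed)
    seconds: float            # wall-clock generation time for the same prompt


def first_divergence(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return the first index where the sequences differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one run stopped earlier than the other
    return None


def compare(baseline: RunResult, tilegym: RunResult) -> dict:
    """Summarize token-level parity and speedup of the TileGym run over the baseline."""
    if tilegym.seconds <= 0:
        raise ValueError("TileGym run time must be positive")
    div = first_divergence(baseline.token_ids, tilegym.token_ids)
    return {
        "parity": div is None,
        "first_divergence": div,
        "speedup": baseline.seconds / tilegym.seconds,
    }
```

For example, `compare(RunResult([5, 9, 2], 3.0), RunResult([5, 9, 7], 1.5))` reports no parity, a first divergence at index 2, and a speedup of 2.0. The values are illustrative, not measurements.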

Validation expectations

I expect the PR to include:

  • a runnable Gemma 4 example in modeling/transformers
  • benchmark results against the baseline path
  • profiling / kernel coverage output
  • any minimal README updates needed for reproducibility

Questions for maintainers

Before I start, I’d appreciate guidance on two points:

  1. Would you prefer Gemma 4 support to extend the existing Gemma path, or would you rather have a dedicated Gemma 4 patch entry point?
  2. Is there a preferred Gemma 4 model size to target first for the initial contribution?

If this direction looks good, I can start with a narrow text-only PR and keep the first version focused on compatibility, benchmarking, and coverage reporting.


