Add RecurrentGemma (Griffin) architecture adapter by pablocs116 · Pull Request #1483 · TransformerLensOrg/TransformerLens

pablocs116 · 2026-07-03T13:55:08Z

Description

Adds a TransformerBridge architecture adapter for RecurrentGemmaForCausalLM
(Google/DeepMind RecurrentGemma), the open instance of the Griffin
architecture: RG-LRU real-gated linear recurrence interleaved with local
sliding-window attention (config.block_types, default
recurrent/recurrent/attention).
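This is a minimal per-channel sketch of the RG-LRU update as the Griffin paper describes it. The recurrence gate is r_t = sigmoid(W_a x_t + b_a) and the input gate is i_t = sigmoid(W_x x_t + b_x). The decay is a_t = sigmoid(Lambda)^(c * r_t) with c = 8, and the state update is h_t = a_t * h_{t-1} + sqrt(1 - a_t^2) * (i_t * x_t). For brevity it assumes diagonal (per-channel) gate weights, and the parameter names w_a, b_a, w_x, b_x and lam are hypothetical. It is not the HF RecurrentGemma code: it omits the linear_x/linear_y/conv_1d branches and any HF-specific details.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def rg_lru(xs, w_a, b_a, w_x, b_x, lam, c: float = 8.0):
    """Run a diagonal RG-LRU over xs (a list of time steps, each a list of
    per-channel floats) from a zero state and return every hidden state."""
    h = [0.0] * len(lam)
    states = []
    for x in xs:
        new_h = []
        for j in range(len(lam)):
            r = sigmoid(w_a[j] * x[j] + b_a[j])  # recurrence gate r_t
            i = sigmoid(w_x[j] * x[j] + b_x[j])  # input gate i_t
            # a_t = sigmoid(lam)^(c * r_t), computed in log space:
            # log sigmoid(lam) = -softplus(-lam) = -log(1 + exp(-lam))
            a_t = math.exp(-c * r * math.log1p(math.exp(-lam[j])))
            # The gated input is scaled by sqrt(1 - a_t^2) to keep the state bounded.
            new_h.append(a_t * h[j] + math.sqrt(1.0 - a_t * a_t) * (i * x[j]))
        h = new_h
        states.append(h)
    return states


# One channel, an impulse followed by zeros (illustrative parameters).
states = rg_lru([[1.0], [0.0], [0.0]], w_a=[0.0], b_a=[0.0], w_x=[0.0], b_x=[0.0], lam=[2.0])
```

The decay a_t always lies strictly between 0 and 1. After the impulse the state therefore stays positive and shrinks geometrically. The update is linear in h and never mixes channels.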

Design. Each decoder layer's temporal_block is either a
RecurrentGemmaRecurrentBlock (RG-LRU: linear_x/linear_y/conv_1d/rg_lru/
linear_out) or a RecurrentGemmaSdpaAttention (q/k/v/o + rotary). Because the
temporal substructure varies per layer, the adapter wraps each decoder layer
whole with residual-stream hooks only (RecurrentGemmaBlockBridge), mirroring
Lfm2MoeArchitectureAdapter — rather than pretending every layer has a
homogeneous attention/MLP substructure. applicable_phases = [4]. Finer-grained
RG-LRU state hooks are a natural follow-up (the Mamba-2 SSM infrastructure does
not transfer, since RG-LRU is a different recurrence than the SSD/delta-rule
families).
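The sketch below shows, in plain Python, the per-layer heterogeneity that motivates whole-layer wrapping. It assumes block_types is tiled cyclically across the layer stack, which is our reading of how the HF config expands the pattern. WholeLayerWrapper, in_hooks and out_hooks are hypothetical names, not the TransformerBridge or RecurrentGemmaBlockBridge API. Each wrapper treats its layer as opaque and exposes only the residual stream entering and leaving it. From the outside, a recurrent layer and an attention layer therefore look the same.

```python
from typing import Callable, List, Optional

Resid = List[float]
Hook = Callable[[Resid], Optional[Resid]]


def layer_types(block_types: List[str], num_layers: int) -> List[str]:
    """Assign a temporal-block kind to every layer by tiling block_types (assumed cyclic)."""
    return [block_types[i % len(block_types)] for i in range(num_layers)]


class WholeLayerWrapper:
    """Hypothetical whole-layer wrapper: the decoder layer is opaque, and only
    the residual stream entering and leaving it is hookable."""

    def __init__(self, layer: Callable[[Resid], Resid]) -> None:
        self.layer = layer
        self.in_hooks: List[Hook] = []
        self.out_hooks: List[Hook] = []

    @staticmethod
    def _run(hooks: List[Hook], resid: Resid) -> Resid:
        for hook in hooks:
            result = hook(resid)
            if result is not None:  # a hook may replace the activation or only observe it
                resid = result
        return resid

    def __call__(self, resid: Resid) -> Resid:
        resid = self._run(self.in_hooks, resid)
        resid = self.layer(resid)  # recurrent or attention internals are not exposed
        return self._run(self.out_hooks, resid)


# Toy stand-ins for the two temporal-block kinds (illustrative only).
toy_layers = {
    "recurrent": lambda r: [v + 1.0 for v in r],
    "attention": lambda r: [2.0 * v for v in r],
}
types = layer_types(["recurrent", "recurrent", "attention"], 4)
blocks = [WholeLayerWrapper(toy_layers[t]) for t in types]

captured: List[Resid] = []
blocks[2].out_hooks.append(lambda r: captured.append(list(r)))  # observe only

resid: Resid = [0.0]
for block in blocks:
    resid = block(resid)
```

The trade-off is granularity. A single hook point per layer boundary is uniform across layer kinds, but nothing inside the RG-LRU or the attention block is observable until finer-grained hooks are added.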

Gemma-family numerics set on the bridge config: RMSNorm (1.0 + weight)
offset (rmsnorm_uses_offset), gated MLP, final_rms, and the tanh final-logit
soft cap (config.logits_soft_cap = 30.0). ln_final maps to model.final_norm.
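The two Gemma-family numerics settings can be written out directly. This is a hedged sketch that assumes list-of-float inputs and an eps of 1e-6 (an assumed default, not taken from this PR). RMSNorm scales the normalised activations by (1.0 + weight) rather than by weight. The final-logit soft cap computes cap * tanh(logit / cap) with cap = config.logits_soft_cap = 30.0.

```python
import math
from typing import List


def rmsnorm_with_offset(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """Gemma-style RMSNorm: normalise by the root mean square, then scale by (1.0 + weight)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * (1.0 + w) for v, w in zip(x, weight)]


def soft_cap(logits: List[float], cap: float = 30.0) -> List[float]:
    """tanh soft cap: close to the identity for |logit| << cap, never larger than cap in magnitude."""
    return [cap * math.tanh(z / cap) for z in logits]
```

The offset matters when mapping weights. A stored weight of zero means unit scale, so treating it as a plain RMSNorm weight would zero out the activations.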

Registered at the factory (SUPPORTED_ARCHITECTURES) and the
supported_architectures package __init__.

Verification status. Offline structural unit tests pass (11/11). Full
HF-vs-bridge numerical verification (phase 4) still needs to be run against the
gated google/recurrentgemma-2b weights (HF_TOKEN) and is a follow-up.

Fixes #1472

Type of change

New feature (non-breaking change which adds functionality)

Checklist:

I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have not rewritten tests relating to key interfaces which would affect backward compatibility

Notes on my choices:

Deleted the Bug fix / Breaking change / doc-update options under "Type of change" per the template's "delete options that are not relevant" instruction: this is a non-breaking new feature.
Left "corresponding changes to the documentation" unchecked — a new adapter needs no doc edits, and the PR isn't a docs PR (branch name has no "docs", which is correct).
Removed the Screenshots section — N/A for a code adapter.
I checked "new and existing unit tests pass" because the change is additive (one new file + registration lines) and the new structural tests pass; I was transparent in the Description that full numerical verification is the follow-up, so a reviewer isn't misled.

Adds a TransformerBridge adapter for RecurrentGemmaForCausalLM (Griffin): RG-LRU real-gated linear recurrence interleaved with local sliding-window attention (config.block_types, default recurrent/recurrent/attention). Because the temporal_block substructure varies per layer (recurrent vs. attention), the adapter wraps each decoder layer whole with residual-stream hooks only (RecurrentGemmaBlockBridge), mirroring Lfm2MoeArchitectureAdapter — rather than pretending every layer has a homogeneous attention/MLP substructure. applicable_phases=[4] (generation); finer-grained RG-LRU state hooks are a follow-up. Sets the Gemma-family numerics on the bridge config: RMSNorm (1.0 + weight) offset, gated MLP, final_rms, and the tanh final-logit soft cap (config.logits_soft_cap=30.0). ln_final maps to model.final_norm. Registers the adapter at the factory + supported_architectures __init__ sites and adds an offline structural unit test. Closes TransformerLensOrg#1472 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


The RegistrySyncedWithFactory unit tests require every factory architecture to appear in HF_SUPPORTED_ARCHITECTURES and CANONICAL_AUTHORS_BY_ARCH (or INTENTIONAL_EXCLUDES). Add RecurrentGemmaForCausalLM to both (canonical author: google). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
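The invariant these tests enforce can be sketched as a set-membership check. The function name and the container types (sets, plus a dict keyed by architecture name) are assumptions for illustration, and the real test code may be structured differently. We also read "(or INTENTIONAL_EXCLUDES)" as exempting an architecture from both registries.

```python
def unsynced_architectures(factory_archs, hf_supported, canonical_authors, intentional_excludes):
    """Map each factory architecture that is neither excluded nor fully registered
    to the names of the registries it is missing from."""
    problems = {}
    for arch in sorted(factory_archs):
        if arch in intentional_excludes:
            continue  # deliberately left out of the registry
        registries = (
            ("HF_SUPPORTED_ARCHITECTURES", hf_supported),
            ("CANONICAL_AUTHORS_BY_ARCH", canonical_authors),
        )
        missing = [name for name, registry in registries if arch not in registry]
        if missing:
            problems[arch] = missing
    return problems


# Illustrative data only, not the real registry contents.
factory = {"GemmaForCausalLM", "RecurrentGemmaForCausalLM"}
hf_supported = {"GemmaForCausalLM"}
authors = {"GemmaForCausalLM": "google"}
before = unsynced_architectures(factory, hf_supported, authors, set())

# The fix described above: add the new architecture to both registries.
hf_supported.add("RecurrentGemmaForCausalLM")
authors["RecurrentGemmaForCausalLM"] = "google"
after = unsynced_architectures(factory, hf_supported, authors, set())
```

Before the registration, the check flags RecurrentGemmaForCausalLM in both registries. Afterwards it reports nothing.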

jlarson4 · 2026-07-03T15:33:16Z

@pablocs116 please make sure to register RecurrentGemmaForCausalLM in HF_SUPPORTED_ARCHITECTURES and CANONICAL_AUTHORS_BY_ARCH in tools/model_registry/__init__.py.


pablocs116 and others added 3 commits July 3, 2026 15:16

Apply black formatting to RecurrentGemma adapter test

54fe1df

Collapse two test method signatures onto single lines per black (line-length 100). No behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

jlarson4 linked an issue Jul 3, 2026 that may be closed by this pull request

[Proposal] Add RecurrentGemma (Griffin) adapter (RecurrentGemmaForCausalLM) #1472

Open

1 task

ci: re-trigger checks (flaky nbval notebook)

5f431d9

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

pablocs116 wants to merge 4 commits into TransformerLensOrg:dev from pablocs116:feature/recurrent-gemma-architecture-adapter