
Add RecurrentGemma (Griffin) architecture adapter #1483

Open
pablocs116 wants to merge 4 commits into TransformerLensOrg:dev from pablocs116:feature/recurrent-gemma-architecture-adapter

@pablocs116

Description

Adds a TransformerBridge architecture adapter for RecurrentGemmaForCausalLM
(Google/DeepMind RecurrentGemma), the open instance of the Griffin
architecture: RG-LRU real-gated linear recurrence interleaved with local
sliding-window attention (config.block_types, default
recurrent/recurrent/attention).
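To make "real-gated linear recurrence" concrete, here is a minimal NumPy sketch of the RG-LRU update as written in the Griffin paper: a recurrence gate r_t and an input gate i_t, a per-channel decay a_t = sigmoid(Lambda)^(c * r_t) with c = 8, and the diagonal update h_t = a_t * h_{t-1} + sqrt(1 - a_t^2) * (i_t * x_t). Assumptions: dense gate matrices `W_a`/`W_x` for simplicity (the released model may structure them differently), a plain sequential loop, and no handling of sequence resets or log-space decay, which production implementations typically add. The function names are illustrative, not the Hugging Face module's API.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def rg_lru(x, W_a, b_a, W_x, b_x, lam, c: float = 8.0) -> np.ndarray:
    """Sequential RG-LRU over x of shape (T, D); returns hidden states of shape (T, D)."""
    a_base = sigmoid(lam)                 # per-channel base decay a = sigmoid(Lambda), in (0, 1)
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        r_t = sigmoid(x[t] @ W_a + b_a)   # recurrence gate
        i_t = sigmoid(x[t] @ W_x + b_x)   # input gate
        a_t = a_base ** (c * r_t)         # gated per-channel decay, still in (0, 1)
        # Diagonal (elementwise) update; sqrt(1 - a_t^2) down-weights the input as a_t -> 1.
        h = a_t * h + np.sqrt(1.0 - a_t ** 2) * (i_t * x[t])
        out[t] = h
    return out


rng = np.random.default_rng(0)
T, D = 5, 4
x = rng.standard_normal((T, D))
h = rg_lru(
    x,
    W_a=rng.standard_normal((D, D)), b_a=np.zeros(D),
    W_x=rng.standard_normal((D, D)), b_x=np.zeros(D),
    lam=rng.standard_normal(D),
)
print(h.shape)  # (5, 4)
```

Because the state update is elementwise with input-dependent decay, it is a different object from Mamba-2's SSD formulation, which is why SSM-specific hook infrastructure would not carry over directly.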

Design. Each decoder layer's temporal_block is either a
RecurrentGemmaRecurrentBlock (RG-LRU: linear_x/linear_y/conv_1d/rg_lru/
linear_out) or a RecurrentGemmaSdpaAttention (q/k/v/o + rotary). Because the
temporal substructure varies per layer, the adapter wraps each decoder layer
whole with residual-stream hooks only (RecurrentGemmaBlockBridge), mirroring
Lfm2MoeArchitectureAdapter — rather than pretending every layer has a
homogeneous attention/MLP substructure. applicable_phases = [4]. Finer-grained
RG-LRU state hooks are a natural follow-up (the Mamba-2 SSM infrastructure does
not transfer, since RG-LRU is a different recurrence than the SSD/delta-rule
families).
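A toy sketch of the "wrap the whole layer, hook only the residual stream" design: each decoder layer is treated as an opaque callable, and only its input and output pass through hook points, so recurrent and attention layers look identical from the outside. Everything here (`Hook`, `WholeLayerBridge`, `layer_types`, the `blocks.{i}.hook_resid_*` names) is hypothetical and is not the TransformerBridge API or the PR's `RecurrentGemmaBlockBridge`; it also assumes the `block_types` pattern repeats cyclically over the layers.

```python
from typing import Callable, Dict, List, Sequence


class Hook:
    """Toy hook point: threads a value through registered callbacks."""

    def __init__(self, name: str):
        self.name = name
        self.fns: List[Callable] = []

    def __call__(self, value):
        for fn in self.fns:
            value = fn(value, self.name)
        return value


class WholeLayerBridge:
    """Toy wrapper: hooks the residual stream around an opaque decoder layer."""

    def __init__(self, layer: Callable, index: int):
        self.layer = layer
        self.hook_resid_pre = Hook(f"blocks.{index}.hook_resid_pre")
        self.hook_resid_post = Hook(f"blocks.{index}.hook_resid_post")

    def __call__(self, resid):
        resid = self.hook_resid_pre(resid)
        # Temporal block (RG-LRU or local attention) and MLP stay hidden inside `layer`.
        return self.hook_resid_post(self.layer(resid))


def layer_types(block_types: Sequence[str], n_layers: int) -> List[str]:
    """Repeat the block_types pattern cyclically over n_layers."""
    return [block_types[i % len(block_types)] for i in range(n_layers)]


# Toy usage: a scalar "residual stream" and two kinds of stand-in layers.
types = layer_types(["recurrent", "recurrent", "attention"], 6)
toy_layers = {"recurrent": lambda r: r + 1.0, "attention": lambda r: r * 2.0}
blocks = [WholeLayerBridge(toy_layers[t], i) for i, t in enumerate(types)]

cache: Dict[str, float] = {}


def save(value, name):
    cache[name] = value
    return value


for block in blocks:
    block.hook_resid_post.fns.append(save)

out = 0.0
for block in blocks:
    out = block(out)
print(out)  # 12.0
```

The trade-off is visible in the sketch: caching and patching work uniformly at layer boundaries, but nothing inside a layer (RG-LRU state, attention pattern) is exposed until finer-grained hooks are added.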

Gemma-family numerics set on the bridge config: RMSNorm (1.0 + weight)
offset (rmsnorm_uses_offset), gated MLP, final_rms, and the tanh final-logit
soft cap (config.logits_soft_cap = 30.0). ln_final maps to model.final_norm.
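The two Gemma-family numerics above can be written out in a few lines. This NumPy sketch shows an RMSNorm whose learned scale is applied as `(1.0 + weight)`, so a zero-initialized weight gives identity scaling, and a tanh soft cap `cap * tanh(logits / cap)` with `cap = 30.0`. The function names and `eps = 1e-6` are assumptions for illustration, not the adapter's code.

```python
import numpy as np


def gemma_rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm whose learned scale is applied as (1.0 + weight)."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return (x / rms) * (1.0 + weight)


def soft_cap(logits: np.ndarray, cap: float = 30.0) -> np.ndarray:
    """tanh soft cap: close to identity for |logits| << cap, bounded by cap in magnitude."""
    return cap * np.tanh(logits / cap)


x = np.array([[3.0, -4.0]])
print(gemma_rms_norm(x, np.zeros(2)))  # zero weight behaves like plain RMSNorm (scale 1)
print(soft_cap(np.array([0.5, 100.0])))  # small logits barely move; large ones saturate near 30
```

Getting the offset wrong (using `weight` instead of `1.0 + weight`) would shrink every normalized activation toward zero, which is why the flag has to be set on the bridge config rather than inferred.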

Registered at the factory (SUPPORTED_ARCHITECTURES) and the
supported_architectures package __init__.
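Registration amounts to making the HF architecture string resolvable to the new adapter class. A hypothetical sketch of that lookup, assuming `SUPPORTED_ARCHITECTURES` is a mapping from the first entry of an HF config's `architectures` list to an adapter class; `ArchitectureAdapter`, `ToyRecurrentGemmaAdapter`, and `select_adapter` are illustrative stand-ins, not the factory's real classes or functions.

```python
from typing import Dict, Type


class ArchitectureAdapter:
    """Hypothetical stand-in for an adapter base class."""

    def __init__(self, hf_config: dict):
        self.hf_config = hf_config


class ToyRecurrentGemmaAdapter(ArchitectureAdapter):
    """Hypothetical placeholder for the PR's RecurrentGemma adapter."""


# Assumed shape: HF architecture string -> adapter class.
SUPPORTED_ARCHITECTURES: Dict[str, Type[ArchitectureAdapter]] = {
    "RecurrentGemmaForCausalLM": ToyRecurrentGemmaAdapter,
}


def select_adapter(hf_config: dict) -> ArchitectureAdapter:
    """Pick an adapter from the first entry of the config's `architectures` list."""
    arch = hf_config["architectures"][0]
    if arch not in SUPPORTED_ARCHITECTURES:
        raise ValueError(f"Unsupported architecture: {arch}")
    return SUPPORTED_ARCHITECTURES[arch](hf_config)


adapter = select_adapter({"architectures": ["RecurrentGemmaForCausalLM"]})
print(type(adapter).__name__)  # ToyRecurrentGemmaAdapter
```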

Verification status. Offline structural unit tests pass (11/11). Full
HF-vs-bridge numerical verification (phase 4) still needs to be run against the
gated google/recurrentgemma-2b weights (HF_TOKEN) and is a follow-up.

Fixes #1472

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

Notes on my choices:

  • Deleted the Bug fix / Breaking change / doc-update options under "Type of change" per the template's "delete options that are not relevant" instruction; this is a non-breaking new feature.
  • Left "corresponding changes to the documentation" unchecked — a new adapter needs no doc edits, and the PR isn't a docs PR (branch name has no "docs", which is correct).
  • Removed the Screenshots section — N/A for a code adapter.
  • I checked "new and existing unit tests pass" because the change is additive (one new file + registration lines) and the new structural tests pass; I was transparent in the Description that full numerical verification is the follow-up, so a reviewer isn't misled.

pablocs116 and others added 3 commits July 3, 2026 15:16
Adds a TransformerBridge adapter for RecurrentGemmaForCausalLM (Griffin):
RG-LRU real-gated linear recurrence interleaved with local sliding-window
attention (config.block_types, default recurrent/recurrent/attention).

Because the temporal_block substructure varies per layer (recurrent vs.
attention), the adapter wraps each decoder layer whole with residual-stream
hooks only (RecurrentGemmaBlockBridge), mirroring Lfm2MoeArchitectureAdapter —
rather than pretending every layer has a homogeneous attention/MLP
substructure. applicable_phases=[4] (generation); finer-grained RG-LRU state
hooks are a follow-up.

Sets the Gemma-family numerics on the bridge config: RMSNorm (1.0 + weight)
offset, gated MLP, final_rms, and the tanh final-logit soft cap
(config.logits_soft_cap=30.0). ln_final maps to model.final_norm.

Registers the adapter at the factory + supported_architectures __init__ sites
and adds an offline structural unit test.

Closes TransformerLensOrg#1472

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Collapse two test method signatures onto single lines per black
(line-length 100). No behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The RegistrySyncedWithFactory unit tests require every factory
architecture to appear in HF_SUPPORTED_ARCHITECTURES and
CANONICAL_AUTHORS_BY_ARCH (or INTENTIONAL_EXCLUDES). Add
RecurrentGemmaForCausalLM to both (canonical author: google).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@jlarson4

jlarson4 commented Jul 3, 2026

Collaborator

@pablocs116 please make sure to register RecurrentGemmaForCausalLM in HF_SUPPORTED_ARCHITECTURES and CANONICAL_AUTHORS_BY_ARCH in tools/model_registry/__init__.py.
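The invariant behind this request (and behind the RegistrySyncedWithFactory tests mentioned in the third commit) can be sketched as a set difference: every factory architecture not in `INTENTIONAL_EXCLUDES` must appear in both registries. `registry_gaps` is a hypothetical helper and the architecture lists below are toy data; the real test's structure may differ.

```python
from typing import Dict, Iterable, Set


def registry_gaps(
    factory_archs: Iterable[str],
    hf_supported: Set[str],
    canonical_authors: Dict[str, str],
    intentional_excludes: Set[str],
) -> Dict[str, Set[str]]:
    """Return factory architectures missing from each registry (excludes are exempt)."""
    required = set(factory_archs) - intentional_excludes
    return {
        "HF_SUPPORTED_ARCHITECTURES": required - hf_supported,
        "CANONICAL_AUTHORS_BY_ARCH": required - set(canonical_authors),
    }


factory = ["GemmaForCausalLM", "RecurrentGemmaForCausalLM"]
before = registry_gaps(factory, {"GemmaForCausalLM"}, {"GemmaForCausalLM": "google"}, set())
print(before)  # both registries are missing RecurrentGemmaForCausalLM

after = registry_gaps(
    factory,
    {"GemmaForCausalLM", "RecurrentGemmaForCausalLM"},
    {"GemmaForCausalLM": "google", "RecurrentGemmaForCausalLM": "google"},
    set(),
)
print(after)  # no gaps once both entries are added
```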

jlarson4 linked an issue Jul 3, 2026 that may be closed by this pull request
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

[Proposal] Add RecurrentGemma (Griffin) adapter (RecurrentGemmaForCausalLM)
