Add RecurrentGemma (Griffin) architecture adapter #1483
Open
pablocs116 wants to merge 4 commits into
Adds a TransformerBridge adapter for RecurrentGemmaForCausalLM (Griffin): RG-LRU real-gated linear recurrence interleaved with local sliding-window attention (config.block_types, default recurrent/recurrent/attention).

Because the temporal_block substructure varies per layer (recurrent vs. attention), the adapter wraps each decoder layer whole with residual-stream hooks only (RecurrentGemmaBlockBridge), mirroring Lfm2MoeArchitectureAdapter, rather than pretending every layer has a homogeneous attention/MLP substructure. applicable_phases=[4] (generation); finer-grained RG-LRU state hooks are a follow-up.

Sets the Gemma-family numerics on the bridge config: RMSNorm (1.0 + weight) offset, gated MLP, final_rms, and the tanh final-logit soft cap (config.logits_soft_cap=30.0). ln_final maps to model.final_norm.

Registers the adapter at the factory + supported_architectures __init__ sites and adds an offline structural unit test.

Closes TransformerLensOrg#1472
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Collapse two test method signatures onto single lines per black (line-length 100). No behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The RegistrySyncedWithFactory unit tests require every factory architecture to appear in HF_SUPPORTED_ARCHITECTURES and CANONICAL_AUTHORS_BY_ARCH (or INTENTIONAL_EXCLUDES). Add RecurrentGemmaForCausalLM to both (canonical author: google). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
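A minimal sketch of the invariant this commit message describes, for readers who have not seen the RegistrySyncedWithFactory tests. The registry contents below are stand-ins, the adapter class names are placeholders, and `missing_from_registry` is a hypothetical helper rather than the project's test code. It also assumes that an architecture listed in INTENTIONAL_EXCLUDES is exempt from both registries.

```python
# Stand-in registries for illustration only; the real ones live in the project.
SUPPORTED_ARCHITECTURES = {  # factory: architecture name -> adapter (placeholder names)
    "GemmaForCausalLM": "GemmaAdapterPlaceholder",
    "RecurrentGemmaForCausalLM": "RecurrentGemmaAdapterPlaceholder",
}
HF_SUPPORTED_ARCHITECTURES = {"GemmaForCausalLM", "RecurrentGemmaForCausalLM"}
CANONICAL_AUTHORS_BY_ARCH = {
    "GemmaForCausalLM": "google",
    "RecurrentGemmaForCausalLM": "google",
}
INTENTIONAL_EXCLUDES = set()


def missing_from_registry(factory, hf_supported, authors, excludes):
    """Return factory architectures missing from either registry and not excluded."""
    missing = []
    for arch in sorted(factory):
        if arch in excludes:
            continue  # assumed: an intentional exclude is exempt from both registries
        if arch not in hf_supported or arch not in authors:
            missing.append(arch)
    return missing


# With both entries present, nothing is reported as missing.
in_sync = missing_from_registry(
    SUPPORTED_ARCHITECTURES,
    HF_SUPPORTED_ARCHITECTURES,
    CANONICAL_AUTHORS_BY_ARCH,
    INTENTIONAL_EXCLUDES,
)
```

Dropping the new architecture from either registry makes the helper report it, which is the failure mode the commit addresses.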
Collaborator
@pablocs116 please make sure to register RecurrentGemmaForCausalLM in |
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Description
Adds a TransformerBridge architecture adapter for RecurrentGemmaForCausalLM (Google/DeepMind RecurrentGemma), the open instance of the Griffin architecture: RG-LRU real-gated linear recurrence interleaved with local sliding-window attention (config.block_types, default recurrent/recurrent/attention).

Design. Each decoder layer's temporal_block is either a RecurrentGemmaRecurrentBlock (RG-LRU: linear_x/linear_y/conv_1d/rg_lru/linear_out) or a RecurrentGemmaSdpaAttention (q/k/v/o + rotary). Because the temporal substructure varies per layer, the adapter wraps each decoder layer whole with residual-stream hooks only (RecurrentGemmaBlockBridge), mirroring Lfm2MoeArchitectureAdapter, rather than pretending every layer has a homogeneous attention/MLP substructure. applicable_phases = [4]. Finer-grained RG-LRU state hooks are a natural follow-up (the Mamba-2 SSM infrastructure does not transfer, since RG-LRU is a different recurrence than the SSD/delta-rule families).
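To make the design choice above concrete, here is a toy sketch of wrapping heterogeneous layers whole and hooking only the residual stream. Everything in it is illustrative: `layer_block_types`, `WholeLayerWrapper`, `pre_hook`, and `post_hook` are hypothetical names and not the RecurrentGemmaBlockBridge API, the toy layers stand in for real decoder layers, and the sketch assumes the block_types pattern repeats cyclically over the layer stack.

```python
from typing import Callable, List, Optional, Sequence

Resid = List[float]


def layer_block_types(block_types: Sequence[str], num_layers: int) -> List[str]:
    """Repeat the block_types pattern cyclically over num_layers (assumed layout)."""
    return [block_types[i % len(block_types)] for i in range(num_layers)]


class WholeLayerWrapper:
    """Hypothetical wrapper: hooks the residual stream before and after a layer.

    It never looks inside the layer, so it works the same for recurrent and
    attention layers.
    """

    def __init__(self, layer: Callable[[Resid], Resid]) -> None:
        self.layer = layer
        self.pre_hook: Optional[Callable[[Resid], Resid]] = None
        self.post_hook: Optional[Callable[[Resid], Resid]] = None

    def __call__(self, resid: Resid) -> Resid:
        if self.pre_hook is not None:
            resid = self.pre_hook(resid)
        out = self.layer(resid)
        if self.post_hook is not None:
            out = self.post_hook(out)
        return out


def toy_recurrent(resid: Resid) -> Resid:
    """Toy stand-in for a layer whose temporal block is recurrent."""
    return [x + 1.0 for x in resid]


def toy_attention(resid: Resid) -> Resid:
    """Toy stand-in for a layer whose temporal block is attention."""
    return [x * 2.0 for x in resid]


kinds = layer_block_types(("recurrent", "recurrent", "attention"), 6)
layers = [
    WholeLayerWrapper(toy_recurrent if kind == "recurrent" else toy_attention)
    for kind in kinds
]

captured: List[Resid] = []


def record(resid: Resid) -> Resid:
    """Store a copy of the residual stream and pass it through unchanged."""
    captured.append(list(resid))
    return resid


for wrapper in layers:
    wrapper.post_hook = record

resid: Resid = [0.0]
for wrapper in layers:
    resid = wrapper(resid)
```

The same `record` hook attaches to every layer regardless of its type, which is the property the whole-layer approach buys; the cost is that nothing inside the temporal block (for example the RG-LRU state) is observable from these hooks.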
Gemma-family numerics set on the bridge config: RMSNorm (1.0 + weight) offset (rmsnorm_uses_offset), gated MLP, final_rms, and the tanh final-logit soft cap (config.logits_soft_cap = 30.0). ln_final maps to model.final_norm.

Registered at the factory (SUPPORTED_ARCHITECTURES) and the supported_architectures package __init__.

Verification status. Offline structural unit tests pass (11/11). Full HF-vs-bridge numerical verification (phase 4) still needs to be run against the gated google/recurrentgemma-2b weights (HF_TOKEN) and is a follow-up.

Fixes #1472
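For reference, two of the Gemma-family numerics named in the description can be sketched in plain Python. This is an illustration of the arithmetic only, not the bridge's implementation: the function names are hypothetical and the `eps` value is an assumption. The offset RMSNorm scales the normalized input by (1.0 + weight), and the soft cap squashes final logits into (-cap, cap) with tanh.

```python
import math
from typing import List


def rmsnorm_with_offset(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """RMSNorm whose learned scale is applied as (1.0 + weight).

    eps is an assumed value for this sketch.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * (1.0 + w) for v, w in zip(x, weight)]


def soft_cap(logits: List[float], cap: float = 30.0) -> List[float]:
    """Tanh soft cap: cap * tanh(logit / cap), bounded by +/- cap."""
    return [cap * math.tanh(z / cap) for z in logits]


# A zero weight leaves the normalized activations unscaled because of the offset.
normed = rmsnorm_with_offset([3.0, 4.0], [0.0, 0.0])

# Small logits pass through almost unchanged; large ones saturate near the cap.
capped = soft_cap([1.0, 1000.0, -1000.0])
```

With the offset convention a zero-initialized weight behaves like an identity scale, so a bridge that treated the weight as a plain multiplier would zero the activations; that is why the flag matters for matching the HF outputs.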
Type of change
Checklist:
Notes on my choices: