[Proposal] Add SmolLM3 TransformerBridge architecture adapter #1351
Open
Labels
TransformerBridge (Bug specific to the new TransformerBridge system)
complexity-moderate (Moderately complicated issues for people who have intermediate experience with the code)
enhancement (New feature or request)
high-priority (Maintainers are interested in these issues being solved before others)
new-architecture (This card involves adding a new architecture.)
Proposal
Add a TransformerBridge architecture adapter for SmolLM3 (SmolLM3ForCausalLM, the HuggingFaceTB SmolLM3 family).
Motivation
I have been running some local experiments and wanted to use HuggingFaceTB/SmolLM3-3B, but I could not find a bridge adapter for it. It is also listed as unsupported in architecture_gaps.json (relevancy 54.4, 8 models, ~1.1M downloads, with tiny-random checkpoints available for CI). Happy to take it on if no one else is already working on it.
Pitch
Structurally, SmolLM3 uses a Llama-family block (RMSNorm + GQA + SwiGLU + RoPE, no biases) with tied embeddings. Two architectural quirks are worth flagging:
config.no_rope_layers.
sliding_window: null).
Both quirks ride on existing bridge infrastructure, so no new generalized components are needed. Happy to share specifics in the PR.
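To make the per-layer attention details above concrete, here is a minimal pure-Python sketch of GQA head grouping and RoPE gating driven by a no_rope_layers list. It assumes, based on our reading of the Hugging Face SmolLM3 config, that no_rope_layers[i] == 1 means layer i applies RoPE and 0 means it is a NoPE layer, with every 4th layer skipping RoPE by default. The helper names (default_no_rope_layers, kv_head_for, apply_rope, rotate_qk) are hypothetical and not part of TransformerLens or transformers, and the adjacent-pair rotation layout is illustrative only (HF Llama-style code rotates split halves instead).

```python
import math
from typing import List, Sequence, Tuple


def default_no_rope_layers(num_hidden_layers: int, interval: int = 4) -> List[int]:
    # Assumed convention: 1 = layer applies RoPE, 0 = NoPE layer; every
    # `interval`-th layer (counting from 1) skips RoPE.
    return [int((i + 1) % interval != 0) for i in range(num_hidden_layers)]


def kv_head_for(q_head: int, num_heads: int, num_kv_heads: int) -> int:
    # GQA: consecutive groups of num_heads // num_kv_heads query heads share one KV head.
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be divisible by num_kv_heads")
    return q_head // (num_heads // num_kv_heads)


def apply_rope(vec: Sequence[float], position: int, theta: float = 10000.0) -> List[float]:
    # Rotate adjacent pairs (x[2j], x[2j+1]) by angle position * theta**(-2j/d).
    # Illustrative layout only; theta here is a generic default, not SmolLM3's value.
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("head dimension must be even")
    out: List[float] = []
    for i in range(0, d, 2):
        angle = position * theta ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = vec[i], vec[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out


def rotate_qk(
    q: Sequence[float],
    k: Sequence[float],
    position: int,
    layer_idx: int,
    no_rope_layers: Sequence[int],
) -> Tuple[List[float], List[float]]:
    # NoPE layers return q and k unchanged; RoPE layers rotate both.
    if no_rope_layers[layer_idx]:
        return apply_rope(q, position), apply_rope(k, position)
    return list(q), list(k)
```

For example, default_no_rope_layers(8) gives [1, 1, 1, 0, 1, 1, 1, 0], so rotate_qk(q, k, 3, layer_idx=3, no_rope_layers=flags) leaves q and k untouched while layer 0 rotates them. In a bridge adapter, the practical consequence is that any rotary hooks would only fire on the layers flagged 1.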
Rough sketch:
qwen2.py (closest match for the no-bias, tied-embedding shape).
architecture_adapter_factory.py, supported_architectures/__init__.py, tools/model_registry/__init__.py).
transformer_lens.tools.model_registry.verify_models against HuggingFaceTB/SmolLM3-3B plus a tiny-random checkpoint.
If the approach looks right, happy to open the PR shortly.
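As a rough illustration of the "start from qwen2.py, then register" plan, the sketch below shows a hypothetical registry that maps the Hugging Face architectures string to an adapter class, with the SmolLM3 adapter reusing a Qwen2-like base and overriding only the per-layer RoPE flag. None of these class or function names exist in TransformerLens; they stand in for whatever the real adapter base class and factory look like, and the no_rope_layers convention (1 = RoPE on) is an assumption.

```python
from typing import Dict, Type


class AdapterSketch:
    """Hypothetical stand-in for a bridge architecture adapter base class."""

    def __init__(self, hf_config: dict) -> None:
        self.hf_config = hf_config

    def rope_enabled(self, layer_idx: int) -> bool:
        # Llama/Qwen2-style models apply RoPE in every layer.
        return True


class Qwen2LikeAdapterSketch(AdapterSketch):
    """Hypothetical: no-bias, tied-embedding block shape (the qwen2.py-style starting point)."""


class SmolLM3AdapterSketch(Qwen2LikeAdapterSketch):
    """Hypothetical: reuses the Qwen2-like mapping, overriding only the RoPE flag."""

    def rope_enabled(self, layer_idx: int) -> bool:
        # Assumed convention: no_rope_layers[i] == 1 means layer i uses RoPE.
        return bool(self.hf_config["no_rope_layers"][layer_idx])


# Hypothetical registry keyed by the HF `architectures` string; it stands in for
# the entries the proposal would add to the factory and registry modules.
SUPPORTED_ARCHITECTURES: Dict[str, Type[AdapterSketch]] = {
    "Qwen2ForCausalLM": Qwen2LikeAdapterSketch,
    "SmolLM3ForCausalLM": SmolLM3AdapterSketch,
}


def select_adapter(hf_config: dict) -> AdapterSketch:
    # Look up the first declared architecture and fail loudly if it is unknown.
    arch = hf_config["architectures"][0]
    adapter_cls = SUPPORTED_ARCHITECTURES.get(arch)
    if adapter_cls is None:
        raise ValueError(f"Unsupported architecture: {arch}")
    return adapter_cls(hf_config)
```

With a config such as {"architectures": ["SmolLM3ForCausalLM"], "no_rope_layers": [1, 1, 1, 0]}, select_adapter returns the SmolLM3 sketch and rope_enabled(3) is False. Subclassing the Qwen2-like adapter keeps the diff small, which matches the proposal's claim that no new generalized components are needed.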
Checklist