
Add Apertus model support with xIELU activation #1197

Merged
jlarson4 merged 13 commits into TransformerLensOrg:dev from sinievanderben:add-apertus-support
Mar 18, 2026

Conversation

@sinievanderben

@sinievanderben sinievanderben commented Mar 11, 2026

Description


Fixes # (issue)

Type of change


  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

Comments

This PR adds support for the xIELU activation and makes the Apertus-to-TransformerLens weight converter aware of its trainable parameters.

Key changes:

This PR adds support for the Apertus model.

New trainable xIELU activation.

  • XIELU class in utils.py with learnable α₊, α₋, β.
  • Original xielu function kept for static use.
  • Added "xielu" to ACTIVATION_FN_DICT.
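As a minimal sketch of what the activation computes (this follows the commonly published xIELU formulation, not necessarily the exact TransformerLens implementation; the default parameter values below are illustrative placeholders, while the real `XIELU` class stores `alpha_p`, `alpha_n`, and `beta` as learnable tensors):

```python
import math

def xielu(x: float, alpha_p: float = 0.8, alpha_n: float = 0.8,
          beta: float = 0.5) -> float:
    """Scalar sketch of the xIELU activation.

    Positive inputs get a quadratic term; negative inputs get an
    ELU-style exponential term. Parameter defaults here are
    illustrative, not trained values.
    """
    if x > 0:
        return alpha_p * x * x + beta * x
    # expm1(x) = exp(x) - 1, numerically stable near zero
    return alpha_n * (math.expm1(x) - x) + beta * x
```

Both branches agree at zero, so the function is continuous there regardless of the parameter values.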

Activation wiring

  • can_be_used_as_mlp.select_activation_function() now instantiates XIELU when cfg.act_fn == "xielu".
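The wiring step amounts to a small dispatch: when the config's `act_fn` string is `"xielu"`, the stateful class is instantiated instead of a plain function being looked up. A hypothetical sketch (the function and registry names mirror the PR description; the actual TransformerLens signatures may differ):

```python
from typing import Callable

def relu(x: float) -> float:
    """Illustrative stand-in for a stateless registry entry."""
    return max(0.0, x)

class XIELU:
    """Stateful activation holding its trainable parameters."""
    def __init__(self, alpha_p: float = 0.8, alpha_n: float = 0.8,
                 beta: float = 0.5) -> None:
        self.alpha_p, self.alpha_n, self.beta = alpha_p, alpha_n, beta

    def __call__(self, x):
        ...  # body omitted in this sketch

ACTIVATION_FN_DICT: dict[str, Callable] = {"relu": relu}

def select_activation_function(act_fn: str):
    # Stateful activations are instantiated; plain ones are looked up.
    if act_fn == "xielu":
        return XIELU()
    return ACTIVATION_FN_DICT[act_fn]
```

Instantiating per call matters because each Apertus layer carries its own learned activation parameters, whereas stateless activations can be shared.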

Apertus weight converter improvements

  • Extracts trainable activation parameters (alpha_p, alpha_n, beta).
  • Handles different attribute locations (mlp.act_fn, mlp.act, or top‑level attrs).
  • Falls back to default constants if missing (back‑compatible with old checkpoints).
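The attribute-location fallback can be sketched as a chain of `getattr` probes (a hypothetical illustration: the attribute names `alpha_p`, `alpha_n`, and `beta` follow the PR description, and the default constants shown are placeholders, not the converter's actual values):

```python
from types import SimpleNamespace

# Placeholder fallback constants for checkpoints that predate the
# trainable activation parameters.
DEFAULTS = {"alpha_p": 0.8, "alpha_n": 0.8, "beta": 0.5}

def extract_xielu_params(mlp) -> dict:
    """Probe mlp.act_fn, then mlp.act, then the mlp module itself
    for the activation's trainable parameters; fall back to the
    defaults for old checkpoints."""
    for holder in (getattr(mlp, "act_fn", None),
                   getattr(mlp, "act", None),
                   mlp):
        if holder is not None and hasattr(holder, "alpha_p"):
            return {name: getattr(holder, name) for name in DEFAULTS}
    return dict(DEFAULTS)
```

For example, a module with `mlp.act_fn.alpha_p` set yields the stored values, while a bare legacy module yields the defaults.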

@jlarson4 jlarson4 changed the base branch from main to dev March 16, 2026 15:51
@jlarson4
Collaborator

Hello @sinievanderben!

In order to get this PR merged and included in a future release there are a couple things I will need from you:

  1. Please make sure your code passes all CI checks. MyPy is currently flagging your changes.
  2. Please add tests for the new XIELU feature.

Thank you!

@sinievanderben
Author

Sorry, I am not 100% certain, but the error message from the failure looks like something I did not introduce? I could still be wrong!

@jlarson4
Collaborator

I believe you are correct, I will look into it!

@jlarson4 jlarson4 merged commit e15d32d into TransformerLensOrg:dev Mar 18, 2026
25 of 26 checks passed
@jlarson4 jlarson4 mentioned this pull request Mar 19, 2026
8 tasks
