STS (SAE-based Transferability Score)

This repository contains a PyTorch implementation of the ICLR 2026 paper "SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training" by Qi Zhang*, Yifei Wang*, Xiaohan Wang, Jiajun Chai, Guojun Yin, Wei Lin, and Yisen Wang.

STS is a metric that can predict the transferability of LLMs before training. STS identifies shifted dimensions in SAE representations and calculates their correlations with downstream domains. Extensive experiments across multiple models and domains show that STS accurately predicts the transferability of supervised fine-tuning, achieving Pearson correlation coefficients above 0.7 with actual performance changes.

Instructions

Environment Setup

Install the environment for STS with the following command:

pip install -r requirements.txt

Extracting SAE Features

The core operation of STS is to extract sparse features from an LLM using a trained Sparse Autoencoder (SAE). Below, we provide an example of extracting SAE features on LIMO, demonstrating how to load the SAE, hook intermediate activations, and obtain sparse feature representations for downstream usage.

Extract the SAE features with the following commands:

cd extract_features
VLLM_USE_V1=1 python evaluate2.py
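
The three steps named above (load an SAE, hook intermediate activations, obtain sparse features) can be sketched in plain PyTorch as follows. This is a minimal illustration under stated assumptions, not the code in evaluate2.py: `ToySAE`, `extract_sae_features`, the stand-in model, and all dimensions are hypothetical, and the SAE here is a randomly initialised ReLU encoder rather than a trained checkpoint.

```python
import torch
import torch.nn as nn


class ToySAE(nn.Module):
    """Hypothetical ReLU sparse autoencoder (assumption: not the repository's SAE)."""

    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_sae)
        self.decoder = nn.Linear(d_sae, d_model)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU zeroes out negative pre-activations, so features are non-negative.
        # A trained SAE would additionally be sparse; this random one is not.
        return torch.relu(self.encoder(x))


def extract_sae_features(model: nn.Module, layer: nn.Module,
                         sae: ToySAE, inputs: torch.Tensor) -> torch.Tensor:
    """Run `model` once, capture the output of `layer`, and encode it with `sae`."""
    captured = {}

    def hook(module, args, output):
        # Some transformer blocks return tuples; hidden states usually come first.
        hidden = output[0] if isinstance(output, tuple) else output
        captured["hidden"] = hidden.detach()

    handle = layer.register_forward_hook(hook)
    try:
        with torch.no_grad():
            model(inputs)
    finally:
        # Always remove the hook so later forward passes are unaffected.
        handle.remove()

    with torch.no_grad():
        return sae.encode(captured["hidden"])


# Stand-in for an LLM: a tiny MLP whose middle layer plays the hooked layer.
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 4))
toy_sae = ToySAE(d_model=16, d_sae=64)
features = extract_sae_features(toy_model, toy_model[1], toy_sae, torch.randn(5, 8))
print(features.shape)  # torch.Size([5, 64])
```

With a real LLM, `layer` would be one of the model's transformer blocks and `sae` would be loaded from a trained checkpoint; which layer and which SAE the paper uses is not stated in this README.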

Evaluation of Downstream Performance

We evaluate downstream performance with the official MMLU-Pro evaluation implementation provided at https://github.com/TIGER-AI-Lab/MMLU-Pro.

Calculating STS Metrics

After obtaining the SAE features, open sts.ipynb to calculate STS and the correlation coefficient.
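
For readers who want to check the correlation step outside the notebook, the Pearson coefficient between per-domain STS values and measured performance changes can be computed as in the sketch below. The function name `pearson_r` and the input layout (one STS value and one performance change per domain) are assumptions, and the numbers are placeholders for illustration only, not results from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of centered products and centered squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values (assumption): one entry per downstream domain.
sts_scores = [0.10, 0.25, 0.40, 0.55, 0.80]
performance_changes = [-1.0, 0.5, 1.0, 2.5, 3.0]

r = pearson_r(sts_scores, performance_changes)
print(f"Pearson r = {r:.3f}")
```

If SciPy is installed, `scipy.stats.pearsonr` returns the same coefficient together with a p-value.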

Citing this work

If you find the work useful, please cite the accompanying paper:

@inproceedings{
zhang2026sae,
title={{SAE} as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of {LLM}s without Training},
author={Qi Zhang and Yifei Wang and Xiaohan Wang and Jiajun Chai and Guojun Yin and Wei Lin and Yisen Wang},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
}
