
# STS (SAE-based Transferability Score)

This repository contains the official PyTorch implementation of the ICLR 2026 paper *SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training* by Qi Zhang\*, Yifei Wang\*, Xiaohan Wang, Jiajun Chai, Guojun Yin, Wei Lin, and Yisen Wang.

STS is a metric that predicts the transferability of LLMs before any training takes place. It identifies shifted dimensions in SAE representations and computes their correlations with downstream domains. Extensive experiments across multiple models and domains show that STS accurately predicts the transferability of supervised fine-tuning, achieving Pearson correlation coefficients above 0.7 with actual performance changes.


## Instructions

### Environment Setup

Install the environment for STS with the following command:

```shell
pip install -r requirements.txt
```

### Extracting SAE Features

The core operation of STS is to extract sparse features from an LLM using a trained Sparse Autoencoder (SAE). Below, we provide an example of extracting SAE features on LIMO, demonstrating how to load the SAE, hook intermediate activations, and obtain sparse feature representations for downstream usage.

Extract SAE features with the following commands:

```shell
cd extract_features
VLLM_USE_V1=1 python evaluate2.py
```
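The extraction pipeline above (load an SAE, hook intermediate activations, encode them into sparse features) can be sketched in a few lines of PyTorch. This is a minimal, hypothetical illustration with a toy one-layer "model" and a single-layer ReLU SAE; `SparseAutoencoder`, `d_model`, and `d_sae` are made-up names, not the repository's actual API:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy SAE: one encoder layer with ReLU, yielding sparse activations."""
    def __init__(self, d_model, d_sae):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_sae)

    def encode(self, x):
        return torch.relu(self.encoder(x))  # sparse feature activations

d_model, d_sae = 16, 64
llm_layer = nn.Linear(d_model, d_model)   # stand-in for one LLM layer
sae = SparseAutoencoder(d_model, d_sae)

captured = {}
def hook(module, inputs, output):
    # Save the layer's output so the SAE can encode it afterwards.
    captured["act"] = output.detach()

handle = llm_layer.register_forward_hook(hook)
llm_layer(torch.randn(4, d_model))        # forward pass fills `captured`
handle.remove()

features = sae.encode(captured["act"])    # shape: (4, d_sae), mostly zeros
print(features.shape)
```

In practice the hook is registered on a residual-stream or MLP layer of the LLM, and the SAE weights are loaded from a pretrained checkpoint rather than initialized randomly.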

### Evaluation of Downstream Performance

We use the official evaluation implementation from [MMLU-Pro](https://github.com/TIGER-AI-Lab/MMLU-Pro).

### Calculating STS Metrics

After obtaining SAE features, open `sts.ipynb` to calculate STS and the correlation coefficient.
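The correlation step can be illustrated in a few lines: given STS scores for several domains and the corresponding post-fine-tuning performance changes, the Pearson coefficient measures how well the metric predicts the outcome. The numbers below are made-up placeholders, not values from the paper:

```python
import numpy as np

# Hypothetical STS scores for five target domains
sts_scores = np.array([0.12, 0.35, 0.08, 0.51, 0.27])
# Hypothetical observed performance changes after supervised fine-tuning
perf_delta = np.array([1.1, 3.0, 0.4, 4.8, 2.2])

# Pearson correlation coefficient between prediction and outcome
r = np.corrcoef(sts_scores, perf_delta)[0, 1]
print(f"Pearson r = {r:.3f}")
```

A coefficient above 0.7, as reported in the paper, indicates that ranking domains by STS closely tracks their actual fine-tuning gains.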

## Citing this work

If you find this work useful, please cite the accompanying paper:

```bibtex
@inproceedings{
zhang2026sae,
title={{SAE} as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of {LLM}s without Training},
author={Qi Zhang and Yifei Wang and Xiaohan Wang and Jiajun Chai and Guojun Yin and Wei Lin and Yisen Wang},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
}
```
