Official Implementation for paper "Your Embedding Matrix is secretly a Feature Lens for Text Embeddings". This repository introduces a simple, lightweight linear filter designed to refine zero-shot text embeddings.
- recommend: python 3.10, torch==2.6.0, mteb==1.4.0, transformers==4.52.3
- if fail to load
SickrSTS, change pathMMathematica/sickr-stsin the mteb package (*/mteb/tasks/STS/en/SickrSTS.py) tomteb/sickr-sts - if fail to load
MindSmallReranking, try datasets==2.18 bypip install datasets==2.18
- run the EmbFilter with
python run4qwen_prompteol.py --filter_ratio 2 filter_ratiois the ratio of dims to be saved, e.g.,filter_ratio=1means saving 1/1=100% dims,filter_ratio=2means saving 1/2=50% dims, and so on.
This paper has informed us a new design for LLM text embedding training, which stays tuned for the release! If you find this code useful useful for your research, please cite our paper.
@misc{wu2026unembeddingmatrixsecretlyfeature,
title={Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings},
author={Songhao Wu and Zhongxin Chen and Yuxuan Liu and Heng Cui and Cong Li and Rui Yan},
year={2026},
eprint={2606.07502},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2606.07502},
}