
Loading quantized models with SentenceTransformers #2643

Open
emanjavacas opened this issue May 13, 2024 · 3 comments

@emanjavacas

Dear maintainers,

Does anybody know if loading quantized models is possible with sentence_transformers? I am currently looking at embedding models, and some of them, like Qwen 7B, appear to be SentenceTransformer models: https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct

However, the SentenceTransformer loading code only accepts the model name and doesn't expose the underlying transformers loading logic. For this reason, I can't figure out how to apply quantization (for example with quanto) before loading: https://huggingface.co/docs/transformers/en/quantization

I can't see why this wouldn't be compatible, since SentenceTransformers uses Transformers anyway under the hood.

Any hints appreciated!

@tomaarsen
Collaborator

Hello!

We're currently investigating the best approach to add this support in #2578. In particular, that PR will expose some parameters (model_kwargs, config_kwargs, tokenizer_kwargs) on the SentenceTransformer class to allow e.g. easy quantization. Until then, the best workaround is to load the Transformer module directly and use its existing model_args parameter. However, I'm not 100% sure that will work, as the quantization_config also ends up being passed to the AutoConfig.

Something like:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Transformer, Pooling, Normalize

# Load the underlying transformers model directly, forwarding the
# quantization settings through model_args.
transformer = Transformer(
    "Alibaba-NLP/gte-Qwen1.5-7B-instruct",
    model_args={"trust_remote_code": True, "quantization_config": ...},
)
# This model pools via the last token and normalizes its embeddings.
pooling = Pooling(transformer.get_word_embedding_dimension(), pooling_mode="lasttoken")
normalize = Normalize()
model = SentenceTransformer(modules=[transformer, pooling, normalize])
```

(Untested!)

  • Tom Aarsen

@emanjavacas
Author

Well, there's only one way to find out. I'll give it a try and report back.
Thanks!

@emanjavacas
Author

I was able to run the model with this approach, which is cool. Thanks a lot for the hints!
