
Loading quantized models with SentenceTransformers #2643

Open
emanjavacas opened this issue May 13, 2024 · 3 comments

@emanjavacas

Dear maintainers,

Does anybody know if loading quantized models is possible with sentence_transformers? I am currently looking at embedding models, and some of them, like Qwen 7B, appear to be SentenceTransformer models: https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct

However, the SentenceTransformer loading code only accepts the model name and doesn't expose the underlying transformers loading logic. For this reason, I can't figure out how to apply quantization (for example with quanto) before loading: https://huggingface.co/docs/transformers/en/quantization

I can't see why this wouldn't be compatible, since SentenceTransformers uses Transformers anyway under the hood.

Any hints appreciated!

@tomaarsen
Collaborator

Hello!

We're currently investigating the best approach to add this support in #2578. In particular, that PR will expose some parameters (model_kwargs, config_kwargs, tokenizer_kwargs) on the SentenceTransformer class to allow e.g. easy quantization. Until then, the best workaround is to load the Transformer module directly and use its existing model_args parameter. However, I'm not 100% sure that will work, as the quantization_config also ends up being passed to the AutoConfig.

Something like:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Transformer, Pooling, Normalize

# Load the underlying transformers model directly, forwarding the
# quantization settings through model_args.
transformer = Transformer(
    "Alibaba-NLP/gte-Qwen1.5-7B-instruct",
    model_args={"trust_remote_code": True, "quantization_config": ...},
)
# This model pools via the last token and normalizes its embeddings.
pooling = Pooling(transformer.get_word_embedding_dimension(), pooling_mode="lasttoken")
normalize = Normalize()
model = SentenceTransformer(modules=[transformer, pooling, normalize])
```

(Untested!)

  • Tom Aarsen

@emanjavacas
Author

Well, there's only one way to find out. I'll give it a try and report back.
Thanks!

@emanjavacas
Author

I was able to run the model with this approach, which is cool. Thanks a lot for the hints!
