Ollama + sentence-transformers with torch cuda #4453

Open
qsdhj opened this issue May 15, 2024 · 0 comments
Labels
bug Something isn't working

Comments

qsdhj commented May 15, 2024

What is the issue?

Hi,

I use Ollama together with the intfloat/multilingual-e5-base sentence-transformer via LangChain and LlamaIndex in Python.

If I use the torch build without CUDA, everything works as expected; my embeddings are just created slowly.
The problem appears with the CUDA build of torch installed (the install command was shown in an attached screenshot).
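For context, a typical command for installing a CUDA build of torch looks like the following (the exact command and CUDA version in the screenshot are unknown; cu121 here is an assumption):

```
pip install torch --index-url https://download.pytorch.org/whl/cu121
```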

As soon as I load the sentence-transformer in my Python script, the weird behaviour starts.
The first prompt to a model in Ollama works normally (it takes a few seconds). From the second prompt onwards, my GPU is at 100% load for a few minutes before I get the response from the LLM.

This happens both with the LlamaIndex / LangChain APIs in Python and with the CLI.
If I terminate my Python script and restart Ollama, it works normally again.
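A minimal sketch of the setup described above (a hypothetical reproduction, not the exact script from the issue; the llama3 model name and the use of the ollama Python client are assumptions):

```python
# Hypothetical reproduction of the reported setup.
import ollama  # assumed client: pip install ollama
from sentence_transformers import SentenceTransformer

# Loading the embedding model on the GPU is what appears to trigger the issue
# when the CUDA build of torch is installed.
model = SentenceTransformer("intfloat/multilingual-e5-base", device="cuda")
embeddings = model.encode(["passage: some text to embed"])

# Reported behaviour: the first prompt responds in seconds; from the second
# prompt onwards the GPU sits at 100% load for minutes before responding.
for i in range(3):
    response = ollama.chat(
        model="llama3",  # assumed model; the issue does not name one
        messages=[{"role": "user", "content": f"prompt number {i}"}],
    )
    print(response["message"]["content"])
```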

I use a laptop with Windows 11:
11th Gen Intel(R) Core(TM) i7-11850H @ 2.50 GHz
32 GB RAM
NVIDIA RTX A3000 6GB

OS: Windows

GPU: Nvidia

CPU: Intel

Ollama version: 0.1.37

qsdhj added the bug (Something isn't working) label on May 15, 2024