Ollama + sentence-transformers with torch cuda #4453

Open
qsdhj opened this issue May 15, 2024 · 0 comments
Labels
bug Something isn't working

Comments

qsdhj commented May 15, 2024

What is the issue?

Hi,

I use Ollama together with the intfloat/multilingual-e5-base sentence-transformer via LangChain and LlamaIndex in Python.

If I use the torch build without CUDA, everything works as expected; my embeddings are just created slowly.
The problem appears with the CUDA build of torch installed (the install command was shown in an attached screenshot).
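For context, a typical command for installing a CUDA build of torch looks like the following (the exact command and CUDA version in the screenshot are unknown; cu121 here is an assumption):

```
pip install torch --index-url https://download.pytorch.org/whl/cu121
```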

As soon as I load the sentence-transformer in my Python script, the weird behaviour starts.
The first prompt to a model in Ollama works normally (it takes a few seconds). From the second prompt onwards, my GPU is at 100% load for a few minutes before I get the response from the LLM.

This happens both with the LlamaIndex / LangChain APIs in Python and with the CLI.
If I terminate my Python script and restart Ollama, it works normally again.
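A minimal sketch of the setup described above (a hypothetical reproduction, not the exact script from the issue; the llama3 model name and the use of the ollama Python client are assumptions):

```python
# Hypothetical reproduction of the reported setup.
import ollama  # assumed client: pip install ollama
from sentence_transformers import SentenceTransformer

# Loading the embedding model on the GPU is what appears to trigger the issue
# when the CUDA build of torch is installed.
model = SentenceTransformer("intfloat/multilingual-e5-base", device="cuda")
embeddings = model.encode(["passage: some text to embed"])

# Reported behaviour: the first prompt responds in seconds; from the second
# prompt onwards the GPU sits at 100% load for minutes before responding.
for i in range(3):
    response = ollama.chat(
        model="llama3",  # assumed model; the issue does not name one
        messages=[{"role": "user", "content": f"prompt number {i}"}],
    )
    print(response["message"]["content"])
```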

I use a laptop with Windows 11:
11th Gen Intel(R) Core(TM) i7-11850H @ 2.50 GHz
32 GB RAM
NVIDIA RTX A3000 6GB

OS: Windows

GPU: Nvidia

CPU: Intel

Ollama version: 0.1.37

qsdhj added the bug (Something isn't working) label on May 15, 2024