
CUDA out of memory @ util.paraphrase_mining #1712

Open
PhilipMay opened this issue Oct 4, 2022 · 6 comments
PhilipMay commented Oct 4, 2022

Hi,

I am using util.paraphrase_mining on 3,463,703 sentences with a 16 GB GPU:

paraphrases = util.paraphrase_mining(
    model, sentences, 
    show_progress_bar=True,
    batch_size=128, 
)

I am getting a CUDA out of memory error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In [7], line 1
----> 1 paraphrases = util.paraphrase_mining(
      2     model, sentences, 
      3     show_progress_bar=True,
      4     batch_size=128, 
      5 #    query_chunk_size=10_000,  # def: 5000
      6 #    corpus_chunk_size=200_000,  # def: 100000
      7 )

File ~/miniconda3/envs/paraphrase-mining/lib/python3.9/site-packages/sentence_transformers/util.py:130, in paraphrase_mining(model, sentences, show_progress_bar, batch_size, *args, **kwargs)
    113 """
    114 Given a list of sentences / texts, this function performs paraphrase mining. It compares all sentences against all
    115 other sentences and returns a list with the pairs that have the highest cosine similarity score.
   (...)
    126 :return: Returns a list of triplets with the format [score, id1, id2]
    127 """
    129 # Compute embedding for the sentences
--> 130 embeddings = model.encode(sentences, show_progress_bar=show_progress_bar, batch_size=batch_size, convert_to_tensor=True)
    132 return paraphrase_mining_embeddings(embeddings, *args, **kwargs)

File ~/miniconda3/envs/paraphrase-mining/lib/python3.9/site-packages/sentence_transformers/SentenceTransformer.py:195, in SentenceTransformer.encode(self, sentences, batch_size, show_progress_bar, output_value, convert_to_numpy, convert_to_tensor, device, normalize_embeddings)
    192 all_embeddings = [all_embeddings[idx] for idx in np.argsort(length_sorted_idx)]
    194 if convert_to_tensor:
--> 195     all_embeddings = torch.stack(all_embeddings)
    196 elif convert_to_numpy:
    197     all_embeddings = np.asarray([emb.numpy() for emb in all_embeddings])

RuntimeError: CUDA out of memory. Tried to allocate 9.91 GiB (GPU 0; 15.75 GiB total capacity; 10.95 GiB already allocated; 85.56 MiB free; 11.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I am using a model based on xlm-r-distilroberta-base-paraphrase-v1 and the following packages:

sentence-transformers 2.2.2
torch                 1.12.1
transformers          4.22.2


PhilipMay commented Oct 4, 2022

I guess all_embeddings = torch.stack(all_embeddings) should be done on CPU and not on GPU?


@PhilipMay

Putting this before the "stack" might fix the bug: all_embeddings = [e.cpu() for e in all_embeddings].
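For reference, a minimal self-contained illustration of where that line would sit relative to torch.stack (toy sizes, assuming a CUDA device is available; this is not the library source):

import torch

# Per-sentence embeddings accumulate on the GPU, as in SentenceTransformer.encode.
all_embeddings = [torch.randn(768, device="cuda") for _ in range(10_000)]

# Proposed line: move each embedding to host memory first ...
all_embeddings = [e.cpu() for e in all_embeddings]

# ... so the big stacked tensor is allocated in RAM instead of on the GPU.
all_embeddings = torch.stack(all_embeddings)
print(all_embeddings.shape, all_embeddings.device)  # torch.Size([10000, 768]) cpu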

@PhilipMay

@nreimers the solution above works for me and fixes the issue.
I am not 100% sure of the side effects. Is it ok to move all tensors in the list from GPU to CPU?

What do you think? Should I create a PR?

Many thanks
Philip


nreimers commented Oct 7, 2022

Hi @PhilipMay
Sadly it has side effects, and whether you want them depends on the use case.

If your GPU has enough memory, you want to keep the tensors on the GPU, because:

  • Subsequent operations, e.g. semantic search / paraphrase mining / clustering, are much faster on the GPU.
  • Transferring the embeddings to the CPU can take quite some time.

So you only want this line if you run out of memory, which means some option would probably be needed.
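In the meantime, a caller-side sketch of that trade-off, assuming encode's default numpy output already accumulates on the CPU batch by batch, and reusing paraphrase_mining_embeddings (which the traceback above shows paraphrase_mining delegates to); model and sentences are the ones from the original snippet:

import torch
from sentence_transformers import util

# Workaround sketch: encode with the default numpy output so the embeddings
# end up in host RAM, then pass the precomputed tensor to
# paraphrase_mining_embeddings. The mining itself then runs on the CPU,
# trading speed for GPU memory.
embeddings = model.encode(sentences, batch_size=128, show_progress_bar=True)
paraphrases = util.paraphrase_mining_embeddings(torch.from_numpy(embeddings))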

Also, torch.stack currently doubles the memory requirement, as at some point it holds both the old per-batch tensors and the newly stacked tensor.

Maybe a better solution would be to create the final matrix up front in the encode method and write the generated embeddings into this result matrix? Then we wouldn't have the overhead of duplicating all embeddings.
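A rough sketch of that pre-allocation idea (the helper name and loop are hypothetical, not the library's implementation; get_sentence_embedding_dimension is the existing SentenceTransformer method for the embedding size):

import torch

def encode_preallocated(model, sentences, batch_size=128, device="cpu"):
    """Hypothetical sketch: write each encoded batch into a result tensor
    allocated once up front, so torch.stack (and its temporary second copy
    of every embedding) is never needed."""
    dim = model.get_sentence_embedding_dimension()
    result = torch.empty(len(sentences), dim, device=device)
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        emb = model.encode(batch, convert_to_tensor=True, show_progress_bar=False)
        result[start:start + len(batch)] = emb.to(device)
    return result

The result tensor could live on either device, which would also give a natural place for the CPU-vs-GPU option mentioned above.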


Lavriz commented Dec 9, 2022

> @nreimers the solution above works for me and fixes the issue. I am not 100% sure of the side effects. Is it ok to move all tensors in the list from GPU to CPU?
>
> What do you think? Should I create a PR?
>
> Many thanks Philip

Hey @PhilipMay! Thank you for providing the fix.

I was wondering whether you encountered an issue like this after applying the fix: the task is finished according to the progress bar, but the cell is still running in Jupyter (showing an asterisk)?

@PhilipMay

> I was wondering whether you encountered an issue like this after applying the fix: the task is finished according to the progress bar, but the cell is still running in Jupyter (showing an asterisk)?

No, I can't remember anything like that.
