
CUDA out of memory @ util.paraphrase_mining #1712

Open
PhilipMay opened this issue Oct 4, 2022 · 6 comments
PhilipMay commented Oct 4, 2022

Hi,

I am using util.paraphrase_mining on 3,463,703 sentences with a 16 GB GPU:

paraphrases = util.paraphrase_mining(
    model, sentences, 
    show_progress_bar=True,
    batch_size=128, 
)

I am getting a CUDA out of memory error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In [7], line 1
----> 1 paraphrases = util.paraphrase_mining(
      2     model, sentences, 
      3     show_progress_bar=True,
      4     batch_size=128, 
      5 #    query_chunk_size=10_000,  # def: 5000
      6 #    corpus_chunk_size=200_000,  # def: 100000
      7 )

File ~/miniconda3/envs/paraphrase-mining/lib/python3.9/site-packages/sentence_transformers/util.py:130, in paraphrase_mining(model, sentences, show_progress_bar, batch_size, *args, **kwargs)
    113 """
    114 Given a list of sentences / texts, this function performs paraphrase mining. It compares all sentences against all
    115 other sentences and returns a list with the pairs that have the highest cosine similarity score.
   (...)
    126 :return: Returns a list of triplets with the format [score, id1, id2]
    127 """
    129 # Compute embedding for the sentences
--> 130 embeddings = model.encode(sentences, show_progress_bar=show_progress_bar, batch_size=batch_size, convert_to_tensor=True)
    132 return paraphrase_mining_embeddings(embeddings, *args, **kwargs)

File ~/miniconda3/envs/paraphrase-mining/lib/python3.9/site-packages/sentence_transformers/SentenceTransformer.py:195, in SentenceTransformer.encode(self, sentences, batch_size, show_progress_bar, output_value, convert_to_numpy, convert_to_tensor, device, normalize_embeddings)
    192 all_embeddings = [all_embeddings[idx] for idx in np.argsort(length_sorted_idx)]
    194 if convert_to_tensor:
--> 195     all_embeddings = torch.stack(all_embeddings)
    196 elif convert_to_numpy:
    197     all_embeddings = np.asarray([emb.numpy() for emb in all_embeddings])

RuntimeError: CUDA out of memory. Tried to allocate 9.91 GiB (GPU 0; 15.75 GiB total capacity; 10.95 GiB already allocated; 85.56 MiB free; 11.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I am using a model based on xlm-r-distilroberta-base-paraphrase-v1 and the following packages:

sentence-transformers 2.2.2
torch                 1.12.1
transformers          4.22.2


PhilipMay commented Oct 4, 2022

I guess all_embeddings = torch.stack(all_embeddings) should be done on CPU and not on GPU?


@PhilipMay

Putting this before the "stack" might fix the bug: all_embeddings = [e.cpu() for e in all_embeddings].
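For reference, a minimal self-contained illustration of where that line would sit relative to torch.stack (toy sizes, assuming a CUDA device is available; this is not the library source):

import torch

# Per-sentence embeddings accumulate on the GPU, as in SentenceTransformer.encode.
all_embeddings = [torch.randn(768, device="cuda") for _ in range(10_000)]

# Proposed line: move each embedding to host memory first ...
all_embeddings = [e.cpu() for e in all_embeddings]

# ... so the big stacked tensor is allocated in RAM instead of on the GPU.
all_embeddings = torch.stack(all_embeddings)
print(all_embeddings.shape, all_embeddings.device)  # torch.Size([10000, 768]) cpu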

@PhilipMay

@nreimers the solution above works for me and fixes the issue.
I am not 100% sure of the side effects. Is it ok to move all tensors in the list from GPU to CPU?

What do you think? Should I create a PR?

Many thanks
Philip


nreimers commented Oct 7, 2022

Hi @PhilipMay
Sadly it has side effects, and whether you want them depends on the use case.

If your GPU has enough memory, you want to keep the tensors on the GPU, because:

  • Subsequent operations, e.g. semantic search / paraphrase mining / clustering, are much faster on the GPU.
  • Transferring the embeddings to the CPU can take quite some time.

So you only want this line if you run out of memory, which means some option would probably be needed.
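In the meantime, a caller-side sketch of that trade-off, assuming encode's default numpy output already accumulates on the CPU batch by batch, and reusing paraphrase_mining_embeddings (which the traceback above shows paraphrase_mining delegates to); model and sentences are the ones from the original snippet:

import torch
from sentence_transformers import util

# Workaround sketch: encode with the default numpy output so the embeddings
# end up in host RAM, then pass the precomputed tensor to
# paraphrase_mining_embeddings. The mining itself then runs on the CPU,
# trading speed for GPU memory.
embeddings = model.encode(sentences, batch_size=128, show_progress_bar=True)
paraphrases = util.paraphrase_mining_embeddings(torch.from_numpy(embeddings))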

Also, torch.stack currently doubles the memory requirement, as at some point it holds both the old per-batch tensors and the newly stacked tensor.

Maybe a better solution would be to create the final matrix up front in the encode method and write the generated embeddings into this result matrix? Then we wouldn't have the overhead of duplicating all embeddings.
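A rough sketch of that pre-allocation idea (the helper name and loop are hypothetical, not the library's implementation; get_sentence_embedding_dimension is the existing SentenceTransformer method for the embedding size):

import torch

def encode_preallocated(model, sentences, batch_size=128, device="cpu"):
    """Hypothetical sketch: write each encoded batch into a result tensor
    allocated once up front, so torch.stack (and its temporary second copy
    of every embedding) is never needed."""
    dim = model.get_sentence_embedding_dimension()
    result = torch.empty(len(sentences), dim, device=device)
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        emb = model.encode(batch, convert_to_tensor=True, show_progress_bar=False)
        result[start:start + len(batch)] = emb.to(device)
    return result

The result tensor could live on either device, which would also give a natural place for the CPU-vs-GPU option mentioned above.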


Lavriz commented Dec 9, 2022

> @nreimers the solution above works for me and fixes the issue. I am not 100% sure of the side effects. Is it ok to move all tensors in the list from GPU to CPU?
>
> What do you think? Should I create a PR?
>
> Many thanks Philip

Hey @PhilipMay! Thank you for providing the fix.

I was wondering whether you encountered an issue like this after applying the fix: the task is finished according to the progress bar, but the cell is still running in Jupyter (showing an asterisk)?

@PhilipMay

> I was wondering whether you encountered an issue like this after applying the fix: the task is finished according to the progress bar, but the cell is still running in Jupyter (showing an asterisk)?

No, I can't remember anything like that.
