
Problem with GPU allocation after updating to CTranslate2 4.0.0 #1628

Closed
carolinaxxxxx opened this issue Feb 22, 2024 · 1 comment

carolinaxxxxx commented Feb 22, 2024

When device_index = 1 (GPU 1) is set, GPU 0 is still allocated a small amount of memory (in my case about 263 MB) and shows signs of activity, although it should not. This is clearly caused by CTranslate2 4.0.0: after reverting to ctranslate2 3.24.0 the problem disappears.

[screenshot attachment: GPU memory usage]

The above example uses a Whisper model, but I also tried LLM models with the same result.

minhthuc2502 (Collaborator) commented

This is not a bug. Starting from CUDA 12, it seems that initializing a GPU takes more memory; the logic is the same as in previous versions. I have a small fix here to prevent initializing unused GPUs. Thank you for reporting it.
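Until such a fix lands, a common workaround is to hide the unused GPU from the process entirely by restricting CUDA device visibility before any CUDA context is created; this is standard CUDA behavior, not CTranslate2-specific. A minimal sketch, where the model path and the `ctranslate2.models.Whisper` call are illustrative assumptions:

```python
import os

# Hide every GPU except physical GPU 1 from this process. This must run
# before the first CUDA initialization, so GPU 0 is never touched at all.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# CUDA renumbers the visible devices, so the remaining GPU is now index 0.
# Hypothetical usage (requires ctranslate2 and a converted model on disk):
# import ctranslate2
# model = ctranslate2.models.Whisper("whisper-ct2", device="cuda", device_index=0)
```

Note that `CUDA_VISIBLE_DEVICES` must be set before the CUDA runtime is loaded; setting it after a model has already been created has no effect.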
