
Possibility to unload/reload model from VRAM/RAM after idle timeout #196

Open
v3DJG6GL opened this issue Feb 15, 2024 · 3 comments

@v3DJG6GL

First of all thanks for this great project!

Description

I would like to have an option to set an idle time after which the model is unloaded from RAM/VRAM.

Background:

I have several applications that use my GPU's VRAM; one of them is LocalAI.
Since I don't have unlimited VRAM, these applications have to share the available memory among themselves.
Luckily, LocalAI recently implemented a watchdog feature that can unload the model after a specified idle timeout. I'd love to have similar functionality in whisper-asr-webservice.
For now, whisper-asr-webservice occupies a third of my VRAM even though it is only used from time to time.
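For illustration, the requested behavior could be sketched as a small idle-timeout manager: a background watchdog thread drops the model reference once it has been unused for longer than the timeout, and the model is lazily reloaded on the next request. All names here (`IdleModelManager`, `load_fn`) are hypothetical and not part of whisper-asr-webservice or LocalAI; a real implementation on the GPU would additionally need to release framework caches (e.g. CUDA memory) after dropping the reference.

```python
import threading
import time


class IdleModelManager:
    """Lazily load a model and unload it after an idle timeout.

    `load_fn` is any callable that returns a loaded model. This is an
    illustrative sketch, not the actual whisper-asr-webservice code.
    """

    def __init__(self, load_fn, idle_timeout=60.0, poll_interval=1.0):
        self._load_fn = load_fn
        self._idle_timeout = idle_timeout
        self._poll_interval = poll_interval
        self._model = None
        self._last_used = 0.0
        self._lock = threading.Lock()
        # Daemon watchdog thread: polls and unloads the idle model.
        threading.Thread(target=self._watchdog, daemon=True).start()

    def get(self):
        """Return the model, loading it on demand, and reset the idle timer."""
        with self._lock:
            if self._model is None:
                self._model = self._load_fn()
            self._last_used = time.monotonic()
            return self._model

    def _watchdog(self):
        while True:
            time.sleep(self._poll_interval)
            with self._lock:
                idle = time.monotonic() - self._last_used
                if self._model is not None and idle > self._idle_timeout:
                    # Drop the reference so RAM/VRAM can be reclaimed.
                    self._model = None
```

Every inference request would go through `get()`, so an active model is never unloaded mid-use, while an idle one is freed after the configured timeout.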

@LuisMalhadas

I'd like to point out that this would bring energy savings as well.

@thfrei

thfrei commented Apr 14, 2024

Wouldn't it be this feature?
mudler/LocalAI#1341

@v3DJG6GL
Author

Wouldn't it be this feature? mudler/LocalAI#1341

Yes, that's the PR I also linked up there.
