How to unload and change models for local offline inferencing with Aphrodite? #510
Unanswered · murtaza-nasir asked this question in Q&A
I'm trying to compare a few different models by running the same prompts through them using local offline inferencing with Aphrodite, since the API doesn't support changing models.
Here's the code I'm using:
I'm trying to unload the first model and load the second one, but I haven't been able to get it to work. I've tried a few different approaches, like calling `destroy_model_parallel()`, deleting the `llm` and `llm_engine` objects, and calling `gc.collect()` and `torch.cuda.empty_cache()`, but I can never get the second model to load successfully after running the first one.

What is the correct way to unload a model and load a new one in Aphrodite for local inferencing? Is there a specific sequence of steps or additional cleanup required to fully release the GPU memory and resources used by the first model?
I'd appreciate any guidance or code examples showing the proper way to handle switching between models. Thanks in advance for any help!