How to unload and change models for local offline inferencing with Aphrodite? #510
Unanswered · murtaza-nasir asked this question in Q&A
I'm trying to compare a few different models by running the same prompts through them using local offline inferencing with Aphrodite, since the API doesn't support changing models.
Here's the code I'm using:
I'm trying to unload the first model and load the second one, but I haven't been able to get it to work. I've tried a few different approaches, like calling `destroy_model_parallel()`, deleting the `llm` and `llm_engine` objects, and calling `gc.collect()` and `torch.cuda.empty_cache()`, but I can never get the second model to load successfully after running the first one.

What is the correct way to unload a model and load a new one in Aphrodite for local inferencing? Is there a specific sequence of steps or additional cleanup required to fully release the GPU memory and resources used by the first model?
I'd appreciate any guidance or code examples showing the proper way to handle switching between models. Thanks in advance for any help!