
Help to run GPTFast on Mixtral-8x7B-Instruct-v0.1 #25

Open
davideuler opened this issue Apr 9, 2024 · 2 comments

@davideuler

Could you share example code for running GPTFast on Mixtral-8x7B-Instruct-v0.1?

I loaded the model with GPTFast using an empty draft_model_name. The following error appears when loading the model:

import torch
from transformers import AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "./Mixtral-8x7B-v0.1"
draft_model_name = ""

tokenizer = AutoTokenizer.from_pretrained(model_name)
initial_string = "Write me a short story."
input_tokens = tokenizer.encode(initial_string, return_tensors="pt").to(device)

# ....

Traceback (most recent call last):
File "/data/gptfast.py", line 77, in
gpt_fast_model = gpt_fast(model_name, sample_function=argmax, max_length=60, cache_config=cache_config, draft_model_name=draft_model_name)
File "/root/anaconda3/envs/llm/lib/python3.10/site-packages/GPTFast/Core/GPTFast.py", line 11, in gpt_fast
model = add_kv_cache(model, sample_function, max_length, cache_config, dtype=torch.float16)
File "/root/anaconda3/envs/llm/lib/python3.10/site-packages/GPTFast/Core/KVCache/KVCacheModel.py", line 208, in add_kv_cache
model = KVCacheModel(transformer, sampling_fn, max_length, cache_config, dtype)
File "/root/anaconda3/envs/llm/lib/python3.10/site-packages/GPTFast/Core/KVCache/KVCacheModel.py", line 21, in __init__
self._model = self.add_static_cache_to_model(model, cache_config, max_length, dtype, self.device)
File "/root/anaconda3/envs/llm/lib/python3.10/site-packages/GPTFast/Core/KVCache/KVCacheModel.py", line 48, in add_static_cache_to_model
module_forward_str_kv_cache = add_input_pos_to_func_str(module_forward_str, forward_prop_ref, "input_pos=input_pos")
File "/root/anaconda3/envs/llm/lib/python3.10/site-packages/GPTFast/Helpers/String/add_input_pos_to_func_str.py", line 18, in add_input_pos_to_func_str
raise ValueError("Submodule forward pass not found.")
ValueError: Submodule forward pass not found.
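For context on where this error originates (a simplified sketch, not GPTFast's actual implementation): judging from the traceback, add_input_pos_to_func_str rewrites the model's forward method as a source string, injecting an input_pos= keyword argument into the submodule's forward call; when the expected call pattern is not found in the source, as apparently happens with Mixtral's forward, it raises this ValueError. The helper name and regex below are hypothetical illustrations:

```python
import re

def add_kwarg_to_submodule_call(func_src: str, submodule_ref: str, kwarg: str) -> str:
    """Hypothetical, simplified stand-in for GPTFast's add_input_pos_to_func_str:
    append `kwarg` to the first call of `submodule_ref` inside `func_src`."""
    pattern = re.escape(submodule_ref) + r"\(([^)]*)\)"
    new_src, count = re.subn(
        pattern,
        lambda m: f"{submodule_ref}({m.group(1)}, {kwarg})",
        func_src,
        count=1,
    )
    if count == 0:
        # The source does not contain the expected submodule call pattern,
        # which matches the ValueError seen in the traceback above.
        raise ValueError("Submodule forward pass not found.")
    return new_src

src = "def forward(self, x):\n    return self.model(x)\n"
print(add_kwarg_to_submodule_call(src, "self.model", "input_pos=input_pos"))
```

A forward method whose submodule call has a different structure would not match the pattern and would trigger the same ValueError.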

@MDK8888 (Owner) commented Apr 10, 2024

Hey David, apologies for the late response. Mixtral should support static caching natively, and a new branch should be up this weekend or early next week with the fixes.

@davideuler (Author)

> Hey David, apologies for the late response. Mixtral should support static caching natively, and a new branch should be up this weekend or early next week with the fixes.

Thanks, looking forward to the new branch.
