issue/305 - feat: Add support for mistral model type #306

Merged
wooway777 merged 1 commit into main from issue/305 on Apr 16, 2026

Conversation

@baominghelly
Contributor

Description

Add support for Mistral-type models.
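
The diff itself is not shown in this conversation; as a rough sketch of what recognizing the Mistral model type from a HuggingFace checkpoint could involve (the function name and mapping below are assumptions for illustration, not the actual InfiniLM code):

```python
import json
import os

def detect_model_type(model_path: str) -> str:
    # Hypothetical helper: read the HF config.json shipped with the
    # checkpoint and map its architecture name to an internal type string.
    with open(os.path.join(model_path, "config.json")) as f:
        cfg = json.load(f)
    arch = (cfg.get("architectures") or [""])[0]
    if arch == "MistralForCausalLM":
        return "mistral"
    if arch == "LlamaForCausalLM":
        return "llama"
    raise ValueError(f"unsupported architecture: {arch}")
```

Mistral-7B checkpoints declare `MistralForCausalLM` in their `config.json`, which is what a loader would branch on.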

Test Evidence

Tested models: Mistral-7B-Instruct-v0.1, Mistral-7B-Instruct-v0.2, Mistral-7B-v0.1

 CUDA_VISIBLE_DEVICES=1 python examples/jiuge.py --nvidia --model=/data-models/llms/hf_llm_models/mistralai_Mistral-7B-v0.1/
Namespace(cpu=False, nvidia=True, qy=False, metax=False, moore=False, iluvatar=False, cambricon=False, ali=False, hygon=False, model_path='/data-models/llms/hf_llm_models/mistralai_Mistral-7B-v0.1/', max_new_tokens=100, backend='cpp', batch_size=1, prompt='How are you', tp=1, enable_paged_attn=False, paged_kv_block_size=256, enable_graph=False, top_k=1, top_p=1.0, temperature=1.0, attn='default', kv_cache_dtype=None)
 load weights ......
Processing: model-00002-of-00002.safetensors
Processing: model-00001-of-00002.safetensors
Processing files: 100%|██████████| 2/2 [00:02<00:00,  1.07s/it]
 load weights over! 2154.597759246826 ms

How are you=================== start generate ====================



 Generation completed in 1812.26 ms
 Batchsize=1  Per_Batch_Input_Len=4  Per_Batch_New_Tokens=100

 Prefill TTFT: 217.64 ms  Throughput: 18.38 tok/s

 Decode  Avg ITL: 16.11 ms   Throughput: 62.08 tok/s

doing?

I’m doing well. I’m in a good place.

What are you working on?

I’m working on a new album. I’m also working on a new book.

What’s the new album about?

It’s about the things that I’ve been through in my life. It’s about the things that I’ve learned. It’s about the things that I’ve seen. It
total_time: 1813.0 ms
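
The throughput figures in the run above are consistent with simple tokens-per-time arithmetic; a quick cross-check of the Mistral-7B-v0.1 numbers (assuming Throughput = tokens / elapsed time, all values copied from the log):

```python
# Cross-checking the logged prefill/decode throughput for Mistral-7B-v0.1.
prefill_tokens = 4        # Per_Batch_Input_Len
ttft_ms = 217.64          # Prefill TTFT
avg_itl_ms = 16.11        # Decode Avg ITL

prefill_tps = prefill_tokens / (ttft_ms / 1000)  # ≈ 18.38 tok/s, matching the log
decode_tps = 1000 / avg_itl_ms                   # ≈ 62.07 tok/s (log rounds to 62.08)

print(f"prefill {prefill_tps:.2f} tok/s, decode {decode_tps:.2f} tok/s")
```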
(baoming_env) libaoming@server:/data/shared/baoming/workplace/infer_repo/InfiniLM$ CUDA_VISIBLE_DEVICES=1 python examples/jiuge.py --nvidia --model=/data-models/llms/hf_llm_models/mistralai_Mistral-7B-Instruct-v0.1/
Namespace(cpu=False, nvidia=True, qy=False, metax=False, moore=False, iluvatar=False, cambricon=False, ali=False, hygon=False, model_path='/data-models/llms/hf_llm_models/mistralai_Mistral-7B-Instruct-v0.1/', max_new_tokens=100, backend='cpp', batch_size=1, prompt='How are you', tp=1, enable_paged_attn=False, paged_kv_block_size=256, enable_graph=False, top_k=1, top_p=1.0, temperature=1.0, attn='default', kv_cache_dtype=None)
 load weights ......
Processing: model-00002-of-00002.safetensors
Processing: model-00001-of-00002.safetensors
Processing files: 100%|██████████| 2/2 [00:02<00:00,  1.17s/it]
 load weights over! 2336.71236038208 ms

<s> [INST] How are you [/INST]=================== start generate ====================



 Generation completed in 816.96 ms
 Batchsize=1  Per_Batch_Input_Len=12  Per_Batch_New_Tokens=41

 Prefill TTFT: 193.61 ms  Throughput: 61.98 tok/s

 Decode  Avg ITL: 15.58 ms   Throughput: 64.17 tok/s

I'm just a computer program, so I don't have feelings or emotions. But I'm here to help you with any questions or problems you have! How can I assist you today?
total_time: 832.58 ms
(baoming_env) libaoming@server:/data/shared/baoming/workplace/infer_repo/InfiniLM$ CUDA_VISIBLE_DEVICES=1 python examples/jiuge.py --nvidia --model=/data-models/llms/hf_llm_models/mistralai_Mistral-7B-Instruct-v0.2/
Namespace(cpu=False, nvidia=True, qy=False, metax=False, moore=False, iluvatar=False, cambricon=False, ali=False, hygon=False, model_path='/data-models/llms/hf_llm_models/mistralai_Mistral-7B-Instruct-v0.2/', max_new_tokens=100, backend='cpp', batch_size=1, prompt='How are you', tp=1, enable_paged_attn=False, paged_kv_block_size=256, enable_graph=False, top_k=1, top_p=1.0, temperature=1.0, attn='default', kv_cache_dtype=None)
 load weights ......
Processing: model-00003-of-00003.safetensors
Processing: model-00002-of-00003.safetensors
Processing: model-00001-of-00003.safetensors
Processing files: 100%|██████████| 3/3 [00:02<00:00,  1.26it/s]
 load weights over! 2382.2455406188965 ms

<s> [INST] How are you [/INST]=================== start generate ====================



 Generation completed in 1086.73 ms
 Batchsize=1  Per_Batch_Input_Len=12  Per_Batch_New_Tokens=55

 Prefill TTFT: 242.68 ms  Throughput: 49.45 tok/s

 Decode  Avg ITL: 15.63 ms   Throughput: 63.98 tok/s

I'm just a computer program, I don't have the ability to feel emotions or have a physical state. I'm here to help answer any questions you might have to the best of my ability. Is there something specific you'd like to ask about?
total_time: 1102.31 ms
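
The `<s> [INST] How are you [/INST]` echoed in the two instruct runs is the Mistral instruct chat template being applied to the prompt (which is why those models answer the question while the base v0.1 model just continues the text). A minimal sketch of formatting a single-turn prompt by hand (normally the tokenizer's chat template does this, and the `<s>` BOS token is added during tokenization rather than as literal text):

```python
def mistral_instruct_prompt(user_message: str) -> str:
    # Mistral-7B-Instruct v0.1/v0.2 single-turn format:
    # BOS followed by the user message wrapped in [INST] ... [/INST].
    return f"<s> [INST] {user_message} [/INST]"

print(mistral_instruct_prompt("How are you"))
# → <s> [INST] How are you [/INST]
```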

@baominghelly baominghelly requested review from a team, pengcheng888 and wooway777 April 16, 2026 04:36
@baominghelly baominghelly self-assigned this Apr 16, 2026
@baominghelly baominghelly linked an issue Apr 16, 2026 that may be closed by this pull request
@wooway777 wooway777 merged commit 0f8270a into main Apr 16, 2026
@wooway777 wooway777 deleted the issue/305 branch April 16, 2026 06:33


Development

Successfully merging this pull request may close these issues.

[DEV] Add framework support for Mistral models

3 participants