issue/305 - feat: Add support for mistral model type #306

Merged
wooway777 merged 1 commit into main from issue/305 on Apr 16, 2026

Conversation

@baominghelly
Contributor

Description

Add support for Mistral-type models.
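
The diff itself is not shown in this conversation; as a rough sketch of what recognizing the Mistral model type from a HuggingFace checkpoint could involve (the function name and mapping below are assumptions for illustration, not the actual InfiniLM code):

```python
import json
import os

def detect_model_type(model_path: str) -> str:
    # Hypothetical helper: read the HF config.json shipped with the
    # checkpoint and map its architecture name to an internal type string.
    with open(os.path.join(model_path, "config.json")) as f:
        cfg = json.load(f)
    arch = (cfg.get("architectures") or [""])[0]
    if arch == "MistralForCausalLM":
        return "mistral"
    if arch == "LlamaForCausalLM":
        return "llama"
    raise ValueError(f"unsupported architecture: {arch}")
```

Mistral-7B checkpoints declare `MistralForCausalLM` in their `config.json`, which is what a loader would branch on.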

Test Evidence

Tested models: Mistral-7B-Instruct-v0.1, Mistral-7B-Instruct-v0.2, Mistral-7B-v0.1

 CUDA_VISIBLE_DEVICES=1 python examples/jiuge.py --nvidia --model=/data-models/llms/hf_llm_models/mistralai_Mistral-7B-v0.1/
Namespace(cpu=False, nvidia=True, qy=False, metax=False, moore=False, iluvatar=False, cambricon=False, ali=False, hygon=False, model_path='/data-models/llms/hf_llm_models/mistralai_Mistral-7B-v0.1/', max_new_tokens=100, backend='cpp', batch_size=1, prompt='How are you', tp=1, enable_paged_attn=False, paged_kv_block_size=256, enable_graph=False, top_k=1, top_p=1.0, temperature=1.0, attn='default', kv_cache_dtype=None)
 load weights ......
Processing: model-00002-of-00002.safetensors
Processing: model-00001-of-00002.safetensors
Processing files: 100%|██████████| 2/2 [00:02<00:00,  1.07s/it]
 load weights over! 2154.597759246826 ms

How are you=================== start generate ====================



 Generation completed in 1812.26 ms
 Batchsize=1  Per_Batch_Input_Len=4  Per_Batch_New_Tokens=100

 Prefill TTFT: 217.64 ms  Throughput: 18.38 tok/s

 Decode  Avg ITL: 16.11 ms   Throughput: 62.08 tok/s

doing?

I’m doing well. I’m in a good place.

What are you working on?

I’m working on a new album. I’m also working on a new book.

What’s the new album about?

It’s about the things that I’ve been through in my life. It’s about the things that I’ve learned. It’s about the things that I’ve seen. It
total_time: 1813.0 ms
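
The throughput figures in the run above are consistent with simple tokens-per-time arithmetic; a quick cross-check of the Mistral-7B-v0.1 numbers (assuming Throughput = tokens / elapsed time, all values copied from the log):

```python
# Cross-checking the logged prefill/decode throughput for Mistral-7B-v0.1.
prefill_tokens = 4        # Per_Batch_Input_Len
ttft_ms = 217.64          # Prefill TTFT
avg_itl_ms = 16.11        # Decode Avg ITL

prefill_tps = prefill_tokens / (ttft_ms / 1000)  # ≈ 18.38 tok/s, matching the log
decode_tps = 1000 / avg_itl_ms                   # ≈ 62.07 tok/s (log rounds to 62.08)

print(f"prefill {prefill_tps:.2f} tok/s, decode {decode_tps:.2f} tok/s")
```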
(baoming_env) libaoming@server:/data/shared/baoming/workplace/infer_repo/InfiniLM$ CUDA_VISIBLE_DEVICES=1 python examples/jiuge.py --nvidia --model=/data-models/llms/hf_llm_models/mistralai_Mistral-7B-Instruct-v0.1/
Namespace(cpu=False, nvidia=True, qy=False, metax=False, moore=False, iluvatar=False, cambricon=False, ali=False, hygon=False, model_path='/data-models/llms/hf_llm_models/mistralai_Mistral-7B-Instruct-v0.1/', max_new_tokens=100, backend='cpp', batch_size=1, prompt='How are you', tp=1, enable_paged_attn=False, paged_kv_block_size=256, enable_graph=False, top_k=1, top_p=1.0, temperature=1.0, attn='default', kv_cache_dtype=None)
 load weights ......
Processing: model-00002-of-00002.safetensors
Processing: model-00001-of-00002.safetensors
Processing files: 100%|██████████| 2/2 [00:02<00:00,  1.17s/it]
 load weights over! 2336.71236038208 ms

<s> [INST] How are you [/INST]=================== start generate ====================



 Generation completed in 816.96 ms
 Batchsize=1  Per_Batch_Input_Len=12  Per_Batch_New_Tokens=41

 Prefill TTFT: 193.61 ms  Throughput: 61.98 tok/s

 Decode  Avg ITL: 15.58 ms   Throughput: 64.17 tok/s

I'm just a computer program, so I don't have feelings or emotions. But I'm here to help you with any questions or problems you have! How can I assist you today?
total_time: 832.58 ms
(baoming_env) libaoming@server:/data/shared/baoming/workplace/infer_repo/InfiniLM$ CUDA_VISIBLE_DEVICES=1 python examples/jiuge.py --nvidia --model=/data-models/llms/hf_llm_models/mistralai_Mistral-7B-Instruct-v0.2/
Namespace(cpu=False, nvidia=True, qy=False, metax=False, moore=False, iluvatar=False, cambricon=False, ali=False, hygon=False, model_path='/data-models/llms/hf_llm_models/mistralai_Mistral-7B-Instruct-v0.2/', max_new_tokens=100, backend='cpp', batch_size=1, prompt='How are you', tp=1, enable_paged_attn=False, paged_kv_block_size=256, enable_graph=False, top_k=1, top_p=1.0, temperature=1.0, attn='default', kv_cache_dtype=None)
 load weights ......
Processing: model-00003-of-00003.safetensors
Processing: model-00002-of-00003.safetensors
Processing: model-00001-of-00003.safetensors
Processing files: 100%|██████████| 3/3 [00:02<00:00,  1.26it/s]
 load weights over! 2382.2455406188965 ms

<s> [INST] How are you [/INST]=================== start generate ====================



 Generation completed in 1086.73 ms
 Batchsize=1  Per_Batch_Input_Len=12  Per_Batch_New_Tokens=55

 Prefill TTFT: 242.68 ms  Throughput: 49.45 tok/s

 Decode  Avg ITL: 15.63 ms   Throughput: 63.98 tok/s

I'm just a computer program, I don't have the ability to feel emotions or have a physical state. I'm here to help answer any questions you might have to the best of my ability. Is there something specific you'd like to ask about?
total_time: 1102.31 ms
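
The `<s> [INST] How are you [/INST]` echoed in the two instruct runs is the Mistral instruct chat template being applied to the prompt (which is why those models answer the question while the base v0.1 model just continues the text). A minimal sketch of formatting a single-turn prompt by hand (normally the tokenizer's chat template does this, and the `<s>` BOS token is added during tokenization rather than as literal text):

```python
def mistral_instruct_prompt(user_message: str) -> str:
    # Mistral-7B-Instruct v0.1/v0.2 single-turn format:
    # BOS followed by the user message wrapped in [INST] ... [/INST].
    return f"<s> [INST] {user_message} [/INST]"

print(mistral_instruct_prompt("How are you"))
# → <s> [INST] How are you [/INST]
```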

@baominghelly baominghelly requested review from a team, pengcheng888 and wooway777 April 16, 2026 04:36
@baominghelly baominghelly self-assigned this Apr 16, 2026
@baominghelly baominghelly linked an issue Apr 16, 2026 that may be closed by this pull request
@wooway777 wooway777 merged commit 0f8270a into main Apr 16, 2026
@wooway777 wooway777 deleted the issue/305 branch April 16, 2026 06:33


Development

Successfully merging this pull request may close these issues.

[DEV] Add framework support for Mistral models

3 participants