Add static libraries for batch manager #2

kaiyux · 2023-09-21T03:32:02Z

No description provided.

juney-nvidia · 2023-09-21T03:52:12Z

LGTM, thanks for the quick fix.

# This is the 1st commit message: add download models form www.modelscope.cn
# This is the commit message NVIDIA#2: debug
# This is the commit message NVIDIA#3: debug

* Fix model name mapping (#2)

* Add README
* Add unified converter (#1)
* init v3 lite feat
* fix moe topk method
* fix noaux_tc logic
* fix deepseek v3 normal rope
* refactor
* wo conversion ok debugging build
* add quantize for attn.dense
* add unified converter support
* testing unified converter
* add convert checkpoint and update docs
---------
Co-authored-by: Zeyu Wang <zeyuw@nvidia.com>
* update README
* add FP8 notes
* Update run.py result
* Update V3 README
* Update usages of FP8 to BF16 instruction
* fix model name mapping (#2)
* Update HF ckpt BF16 conversion.
* fix config of deepseek kv cache
* Remove source code
* Deepseek V3 FP8 Support
---------
Co-authored-by: jershi425 <83951930+jershi425@users.noreply.github.com>
Co-authored-by: Zeyu Wang <zeyuw@nvidia.com>
Co-authored-by: Hanyue He <hanyueh@nvidia.com>
Co-authored-by: root <root@h20-2.cm.cluster>

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

* add MNNVL memory mapping support Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
* add more MPI environment for trtllm-llmapi-launch Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
* add MoE communication and prepare kernels Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
* add MNNVL AlltoAll support for DeepSeekV3 Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
* add output dump for throughput benchmark Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
* support dynamic kernel launch grid Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
* address review comments Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
* address review comments #2 Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
---------
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

# This is the 1st commit message:
kernel Signed-off-by: Ubuntu <dafrimi@nvidia.com>
wip Signed-off-by: Ubuntu <dafrimi@nvidia.com>
remove prints Signed-off-by: Ubuntu <dafrimi@nvidia.com>
test pass Signed-off-by: Ubuntu <dafrimi@nvidia.com>
test refactor with more use cases Signed-off-by: Ubuntu <dafrimi@nvidia.com>
refacor Signed-off-by: Ubuntu <dafrimi@nvidia.com>
refacor_2 Signed-off-by: Ubuntu <dafrimi@nvidia.com>
add tuner wip Signed-off-by: Ubuntu <dafrimi@nvidia.com>
autotuner works Signed-off-by: Ubuntu <dafrimi@nvidia.com>
bfloat16 works. moer changes to the thop file Signed-off-by: Ubuntu <dafrimi@nvidia.com>
is tune for autotuner is True --> gets real tactics configs Signed-off-by: Ubuntu <dafrimi@nvidia.com>
wip Signed-off-by: Ubuntu <dafrimi@nvidia.com>
wip Signed-off-by: Ubuntu <dafrimi@nvidia.com>
zeros + quant mode is works Signed-off-by: Ubuntu <dafrimi@nvidia.com>
act int8 Signed-off-by: Ubuntu <dafrimi@nvidia.com>
removed fp8 for now Signed-off-by: Ubuntu <dafrimi@nvidia.com>
wip Signed-off-by: Ubuntu <dafrimi@nvidia.com>
w4a16 linear module Signed-off-by: Ubuntu <dafrimi@nvidia.com>
wip Signed-off-by: Ubuntu <dafrimi@nvidia.com>
changed cutalss for sm==89 Signed-off-by: Ubuntu <dafrimi@nvidia.com>
wip Signed-off-by: Ubuntu <dafrimi@nvidia.com>
test linear work Signed-off-by: Ubuntu <dafrimi@nvidia.com>
add license Signed-off-by: Ubuntu <dafrimi@nvidia.com>
works! Signed-off-by: Ubuntu <dafrimi@nvidia.com>
refactor + linear test pass Signed-off-by: Ubuntu <dafrimi@nvidia.com>
preprocess in load weights Signed-off-by: Ubuntu <dafrimi@nvidia.com>
wip Signed-off-by: Ubuntu <dafrimi@nvidia.com>
wip Signed-off-by: Ubuntu <dafrimi@nvidia.com>
wip Signed-off-by: Ubuntu <dafrimi@nvidia.com>
wip Signed-off-by: Ubuntu <dafrimi@nvidia.com>
refactor + rebase Signed-off-by: Ubuntu <dafrimi@nvidia.com>
wip Signed-off-by: Ubuntu <dafrimi@nvidia.com>
wip Signed-off-by: Ubuntu <dafrimi@nvidia.com>
Blackwell not supported Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>
wip Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>
skip blackwell Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>
wip Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>
works Signed-off-by: Ubuntu <dafrimi@nvidia.com>
# This is the commit message NVIDIA#2:
rebased Signed-off-by: Ubuntu <dafrimi@nvidia.com>
# This is the commit message NVIDIA#3:
align with my pld worked version of linear Signed-off-by: Ubuntu <dafrimi@nvidia.com>
# This is the commit message NVIDIA#4:
wip Signed-off-by: Ubuntu <dafrimi@nvidia.com>
# This is the commit message NVIDIA#5:
refactor Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
# This is the commit message NVIDIA#6:
refactor Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
# This is the commit message NVIDIA#7:
refactor Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
# This is the commit message NVIDIA#8:
refactor Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
# This is the commit message NVIDIA#9:
sys path Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
# This is the commit message NVIDIA#10:
sys path Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>

Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>

Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

Add static libraries

ac45219

kaiyux self-assigned this Sep 21, 2023

kaiyux requested review from jdemouth and juney-nvidia September 21, 2023 03:32

juney-nvidia merged commit 9b563ba into main Sep 21, 2023

kaiyux deleted the kaiyu/add_static_libraries branch September 21, 2023 03:52

tdeng521 mentioned this pull request Mar 7, 2024

batch size will affect llm inference results? #1250

Closed

4 tasks

zxs789 mentioned this pull request Jun 4, 2024

H20 Using random weights to infer llama2-13B results in a divide-by-zero error. #1717

Closed

4 tasks

yingcanw added a commit that referenced this pull request Jan 2, 2025

Fix model name mapping (#2) (#2644)

718ef13

* Fix model name mapping (#2)

dongxuy04 added a commit to dongxuy04/TensorRT-LLM that referenced this pull request Apr 25, 2025

address review comments NVIDIA#2

2070dca

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

wu1du2 pushed a commit to wu1du2/TensorRT-LLM that referenced this pull request May 11, 2025

Add static libraries (NVIDIA#2)

73ce7dd

yuxianq added a commit to yuxianq/TensorRT-LLM that referenced this pull request Jul 17, 2025

Online resmooth for fp8 checkpoint on Blackwell. (NVIDIA#2)

a2fb8e5

Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>

HuiGao-NV mentioned this pull request Aug 20, 2025

[https://nvbugs/5410391][bug] Support to share device buffers in attention meta #6557

Merged
