@kaiyux
Member

@kaiyux kaiyux commented Sep 21, 2023

No description provided.

@kaiyux kaiyux self-assigned this Sep 21, 2023
@juney-nvidia
Collaborator

LGTM, thanks for the quick fix.

@juney-nvidia juney-nvidia merged commit 9b563ba into main Sep 21, 2023
@kaiyux kaiyux deleted the kaiyu/add_static_libraries branch September 21, 2023 03:52
liuyhwangyh pushed a commit to liuyhwangyh/TensorRT-LLM that referenced this pull request Mar 21, 2024
# This is the 1st commit message:

add download models from www.modelscope.cn

# This is the commit message NVIDIA#2:

debug

# This is the commit message NVIDIA#3:

debug
yingcanw added a commit that referenced this pull request Jan 2, 2025
* Fix model name mapping (#2)
nv-guomingz pushed a commit that referenced this pull request Jan 24, 2025
* Add README

* Add unified converter (#1)

* init v3 lite feat

* fix moe topk method

* fix noaux_tc logic

* fix deepseek v3 normal rope

* refactor

* wo conversion ok debugging build

* add quantize for attn.dense

* add unified converter support

* testing unified converter

* add convert checkpoint and update docs

---------

Co-authored-by: Zeyu Wang <zeyuw@nvidia.com>

* update README

* add FP8 notes

* Update run.py result

* Update V3 README

* Update usages of FP8 to BF16 instruction

* fix model name mapping (#2)

* Update HF ckpt BF16 conversion.

* fix config of deepseek kv cache

* Remove source code

* Deepseek V3 FP8 Support

---------

Co-authored-by: jershi425 <83951930+jershi425@users.noreply.github.com>
Co-authored-by: Zeyu Wang <zeyuw@nvidia.com>
Co-authored-by: Hanyue He <hanyueh@nvidia.com>
Co-authored-by: root <root@h20-2.cm.cluster>
dongxuy04 added a commit to dongxuy04/TensorRT-LLM that referenced this pull request Apr 25, 2025
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
dongxuy04 added a commit that referenced this pull request Apr 25, 2025
* add MNNVL memory mapping support

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

* add more MPI environment for trtllm-llmapi-launch

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

* add MoE communication and prepare kernels

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

* add MNNVL AlltoAll support for DeepSeekV3

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

* add output dump for throughput benchmark

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

* support dynamic kernel launch grid

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

* address review comments

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

* address review comments #2

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

---------

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
wu1du2 pushed a commit to wu1du2/TensorRT-LLM that referenced this pull request May 11, 2025
danielafrimi added a commit to danielafrimi/TensorRT-LLM that referenced this pull request Jun 30, 2025
# This is the 1st commit message:

kernel

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

wip

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

remove prints

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

test pass

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

test refactor with more use cases

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

refactor

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

refactor_2

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

add tuner wip

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

autotuner works

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

bfloat16 works. more changes to the thop file

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

is tune for autotuner is True --> gets real tactics configs

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

wip

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

wip

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

zeros + quant mode works

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

act int8

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

removed fp8 for now

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

wip

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

w4a16 linear module

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

wip

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

changed cutlass for sm==89

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

wip

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

test linear work

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

add license

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

works!

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

refactor + linear test pass

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

preprocess in load weights

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

wip

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

wip

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

wip

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

wip

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

refactor + rebase

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

wip

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

wip

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

Blackwell not supported

Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>

wip

Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>

skip blackwell

Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>

wip

Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>

works

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

# This is the commit message NVIDIA#2:

rebased

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

# This is the commit message NVIDIA#3:

align with my pld worked version of linear

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

# This is the commit message NVIDIA#4:

wip

Signed-off-by: Ubuntu <dafrimi@nvidia.com>

# This is the commit message NVIDIA#5:

refactor

Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>

# This is the commit message NVIDIA#6:

refactor

Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>

# This is the commit message NVIDIA#7:

refactor

Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>

# This is the commit message NVIDIA#8:

refactor

Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>

# This is the commit message NVIDIA#9:

sys path

Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>

# This is the commit message NVIDIA#10:

sys path

Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
yuxianq added a commit to yuxianq/TensorRT-LLM that referenced this pull request Jul 17, 2025
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
litaotju pushed a commit to litaotju/TensorRT-LLM that referenced this pull request Jul 24, 2025
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
yuxianq added a commit to yuxianq/TensorRT-LLM that referenced this pull request Jul 28, 2025
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
zongfeijing pushed a commit to zongfeijing/TensorRT-LLM that referenced this pull request Jul 31, 2025
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

3 participants