fix: avoid GPU-CPU sync in MTP #1558

Merged
AlpinDale merged 1 commit into main from mtp-d2h
Nov 3, 2025

Conversation

@AlpinDale

Member

No description provided.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request aims to fix a performance issue by avoiding a GPU-CPU synchronization in the MTP forward pass. The change replaces an indexed assignment with torch.where. While this is a valid approach, I've suggested an alternative using masked_fill_ which is more memory-efficient as it performs an in-place modification, similar to the original code's intent.
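For readers without the diff at hand, the three variants mentioned in the review can be sketched as follows. This is a minimal illustration, not the code from this PR: the function names, the tensor names `embeds` and `positions`, and the "zero the rows where the position is 0" condition are all assumptions chosen for the example. The usual explanation for the sync is that boolean-mask indexed assignment has to work out which elements are selected, a data-dependent step that on CUDA can force the host to wait for the device, whereas `torch.where` and `masked_fill_` operate elementwise on fixed shapes.

```python
import torch


def zero_rows_indexed(embeds: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
    """Indexed assignment with a boolean mask (the pattern being replaced).

    Assumption: on CUDA this pattern can trigger a GPU-CPU sync because the
    selected indices depend on the data in the mask.
    """
    out = embeds.clone()  # clone only so the example does not mutate its input
    out[positions == 0] = 0
    return out


def zero_rows_where(embeds: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
    """Out-of-place variant using torch.where; allocates a new tensor."""
    mask = (positions == 0).unsqueeze(-1)  # shape (N, 1), broadcasts over hidden dim
    zero = embeds.new_zeros(())            # 0-dim zero with matching dtype and device
    return torch.where(mask, zero, embeds)


def zero_rows_masked_fill_(embeds: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
    """In-place variant using masked_fill_; mutates and returns `embeds`."""
    mask = (positions == 0).unsqueeze(-1)
    return embeds.masked_fill_(mask, 0)


if __name__ == "__main__":
    embeds = torch.arange(12, dtype=torch.float32).reshape(4, 3) + 1
    positions = torch.tensor([0, 1, 0, 2])
    print(zero_rows_where(embeds, positions))
```

All three produce the same values; they differ in whether a new tensor is allocated and, on a GPU, in whether the host may have to wait for the device. Whether the in-place form is safe depends on whether other code still needs the unmodified tensor.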

Comment thread aphrodite/modeling/models/deepseek_mtp.py
@AlpinDale AlpinDale merged commit b72fba8 into main Nov 3, 2025
0 of 4 checks passed
@AlpinDale AlpinDale deleted the mtp-d2h branch November 3, 2025 23:08