feat: implement DeepSeek-V4 model by AlpinDale · Pull Request #1651 · dphnAI/sonar

AlpinDale · 2026-04-27T13:05:05Z

Based on vllm-project/vllm#40860

Signed-off-by: AlpinDale <alpindale@gmail.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 22277e7f47

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-27T13:11:32Z

  dim3 block(std::min<int64_t>(num_heads * rot_dim / 2, 512));
  const at::cuda::OptionalCUDAGuard device_guard(device_of(query));
  const cudaStream_t stream = at::cuda::getCurrentCUDAStream();
+  auto cache_f32 = cos_sin_cache.to(torch::kFloat32);


Avoid per-call fp32 conversion of RoPE cache

This change unconditionally does cos_sin_cache.to(torch::kFloat32) inside rotary_embedding, which runs every kernel invocation. Most callers (e.g. RotaryEmbedding.forward_cuda) pass fp16/bf16 caches, so this introduces a full cache dtype conversion/allocation on each decode step, causing major throughput and memory regressions across non-DeepSeek models; the conversion should be done once (or only when needed) instead of in the hot path.

Useful? React with 👍 / 👎.

Signed-off-by: AlpinDale <alpindale@gmail.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8274e7401b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-27T13:44:21Z

+    if any(
+        isinstance(g.kv_cache_spec, MambaSpec) and g.kv_cache_spec.block_size != cache_config.block_size for g in groups
+    ):
+        return scheduler_block_size, scheduler_block_size


Keep hash_block_size as a divisor for hybrid Mamba

When a hybrid configuration includes a non-align MambaSpec, this branch returns hash_block_size = scheduler_block_size (the LCM of all group block sizes). For mixed block sizes, that value is larger than at least one group size, so it violates the downstream requirement that each block_size be divisible by hash_block_size (see the assertion in HybridKVCacheCoordinator). In practice this causes coordinator initialization to fail for hybrid Mamba+attention setups with prefix caching/connector enabled, instead of gracefully disabling finer-grained hashing.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-27T13:44:21Z

+        offsets: torch.Tensor | None = None,
+        inverse: bool = False,
+    ) -> tuple[torch.Tensor, torch.Tensor | None]:
+        return self.forward_native(positions, query, key, offsets)


Propagate inverse flag in HIP DeepseekV4 rotary path

forward_hip accepts an inverse argument but drops it when delegating to forward_native, so ROCm execution always applies the non-inverse rotation. Any call site that requests inverse RoPE on HIP (for example, de-rotation flows) will silently compute the wrong transform and produce incorrect outputs.

Useful? React with 👍 / 👎.

Signed-off-by: AlpinDale <alpindale@gmail.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 283de047a2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-27T17:08:35Z

+        topk_indices = hash_indices_table[input_tokens]
+    else:
+        use_sorted = envs.APHRODITE_BATCH_INVARIANT
+        topk_indices = torch.topk(scores_for_choice, k=topk, dim=-1, sorted=use_sorted)[1]
    topk_weights = scores.gather(1, topk_indices)


Cast hashed expert indices to int64 before gather

When hash_indices_table is used, topk_indices comes directly from the table and is then passed to scores.gather(...) without dtype normalization. In DeepSeek V4 hash routing, that table is explicitly created as torch.int32 for the non-mega path (deepseek_v4.py sets hash_indices_dtype=torch.int32 and builds tid2eid with that dtype), so the fallback path here can hit a runtime error because gather expects long/int64 indices. This affects hash-MoE routing on the non-custom-op path (e.g., ROCm/aiter-enabled fallback) and will fail before routing can complete.

Useful? React with 👍 / 👎.

feat: implement DeepSeek-V4 model

22277e7

Signed-off-by: AlpinDale <alpindale@gmail.com>

chatgpt-codex-connector Bot reviewed Apr 27, 2026

View reviewed changes

fix: namespace issues

8274e74

Signed-off-by: AlpinDale <alpindale@gmail.com>

chatgpt-codex-connector Bot reviewed Apr 27, 2026

View reviewed changes

fix: cmake context

283de04

Signed-off-by: AlpinDale <alpindale@gmail.com>

chatgpt-codex-connector Bot reviewed Apr 27, 2026

View reviewed changes

AlpinDale merged commit 61aad7c into main Apr 27, 2026
1 check failed

AlpinDale deleted the feat/deepseek_v4_impl branch April 27, 2026 20:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

feat: implement DeepSeek-V4 model#1651

feat: implement DeepSeek-V4 model#1651
AlpinDale merged 3 commits into
mainfrom
feat/deepseek_v4_impl

AlpinDale commented Apr 27, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot Apr 27, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 27, 2026

Uh oh!

chatgpt-codex-connector Bot Apr 27, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Uh oh!

Conversation

AlpinDale commented Apr 27, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant