[Cherry-Pick][RL] change glm rope_emb calculation #7316 by zoooo0820 · Pull Request #7318 · PaddlePaddle/FastDeploy

zoooo0820 · 2026-04-10T14:25:28Z

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-04-10T14:25:38Z

Thanks for your contribution!

codecov-commenter · 2026-04-10T15:51:08Z

Codecov Report

❌ Patch coverage is 14.28571% with 6 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.6@c756038). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
...stdeploy/model_executor/layers/rotary_embedding.py	14.28%	6 Missing ⚠️

Additional details and impacted files

@@              Coverage Diff               @@
##             release/2.6    #7318   +/-   ##
==============================================
  Coverage               ?   73.84%           
==============================================
  Files                  ?      376           
  Lines                  ?    52960           
  Branches               ?     8268           
==============================================
  Hits                   ?    39110           
  Misses                 ?    11115           
  Partials               ?     2735

Flag	Coverage Δ
GPU	`73.84% <14.28%> (?)`


PaddlePaddle-bot

🤖 AI Code Review | 2026-04-11 14:34 CST

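📋 Review Summary

PR overview: A cherry-pick PR that adjusts the RoPE embedding computation for GLM models and adds the environment variable FD_ENABLE_RL to align precision with training.

Scope of changes: custom_ops/gpu_ops/append_attn/, fastdeploy/envs.py, fastdeploy/model_executor/layers/rotary_embedding.py

Impact tags: [RL] [OP]

📝 PR Convention Check

The Motivation and Modifications sections of the PR description are empty. Suggested content:

Description template (can be copied as-is):

## Motivation

To align GLM inference results with training results, the computation precision of the RoPE embedding needs to be adjusted.

## Modifications

1. Add the environment variable `FD_ENABLE_RL` to control whether RL alignment mode is enabled
2. In the Python layer (GlmRotaryEmbedding), choose the computation path based on `FD_ENABLE_RL`
3. In the CUDA kernels, hard-code `EnforceFmulRN` to `false` to avoid IEEE-754 compliant rounding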

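Issues

Severity	File	Summary
🔴 Bug	`custom_ops/gpu_ops/append_attn/decoder_write_cache_with_rope_kernel.cu:152`	Hard-coded `EnforceFmulRN=false` has too broad an impact
🔴 Bug	`fastdeploy/model_executor/layers/rotary_embedding.py:91`	Python and CUDA layers use inconsistent logic
🔴 Bug	`fastdeploy/envs.py:272`	Environment variable is not used in the CUDA layer

Overall Assessment

The PR aims to align the precision of the RoPE embedding computation for GLM models, but the implementation has problems. Hard-coding EnforceFmulRN=false in the CUDA kernels affects every model that uses those kernels, not only GLM. In addition, the Python layer is controlled by an environment variable while the CUDA layer is hard-coded, so the two layers are inconsistent. Consider stating the intended scope of the change explicitly or using a finer-grained control mechanism.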

PaddlePaddle-bot · 2026-04-11T06:34:08Z

+        auto* kernelFn = append_decode_cache_T_neox_partial_rope_kernel<
+            T,
+            PackSize,
+            false>;  // GLM use EnforceFmulRN=false


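🔴 Bug Hard-coding EnforceFmulRN=false affects every model that uses this kernel, not only GLM.

append_decode_cache_rope() is called by several models (for example Qwen and Ernie), but the comment only says "GLM use EnforceFmulRN=false". If the change is intended for GLM only, it will also alter the precision of the other models.

Suggestions:

If the change targets GLM only, check the model type at the call site or use a separate kernel.

Alternatively, parameterize EnforceFmulRN and control it by reading the environment variable through getEnvEnableRL().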
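
As a rough illustration of why this flag matters numerically, the sketch below simulates float32 rounding in plain Python. It assumes (the excerpt does not confirm this) that `EnforceFmulRN=true` rounds each product to float32 before the subtraction, as CUDA's `__fmul_rn` intrinsic does, while `false` leaves the compiler free to contract a multiply and the following add into one fused operation. The names `f32`, `rotate_rounded_products`, `rotate_contracted`, `ROTATE_VARIANTS` and `rotate` are hypothetical, and the dispatch table only stands in for picking a template instantiation from a runtime flag.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def rotate_rounded_products(x: float, y: float, cos: float, sin: float) -> float:
    """x*cos - y*sin with each product rounded to float32 first.

    Inputs are assumed to already be float32-representable values.
    """
    return f32(f32(x * cos) - f32(y * sin))


def rotate_contracted(x: float, y: float, cos: float, sin: float) -> float:
    """Approximation of a contracted multiply-add.

    The first product is kept unrounded (the product of two float32 values
    is exact in binary64) and only the final result is rounded to float32.
    """
    return f32(x * cos - f32(y * sin))


# Hypothetical dispatch table: one entry per value of the boolean flag,
# standing in for the two template instantiations of a kernel.
ROTATE_VARIANTS = {True: rotate_rounded_products, False: rotate_contracted}


def rotate(x: float, y: float, cos: float, sin: float, enforce_fmul_rn: bool) -> float:
    """Select the variant from a runtime flag instead of hard-coding it."""
    return ROTATE_VARIANTS[enforce_fmul_rn](x, y, cos, sin)
```

For some inputs the two variants differ in the last bits of the float32 result, which is enough to break bitwise agreement between training and inference. Passing the flag from the call site keeps the choice per model rather than per kernel.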

PaddlePaddle-bot · 2026-04-11T06:34:08Z

+          T,
+          PackSize,
+          QKV_TYPE,
+          false>  // GLM use EnforceFmulRN=false


🔴 Bug 硬编码 EnforceFmulRN=false 会影响所有使用此 kernel 的模型。

同样的问题，此函数不仅用于 GLM 模型，硬编码会影响其他模型。

PaddlePaddle-bot · 2026-04-11T06:34:08Z

        bsz, max_seq_len = position_ids.shape[:2]
-        inv_freq = self.base ** (-paddle.arange(0, self.rotary_dim, 2, dtype="float32") / self.rotary_dim)
-        freqs = paddle.einsum("ij,k->ijk", position_ids.cast("float32"), inv_freq)
+        if envs.FD_ENABLE_RL == 1:


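🔴 Bug The Python layer is controlled by the environment variable FD_ENABLE_RL, but the CUDA layer hard-codes false, so the two are inconsistent.

The Python code chooses its computation path based on FD_ENABLE_RL, whereas the CUDA kernels hard-code EnforceFmulRN to false and never read FD_ENABLE_RL.

custom_ops/gpu_ops/helper.cu defines a getEnvEnableRL() function that reads FD_ENABLE_RL, but that function is not used.

Suggestion: if the behavior is meant to be controlled by the environment variable, the CUDA kernels should also set EnforceFmulRN from the return value of getEnvEnableRL().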
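
For context, the removed lines in the diff above compute the RoPE inverse frequencies as `base ** (-(2i) / rotary_dim)` and then take an outer product with the position ids. The plain-Python sketch below reproduces that computation and compares it with a mathematically equivalent reciprocal form. The reciprocal form is only a hypothetical alternative chosen to show how a reformulation can shift floating-point results; the excerpt does not show what the `FD_ENABLE_RL == 1` branch actually does. All function names are illustrative.

```python
def inv_freq_negative_exponent(base: float, rotary_dim: int) -> list[float]:
    """Form used by the removed lines: base ** (-(2i) / rotary_dim)."""
    return [base ** (-i / rotary_dim) for i in range(0, rotary_dim, 2)]


def inv_freq_reciprocal(base: float, rotary_dim: int) -> list[float]:
    """Hypothetical alternative: 1 / base ** ((2i) / rotary_dim)."""
    return [1.0 / (base ** (i / rotary_dim)) for i in range(0, rotary_dim, 2)]


def rope_angles(position_ids: list[list[int]], inv_freq: list[float]) -> list[list[list[float]]]:
    """Outer product matching einsum("ij,k->ijk", position_ids, inv_freq)."""
    return [[[float(p) * f for f in inv_freq] for p in row] for row in position_ids]


def max_relative_gap(a: list[float], b: list[float]) -> float:
    """Largest relative difference between two frequency tables."""
    return max(abs(x - y) / abs(x) for x, y in zip(a, b))


if __name__ == "__main__":
    a = inv_freq_negative_exponent(10000.0, 64)
    b = inv_freq_reciprocal(10000.0, 64)
    print("max relative gap:", max_relative_gap(a, b))
    print("angles shape:", len(rope_angles([[0, 1, 2]], a)), 3, len(a))
```

The two forms agree to within rounding error, but they are not guaranteed to be bit-identical, and the gap is larger in lower-precision dtypes. That is why training and inference need to share one formulation when exact alignment is the goal.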

PaddlePaddle-bot · 2026-04-11T06:34:08Z

        int(os.getenv("FD_SAVE_OUTPUT_CACHE_FOR_PREEMPTED_REQUEST", "1"))
    ),
+    # Whether to align RoPE and moe gate precision with training
+    "FD_ENABLE_RL": lambda: int(os.getenv("FD_ENABLE_RL", "0")),


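🔴 Bug The environment variable FD_ENABLE_RL is not actually used in the CUDA layer.

custom_ops/gpu_ops/helper.cu:160-166 defines a getEnvEnableRL() function that reads the FD_ENABLE_RL environment variable, but the CUDA kernel call sites do not use its return value and hard-code EnforceFmulRN=false instead.

Suggestion: either use the return value of getEnvEnableRL() in the CUDA kernels or remove the unused function.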
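
The registry entry in the diff above maps a variable name to a zero-argument reader. A minimal sketch of that pattern follows, assuming readers are evaluated at access time; `environment_variables`, `get_env` and `select_rope_path` are hypothetical names used for illustration and are not taken from `fastdeploy/envs.py`.

```python
import os

# Name -> zero-argument reader, mirroring the entry added in the diff.
environment_variables = {
    "FD_ENABLE_RL": lambda: int(os.getenv("FD_ENABLE_RL", "0")),
}


def get_env(name: str) -> int:
    """Evaluate the reader at call time so later changes to os.environ are seen."""
    return environment_variables[name]()


def select_rope_path() -> str:
    """Hypothetical helper: choose a computation path from the flag."""
    return "rl_aligned" if get_env("FD_ENABLE_RL") == 1 else "default"
```

Reading the flag through one accessor gives every layer the same value, which is the consistency the review comment asks for between the Python and CUDA sides.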

EmmonsCurse

LGTM～ Skip coverage check as it mainly relies on tests with RL.

change glm rope_emb calculation

5293e65

zoooo0820 had a problem deploying to Metax_ci April 10, 2026 14:25 — with GitHub Actions Failure

plusNew001 had a problem deploying to Metax_ci April 10, 2026 16:28 — with GitHub Actions Failure

glm without EnforceFmulRN

9b0a18b

zoooo0820 force-pushed the cp26_rope branch from ce8b190 to 9b0a18b Compare April 11, 2026 05:12

zoooo0820 had a problem deploying to Metax_ci April 11, 2026 05:12 — with GitHub Actions Failure

fix ci

44c3d30

zoooo0820 had a problem deploying to Metax_ci April 11, 2026 06:21 — with GitHub Actions Failure

PaddlePaddle-bot suggested changes Apr 11, 2026

ckl117 approved these changes Apr 11, 2026

yuanlehome approved these changes Apr 11, 2026

EmmonsCurse approved these changes Apr 11, 2026

EmmonsCurse added the skip-ci: coverage label Apr 11, 2026

zoooo0820 merged commit 42b0f59 into PaddlePaddle:release/2.6 Apr 11, 2026
52 of 57 checks passed

zoooo0820 deleted the cp26_rope branch April 11, 2026 10:38

zoooo0820 merged 3 commits into PaddlePaddle:release/2.6 from zoooo0820:cp26_rope