
Conversation

Contributor

Copilot AI commented Oct 21, 2025

Overview

This PR adds comprehensive unit tests for the limit_thinking_content_length and speculate_limit_thinking_content_length operators that control thinking phase length in model generation.

Background

The operators limit_thinking_content_length_v1, limit_thinking_content_length_v2, speculate_limit_thinking_content_length_v1, and speculate_limit_thinking_content_length_v2 are GPU custom operators designed to limit the length of "thinking" content during model inference. These operators work by:

  • v1 variants: Injecting a </think> token when max_think_len is exceeded
  • v2 variants: Injecting a \n</think>\n\n sequence when max_think_len is exceeded
  • Speculative variants: Handling multiple tokens per step in speculative decoding scenarios

Previously, these operators lacked unit tests, making it difficult to verify their correctness and catch regressions.
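
To make the tested behavior concrete, here is a pure-Python reference sketch of the v1 state machine. It is an illustrative model only, not the CUDA kernel, and the exact transition rules are assumptions inferred from the behavior described above:

import numpy as np

def limit_thinking_v1_reference(next_tokens, max_think_lens, step_idx,
                                limit_think_status, think_end_id):
    """Assumed per-sequence logic of limit_thinking_content_length_v1 (in-place)."""
    for i in range(next_tokens.shape[0]):
        if max_think_lens[i] < 0:
            continue  # negative limit: feature disabled for this sequence
        status = int(limit_think_status[i])
        if status == 2:
            continue  # terminal: thinking phase already closed
        if status == 0:
            if step_idx[i, 0] >= max_think_lens[i]:
                next_tokens[i, 0] = think_end_id  # force-inject </think>
                status = 1
            elif next_tokens[i, 0] == think_end_id:
                status = 2  # the model closed the thinking phase on its own
        elif status == 1:
            status = 2  # </think> was injected last step; now terminal
        limit_think_status[i] = status

# Example: a single sequence that hits its limit at step 5
next_tokens = np.array([[100]])
status = np.array([0])
limit_thinking_v1_reference(next_tokens, np.array([5]),
                            np.array([[5]]), status, think_end_id=999)
assert next_tokens[0, 0] == 999 and status[0] == 1

The test methods listed below exercise each of these branches: forced injection, natural closing, the terminal no-op, and the disabled case.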

Changes

Added two new test files with 33 comprehensive test methods (925 lines total):

1. tests/operators/test_limit_thinking_content_length.py

Tests the standard (non-speculative) variants with 16 test methods covering:

  • Normal thinking phase operation
  • Force truncation when step >= max_think_len
  • Natural think_end_id generation by the model
  • Status transitions through all phases
  • Disabled feature handling (negative max_think_len; see the sketch after this list)
  • Terminal status behavior
  • Mixed batch scenarios
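
As an illustration of the disabled-feature case above, a minimal check might look like this (a sketch only, in the style of the example further below, assuming the same module-level imports as the real test files):

def test_disabled_feature_sketch(self):
    """Sketch: a negative max_think_len disables the limit entirely."""
    next_tokens = paddle.to_tensor([[100]], dtype="int64")
    max_think_lens = paddle.to_tensor([-1], dtype="int32")  # feature disabled
    step_idx = paddle.to_tensor([[50]], dtype="int64")      # far past any limit
    limit_think_status = paddle.to_tensor([0], dtype="int32")

    limit_thinking_content_length_v1(
        next_tokens, max_think_lens, step_idx, limit_think_status, 999
    )

    # Nothing changes: the token is kept and the status stays 0.
    assert next_tokens.numpy()[0, 0] == 100
    assert limit_think_status.numpy()[0] == 0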

2. tests/operators/test_speculate_limit_thinking_content_length.py

Tests the speculative decoding variants with 17 test methods covering:

  • Multi-token acceptance and processing
  • Force truncation with accept_num adjustment (sketched after this list)
  • step_idx and seq_lens_decoder updates
  • Zero accept_num early return
  • Sequential token injection for v2 (4-token sequence: \n, </think>, \n, \n)
  • Status transitions through multiple accepted tokens
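
The core truncation step in the speculative path can be sketched in plain Python. The helper below is hypothetical (the real operators mutate paddle tensors in place and also update step_idx and seq_lens_decoder); it only illustrates how accept_num shrinks when the limit is hit partway through the accepted tokens:

def speculative_truncate_sketch(accepted_tokens, accept_num, step_base,
                                max_think_len, think_end_id):
    """Hypothetical reference: cut accepted draft tokens at the thinking limit."""
    for j in range(accept_num):
        if step_base + j >= max_think_len:
            accepted_tokens[j] = think_end_id        # inject </think> here
            return accepted_tokens[:j + 1], j + 1    # drop the remaining drafts
    return accepted_tokens, accept_num

# Four accepted tokens, limit reached at the third one
tokens, n = speculative_truncate_sketch([11, 12, 13, 14], 4,
                                        step_base=6, max_think_len=8,
                                        think_end_id=999)
assert tokens == [11, 12, 999] and n == 3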

Test Coverage

The tests verify:

  • ✅ Correct token replacement when limits are exceeded
  • ✅ Proper status state machine transitions (0→1→2 for v1, 0→1→2→3 for v2; see the note after this list)
  • ✅ Handling of edge cases (disabled feature, terminal states)
  • ✅ Batch processing with sequences in different states
  • ✅ Speculative decoding token truncation and metadata updates
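
For orientation, one plausible reading of the limit_think_status values is sketched below. These labels are inferred from the transitions listed above, not taken from the operator sources in custom_ops/gpu_ops, so treat them as assumptions:

# Assumed meanings of limit_think_status (inferred, not authoritative)
V1_STATUS = {
    0: "thinking in progress",
    1: "</think> force-injected this step",
    2: "thinking phase closed (terminal)",
}
V2_STATUS = {
    0: "thinking in progress",
    1: "started injecting the \\n</think>\\n\\n sequence",
    2: "injection still in progress",
    3: "thinking phase closed (terminal)",
}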

Example Test Case

def test_force_truncation_when_max_think_len_exceeded(self):
    """Test force truncation when step >= max_think_len"""
    next_tokens = paddle.to_tensor([[100], [200]], dtype="int64")
    max_think_lens = paddle.to_tensor([5, 8], dtype="int32")
    step_idx = paddle.to_tensor([[5], [10]], dtype="int64")  # Both reach or exceed their limits
    limit_think_status = paddle.to_tensor([0, 0], dtype="int32")
    think_end_id = 999

    limit_thinking_content_length_v1(
        next_tokens, max_think_lens, step_idx, limit_think_status, think_end_id
    )

    # Verify: tokens replaced with think_end_id, status changed to 1
    assert next_tokens.numpy()[0, 0] == 999  # Replaced
    assert next_tokens.numpy()[1, 0] == 999  # Replaced
    assert limit_think_status.numpy()[0] == 1  # Status updated
    assert limit_think_status.numpy()[1] == 1  # Status updated
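
On a GPU machine with PaddlePaddle and the custom operators installed, the suites can be run in the usual pytest way, for example python -m pytest tests/operators/test_limit_thinking_content_length.py -v (the exact invocation depends on the local setup).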

Verification

  • ✅ All tests follow existing patterns in tests/operators/
  • ✅ Code style compliant (line length < 120, proper formatting)
  • ✅ No security vulnerabilities (CodeQL check passed)
  • ✅ Syntax validated

Note: Tests require a GPU environment with PaddlePaddle and dependencies installed to run.

Related Issue

Closes the issue requesting unit tests for limit_thinking_content_length and speculate_limit_thinking_content_length functions in fastdeploy/model_executor/pre_and_post_process.py.

Original prompt

This section details the original issue you should resolve

<issue_title>Following the unit tests in tests/operators, add unit test files for limit_thinking_content_length and speculate_limit_thinking_content_length in fastdeploy/model_executor/pre_and_post_process.py</issue_title>
<issue_description></issue_description>

<agent_instructions>Read the implementations of the corresponding operators in custom_ops/gpu_ops to understand what they do, then add reasonable unit tests.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

Fixes #4509



@CLAassistant

CLAassistant commented Oct 21, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ yuanlehome
❌ Copilot
You have signed the CLA already but the status is still pending? Let us recheck it.

@paddle-bot

paddle-bot bot commented Oct 21, 2025

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor (External developers) label Oct 21, 2025
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
@yuanlehome yuanlehome marked this pull request as ready for review October 21, 2025 07:31
Copilot AI changed the title from "[WIP] Add unit tests for limit_thinking content length methods" to "Add comprehensive unit tests for limit_thinking_content_length operators" Oct 21, 2025
Copilot AI requested a review from yuanlehome October 21, 2025 07:33
Copilot finished work on behalf of yuanlehome October 21, 2025 07:33
@Jiang-Jia-Jun Jiang-Jia-Jun merged commit 1753913 into develop Oct 21, 2025
14 of 16 checks passed
@yuanlehome yuanlehome deleted the copilot/add-unit-tests-for-limit-thinking branch October 21, 2025 10:59