
Conversation

Contributor

Copilot AI commented Oct 21, 2025

Overview

This PR adds comprehensive unit tests for the limit_thinking_content_length and speculate_limit_thinking_content_length operators that control thinking phase length in model generation.

Background

The operators limit_thinking_content_length_v1, limit_thinking_content_length_v2, speculate_limit_thinking_content_length_v1, and speculate_limit_thinking_content_length_v2 are GPU custom operators designed to limit the length of "thinking" content during model inference. These operators work by:

  • v1 variants: Injecting a </think> token when max_think_len is exceeded
  • v2 variants: Injecting a \n</think>\n\n sequence when max_think_len is exceeded
  • Speculative variants: Handling multiple tokens per step in speculative decoding scenarios

Previously, these operators lacked unit tests, making it difficult to verify their correctness and catch regressions.
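
To make the tested behavior concrete, here is a pure-Python reference sketch of the v1 state machine. It is an illustrative model only, not the CUDA kernel, and the exact transition rules are assumptions inferred from the behavior described above:

import numpy as np

def limit_thinking_v1_reference(next_tokens, max_think_lens, step_idx,
                                limit_think_status, think_end_id):
    """Assumed per-sequence logic of limit_thinking_content_length_v1 (in-place)."""
    for i in range(next_tokens.shape[0]):
        if max_think_lens[i] < 0:
            continue  # negative limit: feature disabled for this sequence
        status = int(limit_think_status[i])
        if status == 2:
            continue  # terminal: thinking phase already closed
        if status == 0:
            if step_idx[i, 0] >= max_think_lens[i]:
                next_tokens[i, 0] = think_end_id  # force-inject </think>
                status = 1
            elif next_tokens[i, 0] == think_end_id:
                status = 2  # the model closed the thinking phase on its own
        elif status == 1:
            status = 2  # </think> was injected last step; now terminal
        limit_think_status[i] = status

# Example: a single sequence that hits its limit at step 5
next_tokens = np.array([[100]])
status = np.array([0])
limit_thinking_v1_reference(next_tokens, np.array([5]),
                            np.array([[5]]), status, think_end_id=999)
assert next_tokens[0, 0] == 999 and status[0] == 1

The test methods listed below exercise each of these branches: forced injection, natural closing, the terminal no-op, and the disabled case.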

Changes

Added two new test files with 33 comprehensive test methods (925 lines total):

1. tests/operators/test_limit_thinking_content_length.py

Tests the standard (non-speculative) variants with 16 test methods covering:

  • Normal thinking phase operation
  • Force truncation when step >= max_think_len
  • Natural think_end_id generation by the model
  • Status transitions through all phases
  • Disabled feature handling (negative max_think_len; see the sketch after this list)
  • Terminal status behavior
  • Mixed batch scenarios
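
As an illustration of the disabled-feature case above, a minimal check might look like this (a sketch only, in the style of the example further below, assuming the same module-level imports as the real test files):

def test_disabled_feature_sketch(self):
    """Sketch: a negative max_think_len disables the limit entirely."""
    next_tokens = paddle.to_tensor([[100]], dtype="int64")
    max_think_lens = paddle.to_tensor([-1], dtype="int32")  # feature disabled
    step_idx = paddle.to_tensor([[50]], dtype="int64")      # far past any limit
    limit_think_status = paddle.to_tensor([0], dtype="int32")

    limit_thinking_content_length_v1(
        next_tokens, max_think_lens, step_idx, limit_think_status, 999
    )

    # Nothing changes: the token is kept and the status stays 0.
    assert next_tokens.numpy()[0, 0] == 100
    assert limit_think_status.numpy()[0] == 0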

2. tests/operators/test_speculate_limit_thinking_content_length.py

Tests the speculative decoding variants with 17 test methods covering:

  • Multi-token acceptance and processing
  • Force truncation with accept_num adjustment (sketched after this list)
  • step_idx and seq_lens_decoder updates
  • Zero accept_num early return
  • Sequential token injection for v2 (4-token sequence: \n, </think>, \n, \n)
  • Status transitions through multiple accepted tokens
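
The core truncation step in the speculative path can be sketched in plain Python. The helper below is hypothetical (the real operators mutate paddle tensors in place and also update step_idx and seq_lens_decoder); it only illustrates how accept_num shrinks when the limit is hit partway through the accepted tokens:

def speculative_truncate_sketch(accepted_tokens, accept_num, step_base,
                                max_think_len, think_end_id):
    """Hypothetical reference: cut accepted draft tokens at the thinking limit."""
    for j in range(accept_num):
        if step_base + j >= max_think_len:
            accepted_tokens[j] = think_end_id        # inject </think> here
            return accepted_tokens[:j + 1], j + 1    # drop the remaining drafts
    return accepted_tokens, accept_num

# Four accepted tokens, limit reached at the third one
tokens, n = speculative_truncate_sketch([11, 12, 13, 14], 4,
                                        step_base=6, max_think_len=8,
                                        think_end_id=999)
assert tokens == [11, 12, 999] and n == 3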

Test Coverage

The tests verify:

  • ✅ Correct token replacement when limits are exceeded
  • ✅ Proper status state machine transitions (0→1→2 for v1, 0→1→2→3 for v2; see the note after this list)
  • ✅ Handling of edge cases (disabled feature, terminal states)
  • ✅ Batch processing with sequences in different states
  • ✅ Speculative decoding token truncation and metadata updates
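
For orientation, one plausible reading of the limit_think_status values is sketched below. These labels are inferred from the transitions listed above, not taken from the operator sources in custom_ops/gpu_ops, so treat them as assumptions:

# Assumed meanings of limit_think_status (inferred, not authoritative)
V1_STATUS = {
    0: "thinking in progress",
    1: "</think> force-injected this step",
    2: "thinking phase closed (terminal)",
}
V2_STATUS = {
    0: "thinking in progress",
    1: "started injecting the \\n</think>\\n\\n sequence",
    2: "injection still in progress",
    3: "thinking phase closed (terminal)",
}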

Example Test Case

def test_force_truncation_when_max_think_len_exceeded(self):
    """Test force truncation when step >= max_think_len"""
    next_tokens = paddle.to_tensor([[100], [200]], dtype="int64")
    max_think_lens = paddle.to_tensor([5, 8], dtype="int32")
    step_idx = paddle.to_tensor([[5], [10]], dtype="int64")  # Both reach or exceed their limits
    limit_think_status = paddle.to_tensor([0, 0], dtype="int32")
    think_end_id = 999

    limit_thinking_content_length_v1(
        next_tokens, max_think_lens, step_idx, limit_think_status, think_end_id
    )

    # Verify: tokens replaced with think_end_id, status changed to 1
    assert next_tokens.numpy()[0, 0] == 999  # Replaced
    assert next_tokens.numpy()[1, 0] == 999  # Replaced
    assert limit_think_status.numpy()[0] == 1  # Status updated
    assert limit_think_status.numpy()[1] == 1  # Status updated
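
On a GPU machine with PaddlePaddle and the custom operators installed, the suites can be run in the usual pytest way, for example python -m pytest tests/operators/test_limit_thinking_content_length.py -v (the exact invocation depends on the local setup).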

Verification

  • ✅ All tests follow existing patterns in tests/operators/
  • ✅ Code style compliant (line length < 120, proper formatting)
  • ✅ No security vulnerabilities (CodeQL check passed)
  • ✅ Syntax validated

Note: Tests require a GPU environment with PaddlePaddle and dependencies installed to run.

Related Issue

Closes the issue requesting unit tests for limit_thinking_content_length and speculate_limit_thinking_content_length functions in fastdeploy/model_executor/pre_and_post_process.py.

Original prompt

This section details the original issue you should resolve

<issue_title>Following the unit tests in tests/operators, add unit test files for limit_thinking_content_length and speculate_limit_thinking_content_length in fastdeploy/model_executor/pre_and_post_process.py</issue_title>
<issue_description></issue_description>

<agent_instructions>Read the implementations of the corresponding operators in custom_ops/gpu_ops to understand what they do, then add reasonable unit tests.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

Fixes #4509



@CLAassistant

CLAassistant commented Oct 21, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ yuanlehome
❌ Copilot
You have signed the CLA already but the status is still pending? Let us recheck it.

@paddle-bot

paddle-bot bot commented Oct 21, 2025

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor (External developers) label Oct 21, 2025
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
@yuanlehome yuanlehome marked this pull request as ready for review October 21, 2025 07:31
Copilot AI changed the title from "[WIP] Add unit tests for limit_thinking content length methods" to "Add comprehensive unit tests for limit_thinking_content_length operators" Oct 21, 2025
Copilot AI requested a review from yuanlehome October 21, 2025 07:33
Copilot finished work on behalf of yuanlehome October 21, 2025 07:33
@Jiang-Jia-Jun Jiang-Jia-Jun merged commit 1753913 into develop Oct 21, 2025
14 of 16 checks passed
@yuanlehome yuanlehome deleted the copilot/add-unit-tests-for-limit-thinking branch October 21, 2025 10:59