branch-4.1: [Improvement](function) support window funnel v2 #61566 #61935

Open
github-actions[bot] wants to merge 1 commit into branch-4.1 from auto-pick-61566-branch-4.1

@github-actions
Contributor

Cherry-picked from #61566

apache/doris-website#3506 documentation update

```sql
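-- Count users whose funnel stopped at step 1 ('view' only).
SELECT COUNT(user_id)
FROM (
    SELECT
        user_id,
        WINDOW_FUNNEL_v1(
            1800,                          -- window length in seconds
            'default',                     -- mode
            event_time,                    -- timestamp column
            event_type = 'view',           -- step 1
            event_type = 'add_to_cart',    -- step 2
            event_type = 'purchase'        -- step 3
        ) AS funnel_step
    FROM user_events
    GROUP BY user_id
) t
WHERE funnel_step = 1;

-- Reported timing: 1.57 sec -> 0.12 sec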
```

<hr>
<h3>Semantic differences between V2 and V1</h3>
<p>V2 behaves identically to V1 in the DEFAULT, INCREASE, and DEDUPLICATION modes; only the FIXED
mode has an intentional semantic change:</p>
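<p><strong>FIXED mode semantic change</strong></p>
<ul>
<li><strong>V1
semantics (physical row adjacency)</strong>: no row that fails to match the conditions may appear between two adjacent matched events in the funnel chain; otherwise the chain breaks.</li>
<li><strong>V2 semantics (event-level continuity)</strong>: the chain stays intact as long as the matched event levels increase consecutively (level
1→2→3→4). Rows that match no condition are never stored in V2, so they do not affect chain continuity.</li>
</ul>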
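<p><strong>Example</strong>: user 100123 has the event sequence <code>login (10:01) → visit (10:02) →
login2 (10:03) → order (10:04) → payment (10:10)</code>, and the funnel conditions are <code>login → visit → order →
payment</code>. Here <code>login2</code> matches no funnel condition:</p>
<ul>
<li>V1 result: 2 (<code>login2</code> breaks physical row adjacency, so the chain breaks after <code>visit</code>)</li>
<li>V2 result: 4 (<code>login2</code> takes no part in matching; the funnel levels 1→2→3→4 increase consecutively, so the chain is complete)</li>
</ul>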
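<p>The FIXED semantics of V2 match business intuition better: users care about whether the funnel steps were completed consecutively, not whether unrelated rows were inserted in between. V1's
physical row adjacency check depends on whether unrelated rows happen to exist in the data, which easily produces unexpected results in real business scenarios.</p>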
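
<p>The FIXED mode difference described above can be sketched in a few lines of Python. This is only an illustration, not the Doris implementation: the event names are English stand-ins for the ones in the example, the helper names (<code>level_of</code>, <code>longest_fixed_chain</code>, <code>fixed_v1</code>, <code>fixed_v2</code>) are hypothetical, and the time window check is deliberately left out so that only the adjacency rule is visible.</p>

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical English stand-ins for the funnel conditions in the example.
FUNNEL = ["login", "visit", "order", "payment"]

# (time, event) rows for user 100123, already sorted by time.
ROWS = [
    ("10:01", "login"),
    ("10:02", "visit"),
    ("10:03", "login2"),  # matches no funnel condition
    ("10:04", "order"),
    ("10:10", "payment"),
]


def level_of(event: str) -> Optional[int]:
    """Return the 1-based funnel level an event matches, or None."""
    return FUNNEL.index(event) + 1 if event in FUNNEL else None


def longest_fixed_chain(levels: List[Optional[int]]) -> int:
    """Longest run that starts at level 1 and climbs by exactly one per entry."""
    best = 0
    for start, level in enumerate(levels):
        if level != 1:
            continue
        reached = 1
        for nxt in levels[start + 1:]:
            if nxt != reached + 1:
                break  # any other entry, including None, breaks the chain
            reached += 1
        best = max(best, reached)
    return best


def fixed_v1(rows: Sequence[Tuple[str, str]]) -> int:
    """V1-style: every physical row takes part, so unmatched rows break the chain."""
    return longest_fixed_chain([level_of(event) for _, event in rows])


def fixed_v2(rows: Sequence[Tuple[str, str]]) -> int:
    """V2-style: unmatched rows are dropped before the chain is checked."""
    matched = [lvl for lvl in (level_of(event) for _, event in rows) if lvl is not None]
    return longest_fixed_chain(matched)


print(fixed_v1(ROWS), fixed_v2(ROWS))
```

<p>Under these assumptions the sketch returns 2 for the V1-style check and 4 for the V2-style check, matching the results listed for user 100123.</p>
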
<p><strong>Affected test files</strong>:</p>

File | Change | Reason
-- | -- | --
regression-test/data/nereids_p0/aggregate/window_funnel.out | window_funnel_fixed1: user 100123 changes from 2→4 | FIXED mode semantic change
regression-test/data/nereids_p0/sql_functions/aggregate_functions/test_aggregate_window_functions.out | agg_window_window_funnel: user 100123 changes from 2→4 (×5 rows) | Same as above (window function form; one output row per partition row)



This pull request introduces a new implementation of the `window_funnel`
aggregate function, called `window_funnel_v2`, which is designed to be
more memory efficient by only storing matched events as (timestamp,
event_index) pairs. The changes also rename the original implementation
to `window_funnel_v1` and update function registrations and aliases
accordingly. Additionally, the front-end and test files are updated to
support and validate the new implementation.
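
To make the memory argument concrete, here is a minimal Python sketch of reducing input rows to (timestamp, event_index) pairs. It is not the BE code: `collect_matched`, the 0-based index, and the choice to emit one pair per matched condition are all assumptions made for illustration, and the sample rows are invented.

```python
from datetime import datetime
from typing import Callable, List, Sequence, Tuple

Row = Tuple[datetime, str]            # (event_time, event_type)
Condition = Callable[[str], bool]     # one funnel condition


def collect_matched(rows: Sequence[Row],
                    conditions: Sequence[Condition]) -> List[Tuple[datetime, int]]:
    """Keep only matched rows as (timestamp, event_index) pairs, sorted by time."""
    pairs: List[Tuple[datetime, int]] = []
    for ts, event_type in rows:
        for index, cond in enumerate(conditions):
            if cond(event_type):
                # Assumption: a row yields one pair per condition it matches.
                pairs.append((ts, index))
    # Rows that match no condition never reach `pairs`, so they cost no memory.
    pairs.sort()
    return pairs


# Conditions mirroring the query in this PR description.
conditions = [
    lambda e: e == "view",
    lambda e: e == "add_to_cart",
    lambda e: e == "purchase",
]

# Invented sample rows; "scroll" matches no condition.
rows = [
    (datetime(2026, 3, 31, 10, 0, 0), "view"),
    (datetime(2026, 3, 31, 10, 0, 5), "scroll"),
    (datetime(2026, 3, 31, 10, 1, 0), "purchase"),
]

pairs = collect_matched(rows, conditions)
print(pairs)
```

In this sketch three input rows shrink to two stored pairs; the more rows fall outside the funnel conditions, the larger the saving.
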

The most important changes are:

### Backend (BE) Function Implementation and Registration

* Added a new file `aggregate_function_window_funnel_v2.cpp`
implementing `window_funnel_v2`, which only supports DateTime types for
the timestamp argument and is registered with the factory.
* Renamed the original `window_funnel` function to `window_funnel_v1` in
the registration, and set up an alias so that `window_funnel` now points
to `window_funnel_v1` for backward compatibility.

### Frontend (FE) Function Classes and Registration

* Added a new class `WindowFunnelV2` in the FE codebase, with signature
checks and visitor support, to represent the new aggregate function.
* Updated the function registration so that `window_funnel` now points
to `WindowFunnelV2`, and the old implementation is available as
`window_funnel_v1`.

### Test and Output Updates

* Added a new regression test output file for `window_funnel_v2` with
various test cases.
* Updated existing test output files to reflect the new behavior and
results of the `window_funnel` function, which now uses the v2
implementation by default.

### Miscellaneous

* Made `string_to_window_funnel_mode` inline for potential performance
improvement.

github-actions bot requested a review from yiguolei as a code owner on March 31, 2026 07:27
@Thearas
Contributor

Thearas commented Mar 31, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Mar 31, 2026
@Thearas
Contributor

Thearas commented Mar 31, 2026

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 39.02% (16/41) 🎉
Increment coverage report
Complete coverage report

github-actions bot added the `approved` label (Indicates a PR has been approved by one committer.) on Mar 31, 2026
@github-actions
Contributor Author

PR approved by at least one committer and no changes requested.

@github-actions
Contributor Author

PR approved by anyone and no changes requested.

@yiguolei
Contributor

yiguolei commented Apr 1, 2026

skip buildall

Labels

approved (Indicates a PR has been approved by one committer.), reviewed

5 participants