Using taobao dataset and trying to do a JOIN; BE just crashes #41182

Open
alberttwong opened this issue Feb 19, 2024 · 2 comments
Labels
type/bug Something isn't working

Comments

@alberttwong
Contributor

alberttwong commented Feb 19, 2024

I am using the demo at https://forum.starrocks.io/t/retail-ecommerce-funnel-analysis-demo-with-1-million-members-and-87-million-record-dataset-using-starrocks/269 together with https://github.com/StarRocks/demo/tree/master/documentation-samples/hudi

Running this version:

mysql> select current_version();
+-------------------+
| current_version() |
+-------------------+
| 3.2.2-269e832     |
+-------------------+
1 row in set (1.95 sec)

Creating the item table:

create table item ( ItemID bigint(20), Name String);
insert into item(ItemID, name) select distinct ItemID, concat("item ", ItemID) from user_behavior;

Now running the JOIN query. There is no message from the BE; the BE container just fails.

with tmp1 as (
  with tmp as (
    select 
      ItemID, 
      t.level as level, 
      count(UserID) as res 
    from 
      (
        select 
          ItemID, 
          UserID, 
          window_funnel(
            1800, 
            timestamp, 
            0, 
            [BehaviorType = 'pv', 
            BehaviorType ='buy' ]
          ) as level 
        from 
          user_behavior 
        where timestamp >= '2017-12-02 00:00:00' 
            and timestamp <= '2017-12-02 23:59:59'
        group by 
          ItemID, 
          UserID
      ) as t 
    where 
      t.level > 0 
    group by 
      t.ItemID, 
      t.level 
  ) 
  select 
    tmp.ItemID, 
    tmp.level, 
    sum(tmp.res) over (
      partition by tmp.ItemID 
      order by 
        tmp.level rows between current row 
        and unbounded following
    ) as retention 
  from 
    tmp
) 
select 
  tmp1.ItemID, 
  i.name,
  tmp1.level, 
  tmp1.retention / last_value(tmp1.retention) over(
    partition by tmp1.ItemID 
    order by 
      tmp1.level desc rows between current row 
      and 1 following
  ) as retention_ratio 
from 
  tmp1 
JOIN item i ON tmp1.ItemID = i.ItemID
order by 
  tmp1.level desc, 
  retention_ratio 
limit 
  10;

alberttwong added the type/bug label on Feb 19, 2024
@alberttwong
Contributor Author

The same query works fine in CelerData Cloud 3.1.3.

@mofeiatwork
Contributor

Hey @alberttwong, please attach the crash reason from the be.out file; otherwise we cannot reproduce or diagnose the crash.
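
For anyone gathering what is requested here, a minimal sketch of pulling the tail of be.out so it can be pasted into the issue. The helper name `tail_lines`, the default of 200 lines, and the fallback path `be.out` are all assumptions for illustration; where be.out actually lives depends on how the BE container was deployed.

```python
import sys
from collections import deque
from pathlib import Path


def tail_lines(path, n=200):
    """Return the last n lines of a text file as a list of strings."""
    # errors="replace" keeps reading if the log holds bytes that are not valid UTF-8.
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        # deque with maxlen keeps only the final n lines while streaming the file.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


if __name__ == "__main__":
    # Hypothetical fallback path; pass the real location of be.out as the first argument.
    log_path = Path(sys.argv[1] if len(sys.argv) > 1 else "be.out")
    if log_path.is_file():
        print("\n".join(tail_lines(log_path)))
    else:
        print(f"{log_path} not found; pass the path to be.out as an argument")
```

Streaming through a bounded deque avoids loading the whole log into memory, which helps if be.out has grown large across restarts.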
