Skip to content

Using UNION ALL for self-joins may produce inaccurate results. #59203

@crossoverJie

Description

@crossoverJie

When performing a self-join query, the results are sometimes incorrect.

Steps to reproduce the behavior (Required)

CREATE TABLE `par_tbl1` (
  `datekey` datetime NOT NULL COMMENT "",
  `item_id` varchar(65533) NOT NULL COMMENT "",
  `v1` int(11) NULL COMMENT ""
) ENGINE=OLAP 
PRIMARY KEY(`datekey`, `item_id`)
COMMENT "OLAP"
PARTITION BY date_trunc('day', `datekey`)
DISTRIBUTED BY HASH(`datekey`, `item_id`)

INSERT INTO par_tbl1 (datekey, item_id, v1) VALUES ('2025-01-02 00:00:00', '1', 2);
INSERT INTO par_tbl1 (datekey, item_id, v1) VALUES ('2025-01-01 00:00:00', '1', 2);

# union all
select date_trunc("day", a.datekey) as p_time, sum(a.v1) as value
from par_tbl1 a
group by p_time, a.item_id
union all
select date_trunc('day', date_add(b.datekey, INTERVAL 1 DAY)) AS p_time, sum(b.v1) as value
from par_tbl1 b
group by p_time, item_id
;

Expected behavior (Required)

p_time value
2025-01-01 00:00:00 2
2025-01-02 00:00:00 2
2025-01-02 00:00:00 2
2025-01-03 00:00:00 2

Real behavior (Required)

Image
p_time value
2025-01-01 00:00:00 2
2025-01-02 00:00:00 2
2025-01-02 00:00:00 2

A piece of data is missing:

p_time value
2025-01-03 00:00:00 2

select date_trunc("day", a.datekey) as p_time, sum(a.v1) as value
from par_tbl1 a
group by p_time, a.item_id
Image
select date_trunc('day', date_add(b.datekey, INTERVAL 1 DAY)) AS p_time, sum(b.v1) as value
from par_tbl1 b
group by p_time, item_id
Image

If I execute these two SQL statements separately by hand, the results are correct, but if I use union all, the result is incorrect.


Image

If I manually clear the data and then rewrite it, the result is correct.

StarRocks version (Required)

  • 3.3.x
  • 3.4.x

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions