Skip to content

approx_percentile_cont return wrong result when scan multi parquet files. #3140

@Ted-Jiang

Description

@Ted-Jiang

Describe the bug
I found in our env approx_percentile_cont return different result when running the same query.

❯ create external table line STORED AS PARQUET LOCATION '/Users/yangjiang/test-data/tpch-1g-oneFile' ;

❯  select o_orderstatus, approx_percentile_cont(o_totalprice, 0.5) from order group by o_orderstatus  order by 1 limit 5 ;

❯ create external table order stored as parquet location '/Users/yangjiang/test-data/tpch-1g/orders';
0 rows in set. Query took 0.043 seconds.
❯   select o_orderstatus, approx_percentile_cont(o_totalprice, 0.5) from order group by o_orderstatus  order by 1 limit 5 ;
+---------------+-------------------------------------------------------+
| o_orderstatus | APPROXPERCENTILECONT(order.o_totalprice,Float64(0.5)) |
+---------------+-------------------------------------------------------+
| F             | 143401.16095215187                                    |
| O             | 142955.62738411513                                    |
| P             | 181527.60330546615                                    |
+---------------+-------------------------------------------------------+
3 rows in set. Query took 0.719 seconds.
❯   select o_orderstatus, approx_percentile_cont(o_totalprice, 0.5) from order group by o_orderstatus  order by 1 limit 5;

+---------------+-------------------------------------------------------+
| o_orderstatus | APPROXPERCENTILECONT(order.o_totalprice,Float64(0.5)) |
+---------------+-------------------------------------------------------+
| F             | 143281.01048942824                                    |
| O             | 143237.76755123492                                    |
| P             | 181240.3599115534                                     |
+---------------+-------------------------------------------------------+
3 rows in set. Query took 0.684 seconds.
❯   select o_orderstatus, approx_percentile_cont(o_totalprice, 0.5) from order group by o_orderstatus  order by 1 limit 5;

+---------------+-------------------------------------------------------+
| o_orderstatus | APPROXPERCENTILECONT(order.o_totalprice,Float64(0.5)) |
+---------------+-------------------------------------------------------+
| F             | 143462.16658943877                                    |
| O             | 142949.42279252067                                    |
| P             | 181449.91485153142                                    |
+---------------+-------------------------------------------------------+
3 rows in set. Query took 0.595 seconds.
❯   select o_orderstatus, approx_percentile_cont(o_totalprice, 0.5) from order group by o_orderstatus  order by 1 limit 5;

+---------------+-------------------------------------------------------+
| o_orderstatus | APPROXPERCENTILECONT(order.o_totalprice,Float64(0.5)) |
+---------------+-------------------------------------------------------+
| F             | 143033.71820461476                                    |
| O             | 143033.13028498186                                    |
| P             | 181449.54772100857                                    |
+---------------+-------------------------------------------------------+
3 rows in set. Query took 0.705 seconds.
❯


 pwd
/Users/yangjiang/test-data/tpch-1g/orders
yangjiang@LM-SHC-15009782 orders % ll
total 123520
-rw-r--r--  1 yangjiang  110538165     0B May 10 11:58 _SUCCESS
-rw-r--r--  1 yangjiang  110538165    11M May 10 11:58 part-00000-e87df013-b3f8-493f-93c2-3da94f34e357-c000.snappy.parquet
-rw-r--r--  1 yangjiang  110538165    11M May 10 11:58 part-00001-e87df013-b3f8-493f-93c2-3da94f34e357-c000.snappy.parquet
-rw-r--r--  1 yangjiang  110538165    11M May 10 11:58 part-00002-e87df013-b3f8-493f-93c2-3da94f34e357-c000.snappy.parquet
-rw-r--r--  1 yangjiang  110538165    11M May 10 11:58 part-00003-e87df013-b3f8-493f-93c2-3da94f34e357-c000.snappy.parquet
-rw-r--r--  1 yangjiang  110538165    11M May 10 11:58 part-00004-e87df013-b3f8-493f-93c2-3da94f34e357-c000.snappy.parquet
yangjiang@LM-SHC-15009782 orders %

To Reproduce
Steps to reproduce the behavior:

Expected behavior
A clear and concise description of what you expected to happen.

Additional context
Add any other context about the problem here.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions