Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sumMap result value with and without max_bytes_before_external_group_by query setting #16587

Closed
mkabilov opened this issue Nov 1, 2020 · 7 comments

Comments

@mkabilov
Copy link
Contributor

mkabilov commented Nov 1, 2020

ClickHouse server version 20.10.2 revision 54441

let's create a table:

create table my_table(
    `col1` DateTime, 
    `col2` Int64
) engine = MergeTree() 
order by col1 
settings index_granularity = 8192;

load test_data.csv.gz (2.4 Mb) into it

And run two queries:

select sumMap(sm) from (select sumMap([1],[1]) as sm from my_table group by col1, col2);

┌─sumMap(sm)──────┐
│ ([1],[5000000]) │
└─────────────────┘
select sumMap(sm) from (select sumMap([1],[1]) as sm from my_table group by col1, col2) 
settings max_bytes_before_external_group_by = 100000;

┌─sumMap(sm)──────┐
│ ([1],[1782592]) │
└─────────────────┘

— result values should match.

@mkabilov mkabilov added the bug Confirmed user-visible misbehaviour in official release label Nov 1, 2020
@den-crane
Copy link
Contributor

den-crane commented Nov 1, 2020

20.11.1.5027.

cat test_data.csv.gz|gunzip |clickhouse-client -q 'insert into my_table format TSV'

select sumMap(sm) from (select sumMap([1],[1]) as sm from my_table group by col1, col2) 
settings max_bytes_before_external_group_by = 100000;
┌─sumMap(sm)──────┐
│ ([1],[1785408]) │
└─────────────────┘

┌─sumMap(sm)──────┐
│ ([1],[1786176]) │
└─────────────────┘

┌─sumMap(sm)──────┐
│ ([1],[1784640]) │
└─────────────────┘



19.13.7.57

select sumMap(sm.1, sm.2) from (select sumMap([1],[1]) as sm from my_table group by col1, col2)  settings max_bytes_before_external_group_by = 100000;
┌─sumMap(tupleElement(sm, 1), tupleElement(sm, 2))─┐
│ ([1],[1788224])                                  │
└──────────────────────────────────────────────────┘

┌─sumMap(tupleElement(sm, 1), tupleElement(sm, 2))─┐
│ ([1],[1787712])                                  │
└──────────────────────────────────────────────────┘

┌─sumMap(tupleElement(sm, 1), tupleElement(sm, 2))─┐
│ ([1],[1786944])                                  │
└──────────────────────────────────────────────────┘

@KochetovNicolai
Copy link
Member

SELECT sumMap(sm)
FROM 
(
    SELECT sumMap([1], [1]) AS sm
    FROM remote('127.0.0.{1,1}', default.my_table)
    GROUP BY
        col1,
        col2
)

Query id: 385ec036-1adf-482f-964a-8c442a48cea4

┌─sumMap(sm)───────┐
│ ([1],[10000000]) │
└──────────────────┘

1 rows in set. Elapsed: 0.190 sec. Processed 10.00 million rows, 120.00 MB (52.61 million rows/s., 631.33 MB/s.) 



SELECT sumMap(sm)
FROM 
(
    SELECT sumMap([1], [1]) AS sm
    FROM remote('127.0.0.{1,2}', default.my_table)
    GROUP BY
        col1,
        col2
)

Query id: 61ae65a8-eb78-491a-9b24-08fe3dcabdaf

┌─sumMap(sm)──────┐
│ ([1],[6768000]) │
└─────────────────┘

looks like something is wrong with sumMap serialization

@KochetovNicolai
Copy link
Member

CREATE TABLE my_table3
(
    `col1` DateTime,
    `col2` Int64
)
ENGINE = MergeTree()
ORDER BY col1
SETTINGS index_granularity = 8192

Query id: 837aa6a7-b03f-4060-a565-af2d53ab4f46

Ok.

0 rows in set. Elapsed: 0.024 sec. 


 :) insert into my_table3 select '2020-10-26 00:00:00', 70724110 from numbers(300)

INSERT INTO my_table3 SELECT
    '2020-10-26 00:00:00',
    70724110
FROM numbers(300)

Query id: 57c13a1b-d0be-45e8-aaa3-673f1a876e8f

Ok.

0 rows in set. Elapsed: 0.003 sec. 

 :) select sumMap(sm) from (select sumMap([1],[1]) as sm from remote('127.0.0.{1,2}', default.my_table3) group by col1, col2);

SELECT sumMap(sm)
FROM 
(
    SELECT sumMap([1], [1]) AS sm
    FROM remote('127.0.0.{1,2}', default.my_table3)
    GROUP BY
        col1,
        col2
)

Query id: 013399ed-af8c-4147-98d0-941ef0dc50ad

┌─sumMap(sm)──┐
│ ([1],[344]) │
└─────────────┘

1 rows in set. Elapsed: 0.009 sec. 

@KochetovNicolai
Copy link
Member

Well, it is just an integer overflow which happens when sumMap state is serialized. Cause value type is UInt8.

SELECT sumMap(sm)
FROM 
(
    SELECT sumMap([1], [toUInt64(1)]) AS sm
    FROM my_table
    GROUP BY
        col1,
        col2
)
SETTINGS max_bytes_before_external_group_by = 100000

Query id: 48e55e7f-01b8-4df3-93ff-e6fd4a425aa0

┌─sumMap(sm)──────┐
│ ([1],[5000000]) │
└─────────────────┘

I am afraid it is difficult to change without breaking compatibility

@KochetovNicolai
Copy link
Member

For now, it was decided to postpone this issue until #12552 is implemented.

@akuzm
Copy link
Contributor

akuzm commented Nov 13, 2020

If we are postponing it, it probably makes sense to leave the issue open.

@den-crane den-crane added usability and removed bug Confirmed user-visible misbehaviour in official release labels Nov 13, 2020
@den-crane
Copy link
Contributor

den-crane commented Nov 13, 2020

yes, toUInt64 solves the issue

select sumMap(sm) from (select sumMap( ([1],[toUInt64(1)]) ) as sm from my_table group by col1, col2) 
settings max_bytes_before_external_group_by = 100000;
┌─sumMap(sm)──────┐
│ ([1],[5000000]) │
└─────────────────┘

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants