Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with log2m >= 18 #5

Closed
metdos opened this issue Apr 5, 2013 · 3 comments
Closed

Problem with log2m >= 18 #5

metdos opened this issue Apr 5, 2013 · 3 comments

Comments

@metdos
Copy link
Contributor

metdos commented Apr 5, 2013

Great extension, thanks for making it open-source.

I came across with this issue while playing around with it:

When I use bigger values than 18 for log2m parameter in hll_add_agg function it crashes. Here is how to replicate problem:
I tried this on PostgreSQL 9.2.4

-- First create dataset:
wget http://examples.citusdata.com/tpch_2_13_0.tar.gz
tar xvfz tpch_2_13_0.tar.gz
cd tpch_2_13_0
gmake

./dbgen -f -s 1 -T L

-- Create table and load data
CREATE TABLE LINEITEM
(
l_orderkey BIGINT not null,
l_partkey INTEGER not null,
l_suppkey INTEGER not null,
l_linenumber INTEGER not null,
l_quantity DECIMAL(15,2) not null,
l_extendedprice DECIMAL(15,2) not null,
l_discount DECIMAL(15,2) not null,
l_tax DECIMAL(15,2) not null,
l_returnflag CHAR(1) not null,
l_linestatus CHAR(1) not null,
l_shipdate DATE not null,
l_commitdate DATE not null,
l_receiptdate DATE not null,
l_shipinstruct CHAR(25) not null,
l_shipmode CHAR(10) not null,
l_comment VARCHAR(44) not null
);

COPY lineitem FROM 'YOUR_PATH/tpch_2_13_0/lineitem.tbl' WITH DELIMITER '|';

-- This query works without a problem:
SELECT
hll_cardinality(hll_add_agg(hll_hash_bigint(l_orderkey, 0), 17))
FROM
lineitem;

-- This query crashes:
SELECT
hll_cardinality(hll_add_agg(hll_hash_bigint(l_orderkey, 0), 18))
FROM
lineitem;

@ghost
Copy link

ghost commented Apr 11, 2013

Hi Metin,

This is a bug on our end, though it's one we'd need to seriously consider "fixing". Currently, we allocate 128KB as an internal buffer for each hll instance. At log2m=18, regwidth=5, the buffer would have to be at least (2^18 * 5 / 8) = 164KB. According to the docs, we support log2m up to 31, but it would be insane to allocate nearly a GB of internal buffer space per hll to accomodate this.

I imagine the only real solution to this problem would be to allow for a compile-time flag indicating the maximum log2m to be supported (since I believe PG only allows a fixed size for the backing memory buffer.) Is there a pressing business need for this for you or can we just change the docs to say that the maximum log2m is 17?

@ghost
Copy link

ghost commented Apr 17, 2013

@metdos Hi, haven't heard back from you. Is this a pressing issue for you (that is, do you require log2m > 17?) or is this just experimentation?

If it's simply experimentation, I'm inclined to just change the documentation to say the max is 17 and be done with it. We haven't found anyone else who uses a log2m > 16, so I'd love to be able to defer addressing this issue.

I should also correct what I said before:

(since I believe PG only allows a fixed size for the backing memory buffer.)

This is not correct. A pointer is possible, but we suspect that it would cause a significant performance degradation compared to the union method currently implemented.

@metdos
Copy link
Contributor Author

metdos commented Apr 17, 2013

Hey Timon, my apologies for the late reply. @TimonK

We were considering possible use cases and currently there is not any urgent business need. You can change the documentation. If a need for it appears, I will ping you or reopen issue.

Thanks a lot.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant