How to disable fwd and inverted index on a BYTES metric column. #11659

Closed
t0mpere opened this issue Sep 22, 2023 · 3 comments

Comments

@t0mpere
Contributor

t0mpere commented Sep 22, 2023

Hello, I'm struggling to follow the doc here on disabling the forward and inverted indexes for a BYTES column on an existing table. It says that I can't disable the forward index without having the inverted index enabled, although the doc also says that it's possible to disable both. What am I missing here?

My issue is that the column has a very high cardinality and is in BYTES format, since it contains an HLL intermediate state. noDictionary is not an option, since 30 MB of raw data creates a 2 GB forward index (just for this column), and noDictionaryConfig does nothing when I set LZ4 compression on the column. On the other hand, raw or invertedIndex hover around 1 GB per 30 MB of data, which is still huge considering that the column is only used in an aggregation function.

@Jackie-Jiang
Contributor

Trying to understand the problem. What do you store in this column? If you disable both fwd and inverted index, the column won't be queryable, in which case you should just remove the column.

@t0mpere
Contributor Author

t0mpere commented Sep 25, 2023

OK, so this was a misunderstanding on my side. I thought both indexes could be disabled and the column could still be used in aggregation functions. I'm trying to reduce the space used by this column. It has very high cardinality and contains a 12-bit HLL state, something like 0000000c00000aac000000000<many zeros...>00. What do you think would be the best configuration for a column like this?
What I've tried so far, for 30 MB of data:

  • no dictionary = 2 GB fwd index
  • raw = ~1 GB fwd index + dictionary
  • inverted = ~1 GB fwd index

I'm wondering if I'm doing something wrong here. How would I enable LZ4 compression on the forward index? "noDictionaryConfig": {"hll_column": "LZ4"} seems to have no effect after a reload. Do I have to refresh all segments to make it work? Is it because it's a metric column? What if I switch to a dimension column, would that enable LZ4 by default? (That seems to be the case from the documentation.) Also, is there a way to check whether compression is actually used?
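
For readers sizing a similar column, here is a rough sketch of how the quoted prefix 0000000c00000aac can be read. It assumes the state starts with two big-endian 32-bit integers, log2m followed by the byte length of the register array; that layout is an assumption, not something confirmed in this thread. The all-zero register payload is hypothetical, and zlib is used only as a standard-library stand-in for LZ4 to show how compressible a sparse state is.

```python
import struct
import zlib

# Prefix of the serialized state quoted in the comment above.
HEADER_HEX = "0000000c00000aac"


def decode_hll_header(header_hex: str) -> tuple[int, int]:
    """Read two big-endian 32-bit ints: assumed log2m and register byte count."""
    log2m, register_bytes = struct.unpack(">ii", bytes.fromhex(header_hex))
    return log2m, register_bytes


log2m, register_bytes = decode_hll_header(HEADER_HEX)
total_bytes = 8 + register_bytes  # 8-byte header plus register payload

# Hypothetical near-empty state: the header followed by all-zero registers.
state = bytes.fromhex(HEADER_HEX) + bytes(register_bytes)
compressed = zlib.compress(state)  # zlib is a stand-in for LZ4 here

print(f"log2m={log2m}, register_bytes={register_bytes}, total_bytes={total_bytes}")
print(f"zlib-compressed size: {len(compressed)} bytes")
```

Under that assumption every value is a few kilobytes wide regardless of how many registers are set, which is why a compression codec on the raw forward index matters so much for this column.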

@t0mpere
Contributor Author

t0mpere commented Sep 25, 2023

OK, found the answer. To enable LZ4 on noDictionary columns I needed to re-ingest the data. I think I'll update the doc on this, since there's not a lot about it.
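
As a minimal sketch of the configuration discussed in this thread, the snippet below builds the relevant table config fragment and prints it as JSON. The "noDictionaryConfig" entry and the column name hll_column come from the comments above; listing the column under "noDictionaryColumns" and nesting both keys under "tableIndexConfig" are assumptions about the rest of the config. Per the answer above, a reload alone did not apply the codec, and the data had to be re-ingested.

```python
import json

COLUMN = "hll_column"  # column name taken from the thread

# Sketch of the index section of a table config (assumed layout).
table_index_config = {
    "noDictionaryColumns": [COLUMN],        # assumed: column stored raw, without a dictionary
    "noDictionaryConfig": {COLUMN: "LZ4"},  # from the thread: compression for the raw column
}

fragment = json.dumps({"tableIndexConfig": table_index_config}, indent=2)
print(fragment)
```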
