Support cached columns #120

Open
broneill opened this issue Jun 10, 2023 · 1 comment

Comments

@broneill
Member

Constructing strings from UTF-8 bytes is expensive. Create an annotation which allows a column's values to be cached using either "soft" or "weak" references, with soft as the default. Document that caching is best suited to columns with low cardinality, because of the potential GC overhead.
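To make the proposal concrete, here is a rough sketch of what such an annotation might look like. The names `Cached`, `CacheStrength`, and the `Country` row interface are all hypothetical and chosen only for illustration; nothing here is an existing API of the project.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Hypothetical reference strength for cached column values. */
enum CacheStrength { SOFT, WEAK }

/** Hypothetical annotation; its name and its "value" element are assumptions. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Cached {
    /** Soft is the default, as the issue proposes. */
    CacheStrength value() default CacheStrength.SOFT;
}

/** Hypothetical row interface showing how the annotation might be applied. */
interface Country {
    @Cached                      // soft by default
    String name();

    @Cached(CacheStrength.WEAK)  // explicitly weak
    String region();
}
```

Both annotated columns are ones where a small set of distinct values would be expected, which is the low cardinality case the documentation should recommend.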

The cache itself can be simple: it has no maximum capacity and it doesn't perform any LRU reordering. A single global cache should work fine, but it needs to support high concurrency.
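A minimal sketch of such a cache, assuming it is keyed by the column's UTF-8 bytes and built on `ConcurrentHashMap`. The class name `ColumnCache`, the `get` signature, and the use of `ByteBuffer` as a content-comparing key are assumptions made for illustration, not a proposed implementation.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical unbounded cache: no capacity limit and no LRU ordering. */
final class ColumnCache {
    /** Single global instance shared by all cached columns. */
    static final ColumnCache GLOBAL = new ColumnCache();

    private final ConcurrentHashMap<ByteBuffer, Reference<String>> map = new ConcurrentHashMap<>();
    private final ReferenceQueue<String> queue = new ReferenceQueue<>();

    /** A reference that remembers its key so a cleared entry can be removed. */
    private interface Keyed { ByteBuffer key(); }

    private static final class Soft extends SoftReference<String> implements Keyed {
        private final ByteBuffer key;
        Soft(ByteBuffer key, String s, ReferenceQueue<String> q) { super(s, q); this.key = key; }
        public ByteBuffer key() { return key; }
    }

    private static final class Weak extends WeakReference<String> implements Keyed {
        private final ByteBuffer key;
        Weak(ByteBuffer key, String s, ReferenceQueue<String> q) { super(s, q); this.key = key; }
        public ByteBuffer key() { return key; }
    }

    /** Returns the cached string for the given UTF-8 bytes, decoding only on a miss. */
    String get(byte[] utf8, boolean weak) {
        purge();
        Reference<String> ref = map.get(ByteBuffer.wrap(utf8));
        String cached = (ref == null) ? null : ref.get();
        if (cached != null) return cached;

        // Miss: decode once, copy the bytes so the key is stable, then publish.
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        ByteBuffer key = ByteBuffer.wrap(utf8.clone());
        Reference<String> fresh = weak ? new Weak(key, decoded, queue)
                                       : new Soft(key, decoded, queue);
        while (true) {
            Reference<String> existing = map.putIfAbsent(key, fresh);
            if (existing == null) return decoded;          // this thread won the race
            String other = existing.get();
            if (other != null) return other;               // another thread published first
            if (map.replace(key, existing, fresh)) return decoded; // swap out a cleared entry
        }
    }

    /** Removes entries whose strings were reclaimed by the garbage collector. */
    private void purge() {
        Reference<? extends String> ref;
        while ((ref = queue.poll()) != null) {
            map.remove(((Keyed) ref).key(), ref);
        }
    }
}
```

The only eviction here is driven by the garbage collector clearing references, which is why the cache suits columns with few distinct values: every distinct value adds a map entry and a reference object for the collector to track.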

@broneill
Member Author

In addition to referring to strings, the cache entries also need to refer to the UTF-8 encoded bytes. This is necessary for making quick comparisons, but it also means that the cache occupies much more memory than might be expected. All the more reason to document that the caching feature should only be used for columns with low cardinality.
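The reason for keeping the encoded bytes can be sketched as follows: a lookup can hash and compare a byte range of the encoded row directly, and only a miss has to decode. The `CacheEntry` class, its fields, and the hash function are hypothetical, and the range-based `Arrays.equals` overload requires Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical cache entry holding both the decoded string and its UTF-8 bytes. */
final class CacheEntry {
    final byte[] utf8;   // private copy of the encoded form, kept for comparisons
    final String value;  // decoded once, when the entry is created

    CacheEntry(byte[] src, int offset, int length) {
        utf8 = Arrays.copyOfRange(src, offset, offset + length);
        value = new String(utf8, StandardCharsets.UTF_8);
    }

    /** Compares against a byte range in place; no string is constructed. */
    boolean matches(byte[] src, int offset, int length) {
        return Arrays.equals(utf8, 0, utf8.length, src, offset, offset + length);
    }

    /** Hashes a byte range, so a lookup never needs to decode before probing. */
    static int hash(byte[] src, int offset, int length) {
        int h = 1;
        for (int i = offset; i < offset + length; i++) {
            h = 31 * h + src[i];
        }
        return h;
    }
}
```

Each entry therefore holds two copies of the value, one as a `String` and one as a `byte[]`, each with its own object overhead, which is where the extra memory cost comes from.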
