Lazy loading of primary key and/or data parts. #43424

Closed
alexey-milovidov opened this issue Nov 21, 2022 · 2 comments · Fixed by #49351

Comments

@alexey-milovidov
Member

Use case

We want to implement zero-cost attaching of example datasets.
But loading the list of parts and their primary keys takes time.

Describe the solution you'd like

Variant 1:

For a subset of tables (like the tables on remote filesystems), don't wait for loading before accepting connections and continue loading in the background after server startup. Queries to the table would wait for loading, potentially rethrowing an exception if the loading was unsuccessful.

This solution also automatically solves loading dependencies between tables and dictionaries by implicitly arranging them in the graph of dependencies.
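A rough sketch of how Variant 1 could behave, in Python rather than ClickHouse's C++ and not taken from the actual implementation. `LazyTable`, `wait_loaded`, and the `loader` callback are hypothetical names invented for illustration. Loading is submitted to a background pool at construction, queries block until it finishes, and a failed load is rethrown to every query. Waiting on dependencies inside the load task is what orders the loads implicitly.

```python
from concurrent.futures import ThreadPoolExecutor


class LazyTable:
    """Hypothetical table wrapper: loading runs in the background, queries wait for it."""

    def __init__(self, name, loader, executor, depends_on=()):
        self.name = name
        self._loader = loader
        self._depends_on = tuple(depends_on)
        # Submit immediately; the caller (server startup) does not block here.
        self._future = executor.submit(self._load)

    def _load(self):
        # Waiting on dependencies first orders the loads like a dependency graph.
        # Dependencies are always submitted before their dependents, so a FIFO
        # pool cannot deadlock on an acyclic graph.
        for dep in self._depends_on:
            dep.wait_loaded()
        return self._loader(self.name)

    def wait_loaded(self):
        # Future.result() blocks until loading is done and re-raises
        # any exception raised by the loader.
        return self._future.result()

    def query(self, predicate):
        # A query waits for loading, then filters the loaded parts.
        parts = self.wait_loaded()
        return [part for part in parts if predicate(part)]
```

The server would construct such objects at startup and start accepting connections right away; only queries that touch a table still being loaded pay the wait.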

Variant 2:

Load all tables as usual, but don't load the primary keys of the data parts into memory at all. Instead, load them on demand through a separate LRUCache, similar to MarkCache. The cache would also bound the total size of primary keys in memory, protecting from #11188
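To illustrate the idea in Variant 2, here is a minimal single-threaded sketch of a size-bounded LRU cache for primary key indexes. It is not ClickHouse's LRUCache or MarkCache. `PrimaryKeyCache`, `load_index`, and the representation of an index as a list of `bytes` keys are assumptions made for the example.

```python
from collections import OrderedDict


class PrimaryKeyCache:
    """Hypothetical LRU cache bounding the total size of loaded primary key indexes."""

    def __init__(self, max_bytes, load_index):
        self._max_bytes = max_bytes
        self._load_index = load_index   # assumed callback: part name -> list of bytes keys
        self._entries = OrderedDict()   # part name -> (index, size in bytes)
        self.used_bytes = 0

    def get(self, part_name):
        if part_name in self._entries:
            # Cache hit: mark the part as most recently used.
            self._entries.move_to_end(part_name)
            return self._entries[part_name][0]

        # Cache miss: load the index from storage on first use.
        index = self._load_index(part_name)
        size = sum(len(key) for key in index)
        self._entries[part_name] = (index, size)
        self.used_bytes += size

        # Evict least recently used parts until the budget is met,
        # always keeping the entry that was just loaded.
        while self.used_bytes > self._max_bytes and len(self._entries) > 1:
            _, (_, evicted_size) = self._entries.popitem(last=False)
            self.used_bytes -= evicted_size
        return index
```

An evicted index is simply reloaded on the next access, so the memory bound is paid for with extra reads on cold parts. A real implementation would also need locking for concurrent queries.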

Additional context

It will also improve server startup in general.

@den-crane
Contributor

den-crane commented Nov 21, 2022

Then it will be possible to store part meta-information in ZooKeeper/Keeper (the path to the part (for multidisk), count.txt, columns.txt, min_max.idx) as the value of a part_id key and attach a table without disk I/O.

@UnamedRus
Contributor

It could potentially make the parallel replicas feature more interesting, i.e. workable on larger datasets without sharding.

5 participants