Skip to content

Towards stable storage of indexes and the ART #8703

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 57 commits into from
Sep 7, 2023

Conversation

taniabogatsch
Copy link
Contributor

This PR reworks our ART serialization implementation. Currently, we (de)serialize on a per-node basis, as described here.

We already moved away from per-node allocations for in-memory allocations (#6951). Instead, we store ART nodes on fixed-size allocators. In this PR, we implement (de)serializing these fixed-size allocators directly.

Previously, any fixed-size allocator would store its buffers as memory pointers in a vector. Instead, each FixedSizeAllocator now holds an unordered_map of <idx_t, FixedSizeBuffer> elements (buffer ID and buffer). These fixed-size buffers keep track of their state. They can store both a memory pointer and a disk pointer. Additionally, they have three flags: (1) in_memory, (2) on_disk, and (3) dirty. With these flags, it is possible only to serialize buffers that are not yet on disk or dirty. Also, we continue to deserialize lazily, i.e., if a node is on a buffer that is not yet in memory, then we deserialize the respective buffer.

Next steps

  • @Mytherin: Currently, we eagerly serialize. I.e., when running a checkpoint, we always serialize if in_memory && (!on_disk || dirty). However, after that serialization, we do not reset the dirty flag. So, if any buffer is dirty, it will repeatedly get serialized at each checkpoint until closing/restarting the database connection. For a more in-depth example, see test/sql/index/art/storage/test_art_storage_multi_checkpoint.test. We cannot yet reset the dirty flag, which causes overwritten/invalid disk memory issues that we have to look into. For reference, see here: b70af69, c224d2b, and Rework Metadata Storage #8513.
  • @pdet: Maybe you can look at this point during your review. Windows was complaining a lot about templating build failures, so I ended up with a templated Node::GetChild and Node::GetNextChild function and two implementations for each (const Node and Node); see node.hpp. The same holds for Prefix::Traverse in prefix.hpp. Also, all node-type specific files now inline these functions into their header file. There may be a more elegant way of solving this (Windows could not resolve the types of templates)! 😄
  • @Maxxen: Please let me know your thoughts on the generalized index memory management (FixedSizeAllocator, FixedSizeBuffer, and IndexPointer) concerning pluggable indexes!

Towards stable storage and pluggable indexes

Working towards stable storage and pluggable indexes, we made the following changes.

  • The FixedSizeAllocator class is now completely decoupled from the ART. Ideally, any future index can use these allocators to store its memory. Therefore, we unify how indexes (de)serialize their memory through these allocators.
  • There exists a new IndexPointer class. The FixedSizeAllocator provides these index pointers to be used by arbitrary indexes. Any index requiring memory can allocate multiple buffers of up to approximately BUFFER_SIZE = Storage::BLOCK_ALLOC_SIZE; memory segments, with an IndexPointer to each memory section.
  • The Node class is now a specialization of the IndexPointer, with additional functionality specific to ART nodes.
  • Any index will have to implement functionality calling Serialize on every FixedSizeAllocator it utilizes.

Benchmarks

I've added a new benchmark, benchmark/micro/index/checkpoint/checkpoint_art.benchmark, to ensure we do not regress on (de)serialization speed. Also, here are some general timings for some of our ART benchmarks. We increase performance for checkpoints and do not regress for other index operations. We also increase the performance of constraint checking, deletions, and insertions with minor general changes in this PR.

Benchmark info main branch [s] PR [s]
checkpoint_art.benchmark 10M - 11M INT64 1.73 1.23
create_art.benchmark 10M INT64, 2.5M distinct 0.23 0.23
create_art_duplicates.benchmark 10M INT, approx. 100 distinct 0.22 0.22
create_art_varchar.benchmark 28.8M VARCHAR, 8GB memory limit 11.17 11.69
create_art_varchar_long.benchmark 10M long VARCHAR 0.22 0.20
create_art_varchar_short.benchmark 10M short VARCHAR 0.15 0.14
lookup_art_constraint.benchmark 10M INT 1.82 1.48
delete_art.benchmark 5M out of 10M INT64 1.07 0.97
insert_art.benchmark 10M INT64 1.76 1.63
point_query_with_art.benchmark 100M INT64, one point 0.00011 0.00012

Notes

  • The checkpoint function now (de)serializes each BlockPointer directly instead of block_id and offset separately.
  • We currently use the metadata manager for (de)serialization of fixed-size buffers. We should move to something that allows writing big blocks efficiently (@Maxxen, @Mytherin).
  • There now is a clear separation of functions that alter the ART and those that don't (many more consts).
  • Deletion of all default copy constructors for nodes, leaves, and prefixes.
  • Deletion of the default copy constructor of the metadata manager.
  • This PR increases the storage version number.

@taniabogatsch taniabogatsch requested a review from pdet August 28, 2023 11:26
@taniabogatsch taniabogatsch requested a review from Mytherin August 31, 2023 08:49
@github-actions github-actions bot marked this pull request as draft August 31, 2023 11:23
@taniabogatsch taniabogatsch marked this pull request as ready for review August 31, 2023 11:25
Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Looks great - some minor comments below:

@github-actions github-actions bot marked this pull request as draft September 1, 2023 13:30
@taniabogatsch taniabogatsch marked this pull request as ready for review September 6, 2023 07:29
@github-actions github-actions bot marked this pull request as draft September 6, 2023 12:20
@taniabogatsch taniabogatsch marked this pull request as ready for review September 6, 2023 20:23
@Mytherin Mytherin merged commit c7eed4c into duckdb:main Sep 7, 2023
@Mytherin
Copy link
Collaborator

Mytherin commented Sep 7, 2023

Thanks! Looks excellent

@taniabogatsch taniabogatsch deleted the art-storage branch September 7, 2023 09:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants