feat: very minimal code that adds b-tree to the codebase #1596

romange · 2023-07-29T18:14:43Z

The motivation for having our own b-tree to replace zskiplist is shown by #1567. Based on those results, we should greatly reduce the memory overhead per item by using a modern b-tree.

Currently only the Insert method is supported, to reduce review complexity. The design decisions behind the data structure are described in src/core/detail/btree_internal.h

src/core/detail/btree_internal.h

chakaz

Please see some comments, mostly nits.
I have not looked into the actual search / rebalance logic yet, but I'm happy to do that as well, of course.

chakaz · 2023-07-30T05:31:28Z

src/core/btree_set.h

+  using KeyCompareTo = DefaultCompareTo<T>;
+};
+
+template <typename T, typename Policy = BTreePolicy<T> > class BTree {


Do you plan to extend BTreePolicy in the future?
Because right now it contains Key_t which is redundant (it's part of BTree itself) and the default comparer, so maybe make the default comparer a direct template argument?
(that's also how STL does it, like in maps)

Also, no need in modern C++ to have a space between > > :)

The reason behind this - to pass a single template param to the internal classes. That's what absl tree does as well.
I think it's a matter of taste.

chakaz · 2023-07-30T05:33:32Z

src/core/btree_set.h

+  using KeyCompareTo = DefaultCompareTo<T>;
+};
+
+template <typename T, typename Policy = BTreePolicy<T> > class BTree {


Maybe call it BPTree? B trees are a similar yet different data structure

chakaz · 2023-07-30T05:42:12Z

src/core/btree_set.h

+  size_t Size() const {
+    return count_;
+  }
+
+  size_t NodeCount() const {
+    return num_nodes_;
+  }


It's unclear to me what the difference between count_ and num_nodes_ is.
num_nodes_ seems to track allocated nodes, whereas count_ tracks the ones actually inserted, but aren't they supposed to be the same? Is it for debugging?

I checked, they're updated correctly. Count is the number of items, num_nodes the number of nodes. Different things in a B+ tree

Yes, each node can hold as many as 31 keys, hence the difference.

chakaz · 2023-07-30T05:49:13Z

src/core/detail/btree_internal.h

+// 5. We assume we store POD types - this greatly reduces the complexity of the generics
+//    in the code.


As such, I'd add static_assert(is_pod_v<T>); somewhere

Probably better to use std::is_trivially_copyable_v<T> instead 😅
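
A minimal sketch of the compile-time check suggested above. `PodOnlyContainer` and `ScoreKey` are hypothetical names used only for illustration; they are not from the PR. `std::is_pod` is deprecated since C++20, which is one more reason to prefer `std::is_trivially_copyable_v`.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical stand-in for the tree template; only the check matters here.
template <typename T> class PodOnlyContainer {
  static_assert(std::is_trivially_copyable_v<T>,
                "keys are copied with memcpy, so T must be trivially copyable");
};

// Hypothetical key type: plain data, safe to copy byte by byte.
struct ScoreKey {
  double score;
  uint64_t id;
};

PodOnlyContainer<ScoreKey> ok;  // compiles
// PodOnlyContainer<std::string> bad;  // would be rejected by the static_assert
```

The check turns assumption 5 from the design notes into a compiler error instead of a silent memory bug.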

chakaz · 2023-07-30T05:50:15Z

src/core/detail/btree_internal.h

+
+namespace detail {
+
+// Internal classes related to B+tree implementation. The design is largely based on the


I'd drop this file and move its content to btree_set.h. I think it's easier to read / use, but maybe it's just me.

I thought in the opposite direction: we use it only for a small number of types, so it can be explicitly instantiated for those and the internals can be hidden away

To not insult anyone, I will leave it as is :)

chakaz · 2023-07-30T06:10:57Z

src/core/detail/btree_internal.h

+    uint32_t found : 1;
+  };
+
+  // Searches for key in the node using binary seach.


seach->search

chakaz · 2023-07-30T06:18:29Z

src/core/detail/btree_internal.h

+    unsigned pos;
+  };
+
+  Record record_[kMaxDepth];


Use std::array<Record, kMaxDepth> instead?

I changed it, but I weakly disagree. I feel that std::array is a C++ struct that has an advantage in SOME cases, but not in this one; here it's just another include.

dranikpg · 2023-07-30T10:05:01Z

src/core/btree_set.h

+  size_t Size() const {
+    return count_;
+  }
+
+  size_t NodeCount() const {
+    return num_nodes_;
+  }


I checked, they're updated correctly. Count is the number of items, num_nodes the number of nodes. Different things in a B+ tree

dranikpg · 2023-07-30T10:06:21Z

src/core/btree_set.h

+    return false;
+  }
+
+  assert(path.Depth() > 0u);


raw assert to avoid being dependent on glog?

yes, I do not like having glog in header files

this is why I like explicit instantiations 😆

I do not mind it. If @chakaz does not have strong objections - I can move this to cc file.

I always prefer implementations to be in the .cc, like.. always :)

Obviously except for when it's impossible, like with templates :(
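
For context, a rough sketch of the explicit-instantiation approach mentioned in this thread, shown in one file for brevity. `TinySet` is a hypothetical class, not the PR's `BTree`; the comments mark which parts would go into the header and which into the .cc file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// --- would live in the header: declaration only, no logging includes ---
template <typename T> class TinySet {  // hypothetical class for illustration
 public:
  bool Insert(T item);
  size_t Size() const {
    return count_;
  }

 private:
  size_t count_ = 0;
};

// --- would live in the .cc file: the definition is free to use glog etc. ---
template <typename T> bool TinySet<T>::Insert(T item) {
  assert(count_ < SIZE_MAX);  // could become a DCHECK once it is out of the header
  (void)item;                 // storage is omitted in this sketch
  ++count_;
  return true;
}

// Explicit instantiation for the small set of supported types.
template class TinySet<uint64_t>;
```

The trade-off is that only the explicitly instantiated types can be used from other translation units.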

dranikpg · 2023-07-30T10:20:20Z

src/core/detail/btree_internal.h

+
+namespace detail {
+
+// Internal classes related to B+tree implementation. The design is largely based on the


I thought in the opposite direction: we use it only for a small number of types, so it can be explicitly instantiated for those and the internals can be hidden away

dranikpg · 2023-07-30T10:31:28Z

src/core/detail/btree_internal.h

+
+  Key_t Key(unsigned index) const {
+    Key_t res;
+    memcpy(&res, KeyPtr(index), kKeySize);


Agree. If the offset stays 8 bytes in the future, the keys will be correctly aligned, so they can be read and written in place
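
A small sketch of the two points in this comment, assuming an 8-byte header and 8-byte keys. `LoadKey`, `IsKeyAligned` and `kHeaderSize` are hypothetical names. The memcpy read works for any alignment; the helper shows when an in-place read would also be properly aligned.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr unsigned kHeaderSize = 8;  // assumption: 8-byte node header

// Copies the key at `index` out of a raw node buffer; valid for any alignment.
inline uint64_t LoadKey(const uint8_t* node, unsigned index) {
  uint64_t res;
  std::memcpy(&res, node + kHeaderSize + index * sizeof(uint64_t), sizeof(res));
  return res;
}

// True if the key slot is suitably aligned for direct uint64_t access.
inline bool IsKeyAligned(const uint8_t* node, unsigned index) {
  auto addr = reinterpret_cast<uintptr_t>(node + kHeaderSize + index * sizeof(uint64_t));
  return addr % alignof(uint64_t) == 0;
}
```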

dranikpg · 2023-07-30T10:33:06Z

src/core/detail/btree_internal.h

+  struct SearchResult {
+    uint32_t index : 31;
+    uint32_t found : 1;
+  };


You only use it to return results from functions and never place it anywhere besides the stack, so I'm surprised you use bitfields here 😵

you are right.
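
One possible shape for the result type without bitfields, as a sketch only; the PR may have settled on something different. `NotFoundAt` is a hypothetical helper. The plain struct is typically a few bytes larger than the bitfield version, which rarely matters for a value that only lives on the stack.

```cpp
#include <cstdint>

// Plain-field variant of the result type: same information, no bitfields.
struct SearchResult {
  uint32_t index;  // position of the key, or where it would be inserted
  bool found;      // true if the key is present at `index`
};

// Hypothetical helper that builds a "not found" result.
constexpr SearchResult NotFoundAt(uint32_t index) {
  return SearchResult{index, false};
}
```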

dranikpg · 2023-07-30T10:37:11Z

src/core/detail/btree_internal.h

+  union {
+    struct {
+      uint64_t num_items_ : 7;
+      uint64_t leaf_ : 1;
+      uint64_t : 56;
+    };
+
+    uint64_t val_;
+  };


So you use val_ only to cleanly initialize the fields above? Why can't you just init them directly?

I can if you prefer it this way.

I just mentioned it because the top-level union is a little confusing; it's not obvious that val_ has no other uses
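
A sketch of what initializing the fields directly could look like, without the union. `NodeHeader` is a hypothetical name. One caveat: the unnamed 56-bit padding field cannot be initialized this way, so its bits are left unspecified.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical header struct with the same bit layout as in the diff above.
struct NodeHeader {
  uint64_t num_items_ : 7;
  uint64_t leaf_ : 1;
  uint64_t : 56;  // padding up to 8 bytes; not initialized by the constructor

  explicit NodeHeader(bool leaf) : num_items_(0), leaf_(leaf) {
  }
};

static_assert(sizeof(NodeHeader) == sizeof(uint64_t), "header should stay 8 bytes");
```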

dranikpg · 2023-07-30T10:46:37Z

src/core/detail/btree_internal.h

+  uint32_t i = 0;
+  uint32_t n = num_items_;
+  typename Policy::KeyCompareTo cmp_op;
+  while (i < n) {
+    uint32_t j = (i + n) >> 1;
+    assert(j < n);
+
+    Key_t item = Key(j);
+
+    int cmp_res = cmp_op(key, item);
+    if (cmp_res == 0) {
+      return SearchResult{.index = j, .found = 1};
+    }
+
+    if (cmp_res < 0) {
+      n = j;
+    } else {
+      i = j + 1;  // we never return indices upto j because they are strictly less than key.
+    }
+  }
+  assert(i == n);
+
+  return {.index = i, .found = 0};
+}


A little hard to read. Instead of i, n, j, >>1 I'd use lo, hi, mid, /2 or something else

But again, if we allow getting a Key_t* we can just use stl's lower_bound instead of re-writing it

I will rename the variables. I do not want to use lower_bound because its less-than abstraction is inefficient here. I prefer a three-way compare because we are going to use the bptree with sds strings.

Hm, I see, so you want it to stop immediately if the item was found when placing the mid point

I did not understand the question :)
I use cmp_op once per iteration, whereas lower_bound must use the compare operator twice.
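
To illustrate the trade-off discussed here, a minimal sketch of a binary search built on a three-way comparison, using the renamed lo/hi/mid variables. It uses `std::string_view` as a stand-in for sds strings, and `ViewCompareTo` and `BSearch` are hypothetical names. With `std::lower_bound` and a less-than predicate, the search cannot stop early on an exact match, and deciding whether the key was actually found takes a further comparison afterwards, which is costly for strings.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

struct SearchResult {
  uint32_t index;
  bool found;
};

// Hypothetical three-way comparator: negative, zero, or positive, like strcmp.
struct ViewCompareTo {
  int operator()(std::string_view a, std::string_view b) const {
    return a.compare(b);
  }
};

// Binary search over a sorted array using one three-way comparison per step.
template <typename CompareTo>
SearchResult BSearch(const std::string_view* keys, uint32_t size, std::string_view key,
                     CompareTo cmp) {
  uint32_t lo = 0, hi = size;
  while (lo < hi) {
    uint32_t mid = lo + (hi - lo) / 2;
    int res = cmp(key, keys[mid]);
    if (res == 0)
      return {mid, true};  // exact match: stop immediately
    if (res < 0)
      hi = mid;
    else
      lo = mid + 1;  // everything up to mid is strictly less than key
  }
  assert(lo == hi);
  return {lo, false};  // lo is the insertion point
}
```

When the key is missing, the returned index is the position where it would be inserted, which is what an Insert path needs.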

dranikpg · 2023-07-30T10:49:51Z

src/core/detail/btree_internal.h

+template <typename Policy>
+std::pair<BTreeNode<Policy>*, unsigned> BTreeNode<Policy>::RebalanceChild(unsigned pos,


A high level comment on rebalancing would make understanding the code easier

dranikpg · 2023-07-30T10:56:44Z

src/core/btree_set.h

+detail::BTreeNode<Policy>* BTree<T, Policy>::CreateNode(bool leaf) {
+  num_nodes_++;
+  void* ptr = mr_->allocate(BTreeNode::kTargetNodeSize, 8);
+  BTreeNode* node = new (ptr) BTreeNode(leaf);
+
+  return node;
+}
+
+template <typename T, typename Policy> void BTree<T, Policy>::DestroyNode(BTreeNode* node) {
+  void* ptr = node;
+  mr_->deallocate(ptr, BTreeNode::kTargetNodeSize, 8);
+  num_nodes_--;
+}


I'd leave a high level comment somewhere above BTreeNode to explain why it offsets its this pointer beyond its size. Shahar is right, we could just keep the buffer in the struct to occupy this space, but it would make implementing point 3 on your todo (smaller nodes for small trees) more difficult and you'd have to revert back to this approach

Exactly. absl implements small object optimization. In short, the answer is - I embrace the fact that high-level languages can not support dynamically created structs. We usually do not use this approach but when it's needed I am not going to avoid it because it's "not nice". And btw, I used a similar approach in dense_set when allocating metadata for expire time of hash elements.
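
A rough sketch of the layout being discussed: a small header object constructed with placement new at the start of a fixed-size block, with the keys stored in the bytes that follow it. `RawNode` and its members are hypothetical and much simpler than the PR's `BTreeNode`; the 8-byte header, 8-byte keys and use of `std::pmr::memory_resource` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory_resource>
#include <new>

class RawNode {  // hypothetical, simplified node
 public:
  static constexpr size_t kTargetNodeSize = 256;
  // Assumes an 8-byte header and 8-byte keys: (256 - 8) / 8 = 31 slots.
  static constexpr size_t kMaxKeys = (kTargetNodeSize - sizeof(uint64_t)) / sizeof(uint64_t);

  void SetKey(unsigned i, uint64_t key) {
    std::memcpy(Slot(i), &key, sizeof(key));
  }

  uint64_t Key(unsigned i) const {
    uint64_t key;
    std::memcpy(&key, Slot(i), sizeof(key));
    return key;
  }

  uint64_t Header() const {
    return header_;
  }

 private:
  // Key storage starts right after the object, inside the same allocation.
  uint8_t* Slot(unsigned i) const {
    assert(i < kMaxKeys);
    auto* base = reinterpret_cast<uint8_t*>(const_cast<RawNode*>(this));
    return base + sizeof(RawNode) + i * sizeof(uint64_t);
  }

  uint64_t header_ = 0;
};

static_assert(sizeof(RawNode) == 8, "the header must leave room for the keys");

inline RawNode* CreateNode(std::pmr::memory_resource* mr) {
  void* ptr = mr->allocate(RawNode::kTargetNodeSize, 8);
  return new (ptr) RawNode();
}

inline void DestroyNode(std::pmr::memory_resource* mr, RawNode* node) {
  node->~RawNode();
  mr->deallocate(node, RawNode::kTargetNodeSize, 8);
}
```

Because the block size is chosen by the allocator call rather than by the struct definition, a smaller block for small trees would only need a different size argument and slot limit.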

dranikpg · 2023-07-30T10:57:11Z

src/core/detail/btree_internal.h

+//          allocate less then 256 bytes (special case) to avoid relative blowups in memory for
+//          small trees.
+
+template <typename Policy> class BTreeNode {


It should be immovable and non-copyable
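
A short sketch of how that could be expressed; `FixedNode` is a hypothetical stand-in for `BTreeNode`. Deleting the copy operations also suppresses the implicit move operations. This matters for a node whose data lives beyond `sizeof` of the object, since a copy or move would carry over only the header.

```cpp
#include <type_traits>

class FixedNode {  // hypothetical stand-in for BTreeNode
 public:
  FixedNode() = default;

  // Deleting the copy operations also suppresses the implicit move operations.
  FixedNode(const FixedNode&) = delete;
  FixedNode& operator=(const FixedNode&) = delete;
};

static_assert(!std::is_copy_constructible_v<FixedNode>);
static_assert(!std::is_move_constructible_v<FixedNode>);
static_assert(!std::is_move_assignable_v<FixedNode>);
```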

romange · 2023-07-30T18:05:23Z

> Please see some comments, mostly nits. I have not looked into the actual search / rebalance logic yet, but I'm happy to do that as well, of course.

Shahar, I do not think it's required and I asked @dranikpg also not to spend time searching for correctness bugs. It's not that I am sure I do not have bugs :), it's just this code is far from being prod ready anyway and I trust tests, assertions, and absl reference code to guide me through most of the obstacles.

romange · 2023-07-30T18:09:42Z

To be more precise I do not look at absl code (which is incomprehensible, imho), I look at the original library coming from Google: https://code.google.com/archive/p/cpp-btree/

Signed-off-by: Roman Gershman <roman@dragonflydb.io>

dranikpg

Given it's "very minimal" I don't have further comments 🙂

romange added the enhancement New feature or request label Jul 29, 2023

romange requested a review from dranikpg July 29, 2023 18:14

romange force-pushed the AddBtree1 branch 3 times, most recently from 8b6d9be to 0e8e45d Compare July 30, 2023 04:48

chakaz reviewed Jul 30, 2023

dranikpg reviewed Jul 30, 2023

romange force-pushed the AddBtree1 branch from 0e8e45d to 1278a8a Compare July 30, 2023 18:01

romange requested review from chakaz and dranikpg July 30, 2023 18:01

chore: rewrote template logic for internal classes

7b8a2ad

Signed-off-by: Roman Gershman <roman@dragonflydb.io>

dranikpg approved these changes Jul 31, 2023

romange merged commit 723cc62 into main Jul 31, 2023
10 checks passed

romange deleted the AddBtree1 branch July 31, 2023 10:40
