# Notebook 9

# Cache oblivious

Recall that in the DAM/external memory model, we model a machine with a memory of size $M$ and a *disc* of unlimited size, and we pay cost 1 to move $B$ consecutive items between memory and disc. In this model we described the B-tree structure, which supports insertion, deletion, and search in $O(\log_B n)$ time, and the $B^\epsilon$ tree, which improves the insertion and deletion time to $O(\frac{1}{B^{1-\epsilon}}\log_B n)$ for any $\epsilon>0$.

However, real computers have many different levels of registers, memory, address translation, and storage, with different speeds and block sizes. It would be difficult to figure out the parameters for each machine and create data structures and algorithms optimized for them. Instead, the inventors of the cache-oblivious model made a brilliant observation: if we analyze algorithms just as in the DAM model, but with $B$ and $M$ unknown to the algorithm, then the analysis is valid between any two levels of a memory hierarchy. So, that is the cache-oblivious model: the same as DAM, but the algorithm does not know $M$ or $B$.

For some algorithms, this is not a problem. For example, scanning $n$ items requires $O(\frac{n}{B})$ time in the DAM and cache-oblivious models, because the algorithm *scan* is not parameterized by $B$. However, the $B$-tree crucially must know $B$ in order to decide how big a node should be. As such, a completely different approach is required.
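As a small illustration of why scanning needs no knowledge of $B$, the sketch below counts how many blocks a contiguous run of $n$ items overlaps. The function name `scan_block_transfers` and the assumption that blocks are aligned at multiples of $B$ are ours, chosen only for the example; $B$ appears in the accounting, never in the scan itself.

```python
def scan_block_transfers(start, n, B):
    """Number of size-B blocks overlapped by items start .. start+n-1.

    Assumes blocks cover positions [0, B), [B, 2B), ... (an illustrative
    assumption; the scan algorithm itself never looks at B).
    """
    if n == 0:
        return 0
    first_block = start // B
    last_block = (start + n - 1) // B
    return last_block - first_block + 1


# An aligned scan touches exactly ceil(n / B) blocks; a misaligned one
# touches at most one more.
print(scan_block_transfers(0, 100, 10))
print(scan_block_transfers(5, 100, 10))
```

Whatever the starting offset, the count is at most $\lceil n/B \rceil + 1$, which is $O(\frac{n}{B})$ for $n \ge B$.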

We now describe the *van Emde Boas* structure, which supports searching in the cache-oblivious model in the same $O(\log_B n)$ time as the B-tree, but without knowing $B$.

The structure is as follows: build a perfectly balanced binary search tree containing the data items, which has height $\log n$. Cut the tree in half height-wise, which gives trees of height $\frac{1}{2}\log n$: one top tree above the cut and $\approx \sqrt{n}$ bottom trees below it, each with $\approx 2^{\frac{1}{2}\log n}=\sqrt{n}$ items. Store the top tree and then each bottom tree one after another in an array, laying out each of these trees recursively in the same way.
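The recursive layout can be sketched in a few lines. This is a minimal sketch under our own assumptions, not a definitive implementation: nodes are named by heap indices (root 1, children $2v$ and $2v+1$), the helper name `veb_order` is hypothetical, and when the height is odd the top tree gets the smaller half.

```python
def veb_order(height, root=1):
    """Heap indices of the perfect subtree of the given height rooted at
    `root`, listed in the order they would be stored in the array."""
    if height == 1:
        return [root]
    top_h = height // 2          # height of the top tree
    bottom_h = height - top_h    # height of each bottom tree
    order = veb_order(top_h, root)
    # The bottom trees are rooted at the descendants of `root` that lie
    # exactly top_h levels below it; in heap numbering these are consecutive.
    first = root << top_h
    for r in range(first, first + (1 << top_h)):
        order.extend(veb_order(bottom_h, r))
    return order


print(veb_order(4))
# [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]
```

For a tree of height 4, the top tree (nodes 1, 2, 3) comes first, followed by the four bottom trees of three nodes each. Note that neither $B$ nor $M$ appears anywhere in the construction.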

Now look at a root-to-leaf search path. Viewed at successive levels of the recursion, the path is stored in:

- The one big tree of $n$ nodes and height $\log n$
- Two recursive trees of $\sqrt{n}$ nodes of height $\frac{1}{2}\log n$
- Four recursive trees (in the second level of recursion) of $n^{\frac{1}{4}}$ nodes of height $\frac{1}{4}\log n$
- Generalizing: $2^i$ recursive trees (in the $i$th level of recursion) of $n^{\frac{1}{2^i}}$ nodes of height $\frac{1}{2^i}\log n$
- Set $i=\log \log_B n$: $2^{i}=2^{\log \log_B n}=\log_B n$ recursive trees (in the $i$th level of recursion) of $n^{\frac{1}{2^i}}=n^{\frac{1}{\log_B n}}=n^{\frac{\log B}{\log n}}=2^{\frac{\log B \log n}{\log n}}=2^{\log B}=B$ nodes of height $\frac{1}{2^i}\log n=\frac{\log B}{\log n}\log n = \log B$
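As a quick sanity check of the last item, take the illustrative values $n=2^{16}$ and $B=2^{4}$ (our choice, picked so that every quantity is an integer):

$$
i=\log\log_B n=\log\frac{16}{4}=2,\qquad
2^{i}=4=\log_B n,\qquad
n^{\frac{1}{2^{i}}}=\left(2^{16}\right)^{\frac{1}{4}}=2^{4}=B,\qquad
\frac{1}{2^{i}}\log n=\frac{16}{4}=4=\log B .
$$

So at the second level of recursion a search path passes through 4 trees, each with about 16 nodes and height 4.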

This last statement says that any search path passes through only $\log_B n$ trees, each of which is stored consecutively in memory and has size at most $B$. A run of at most $B$ consecutive items overlaps at most two blocks, so the cost of a search is $O(\log_B n)$ in the cache-oblivious model.
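The bound can be checked experimentally by counting the distinct blocks that a root-to-leaf path touches. The sketch below is only an illustration under our own assumptions: nodes are named by heap indices, blocks are aligned at multiples of $B$, and the helper names `veb_order` and `worst_case_blocks` are hypothetical. It compares the recursive layout with a plain level-by-level (BFS) array; $B$ is used only for the measurement, never for the layout.

```python
def veb_order(height, root=1):
    """Heap indices of a perfect subtree, in recursive (cut-in-half) order."""
    if height == 1:
        return [root]
    top_h = height // 2
    bottom_h = height - top_h
    order = veb_order(top_h, root)
    first = root << top_h  # leftmost root among the bottom trees
    for r in range(first, first + (1 << top_h)):
        order.extend(veb_order(bottom_h, r))
    return order


def worst_case_blocks(height, B, order):
    """Largest number of distinct size-B blocks touched by any
    root-to-leaf path when nodes are stored in the given order."""
    pos = {node: i for i, node in enumerate(order)}
    worst = 0
    for leaf in range(1 << (height - 1), 1 << height):
        blocks = set()
        node = leaf
        while node >= 1:          # walk from the leaf up to the root
            blocks.add(pos[node] // B)
            node //= 2
        worst = max(worst, len(blocks))
    return worst


height, B = 16, 15                    # illustrative values only
bfs = list(range(1, 1 << height))     # level-by-level layout
veb = veb_order(height)
print(worst_case_blocks(height, B, bfs), worst_case_blocks(height, B, veb))
# 13 4
```

With these values the subtrees of height 4 have exactly 15 nodes and happen to line up with block boundaries, so the recursive layout touches 4 blocks per search while the level-by-level layout touches 13. For block sizes that do not align this neatly, each small tree may straddle two blocks, which is where the constant factor hidden in the $O(\cdot)$ comes from.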

Observe that this structure did not use $B$ for the construction, only the analysis. The multi-level recursive nature is typical of cache-oblivious algorithms and is the main way to take advantage of many different levels of locality.

This structure does not support insertion and deletion, or the speedups we obtained for $B^\epsilon$ trees. However, structures have been obtained with these results (and many others), but they are too complex to present in the limited time we have.