# Chapter 20: B-Trees and External Memory

Memory hierarchy, from fastest and smallest to slowest and largest:

- **CPU registers**: very fast access, but relatively few locations
- **cache memory**: slower access than registers, but more capacity
- **random access memory (internal memory)**: slower access than cache, but more capacity
- **external memory**: slowest access, but the largest capacity

## External Memory

Most algorithms are not designed with the memory hierarchy in mind. Instead, hardware and OS designers have developed general mechanisms that make most memory accesses fast. These mechanisms rely on two **locality of reference** properties that most software exhibits:

- **temporal locality**: if a program accesses a certain memory location, then it is likely to access this location again in the near future 
- **spatial locality**: if a program accesses a certain memory location, then it is likely to access other locations that are near this one

These localities have in turn given rise to two design choices for two-level memory systems:

- **virtual memory**: provides an address space as large as the capacity of the secondary-level memory and transfers data into primary-level memory when it is addressed, so a program is not limited by the size of internal memory. Bringing data into primary memory is known as **caching**, and it is motivated by temporal locality
- **blocking**: if data stored at a secondary-level memory location $l$ is accessed, then bring into primary-level memory a large block of contiguous locations that includes $l$. This is motivated by spatial locality. Blocks of memory are known as **pages**
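The effect of blocking and caching can be illustrated with a small simulation. This is a minimal sketch, not a model of any real system: the function name `count_block_transfers`, the least-recently-used eviction policy, and the block and cache sizes are all assumptions made for the example.

```python
from collections import OrderedDict

def count_block_transfers(addresses, block_size, cache_blocks):
    """Count block transfers for an address trace (assumed LRU eviction)."""
    cache = OrderedDict()  # block id -> None, least recently used first
    transfers = 0
    for addr in addresses:
        block = addr // block_size        # the block that contains this address
        if block in cache:
            cache.move_to_end(block)      # hit: mark as most recently used
        else:
            transfers += 1                # miss: bring the whole block in
            cache[block] = None
            if len(cache) > cache_blocks:
                cache.popitem(last=False) # evict the least recently used block
    return transfers

# 1024 addresses, blocks of 64 addresses, room for 4 blocks in primary memory.
sequential = list(range(1024))
# Same 1024 addresses, but each access jumps 64 locations ahead (a new block).
strided = [col * 64 + row for row in range(64) for col in range(16)]

print(count_block_transfers(sequential, 64, 4))  # 16
print(count_block_transfers(strided, 64, 4))     # 1024
```

Both traces touch exactly the same locations, but the sequential one has spatial locality and needs one transfer per block, while the strided one misses on every access.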

## (2,4) Trees and B-Trees

**Multi-way search tree**: ordered tree T that has the following properties:

- each internal node of T has at least two children. That is, each internal node is a d-node where $d \geq 2$
- each internal node of T stores a collection of items of the form (k,x)
- each d-node v of T with children $v_1,...,v_d$ stores d-1 items $(k_1,x_1),...,(k_{d-1},x_{d-1})$
- define $k_0 = -\infty$ and $k_d = +\infty$. Then, for each item $(k,x)$ stored at a node in the subtree of v rooted at $v_i$, $i = 1, \ldots, d$, we have $k_{i-1} \leq k \leq k_i$

A multi-way search tree storing n items has n+1 external nodes

The running time for performing a search is $O(h)$, where h is the height of T, provided the maximum number of children per node is a constant. The space requirement for T is $O(n)$
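A search in a multi-way search tree can be sketched as follows. This is an illustrative sketch only: the `Node` class, the use of `None` to stand in for external nodes, and the binary search inside each node are assumptions, not something the notes prescribe.

```python
from bisect import bisect_left

class Node:
    """A d-node: d-1 sorted keys with their values, and d children."""
    def __init__(self, keys, values, children=None):
        self.keys = keys          # k_1 <= ... <= k_{d-1}
        self.values = values      # x_1, ..., x_{d-1}
        self.children = children  # list of d nodes, or None if all are external

def search(node, key):
    """Return the value stored with key, or None if the search ends at an external node."""
    while node is not None:
        i = bisect_left(node.keys, key)   # first index with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return node.values[i]
        # key lies between k_i and k_{i+1} (1-based), so descend into that child
        node = node.children[i] if node.children else None
    return None

root = Node(
    [10, 20], ["v10", "v20"],
    [Node([3, 7], ["v3", "v7"]),
     Node([15], ["v15"]),
     Node([25, 30, 35], ["v25", "v30", "v35"])],
)
print(search(root, 15))  # v15
print(search(root, 12))  # None
```

One node is visited per level, which is where the $O(h)$ bound comes from when the work per node is constant.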

### (2,4) Trees

A (2,4) tree is a multi-way search tree that is kept balanced by two properties:

- **size property**: every node has at most 4 children
- **depth property**: all the external nodes have the same depth

The height of a (2,4) tree storing n items is $\Theta(\log n)$

The time to perform an insertion or a removal in a (2,4) tree is $O(\log n)$
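Insertion keeps both properties by splitting any node that overflows to five children and pushing one key up to the parent. The sketch below stores keys only and is one possible way to write this, under stated assumptions: the names `Node24`, `insert`, `inorder`, `height`, and `leaf_depths` are made up for the example, and an empty `children` list stands in for a node whose children are all external.

```python
from bisect import bisect_left

class Node24:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []          # at most 3 keys between operations
        self.children = children or []  # empty list: all children are external

def _insert(node, key):
    """Insert key below node. Return (median, right_node) if node split, else None."""
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return None                     # key already present: nothing to do
    if not node.children:
        node.keys.insert(i, key)        # bottom level: place the key here
    else:
        split = _insert(node.children[i], key)
        if split is not None:           # child split: absorb its median key
            median, right = split
            node.keys.insert(i, median)
            node.children.insert(i + 1, right)
    if len(node.keys) <= 3:
        return None                     # size property still holds
    # Overflow (4 keys, 5 children): keep 2 keys here, move 1 key up, 1 key right.
    median = node.keys[2]
    right = Node24(node.keys[3:], node.children[3:])
    node.keys, node.children = node.keys[:2], node.children[:3]
    return median, right

def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    median, right = split
    return Node24([median], [root, right])  # root split: the tree grows by one level

def inorder(node):
    if not node.children:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out

def height(node):
    return 1 if not node.children else 1 + height(node.children[0])

def leaf_depths(node, depth=1):
    """Set of depths of bottom-level nodes; a single value means the depth property holds."""
    if not node.children:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))

root = Node24()
for k in [5, 10, 15, 20, 25, 30, 35]:
    root = insert(root, k)
print(root.keys, inorder(root), height(root))
```

Because the tree only grows taller when the root splits, all external nodes stay at the same depth, and each insertion does a constant amount of work per level.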

### (a,b) Trees

An (a,b) tree, where a and b are integers such that $2 \leq a \leq (b+1)/2$, is a multi-way search tree T with the following additional restrictions:

- **size property**: each internal node has at least a children, unless it is the root, and has at most b children
- **depth property**: all the external nodes have the same depth

The height of an (a,b) tree storing n items is $\Omega(\log n/\log b)$ and $O(\log n/\log a)$
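Both bounds follow from counting external nodes. As a short derivation, assuming $n \geq 1$ and writing h for the height: the root has at least 2 children and every other internal node has at least a children, while no node has more than b children, so the $n+1$ external nodes at depth h satisfy

$$
2a^{h-1} \leq n + 1 \leq b^h
$$

Taking logarithms of both inequalities gives

$$
\frac{\log(n+1)}{\log b} \leq h \leq \frac{\log\left((n+1)/2\right)}{\log a} + 1
$$

With $a = 2$ and $b = 4$ this reduces to $\frac{1}{2}\log_2(n+1) \leq h \leq \log_2(n+1)$, the $\Theta(\log n)$ height of a (2,4) tree.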

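An (a,b) tree implements an n-item dictionary to support performing insertions and removals in $O((g(b)/\log a)\log n)$ time, and performing find queries in $O((f(b)/\log a)\log n)$ time, where $f(b)$ is the time to search within a node and $g(b)$ is the time to split or fuse a node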

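A B-tree of order d is an (a,b) tree with $a = \lceil d/2 \rceil$ and $b = d$, where d is chosen so that one node fits in a single disk block. A B-tree with n items executes $O(\log_B n)$ disk transfers in a search or update operation, where B is the number of items that can fit in one block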
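To get a feel for what $O(\log_B n)$ means in practice, the height bounds can be evaluated for concrete numbers. This is a rough sketch: the function names and the figures (one billion items, order 1024) are hypothetical choices for illustration, and the number of levels is used as a stand-in for the number of disk transfers in one search.

```python
def min_height(n, b):
    """Smallest h with b**h >= n + 1 (every node has the maximum b children)."""
    h, capacity = 0, 1
    while capacity < n + 1:
        capacity *= b
        h += 1
    return h

def max_height(n, a):
    """Largest h with 2 * a**(h - 1) <= n + 1 (root has 2 children, others have a)."""
    h = 1
    while 2 * a**h <= n + 1:
        h += 1
    return h

n = 10**9   # hypothetical number of items
d = 1024    # hypothetical B-tree order, so a = 512 and b = 1024

print(min_height(n, 2))       # 30: a binary tree needs at least this many levels
print(min_height(n, d))       # 3
print(max_height(n, d // 2))  # 4
```

Under these assumptions a search touches 3 or 4 blocks, compared with at least 30 node visits in a binary search tree, each of which may cost a separate disk transfer.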






