# van Emde Boas Tree

## Why do we need van Emde Boas Tree?

We know about many data structures that support priority queue operations,
e.g. binary heaps, Red-Black Trees and Fibonacci heaps.

But in each of these, at least one of INSERT and EXTRACT-MIN takes $\Omega(\log n)$ time.
This follows because all of the above data structures are based on comparisons between keys, and the lower bound for comparison-based sorting is $\Omega(n \log n)$.  
If we were able to perform both INSERT and EXTRACT-MIN in $o(\log n)$ time, then we could
sort n keys in $o(n \log n)$ time using heap sort (n INSERT operations followed by n EXTRACT-MIN operations), contradicting that lower bound.
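
The reduction used in this argument can be sketched with Python's standard `heapq` module, which implements a binary heap. The function name `pq_sort` is only illustrative.

```python
import heapq


def pq_sort(keys):
    """Sort by n INSERTs followed by n EXTRACT-MINs on a binary heap."""
    heap = []
    for key in keys:
        heapq.heappush(heap, key)  # INSERT
    # EXTRACT-MIN, repeated once per key, yields the keys in ascending order
    return [heapq.heappop(heap) for _ in range(len(keys))]
```

Any priority queue with faster INSERT and EXTRACT-MIN would make this sort faster as well, which is why comparison-based priority queues cannot beat the sorting lower bound.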

But we also know that non-comparison-based sorting techniques break this lower bound for sorting, e.g. Counting Sort takes $O(n+k)$ time to sort n elements in the range 0 to k.
Of course, this is only possible when the integer keys lie in a bounded range.
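
A minimal sketch of Counting Sort, assuming the keys are plain integers in the range 0 to k. It never compares two keys with each other, which is how it avoids the comparison lower bound.

```python
def counting_sort(keys, k):
    """Sort integers from the range 0..k in O(n + k) time."""
    counts = [0] * (k + 1)
    for key in keys:
        counts[key] += 1  # the key is used as an array index, not compared

    result = []
    for value, count in enumerate(counts):
        result.extend([value] * count)
    return result
```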

So can we improve priority queue operations when keys are in a bounded range?  
The van Emde Boas Tree uses this idea to support priority-queue operations in $O(\log \log u)$ time
when all keys belong to the range from 0 to u-1.

## Operations  

The van Emde Boas Tree supports the following operations:
  
`SEARCH(S,x)` : Search for x in the set S. Returns `True` if x exists in S, `False` otherwise.

`INSERT(S,x)` : Inserts element x into S.

`DELETE(S,x)` : Deletes element x from S, assuming that S contains x. 

`MINIMUM(S)` : Returns the minimum element in the set, `None` if S is empty.

`MAXIMUM(S)` : Returns the maximum element in the set, `None` if S is empty.

`SUCCESSOR(S,x)` : Returns the smallest element greater than x, `None` if there is no such element (e.g. x = `MAXIMUM(S)`).

`PREDECESSOR(S,x)` : Returns the largest element smaller than x, `None` if there is no such element (e.g. x = `MINIMUM(S)`).

All these operations except MINIMUM and MAXIMUM run in $O(\log \log u)$ time.  
MINIMUM and MAXIMUM are $O(1)$ time operations.

## Real-world applications
- Routing packets to a subnet.
![alt text](img/router.jpeg)

Problem in routing packets:

We know that IP addresses are distributed in blocks, where each block is a subnet. Each subnet has some range. For example, assume Router C shown above has one of its ports corresponding to subnet A, which has IP addresses in the range 128.23.45.0 to 128.255.255.255.

Whenever a packet arriving at the router has a destination address falling in the above range, the packet has to be forwarded to the port corresponding to subnet A. E.g. a packet with destination IP address 128.28.67.90 arrives at Router C. Since it falls in the range of subnet A, it will be forwarded to the port corresponding to subnet A.  
Each router in the network may have multiple ports, and each port is used to forward packets to a particular subnet.  
Every time a packet arrives at the router, its outgoing port is determined by applying the subnet mask to get the network ID and then looking that ID up in the routing table, which dictates the port to which the packet has to be forwarded. Since millions of packets are processed in the network, this processing delay adds up. Balanced binary search trees come to the rescue with $O(\log n)$ lookups, but with a van Emde Boas tree we can bring this down to $O(\log \log u)$ time.

Remedy:

The starting IP address of each range (subnet) is stored as a key, and to forward a packet efficiently we need to find the predecessor of the destination IP address of the arriving packet.

E.g. a packet with destination IP address 128.28.67.90 arrives at Router C. The predecessor of this address is 128.23.45.0, which is the starting address of subnet A, so the packet will be forwarded to the port corresponding to subnet A. Successor and predecessor operations take $O(\log \log u)$ time in a vEB tree, whereas a balanced binary search tree takes $O(\log n)$ time. When n is close to u, this makes the search exponentially faster than in the BST.  
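
The lookup itself can be sketched as follows. This is only an illustration of the predecessor query: it uses a sorted list with `bisect` as a stand-in for the vEB tree, and every route except subnet A's starting address is hypothetical.

```python
import bisect
import ipaddress

# Hypothetical routing table: (first address of the subnet, outgoing port).
# Only subnet A's starting address comes from the text above.
ROUTES = sorted(
    (int(ipaddress.IPv4Address(start)), port)
    for start, port in [
        ("10.0.0.0", "port B"),
        ("128.23.45.0", "port A"),
        ("192.168.0.0", "port C"),
    ]
)
STARTS = [start for start, _ in ROUTES]


def lookup_port(destination):
    """Return the port of the subnet with the largest start address <= destination.

    The end of each range is not checked here; this sketch assumes the
    subnets cover the address space without gaps.
    """
    address = int(ipaddress.IPv4Address(destination))
    i = bisect.bisect_right(STARTS, address) - 1  # predecessor (or equal) position
    return ROUTES[i][1] if i >= 0 else None
```

Converting each address to a 32-bit integer is what makes the keys fit a bounded universe; for IPv4, $u = 2^{32}$.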
 
In general, van Emde Boas trees can be used in place of a normal binary search tree as long as the keys in the search tree are integers in some fixed range. Thus, for applications where we are required to find the integer in a set that is closest to some other integer (its predecessor or successor), using a vEB-tree can potentially be faster than using a simple balanced binary search tree.

As an example, if you have a linear layout of stores along some line and want to find the closest store to some particular customer, a vEB-tree answers the predecessor and successor queries in $O(\log \log u)$ time instead of the $O(\log n)$ time of a balanced BST.
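
A small sketch of the closest-store query, assuming store and customer positions are integers on the line. A sorted list with `bisect` again stands in for the vEB tree; the point is that the answer is always either the predecessor or the successor of the customer's position.

```python
import bisect


def closest_store(stores, customer):
    """stores is a sorted list of positions; return the one nearest to customer."""
    i = bisect.bisect_left(stores, customer)
    candidates = []
    if i > 0:
        candidates.append(stores[i - 1])  # predecessor of the customer
    if i < len(stores):
        candidates.append(stores[i])  # successor (or a store at the same spot)
    return min(candidates, key=lambda s: abs(s - customer), default=None)
```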

## Notations:  
T = van Emde Boas Tree  
n = number of elements in the vEB Tree  
u = size of the universe; elements lie in the range \[0, u-1\]  
i.e., $\forall x \in T$, $0 \le x < u$  
For simplicity we assume that u is always an exact power of 2, i.e., $u=2^k$ where $k \in \mathbb{Z}^+$

## Background
Before using a van Emde Boas Tree we need to initialize the whole empty tree. For a universe of size u, the space requirement of a vEB Tree is $O(u)$, and creating such an empty tree takes $O(u)$ time. By contrast, creating an empty Red-Black Tree takes a constant amount of time. Therefore it would be a very bad idea to use a vEB Tree when we need to perform only a small number of operations: the time spent creating the data structure would exceed the time spent performing the operations.

The following recurrence relation characterises the running time of the various operations of a vEB Tree:  
$T(u) = T(\sqrt{u}) + O(1)$

Let $m = \log u$, so that $u = 2^m$. Now we have  
$T(2^m) = T(2^{m/2}) + O(1)$

Renaming $T(2^m)$ to $S(m)$ gives  
$S(m) = S(m/2) + O(1)$

By case 2 of the master method, this recurrence has the solution  
$S(m) = O(\log m)$  
Moving back from $S(m)$ to $T(u)$:  
$T(2^m) = S(m) = O(\log m)$  
Replacing $m = \log u$:  
$T(u) = O(\log \log u)$
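
As a quick numerical check of this result, the sketch below counts how many times the universe can be replaced by its square root before reaching the base size 2. The function name `recursion_depth` is illustrative, and the check assumes u has the form $2^{2^k}$ so that every square root is exact.

```python
from math import isqrt


def recursion_depth(u):
    """Count the square-root steps needed to shrink u down to the base size 2."""
    steps = 0
    while u > 2:
        u = isqrt(u)  # exact integer square root
        steps += 1
    return steps
```

For example, $u = 2^{32}$ shrinks as $2^{32} \to 2^{16} \to 2^8 \to 2^4 \to 2^2 \to 2$, which is 5 steps, matching $\log \log u = \log 32 = 5$.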

## vEB Tree Structure
As is clear from the recurrence relation above, we recursively shrink the universe size from u to $\sqrt{u}$. But what if u is not of the form $2^{2^k}$? Then $\sqrt{u}$ won't be an integer. We solve this problem as follows:  
Since the universe size is of the form $2^k$ for some $k \in \mathbb{Z}^+$, u can be an odd power of 2. So we divide the $\lg u$ bits of a number into the most significant $\displaystyle \left \lceil \frac{\lg u}{2} \right \rceil$ bits and the least significant $\displaystyle \left \lfloor \frac{\lg u}{2} \right \rfloor$ bits.  
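
A sketch of this bit split as three small functions, assuming u is a power of 2. The names `high`, `low` and `index` follow the calls used in the code later in this document; writing them as free functions that take `u` as a parameter is only for illustration.

```python
def lower_root(u):
    """2 ** floor(lg(u) / 2), computed with integer arithmetic."""
    bits = u.bit_length() - 1  # lg u, exact when u is a power of 2
    return 2 ** (bits // 2)


def high(x, u):
    """The most significant ceil(lg(u) / 2) bits of x: its cluster number."""
    return x // lower_root(u)


def low(x, u):
    """The least significant floor(lg(u) / 2) bits of x: its position in the cluster."""
    return x % lower_root(u)


def index(h, l, u):
    """Rebuild the key from its cluster number and position."""
    return h * lower_root(u) + l
```

With u = 8 the key 5 (binary `101`) splits into the upper two bits `10` (cluster 2) and the lower bit `1` (position 1), and `index(2, 1, 8)` gives back 5.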

### Implementation
Implementation-wise, we need to define two separate square roots.
We define the upper root as $2^{\lceil (\lg u)/2 \rceil}$ and the lower root as $2^{\lfloor (\lg u)/2 \rfloor}$.
```python
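from math import ceil, floor, log2


class VEBTree:
    def __init__(self, u):
        self.u = u  # universe size, e.g. 8
        self.min = None
        self.max = None

        if u > 2:
            # The roots depend on this node's own universe size,
            # so they are computed per node rather than once globally.
            self.upper_root = 2 ** ceil(log2(u) / 2)  # e.g. 4
            self.lower_root = 2 ** floor(log2(u) / 2)  # e.g. 2

            # unless u equals the base size 2,
            # summary points to a vEB tree of size upper_root
            # and each entry of cluster[0 ... upper_root - 1] points to a vEB tree of size lower_root.
            # e.g. u = 8, upper_root = 4, lower_root = 2:
            # summary points to a vEB tree of size 4 and
            # cluster is an array of 4 pointers to vEB trees of size 2
            self.summary = VEBTree(self.upper_root)
            self.cluster = [VEBTree(self.lower_root) for _ in range(self.upper_root)]

    # Helpers called as V.high, V.low and V.index in INSERT and DELETE.
    # Assumed definitions based on the bit split described above;
    # the full source may define them differently.
    def high(self, x):
        return x // self.lower_root  # cluster number of x

    def low(self, x):
        return x % self.lower_root  # position of x within its cluster

    def index(self, high, low):
        return high * self.lower_root + low  # inverse of high/low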
```

> A vEB Tree structure contains u (the universe size), min (the minimum value in the tree rooted at that structure) and max (the maximum value in the tree rooted at that structure).  
In addition to these, a non-base-case vEB structure also contains a summary pointer and an array of cluster pointers. The summary records which clusters contain at least one key. The clusters themselves are vEB Trees of size $\sqrt{u}$ (more precisely, the lower root).

> From the above implementation we see that the following recurrence relation characterizes the space requirement $S(u)$ of a van Emde Boas tree with universe size u:  
$S(u) = (\sqrt{u} + 1) S(\sqrt{u}) + \Theta(\sqrt{u})$  
$1$: one summary structure of size $S(\sqrt{u})$.  
$\sqrt{u}$: sub-clusters, each of size $S(\sqrt{u})$.  
$\Theta(\sqrt{u})$: pointers to the clusters.  
Solving this recurrence gives $S(u) = O(u)$.
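
The linear bound can be checked numerically with a rough sketch. It assumes a simplified accounting that is not part of the original implementation: one unit for a node's own fields (u, min, max) and one unit per cluster pointer, with the upper and lower roots used in place of $\sqrt{u}$.

```python
def roots(u):
    """Return (upper_root, lower_root) for a universe size u = 2 ** k."""
    bits = u.bit_length() - 1  # lg u
    return 2 ** (bits - bits // 2), 2 ** (bits // 2)


def space(u):
    """Units of memory used by an empty vEB tree under the accounting above."""
    if u == 2:
        return 1  # base case: only u, min and max
    upper, lower = roots(u)
    # own fields + cluster pointers + summary + all clusters
    return 1 + upper + space(upper) + upper * space(lower)
```

Under this accounting `space(8)` is 15, and the ratio `space(u) / u` stays below 3 for every power of 2 up to $2^{16}$, which is consistent with $S(u) = O(u)$.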

## Finding minimum or maximum element in the set
### Algorithm
Finding the minimum or maximum element is as simple as returning the minimum or maximum value stored in the root vEB structure.  
Clearly both `MINIMUM` and `MAXIMUM` are constant time operations because the min and max values are cached in the vEB structure.

### Implementation
```python
def MINIMUM(V):
    return V.min


def MAXIMUM(V):
    return V.max
```


## Insertion into vEB Tree

### Implementation
```python
def INSERT_EMPTY(V, x):
    V.min = V.max = x


def INSERT(V, x):
    # V is an empty vEB Tree (Base case)
    if V.min is None:
        INSERT_EMPTY(V, x)
        return

    # else V is non empty
    else:
        if x < V.min:
            # If x < min, then x needs to become the new min.
            # But we don't want to lose the original min.
            # So we need to insert it into one of V's clusters.

            # exchange x and V.min
            x, V.min = V.min, x
            # now insert the original min (now x) into one of the V's clusters

        if V.u > 2:
            # Non base case

            # check whether the cluster that x will go into is currently empty
            # checking MINIMUM or MAXIMUM is sufficient to check for empty cluster
            if MINIMUM(V.cluster[V.high(x)]) is None:
                # insert x's cluster number into summary
                INSERT(V.summary, V.high(x))
                # insert x into the empty cluster
                INSERT_EMPTY(V.cluster[V.high(x)], V.low(x))

            else:
                # x's cluster is not empty
                # so we do not need to update the summary, since x's cluster number is already a member of the summary.

                # insert x into its cluster
                INSERT(V.cluster[V.high(x)], V.low(x))

        # update max
        if x > V.max:
            V.max = x
```
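
`SEARCH` is listed under Operations but its code is not shown in this section. The block below is a minimal, self-contained sketch of how it could look. The class `VEB` and the functions `insert` and `search` are condensed stand-ins written for this example, not the code from the full source, and `insert` assumes that x is not already in the tree.

```python
class VEB:
    """Condensed stand-in for the VEBTree class (illustrative only)."""

    def __init__(self, u):
        self.u, self.min, self.max = u, None, None
        if u > 2:
            bits = u.bit_length() - 1  # lg u, u is assumed to be a power of 2
            self.lower_root = 2 ** (bits // 2)
            upper_root = 2 ** (bits - bits // 2)
            self.summary = VEB(upper_root)
            self.cluster = [VEB(self.lower_root) for _ in range(upper_root)]

    def high(self, x):
        return x // self.lower_root

    def low(self, x):
        return x % self.lower_root


def insert(V, x):
    """Same logic as INSERT above, condensed."""
    if V.min is None:
        V.min = V.max = x
        return
    if x < V.min:
        x, V.min = V.min, x
    if V.u > 2:
        cluster = V.cluster[V.high(x)]
        if cluster.min is None:
            insert(V.summary, V.high(x))
        # inserting into an empty cluster hits the O(1) empty case above
        insert(cluster, V.low(x))
    if x > V.max:
        V.max = x


def search(V, x):
    """Return True if x is in the tree rooted at V, False otherwise."""
    if x == V.min or x == V.max:
        return True
    if V.u == 2:
        return False  # a base-case tree holds nothing besides min and max
    return search(V.cluster[V.high(x)], V.low(x))
```

Because min is never stored in a cluster, `search` has to compare x with `V.min` before descending, and each call recurses into exactly one cluster, so it follows the same $O(\log \log u)$ recurrence as the other operations.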


## Deletion from vEB Tree

### Implementation
```python
# assumes that x is currently an element in the set
# represented by the vEB tree V.
# this means if the tree contains only one key
# then no matter what value of x you pass to delete,
# the existing tree element will be deleted
def DELETE(V, x):
    # exactly one element in the tree
    if V.min == V.max:
        V.min = V.max = None

    # Base case: set min and max to the one remaining element.
    elif V.u == 2:
        # if exactly 2 elements exist
        if x == 0:  # and the key to be deleted is 0
            # then set min and max to key 1
            V.max = V.min = 1
        else:  # else key to be deleted is 1
            # so set min and max to key 0
            V.max = V.min = 0

    else:
        # we will have to delete an element from a cluster

        if x == V.min:  # we need to delete the min element
            # but before that we need to find the new min which is some other element within one of V's clusters

            # first_cluster = the cluster id that contains the lowest element other than min
            first_cluster = MINIMUM(V.summary)  # cluster id of cluster containing new min

            # x = lowest element in the found cluster
            x = V.index(first_cluster, MINIMUM(V.cluster[first_cluster]))

            # x becomes new min
            V.min = x

            # now x will be deleted from its cluster

        # Now we need to delete element x from its cluster,
        # whether x was the value originally passed to DELETE()
        # or x is the element becoming the new minimum.
        DELETE(V.cluster[V.high(x)], V.low(x))  # delete x from its cluster

        # That cluster might now become empty
        if MINIMUM(V.cluster[V.high(x)]) is None:
            # if it does, then we need to remove x's cluster number from the summary,
            DELETE(V.summary, V.high(x))

            # After updating the summary, we might need to update max if x is max
            if x == V.max:  # check whether we are deleting max element of V

                # summary_max = the number of the highest numbered nonempty cluster.
                # This works because we have already recursively called DELETE on V.summary
                # and so V.summary.max has already been updated.
                summary_max = MAXIMUM(V.summary)

                if summary_max is None:
                    # If all of V's clusters are empty, then the only remaining element in V is min
                    V.max = V.min  # update max accordingly

                else:
                    # else set max to the maximum element in the highest numbered cluster
                    V.max = V.index(summary_max, MAXIMUM(V.cluster[summary_max]))

        # else if the cluster did not become empty (there is at least one element in x's cluster even after deleting x.)
        # then although we do not have to update the summary in this case, we might have to update max.
        elif x == V.max:  # if max element was deleted, update max
            V.max = V.index(V.high(x), MAXIMUM(V.cluster[V.high(x)]))

```

> For full source code of vEB Tree implementation see [src/datastructures/vEB_Tree.py](src/datastructures/vEB_Tree.py) 

# Kruskal

![alt text](img/comparision.png)
