## Ball Tree

## Learning objectives

After reading this notebook, students will be able to:

- Explain why the ball tree is important.
- Illustrate the ball tree construction algorithm.
- Search for a nearest neighbour using the ball tree algorithm.
- Point out the pros and cons of a ball tree.

__Recap__

In the previous chapter, we discussed KD trees, which we used to split two-dimensional data at the median value. Imagine similar splitting, but in 1000 dimensions. A KD tree becomes inefficient in space and time as the number of dimensions grows, so the cost of searching the tree increases rapidly. This is one effect of the curse of dimensionality. In high dimensions, all points seem to be at almost the same distance from each other, so distances become hard to tell apart. If distances cannot be told apart, they are of little help in finding the nearest point.
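
The concentration of distances can be seen in a small experiment. The sketch below is illustrative only: the uniform random data, the helper name `distance_contrast`, and the sample size are assumptions and not part of the original chapter. It compares the farthest and the nearest distance from a random query to a set of random points as the dimension grows.

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Ratio of the farthest to the nearest distance from a random query."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    return max(distances) / min(distances)


# A ratio close to 1 means "nearest" and "farthest" are almost the same.
for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(dim), 2))
```

As the dimension grows, the ratio tends toward 1, which is why distance-based search becomes harder.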


In this section, we will discuss a slightly different tree algorithm for the nearest neighbor search, called the ball tree.






<center>
<img src="https://i.postimg.cc/J4xpWf9s/image.jpg" height=500/>
<figcaption> Figure 1: A 2D manifold embedded in 1000 dimensions. </figcaption>
</center>


Suppose your dataset lies on a 2D manifold embedded in a high-dimensional space, say 1000 dimensions. The "X" points in Figure 1 are confined to that manifold, so they are close to each other. In this case a KD tree struggles to find your neighbours efficiently, whereas a ball tree can still do the job.





## Ball Tree

__Introduction to a Ball tree__


A ball tree is a binary tree in which each __node, or ball, represents a set of data points,__ $\text{Points(Node)}.$ A binary tree has three types of nodes: the root node, internal nodes, and leaf nodes. Given a dataset, the root node of a ball tree represents the full set of points in the dataset.
An internal node is also called a non-leaf node. An internal node has a __left child__ $\text{(Node.child1)}$ and a __right child__ $\text{(Node.child2)}.$








<center>
<img src="https://i.postimg.cc/fW23n58w/image.png" height=450/>
<figcaption> Figure 2: Ball Tree</figcaption>
</center>

__Properties of Ball Tree:__

1. $ \text{Points(Node.child1)} ~~{\bigcap} ~~ \text{Points(Node.child2)} = \phi $
    * The points in $\text{child1}$ and the points in $\text{child2}$ of a node don't intersect.

2. $\text{Points(Node.child1)} ~~\bigcup ~~ \text{Points(Node.child2)} = \text{Points(Node)}$

    * The union of the points in $\text{child1}$ and $\text{child2}$ gives the points in the node.



3. Each node/ball stores its pivot point, also called the centroid, and the radius of the ball.

4. Each node records the distance from its pivot to the farthest point $f$ in the node. This distance is taken as the **radius of the ball**.

$$
\text{Node.Radius} = \max_{f \in \text{Points(Node)}} |\text{Node.Pivot} - f|
$$

5. If $x \in \text{Points(Node)},$ then by the triangle inequality the distance from a query point $q$ to $x$ is bounded as follows:

$$ |x - q| \geq |q - \text{Node.Pivot}| - \text{Node.Radius} $$

$$ |x - q| \leq |q - \text{Node.Pivot}| + \text{Node.Radius} $$









<center>
<img src="https://i.postimg.cc/7hFwPCCN/image.png"/>
<figcaption> Figure 3: Triangle inequality </figcaption>
</center>
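
The pivot, the radius, and the two bounds can be checked numerically. The following is a minimal sketch with made-up 2D points; the helper names `make_ball` and `distance_bounds` are hypothetical. Clamping the lower bound at zero is an addition here, since a distance can never be negative.

```python
import math


def make_ball(points):
    """Return (pivot, radius): the centroid and the distance to its farthest point."""
    dim = len(points[0])
    pivot = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
    radius = max(math.dist(pivot, p) for p in points)
    return pivot, radius


def distance_bounds(query, pivot, radius):
    """Lower and upper bound on the distance from query to any point in the ball."""
    d = math.dist(query, pivot)
    return max(d - radius, 0.0), d + radius


points = [(1.0, 1.0), (2.0, 1.5), (1.5, 3.0), (3.0, 2.0)]  # made-up data
query = (6.0, 5.0)

pivot, radius = make_ball(points)
lower, upper = distance_bounds(query, pivot, radius)

# Every point of the node must respect both bounds.
for p in points:
    assert lower - 1e-9 <= math.dist(query, p) <= upper + 1e-9
```

The lower bound is what makes a ball tree useful for search: if the lower bound of a ball is already larger than the best distance found so far, no point in that ball can be a closer neighbour.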

## Ball Tree Construction Algorithm

Let's formulate the ball tree algorithm using an example. For ease, each data point is given a numerical name from 1 to 10.


<center>
<img src="https://i.postimg.cc/g0CCZ9cY/image.png" height=500/>
</center>

The points in the above image are enclosed inside one big ball/sphere. This ball is the root of the ball tree.


<center>
<img src="https://i.postimg.cc/FK3PwYm4/image.png"/>
</center>


Let's forget this big ball and just focus on the points. The ball tree algorithm proceeds as follows.

1. The ball tree algorithm picks a random point from the given set of points.
    * Suppose in our case it is point number 4.





<center>
<img src="https://i.postimg.cc/BnxH0H8M/image.png" height=500/>
</center>


2. After selecting a point, it finds the farthest point from the selected point. From that farthest point, it then finds the farthest point again.
    * In our case the farthest point from 4 is 10.
    * The farthest point from 10 is 1.



<center>
<img src="https://i.postimg.cc/N0s9vVvj/image.png" height=500/>
</center>


3. Draw a vector between the two farthest points you just identified.
    * Draw a vector from 1 to 10.

<center>
<img src="https://i.postimg.cc/X7gCJFcJ/image.png" height=500/>
</center>
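
Steps 1 to 3 can be sketched in a few lines. This is an illustrative sketch, not a reference implementation: the function names `farthest_from` and `split_direction` and the sample coordinates are assumptions.

```python
import math
import random


def farthest_from(anchor, points):
    """Return the point with the largest Euclidean distance to anchor."""
    return max(points, key=lambda p: math.dist(anchor, p))


def split_direction(points, rng):
    """Pick a random start, walk to the farthest point twice, return the pair and vector."""
    start = rng.choice(points)          # step 1: random point
    p1 = farthest_from(start, points)   # step 2: farthest from the random point
    p2 = farthest_from(p1, points)      # step 2: farthest from that point
    direction = [b - a for a, b in zip(p1, p2)]  # step 3: vector from p1 to p2
    return p1, p2, direction


# Made-up points lying on a line, so the two extremes are easy to verify.
line_points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
p1, p2, direction = split_direction(line_points, random.Random(0))
print(p1, p2, direction)
```

Whatever the random start is, this heuristic ends on two points that are far apart, which gives a reasonable axis along which to split the data.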


4. After finding the vector, project all points onto it.

<center>
<img src="https://i.postimg.cc/jdjsH7YC/image.png" height=500/>
</center>

5. Find the median of the projected data on that vector.

<center>
<img src="https://i.postimg.cc/mkt5DRKZ/image.png" height=500/>
</center>


The median, represented by the yellow marker, divides your dataset into two halves. Now you need to assign the data points to two subtrees.
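
The projection and the median split can be sketched as follows. The helper names `project` and `median_split`, the sample points, and the tie rule (a point exactly on the median goes to the left half) are assumptions made for illustration.

```python
import statistics


def project(point, origin, direction):
    """Scalar position of point along direction, measured from origin.

    The direction is not normalised: scaling it changes the scores but
    not their order, and only the order matters for the median split.
    """
    return sum((x - o) * d for x, o, d in zip(point, origin, direction))


def median_split(points, origin, direction):
    """Split points into (left, right) at the median of their projections."""
    scores = [project(p, origin, direction) for p in points]
    median = statistics.median(scores)
    left = [p for p, s in zip(points, scores) if s <= median]
    right = [p for p, s in zip(points, scores) if s > median]
    return left, right


sample = [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0), (4.0, 3.0), (5.0, 5.0), (6.0, 4.0)]
left, right = median_split(sample, origin=(0.0, 0.0), direction=(6.0, 4.0))
print(left)
print(right)
```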

6. Find the mean of the data points, or centroid, of each half.
    * In our case the centroids are $c_1$ and $c_2$.


<center>
<img src="https://i.postimg.cc/CMR0dVGf/image.png" height=500 />
</center>

In the given figure, $c_1$ represents the centroid of the data points to the left of the median and $c_2$ represents the centroid of the data points to the right of the median.

Note that the centroid is calculated by taking the mean of each coordinate. The centroid is often called a pivot. Now you need to enclose all points on the left and right of the median using the pivot.

7. After finding the pivot, find the farthest point from it. Draw a radius and create a sphere.
    * On the left half, the farthest point from $c_1$ is 1. Draw radius $r_1$ from the pivot to that point and construct a sphere. Save the radius and the pivot.
    * On the right half, the farthest point from $c_2$ is 10. Draw radius $r_2$ from the pivot to that point and construct a sphere. Save the radius and the pivot.





<center>
<img src="https://i.postimg.cc/W3QHhxVM/image.png" height = 500 />
</center>


Now you have two big balls that together enclose your whole dataset. The left ball is the left subtree and the right ball is the right subtree.
Note that the balls can intersect. If a data point lies inside both balls, it is usually assigned to the ball with the nearest centroid, but this choice is up to the programmer.
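
The difference between disjoint point sets and overlapping balls can be made concrete. In the sketch below the two halves share no point, yet one point of the right half also lies inside the left ball. The coordinates and the helper names `make_ball` and `balls_overlap` are made up for illustration.

```python
import math


def make_ball(points):
    """Return (pivot, radius) of the smallest centroid-centred ball around points."""
    pivot = tuple(sum(c) / len(points) for c in zip(*points))
    radius = max(math.dist(pivot, p) for p in points)
    return pivot, radius


def balls_overlap(ball_a, ball_b):
    """Two balls intersect when their centres are closer than the sum of the radii."""
    (c1, r1), (c2, r2) = ball_a, ball_b
    return math.dist(c1, c2) < r1 + r2


left = [(0.0, 0.0), (1.0, 4.0), (2.0, 0.0)]
right = [(3.0, 1.0), (5.0, 0.0), (6.0, 3.0)]
left_ball, right_ball = make_ball(left), make_ball(right)

# Points that fall geometrically inside both balls.
in_both = [
    p for p in left + right
    if math.dist(p, left_ball[0]) <= left_ball[1]
    and math.dist(p, right_ball[0]) <= right_ball[1]
]

# Nearest-centroid rule for such a point.
owner = min((left_ball, right_ball), key=lambda ball: math.dist(in_both[0], ball[0]))
print(balls_overlap(left_ball, right_ball), in_both, owner is right_ball)
```

Here the shared point is closer to the right centroid, so the nearest-centroid rule keeps it in the right ball.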

The time complexity of each split is $O(n)$, where $n$ is the number of data points in the parent partition.

8. This process is repeated recursively up to a depth defined by the programmer. Here we use depth = 2. If you split the data in the left and right balls again by repeating the same steps, the final structure looks like the figure below.


<center>
<img src="https://i.postimg.cc/FshxDhTY/image.png" height = 500 />
</center>



Let's draw the tree for this structure. The A in the root node represents all data points. We have 10 data points in our dataset.


<center>
<img src="https://i.postimg.cc/50MzymGX/image.png"  />
</center>


__Summary of the algorithm__

1. Select a random point $x_t$ from your dataset.

2. Find the farthest point from $x_t,$ say $p_1.$

3. Find the farthest point from $p_1,$ say $p_2.$

4. Draw a vector from $p_1$ to $p_2,$ say $\vec{p}.$

5. Project all points onto the vector $\vec{p}$ and find the median of the projected points. The median divides the data points into two halves.

6. Find the centroid/pivot of each half. Draw a circle, sphere, or hypersphere around each pivot with a radius that reaches the farthest point in that half.

7. Repeat these steps in each half until the given depth of the tree is reached.
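
Putting the summary together, a recursive construction might look like the sketch below. It is a minimal illustration under stated assumptions, not a production implementation: the class `BallNode`, the functions `build_ball_tree` and `leaves`, the tie rule at the median, and the ten sample coordinates are all hypothetical.

```python
import math
import random
import statistics
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class BallNode:
    points: List[Point]
    pivot: Point
    radius: float
    child1: Optional["BallNode"] = None
    child2: Optional["BallNode"] = None


def centroid(points: List[Point]) -> Point:
    # Mean of each coordinate.
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def build_ball_tree(points, max_depth, rng, depth=0):
    pivot = centroid(points)
    radius = max(math.dist(pivot, p) for p in points)
    node = BallNode(points, pivot, radius)
    if depth >= max_depth or len(points) < 2:
        return node  # leaf

    # Random point, then the farthest point twice, then the split vector.
    start = rng.choice(points)
    p1 = max(points, key=lambda p: math.dist(start, p))
    p2 = max(points, key=lambda p: math.dist(p1, p))
    direction = [b - a for a, b in zip(p1, p2)]

    # Project onto the vector and split at the median (ties go left).
    scores = [sum((x - o) * d for x, o, d in zip(p, p1, direction)) for p in points]
    median = statistics.median(scores)
    left = [p for p, s in zip(points, scores) if s <= median]
    right = [p for p, s in zip(points, scores) if s > median]
    if not right:  # degenerate case, e.g. all points identical
        return node

    node.child1 = build_ball_tree(left, max_depth, rng, depth + 1)
    node.child2 = build_ball_tree(right, max_depth, rng, depth + 1)
    return node


def leaves(node):
    if node.child1 is None:
        return [node]
    return leaves(node.child1) + leaves(node.child2)


data = [(1.0, 2.0), (2.0, 5.0), (3.0, 1.0), (4.0, 4.0), (5.0, 7.0),
        (6.0, 2.0), (7.0, 6.0), (8.0, 3.0), (9.0, 8.0), (10.0, 5.0)]
root = build_ball_tree(data, max_depth=2, rng=random.Random(0))
for leaf in leaves(root):
    print(len(leaf.points), leaf.pivot, round(leaf.radius, 2))
```

Each node keeps its own points, pivot, and radius, so the two set properties and the radius definition from the properties list hold by construction.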

## Nearest neighbour search in the Ball Tree

The nearest neighbor search proceeds by traversing the tree and computing a distance between the query and the center of each node’s sphere. The tree traversal methodology is a greedy depth-first search.


Let $PS$ be a set of data points with $PS \subseteq V,$ where $V$ is the training set. Let $q$ be the query point and $k$ the number of nearest neighbors.

$PS$ consists of the k-nearest neighbors (k-NN) of $q$ in $V$ iff

* ($|V| \geq k$ and $PS$ contains the $k$ points of $V$ closest to $q$) or
* ($|V| < k$ and $PS = V$)
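
This definition translates directly into a brute-force reference that is handy for checking any tree-based search. The function name `brute_force_knn` and the sample data are assumptions used for illustration.

```python
import heapq
import math


def brute_force_knn(query, points, k):
    """Return PS as defined above: all of V if |V| < k, else the k closest points."""
    if len(points) < k:
        return list(points)
    return heapq.nsmallest(k, points, key=lambda p: math.dist(query, p))


V = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (6.0, 5.0)]  # made-up training set
q = (0.9, 0.8)

print(brute_force_knn(q, V, k=2))   # the two closest points, nearest first
print(brute_force_knn(q, V, k=10))  # fewer than k points available: returns V
```

The brute-force version computes the distance to every point, which is exactly the cost a ball tree tries to avoid.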




__Problem__



$$V = \{1,2,3,4,5,6,7,8,9,10\},$$

$$k = 2$$

* Assume that $PS^{in}$ is a variable that holds the candidate k-NN of $q$ in $V.$
* Assume that $PS^{out}$ is a variable that holds the resulting nearest neighbors of $q.$

Define a recursive function:

$${PS}^{out} = BallKNN(PS^{in} , Node, k)$$

__Root Node__

At the root, let $PS = V.$ Since

$$|V| = 10 > k = 2,$$

the search starts with

$$PS^{in} = PS$$





<center>
<img src="https://i.postimg.cc/TwF3h2hC/image.png"  height=500/>
</center>


__Internal Node__


$$B = \{1,2,3,4,5\},$$
$$C = \{6,7,8,9,10\}$$

Calculate the distance between $q$ and the centroids of $B$ and $C.$

* $B$ is close to $q$ and $C$ is far from $q.$


$B$ is not a leaf node, so the search continues into $B$ with $PS^{in} = B.$




<center>
<img src="https://i.postimg.cc/4NMQZpJ9/image.png"  height=500/>
</center>


__Leaf Node__

$$\text{B.child1} = \{1,2,3\},$$

$$\text{B.child2} = \{4,5\}$$

Calculate the distance between $q$ and the centroids of $\text{B.child1}$ and $\text{B.child2}.$


* $\text{B.child2}$ is nearest and is a leaf node, therefore $PS^{in} = \{4,5\}$


<center>
<img src="https://i.postimg.cc/cLhYCYWh/image.png"  height=500/>
</center>








__Calculate the distance between each point in $PS^{in}$ and the query point, and keep the $k$ smallest distances.__


Here $k=2,$ and the leaf holds exactly two points, so

$${PS}^{out} = {PS}^{in} = \{4,5\}$$

__Results__

The 2 nearest neighbors of $q$ are:

 $${PS}^{out} = \{4,5\}$$
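
The walk-through above can be sketched in code. The first function, `greedy_knn`, follows the nearest centroid down to a single leaf, as in the example. The second function, `exact_knn`, is an addition that is not described in this chapter: it uses the lower bound from the triangle inequality to decide which other balls can safely be skipped. The tree is built by hand, and all coordinates and names (`make_node`, `neighbours`, the leaf variables) are made up for illustration.

```python
import heapq
import math


def make_node(points=None, children=()):
    # An internal node holds the union of its children's points.
    if children:
        points = [p for child in children for p in child["points"]]
    pivot = tuple(sum(c) / len(points) for c in zip(*points))
    radius = max(math.dist(pivot, p) for p in points)
    return {"points": points, "pivot": pivot, "radius": radius, "children": list(children)}


def greedy_knn(node, query, k):
    """Descend to the child with the nearest pivot, then scan only that leaf."""
    while node["children"]:
        node = min(node["children"], key=lambda c: math.dist(query, c["pivot"]))
    return heapq.nsmallest(k, node["points"], key=lambda p: math.dist(query, p))


def exact_knn(node, query, k, best=None):
    """Depth-first search that skips balls which cannot contain a closer point."""
    if best is None:
        best = []  # max-heap via negated distances: (-distance, point)
    lower_bound = math.dist(query, node["pivot"]) - node["radius"]
    if len(best) == k and lower_bound >= -best[0][0]:
        return best  # prune: every point in this ball is too far away
    if not node["children"]:
        for p in node["points"]:
            d = math.dist(query, p)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
        return best
    for child in sorted(node["children"], key=lambda c: math.dist(query, c["pivot"])):
        exact_knn(child, query, k, best)
    return best


def neighbours(heap):
    """Convert the heap into a list of points, nearest first."""
    return [p for _, p in sorted(heap, reverse=True)]


b1 = make_node([(1.0, 1.0), (1.5, 2.5), (2.5, 1.0)])  # stands in for points 1, 2, 3
b2 = make_node([(4.0, 3.0), (5.0, 2.0)])              # stands in for points 4, 5
c1 = make_node([(8.0, 6.0), (9.0, 8.0), (8.5, 7.0)])
c2 = make_node([(10.0, 4.0), (11.0, 5.0)])
root = make_node(children=[make_node(children=[b1, b2]), make_node(children=[c1, c2])])

q = (4.5, 2.2)
greedy_result = greedy_knn(root, q, k=2)
exact_result = neighbours(exact_knn(root, q, k=2))
print(greedy_result, exact_result)
```

For this query both searches end in the leaf that stands in for $\{4,5\}$. The greedy descent inspects a single leaf, so its answer is only guaranteed when no other ball can hold a closer point. For a query near the border between two balls, such as `(3.2, 1.8)` in this made-up data, the greedy descent still returns the two points of `b2` although a point in `b1` is closer, while the pruned search finds it.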

## Pros and Cons of Ball Tree


__Pros of ball tree:__

* Ball-trees tend to still work if data exhibits local structure (e.g. lies on a low-dimensional manifold).



__Cons of ball tree:__

The ball tree splits that we made above have the following shortcomings:



* First, imbalanced partitions: a split divides the data into two halves, but the number of points assigned to each sub-partition is not always taken into account, so the two halves do not necessarily hold the same number of points. The resulting tree can be imbalanced.

* Second, sensitivity to outliers: the splitting hyperplane is determined by the two farthest points and completely ignores the distribution of the other points. This makes the ball tree very sensitive to outlier data points.

* Third, the ball tree is also affected by the curse of dimensionality.
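
The outlier sensitivity is easy to reproduce. In the sketch below a single far-away point turns the split direction from horizontal to vertical. The data, the fixed start point, and the helper name `farthest_pair_direction` are assumptions chosen to make the effect visible.

```python
import math


def farthest_pair_direction(points, start):
    """Unit vector between the two points found by the farthest-point heuristic."""
    p1 = max(points, key=lambda p: math.dist(start, p))
    p2 = max(points, key=lambda p: math.dist(p1, p))
    length = math.dist(p1, p2)
    return tuple((b - a) / length for a, b in zip(p1, p2))


# Points spread along the x-axis, plus one outlier far away along the y-axis.
cluster = [(0.0, 0.0), (1.0, 0.2), (2.0, -0.1), (3.0, 0.1), (4.0, 0.0)]
outlier = (2.0, 30.0)

clean_dir = farthest_pair_direction(cluster, start=cluster[1])
noisy_dir = farthest_pair_direction(cluster + [outlier], start=cluster[1])
print(clean_dir)  # follows the spread of the data along x
print(noisy_dir)  # dominated by the outlier, points along y
```

With the outlier present, all five regular points project to almost the same position on the split vector, so the median split no longer separates them in a meaningful way.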

## Key Take-Aways

* A KD tree performs poorly on higher-dimensional data. A ball tree is often a better choice there.

* The ball tree is a binary tree in which each node represents a set of points.
* The point sets of the two children of a node don't intersect. If a point lies inside both balls, it is assigned to only one of them.

$$ \text{Points(Node.child1)} ~~{\bigcap} ~~ \text{Points(Node.child2)} = \phi $$
* The union of the points in the child nodes gives the points in the parent node.

 $$\text{Points(Node.child1)} ~~\bigcup ~~ \text{Points(Node.child2)} = \text{Points(Node)}$$


* Each node records a pivot, called the centroid, and a radius.
* A ball tree node is split into two halves by a splitting plane.
* Sometimes the ball tree is imbalanced.
* The ball tree is sensitive to outliers.
* Ball trees work if data exhibits local structure (e.g., lies on a low-dimensional manifold).

