# Binary search trees (Solution)
Module Algorithms & Data structures | Chapter 1 | Notebook 4

Welcome to your fourth exercise on algorithms and data structures.
Here you will get to know a set structure: the binary search tree.
By the end of this exercise you will be able to: 
* Define and differentiate between the terms binary tree and binary search tree,
* Describe the binary search tree property, 
* Use the binary search algorithm to recursively implement methods for finding and inserting data.  
***


**Scenario:**
The management of the online retail company approaches you with another request:
They want customer data to be recorded in a system. They want a solution in Python because they hope it will be easy to integrate into existing processes. It is particularly important to management that personal data can be found quickly. As soon as a new account is created on the website, the person's user account should be added to the data system.


In our scenario, the arrangement of the elements in the data structure is unimportant. 
So, we don't need a sequential data structure like we did in our previous queue scenario.
This means that we do not have to manage our data in an order specified by the user. Instead, the data can be arranged in any structure. This gives us a little more leeway in terms of design. 
In the introductory text lesson on data structures, you already briefly learned about the hash table as a set structure. In Python, the data types `dict` and `set` are representatives of this data structure. In this exercise, we will look at another set structure that has no built-in representative in Python: the binary search tree.

Two operations are particularly important in our scenario, and the quality of our data structure will be measured by their time complexity: 
* Finding existing user accounts  
* Adding new user accounts.

In this exercise, we will create a binary search tree with the associated methods and think about their runtime complexities.
But before we start, we will first look at what a binary search tree is and how it differs from the binary tree.


## Basic structure: the binary tree


The binary search tree is a special form of the binary tree. 
We will look at the difference between the binary search tree and other binary trees later.
First, we will define and implement the basic structure of a binary tree. 

Like the linked list, the binary tree is a pointer-based data structure. Unlike in the linked list, however, each element has two pointers: one to the left child and one to the right child.
The children are either empty or are binary trees themselves.
As with the linked list, the elements hold data, and the data structure class, in this case the tree, has a pointer to the first element. 
When we talk about tree structures, we call the elements *nodes* and the first element the *root* of the tree.
If both children of a node are empty, we call these nodes *leaves*.
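
To make these terms concrete, here is a minimal sketch using a hypothetical `Node` class (not the `DatabaseNode` class we build below) and two hypothetical helpers, `is_leaf()` and `leaves()`. A node counts as a leaf exactly when both of its child pointers are `None`.

```python
class Node:
    """Hypothetical minimal node: a key plus left/right child pointers."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def is_leaf(node):
    """A node is a leaf if both of its children are empty (None)."""
    return node.left is None and node.right is None


def leaves(node):
    """Collect the keys of all leaves in the (sub)tree rooted at node."""
    if node is None:  # an empty child contributes no leaves
        return []
    if is_leaf(node):
        return [node.key]
    return leaves(node.left) + leaves(node.right)


# the root "b" has two children, and both children are leaves
root = Node("b", left=Node("a"), right=Node("c"))
print(leaves(root))  # ['a', 'c']
```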


For our scenario, we use the account username to identify the nodes.
The following diagram shows an example of a binary tree:
    
<img src="pyp_ads_nb4_binaerbaum_struktur.png" alt="Binary tree" style="width: 600px;"/>


We can of course store other information together with the user names.
The user name is used here to uniquely identify a user account and therefore also the node itself.
We call this a *key*.
Each user name can only be assigned once, otherwise we cannot uniquely identify the nodes.


Let's start by implementing the basic structure.
It will have a `DatabaseTree` class and a `DatabaseNode` class.

```python
class DatabaseTree: 
    """A basic binary tree data structure. 

    Attributes: 
        root (DatabaseNode): the root node. Defaults to None.

    """
    
class DatabaseNode: 
    """A Node in DatabaseTree. 
    
    Args: 
        name (str): unique user name. 
        **kwargs: additional information to be stored.

    Attributes: 
        left (DatabaseNode): left child. Defaults to None.
        right (DatabaseNode): right child. Defaults to None.
        name (str): unique user name.
        add_info (dict): additional information to be stored.

    """
```


##### <font color="#3399DB">Task 1</font>
> Implement the binary tree for our data system.
> To do this, create a new Python script with the name *tree.py*. As always, you can use the cells here in the notebook to write your code.  Then check your code using the prepared unit tests.


In [None]:
#Solution 

class DatabaseTree: 
    """A basic binary tree data structure. 

    Attributes: 
        root (DatabaseNode): the root node. Defaults to None.

    """

    def __init__(self): 
        self.root = None
    

class DatabaseNode: 
    """A Node in DatabaseTree. 
    
    Args: 
        name (str): unique user name. 
        **kwargs: additional information to be stored.

    Attributes: 
        left (DatabaseNode): left child. Defaults to None.
        right (DatabaseNode): right child. Defaults to None.
        name (str): unique user name.
        add_info (dict): additional information to be stored.

    """
    
    def __init__(self, name, **kwargs):
        assert isinstance(name, str)
        self.name = name
        self.add_info = kwargs
        self.left = None 
        self.right = None 

In [2]:
!pytest test_tree.py::test_node_init test_tree.py::test_node_init_errors

platform linux -- Python 3.8.10, pytest-6.2.4, py-1.11.0, pluggy-0.13.1
rootdir: /home/jovyan/work/pyprog/stackfuel-python-programmer-product-de/module-05/chapter-01-solutions
plugins: anyio-3.5.0
collected 2 items

test_tree.py ..                                                          [100%]



We can use the basic structure for our use case. 
In order for the tree to become a binary search tree, the nodes need to be arranged in a certain way, which is called the *binary search tree property*. 
It ensures that we can search the tree with a binary search algorithm like the one we implemented in the first exercise.
If new nodes are added to the tree, the property should be retained. 
We learn about it in the next section.


## The binary search tree property


Our online retail business has a service center that deals with inquiries, for example about the status of an order.
For each request, an employee searches for the corresponding user account via a search query in the system. Our task is to make this search operation as fast as possible.
How could we systematically search a tree for a user account? 

Do you remember the binary search algorithm from the first exercise? 
In that lesson, we searched an alphabetically sorted library for a given title. To do this, we used a binary search algorithm that repeatedly divided the shelf into two parts. 
We can take a similar approach here! 
However, to do this, our nodes must be arranged in such a way that we always know whether the node we are looking for is to the left or to the right of the node we are visiting. 

This is exactly what the binary search tree property describes:  

* The keys of all the nodes in the left subtree (the left child and all its descendants) are smaller than the key of the node itself.
* The keys of all the nodes in the right subtree (the right child and all its descendants) are larger than the key of the node itself.

The binary search tree property distinguishes the binary search tree from the more general binary tree. Incidentally, it makes no assumptions about the distribution of nodes in the tree. In contrast to our book search example from the first exercise, the search space here is not necessarily divided into two equal parts.
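
Note that the property applies to *all* nodes in a subtree, not just to the direct children. The following sketch (with a hypothetical `Node` class and a hypothetical `is_bst()` helper) checks the property by passing down the bounds that each ancestor imposes. The second example tree looks fine locally, since "x" is larger than its parent "c", but it still violates the property because "x" sits in the left subtree of "m".

```python
class Node:
    """Hypothetical minimal node with a key and two child pointers."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def is_bst(node, low=None, high=None):
    """Check the binary search tree property for the subtree rooted at node.

    low/high are exclusive bounds inherited from the ancestors: every key in
    a left subtree must stay below its ancestor, every key in a right subtree
    must stay above it.
    """
    if node is None:
        return True
    if low is not None and node.key <= low:
        return False
    if high is not None and node.key >= high:
        return False
    return (is_bst(node.left, low, node.key)
            and is_bst(node.right, node.key, high))


valid = Node("m", Node("c", right=Node("f")), Node("t"))
# "x" is a valid right child of "c", but it lies in the left subtree of "m"
invalid = Node("m", Node("c", right=Node("x")), Node("t"))
print(is_bst(valid), is_bst(invalid))  # True False
```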


Let's take another look at the illustration from above.
The nodes do not follow any particular order.
In the next task, you should sort the nodes so that the binary search tree property is fulfilled. 
Here is the tree again:

<img src="pyp_ads_nb4_binaerbaum_struktur.png" alt="Binary tree" style="width: 600px;"/>


##### <font color="#3399DB">Task 2</font>
> Sort the nodes in the diagram so that the binary search tree property is fulfilled.
> Leave the structure in place: To the left of the root there should be five nodes again in the same place and to the right there should be three.
> Write down your solution as comments in the following cell.


In [None]:
#Solution 
#No change: fred 
#Root: paolo

#vanessa --> alfred_10
#paolo --> emma22
#fred --> fred
#zoe --> george
#george --> hans1998
#alfred_10 --> paolo
#hans1998 --> rita
#rita --> vanessa
#emma22 --> zoe@g.de

In your rearranged tree, the node with the user name 'fred' should be in the same place as before. The root node now has the user name `paolo`.
We recommend keeping your draft with the new arrangement. You'll need it again later!
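
Why `paolo`? Because all smaller keys must go to the left and all larger keys to the right, the root of a tree with five nodes in its left subtree must hold the sixth-smallest key. The sketch below checks this with the nine user names that appear in the Task 2 solution (we assume these are exactly the names from the diagram).

```python
# user names as they appear in the Task 2 solution
names = ["vanessa", "paolo", "fred", "zoe@g.de", "george",
         "alfred_10", "hans1998", "rita", "emma22"]

ordered = sorted(names)
left_size = 5  # Task 2: five nodes to the left of the root

root = ordered[left_size]             # sixth-smallest name becomes the root
left_keys = ordered[:left_size]       # must all end up in the left subtree
right_keys = ordered[left_size + 1:]  # must all end up in the right subtree

print(root)        # paolo
print(right_keys)  # ['rita', 'vanessa', 'zoe@g.de']
```

The same reasoning applies recursively inside each subtree, which is how the complete mapping in the solution comes about.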


## Finding a node in the tree


We now know the binary search tree property and how the nodes in our tree should be arranged.
If a user account is added, this property must be retained.
We'll take care of that in a second. 
Let's start by looking for a user account. We can assume that the binary search tree property is fulfilled. 
The search should be carried out using a `find()` method in our `DatabaseTree` class. 

``` python 
def find(self, name): 
    """
    Return DatabaseNode with provided user name. Return None if no node is found or if tree is empty.

    Args: 
        name (str): user name to be searched for. 

    Returns: 
       DatabaseNode with specified user name, or None if nothing found. 

    """
```


In the next task, you should first think about possible helper functions and the required variables. If necessary, take another look at your solution from exercise 1.
Just like in the first exercise, a recursive implementation is also suitable here.


##### <font color="#3399DB">Task 3</font>
> Write down the pseudocode for `find()` and any helper functions in the following cell.
> Try and use a recursive implementation.


In [None]:
#Solution 
#Pseudocode 

#find():
#input: name (str)
#call and return recursive helper function _find_rec with root node and name as input

#_find_rec()
#input: name (str), node (DatabaseNode)
#output: node (DatabaseNode) or None 

##base case 1: nothing found 
#if node is None: return None 

##base case 2: node found 
#if node.name == name: return node 

##recursive step: 
#if name < node.name: recursively call and return _find_rec() in left subtree 
#if name > node.name: recursively call and return _find_rec() in right subtree 


Now that your concept is ready, you can start implementing it.


##### <font color="#3399DB">Task 4</font>
> Implement `find()` and any required helper functions.
> Add the method for the `DatabaseTree` class to the script *tree.py*.
> Then test your code again with the prepared unit tests.


In [None]:
#Solution
class DatabaseTree: 
    """A basic binary tree data structure. 

    Attributes: 
        root (DatabaseNode): the root node. Defaults to None.

    """
    
    def __init__(self): 
        self.root = None

    def find(self, name): 
        """
        Return DatabaseNode with provided user name. Return None if no node is found or if tree is empty.

        Args: 
            name (str): user name to be searched for. 

        Returns: 
           DatabaseNode with specified user name, or None if nothing found. 

        """
        
        assert isinstance(name, str)
        return self._find_rec(self.root, name)
    
    def _find_rec(self, sub_root, name):
        """Recursive helper for find(). Return DatabaseNode or None if nothing found."""
        if sub_root is None:  # base case 1: nothing found
            return None
        if sub_root.name == name:  # base case 2: node found
            return sub_root
        # recursive step: the binary search tree property tells us which subtree to search
        if name < sub_root.name:
            return self._find_rec(sub_root.left, name)
        return self._find_rec(sub_root.right, name)

In [3]:
!pytest test_tree.py::test_find

platform linux -- Python 3.8.10, pytest-6.2.4, py-1.11.0, pluggy-0.13.1
rootdir: /home/jovyan/work/pyprog/stackfuel-python-programmer-product-de/module-05/chapter-01-solutions
plugins: anyio-3.5.0
collected 1 item

test_tree.py .                                                           [100%]
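
The recursive solution mirrors the pseudocode nicely. As an alternative, here is a sketch of a loop-based variant, written as a hypothetical stand-alone function `find_iterative()` with a minimal stand-in for `DatabaseNode`. It follows exactly the same path down the tree, but it does not add a stack frame per level. This can matter in very unbalanced trees: CPython's default recursion limit is 1000 (see `sys.getrecursionlimit()`), so a recursive search through a degenerate tree with more levels would raise a `RecursionError`.

```python
class DatabaseNode:
    """Minimal stand-in for the DatabaseNode class above (name plus children)."""

    def __init__(self, name, **kwargs):
        self.name = name
        self.add_info = kwargs
        self.left = None
        self.right = None


def find_iterative(root, name):
    """Loop-based variant of find(): walk down from root until name is found."""
    node = root
    while node is not None:
        if name == node.name:
            return node
        # binary search tree property: smaller names left, larger names right
        node = node.left if name < node.name else node.right
    return None  # fell off the tree: the name is not stored


root = DatabaseNode("paolo")
root.left = DatabaseNode("fred", email="fred@example.com")
root.right = DatabaseNode("rita")
print(find_iterative(root, "fred").add_info)  # {'email': 'fred@example.com'}
print(find_iterative(root, "zoe"))            # None
```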



Employees at the service center should now be able to get results quickly in response to their search queries.
But how quickly?
In exercise 1, we already thought a bit about the time complexity of a binary search.
The algorithm from exercise 1 always divided the shelf into two approximately equal parts.
Our binary search tree is not necessarily balanced: one side of the tree can be deeper than the other.
In the example tree from the diagram, we potentially have to search longer for nodes to the left of the root than for nodes on the right-hand side.


##### <font color="#3399DB">Task 5</font>
> If $h$ is the number of nodes on the longest path from the root to a leaf (the height of the tree), what is the time complexity of `find()`?
> 1) $O(n)$
> 2) $O(h)$
> 3) $O(\log n)$
> 4) $O(\log h)$
> 
> Write down your answer as a comment in the following code cell. You will find the solution again in the following expandable box.


In [8]:
#Solution 
# O(h), as we would potentially have to visit every node along the longest path from the root
# O(log n) would be the efficiency in a balanced tree in which the path to the left and right of the root would have approximately the same length (+-1)

**Solution**: Time complexity of `find()`. 
<div class="details">
In the worst case, the user account you are looking for is located in the leaf that is furthest away from the root, or it does not exist at all and the search only ends below that leaf.
The number of nodes on this path is exactly $h$.
Because every node on the path is visited once along the way, this results in a time complexity of $O(h)$.
If the tree were balanced, or approximately balanced, as in the book search example from exercise 1, the time complexity would again be $O(\log n)$.
</div>


We achieve the best time complexity for `find()` with balanced binary search trees, because then the path from the root to the most distant child is as short as possible.
There are special subforms of the binary search tree that balance themselves as soon as nodes are inserted or removed.
*AVL trees* and *red-black trees* are examples of this.
They use different mechanisms to balance themselves. 
As the balancing itself also uses resources, it's important to weigh up the pros and cons of using it.
In use cases like our scenario, where finding nodes is very important, the advantages of balancing usually outweigh the disadvantages. However, in this exercise we will forgo this balancing. In the next section, we will focus only on inserting nodes into `DatabaseTree` without violating the binary search tree property. This means we simply have to 'hope' that our binary search tree stays reasonably balanced on its own.
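
How much this 'hope' depends on the order of insertion can be seen in the following sketch. It uses a hypothetical minimal `Node` class and a simple insert without rebalancing (keys are assumed to be unique). If user names arrive in alphabetical order, every new node becomes the right child of the previous one and the tree degenerates into a list with $h = n$. With a shuffled insertion order, the height stays far smaller; for random insertion orders the expected height grows only logarithmically with $n$.

```python
import random


class Node:
    """Hypothetical minimal BST node: key plus two children."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert a (unique) key without rebalancing; return the possibly new root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def height(node):
    """Number of nodes on the longest root-to-leaf path (h in Task 5)."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


keys = [f"user{i:04d}" for i in range(500)]  # already in alphabetical order

degenerate = None
for k in keys:  # sorted insertion: every new key goes to the right
    degenerate = insert(degenerate, k)

shuffled = keys[:]
random.seed(0)
random.shuffle(shuffled)
mixed = None
for k in shuffled:
    mixed = insert(mixed, k)

print(height(degenerate))  # 500: the tree has degenerated into a list
print(height(mixed))       # far smaller than 500
```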


## Inserting a node into the tree


We haven't set up our data system yet.
To do this, we need a method for adding new accounts.
We will create the method in the `DatabaseTree` class again.
Here is the docstring for the `insert()` method:

```python 
def insert(self, node): 
    """
    Insert DatabaseNode object into self. The binary search tree property must be preserved. 
    
    Args: 
        node (DatabaseNode): node to be inserted. 
    
    Returns: 
        None 
        
    """
```


We still have to answer one important question before implementation.
How can we insert a data point into our binary search tree without violating the binary search tree property? 
In principle, we proceed in a very similar way to the search:
We recursively search for the key to our new account.
We won't find it, because it has to be unique, i.e. it must not yet exist.
Instead, we will eventually reach an empty child position (`None`).
This is exactly where we can insert the account. 

The algorithm for `insert()` can be described as follows: 
1. If the binary search tree is empty, insert the new account at this position (for the whole tree, this means it becomes the root) and end the process. 
2. Otherwise, compare the user name of the new account with the user name of the root. If it comes before the root's user name alphabetically, continue with the root's left child; if it comes after, continue with the right child. Treat this child as the root of a (possibly empty) tree and repeat all steps from 1.
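
In Python, "alphabetically before" translates to the `<` operator on strings. It compares strings character by character by their Unicode code points. For lowercase names like the ones in our example, this matches dictionary order, but capital letters sort before all lowercase letters. The checks below use names from Task 6; the case-insensitive comparison with `str.casefold()` is only an illustration of a possible design choice, not part of the exercise.

```python
# Python compares str values character by character using Unicode code points.
print("ali_ali" < "alfred_10")   # False: 'i' (105) comes after 'f' (102)
print("rahel123" < "rita")       # True: 'a' comes before 'i'
print("Zoe" < "alfred")          # True: uppercase letters precede lowercase ones
print("Zoe".casefold() < "alfred".casefold())  # False: compares "zoe" with "alfred"
```

If the system should treat 'Zoe' and 'zoe' alike, the tree would have to compare normalized keys consistently in both `find()` and `insert()`.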


Now it's your turn. In the next task, you should consider where new user accounts should be inserted in our example tree.
Look again at your draft of the corrected tree from the diagram above, which you created in task 2.


##### <font color="#3399DB">Task 6</font>
> Add user accounts with the user names 'rahel123' and 'ali_ali' to the corrected tree from task 2.
> Proceed as described in the algorithm.
> Whose left child is 'rahel123', whose right child is 'ali_ali'?
> Write down your results again as comments in the following cell.


In [None]:
#Solution 
#rahel123 is the left child of rita
#ali_ali is the right child of alfred_10 

**Solution**: Inserting 'rahel123' and 'ali_ali' into the binary search tree. 
<div class="details">
'rahel123' should now be the left child of 'rita', 'ali_ali' should be the right child of 'alfred_10'. 
</div>


Now that we've looked at the basic procedure, we can prepare the implementation for `insert()`. 


##### <font color="#3399DB">Task 7</font>
> In preparation, write down the pseudocode for `insert()` and any helper functions. Try to use a recursive implementation again.


In [None]:
#Solution 
#Pseudocode 

#insert()
#input: node (DatabaseNode)
#if tree is empty: set node as root 
#else recursively call helper function with root as subroot

#recursive helper function: _insert_rec()
#input: node (DatabaseNode), sub_root (DatabaseNode) (recursive variable)

#determine pos = next node: left child of sub_root if node.name < sub_root.name, else right child
##base case:
#if pos is None:
    #attach node to sub_root as this (left or right) child
    #stop process
    
#recursive process: 
#if pos is not None: 
    #recursively call _insert_rec() with pos (left or right child) as new sub_root

We are now ready and can implement `insert()`. 


##### <font color="#3399DB">Task 8</font>
> Implement `insert()`. Add the method to the `DatabaseTree` class again in your existing script *tree.py*.
> Then use the prepared tests again to check your code.


In [None]:
#Solution 
#See script
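
The solution cell only refers to the script. For reference, here is a minimal sketch of how `insert()` with the recursive helper `_insert_rec()` from the pseudocode could look. `find()` is omitted for brevity. Raising `ValueError` for an existing user name is our own assumption; the prepared tests may expect a different behavior for duplicates.

```python
class DatabaseNode:
    """Node as defined in Task 1 (user name, extra info, two children)."""

    def __init__(self, name, **kwargs):
        assert isinstance(name, str)
        self.name = name
        self.add_info = kwargs
        self.left = None
        self.right = None


class DatabaseTree:
    """Binary search tree of DatabaseNode objects (find() omitted here)."""

    def __init__(self):
        self.root = None

    def insert(self, node):
        """Insert DatabaseNode object into self, preserving the BST property."""
        assert isinstance(node, DatabaseNode)
        if self.root is None:  # empty tree: the node becomes the root
            self.root = node
            return
        self._insert_rec(self.root, node)

    def _insert_rec(self, sub_root, node):
        # recursive helper: descend until the matching child slot is empty
        if node.name == sub_root.name:  # assumption: duplicates are rejected
            raise ValueError(f"user name {node.name!r} already exists")
        if node.name < sub_root.name:
            if sub_root.left is None:  # base case: free slot found
                sub_root.left = node
            else:  # recursive step into the left subtree
                self._insert_rec(sub_root.left, node)
        else:
            if sub_root.right is None:
                sub_root.right = node
            else:
                self._insert_rec(sub_root.right, node)


tree = DatabaseTree()
for name in ["paolo", "fred", "rita", "alfred_10", "ali_ali"]:
    tree.insert(DatabaseNode(name))
print(tree.root.left.left.right.name)  # ali_ali (right child of alfred_10)
```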

In [5]:
!pytest test_tree.py::test_insert test_tree.py::test_insert_empty

platform linux -- Python 3.8.10, pytest-6.2.4, py-1.11.0, pluggy-0.13.1
rootdir: /home/jovyan/work/pyprog/stackfuel-python-programmer-product-de/module-05/chapter-01-solutions
plugins: anyio-3.5.0
collected 2 items

test_tree.py ..                                                          [100%]



Outstanding. If a new user account is created on the website, the system calls the `insert()` method of the `DatabaseTree` class in the background. Finally, think again about the time complexity of `insert()`.


##### <font color="#3399DB">Task 9</font>
> If $h$ is the number of nodes on the longest path from the root to a leaf (the height of the tree), what is the time complexity of `insert()`? 
> 
> 1) $O(n)$
> 2) $O(h)$
> 3) $O(\log n)$
> 4) $O(\log h)$
> 
> Write down your answer as a comment in the following code cell.


In [None]:
#Solution 
#like find(): 
#O(h) or O(log n) in a balanced tree 

You can see the solution to task 9 if you open the following box.


**Solution**: Time complexity of `insert()`. 
<div class="details">
In the worst case, you have to traverse the tree to the leaf with the greatest distance to the root in order to insert the new node there. 
Because all the nodes you traverse are visited along the way again, this also results in a time complexity of $O(h)$.
</div>


**Congratulations**:
You have implemented the data structure as a binary search tree. New user accounts can now be easily added and found in the system. You decide to monitor the structure of the tree for a while to make sure that it does not become too unbalanced.


**Remember**: 
* A binary tree is a data structure in which each node has a maximum of two children, which are also binary trees.
* If a binary tree fulfills the binary search tree property, we call this a binary search tree. 
* The binary search tree is a set structure. It does not manage data in a sequence specified by the user. 
* Binary search trees are well suited to searching for nodes with a key and inserting nodes.
* Balanced binary search trees are better at finding nodes than unbalanced ones. However, the balancing process itself uses resources when inserting and removing nodes.


***
Do you have any questions about this exercise? Look in the forum to see if they have already been discussed.
***
Found a mistake? Contact Support at support@stackfuel.com.
***
