# Simple Data Structures

In real life, we often need to store large amounts of data and perform certain operations on them efficiently. For that, we need *smart* ways of storing the data.

These ways of storing data are called data structures.

First, we're gonna see linear data structures, and then we'll move to a non-linear data structure which is very common.

The data structures we're gonna see are shaped like graphs: each datum is a vertex of the graph, and the connections between vertices guide the search when we answer queries.

## Linked-List

In a linked list, we keep the vertices one after another, and each one contains a link to the one on its right.

<img src = https://www.alphacodingskills.com/imgfiles/linked-list.PNG>

A simple implementation would be:

In [7]:
class LinkedList:
    class Node:
        def __init__(self, value = None):
            self.value = value
            self.next = None
        
    def __init__(self):
        self.head = self.Node()

    def __recursive__append(self, node, value):
        if node is None:
            return self.Node(value)

        node.next = self.__recursive__append(node.next, value)
        return node 

    def append(self, value):
        self.head = self.__recursive__append(self.head, value)
    
    def pop(self):
        assert self.head.next is not None, 'The list is empty.'
        self.head = self.head.next

    def __iter__(self):
        node = self.head.next
        while node is not None:
            yield node.value
            node = node.next 


L = LinkedList()
L.append(3)
L.append(5)
L.append(-1)

for i in L:
    print(i, end = ' ')
print()

L.pop()

for i in L:
    print(i, end = ' ')
print()

3 5 -1 
5 -1 
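The recursive `append` above walks the whole list on every call, so each insertion costs $\mathcal{O}(n)$ and a long list can exceed CPython's default recursion limit (1000 frames). A minimal sketch of one possible alternative, assuming we are free to keep an extra reference to the last node (the class name `TailLinkedList` and the `tail` attribute are illustrative and not part of the implementation above):

```python
class TailLinkedList:
    """Singly linked list with a sentinel head and a tail pointer (illustrative sketch)."""

    class Node:
        def __init__(self, value=None):
            self.value = value
            self.next = None

    def __init__(self):
        self.head = self.Node()   # sentinel node, holds no real value
        self.tail = self.head     # last node of the list (the sentinel when empty)

    def append(self, value):
        # Link the new node right after the current tail: O(1), no recursion.
        self.tail.next = self.Node(value)
        self.tail = self.tail.next

    def pop(self):
        # Remove the first real element and return its value.
        assert self.head.next is not None, 'The list is empty.'
        first = self.head.next
        self.head.next = first.next
        if first is self.tail:    # the list became empty
            self.tail = self.head
        return first.value

    def __iter__(self):
        node = self.head.next
        while node is not None:
            yield node.value
            node = node.next


L = TailLinkedList()
for value in range(5000):   # more elements than the default recursion limit
    L.append(value)
print(L.pop(), L.pop())     # the first two elements
```

The trade-off is one extra pointer to maintain, which `pop` has to reset when the list becomes empty.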


## Stacks

Stacks work exactly as the name says, like a stack of cards.
- Put elements on the top.
- Pop elements from the top.
- Unable to refer to or erase elements from the bottom or internal positions.

We call it **LIFO** (Last In - First Out) protocol or rule.

- [Stack Visualization with Linked List](https://www.cs.usfca.edu/~galles/visualization/StackLL.html)

- [Stack Visualization with Array](https://www.cs.usfca.edu/~galles/visualization/StackArray.html)

### Implementation with arrays

In [15]:
class Stack:
    def __init__(self):
        self.arr = []

    def push(self, x):
        self.arr.append(x)

    def pop(self):
        self.arr.pop(-1)

    def top(self):
        return self.arr[-1]

    def __len__(self):
        return len(self.arr)

    def empty(self):
        return self.__len__() == 0

S = Stack()
S.push(3)
S.push(10)
S.push(20)

while len(S) > 0:
    print(S.top(), end = ' ')
    S.pop()

print()

20 10 3 


### Problem I: Reverse Polish notation.

[Link to wikipedia explaining what it is.](https://en.wikipedia.org/wiki/Reverse_Polish_notation)

We are gonna have a stack where we'll keep track of all the numbers that are computed so far. Then, we iterate through the sequence from left to right, and each time we receive a new operator, we take the last two elements from the stack, pop them, perform that operation with those two numbers and then push the resulting value onto the stack.

In [13]:
def reversed_polish(sequence: str) -> int:
    tokens = sequence.split()
    S = Stack()
    for x in tokens:
        if x.isnumeric():
            S.push(int(x))
        else:
            # The right operand was pushed last, so it is popped first.
            b = S.top()
            S.pop()

            # The left operand is the one below it.
            a = S.top()
            S.pop()

            if x == '+':
                c = a + b
            elif x == '-':
                c = a - b
            elif x == '*':
                c = a * b
            else:
                raise ValueError(f'Unknown operator: {x}')

            S.push(c)

    return S.top()


print(reversed_polish("3 4 + 5 - 10 *"))
print(reversed_polish("3 4 + 5 6 - *"))

20
-7


### Problem II: Checking if a bracket sequence is correctly balanced.

A balanced bracket sequence is a string consisting only of brackets that becomes a valid mathematical expression once suitable numbers and operators are inserted into it.

For example: `[]([][])` is correct, but `[]{{}` is not.

**Solution**:
Let's start with an empty stack, and let's go from left to right. This stack will keep all the open brackets pending to be closed. Each time we receive an open bracket, we push it onto the stack; and whenever we see a closing bracket, it must close the last pending open bracket in the stack. Thus, if we see a closing bracket while the stack is empty or while the top element of the stack is not the corresponding open bracket, the sequence is not balanced. If at the end of the process the stack is not empty, then the sequence is not balanced either. Otherwise, it is balanced.

In [16]:
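S = Stack()
sequence = input()

# Maps each open bracket to the bracket that closes it.
match = {
    '(': ')',
    '{': '}',
    '[': ']'
}

balanced = True
for c in sequence:
    if c in match:
        S.push(c)
    else:
        # A closing bracket must match the last pending open bracket.
        if S.empty() or match[S.top()] != c:
            balanced = False
            break
        S.pop()

# Open brackets left in the stack were never closed.
if balanced and S.empty():
    print('Sequence balanced')
else:
    print('Sequence not balanced')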
            



Sequence balanced
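The cell above reads its input interactively, which makes it awkward to try on several strings. As a hedged sketch of the same idea, here it is wrapped in a function and using a plain Python list as the stack (the name `is_balanced` is an illustrative choice, not something defined earlier):

```python
def is_balanced(sequence: str) -> bool:
    """Return True if `sequence` is a balanced bracket sequence."""
    match = {'(': ')', '{': '}', '[': ']'}
    pending = []  # plain list used as a stack of open brackets not yet closed

    for c in sequence:
        if c in match:
            pending.append(c)
        # Anything else is treated as a closing bracket: it must match
        # the most recent pending open bracket.
        elif not pending or match[pending.pop()] != c:
            return False

    # Balanced only if every open bracket was closed.
    return not pending


print(is_balanced('[]([][])'))  # the balanced example from the text
print(is_balanced('[]{{}'))     # the unbalanced example from the text
```

Returning a boolean instead of printing also avoids having to stop the whole program when a mismatch is found.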


## Queues

Queues work like queues in the supermarket.
- Put elements at the end.
- Pop elements from the beginning.
- Unable to refer to or erase elements from internal positions.

We call it **FIFO** (First In - First Out) protocol.


- [Queue with LinkedLists visualization](https://www.cs.usfca.edu/~galles/visualization/QueueLL.html)
- [Queue with Arrays visualization](https://www.cs.usfca.edu/~galles/visualization/QueueArray.html)

They're useful because they allow us to schedule processes.

### Implementation with array

In [17]:
class Queue:
    def __init__(self):
        self.head = 0
        self.arr = []

    def push(self, value):
        self.arr.append(value)

    def front(self):
        assert not self.empty(), 'Queue is empty.'
        return self.arr[self.head]

    def pop(self):
        assert not self.empty(), 'Queue is empty.'
        self.arr[self.head] = None
        self.head += 1

    def __len__(self):
        return len(self.arr) - self.head

    def empty(self):
        return len(self) == 0

Q = Queue()
Q.push(3)
Q.push(8)
Q.push(5)
Q.push(11)
Q.push(7)

print(Q.front())
Q.pop()
print(Q.front())

3
8


### Python Built-in implementations

- `collections.deque`
- `queue.Queue`

For our purposes, we're gonna use the `deque` from `collections`. Something to remark is that a deque is a double-ended queue, which means that we can append to and erase from both ends.

[Python's Deque documentation](https://docs.python.org/3/library/collections.html#collections.deque)

In [18]:
from collections import deque 

Q = deque()

Q.append(3)
Q.append(5)
Q.append(7)

Q.appendleft(2)

for i in Q:
    print(i, end=' ')
print()
Q.pop()

for i in Q:
    print(i, end = ' ')
print()
Q.popleft()


for i in Q:
    print(i, end = ' ')
print()


2 3 5 7 
2 3 5 
3 5 
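To make the earlier remark about scheduling processes concrete, here is a small sketch of round-robin scheduling built on `deque`. It is only an illustration under simple assumptions: the function name `round_robin`, the `(name, remaining_time)` task format and the `quantum` parameter are all made up for this example.

```python
from collections import deque


def round_robin(tasks, quantum):
    """Return the order in which tasks finish under round-robin scheduling.

    `tasks` is a list of (name, remaining_time) pairs and `quantum` is the
    amount of time each task receives per turn (both hypothetical names).
    """
    queue = deque(tasks)
    finished = []

    while queue:
        name, remaining = queue.popleft()      # take the task at the front
        remaining -= quantum                   # let it run for one time slice
        if remaining > 0:
            queue.append((name, remaining))    # not done yet: back to the end
        else:
            finished.append(name)

    return finished


print(round_robin([('A', 5), ('B', 2), ('C', 3)], quantum=2))
```

Each task waits its turn in FIFO order, so no task can be starved by the others.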


## Full or Almost Full Binary Trees (Heaps)

So far, all the data structures we've seen are linear, and they are quite limited in the features they offer. For example, if we want the minimum of all the elements, each of them requires $\Omega(n)$ time to find it, because their layout carries no information about the values of the elements.

To be able to do so, we need to use more information, or structure the data in a smarter way. That's why we have Binary Trees.

In a binary tree, one of the vertices is the root. And each vertex has at most 2 children.

A full binary tree is a binary tree where every level is completely full (i.e., level 0 contains 1 vertex, level 1 contains 2 vertices, level 2 contains 4 vertices, ..., level $i$ contains $2^i$ vertices). Many texts call this a *perfect* binary tree.

Suppose a full binary tree has $t$ levels and $n$ vertices. Then $2^0 + 2^1 + \dots + 2^{t-1} = n \implies 2^{t} - 1 = n \implies t = \log_2(n + 1) = \mathcal{O}(\log n)$.
So, a full binary tree has logarithmic height with respect to its size.
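The identity can be checked numerically for small trees. The snippet below is just a sanity check, with `levels` being a helper name introduced here for illustration:

```python
def levels(n: int) -> int:
    """Number of levels t of a full binary tree with n = 2**t - 1 vertices."""
    # n + 1 is a power of two, 2**t, whose binary form is a 1 followed by t zeros.
    return (n + 1).bit_length() - 1


for t in range(1, 6):
    n = sum(2 ** i for i in range(t))   # 2^0 + 2^1 + ... + 2^(t-1)
    assert n == 2 ** t - 1
    print(t, n, levels(n))
```

For instance, a tree with 31 vertices has only 5 levels, and one with about a million vertices has only 20.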

When we insert a new vertex into the tree: if the last level is completely full, we insert the new vertex at the beginning of a new level. Otherwise, we insert it in the first free position of the last level.

![Full Binary tree picture](full_binary_tree_1.png "Binary Tree 1")

In this picture, if we want to insert a vertex $7$, we would need to insert it as a child of the vertex 3 in a new level.

![Almost Full Binary tree picture](almost_full_binary_tree_1.png "Binary Tree 2")

In this other picture, if we want to insert a vertex $9$, we would need to insert it as a child of the vertex 4 in the deepest level. At this moment, the tree is not full, and we call it an almost full binary tree.

**Something to notice:**
- We are indexing the vertices level by level, from left to right. That way, if a vertex has index $u$, then its left child has index $2u + 1$ and its right child has index $2u + 2$.
- Then, if a vertex has index $u$, its parent has index $\lfloor\frac{u - 1}{2}\rfloor$. For example, the vertex $7$ has parent $\lfloor\frac{7 - 1}{2}\rfloor = \lfloor\frac{6}{2}\rfloor = 3$, and the vertex $8$ has parent $\lfloor\frac{8 - 1}{2}\rfloor = \lfloor\frac{7}{2}\rfloor = 3$.

We can actually store the tree as a simple array, where the position makes reference to the index of the vertex, and the value at that position would be the value stored in the corresponding vertex.
That way, the root of the tree would be the value at position $0$ of that list, and the last vertex would be the value at the end of the list.
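As a small illustration of this array layout, the sketch below splits a list back into the levels of the tree and looks up a vertex's children and parent with the index formulas above. The helper name `split_levels` and the letter values are assumptions made for the example.

```python
def split_levels(tree):
    """Split an array-stored binary tree into its levels, top to bottom."""
    levels = []
    start, width = 0, 1
    while start < len(tree):
        levels.append(tree[start:start + width])
        start += width   # the next level starts right after this one
        width *= 2       # and can hold twice as many vertices
    return levels


tree = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']   # vertices 0..8
for level in split_levels(tree):
    print(level)

u = 3
print(tree[u], 'has children', tree[2 * u + 1], tree[2 * u + 2],
      'and parent', tree[(u - 1) // 2])
```

Here the last level holds only two of its four possible vertices, so this is an almost full binary tree.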

### Min Heap

So far, we don't really have any information about which value is less than any other, and apparently, we just have a list that contains the values in the same order as they're inserted.
Well, we'll keep the following invariant in the tree:

*For any vertex $u$, it holds that $val(u) \le val(2u + 1)$ and $val(u) \le val(2u + 2)$ (whenever those children exist)*.
This means that the value of any vertex is less than or equal to the values of all the vertices below it in the tree.

That way, the minimum value on the tree is at the root, so we can get the minimum in constant time.
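The invariant is easy to check directly on the array representation. The following sketch assumes the 0-based indexing used earlier (children of $u$ at positions $2u + 1$ and $2u + 2$); the function name `is_min_heap` is illustrative.

```python
def is_min_heap(lst) -> bool:
    """Check that every vertex is less than or equal to each of its children."""
    n = len(lst)
    for u in range(n):
        for child in (2 * u + 1, 2 * u + 2):
            # Children past the end of the list do not exist.
            if child < n and lst[u] > lst[child]:
                return False
    return True


print(is_min_heap([1, 1, 3, 2, 7]))
print(is_min_heap([1, 7, 3, 2, 1]))
```

In the second list the vertex at position 1 holds 7 while its children hold 2 and 1, so the invariant fails there.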

Now, when we insert a new element, we need to make sure that this invariant keeps being true. Therefore, we need to change the structure of the tree a little bit, so that it remains true.
What we're going to do is a method that we call `heapify upwards`.
We start from the inserted vertex at the bottom, and repeatedly swap it with its parent as long as it has a smaller value than its parent.

Because the height of the tree is $\mathcal{O}(\log n)$, any vertex has at most $\mathcal{O}(\log n)$ ancestors, so an insertion performs at most $\mathcal{O}(\log n)$ swaps.

What happens if we want to pop the minimum value (i.e., erase the root from the tree)? If we simply removed the root, we would need to merge the two resulting subtrees.
Instead of that, let's first swap the root's value with the last node's value. Then we just erase the last node from the tree, which doesn't affect the structure of the tree. Now the value at the root may be greater than its children's values, and therefore we need to swap it a few times to restore the invariant. This is a method that we call `heapify downwards`, and it does the following:
- We're at some vertex $u$.
- Then we find the child of $u$ with the smallest value. Let's call that child $x$.
- If that value is greater than or equal to $val(u)$, then it's fine, and we stop.
- Otherwise, we swap $val(x)$ and $val(u)$ and then move downwards towards $x$, and continue doing the same process with $x$.

At the end, the tree will remain as a heap, and because the height is logarithmic, this is also very efficient.

#### Implementation

In [19]:
def parent(v):
    return (v - 1) // 2

def left(v):
    return 2 * v + 1

def right(v):
    return 2 * v + 2

In [20]:
class Heap:
    def __init__(self):
        self.lst = []

    def swap(self, pos_1, pos_2):
        '''
        Swaps the values at position `pos_1` and `pos_2` on the list
        '''
        self.lst[pos_1], self.lst[pos_2] = self.lst[pos_2], self.lst[pos_1]


    def heapify_down(self, pos: int ) -> None:
        '''
        Repeatedly swaps the value at `pos` with its smallest child while it is greater than that child
        '''
        v = pos

        while v < len(self.lst):
            l = left(v)
            r = right(v)

            if l >= len(self.lst):
                break 
            
            idx = -1

            if r >= len(self.lst) or self.lst[l] <= self.lst[r]:
                idx = l
            else:
                idx = r
            
            if self.lst[idx] < self.lst[v]:
                self.swap(idx, v)
                v = idx
            else:
                break 

    def heapify_up(self, v: int) -> None :
        '''
        Repeatedly swaps the vertex v with its parent while it is smaller than its parent
        '''
        while v != 0:
            pv = parent(v)

            if self.lst[pv] > self.lst[v]:
                self.swap(pv, v)
                v = parent(v)
            else:
                break

    def insert(self, value: int) -> None:
        self.lst.append(value)
        self.heapify_up(len(self.lst) - 1)

        
    def pop(self, pos: int = 0) -> None:
        '''
        Pops the value at position `pos` (by default the root, i.e. the minimum)
        '''
        assert not self.empty(), 'The heap is empty, thus we cannot erase anything'
        assert pos >= 0 and pos < len(self.lst), 'Position should be in [0, len(lst))'

        # Move the value to erase to the end of the list and drop it.
        self.swap(pos, -1)
        self.lst.pop(-1)

        # If `pos` was the last position, nothing is left there to fix.
        if pos < len(self.lst):
            self.heapify_down(pos)
            self.heapify_up(pos)

    def __len__(self):
        return len(self.lst)

    def empty(self):
        return self.__len__() == 0




So, notice that we're able to insert values, and pop the minimum in $\mathcal{O}(\log n)$ time, and get the minimum in $\mathcal{O}(1)$ time. In order to build a heap from an existing list, we can just insert the elements one by one, taking in total $\mathcal{O}(n\log n)$ time.

#### Built-in implementations

Fortunately, Python's standard library already includes the `heapq` module, which implements these operations on a plain list; its `heapq.heapify` function also builds the heap in linear time.

In [24]:
import heapq
import numpy as np 

x = np.random.randint(10, size = 5)
l = x.tolist()  # plain Python ints, so the printed lists stay readable

print('Before heapifying:', l)

heapq.heapify(l)
print('After heapifying:', l)

heapq.heappop(l)
print('After erasing the minimum:', l)

heapq.heappush(l, -1)
print('After inserting a new minimum:', l)


Before heapifying: [1, 7, 3, 2, 1]
After heapifying: [1, 1, 3, 2, 7]
After erasing the minimum: [1, 2, 3, 7]
After inserting a new minimum: [-1, 1, 3, 7, 2]
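Since popping always returns the current minimum, popping until the heap is empty yields the elements in ascending order. A minimal sketch of that idea using `heapq` (the function name `heap_sort` is chosen here for illustration):

```python
import heapq


def heap_sort(values):
    """Return the values in ascending order using a min heap."""
    heap = list(values)      # copy, so the caller's list is left untouched
    heapq.heapify(heap)      # build the heap
    # Each pop removes the smallest remaining value.
    return [heapq.heappop(heap) for _ in range(len(heap))]


print(heap_sort([1, 7, 3, 2, 1]))
```

With $n$ pops of $\mathcal{O}(\log n)$ each, the whole procedure takes $\mathcal{O}(n \log n)$ time.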


### Max Heap

We saw a heap where every vertex has a value no greater than its children's, and that way we're able to compute the minimum in constant time.
*What if we want to compute the maximum?*

Well, we can have a **Max Heap**, in which every vertex has a value no smaller than its children's, and it would work exactly in the same way. Or we can also just keep a min heap with the negated values of the original values; so instead of having a list `[1, 4, 2]`, we would have `[-1, -4, -2]`.
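The negation trick can be sketched on top of `heapq` as follows. The wrapper names `max_push` and `max_pop` are hypothetical helpers written for this example, and the approach assumes the values are numbers that can be negated.

```python
import heapq


def max_push(heap, value):
    """Insert `value` into a max heap stored as a min heap of negated values."""
    heapq.heappush(heap, -value)


def max_pop(heap):
    """Remove and return the largest value."""
    return -heapq.heappop(heap)


heap = []
for v in [1, 4, 2]:
    max_push(heap, v)

print(heap[0])   # smallest stored value, i.e. the maximum with its sign flipped
print(max_pop(heap), max_pop(heap), max_pop(heap))
```

The values come out from largest to smallest, which is exactly what a max heap would give.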