# Algorithms and Data Structures in Python

## 1. Python Primer

### Miscellaneous notes:

* A ```dict``` preserves the insertion order of its keys: this is an implementation detail of CPython 3.6 and a language guarantee since Python 3.7
* > The ```and``` and ```or``` operators *short-circuit*, in that they do not evaluate the second operand if the result can be determined based on the value of the first operand
* ```a is b``` checks if identifiers ```a``` and ```b``` are aliases of the *same* object, whereas ```a == b``` checks if the objects identified by both identifiers are deemed to be equivalent
* Operators can be used with sets: ```<```/```<=``` (proper subset/subset), ```|``` (union), ```&``` (intersection), ```^``` (symmetric difference) and ```-``` (difference)
* ```list += [4, 5]``` extends ```list``` in place, whereas ```list = list + [4, 5]``` rebinds the name ```list``` to a new list
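
The aliasing and short-circuit notes above can be checked with a few lines. This is a minimal sketch; the variable names are illustrative only.

```python
a = [1, 2, 3]
b = a            # `b` is an alias of the same list object
c = list(a)      # `c` is an equal but distinct copy

print(a is b, a == b)  # True True
print(a is c, a == c)  # False True

b += [4, 5]      # extends the shared object in place: `a` sees the change
print(a)         # [1, 2, 3, 4, 5]

b = b + [6]      # builds a new list and rebinds `b`: `a` is unaffected
print(a, a is b) # [1, 2, 3, 4, 5] False

# Short-circuit: the right operand is never evaluated, so no ZeroDivisionError
x = 0
safe = x != 0 and 1 / x > 1
print(safe)      # False
```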

### Generators

An interesting example with multiple ```yield``` statements:

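```python
def factors(n):
    k = 1
    while k * k < n:
        if n % k == 0:
            yield k       # k divides n...
            yield n // k  # ...and so does its cofactor n // k
        k += 1
    if k * k == n:        # n is a perfect square: yield its root only once
        yield k

list(factors(100))  # [1, 100, 2, 50, 4, 25, 5, 20, 10]
```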

### Comprehension

Not for lists only!

```python
# Squares of the integers from 1 to n:

n = 100

lc = [k*k for k in range(1, n+1)]     # List comprehension
sc = {k*k for k in range(1, n+1)}     # Set comprehension
gc = (k*k for k in range(1, n+1))     # Generator expression (a.k.a. generator comprehension)
dc = {k: k*k for k in range(1, n+1)}  # Dictionary comprehension
```
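
One practical difference between the list and the generator forms is that a generator is lazy and can only be consumed once. A minimal sketch, with a smaller `n` and illustrative names:

```python
n = 5
squares_list = [k * k for k in range(1, n + 1)]  # built eagerly, stored in memory
squares_gen = (k * k for k in range(1, n + 1))   # lazy: values are produced on demand

print(sum(squares_list))  # 55
print(sum(squares_gen))   # 55
print(sum(squares_gen))   # 0: the generator is already exhausted
print(sum(squares_list))  # 55: the list can be traversed again
```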

# 2. OOP

General principles:
* Modularity (decomposition in separate functional units)
* Abstraction, e.g. Abstract Data Types (ADT), which specify **what** operations do, not **how**
* Encapsulation

*Getters*/*Setters* are not really considered pythonic. Use ```@property```, ```@attribute.setter``` and ```@attribute.deleter``` instead:

```python

# circle.py
class Circle:
    
    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        """The radius property."""
        print("Get radius")
        return self._radius

    @radius.setter
    def radius(self, value):
        print("Set radius")
        self._radius = value

    @radius.deleter
    def radius(self):
        print("Delete radius")
        del self._radius
```
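
A property keeps plain attribute syntax for the caller while still letting the class run code on assignment, for example validation. The sketch below uses a hypothetical `Account` class that is not part of these notes.

```python
class Account:  # hypothetical class, for illustration only
    def __init__(self, balance):
        self.balance = balance  # goes through the setter below

    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, value):
        if value < 0:
            raise ValueError('balance cannot be negative')
        self._balance = value

account = Account(10)
account.balance = 25        # plain attribute syntax, but the setter runs
print(account.balance)      # 25

try:
    account.balance = -1
except ValueError as error:
    print(error)            # balance cannot be negative
```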

* 'Magic' a.k.a. *dunder* methods: if ```__str__``` is not overridden, then ```str()``` falls back to ```__repr__``` by default
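
The fallback from ```__str__``` to ```__repr__``` can be seen with two small classes (hypothetical names, used only for this illustration):

```python
class Point:  # defines __repr__ only
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return f'Point({self.x}, {self.y})'

class LabelledPoint(Point):  # additionally overrides __str__
    def __str__(self):
        return f'({self.x}, {self.y})'

p = Point(1, 2)
q = LabelledPoint(1, 2)
print(str(p), repr(p))  # Point(1, 2) Point(1, 2)
print(str(q), repr(q))  # (1, 2) Point(1, 2)
```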

## Testing:

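* *Stubbing*: in a top-down approach, replace a function B called inside function A with a stand-in that returns a fixed value, so that A can be tested before B is ready.
* Unit testing is actually a bottom-up strategy: the lowest-level components are tested first, in isolation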
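
One possible way to stub in Python is `unittest.mock.patch.object` from the standard library. In this sketch the `Shop` class and its methods are hypothetical: `fetch_price` plays the role of function B and `total` the role of function A.

```python
from unittest import mock

class Shop:  # hypothetical class, for illustration only
    def fetch_price(self, item):
        # Function B: not implemented yet
        raise NotImplementedError('not written yet')

    def total(self, items):
        # Function A: relies on B
        return sum(self.fetch_price(item) for item in items)

shop = Shop()

# Stub B so that it returns a fixed value while A is being tested
with mock.patch.object(shop, 'fetch_price', return_value=2.5):
    result = shop.total(['apple', 'pear'])

print(result)  # 5.0
```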

## Miscellaneous

```__slots__```: see [this Stack Overflow question](https://stackoverflow.com/questions/472000/usage-of-slots)
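
As a rough illustration of what ```__slots__``` changes, the sketch below compares two hypothetical classes: instances of the slotted one have no per-instance ```__dict__``` and reject attributes that are not declared.

```python
class PointDict:  # regular class: attributes live in a per-instance __dict__
    def __init__(self, x, y):
        self.x, self.y = x, y

class PointSlots:  # slotted class: only the declared attributes exist
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x, self.y = x, y

p = PointDict(1, 2)
p.z = 3                        # allowed: stored in p.__dict__

s = PointSlots(1, 2)
print(hasattr(s, '__dict__'))  # False

try:
    s.z = 3                    # 'z' is not declared in __slots__
    slots_allow_new_attribute = True
except AttributeError:
    slots_allow_new_attribute = False

print(slots_allow_new_attribute)  # False
```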

# 3. Complexity

## Prefix averages

```python
# Quadratic complexity: the prefix sum is recomputed from scratch for every j

def prefix_average_1_0(S):
    averages = []
    for j in range(len(S)):
        total = 0
        for i in range(j+1):
            total += S[i]
        averages.append(total/(j+1))
    return averages

def prefix_average_1_1(S):
    # Same quadratic cost: the slice and `sum` both take O(j) time
    averages = []
    for j in range(len(S)):
        averages.append(sum(S[:j+1])/(j+1))
    return averages

# Linear complexity: the running total is carried over from one iteration to the next

def prefix_average_2(S):
    averages = []
    total = 0  # not named `sum`, which would shadow the built-in
    for j in range(len(S)):
        total += S[j]
        averages.append(total/(j+1))
    return averages

prefix_average_1_0([1, 1, 2]), prefix_average_1_1([1, 1, 2]), prefix_average_2([1, 1, 2])
```
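
The linear-time idea of carrying a running total can also be written with `itertools.accumulate` from the standard library. The name `prefix_average_3` is only a label for this sketch.

```python
from itertools import accumulate

def prefix_average_3(S):
    # accumulate(S) yields the running totals S[0], S[0]+S[1], ... in one pass
    return [total / (j + 1) for j, total in enumerate(accumulate(S))]

print(prefix_average_3([1, 1, 2]))  # [1.0, 1.0, 1.3333333333333333]
```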

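```python
# Both functions return True if no element is common to A, B and C

def disjoint_1(A, B, C):
    # O(n^3): every triple (a, b, c) is examined
    for a in A:
        for b in B:
            for c in C:
                if a == b == c:
                    return False
    return True

def disjoint_2(A, B, C):
    # O(n^2) if each sequence holds distinct elements:
    # C is only scanned for the (at most n) pairs with a == b
    for a in A:
        for b in B:
            if a == b:
                for c in C:
                    if a == c:
                        return False
    return True

disjoint_1((1,2,3), (4,1,6), (1,8,9)), disjoint_2((1,2,3), (4,1,6), (1,8,9))  # (False, False)
```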
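
If the elements are hashable, the same test can be sketched with set intersection, which runs in expected linear time. This is an alternative technique, not one of the two nested-loop versions above, and `disjoint_3` is a hypothetical name.

```python
def disjoint_3(A, B, C):
    # Assumes hashable elements; True if no element is common to A, B and C
    return not (set(A) & set(B) & set(C))

print(disjoint_3((1, 2, 3), (4, 1, 6), (1, 8, 9)))  # False
print(disjoint_3((1, 2, 3), (4, 1, 6), (7, 8, 9)))  # True
```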

## Amortized complexity vs average-case complexity

* Amortized complexity considers the total cost of a sequence of n operations and spreads it over the sequence, rather than multiplying the worst-case cost of a single operation by n: see [Wikipedia](https://en.wikipedia.org/wiki/Amortized_analysis) or [Stack Overflow](https://stackoverflow.com/questions/7333376/difference-between-average-case-and-amortized-analysis)

* Average-case complexity considers all possible inputs and makes an assumption about their distribution
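
To make the amortized view concrete, the sketch below simulates appends to an array that doubles its capacity when full and counts the element copies caused by resizing. It is a simplified model (doubling from a capacity of 1), not a description of how CPython's `list` actually grows.

```python
def count_copies(n, initial_capacity=1):
    """Simulate n appends to a doubling array; return (total copies, worst single append)."""
    capacity, size = initial_capacity, 0
    total_copies = worst = 0
    for _ in range(n):
        copies = 0
        if size == capacity:   # array is full: copy everything into one twice as large
            copies = size
            capacity *= 2
        size += 1
        total_copies += copies
        worst = max(worst, copies)
    return total_copies, worst

total, worst = count_copies(1000)
print(total, worst)  # 1023 512
```

A single append can cost 512 copies, yet the 1000 appends together cost 1023 copies, i.e. about one copy per append on average: O(n) worst case for one operation, amortized O(1) over the sequence.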

# 4. Recursion

It is important to ```return``` the result of the recursive call, otherwise the first call ends up returning ```None```.
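
A minimal sketch of the pitfall, with two hypothetical functions that differ only by the missing ```return```:

```python
def count_down_broken(n):
    if n == 0:
        return 'done'
    count_down_broken(n - 1)   # the result of the recursive call is discarded

def count_down(n):
    if n == 0:
        return 'done'
    return count_down(n - 1)   # the result is passed back up the call chain

print(count_down_broken(3))  # None
print(count_down(3))         # done
```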

```python
# Sequential search vs binary search

def sequential_search(data, target):
    i = 0
    while i < len(data):
        if data[i] == target:
            return i
        i += 1
    return -1

def binary_search(data_ordered, target, low=None, high=None):
    '''
    Return the index of `target` in the ordered list of unique elements `data_ordered`, or -1 if `target` is absent
    '''

    if low is None: # no need to test `high` too: `low` and `high` are either both None or both set
        low, high = 0, len(data_ordered) - 1
    if low > high: # Empty search interval (also covers empty lists)
        return -1
    else:
        mid = (low + high) // 2
        if data_ordered[mid] == target:
            return mid
        elif data_ordered[mid] < target:
            return binary_search(data_ordered, target, low=mid + 1, high=high)
        else:
            return binary_search(data_ordered, target, low=low, high=mid - 1)

binary_search([1,2,3], 1), binary_search([1,2,3], 2), binary_search([1,2,3], 3), binary_search([1,2,3], 4), binary_search([1,2,3], -2), binary_search([42], 42), binary_search([], 42)
# (0, 1, 2, -1, -1, 0, -1)
```
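
For comparison, the standard library's `bisect` module performs the same kind of halving search on a sorted list. The wrapper below is a sketch under the same assumptions (sorted input, -1 when absent); `binary_search_bisect` is a hypothetical name.

```python
from bisect import bisect_left

def binary_search_bisect(data_ordered, target):
    # bisect_left returns the leftmost position where `target` could be inserted
    i = bisect_left(data_ordered, target)
    if i < len(data_ordered) and data_ordered[i] == target:
        return i
    return -1

print(binary_search_bisect([1, 2, 3], 2))  # 1
print(binary_search_bisect([1, 2, 3], 4))  # -1
print(binary_search_bisect([], 42))        # -1
```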

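```python
import os

def disk_usage(path=None):
    # A default of `os.getcwd()` in the signature would be evaluated only once,
    # when the function is defined, so the default is resolved at call time instead
    if path is None:
        path = os.getcwd()
    total = os.path.getsize(path)
    if os.path.isdir(path):
        for filename in os.listdir(path):
            total += disk_usage(os.path.join(path, filename))
    print(f'Size: {total:0.0f} | {path}')
    return total
```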

## Tail recursion to non-recursive

*Not covered*

### Binary search

*Not covered*

# 5. Arrays

```python
# Insertion sort

def insertion_sort(A):
    for i in range(1, len(A)):
        current = A[i]  # Saved because A[i] is overwritten by the first shift below
        j = i - 1
        while j >= 0 and A[j] > current:
            A[j+1] = A[j]  # Shift the larger element one slot to the right
            # Nothing beyond index i is touched because j < i
            j -= 1
        # The loop has ended: either j == -1 or A[j] <= current.
        # Slot j+1 is free (its old value was shifted to j+2, or it is A[i] itself),
        # so `current` can be written there
        A[j+1] = current

    return A
```


```python
# Caesar cipher (handles uppercase letters A-Z only)

class CaesarCipher:

    LENGTH_ALPHABET = ord('Z') - ord('A') + 1

    def __init__(self, key=0):
        self.key = key

    # Helper functions to encrypt/decrypt characters
    @staticmethod
    def encrypt_character(char, key):
        return chr(ord('A') + (ord(char) - ord('A') + key) % CaesarCipher.LENGTH_ALPHABET)

    @staticmethod
    def decrypt_character(char, key):
        return chr(ord('A') + (ord(char) - ord('A') - key) % CaesarCipher.LENGTH_ALPHABET)

    # String encryption
    def encrypt(self, string):
        return ''.join([CaesarCipher.encrypt_character(char, self.key) for char in string])

    def decrypt(self, string):
        return ''.join([CaesarCipher.decrypt_character(char, self.key) for char in string])

test = CaesarCipher(3)
test.encrypt('ABCYZ'), test.decrypt(test.encrypt('ABCYZ'))  # ('DEFBC', 'ABCYZ')
```

## Multidimensional arrays

```python
n = 4 # 4 rows
p = 3 # 3 columns

# Don't: the outer list holds n references to the *same* row
flawed_2d_list = [[0] * p] * n

# Do: build a new row on every iteration
correct_2d_list = [[0] * p for i in range(n)]
```
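
The reason the first form is flawed shows up as soon as one cell is modified. A minimal sketch with illustrative names:

```python
n, p = 4, 3

flawed = [[0] * p] * n   # n references to one and the same row
flawed[0][0] = 1
print(flawed)            # [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]
print(all(row is flawed[0] for row in flawed))  # True

correct = [[0] * p for _ in range(n)]  # n independent rows
correct[0][0] = 1
print(correct)           # [[1, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
```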

# Stacks

## Implementation

Use Python's existing ```list``` class to implement a stack class with the **adapter** design pattern.

In [2]:
class Empty(Exception):
    '''Error: attempting to access an element from an empty data structure.'''
    pass

class Stack:
    def __init__(self):
        self._stack = []

    def is_empty(self):
        return len(self._stack) == 0

    def __len__(self):
        return len(self._stack)

    def __repr__(self):
        return ''.join(
            [str(self._stack[i])+'\n' for i in range(self.__len__()-1, -1, -1)]
            )
    
    def push(self, element):
        self._stack.append(element)

    def pop(self):
        if self.is_empty():
            raise Empty('Empty stack')
        return self._stack.pop()

    def top(self):
        if self.is_empty():
            raise Empty('Empty stack')
        return self._stack[-1]


```python
# Matching delimiters with a stack

def consistent_delim(string):
    '''Returns True if parentheses/brackets/curly braces are consistently opened and closed in ``string``, False otherwise'''

    left_delim = '([{'
    right_delim = ')]}'

    stacked_delim = Stack()
    for char in string:
        if char in left_delim:
            stacked_delim.push(char)
        elif char in right_delim:
            try:
                # The closing delimiter must match the most recently opened one
                if left_delim.index(stacked_delim.pop()) != right_delim.index(char):
                    return False
            except Empty:  # Closing delimiter with nothing left open
                return False
    return stacked_delim.is_empty()  # False if some delimiters were never closed

consistent_delim('(){}')  # True
```

# Queues

## Implementation

Using ```.pop(0)``` to dequeue is not an option because it has O(n) complexity: every element except the one at index 0 has to be shifted one position to the left.

Another option would be to replace dequeued elements with ```None```, to append enqueued elements on the right and to keep track of the index of the current first element. But the underlying list then keeps growing (if many enqueue/dequeue operations are performed), even when the queue itself stays short.

The recommended implementation is a circular list.
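
The core of the circular approach is modular index arithmetic: the slot after the last index wraps back to index 0. A minimal sketch of that arithmetic alone, with illustrative variable names and a fixed capacity:

```python
capacity = 4
data = [None] * capacity
front, size = 2, 0   # pretend two earlier elements were already dequeued

for element in 'abc':
    back = (front + size) % capacity  # wraps from index 3 to index 0
    data[back] = element
    size += 1

print(data)  # ['c', None, 'a', 'b']

# Reading the queue in order, starting from `front`
in_order = [data[(front + i) % capacity] for i in range(size)]
print(in_order)  # ['a', 'b', 'c']
```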

In [15]:
class Empty(Exception):
    '''Error: attempting to access an element from an empty data structure.'''
    pass

class Queue:

    INITIAL_CAPACITY = 10

    def __init__(self):
        self._data = [None] * Queue.INITIAL_CAPACITY
        self._size = 0
        self._front = 0

    def __len__(self):
        return self._size
    
    def __repr__(self):
        return ', '.join([str(self._data[(self._front + i) % len(self._data)]) for i in range(self._size)])

    def is_empty(self):
        return self._size == 0

    def first(self):
        if self.is_empty():
            raise Empty('Empty queue.')
        else:
            return self._data[self._front]

    def dequeue(self):
        if self.is_empty():
            raise Empty('Empty queue.')
        first, self._data[self._front] = self._data[self._front], None
        self._front = (self._front + 1) % len(self._data)
        self._size -= 1        
        return first
            

    def enqueue(self, element):
        if self._size == len(self._data):
            print('Resizing underlying list...')
            self.resize()
        back = (self._front + self._size) % len(self._data)
        self._data[back] = element
        self._size += 1

    def resize(self):
        new_data = [None] * (2 * len(self._data))
        for i in range(self._size):
            new_data[i] = self._data[(self._front + i) % len(self._data)]
        self._data = new_data
        self._front = 0

test = Queue()
test.enqueue(11)
print(test)
test.enqueue(33)
print(test)
test.enqueue(55)
print(test)
test.dequeue()
print(test)
test.dequeue()
print(test)

for i in range(2, 22, 2):
    test.enqueue(i)
    print(f'Enqueueing {i}: {test}')

test.dequeue()
test.dequeue()
print(f'Finally: {test}')

# Linked List

In [18]:
class LinkedList:

    # Node 'private' class:
    class _Node:
        def __init__(self, element, next = None):
            self._element = element
            self._next = next

        @property
        def element(self):
            return self._element

        @property
        def next_node(self):
            return self._next

        @next_node.setter
        def next_node(self, node):
            self._next = node
        
        def __repr__(self):
            return str(self.element)

    # Linked list:
    def __init__(self, values = None, side = 'head'):
        self._size = 0
        self._head = None
        self._tail = None

        if values is not None:
            self.add(values, side=side)

    def __iter__(self):
        node = self._head
        while node:
            yield node.element
            node = node.next_node
    
    def __len__(self):
        return self._size

    def __repr__(self):
        return ' -> '.join([str(element) for element in self])

    def is_empty(self):
        return self._size == 0

    @property
    def values(self):
        return [element for element in self]

    def _add(self, element, side = 'head'):
        if side == 'head' or side == 'tail':
            if self.is_empty():
                self._head = LinkedList._Node(element)
                self._tail = self._head
                self._size += 1
            else:
                if side == 'head':
                    self._head = LinkedList._Node(element, self._head)
                    self._size += 1            
                elif side == 'tail':
                    self._tail.next_node = LinkedList._Node(element)
                    self._tail = self._tail.next_node
                    self._size += 1
        else:
            raise Exception("""Value of `side` parameter is either 'tail' or 'head'""")
    
    def add(self, elements, side = 'head'):
        if not isinstance(elements, (list, tuple, dict, set)):
            elements = elements,
        for e in elements:
            self._add(e, side = side)
    
    def remove(self, side = 'head'):
        if self.is_empty():
            raise Exception("""The linked list is empty: no head to remove.""")
        if side == 'head':
            self._head = self._head.next_node
            self._size -= 1
            if self._size == 0: # Handle special case: removing from a 1-element list
                self._tail = self._head
        elif side == 'tail':
            print('Tail not removed! '
                  'Removing the tail is not implemented in this singly linked list class, '
                  'because this operation would require the traversal of the entire singly linked list.'
                  '\nSee the doubly linked lists implementation')
        else:
            raise Exception("""Value of `side` parameter is either 'tail' or 'head'""")
    
    def head(self):
        return self._head.element

test = LinkedList()
test.add(1)
test.add([2, 3])
test.add(0)
test.remove()
test

3 -> 2 -> 1

In [28]:
test.add(7)

In [33]:
test._head.next_node

## Implementing a stack with a Singly Linked List

In this case, all operations have a worst-case O(1) running time, whereas the first implementation, which relies on the standard ```list``` class, only guarantees amortized O(1) complexity for ```push``` and ```pop```.

In [36]:
class Empty(Exception):
    '''Error: attempting to access an element from an empty data structure.'''
    pass

class Stack2:
    def __init__(self):
        self._stack = LinkedList()

    def is_empty(self):
        return self._stack.is_empty()

    def __len__(self):
        return len(self._stack)

    def __repr__(self):
        return ''.join(
            [str(value)+'\n' for value in self._stack.values]
            )
    
    def push(self, element):
        self._stack.add(element)

    def pop(self):
        if self.is_empty():
            raise Empty('Empty stack')
        top = self.top()
        self._stack.remove()
        return top

    def top(self):
        if self.is_empty():
            raise Empty('Empty stack')
        return self._stack.head()

In [42]:
test = Stack2()
len(test)
test.push(3)
test.push(2)
test.push(1)
print(f'Stack:\n{test}')
print(f'.pop(): {test.pop()}, .top(): {test.top()}\n')
print(f'Stack:\n{test}')

Stack:
1
2
3

.pop(): 1, .top(): 2

Stack:
2
3

