# Algorithms and Data Structures in Python

## 1. Python Primer

### Miscellaneous notes:

* Since Python 3.7, ```dict``` is guaranteed to preserve the insertion order of its keys (in CPython 3.6 this was an implementation detail)
* > The ```and``` and ```or``` operators *short-circuit*, in that they do not evaluate the second
operand if the result can be determined based on the value of the first operand
* ```a is b``` checks whether identifiers ```a``` and ```b``` are aliases of the *same* object, whereas ```a == b``` checks whether the objects identified by both identifiers are deemed to be equivalent
* Operators can be used with sets: ```<```/```<=```, ```|```, ```&```, ```^``` and ```-```
* ```list += [4, 5]``` extends ```list``` in place, whereas ```list = list + [4, 5]``` rebinds ```list``` to a new list
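
The notes above can be checked with a short sketch. All names here (`a`, `b`, `c`, `noisy`, `s`, `t`) are made up for illustration; the comments state what each line is expected to produce.

```python
# Identity vs. equality
a = [1, 2, 3]
b = a          # alias: b names the same object as a
c = list(a)    # copy: equal content, distinct object
same_object = a is b                 # True
equal_not_same = (a == c, a is c)    # (True, False)

# In-place extension vs. rebinding
a += [4, 5]                   # mutates the shared list, so b sees the change
alias_after_extend = list(b)  # [1, 2, 3, 4, 5]
a = a + [6]                   # builds a new list and rebinds a; b is untouched
alias_after_rebind = list(b)  # [1, 2, 3, 4, 5]

# Short-circuit evaluation: record which operands actually get evaluated
calls = []

def noisy(value):
    calls.append(value)
    return value

and_result = noisy(0) and noisy(1)  # 0 is falsy, so noisy(1) is never called
or_result = noisy(2) or noisy(3)    # 2 is truthy, so noisy(3) is never called

# Set operators: union, intersection, symmetric difference, difference,
# subset and proper subset
s, t = {1, 2, 3}, {2, 3, 4}
set_results = (s | t, s & t, s ^ t, s - t, {2, 3} <= s, s < s)
```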

### Generators

An interesting example with multiple ```yield``` statements:

In [1]:
def factors(n):
    k = 1
    while k * k < n:
        if n % k == 0:
            yield k
            yield n // k
        k += 1
    if k * k == n:
        yield k

list(factors(100))

[1, 100, 2, 50, 4, 25, 5, 20, 10]

### Comprehension

Not for lists only!

In [1]:
# Squares of the integers from 1 to n, built four ways:

n = 100

lc = [k*k for k in range(1, n+1)] # List comprehension
sc = {k*k for k in range(1, n+1)} # Set comprehension
gc = (k*k for k in range(1, n+1)) # Generator comprehension (a.k.a. generator expression)
dc = {k: k*k for k in range(1, n+1)}  # Dictionary comprehension
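
One practical difference between these forms is that a generator comprehension is lazy and can only be consumed once, whereas a list can be traversed repeatedly. A minimal sketch, using a small illustrative `n = 5`:

```python
n = 5

gc = (k * k for k in range(1, n + 1))  # nothing is computed yet
first_pass = sum(gc)    # consumes the generator: 1 + 4 + 9 + 16 + 25
second_pass = sum(gc)   # the generator is now exhausted, so this sums nothing

lc = [k * k for k in range(1, n + 1)]  # a list is fully built up front
list_passes = (sum(lc), sum(lc))       # and can be traversed any number of times
```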

# 2. OOP

General principles:
* Modularity (decomposition in separate functional units)
* Abstraction, e.g. Abstract Data Types (ADT) which specifies **what** operations do, not **how**
* Encapsulation
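
The "what, not how" idea behind an ADT can be sketched with an abstract base class from the standard `abc` module. The `Stack` and `ListStack` names are hypothetical, chosen only for illustration; a list is just one possible backing store.

```python
from abc import ABC, abstractmethod


class Stack(ABC):
    """ADT: declares *what* a stack offers, not *how* it is stored."""

    @abstractmethod
    def push(self, item):
        """Add item on top of the stack."""

    @abstractmethod
    def pop(self):
        """Remove and return the top item."""

    @abstractmethod
    def __len__(self):
        """Return the number of stored items."""


class ListStack(Stack):
    """One concrete *how*: items kept in a Python list (encapsulated in _items)."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def __len__(self):
        return len(self._items)


stack = ListStack()
stack.push(1)
stack.push(2)
top = stack.pop()  # last in, first out
```

Instantiating `Stack` directly raises a `TypeError`, since its abstract methods have no implementation.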

*Getters*/*Setters* are not really considered Pythonic.
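
A common alternative is the built-in `property` decorator, which keeps plain attribute syntax while still allowing validation. The `Temperature` class below is a made-up example, not taken from these notes' source material.

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius  # goes through the property setter below

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = value

    @property
    def fahrenheit(self):
        # Read-only derived attribute: no setter is defined
        return self._celsius * 9 / 5 + 32


t = Temperature(100)
t.celsius = 25              # attribute syntax, but validated by the setter
reading = t.fahrenheit      # computed on access
```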

## Testing:

* *Stubbing*: in a top-down approach, replace the output of a function B called inside function A by a fixed value.
* Unit testing is actually a bottom-up strategy
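
Stubbing can be sketched with `unittest.mock.patch.object` from the standard library. Here the hypothetical `convert` plays the role of function A and the not-yet-written `fetch_rate` plays function B; the class and method names are assumptions made for illustration.

```python
from unittest import mock


class PriceService:
    def fetch_rate(self):
        # Function B: not implemented yet in this top-down design
        raise NotImplementedError("rate lookup not written yet")

    def convert(self, amount):
        # Function A: depends on B's output
        return amount * self.fetch_rate()


service = PriceService()

# Replace B's output by a fixed value for the duration of the block
with mock.patch.object(service, "fetch_rate", return_value=2.0):
    stubbed_result = service.convert(10)
```

Outside the `with` block the original `fetch_rate` is restored, so calling `service.convert(10)` raises `NotImplementedError` again.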

## Miscellaneous

```__slots__```: see [here](https://stackoverflow.com/questions/472000/usage-of-slots)
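
As a quick illustration of the effect, here is a minimal sketch with a hypothetical `PointWithSlots` class: declaring `__slots__` fixes the set of instance attributes and removes the per-instance `__dict__`.

```python
class PointWithSlots:
    __slots__ = ("x", "y")  # the only attributes an instance may have

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = PointWithSlots(1, 2)

try:
    p.z = 3                 # not listed in __slots__
except AttributeError:
    slot_blocked = True
else:
    slot_blocked = False

has_dict = hasattr(p, "__dict__")  # no per-instance dictionary is created
```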

# 3. Complexity

## Prefix averages

In [10]:
# Quadratic complexity

def prefix_average_1_0(S):
    averages = []
    for j in range(len(S)):
        total = 0
        for i in range(j+1):
            total += S[i]
        averages.append(total/(j+1))
    return averages

def prefix_average_1_1(S):
    averages = []
    for j in range(len(S)):
        averages.append(sum(S[:j+1])/(j+1))
    return averages

# Linear complexity

def prefix_average_2(S):
    averages = []
    total = 0  # running sum; avoids shadowing the built-in sum()
    for j in range(len(S)):
        total = total + S[j]
        averages.append(total/(j+1))
    return averages

prefix_average_1_0([1, 1, 2]), prefix_average_1_1([1, 1, 2]), prefix_average_2([1, 1, 2])
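
The running-sum idea can also be written with `itertools.accumulate` from the standard library. This is a sketch of an equivalent linear-time variant; the function name `prefix_average_accumulate` is an assumption.

```python
from itertools import accumulate


def prefix_average_accumulate(S):
    # accumulate(S) yields the running totals S[0], S[0]+S[1], ...
    # so each prefix average costs O(1) and the whole pass is O(n)
    return [total / (j + 1) for j, total in enumerate(accumulate(S))]


averages = prefix_average_accumulate([1, 1, 2])
```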

In [18]:
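def disjoint_1(A, B, C):
    # Checks every (a, b, c) triple: O(n^3) in the worst case
    for a in A:
        for b in B:
            for c in C:
                if a == b == c:
                    return False
    return True

def disjoint_2(A, B, C):
    # Only scans C when a == b. If each sequence holds distinct values,
    # at most n pairs (a, b) match, so the worst case is O(n^2)
    for a in A:
        for b in B:
            if a == b:
                for c in C:
                    if a == c:
                        return False
    return True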

disjoint_1((1,2,3), (4,1,6), (1,8,9)), disjoint_2((1,2,3), (4,1,6), (1,8,9))

(False, False)
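
If the elements are hashable, the same question can be answered with set intersections in expected linear time. This is a different technique from the nested loops above, sketched under that hashability assumption; `disjoint_sets` is a made-up name.

```python
def disjoint_sets(A, B, C):
    # Assumes all elements are hashable.
    # Building the sets and intersecting them is O(n) on average.
    return not (set(A) & set(B) & set(C))


shared = disjoint_sets((1, 2, 3), (4, 1, 6), (1, 8, 9))      # 1 is in all three
no_shared = disjoint_sets((1, 2, 3), (4, 1, 6), (7, 8, 9))   # no common element
```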

# 4. Recursion

In [7]:
# Sequential search and binary search

def sequential_search(data, target):
    i = 0
    while i < len(data):
        if data[i] == target:
            return i
        i += 1
    return -1

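def binary_search(data, target, low, high):
    # Assumes data is sorted; searches data[low..high] (both inclusive)
    if low > high:
        return -1
    else:
        mid = (low + high) // 2
        if data[mid] == target:
            return mid
        elif target < data[mid]:
            return binary_search(data, target, low, mid - 1)
        else:
            return binary_search(data, target, mid + 1, high)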

In [13]:
sequential_search([1, 2, 3, 4], 3)

2
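
For sorted input, the standard library's `bisect` module offers a logarithmic lookup without explicit recursion. The wrapper below is a sketch; `bisect_search` is a hypothetical name, and it returns `-1` on a miss to match the convention used above.

```python
from bisect import bisect_left


def bisect_search(data, target):
    # Assumes data is sorted in ascending order.
    i = bisect_left(data, target)  # leftmost position where target could be inserted
    if i < len(data) and data[i] == target:
        return i
    return -1


found = bisect_search([1, 2, 3, 4], 3)
missing = bisect_search([1, 2, 3, 4], 5)
```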

# Google foo.bar

## 2

### 2.1

In [49]:
def solution(x, y):
    '''
    General idea:
    If x+y=n+1, then point (x,y) is on the nth diagonal (! Edge case: there is no 0th diagonal !).
    The diagonals before the nth one use up 1+2+...+(n-1) IDs, so on the nth diagonal we add 1 if x=1, 2 if x=2 and so on.
    So, the actual ID of point (x,y) is 1+...+(n-1)+x
    Plugging in x+y=n+1, ID = 1 + ... + (x+y-2) + x

    Edge case:
    (x,y)=(1,1)
    '''
    if x+y==2:
        return '1'
    elif x > 1e5:
        raise ValueError('x value is too large')
    elif y > 1e5:
        raise ValueError('y value is too large')
    else:
        # (x+y-2)*(x+y-1) is a product of consecutive integers, hence even:
        # integer division is exact and avoids a float round-trip
        return str(x + (x+y-2)*(x+y-1)//2)

solution(5, 10)
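
The closed-form ID can be cross-checked against a brute-force walk along the diagonals. This is a self-contained sketch: `id_by_formula` restates the formula from the docstring, and `ids_by_walking` is a hypothetical helper that numbers the points one diagonal at a time.

```python
def id_by_formula(x, y):
    n = x + y - 1                  # index of the diagonal containing (x, y)
    return x + (n - 1) * n // 2    # 1 + 2 + ... + (n-1), then x steps in


def ids_by_walking(max_diagonal):
    # Assign IDs 1, 2, 3, ... diagonal by diagonal, with x increasing
    ids = {}
    next_id = 1
    for n in range(1, max_diagonal + 1):
        for x in range(1, n + 1):
            ids[(x, n + 1 - x)] = next_id
            next_id += 1
    return ids


walked = ids_by_walking(20)
formula_matches = all(id_by_formula(x, y) == i for (x, y), i in walked.items())
```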

### 2.2

In [59]:
def solution(n, b):
    def convert_to_base(int_in, k):
        '''
        Assume `int_in` can be written with `k` digits or fewer,
        i.e. `int_in` is strictly less than b^`k`.
        Then start by splitting `int_in` into parcels of b^(`k`-1).
        Count how many parcels you have: that is the first digit.
        Compute the rest, then repeat with the next lower power of b, down to b^0.
        '''
        int_out = ''
        rest = int_in
        for i in range(k-1, -1, -1):
            quotient = rest // b**i # since this is 0 if rest < b**i, this conveniently adds leading zeros too
            int_out = int_out + str(quotient)
            rest = rest - quotient * b**i
        return int_out
        
    def get_next_id(n):
        '''The process described in 4 steps'''
        digits = sorted(n)
        y = int(''.join(digits), base = b)
        digits.reverse() # reverses in-place and avoids sorting again
        x = int(''.join(digits), base = b)
        z = convert_to_base(x - y, len(n))
        return z
    
    # Find out the length of the cycle
    trajectory = [n]
    while True: # A loop is guaranteed: only finitely many IDs of length len(n) exist in base b
        new_id = get_next_id(trajectory[-1])
        if new_id in trajectory:
            return len(trajectory) - trajectory.index(new_id)
        else:
            trajectory.append(new_id)

solution('1211', 10)
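
The fixed-width base conversion can also be sketched with `divmod`, peeling off digits from the least significant end instead of the most significant one. The helper name `to_base` is an assumption, and like the code above it only handles bases up to 10, where every digit is a single decimal character.

```python
def to_base(value, base, width):
    # Assumes 0 <= value < base**width and 2 <= base <= 10
    digits = []
    for _ in range(width):
        value, digit = divmod(value, base)  # strip the lowest digit
        digits.append(str(digit))
    return ''.join(reversed(digits))        # most significant digit first


padded = to_base(999, 10, 4)    # keeps the leading zero
binary = to_base(5, 2, 4)
round_trip = int(to_base(2111 - 1112, 10, 4), base=10)
```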