
# Greedy algorithm

**A greedy algorithm** is any algorithm that solves a problem by making the locally optimal choice at each stage. A locally optimal choice does not always lead to a globally optimal solution, so greedy algorithms produce suboptimal results for many problems.

## Combinatorial optimization 

The objective of **combinatorial optimization** problems is to select, from a given set, the subset of elements that optimizes a function, typically by minimizing or maximizing its value.

## Example 1, coin change:

A country has $n$ denominations of coins:
$a_1 = 1 < a_2 < \dots < a_n$.

We need to make up the amount $S$ using the fewest coins.

**Greedy algorithm steps**:

1) Take the largest possible number of coins of the highest denomination $a_n$:

$k_n = \lfloor S/a_n \rfloor$

$S' = S - a_n \cdot k_n$

2) Take the largest possible number of $a_{n-1}$ coins for the remainder $S'$:

$k_{n-1} = \lfloor S'/a_{n-1} \rfloor$

3) Continue in the same way down to $a_1$. Since $a_1 = 1$, the remainder always reaches 0.

Result: $k = k_n + k_{n-1} + \dots + k_1$

This greedy algorithm is not always optimal.

Example: $S = 24, a_1 = 1, a_2 = 5, a_3 = 7$

Greedy result: 3 of $a_3$; 3 of $a_1$: $24 = 3 \cdot 7+3 \cdot 1$ (6 coins)

Optimal solution: 2 of $a_3$; 2 of $a_2$: $24 = 2 \cdot 7+2 \cdot 5$ (4 coins)
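The steps and the counterexample above can be checked with a short sketch. The function names `greedy_coins` and `min_coins` are illustrative, not part of the original notes; `min_coins` is a simple table-based check over every sub-amount, included only to confirm the optimal count.

```python
def greedy_coins(amount, denominations):
    """Return {denomination: count} chosen by the greedy rule."""
    counts = {}
    remaining = amount
    # Go from the highest denomination down, taking as many coins as fit.
    for coin in sorted(denominations, reverse=True):
        counts[coin], remaining = divmod(remaining, coin)
    return counts


def min_coins(amount, denominations):
    """Fewest coins for `amount`, found by filling a table of sub-amounts."""
    inf = float("inf")
    best = [0] + [inf] * amount
    for s in range(1, amount + 1):
        for coin in denominations:
            if coin <= s and best[s - coin] + 1 < best[s]:
                best[s] = best[s - coin] + 1
    return best[amount]


greedy = greedy_coins(24, [1, 5, 7])
print(greedy, sum(greedy.values()))  # {7: 3, 5: 0, 1: 3} 6
print(min_coins(24, [1, 5, 7]))      # 4
```

The greedy rule needs 6 coins here, while 4 coins are enough.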

## Example 2, interval covering:

However, for some greedy algorithms, we can prove that they return an optimal solution.

We have a set of intervals $[a_i, b_i]$ that together cover an interval $[0, X]$ without gaps. The task is to find a subset with the minimum number of intervals that still covers $[0, X]$ without gaps.

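1) Sort the intervals in ascending order of the left end $a_i$.

2) Among the intervals covering 0, pick the one with the largest right end $b_i$; call it $[c_1, d_1]$.

3) Among the intervals covering $d_1$, pick the one with the largest right end $b_i$; call it $[c_2, d_2]$.

...

4) Stop when the last chosen interval $[c_k, d_k]$ satisfies $d_k \geq X$.

Result: $[c_1, d_1]$, $[c_2, d_2]$, ..., $[c_k, d_k]$ is an optimal cover. Any cover must contain an interval covering the current point, and choosing the one that reaches furthest to the right can never require more intervals later.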
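The procedure can be sketched as follows. This is a minimal version, assuming intervals are given as `(left, right)` tuples and $X > 0$; the function name `min_cover` and the sample data are made up for illustration.

```python
def min_cover(intervals, x):
    """Greedy minimum cover of [0, x]; returns the chosen intervals in order.

    Raises ValueError if the intervals leave a gap in [0, x].
    """
    ordered = sorted(intervals)   # step 1: ascending left end
    chosen = []
    covered = 0                   # right end of the part covered so far
    i = 0
    while covered < x:
        best = None
        best_right = covered
        # Scan every not yet seen interval that starts at or before `covered`
        # and remember the one reaching furthest to the right.
        while i < len(ordered) and ordered[i][0] <= covered:
            if ordered[i][1] > best_right:
                best, best_right = ordered[i], ordered[i][1]
            i += 1
        if best is None:
            raise ValueError("the intervals leave a gap in [0, x]")
        chosen.append(best)
        covered = best_right
    return chosen


sample = [(0, 3), (2, 6), (1, 4), (5, 9), (3, 8), (8, 10)]
print(min_cover(sample, 10))  # [(0, 3), (3, 8), (8, 10)]
```

Each interval is scanned once after sorting, so the sort dominates the running time.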


## Example 3, knapsack problem:

Given a set $N$ of items, each with a weight $w_i$ and a value $c_i$, determine which items to include in the collection so that the total weight is at most a given limit $L$ and the total value is as large as possible.

![Knapsack](https://upload.wikimedia.org/wikipedia/commons/thumb/f/fd/Knapsack.svg/500px-Knapsack.svg.png)

Greedy algorithm:

1) For each item, calculate the ratio $c_i/w_i$.

2) Sort the items by $c_i/w_i$ in descending order.

3) Put the first item from the sorted array with $w_i \leq L$ into the knapsack.

4) Continue taking items from the sorted array while the total weight satisfies $w_{i_1} + w_{i_2} + \dots \leq L$.

This greedy algorithm is not always optimal:

$L = 90$

$w_1 = 20$;  $c_1 = 60$; $c_1/w_1 = 3$

$w_2 = 30$;  $c_2 = 90$; $c_2/w_2 = 3$

$w_3 = 50$;  $c_3 = 100$; $c_3/w_3 = 2$

Greedy algorithm: $c_1 + c_2 = 150$ (weight $20 + 30 = 50$; item 3 no longer fits)

Optimal: $c_2 + c_3 = 190$ (weight $30 + 50 = 80$)
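The counterexample can be reproduced with a short sketch. It assumes items are `(weight, value)` pairs, that an item exactly filling the remaining capacity is allowed, and that items which do not fit are skipped; `greedy_knapsack` and `best_knapsack` are illustrative names, and the exhaustive check is only practical for tiny inputs.

```python
from itertools import combinations


def greedy_knapsack(items, limit):
    """Take items in descending value/weight order while they still fit."""
    total_weight = total_value = 0
    by_ratio = sorted(items, key=lambda item: item[1] / item[0], reverse=True)
    for weight, value in by_ratio:
        if total_weight + weight <= limit:
            total_weight += weight
            total_value += value
    return total_value


def best_knapsack(items, limit):
    """Try every subset and keep the best total value that fits."""
    best = 0
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            if sum(w for w, _ in subset) <= limit:
                best = max(best, sum(v for _, v in subset))
    return best


items = [(20, 60), (30, 90), (50, 100)]   # (w_i, c_i) from the example
print(greedy_knapsack(items, 90))  # 150
print(best_knapsack(items, 90))    # 190
```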

# Dynamic programming

We divide the problem into simpler subtasks. Each subtask is solved only once to reduce computation time:

- split the task into subtasks
- cache the results of solving subtasks
- delete unused results of solving subtasks

**Top-down approach**: the task is divided into smaller subtasks, which are solved and combined to solve the original problem. Memoization is used so that recurring subtasks are solved only once.


**Bottom-up approach**: all subtasks that may be needed to solve the original problem are calculated in advance and then used to build a solution to the original problem. However, sometimes it is not easy to find out which subtasks we need to solve.

## Fibonacci sequence using dynamic programming

To implement the bottom-up approach, it is necessary to calculate all Fibonacci numbers from the base cases up to $F(n)$.

![Title](https://www.baeldung.com/wp-content/uploads/sites/4/2020/06/Fibonacci-bottom-up-1.svg)
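A bottom-up version might look like the sketch below. It assumes the convention $F(0) = F(1) = 1$ used by the notebook code in this section; the name `fib_bottom_up` is illustrative.

```python
def fib_bottom_up(n):
    """Fill the table from the base cases up to n (F(0) = F(1) = 1)."""
    dp = [1] * (n + 1)
    for i in range(2, n + 1):
        # Every entry depends only on the two entries before it.
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]


print(fib_bottom_up(7))  # 21
```

Only the last two entries are ever read, so the table could be replaced by two variables if memory matters.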

The top-down approach is more natural here, because it follows the recurrence directly:

$F(n) = F(n-1) + F(n-2)$

Moreover, the results of repeated subproblems can be cached and reused.

![Title](https://www.baeldung.com/wp-content/uploads/sites/4/2020/06/Fibonacci-memoization.svg)

In [5]:
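def F(n):
    # Cache of already computed values; base cases F(0) = F(1) = 1
    dp = {}
    dp[0] = 1
    dp[1] = 1
    return calculate_F(n, dp)

def calculate_F(n, dp):
    print(f'n = {n}')
    # Compute F(n) only if it is not in the cache yet
    if n not in dp:
        print(f'n-1 = {n-1}')
        print(f'n-2 = {n-2}')
        dp[n] = calculate_F(n-1, dp) + calculate_F(n-2, dp)
    # Show the cache after this call
    print(f'new dp = {dp}')
    return dp[n]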

In [6]:
F(7)

n = 7
n-1 = 6
n-2 = 5
n = 6
n-1 = 5
n-2 = 4
n = 5
n-1 = 4
n-2 = 3
n = 4
n-1 = 3
n-2 = 2
n = 3
n-1 = 2
n-2 = 1
n = 2
n-1 = 1
n-2 = 0
n = 1
new dp = {0: 1, 1: 1}
n = 0
new dp = {0: 1, 1: 1}
new dp = {0: 1, 1: 1, 2: 2}
n = 1
new dp = {0: 1, 1: 1, 2: 2}
new dp = {0: 1, 1: 1, 2: 2, 3: 3}
n = 2
new dp = {0: 1, 1: 1, 2: 2, 3: 3}
new dp = {0: 1, 1: 1, 2: 2, 3: 3, 4: 5}
n = 3
new dp = {0: 1, 1: 1, 2: 2, 3: 3, 4: 5}
new dp = {0: 1, 1: 1, 2: 2, 3: 3, 4: 5, 5: 8}
n = 4
new dp = {0: 1, 1: 1, 2: 2, 3: 3, 4: 5, 5: 8}
new dp = {0: 1, 1: 1, 2: 2, 3: 3, 4: 5, 5: 8, 6: 13}
n = 5
new dp = {0: 1, 1: 1, 2: 2, 3: 3, 4: 5, 5: 8, 6: 13}
new dp = {0: 1, 1: 1, 2: 2, 3: 3, 4: 5, 5: 8, 6: 13, 7: 21}


21
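The same memoization can be obtained from the standard library instead of a hand-written dictionary. This is a sketch using `functools.lru_cache`, with the same convention $F(0) = F(1) = 1$; the name `fib` is illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    """Top-down Fibonacci; the decorator caches every computed value."""
    return 1 if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(7))                   # 21
print(fib.cache_info().misses)  # 8
```

The 8 cache misses correspond to the 8 distinct arguments 0..7: each subproblem is computed exactly once.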

## Knapsack problem:

We can find an optimal solution using dynamic programming if we assume that all weights are integers.

**I. There is an unlimited number of items of each type $i$ with weight $w_i$ and cost $c_i$, $i = 1..n$**

$dp[w]$ - the maximum total cost of items that can be placed in a knapsack with a weight limit of $w$

1) $dp[0] = 0$

2) If we use an item of type $i$ (with $w_i \leq w$), the rest of the items in the knapsack weigh at most $w-w_i$ and cost $dp[w-w_i]$, so the whole knapsack costs $c_i + dp[w-w_i]$. For each weight $w = 1..L$, we try every item type that fits and keep the best option: $dp[w] = \max(c_1 + dp[w-w_1], \dots , c_n + dp[w-w_n])$. If no item type fits, $dp[w] = 0$.

$T = O(L \cdot n)$
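A minimal sketch of this recurrence, assuming positive integer weights; the function name `unbounded_knapsack` is illustrative, and the data reuses the items from the greedy example with unlimited copies of each.

```python
def unbounded_knapsack(weights, costs, limit):
    """dp[w] = best total cost for weight limit w, unlimited copies per type."""
    dp = [0] * (limit + 1)
    for w in range(1, limit + 1):
        for w_i, c_i in zip(weights, costs):
            if w_i <= w:
                # Put one item of this type on top of the best smaller knapsack.
                dp[w] = max(dp[w], c_i + dp[w - w_i])
    return dp[limit]


print(unbounded_knapsack([20, 30, 50], [60, 90, 100], 90))  # 270
```

Three items of weight 30 fill the knapsack exactly and give $3 \cdot 90 = 270$.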

**II. There is only one item with label $i$, $i = 1..n$**

$dp[i, w]$ - the maximum total cost of items that can be placed in a knapsack with a weight limit of $w$, if we use only items with labels from 1 to $i$

1) $dp[0, w] = 0$

2) If $w_i > w$, we can't use item $i$: $dp[i, w] = dp[i-1, w]$

3) Else if $w_i \leq w$: $dp[i, w] = \max(dp[i-1, w], dp[i-1, w - w_i] + c_i)$ - the better of not using and using item $i$

$T = O(L \cdot n)$

In the animation below, item values are denoted $v_i$ (here $v_i = c_i$):

![Alignment](https://upload.wikimedia.org/wikipedia/commons/d/dc/Knapsack_problem_dynamic_programming.gif)
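The table for case II can be filled as in the sketch below, again assuming positive integer weights. The name `knapsack_01` is illustrative; the data is the example on which the greedy algorithm returned 150.

```python
def knapsack_01(weights, costs, limit):
    """dp[i][w] = best total cost using items 1..i with weight limit w."""
    n = len(weights)
    dp = [[0] * (limit + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w_i, c_i = weights[i - 1], costs[i - 1]
        for w in range(limit + 1):
            dp[i][w] = dp[i - 1][w]            # item i is not used
            if w_i <= w:                       # item i fits, so try using it
                dp[i][w] = max(dp[i][w], dp[i - 1][w - w_i] + c_i)
    return dp[n][limit]


print(knapsack_01([20, 30, 50], [60, 90, 100], 90))  # 190
```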


## Edit distance (Levenshtein distance)

The distance between two sequences X and Y is the minimum number of edits (substitutions, insertions, deletions) needed to turn one into the other.

![Alignment](img/edit_dist.png)

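$\mathrm{editDistance}(X,Y) \geq \bigl|\,|X|-|Y|\,\bigr|$, because only insertions and deletions change the length, each by exactly one.

We can calculate edit distances of prefixes of X and Y.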

![Alignment](img/edit_dist_prefix.png)
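Written out as a recurrence, the prefix idea reads as follows. The notation $D(i, j)$ for the edit distance between the first $i$ characters of X and the first $j$ characters of Y, and $\delta$ for the substitution cost, is introduced here for illustration only:

$$
D(i, 0) = i, \qquad D(0, j) = j,
$$

$$
D(i, j) = \min \begin{cases}
D(i-1, j-1) + \delta(x_i, y_j) \\
D(i-1, j) + 1 \\
D(i, j-1) + 1
\end{cases}
\qquad
\delta(x_i, y_j) = \begin{cases} 0, & x_i = y_j \\ 1, & x_i \neq y_j \end{cases}
$$

The three options correspond to a match or substitution of the last characters, a deletion from X, and an insertion into X.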


In [1]:
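# Implementation using recursion
def edit_dist(a, b):
    # An empty string is turned into the other one by inserting all its characters
    if len(a) == 0:
        return len(b)
    if len(b) == 0:
        return len(a)
    # d is the cost of aligning the last characters (0 if they match)
    d = 0 if a[-1] == b[-1] else 1
    # Best of: match/substitution, deletion from a, insertion into a
    return min(edit_dist(a[:-1], b[:-1]) + d,
               edit_dist(a[:-1], b) + 1,
               edit_dist(a, b[:-1]) + 1)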

In [4]:
edit_dist('ABBC', 'ADC')

2
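The plain recursion solves the same pairs of prefixes many times, so its running time grows exponentially with the input length. One way to keep the recursive structure while solving each pair only once is to cache results by prefix lengths. This is a sketch; the names `edit_dist_memo` and `dist` are illustrative.

```python
from functools import lru_cache


def edit_dist_memo(a, b):
    @lru_cache(maxsize=None)
    def dist(i, j):
        """Edit distance between the prefixes a[:i] and b[:j]."""
        if i == 0:
            return j
        if j == 0:
            return i
        d = 0 if a[i - 1] == b[j - 1] else 1
        return min(dist(i - 1, j - 1) + d,
                   dist(i - 1, j) + 1,
                   dist(i, j - 1) + 1)

    return dist(len(a), len(b))


print(edit_dist_memo('ABBC', 'ADC'))  # 2
```

There are only $(|a|+1) \cdot (|b|+1)$ distinct pairs `(i, j)`, so the cached version does a polynomial amount of work.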

We can build a dynamic programming matrix that holds the edit distances of the corresponding prefixes.

![Alignment](img/edit_distance_dp.png)


In [5]:
# Implementation using dynamic programming
def edit_dist(a, b):
    # dp[i][j] = edit distance between the prefixes a[:i] and b[:j]
    dp = []
    for i in range(len(a)+1):
        dp.append([0]*(len(b)+1))
    # Base cases: one of the prefixes is empty
    for i in range(len(a)+1):
        dp[i][0]=i
    for j in range(len(b)+1):
        dp[0][j]=j
    # Start at 1 so that the base cases in row 0 and column 0 are not overwritten
    for i in range(1, len(a)+1):
        for j in range(1, len(b)+1):
            d = 0 if a[i-1] == b[j-1] else 1
            dp[i][j] = min(dp[i-1][j-1] + d,
                           dp[i][j-1] + 1,
                           dp[i-1][j] + 1)
    return dp[-1][-1]

In [6]:
edit_dist('GCATGCG', 'GATTACA')

4