
# Combinatorial optimization 

The objective of combinatorial optimization is to select, from a given set, the subset of elements that minimizes or maximizes the value of an objective function.

# Greedy algorithm

A greedy algorithm is any algorithm that solves a problem by making the locally optimal choice at each stage. Greedy algorithms produce suboptimal results on many problems.

## Example 1, coin change:

A country has $n$ coin denominations $a_1 = 1 < a_2 < \dots < a_n$. We need to make up the amount of money $S$ with the least number of coins.

- take the largest possible number of coins of the highest denomination, $k_n$ coins of value $a_n$:

$k_n = \lfloor S / a_n \rfloor$

$S' = S - a_n k_n$

- then take as many $a_{n-1}$ coins as fit into the remainder $S'$:

$k_{n-1} = \lfloor S' / a_{n-1} \rfloor$

- continue in the same way down to $a_1$

Result: $k = k_n + k_{n-1} + \dots + k_1$

This greedy algorithm is not always optimal:

Example: $S = 24, a_1 = 1, a_2 = 5, a_3 = 7$

Greedy result: 3 of $a_3$; 3 of $a_1$ (6 coins)

Optimal: 2 of $a_3$; 2 of $a_2$ (4 coins)
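The counterexample can be checked with a short sketch. The function names `greedy_coins` and `min_coins` are made up for this illustration, and `min_coins` is only a reference check (a small table of best answers for every amount up to $S$), not part of the greedy method.

```python
def greedy_coins(amount, denominations):
    """Return {denomination: count}, always taking the largest coin first."""
    counts = {}
    remaining = amount
    for coin in sorted(denominations, reverse=True):
        # k = floor(remaining / coin); the rest is passed to smaller coins
        counts[coin], remaining = divmod(remaining, coin)
    return counts


def min_coins(amount, denominations):
    """Reference answer: the true minimum number of coins for `amount`."""
    best = [0] + [float("inf")] * amount   # best[s] = fewest coins for sum s
    for s in range(1, amount + 1):
        for coin in denominations:
            if coin <= s:
                best[s] = min(best[s], best[s - coin] + 1)
    return best[amount]


greedy = greedy_coins(24, [1, 5, 7])
print(greedy)                      # {7: 3, 5: 0, 1: 3}
print(sum(greedy.values()))        # 6
print(min_coins(24, [1, 5, 7]))    # 4
```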

## Example 2, interval covering:

However, for some greedy algorithms we can prove that they return an optimal solution.

We have a set of intervals $[a_i, b_i]$ that covers an interval $[0, X]$ without gaps. The task is to find a subset with the minimum number of intervals that still covers $[0, X]$ without gaps.

- We sort the intervals in ascending order of the left end $a_i$

- Among the intervals covering 0, we pick the one with the largest right end $b_i$; call it $[c_1, d_1]$

- Among the intervals covering $d_1$, we pick the one with the largest right end $b_i$; call it $[c_2, d_2]$

- We repeat this until the chosen right end reaches $X$

Result: $[c_1, d_1]$, $[c_2, d_2]$, ... is an optimal cover
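The steps above can be sketched as follows. The function name `min_cover` and the sample intervals are invented for illustration; the intervals are assumed to be closed and given as `(a, b)` tuples, and the sketch raises an error if the input does not actually cover $[0, X]$.

```python
def min_cover(intervals, X):
    """Greedily choose a smallest subset of closed intervals covering [0, X]."""
    intervals = sorted(intervals)      # ascending order of the left end
    chosen = []
    point = 0                          # leftmost point that must be covered next
    i = 0
    while True:
        best = None
        # Look at every not-yet-scanned interval that starts at or before `point`
        # and remember the one that reaches furthest to the right.
        while i < len(intervals) and intervals[i][0] <= point:
            if best is None or intervals[i][1] > best[1]:
                best = intervals[i]
            i += 1
        if best is None or best[1] < point:
            raise ValueError("the intervals do not cover [0, X]")
        chosen.append(best)
        if best[1] >= X:
            return chosen
        if best[1] == point:           # no progress is possible: there is a gap
            raise ValueError("the intervals do not cover [0, X]")
        point = best[1]


sample = [(0, 3), (1, 4), (2, 6), (3, 5), (5, 8), (6, 10)]
print(min_cover(sample, 10))   # [(0, 3), (2, 6), (6, 10)]
```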


## Example 3, Knapsack problem:

Given a set of items N, each with a weight $w_i$ and a value $c_i$, determine which items to include in the collection so that the total weight is less than or equal to a given limit L and the total value is as large as possible.

![Knapsack](https://upload.wikimedia.org/wikipedia/commons/thumb/f/fd/Knapsack.svg/500px-Knapsack.svg.png)

Greedy algorithm:

- for each item calculate the ratio $c_i/w_i$

- sort items by $c_i/w_i$ in descending order

- put the first item from the sorted array that fits, i.e. with $w_i \le L$

- continue to take items from the sorted array while $w_{i_1} + w_{i_2} + \dots \le L$

This greedy algorithm is not always optimal:

$L = 90$

$w_1 = 20$;  $c_1 = 60$; $c_1/w_1 = 3$

$w_2 = 30$;  $c_2 = 90$; $c_2/w_2 = 3$

$w_3 = 50$;  $c_3 = 100$; $c_3/w_3 = 2$

Greedy algorithm: items 1 and 2, $c_1 + c_2 = 150$ (total weight 50, item 3 no longer fits)

Optimal: items 2 and 3, $c_2 + c_3 = 190$ (total weight 80)
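A minimal sketch of this greedy rule, checked against an exhaustive search over all subsets. The names `greedy_knapsack` and `brute_force_knapsack` are hypothetical, and the sketch assumes that an item which does not fit is simply skipped; stopping at the first such item gives the same result on this example.

```python
from itertools import combinations


def greedy_knapsack(items, limit):
    """items: list of (weight, value). Take items by descending value/weight."""
    order = sorted(items, key=lambda item: item[1] / item[0], reverse=True)
    total_weight = total_value = 0
    for weight, value in order:
        if total_weight + weight <= limit:    # skip items that do not fit
            total_weight += weight
            total_value += value
    return total_value


def brute_force_knapsack(items, limit):
    """Reference answer: try every subset (only feasible for tiny inputs)."""
    best = 0
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            if sum(w for w, _ in subset) <= limit:
                best = max(best, sum(v for _, v in subset))
    return best


items = [(20, 60), (30, 90), (50, 100)]
print(greedy_knapsack(items, 90))        # 150
print(brute_force_knapsack(items, 90))   # 190
```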

# Dynamic programming

Dynamic programming solves a problem by dividing it into simpler subtasks. Each subtask is solved only once to reduce computation time:

- split the task into subtasks
- cache the results of solving subtasks
- delete the cached results that are no longer needed

- Top-down approach: the task is divided into smaller subtasks, which are solved and combined to solve the original problem. Memoization is used so that subtasks that occur repeatedly are solved only once.

- Bottom-up approach: all subtasks that may be needed to solve the original problem are calculated in advance and then used to build a solution to the original problem. However, sometimes it is not easy to find out which subtasks we need to solve.

## Fibonacci sequence using dynamic programming

To implement the bottom-up approach, it is necessary to calculate all Fibonacci numbers from 1 to $n$:

![Fibonacci, bottom-up](https://www.baeldung.com/wp-content/uploads/sites/4/2020/06/Fibonacci-bottom-up-1.svg)
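A minimal bottom-up sketch, assuming the convention $F(0) = F(1) = 1$ that these notes use in their own code; the name `fib_bottom_up` is chosen for this illustration only.

```python
def fib_bottom_up(n):
    """Fill the table from the smallest subtask up to n."""
    table = [1, 1]                         # base cases F(0) and F(1)
    for i in range(2, n + 1):
        # every entry only needs the two entries right before it
        table.append(table[i - 1] + table[i - 2])
    return table[n]


print(fib_bottom_up(7))   # 21
```

Since only the last two entries are ever read, the table could be replaced by two variables, which is an instance of deleting results that are no longer needed.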

The top-down approach is often more convenient, because it follows the recurrence directly:

$F(n) = F(n-1) + F(n-2)$

Moreover, some calculations can be reused:

![Fibonacci, memoization](https://www.baeldung.com/wp-content/uploads/sites/4/2020/06/Fibonacci-memoization.svg)

In [5]:
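def F(n):
    # Memo table seeded with the base cases F(0) = F(1) = 1.
    dp = {}
    dp[0] = 1
    dp[1] = 1
    return calculate_F(n, dp)

def calculate_F(n, dp):
    # Top-down step: compute F(n) only if it is not cached in dp yet.
    print(f'n = {n}')
    if n not in dp:
        print(f'n-1 = {n-1}')
        print(f'n-2 = {n-2}')
        # Each value is computed once and then reused by later calls.
        dp[n] = calculate_F(n-1, dp) + calculate_F(n-2, dp)
    print(f'new dp = {dp}')
    return dp[n]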

In [6]:
F(7)

n = 7
n-1 = 6
n-2 = 5
n = 6
n-1 = 5
n-2 = 4
n = 5
n-1 = 4
n-2 = 3
n = 4
n-1 = 3
n-2 = 2
n = 3
n-1 = 2
n-2 = 1
n = 2
n-1 = 1
n-2 = 0
n = 1
new dp = {0: 1, 1: 1}
n = 0
new dp = {0: 1, 1: 1}
new dp = {0: 1, 1: 1, 2: 2}
n = 1
new dp = {0: 1, 1: 1, 2: 2}
new dp = {0: 1, 1: 1, 2: 2, 3: 3}
n = 2
new dp = {0: 1, 1: 1, 2: 2, 3: 3}
new dp = {0: 1, 1: 1, 2: 2, 3: 3, 4: 5}
n = 3
new dp = {0: 1, 1: 1, 2: 2, 3: 3, 4: 5}
new dp = {0: 1, 1: 1, 2: 2, 3: 3, 4: 5, 5: 8}
n = 4
new dp = {0: 1, 1: 1, 2: 2, 3: 3, 4: 5, 5: 8}
new dp = {0: 1, 1: 1, 2: 2, 3: 3, 4: 5, 5: 8, 6: 13}
n = 5
new dp = {0: 1, 1: 1, 2: 2, 3: 3, 4: 5, 5: 8, 6: 13}
new dp = {0: 1, 1: 1, 2: 2, 3: 3, 4: 5, 5: 8, 6: 13, 7: 21}


21

## Knapsack problem:

With dynamic programming we can find an optimal solution if we assume that all weights are integers.

- There is an unlimited number of items of each type $i = 1..n$, with weight $w_i$ and value $c_i$

$dp[w]$ - the maximum value of items that can be placed in a knapsack with a weight limit of $w$

$dp[0] = 0$

for each possible weight $w$ in $1..L$, over all item types $i$ with $w_i \le w$:

calculate $dp[w] = \max_{i:\, w_i \le w}(dp[w - w_i] + c_i)$, or $dp[w] = 0$ if no item fits

$T = O(L \cdot n)$
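A minimal sketch of the unlimited-items case, assuming positive integer weights. The function name `unbounded_knapsack` is hypothetical, and the data reuses the weights and values from the greedy example.

```python
def unbounded_knapsack(weights, values, limit):
    """dp[w] = best total value for weight limit w, items may be repeated."""
    dp = [0] * (limit + 1)                 # dp[0] = 0, nothing fits into limit 0
    for w in range(1, limit + 1):
        for w_i, c_i in zip(weights, values):
            if w_i <= w:
                # put one more item of type i on top of the best filling of w - w_i
                dp[w] = max(dp[w], dp[w - w_i] + c_i)
    return dp[limit]


print(unbounded_knapsack([20, 30, 50], [60, 90, 100], 90))   # 270
```

The two nested loops run `limit * n` times in total, which matches the stated $O(L \cdot n)$ bound.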

- There is only one item with label $i$, $i = 1..n$

$dp[i, w]$ - the maximum value of items that can be placed in a knapsack with a weight limit of $w$, if we use only items with labels from 1 to $i$

$dp[0, w] = 0$

if $w_i > w$:

$dp[i, w] = dp[i-1, w]$

else:

$dp[i, w] = \max(dp[i-1, w], dp[i-1, w - w_i] + c_i)$ - the maximum of not using or using item $i$ in the knapsack

$T = O(L \cdot n)$

In the animation below, $v_i$ denotes the value $c_i$:

![Knapsack dynamic programming](https://upload.wikimedia.org/wikipedia/commons/d/dc/Knapsack_problem_dynamic_programming.gif)
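The recurrence for the one-item-per-label case can be sketched as a two-dimensional table. The name `knapsack_01` is hypothetical, weights are assumed to be positive integers, and the data is again taken from the greedy example, where the greedy rule reached only 150.

```python
def knapsack_01(weights, values, limit):
    """dp[i][w] = best value using only items 1..i with weight limit w."""
    n = len(weights)
    dp = [[0] * (limit + 1) for _ in range(n + 1)]   # row 0: no items, value 0
    for i in range(1, n + 1):
        w_i, c_i = weights[i - 1], values[i - 1]
        for w in range(limit + 1):
            if w_i > w:
                dp[i][w] = dp[i - 1][w]              # item i does not fit
            else:
                # best of skipping item i or taking it once
                dp[i][w] = max(dp[i - 1][w], dp[i - 1][w - w_i] + c_i)
    return dp[n][limit]


print(knapsack_01([20, 30, 50], [60, 90, 100], 90))   # 190
```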


## Needleman–Wunsch algorithm

- global alignment algorithm

- we assign a total score to each alignment variant


The total score of an alignment is the sum of the scores of its individual pairs of letters:

- Match: The two letters at the current index are the same.

- Mismatch: The two letters at the current index are different.

- Indel (Insertion or Deletion): The best alignment involves one letter aligning to a gap in the other string.

Simple scoring system:
- Match: +1
- Mismatch or Indel: −1
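As an illustration of this scoring system, the sketch below scores two already aligned rows of equal length, with `-` standing for a gap. The function name `alignment_score` and its parameter names are assumptions made for this example.

```python
def alignment_score(row_a, row_b, match=1, mismatch=-1, indel=-1):
    """Sum the per-column scores of two aligned rows ('-' marks a gap)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            score += indel         # a letter aligned to a gap
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


print(alignment_score("GATTACA", "GATTACA"))     # 7
print(alignment_score("GCATG-CG", "G-ATTACA"))   # 0
```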


GCATGCG

GATTACA



0) To initialize the table we fill the first row and the first column:
- First row: we move horizontally, which means that an insertion occurred in the first sequence, so we add the indel penalty to the previous value
- First column: we move vertically, which means that an insertion occurred in the second sequence, so we add the indel penalty to the previous value


1) To build a dynamic programming table we move from the top-left cell to the bottom-right:

![Alignment](img/needleman_build.png)

2) To get the best alignment we move from the bottom-right cell to the top-left:


![Alignment](https://upload.wikimedia.org/wikipedia/commons/thumb/3/3f/Needleman-Wunsch_pairwise_sequence_alignment.png/440px-Needleman-Wunsch_pairwise_sequence_alignment.png)




GCATG-CG

G-ATTACA

Score = 0 (4 matches, 2 mismatches, 2 indels)
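Steps 0 to 2 can be put together in a compact sketch. The function name `needleman_wunsch` is hypothetical, and the traceback prefers the diagonal move when several moves are equally good, so it may return a different alignment than the one shown above; any alignment it returns has the same optimal score.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, indel=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    # Step 0: first column and first row accumulate indel penalties.
    for i in range(1, rows):
        table[i][0] = table[i - 1][0] + indel
    for j in range(1, cols):
        table[0][j] = table[0][j - 1] + indel
    # Step 1: fill the table from the top-left to the bottom-right cell.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            table[i][j] = max(table[i - 1][j - 1] + pair,   # align two letters
                              table[i - 1][j] + indel,      # gap in b
                              table[i][j - 1] + indel)      # gap in a
    # Step 2: trace back from the bottom-right to the top-left cell.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and table[i][j] == table[i - 1][j - 1] + pair:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and table[i][j] == table[i - 1][j] + indel:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return table[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("GCATGCG", "GATTACA")
print(score)   # 0
print(top)
print(bottom)
```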