# Introduction to Greedy Algorithms

Adapted from: https://github.com/rasbt/algorithms_in_ipython_notebooks

Extra material from: https://www.freecodecamp.org/news/what-is-a-greedy-algorithm/


![Burns](https://upload.wikimedia.org/wikipedia/en/5/56/Mr_Burns.png) | ![Gordon](https://upload.wikimedia.org/wikipedia/en/4/40/Gordon_Gekko.jpg)
From: https://en.wikipedia.org/wiki/Wall_Street_(1987_film)

*Greedy algorithms* are one of the main paradigms of algorithmic problem solving, next to *Dynamic Programming* and *Divide & Conquer Algorithms*. Their main goal is to provide an efficient procedure for problems whose brute-force solutions, such as exhaustive search, are computationally expensive or outright infeasible.

The main outline of a greedy algorithm consists of 3 steps:

- make a greedy choice
- reduce the problem to a subproblem (a smaller problem of the same type as the original one)
- repeat

So, greedy algorithms are essentially a problem solving heuristic, an iterative process of tackling a problem in multiple stages while making a locally optimal choice at each stage. In practice, and depending on the problem, this series of locally optimal ("greedy") choices does not necessarily lead to a globally optimal solution.

![Greedy Mountain](https://upload.wikimedia.org/wikipedia/commons/8/8c/Greedy-search-path-example.gif)

What happens when we use a greedy algorithm to look for the highest peak? The greedy rule here is: always follow the steepest slope.

![LocalMax](https://upload.wikimedia.org/wikipedia/commons/thumb/0/06/Greedy_Glouton.svg/300px-Greedy_Glouton.svg.png)
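
The hill-climbing picture can be reproduced in a few lines. The sketch below is illustrative only: the terrain in `heights` and the function name `greedy_climb` are made up for this example. Starting from a given index, it always steps to the higher neighbour and stops as soon as no neighbour is higher.

```python
def greedy_climb(heights, start):
    """Walk uphill on a 1-D terrain, always moving to the higher neighbour."""
    pos = start
    while True:
        # collect the neighbours that exist (the terrain edges have only one)
        neighbours = [i for i in (pos - 1, pos + 1) if 0 <= i < len(heights)]
        if not neighbours:
            return pos  # a single-point terrain has nowhere to go
        best = max(neighbours, key=lambda i: heights[i])
        if heights[best] <= heights[pos]:
            return pos  # no neighbour is higher: a (possibly only local) peak
        pos = best


# hypothetical terrain: a small hill on the left, the highest peak on the right
heights = [1, 3, 4, 2, 1, 5, 8, 9, 6]

local_peak = greedy_climb(heights, start=0)
global_peak = max(range(len(heights)), key=lambda i: heights[i])
print(local_peak, heights[local_peak])    # 2 4  (greedy stops on the small hill)
print(global_peak, heights[global_peak])  # 7 9  (the true highest point)
```

Starting at index 4 instead would lead the same rule to the global peak, which shows how much a greedy result can depend on where it starts.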


# Formal Definition of Greedy Algorithm 

Assume that you have an objective function that needs to be optimized (either maximized or minimized). A greedy algorithm makes the choice that looks best at each step, in the hope of optimizing the objective function overall. It has only one shot at computing the solution: it never goes back to reverse a decision.

## Greedy algorithms have some advantages and disadvantages:

* It is quite easy to come up with a greedy algorithm (or even multiple greedy algorithms) for a problem. Analyzing the run time of a greedy algorithm is generally much easier than for other techniques (like divide and conquer). For divide and conquer, it is often not immediately clear whether the technique is fast or slow, because at each level of recursion the size of the subproblems gets smaller while their number increases.
* The difficult part is that for greedy algorithms you have to work much harder on correctness. Even when the algorithm is correct, it can be hard to prove why. Proving that a greedy algorithm is correct is more of an art than a science and involves a lot of creativity. Coming up with the algorithm might seem trivial, but proving that it is actually correct is a whole different problem.

## Example 1: Coin Changer

To look at a first, naive example of a greedy algorithm, let's implement a simple coin changing machine. Given a money value in cents, we want to return the minimum possible number of coins. The illustration below uses the denominations 1, 5, 10, and 20 cents; the implementation defaults to the Euro denominations from 1 cent up to 2000 cents.

![Coin](https://upload.wikimedia.org/wikipedia/commons/thumb/d/da/Greedy_algorithm_36_cents.svg/1200px-Greedy_algorithm_36_cents.svg.png)

In [7]:
def coinchanger(cents, 
                denominations=(1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000),
                debug=False):
    coins = {d: 0 for d in denominations}  # dictionary (hash map) of coin counts with O(1) access
    # sorting k denominations costs O(k log k); we try the highest denomination first
    for c in sorted(coins.keys(), reverse=True):
        # greedy step: take as many coins of the largest remaining denomination as fit
        coins[c] += cents // c  # note: // is integer division in Python 3
        cents = cents % c  # proceed with whatever is left
        if debug:
            print(cents, coins)
        if not cents:
            break  # nothing left to change
    if cents:
        # previously the function silently returned None in this case
        raise ValueError("amount cannot be matched exactly with these denominations")
    total_coins = sum(coins.values())
    return sorted(coins.items(), reverse=True), total_coins

The function above creates a dictionary, `coins`, which tracks the number of coins of each denomination to be returned. Then, we iterate through the denominations in descending order of their value. Now, in a "greedy" fashion, we take the highest possible number of coins of the largest denomination in the first step. We repeat this process until we reach the smallest denomination or the number of remaining `cents` reaches 0. 

In [2]:
coinchanger(cents=36)

([(2000, 0),
  (1000, 0),
  (500, 0),
  (200, 0),
  (100, 0),
  (50, 0),
  (20, 1),
  (10, 1),
  (5, 1),
  (2, 0),
  (1, 1)],
 4)

In [5]:
coinchanger(cents=36, debug=True)

36 {1: 0, 2: 0, 5: 0, 10: 0, 20: 0, 50: 0, 100: 0, 200: 0, 500: 0, 1000: 0, 2000: 0}
36 {1: 0, 2: 0, 5: 0, 10: 0, 20: 0, 50: 0, 100: 0, 200: 0, 500: 0, 1000: 0, 2000: 0}
36 {1: 0, 2: 0, 5: 0, 10: 0, 20: 0, 50: 0, 100: 0, 200: 0, 500: 0, 1000: 0, 2000: 0}
36 {1: 0, 2: 0, 5: 0, 10: 0, 20: 0, 50: 0, 100: 0, 200: 0, 500: 0, 1000: 0, 2000: 0}
36 {1: 0, 2: 0, 5: 0, 10: 0, 20: 0, 50: 0, 100: 0, 200: 0, 500: 0, 1000: 0, 2000: 0}
36 {1: 0, 2: 0, 5: 0, 10: 0, 20: 0, 50: 0, 100: 0, 200: 0, 500: 0, 1000: 0, 2000: 0}
16 {1: 0, 2: 0, 5: 0, 10: 0, 20: 1, 50: 0, 100: 0, 200: 0, 500: 0, 1000: 0, 2000: 0}
6 {1: 0, 2: 0, 5: 0, 10: 1, 20: 1, 50: 0, 100: 0, 200: 0, 500: 0, 1000: 0, 2000: 0}
1 {1: 0, 2: 0, 5: 1, 10: 1, 20: 1, 50: 0, 100: 0, 200: 0, 500: 0, 1000: 0, 2000: 0}
1 {1: 0, 2: 0, 5: 1, 10: 1, 20: 1, 50: 0, 100: 0, 200: 0, 500: 0, 1000: 0, 2000: 0}
0 {1: 1, 2: 0, 5: 1, 10: 1, 20: 1, 50: 0, 100: 0, 200: 0, 500: 0, 1000: 0, 2000: 0}


([(2000, 0),
  (1000, 0),
  (500, 0),
  (200, 0),
  (100, 0),
  (50, 0),
  (20, 1),
  (10, 1),
  (5, 1),
  (2, 0),
  (1, 1)],
 4)

In [8]:
coinchanger(cents=379)  

([(2000, 0),
  (1000, 0),
  (500, 0),
  (200, 1),
  (100, 1),
  (50, 1),
  (20, 1),
  (10, 0),
  (5, 1),
  (2, 2),
  (1, 0)],
 7)

In [9]:
coinchanger(cents=4356) # 43 euros and 56 cents

([(2000, 2),
  (1000, 0),
  (500, 0),
  (200, 1),
  (100, 1),
  (50, 1),
  (20, 0),
  (10, 0),
  (5, 1),
  (2, 0),
  (1, 1)],
 7)

In [None]:
# let's imagine a country with 1, 7 and 10 doubloon coins

In [12]:
coinchanger(cents=41, denominations=(1,7,10)) # greedy needs 5 coins here, which happens to be optimal; for 14 it fails (10+1+1+1+1 instead of 7+7)

([(10, 4), (7, 0), (1, 1)], 5)

In [13]:
# soviet roubles - greedy works
# hypothesis - greedy will fail if the next denomination is less than double the previous one? can you prove or disprove it?
coinchanger(cents=47, denominations=(1,3,5,10,25))

([(25, 1), (10, 2), (5, 0), (3, 0), (1, 2)], 5)

In [14]:
# fake roubles 1, 4, 5 # where our greedy algorithm would NOT give us optimal solution
coinchanger(cents=8, denominations=(1,4,5))

([(5, 1), (4, 0), (1, 3)], 4)

In [15]:
# Indian money (5,10,20,25) - example of real money failing on greedy algorithm
coinchanger(cents=40, denominations=(5,10,20,25))

([(25, 1), (20, 0), (10, 1), (5, 1)], 3)
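
To check whether the greedy answer is really the minimum, it helps to compare it with an exact method. The following is a minimal sketch, not part of the original notebook: `min_coins` is a standard dynamic programming solution, `greedy_coins` only counts the coins that the largest-coin-first rule would use, and `first_greedy_failure` (a hypothetical helper with an arbitrary search `limit`) looks for the smallest amount on which the two disagree.

```python
def min_coins(cents, denominations):
    """Fewest coins needed for `cents` via dynamic programming (None if impossible)."""
    INF = float("inf")
    best = [0] + [INF] * cents  # best[a] = fewest coins that sum to exactly a
    for amount in range(1, cents + 1):
        for d in denominations:
            if d <= amount and best[amount - d] + 1 < best[amount]:
                best[amount] = best[amount - d] + 1
    return None if best[cents] == INF else best[cents]


def greedy_coins(cents, denominations):
    """Number of coins used by the largest-coin-first rule (None if it gets stuck)."""
    count = 0
    for d in sorted(denominations, reverse=True):
        count += cents // d
        cents %= d
    return count if cents == 0 else None


def first_greedy_failure(denominations, limit=200):
    """Smallest amount up to `limit` where greedy and the exact answer differ."""
    for amount in range(1, limit + 1):
        if greedy_coins(amount, denominations) != min_coins(amount, denominations):
            return amount
    return None


print(min_coins(8, (1, 4, 5)))                   # 2  (4 + 4)
print(min_coins(40, (5, 10, 20, 25)))            # 2  (20 + 20)
print(first_greedy_failure((1, 7, 10)))          # 14 (7 + 7 beats 10 + 1 + 1 + 1 + 1)
print(first_greedy_failure((1, 5, 10, 20, 25)))  # 40 (20 + 20 beats 25 + 10 + 5)
```

The dynamic programming version costs O(cents * k) time for k denominations, so the greedy rule remains attractive whenever the coin system is known to be safe for it.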

Calling our `coinchanger` function with 100 cents and the denominations restricted to (1, 5, 10, 20) returns 5 coins of 20 cents, the smallest possible number of coins in that case. Below are some more examples: 

In [None]:
coinchanger(cents=5, denominations=(1, 5, 10, 20, 50, 100, 500, 1000, 2000))

([(2000, 0),
  (1000, 0),
  (500, 0),
  (100, 0),
  (50, 0),
  (20, 0),
  (10, 0),
  (5, 1),
  (1, 0)],
 1)

In [None]:
coinchanger(cents=4, denominations=(1, 5, 10, 20, 50, 100, 500, 1000, 2000))

([(2000, 0),
  (1000, 0),
  (500, 0),
  (100, 0),
  (50, 0),
  (20, 0),
  (10, 0),
  (5, 0),
  (1, 4)],
 4)

In [None]:
coinchanger(cents=23, denominations=(1, 5, 10, 20, 50, 100, 500, 1000, 2000))

([(2000, 0),
  (1000, 0),
  (500, 0),
  (100, 0),
  (50, 0),
  (20, 1),
  (10, 0),
  (5, 0),
  (1, 3)],
 4)

In [None]:
## TODO find another country (or previous) where greedy algorithm fails



# Interval Scheduling Problem

Let's dive into an interesting problem that you can encounter in almost any industry or any walk of life. One instance of the problem is as follows:

You are given the schedules of N lectures for a single day at a university. The schedule for a lecture has the form (s_time, f_time), where s_time is the start time of the lecture and f_time its finishing time. Given this list of N lecture schedules, we need to select the largest possible set of lectures to be held during the day such that no two selected lectures overlap, i.e. if lectures Li and Lj are both selected, then the start time of Lj >= the finish time of Li, or vice versa.



# The Lecture Scheduling Problem
Let's look at the various approaches for solving this problem.

## Earliest Start Time First  

i.e. always select the interval with the earliest start time among those that are still compatible. The following example breaks this strategy: an interval can start very early but also be very long. This suggests that the next strategy to try is to look at shorter intervals first.

Example of earliest start time first failing as greedy algorithm:

- 9:00 - 16:00 - Lecture 1
- 10:00 - 12:00 - Lecture 2
- 12:00 - 13:00 - Lecture 3
- 13:00 - 14:00 - Lecture 4

You can see how the earliest start time first algorithm would fail here. It would select lecture 1 and then it would not be able to select any other lecture. So, let's look at the next strategy.

Note: we are assuming these lectures are all equally important and we are not looking at the length of the lecture.

## Smallest Interval First 

i.e. select the lectures in increasing order of their length, which is simply finish time - start time. Again, this strategy is not correct. Look at the following case.

![sml](https://cdn-media-1.freecodecamp.org/imgr/4bz2N.png)

### Example of shortest interval first failing as greedy algorithm:

- 9:00 - 12:00 - Lecture 1
- 11:30 - 12:30   - Lecture 2
- 12:00 - 14:00 - Lecture 3

Again, you can see how the shortest interval first algorithm would fail here. It would select lecture 2 and then it would not be able to select any other lecture. So, let's look at the next strategy.

## Longest interval first

i.e. select the lectures in decreasing order of their length (finish time - start time), longest first. This strategy is not correct either. Look at the following case.

### Example of longest interval first failing as greedy algorithm:

- 9:00 - 12:00 - Lecture 1
- 9:30 - 10:30   - Lecture 2
- 11:30 - 13:00 - Lecture 3

Here you can see how the longest interval first algorithm fails. It would select lecture 1 and then it would not be able to select any other lecture, although lectures 2 and 3 fit together. So, let's look at the next strategy.
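
The three counterexamples above can be replayed in code. The sketch below is not part of the original notebook: `greedy_schedule` is a hypothetical helper that takes `(name, start, finish)` tuples, with times converted to minutes since midnight, and a `key` function that defines the greedy order.

```python
def greedy_schedule(lectures, key):
    """Pick lectures in the order given by `key`, skipping any that overlap a pick."""
    chosen = []
    for name, start, finish in sorted(lectures, key=key):
        # compatible means: ends before (or when) a chosen lecture starts, or starts after it ends
        if all(finish <= s or start >= f for _, s, f in chosen):
            chosen.append((name, start, finish))
    return [name for name, _, _ in chosen]


# times in minutes since midnight, taken from the three examples above
earliest_start_case = [("L1", 540, 960), ("L2", 600, 720),
                       ("L3", 720, 780), ("L4", 780, 840)]
shortest_case = [("L1", 540, 720), ("L2", 690, 750), ("L3", 720, 840)]
longest_case = [("L1", 540, 720), ("L2", 570, 630), ("L3", 690, 780)]

print(greedy_schedule(earliest_start_case, key=lambda lec: lec[1]))      # ['L1']
print(greedy_schedule(shortest_case, key=lambda lec: lec[2] - lec[1]))   # ['L2']
print(greedy_schedule(longest_case, key=lambda lec: lec[1] - lec[2]))    # ['L1']
# ordering by finish time instead recovers three lectures in the first example
print(greedy_schedule(earliest_start_case, key=lambda lec: lec[2]))      # ['L2', 'L3', 'L4']
```

Each failing rule ends up with a single lecture even though a larger compatible set exists in every case.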

## Least Conflicting Interval First 

 i.e. you should look at intervals that cause the least number of conflicts. Yet again we have an example where this approach fails to find an optimal solution.

![c](https://cdn-media-1.freecodecamp.org/imgr/5LZ9V.png)

The diagram shows us that the least conflicting interval is the one in the middle, with just 2 conflicts. After that we can only pick the two intervals at the very ends, with 3 conflicts each. But the optimal solution is to pick the 4 intervals on the topmost level.

## Earliest Finishing Time First

This is the approach that always gives us an optimal solution to this problem. We derived a lot of insights from the previous approaches and finally came upon this one. We sort the intervals in increasing order of their finishing times and then select intervals from the very beginning, skipping any interval that conflicts with one already selected. Look at the following pseudo code for more clarity.

    function interval_scheduling_problem(requests)
        schedule ← {}
        while requests is not empty
            choose the request r in requests that has the earliest finishing time
            schedule ← schedule ∪ {r}
            delete all requests in requests that are not compatible with r
        end
        return schedule
    end

In [19]:
# let's make a lecture class that will hold starting time, finishing time and a name
# we will have a list of lectures and we want to find the maximum number of lectures we can attend
class Lecture:
    def __init__(self, name, start, finish):
        self.name = name
        self.start = start
        self.finish = finish

    def __repr__(self):
        return f"Lecture {self.name} START:{self.start} FINISH:{self.finish}"

# for the times in this example we will use integers
# 9:00 am will be 900, 5:30 pm will be 1730
lectures = [Lecture("A", 900, 1000), 
            Lecture("B", 930, 1030), 
            Lecture("C", 1100, 1200), 
            Lecture("D", 1130, 1230), 
            Lecture("E", 1200, 1300), 
            Lecture("F", 1230, 1330),
            Lecture("G", 1315, 1500),]

for lecture in lectures:
    print(lecture)

Lecture A START:900 FINISH:1000
Lecture B START:930 FINISH:1030
Lecture C START:1100 FINISH:1200
Lecture D START:1130 FINISH:1230
Lecture E START:1200 FINISH:1300
Lecture F START:1230 FINISH:1330
Lecture G START:1315 FINISH:1500


In [21]:
# Sequence lets the type hint accept any ordered collection of Lecture objects
from collections.abc import Sequence

def get_schedule(lectures: Sequence[Lecture], debug=False) -> list[Lecture]:
    """
    lectures should be a sequence of Lecture objects
    we will return a list of lectures that we can attend
    goal being maximum number of lectures
    """
    if len(lectures) == 0:
        return []
    # sort by finish time (the finish attribute of each lecture)
    # this is O(n log n)
    lectures = sorted(lectures, key=lambda lec: lec.finish)
    if debug:
        print("Lectures sorted by finish time")
        print(*lectures, sep="\n")
    schedule = [lectures[0]] # we start with the lecture with the earliest end time
    for lecture in lectures[1:]:
        # if the start time of current lecture is more or equal to end time of last lecture we add it
        if lecture.start >= schedule[-1].finish:
            schedule.append(lecture)
    return schedule

get_schedule(lectures, debug=True)


Lectures sorted by finish time
Lecture A START:900 FINISH:1000
Lecture B START:930 FINISH:1030
Lecture C START:1100 FINISH:1200
Lecture D START:1130 FINISH:1230
Lecture E START:1200 FINISH:1300
Lecture F START:1230 FINISH:1330
Lecture G START:1315 FINISH:1500


[Lecture A START:900 FINISH:1000,
 Lecture C START:1100 FINISH:1200,
 Lecture E START:1200 FINISH:1300,
 Lecture G START:1315 FINISH:1500]

In [22]:
some_lectures = [Lecture("A", 900, 1000),Lecture("B", 815, 1200), Lecture("C", 1100, 1200)]

In [23]:
get_schedule(some_lectures)

[Lecture A START:900 FINISH:1000, Lecture C START:1100 FINISH:1200]

## Proving that earliest finishing time first is optimal

Left as an exercise to the reader. Hint: use proof by contradiction.

* we can use an exchange argument to prove that the greedy algorithm is optimal
* we can use induction to prove that the greedy algorithm is optimal

Again, it is not always easy to prove that a greedy algorithm is optimal, even though it is easy to come up with one.
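
Before attempting a proof, it can be reassuring to test the claim empirically. The sketch below is an experiment, not a proof, and is not part of the original notebook: it compares a tuple-based version of earliest finishing time first (`earliest_finish`) with a brute-force search over all subsets (`brute_force_max`) on small random instances. The instance sizes, time ranges and seed are arbitrary choices.

```python
import random
from itertools import combinations


def earliest_finish(intervals):
    """Greedy: sort by finish time, take every interval that starts late enough."""
    chosen = []
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if not chosen or start >= chosen[-1][1]:
            chosen.append((start, finish))
    return chosen


def brute_force_max(intervals):
    """Largest number of pairwise non-overlapping intervals, by trying all subsets."""
    for size in range(len(intervals), 0, -1):
        for subset in combinations(intervals, size):
            ordered = sorted(subset, key=lambda iv: iv[1])
            # for intervals of positive length, checking neighbours in finish order suffices
            if all(a[1] <= b[0] for a, b in zip(ordered, ordered[1:])):
                return size
    return 0


random.seed(0)  # fixed seed so that the experiment is repeatable
for trial in range(200):
    intervals = []
    for _ in range(random.randint(1, 8)):
        start = random.randint(0, 20)
        intervals.append((start, start + random.randint(1, 6)))
    assert len(earliest_finish(intervals)) == brute_force_max(intervals)
print("greedy matched brute force on all random instances")
```

A passing run does not replace the exchange argument, but a failing run would immediately hand you a counterexample.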

## Different scheduling problem - maximize the length of the schedule

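Suppose that, instead of the number of lectures, we want to maximize the total time covered by the selected lectures. It turns out that earliest finishing time first is not optimal for this objective. For example, given a lecture from 9:00 to 10:00 and an overlapping lecture from 9:30 to 17:00, the greedy rule selects the short lecture and thereby blocks the long one, although the long lecture alone gives a much longer schedule.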

We would have to use dynamic programming to solve this problem. Later in the course we will see how to solve this problem using dynamic programming.
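
As a preview, here is one possible dynamic programming sketch for the length-maximizing variant (often called weighted interval scheduling, with the length of an interval as its weight). It is an illustration under assumptions, not the course's reference solution: intervals are hypothetical `(start, finish)` tuples in minutes since midnight, and the function names are made up.

```python
from bisect import bisect_right


def max_total_length(intervals):
    """Maximum total length of pairwise non-overlapping intervals."""
    ordered = sorted(intervals, key=lambda iv: iv[1])  # sort by finish time
    finishes = [finish for _, finish in ordered]
    best = [0] * (len(ordered) + 1)  # best[k] = optimum using the first k intervals
    for k, (start, finish) in enumerate(ordered, 1):
        # how many of the earlier intervals finish no later than this one starts
        compatible = bisect_right(finishes, start, 0, k - 1)
        take = best[compatible] + (finish - start)
        skip = best[k - 1]
        best[k] = max(take, skip)
    return best[-1]


def earliest_finish_length(intervals):
    """Total length selected by the earliest-finishing-time-first greedy rule."""
    total, last_finish = 0, float("-inf")
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_finish:
            total += finish - start
            last_finish = finish
    return total


# 9:00-10:00 and 9:30-17:00, expressed in minutes since midnight
day = [(540, 600), (570, 1020)]
print(earliest_finish_length(day))  # 60
print(max_total_length(day))        # 450
```

The table `best` is what lets the algorithm reconsider earlier decisions, which is exactly what a greedy algorithm never does.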

## When to use Greedy Algorithms


Greedy algorithms can help you find solutions to a lot of seemingly tough problems. The only problem with them is that you might come up with a correct solution but not be able to verify that it is correct. All problems that a greedy algorithm solves optimally share a common property: a sequence of locally optimal choices leads to a global optimum, without reconsidering the choices already made.

Source: https://www.freecodecamp.org/news/what-is-a-greedy-algorithm/

## Example 2: Knapsack

![Knap](https://upload.wikimedia.org/wikipedia/commons/thumb/f/fd/Knapsack.svg/500px-Knapsack.svg.png)

Now, let us take a look at a classic combinatorial optimization problem, the so-called "knapsack" problem. Here, we can think of a "knapsack" as a rucksack, and our goal is to fill it with items so that the rucksack's contents have the highest possible value. Of course, the knapsack has a certain weight capacity, and each item is associated with a certain value and a weight. In other words, we want to maximize the value of the knapsack subject to the constraint that we don't exceed its weight capacity.

As trivial as it sounds, the knapsack problem is still one of the most popular algorithmic problems in the modern computer science area. There are numerous applications of knapsack problems, and to provide an intuitive real-world example: We could think of sports betting or daily fantasy soccer predictions as a knapsack problem, where we want to construct a squad of players with the highest possible points to salary ratio.

### 0-1 Knapsack

Let us take a look at what is probably the simplest variation of the knapsack problem, the 0-1 knapsack, and tackle it using a "greedy" strategy. In the 0-1 knapsack, we have a given set of items, $i_1, i_2, ..., i_n$, that we can use to fill the knapsack. Again, the knapsack has a fixed capacity $W$, and the items are associated with weights, $w_1, w_2, ..., w_n$, and values $v_1, v_2, ..., v_n$. While our goal is still to pack the knapsack with a combination of items so that it carries the highest possible value, the 0-1 knapsack variation comes with the constraint that we can only carry 1 copy of each item. The exact dynamic programming solution to this problem has a runtime complexity of $O(nW)$, where $n$ is the number of items and $W$ is the capacity of the knapsack; the greedy heuristic below only needs to sort the items, which takes $O(n \log n)$.

For example, if we are given 3 items with weights $[w_1: 20, w_2: 50, w_3: 30]$ and values
$[v_1: 60, v_2: 100, v_3: 120]$, a knapsack with capacity 50 may carry 1 copy of item 1 and 1 copy of item 3 to maximize its value (180), in contrast to just carrying 1 copy of item 2 (value: 100).

Let's see what one "greedy" code implementation may look like:

In [24]:
def knapsack_01(capacity, weights, values, debug=False):
    val_by_weight = [value / weight 
                     for value, weight in zip(values, weights)] # so we create the best bang for buck values
    sort_idx = [i[0] for i in sorted(enumerate(val_by_weight), 
                                     key=lambda x:x[1], 
                                     reverse=True)]
    knapsack = [0 for _ in values]  # initialize knapsack
    total_weight, total_value = 0, 0

    for i in sort_idx: # simply go through our best deals and add if any have room
        # here is our greedy heuristic - rule
        if total_weight + weights[i] <= capacity:
            knapsack[i] = 1
            total_weight += weights[i]
            total_value += values[i]
            if debug:
                print("Added best deal at the moment val/weight", values[i]/weights[i])
                print("Added weight", weights[i])
                print("Added value", values[i])
        if total_weight >= capacity:  # no need to check best deals if we already are over weight or full
            break

    return knapsack, total_weight, total_value

We start by creating an array `val_by_weight`, which contains the value/weight ratios of the items. Next, we create an array of index positions by sorting the value/weight array; here, we can think of the item with the highest value/weight ratio as the item that gives us the "best bang for the buck." Using a for-loop, we then iterate over `sort_idx` and check whether a given item fits in our knapsack or not, that is, whether we can carry it without exceeding the knapsack's capacity. After we have checked all items, or if we reach the capacity limit prematurely, we exit the for-loop and return the contents of the knapsack as well as its current weight and total value, which we've been tracking all along. 

A concrete example:

In [25]:
weights = [20, 50, 30]
values = [60, 100, 120] 
knapsack_01(capacity=50, weights=weights, values=values)

([1, 0, 1], 50, 180)

In [26]:
knapsack_01(capacity=50, weights=weights, values=values, debug=True)

Added best deal at the moment val/weight 4.0
Added weight 30
Added value 120
Added best deal at the moment val/weight 3.0
Added weight 20
Added value 60


([1, 0, 1], 50, 180)

Running the `knapsack_01` function on the example input above returns a knapsack containing item 1 and item 3, with a total weight equal to its maximum capacity and a value of 180.

Let us take a look at another example:

In [19]:
70/40, 40/30, 34/20 # Euro/kg

(1.75, 1.3333333333333333, 1.7)

In [27]:
weights = [40, 30, 20]
values = [70, 40, 34] 

knapsack_01(capacity=50, weights=weights, values=values, debug=True)

Added best deal at the moment val/weight 1.75
Added weight 40
Added value 70


([1, 0, 0], 40, 70)

Notice the problem here? Our greedy algorithm suggests packing item 1 with weight 40 and a value of 70. Now, our knapsack can't pack any of the other items (weights 20 and 30), without exceeding its capacity. This is an example of where a greedy strategy leads to a globally suboptimal solution. An optimal solution would be to take 1 copy of item 2 and 1 copy of item 3, so that our knapsack carries a weight of 50 with a value of 74.
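
For comparison, the exact optimum can be computed with the standard dynamic programming approach. The following is a minimal sketch, assuming integer weights and capacity; the name `knapsack_01_dp` is made up, and it returns only the best value, not the chosen items.

```python
def knapsack_01_dp(capacity, weights, values):
    """Optimal 0-1 knapsack value via dynamic programming, O(n * capacity) time."""
    best = [0] * (capacity + 1)  # best[c] = highest value achievable with capacity c
    for weight, value in zip(weights, values):
        # iterate over capacities downwards so that each item is used at most once
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


print(knapsack_01_dp(50, [40, 30, 20], [70, 40, 34]))    # 74 (greedy above found 70)
print(knapsack_01_dp(50, [20, 50, 30], [60, 100, 120]))  # 180 (same as greedy)
```

The price for exactness is a running time that grows with the capacity, whereas the greedy heuristic only needs one sort.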

### Fractional Knapsack

Now, let's implement a slightly different flavor of the knapsack problem, the fractional knapsack, for which the greedy strategy is guaranteed to find the optimal solution. Here, the rules are slightly different from the 0-1 knapsack that we implemented earlier. Instead of either just *including* or *excluding* an item in the knapsack, we can also add fractions $f$ of an item, subject to the constraint $0 \leq f \leq 1$.

Now, let's take our 0-1 knapsack implementation as a template and make some slight modifications to come up with a fractional knapsack algorithm:

In [28]:
def knapsack_fract(capacity, weights, values, debug=False):
    val_by_weight = [value / weight 
                     for value, weight in zip(values, weights)]
    sort_idx = [i[0] for i in sorted(enumerate(val_by_weight), 
                                     key=lambda x:x[1], 
                                     reverse=True)]
    knapsack = [0 for _ in values]
    total_weight, total_value = 0, 0

    for i in sort_idx:
        if total_weight + weights[i] <= capacity:  # so we add the best deal for full 100% if we have space
            knapsack[i] = 1
            total_weight += weights[i]
            total_value += values[i]
            if debug:
                print("Added best deal at the moment val/weight", values[i]/weights[i])
                print("Added weight", weights[i])
                print("Added value", values[i])
        else:  # unlike 0-1 we can add a fraction of the best deal at the time
            allowed = capacity - total_weight
            frac = allowed / weights[i]
            knapsack[i] = round(frac, 4) # we use round for practical reasons because of float imprecisions
            total_weight += allowed
            total_value += frac * values[i]   
            if debug:
                print("Added a fraction of the  at the moment val/weight", values[i]/weights[i])
                print("Fraction added", knapsack[i])
                print("Added weight", weights[i])
                print("Added value", values[i])        
        if total_weight >= capacity:
            break
    if debug:
        print(knapsack, total_weight, round(total_value, 4))
    return knapsack, total_weight, round(total_value, 4)

Let's give it a whirl on a simple example first:

In [29]:
weights = [20, 50, 30]
values = [60, 100, 120] 
knapsack_fract(capacity=50, weights=weights, values=values, debug=True)

Added best deal at the moment val/weight 4.0
Added weight 30
Added value 120
Added best deal at the moment val/weight 3.0
Added weight 20
Added value 60
[1, 0, 1] 50 180


([1, 0, 1], 50, 180)

The solution is an optimal solution, and we notice that it is the same as the one we got by using the 0-1 knapsack previously.

To demonstrate the difference between the 0-1 knapsack and the fractional knapsack, let's do a second example:

In [30]:
weights = [30]
values = [500] 

knapsack_fract(capacity=10, weights=weights, values=values)

([0.3333], 10, 166.6667)

In [31]:
weights = [40, 30, 20]
values = [70, 40, 34] 

knapsack_fract(capacity=50, weights=weights, values=values, debug=True)

Added best deal at the moment val/weight 1.75
Added weight 40
Added value 70
Added a fraction of the  at the moment val/weight 1.7
Fraction added 0.5
Added weight 20
Added value 34
[1, 0, 0.5] 50 87.0


([1, 0, 0.5], 50, 87.0)

In [32]:
weights = [40, 30, 20]
values = [70, 40, 36] 
knapsack_fract(capacity=50, weights=weights, values=values, debug=True)

Added best deal at the moment val/weight 1.8
Added weight 20
Added value 36
Added a fraction of the  at the moment val/weight 1.75
Fraction added 0.75
Added weight 40
Added value 70
[0.75, 0, 1] 50 88.5


([0.75, 0, 1], 50, 88.5)

## Example 3: Point-Cover-Interval Problem

![Point cover](https://i.stack.imgur.com/4klhj.png)

The classic Point-Cover-Interval problem is another example that is well suited for demonstrating greedy algorithms. Here, we are given a set of Intervals *L*, and we want to find the minimum set of points so that each interval is covered at least once by a given point as illustrated in the example below:

![](https://github.com/ValRCS/RTU_Algorithms_DIP321/blob/main/topics/images/point-cover-interval-ex.png?raw=1)

### Real world example of point-cover-interval problem

A real-world instance of the point-cover-interval problem: an inspector wants to look in on every lecture of the day while making as few rounds as possible. Each lecture is an interval, and each round is a point in time that covers all lectures running at that moment.

Note that assigning lectures to the least number of classrooms is not the same problem. There, every lecture needs a room for its whole duration, and the answer is the largest number of lectures running at the same time (this is usually called interval partitioning).
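
Counting classrooms, as discussed above, is usually treated as interval partitioning rather than point cover. A minimal sketch of that separate problem, assuming lectures are given as hypothetical `(start, finish)` tuples and that a room can be reused as soon as the previous lecture in it has finished:

```python
import heapq


def min_classrooms(lecture_times):
    """Fewest rooms so that no two lectures in the same room overlap."""
    end_times = []  # min-heap with the finish time of the last lecture in each room
    for start, finish in sorted(lecture_times):  # process lectures by start time
        if end_times and end_times[0] <= start:
            heapq.heapreplace(end_times, finish)  # reuse the room that frees up first
        else:
            heapq.heappush(end_times, finish)     # every room is busy: open a new one
    return len(end_times)


# the same times as lectures A to G from the scheduling section
lecture_times = [(900, 1000), (930, 1030), (1100, 1200), (1130, 1230),
                 (1200, 1300), (1230, 1330), (1315, 1500)]
print(min_classrooms(lecture_times))  # 2
```

This is also a greedy algorithm, but its greedy order is by start time, while the point-cover strategy that follows sorts by endpoint.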

Our greedy strategy, which finds the optimal solution for this problem, can be as follows:

- sort intervals in increasing order by the value of their endpoints
- for interval in interval-set:
    - if interval is not yet covered:
        - add interval-endpoint to the set of points

In [33]:
def min_points(intervals, debug=True):
    s_ints = sorted(intervals, key=lambda x: x[1])  # O(n log n): ascending order by endpoint
    if debug:
        print(s_ints)
    points = [s_ints[0][-1]]  # the first point is the smallest endpoint

    for interv in s_ints:  # O(n), where n is the number of intervals
        # the interval is not covered if the last chosen point lies outside of it
        if not (interv[0] <= points[-1] <= interv[-1]):
            points.append(interv[-1])  # append is amortized O(1) for Python lists; not so in every language
        
    return points

In [34]:
pts = [[2, 5], [1, 3], [3, 6]] # we assume that start and end points are included
min_points(pts)

[[1, 3], [2, 5], [3, 6]]


[3]

In [35]:
pts = [[4, 7], [1, 3], [2, 5], [5, 6]]
min_points(pts)
# not the only solution points [2, 5] would also cover the intervals
# again these are points on a 1d line

[[1, 3], [2, 5], [5, 6], [4, 7]]


[3, 6]

In [36]:
pts = [[1,2],[5,14], [13,15]]
min_points(pts)

[[1, 2], [5, 14], [13, 15]]


[2, 14]

In [37]:
# we could use starting points as our sort parameter then we have to go backwards
def max_points(intervals, debug=True):
    s_ints = sorted(intervals, key=lambda x: x[0], reverse=True)  # sort by starting point
    if debug:
        print("Sorted by starting", s_ints)
    points = [s_ints[0][0]]

    for interv in s_ints:
        if not(points[-1] >= interv[0] and points[-1] <= interv[-1]):
            points.append(interv[0])
        
    return points

In [38]:
pts = [[4, 7], [1, 3], [2, 5], [5, 6]]
max_points(pts)

Sorted by starting [[5, 6], [4, 7], [2, 5], [1, 3]]


[5, 1]

In [39]:
min_points(pts)

[[1, 3], [2, 5], [5, 6], [4, 7]]


[3, 6]

## Example 4: Pairwise Distinct Summands

In the pairwise distinct summands problem, we are given a positive integer $n$, and our goal is to write it as a sum of as many pairwise distinct positive integers as possible. For example, given n=8, one such decomposition is `8 = 1 + 2 + 5`, so the maximum number of unique summands is 3.

Implemented in code using a greedy strategy, it looks as follows:

In [43]:
def max_summands(num, debug=False):
    summands = []
    sum_summands = 0
    next_int = 1

    while sum_summands + next_int <= num:
        sum_summands += next_int
        summands.append(next_int)
        if debug:
            print(sum_summands, summands)
        next_int += 1

    # so last number will be the difference between the sum and the number
    # we can prove that this number has not yet been used
    summands[-1] += num - sum_summands
    return summands

In [42]:
max_summands(8, debug=True)

1 [1]
3 [1, 2]
6 [1, 2, 3]


[1, 2, 5]

In [44]:
max_summands(100, debug=True)

1 [1]
3 [1, 2]
6 [1, 2, 3]
10 [1, 2, 3, 4]
15 [1, 2, 3, 4, 5]
21 [1, 2, 3, 4, 5, 6]
28 [1, 2, 3, 4, 5, 6, 7]
36 [1, 2, 3, 4, 5, 6, 7, 8]
45 [1, 2, 3, 4, 5, 6, 7, 8, 9]
55 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
66 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
78 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
91 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]


[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 22]

In [None]:
# Prove that this is optimal!

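First, we initialize the sum of the summands to 0 and the next candidate integer to `next_int=1`. We then enter a while loop that runs as long as the sum of the summands plus `next_int` does not exceed `num`; in each iteration we add `next_int` to the sum, append it to the list of summands, and increment it by 1. Once the loop ends, the remaining difference `num - sum_summands` is added to the last summand.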
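
One way to see why the greedy choice is optimal: any k pairwise distinct positive integers sum to at least 1 + 2 + ... + k = k(k+1)/2, so no decomposition of n can have more summands than the largest k with k(k+1)/2 <= n, and the loop above stops at exactly that k. The helper below is a hypothetical addition that computes this bound directly, which gives a quick cross-check for the greedy result.

```python
from math import isqrt


def max_summand_count(num):
    """Largest k with 1 + 2 + ... + k = k * (k + 1) / 2 <= num."""
    # k(k+1)/2 <= num  is equivalent to  (2k + 1)^2 <= 8 * num + 1
    return (isqrt(8 * num + 1) - 1) // 2


print(max_summand_count(8))    # 3
print(max_summand_count(100))  # 13
```

Both values agree with the lengths of the lists returned by `max_summands(8)` and `max_summands(100)` above.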

## Set Cover Problems

Set cover problems are problems where we want to find the minimum number of subsets such that their set union contains all items in a target set. This target set is typically called the "universe." To borrow an example from the excellent [Wikipedia page](https://en.wikipedia.org/wiki/Set_cover_problem) on set cover problems, let's assume we have the universe 

- $U=\{1, 2, 3, 4, 5\}$

and are given the collection of sets 

- $C=\{\{1, 2, 3\}, \{2, 4\}, \{3, 4\}, \{4, 5\}\}$

The task is to find the minimum number of sets in $C$ so that their union equals $U$.

Note that the set cover problem is NP-hard (its decision version is NP-complete), so no algorithm is known that solves it efficiently in general. However, we can use greedy algorithms to approximate the solution; this solution may or may not be globally optimal.

The greedy strategy we are going to employ is very simple and works as follows:

- While not all elements in U are covered:
  - Among the sets in C that have not been used yet:
    - Pick the set that covers the most still-uncovered elements in U

[Set Covers in Practice](https://cs.stackexchange.com/questions/74407/what-are-the-real-world-applications-of-set-cover-problem)

In [25]:
collection = {'set_1': {1, 2, 3},
              'set_2': {2, 4}, 
              'set_3': {3, 4}, 
              'set_4': {4, 5}}
sets_used = []
elements_not_covered = {1, 2, 3, 4, 5}


while elements_not_covered:
    # initialize with empty set
    elements_covered = set()
    for set_ in collection.keys():
        
        if set_ in sets_used:
            continue
        
        current_set = collection[set_]
        # we add elements to potential solution
        would_cover = elements_covered.union(current_set)
        # and if this set helps we just add the new elements
        if len(would_cover) > len(elements_covered):
            elements_covered = would_cover
            sets_used.append(set_)
            elements_not_covered -= elements_covered
            
            if not elements_not_covered:
                break
    
print(sets_used)

['set_1', 'set_2', 'set_4']


In [26]:
for set_ in sets_used:
    print(collection[set_])

{1, 2, 3}
{2, 4}
{4, 5}


As a result, we can see that 3 sets are required to cover the universe U. In this case, the greedy algorithm has not found the globally optimal solution, which would be `'set_1'` and `'set_4'`. Note that this is just a trivial example, and greedy algorithms can be very useful approximators for solutions that are computationally infeasible.

For instance, an exhaustive solution to this problem that is guaranteed to find the global optimum (remember that the set cover problem is NP-hard) would involve iterating over a power set, which has $2^n$ elements, where $n$ is the number of sets in the collection. For example, a collection of 30 sets would already require comparing $2^{30}=1,073,741,824$ (about one billion) possible combinations!
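
For a collection as small as the one in this example, the exhaustive search is perfectly feasible. A minimal sketch, with a made-up function name, that tries all combinations of sets in order of increasing size:

```python
from itertools import combinations


def exhaustive_set_cover(universe, collection):
    """Smallest list of set names whose union covers `universe` (tries all subsets)."""
    names = list(collection)
    for size in range(1, len(names) + 1):        # smallest candidate covers first
        for combo in combinations(names, size):  # every choice of `size` sets
            covered = set().union(*(collection[name] for name in combo))
            if universe <= covered:
                return list(combo)
    return None  # the collection cannot cover the universe


collection = {'set_1': {1, 2, 3},
              'set_2': {2, 4},
              'set_3': {3, 4},
              'set_4': {4, 5}}
print(exhaustive_set_cover({1, 2, 3, 4, 5}, collection))  # ['set_1', 'set_4']
```

In the worst case this loop visits every non-empty subset of the collection, which is where the exponential cost comes from.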

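(Note that the loop above adds the first set that contributes any new element, rather than the set that covers the most uncovered elements, so its result depends on the iteration order of the dictionary. It would have found the globally optimal solution in this simple example if it had iterated in a different order, for example, if we had swapped the positions of {2, 4} and {4, 5}.)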
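
The strategy from the bullet list, picking the set that covers the most still-uncovered elements in every round, can be sketched as follows. This is an illustrative variant with a made-up function name, not the code that produced the output above.

```python
def greedy_set_cover(universe, collection):
    """Repeatedly pick the set that covers the most still-uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # greedy choice: the set with the largest overlap with what is still missing
        best = max(collection, key=lambda name: len(collection[name] & uncovered))
        if not collection[best] & uncovered:
            raise ValueError("the collection cannot cover the universe")
        chosen.append(best)
        uncovered -= collection[best]
    return chosen


collection = {'set_1': {1, 2, 3},
              'set_2': {2, 4},
              'set_3': {3, 4},
              'set_4': {4, 5}}
print(greedy_set_cover({1, 2, 3, 4, 5}, collection))  # ['set_1', 'set_4']
```

On this example the max-coverage rule happens to find the optimal cover, but in general it only gives an approximation, so larger instances can still end up with more sets than necessary.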