# Greedy Algorithms :

https://learning.ecam.be/SA4T/slides/03-greedy


<img src="img/def.png" width="35%">

---

### Coin Change Algorithm :

Example: give back 43 cents in change using the coins (1, 2, 5, 10, 20, 50).
A greedy algorithm starts with the largest coin that fits: 20.

In [9]:
# Greedy Coin Change Algorithm :
#
# Methodology :
#    Goal : given a value and a set of available coins, return the list of coins
#           that sum up to the value, using the least number of coins possible.
#
#   1. Sort the list of available coins in descending order.
#   2. Initialize an empty list to store the result.
#   3. For each coin in the sorted list:
#       a. If the coin value is less than or equal to the remaining value:
#           i.   Determine how many times the coin fits into the remaining value.
#           ii.  Subtract the total value of these coins from the remaining value.
#           iii. Add the coin to the result list the determined number of times.
#   4. Return the result list.
#
#
# Example : 
#       value = 83, 
#       coins = [1,2,5,10,20,50,100,200]
#
#       1. sorted coins = [200,100,50,20,10,5,2,1]
#       2. select the biggest possible coin at each step :
#            select 50 (83-50=33)
#            select 20 (33-20=13)
#            select 10 (13-10=3)
#            select 2  (3-2=1)
#            select 1  (1-1=0)
#       result = [50,20,10,2,1]
# ====================================



coins = [1, 2, 5, 10, 20, 50, 100, 200]


def change(value):
    result = []
    reversed_list = reversed(coins)   # coins is sorted ascending, so this iterates from largest to smallest
    for coin in reversed_list:
        if coin <= value:             # 1. select 50 (<= 83)                           | 2. select 20 (<= 33)    | 3. etc...
            n = value // coin         #    how many times does 50 fit in 83 ? ===> 1   |    33 // 20 = 1         |
            value = value - n * coin  #    83 - 1*50 = 33                              |    33 - 20 = 13         |
            result += n * [coin]      #    result = [50]                               |    result = [50,20]     |
    return result


print(change(83))



# O(len(coins) + len(result))
def change(value: int) -> list[int]:
    result = []
    reversed_list = reversed(coins)           # O(1), reversed() returns an iterator (no copy, no sort)
    for coin in reversed_list:                # O(m), m = len(coins)
        if value // coin:
            k = value // coin                 # O(1)
            result += k * [coin]              # O(k), creates/appends a list of k coins
            value %= coin                     # O(1)
    return result

change(83)




# ======== COMPLEXITY =========
# 
# 
#   Best Case :
#       - immediately find the coin that matches the value (minimizing k)
#       - value = 100, coins = [1,...,200]
#       - append k=1 coin
#       - loop : O(m), m = len(coins)
#
#       Time complexity : 
#               O(m) + O(k=1)
#               -> O(m)         but with fixed coin set -> O(1)

# 
#   Worst Case :
#       - coins list is [1], value = 100 000     (or to generalize, large value with only small coins available)
#       - must append k=value times the coin 1
#       - O(k) = O(value)
#       
#       Time complexity : 
#               O(m) + O(k) = O(m + value), but since m=len(coins) is FIXED
#               -> O(value)
#
#       
#   Note :
#       - reversed() is O(1), it is not a sorting algorithm, just an iterator
# 




[50, 20, 10, 2, 1]


[50, 20, 10, 2, 1]

### Proof of correctness of greedy algorithms :

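Lemma : there is always an optimal way to reach `value` that contains the largest coin which is smaller than or equal to `value`. This holds for the coin set used above (1, 2, 5, 10, 20, 50, 100, 200), but not for arbitrary coin sets.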

### Return largest number possible based on list of numbers :

In [10]:

    # 



# Subproblems :
    # For a set (or list) S of remaining numbers, 
    # the subproblem is: “What is the lexicographically largest 
    # concatenated string I can make using exactly the elements of S?”

# Base-case : 
    # S empty
    # S has one-element


# Guess :
    # First cut at k (1 to n), leftover P(n-k)
    # ex. k=3 -> prices(3) + P(n-3) = 8 + P(17)

# Recurrence :
    # 0 if n=0
    # 1 if n=1
    #
    # P(n) = max_k[ price[k] + P(n-k) ]


# def largest(numbers) : 
#     temp = 0
#     res = []
#     for num in sorted(numbers):
#         if string(num)[0] > temp : temp = int(str(num)[0])
#     pass

# 1. find the largest 1st number



# print(largest([10, 2])) # returns 2.10


# print(largest([3,30,34,5,9]))  # returns 9.5.34.3.30


# =======================================================================

def compare(a, b):
    # Return True if the concatenation ab is at least as large as ba
    ab = int(str(a) + str(b))
    ba = int(str(b) + str(a))
    return ab >= ba

def custom_merge_sort(numbers):
    # Use compare instead of >, because the wanted order is 34, 3, 30
    # (ex. 9.5.34.3.30 > 9.5.34.30.3), which plain numeric order does not give.
    if len(numbers) < 2 : return numbers

    # cut list in half (integer division, slices of the input list)
    mid = len(numbers) // 2
    A, B = numbers[:mid], numbers[mid:]
    # recurrence
    A, B = custom_merge_sort(A), custom_merge_sort(B)

    # merge: always take the head that gives the larger concatenation
    result = []
    while A and B :
        if compare(A[0], B[0]): result.append(A.pop(0))
        else : result.append(B.pop(0))
    return result + A + B


def largest_prof(numbers):
    numbers = custom_merge_sort(numbers)
    return ''.join(map(str, numbers))   # elements are ints, convert before joining



# print(largest_prof([3,30,34,5,9])) 
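
The same ordering idea can be sketched with Python's built-in sort instead of a hand-written merge sort. This is only an alternative illustration: the names `largest_number` and `by_concatenation` are assumptions, and `functools.cmp_to_key` is used to turn the pairwise comparison into a sort key.

```python
from functools import cmp_to_key


def largest_number(numbers: list[int]) -> str:
    def by_concatenation(a: str, b: str) -> int:
        # a + b and b + a have the same length, so comparing them as
        # strings is the same as comparing them as integers.
        # Negative result means "a goes before b".
        return (b + a > a + b) - (b + a < a + b)

    ordered = sorted(map(str, numbers), key=cmp_to_key(by_concatenation))
    return "".join(ordered)


print(largest_number([10, 2]))             # 210
print(largest_number([3, 30, 34, 5, 9]))   # 9534330
```

The sort is O(n log n) comparisons, each comparison costing the length of the two concatenated strings.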




### Fractional Knapsack :

In [11]:
# Given the weights and prices of n items, put these items in a knapsack of a given capacity to get the maximum total value in the knapsack. 
# You are allowed to take a fraction of an item. 
# Justify why a greedy approach works.


# Intuition :
    # Objects = [ 1,   2, 3 ]
    # Value   = [ 20, 30, 5 ]
    # Weight  = [ 40, 10, 1 ]

    # Order by value/weight : 3, 2, 1   (price per unit of weight: 5, 3, 0.5)
    # ex. max_weight = 15 ;
        # -> take weights 1 (all of object 3), 10 (all of object 2), 4 (a tenth of object 1) = 15


# ==============================================================

item = tuple[int, int, int] # (id, value, weight)

def FKS(items: list[item], capacity: int) -> list[list]:

    # 1. sort items by value/weight (best ratio first) :
    items_sorted = sorted(items, key = lambda item: item[1]/item[2], reverse=True ) 
    # print(items_sorted)

    remaining_capacity = capacity
    res = []

    # 2. Fill the sack, best ratio first
    for item_id, value, weight in items_sorted:
        if remaining_capacity <= 0 : break

        # Take whole item if it still fits in what is left
        if weight <= remaining_capacity : 
            res.append([item_id, value, weight])
            remaining_capacity -= weight

        # Take fraction
        else :
            frac = remaining_capacity/weight  # fraction of the item that still fits (< 1)
            res.append([item_id, value*frac, weight*frac])
            remaining_capacity = 0

    return res


res = FKS([        # (object_number, value, weight)
    (0, 60, 10),  
    (1, 100, 20), 
    (2, 120, 30)], 
    50
)

print(res)



[[0, 60, 10], [1, 100, 20], [2, 80.0, 20.0]]


### Movie loverzz :

In [12]:
# You are given n activities with their start and finish times. 
# Select the maximum number of activities that can be performed by a single person, 
# assuming that a person can only work on a single activity at a time.
# Justify why you can use a greedy strategy. What is the time complexity?

#! ex. to watch as many movies as possible tonight, we prioritise 1h movies over 2-3h movies.


# Goal: Pick as many activities as possible so one person can do them without overlaps.
activity = tuple[int, int] # (start_time, end_time)


def activity_selection(activities: list[activity]) -> list[activity]:
    res = []
    # 1. Sort by activities that finish the earliest
    sorted_act = sorted(activities, key=lambda act: act[1])
    print(sorted_act)
    
    last_end = -float('inf')   #0
    for start, end in sorted_act:
        if start > last_end:
            res.append((start,end))
            last_end = end
    return res

# activity_selection([(0, 3), (1, 4), (5, 7)])  # (start_time, end_time)

activity_selection([(0, 3), (1,2), (3,4), (1, 4), (5, 7)]) 




[(1, 2), (0, 3), (3, 4), (1, 4), (5, 7)]


[(1, 2), (3, 4), (5, 7)]
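
One way to gain confidence in the earliest-finish-time rule is to compare it with an exhaustive search on small inputs. The sketch below is only a sanity check, not a proof; `greedy_count` and `brute_force_count` are hypothetical helper names, and the strict comparison mirrors the `start > last_end` test used in the cell above.

```python
from itertools import combinations

Activity = tuple[int, int]  # (start_time, end_time)


def greedy_count(activities: list[Activity]) -> int:
    """Sort by finish time (O(n log n)), then one linear pass (O(n))."""
    count, last_end = 0, float("-inf")
    for start, end in sorted(activities, key=lambda act: act[1]):
        if start > last_end:
            count += 1
            last_end = end
    return count


def brute_force_count(activities: list[Activity]) -> int:
    """Try every subset, largest first: exponential, only usable for tiny inputs."""
    def compatible(subset) -> bool:
        ordered = sorted(subset)
        return all(a[1] < b[0] for a, b in zip(ordered, ordered[1:]))

    for size in range(len(activities), 0, -1):
        if any(compatible(c) for c in combinations(activities, size)):
            return size
    return 0


acts = [(0, 3), (1, 2), (3, 4), (1, 4), (5, 7)]
print(greedy_count(acts), brute_force_count(acts))   # both 3
```

The greedy version is dominated by the sort, so its time complexity is O(n log n).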

### Job CPA (cost) :

In [13]:
# We are given a list of jobs, which all have a deadline and an associated profit if the job is performed before the deadline. 
# All jobs take a single unit of time.
# Maximize the total profit if only one job can be scheduled at a time.
# Justify why a greedy algorithm can be used, and find the time complexity.

# Note: a job can be done well before its deadline. E.g. with jobs (1, 7, 500) and (2, 8, 1000), job 2 can be scheduled in any free slot before its deadline (it can finish at time 3, for instance).

job = tuple[int, int, int] # (id, deadline, profit)

def job_sequencing(jobs: list[job]) -> list[int | None]:
    res = []
    # 1. sort by profit (highest first)
    sorted_jobs = sorted(jobs, key=lambda j: j[2], reverse=True)
    print(f'jobs sorted by profit: {sorted_jobs}')

    # 2.
    # dd_list = []
    # for job_id, dd, profit in sorted_jobs:
    #     if dd not in dd_list:
    #         dd_list.append(dd)
    #         res.append((job_id, dd, profit))


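    # Professor's solution: place each job in the latest free slot before its deadline
    schedule = len(jobs) * [None]

    for job_id, dd, profit in sorted_jobs:
        for slot in reversed(range(min(dd, len(jobs)))):   # cap at len(jobs): later slots do not exist
            if schedule[slot] is None:
                schedule[slot] = job_id
                break

    return schedule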

job_sequencing([(0, 4, 20), (1, 1, 10), (2, 1, 40), (3, 1, 30)])  # (id, deadline, profit)

# job_sequencing([(0, 4, 20), (1, 4, 10)])   # should show

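# COMPLEXITY :
    # greedy algorithms usually rely on a sort, so they are often O(n log n)
    # here the search for a free slot is a nested loop: O(n log n) for the sort + O(n^2) worst case for the scan = O(n^2)



#! Why can't we use a HEAP here ?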

jobs sorted by profit: [(2, 1, 40), (3, 1, 30), (0, 4, 20), (1, 1, 10)]


[2, None, None, 0]
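
On the heap question raised in the cell above: the slot-filling version needs "the latest free slot before a deadline", which a heap does not answer directly. A different, commonly used formulation does use a min-heap, sketched below under the same assumptions (unit-time jobs, tuples of `(id, deadline, profit)`). The name `max_profit_with_heap` is hypothetical, and this variant returns the total profit and the chosen job ids rather than a slot-by-slot schedule.

```python
import heapq

Job = tuple[int, int, int]  # (id, deadline, profit)


def max_profit_with_heap(jobs: list[Job]) -> tuple[int, list[int]]:
    kept: list[tuple[int, int]] = []   # min-heap of (profit, job_id) for the jobs kept so far
    # Visit jobs by increasing deadline: O(n log n)
    for job_id, deadline, profit in sorted(jobs, key=lambda j: j[1]):
        heapq.heappush(kept, (profit, job_id))      # O(log n)
        # Only `deadline` unit slots exist before this deadline:
        # if too many jobs are kept, drop the least profitable one.
        if len(kept) > deadline:
            heapq.heappop(kept)                     # O(log n)
    total = sum(profit for profit, _ in kept)
    return total, sorted(job_id for _, job_id in kept)


print(max_profit_with_heap([(0, 4, 20), (1, 1, 10), (2, 1, 40), (3, 1, 30)]))
# (60, [0, 2])
```

On the example above it keeps jobs 2 and 0 for a profit of 60, the same jobs as the schedule `[2, None, None, 0]`, in O(n log n) overall.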

### Compression Algorithms

A heap is an array visualized as a nearly complete binary tree.
- A max-heap is a heap with the additional property that a parent is always `greater` than or equal to its children.
- A min-heap is a heap with the additional property that a parent is always `less` than or equal to its children.

<img src="img/heap.png" width="25%">


Remark :
- Heaps are a very efficient implementation of priority queues. A heap can be seen as a `partial` sort: only the root is guaranteed to be the minimum (or maximum).

In [14]:
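# Heap stored as an array (i0, i1, ... are positions in the array):
# - array representation = [i0, i1, i2, i3, i4, i5, ...]
# - Tree representation
#          i0
#       i1    i2
#     i3 i4  i5 ...
# - children of index i : 2i+1 and 2i+2 ; parent of index i : (i-1)//2
# - max-heap : the value at i0 is the largest ; min-heap : the value at i0 is the smallest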


#### Python libraries for heaps:



- `heapq.heapify(l)` O(n) : transform a list `l` into a min-heap, in place. [4,3,2] -> [2,3,4]

- `heapq.heappop(heap)` O(log n) : remove and return the smallest item, rearranging the tree "smartly" (without complete recalculation) to maintain the heap invariant.
- `heapq.heappush(heap, item)` O(log n) : add an item "smartly" into the heap, maintaining the heap invariant.
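
A minimal usage sketch of the three functions listed above. The variable names are illustrative; the calls are the standard-library `heapq.heapify`, `heapq.heappush` and `heapq.heappop`.

```python
import heapq

data = [4, 3, 2]
heapq.heapify(data)            # in place, O(n): data becomes a min-heap
print(data)                    # [2, 3, 4]

heapq.heappush(data, 1)        # O(log n): 1 bubbles up to the root
print(data[0])                 # 1, the smallest item is always at index 0

smallest = heapq.heappop(data) # O(log n): removes and returns the root
print(smallest)                # 1

# Popping until the heap is empty yields the items in ascending order
drained = [heapq.heappop(data) for _ in range(len(data))]
print(drained)                 # [2, 3, 4]
```

`heapq` only provides a min-heap; a common workaround for a max-heap is to push negated values.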

<img src="img/heapify.png" width="100%">

### Huffman Encoding :


<img src="img/huffmanIntro.png" width="100%">

Binary Encoding :

<!-- <img src="img/huffman.png" width="35%"> -->

- We'd like to encode a set of characters (e.g. `A - 010`, `B - 0010`) as efficiently as possible in order to compress a text string.

- The encoding can depend on the text to encode, so you can give shorter codes to common letters.

- It has to be prefix-free: if `A` was `01`, `B` was `10` and `C` was `011`, there would be a conflict if our encoded string started with `011...`: it could be `C`, or `A` followed by a code starting with `1`.


=> Each letter must sit at a leaf of the tree (never at an internal node) to avoid any ambiguity
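
The prefix-free condition can be checked mechanically, and once it holds, decoding is a simple left-to-right scan. The sketch below uses hypothetical helpers `is_prefix_free`, `encode` and `decode`; the second codebook is the one printed by `huffman('helllo')` later in this notebook.

```python
def is_prefix_free(codes: dict[str, str]) -> bool:
    # After sorting, any code that is a prefix of another one is
    # immediately followed by a code that starts with it.
    words = sorted(codes.values())
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def encode(text: str, codes: dict[str, str]) -> str:
    return "".join(codes[char] for char in text)


def decode(bits: str, codes: dict[str, str]) -> str:
    # Only unambiguous when the code is prefix-free.
    reverse = {code: char for char, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:      # a complete code word has been read
            out.append(reverse[current])
            current = ""
    return "".join(out)


ambiguous = {"A": "01", "B": "10", "C": "011"}
print(is_prefix_free(ambiguous))    # False: "01" is a prefix of "011"

codebook = {"l": "0", "h": "10", "o": "110", "e": "111"}
print(is_prefix_free(codebook))     # True
bits = encode("helllo", codebook)
print(bits, decode(bits, codebook)) # 10111000110 helllo
```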


<img src="img/huffman2.png" width="35%">

- Don't start by putting the most frequently used letters at the top.
- Start by putting the least used letters at the bottom and work your way up.

=> horizontal order doesn't matter, vertical order does.





### Huffman and Shannon :

- Shannon linked the information I(x) carried by an outcome to its probability P(x) : 
    I(x) = -log_b[P(x)] (for binary, b = 2)

Example with four equally likely symbols, each carrying I = -log_2(0.25) = 2 bits :

- A = 0.25
- B = 0.25
- C = 0.25
- D = 0.25

<img src="img/theory/1.png" width="60%">

source : https://www.youtube.com/@Reducible


- Shannon showed that the limit between encodings that are too short (they lose information) and too long (too redundant) is the entropy H(x): the average encoding length L of a lossless code satisfies L >= H(x).


<img src="img/theory/2.png" width="60%">

Most efficient binary tree for equally likely A,B,C,D


But what if the letter frequencies are unequal (E more likely than Z) ?
<img src="img/theory/3.png" width="60%">

General Rule : `Give more likely symbols shorter encodings.`
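
The information and entropy formulas above can be checked numerically. The sketch below is illustrative only: the function names and the skewed distribution with its code table are assumptions chosen so the numbers come out exactly.

```python
from math import log2


def information(p: float) -> float:
    """I(x) = -log2(P(x)), in bits."""
    return -log2(p)


def entropy(probs: dict[str, float]) -> float:
    """H = sum over symbols of P(x) * I(x): lower bound on the average code length."""
    return sum(p * information(p) for p in probs.values() if p > 0)


def average_length(probs: dict[str, float], codes: dict[str, str]) -> float:
    """L = sum over symbols of P(x) * len(code(x))."""
    return sum(probs[s] * len(codes[s]) for s in probs)


uniform = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
fixed = {"A": "00", "B": "01", "C": "10", "D": "11"}
print(entropy(uniform), average_length(uniform, fixed))    # 2.0 2.0

skewed = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
short_first = {"A": "0", "B": "10", "C": "110", "D": "111"}
print(entropy(skewed), average_length(skewed, short_first))  # 1.75 1.75
print(average_length(skewed, fixed))                          # 2.0, worse than 1.75
```

With equal probabilities the 2-bit fixed code already reaches the entropy; with the skewed distribution, giving the likely symbol a 1-bit code brings the average down from 2 to 1.75 bits.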



### Shannon - Fano Coding

#### Prefix-free coding :
Given a frequency distribution :

<img src="img/theory/4.png" width="40%">

1. Sort them and find the best split of probability (50-50) :

<img src="img/theory/5.png" width="40%">


2. Repeat the split on each half until every group contains a single symbol; this gives the tree : 

<img src="img/theory/6.png" width="60%">

Then, for each branch going left add `0` to the code, and for each branch going right add `1`.
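
The two steps above can be sketched as a short recursive function. This is a minimal illustration under stated assumptions: the name `shannon_fano`, the frequency table, and the tie-breaking rule (first best split wins) are choices made here, not taken from the slides.

```python
def shannon_fano(freqs: dict[str, int]) -> dict[str, str]:
    # Step 1: sort symbols from most to least frequent
    symbols = sorted(freqs, key=freqs.get, reverse=True)
    codes = {s: "" for s in symbols}

    def split(group: list[str]) -> None:
        if len(group) < 2:          # a single symbol: its code is complete
            return
        total = sum(freqs[s] for s in group)
        # Find the cut that makes the two halves as close to 50-50 as possible
        running, best_cut, best_diff = 0, 1, float("inf")
        for cut in range(1, len(group)):
            running += freqs[group[cut - 1]]
            diff = abs(total - 2 * running)   # |right half - left half|
            if diff < best_diff:
                best_cut, best_diff = cut, diff
        left, right = group[:best_cut], group[best_cut:]
        for s in left:
            codes[s] += "0"         # left branch
        for s in right:
            codes[s] += "1"         # right branch
        # Step 2: repeat on each half
        split(left)
        split(right)

    split(symbols)
    return codes


print(shannon_fano({"A": 1, "B": 1, "C": 1, "D": 1}))
# {'A': '00', 'B': '01', 'C': '10', 'D': '11'}
print(shannon_fano({"A": 5, "B": 2, "C": 1, "D": 1}))
# {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
```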




#### Optimizations:

Shannon-Fano coding is not always optimal. Huffman's improvement is to build the tree the other way around : 

1. Build the tree bottom-up : put the 2 least likely symbols at the bottom of the tree and merge them into one node whose probability is their sum. Then select the next 2 least likely nodes (symbols or merged nodes), merge them one level above, and repeat until the tree is complete.

- Shannon - Fano Coding : top-down perspective.

- Huffman Coding : bottom-up perspective.

<img src="img/theory/7.png" width="80%">

In [15]:
# Huffman Coding Algorithm

# Methodology :
#    Goal : given a text, assign binary codes to each character based on their frequencies,
#           such that more frequent characters have shorter codes, minimizing the total length of the encoded text


#   Pseudo-code :
#
#       1. Count the frequency of each character.
#
#       2. Create a Node class :
#                    - frequency
#                    - character
#                    - children (left, right)
#                    - comparison method (based on frequency)
#
#       3. Create a method in this class to compare node's frequencies.
#       
#       ~Since we build the tree from the bottom up :
#           - Priority Queue (Min-Heap) : 
#               keep track of the minimum element in a collection.
#               easily remove the smallest elements and add them to the tree.
#               form a new node with the removed elements and add them back up in the queue.
#               The last element in the queue is the root of the Huffman encoding tree.
# 
#       4. Priority Queue operations 
#           a. create a heap based on the {frequency, char}
#           b. remove 2 least frequent nodes
#           c. create node from them
#           d. add created node back to the heap.
#           
#           
#       5. After (bottom-up) tree creation is done,
#           build the codebook top-down (read only).
#           

#   Example :
#       text = "hello"
#       Frequencies : {h:1, e:1, l:2, o:1}
#       (...see code in cell below in 3-Greedy_Algorithms.ipynb)


# =========================================
from heapq import heapify, heappop, heappush
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    freq: int
    char: Optional[str] = None
    children: Optional[list['Node']] = None
    
    def __lt__(self, other):
        return self.freq < other.freq           # O(1)


def huffman(text: str):

    # 1: Calculate the frequencies of each letter in text
    heap = [Node(char=c, freq=f) for c, f in Counter(text).items()]  # O(n) = 3 * O(n) = Counter() + .items() + list comprehension
    print(heap)
    # 2: Heapify
    heapify(heap)                                           # O(n)
    print(heap)

    # 3: Remove 2 least frequent letters, 
    #    create a new node with them as children,
    #    add created node back to the heap.
    while len(heap) > 1:                                    # O(n-1)  
        y, z = heappop(heap), heappop(heap)                 # O(log n) * 2
        w = Node(freq=y.freq + z.freq, children=[y, z])     #
        heappush(heap, w)                                   # O(log n)    

    # 4: Visual representation of the final coding {char: code}
    codebook = {}
    def build_code(letter: Node, prefix: str = ""):
        if letter.char is not None:                         # leaf: O(1)
            codebook[letter.char] = prefix                  # O(1)
        if letter.children:                                 # internal node: recurse into both children
            build_code(letter.children[0], prefix + "0")    # O(depth of node)
            build_code(letter.children[1], prefix + "1")    # O(depth of node)
    
    # 5: Build the codebook top-down (read only).
    build_code(heap[0])                                      # O(n) - visits all nodes (2n-1)

    return codebook

huffman('helllo')



# ======== COMPLEXITY =========
# 
# 
#   Line-by-Line Complexity (N = len(text), n = number of distinct characters, n <= N) :
#       1. Frequency Calculation + Node Creation :
#           - Counter(text) : Scans entire input string of length N     - O(N)
#           - .items()      : Iterates over n distinct characters       - O(n)
#           - List comp     : Creates one Node per distinct character   - O(n)
#       
#           Total :                                                     ~ O(N)
#       
#       2. Heapify :
#           - heapify(heap) : Builds a min-heap from n elements         ~ O(n)
#       
#
#       3. Huffman Tree :
#           - while     : (each it. reduces heap size by 1) ~O(n-1)     - O(n)
#           - heappop() : Each heappop = O(log n)                       - O(log n)
#           - Node      : Node creation and list creation of size 2     - O(1)
#           - heappush(): heap insertion O(log n)                       - O(log n)
#
#           -> Cost per iteration : O(log n)
#
#           Total : O(n-1) * O(log n) =                                 ~ O(n log n)
# 
#
#
#       4. Codebook Creation :
#           - build_code()   :  visiting each node once                  - O(n)
#           - recursive call : concat : prefix + "0" =                   - O(depth of node)
#           - recursive call : concat : prefix + "1" =                   - O(depth of node)
#           
#           -> Cost per visit : 2 * O(depth)                             ~ O(depth)
#           
#           
#           Total : 
#               - Each node is visited once → O(#nodes) = O(n)
#               - At each visit, we concatenate prefix string → O(depth)
#
#               Worst Case : depth = n  ----> (#nodes * depth) =  n^2    ~ O(n^2)
#                   
#               Best Case  : depth = log n  ---->              = n log n  ~ O(n log n)
#       
#               (see below for details)
#       
#       
#       
#       
#       
#    Nodes - Binary Tree :
#       - node : anything that occupies a position in the tree (leaf/elements + internal nodes/not elements)
#       - total nodes = n_leaves + internal_nodes = n + (n-1) = 2n - 1
#
#
# 
# 
#   Cost of traversing the tree :
#
#       Best Case :
#           - tree is balanced (all leaves at roughly the same depth).
#           - height of tree : log n  (classic binary tree)
#           
#           Iteration over nodes : O(n)
#           Building the prefix string at each node : O(log n)
#           
#           Total cost : (#nodes * depth) = 
#           O(n) * O(log n) =
#           ~ O(n log n)
#           
# 
#       Worst Case :
#           - tree is completely skewed (every internal node has a leaf as one of its children, like a chain)
#           - height of tree : n - 1, i.e. O(n)
#           
#           Iteration over nodes : O(n)
#           Building the prefix string at each node : O(n)
#           
#           Total cost : (#nodes * depth) = 
#           O(n) * O(n) =
#           ~ O(n^2)
#           
#           








[Node(freq=1, char='h', children=None), Node(freq=1, char='e', children=None), Node(freq=3, char='l', children=None), Node(freq=1, char='o', children=None)]
[Node(freq=1, char='o', children=None), Node(freq=1, char='e', children=None), Node(freq=3, char='l', children=None), Node(freq=1, char='h', children=None)]


{'l': '0', 'h': '10', 'o': '110', 'e': '111'}

In [16]:
# Visualize the Huffman Tree and Codes

from heapq import heapify, heappop, heappush
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    freq: int
    char: Optional[str] = None
    children: Optional[list['Node']] = None

    def __lt__(self, other):
        return self.freq < other.freq

def node_repr(node):
    if node.char is not None:
        return f"('{node.char}', {node.freq})"
    return f"(•, {node.freq})"


def visual_huffman(text: str):

    print("\n=== INPUT TEXT ===")
    print(text)

    # 1. Frequency count
    freq = Counter(text)
    print("\n=== FREQUENCIES ===")
    for c, f in freq.items():
        print(f"'{c}': {f}")

    # Create initial heap
    heap = [Node(char=c, freq=f) for c, f in freq.items()]
    heapify(heap)

    print("\n=== INITIAL HEAP ===")
    print([node_repr(n) for n in heap])

    # 2. Build Huffman Tree
    step = 1
    while len(heap) > 1:
        print(f"\n--- STEP {step} ---")

        y = heappop(heap)
        z = heappop(heap)

        print("Pop two smallest:")
        print("  y =", node_repr(y))
        print("  z =", node_repr(z))

        w = Node(freq=y.freq + z.freq, children=[y, z])

        print("Create new internal node:")
        print("  w =", node_repr(w))

        heappush(heap, w)

        print("Heap after push:")
        print([node_repr(n) for n in heap])

        step += 1

    root = heap[0]
    print("\n=== FINAL ROOT ===")
    print(node_repr(root))

    # 3. Build codebook
    codebook = {}

    def build_code(node: Node, prefix: str = ""):
        if node.char is not None:
            print(f"Assign code: '{node.char}' -> {prefix}")
            codebook[node.char] = prefix
            return

        print(f"Traverse internal node (freq={node.freq}) with prefix='{prefix}'")

        build_code(node.children[0], prefix + "0")
        build_code(node.children[1], prefix + "1")

    print("\n=== BUILDING CODEBOOK ===")
    build_code(root)

    print("\n=== FINAL CODEBOOK ===")
    for k, v in codebook.items():
        print(f"'{k}': {v}")

    return codebook

visual_huffman("ABCDA")

# visual_huffman("helllo")





=== INPUT TEXT ===
ABCDA

=== FREQUENCIES ===
'A': 2
'B': 1
'C': 1
'D': 1

=== INITIAL HEAP ===
["('C', 1)", "('D', 1)", "('A', 2)", "('B', 1)"]

--- STEP 1 ---
Pop two smallest:
  y = ('C', 1)
  z = ('D', 1)
Create new internal node:
  w = (•, 2)
Heap after push:
["('B', 1)", "('A', 2)", '(•, 2)']

--- STEP 2 ---
Pop two smallest:
  y = ('B', 1)
  z = ('A', 2)
Create new internal node:
  w = (•, 3)
Heap after push:
['(•, 2)', '(•, 3)']

--- STEP 3 ---
Pop two smallest:
  y = (•, 2)
  z = (•, 3)
Create new internal node:
  w = (•, 5)
Heap after push:
['(•, 5)']

=== FINAL ROOT ===
(•, 5)

=== BUILDING CODEBOOK ===
Traverse internal node (freq=5) with prefix=''
Traverse internal node (freq=2) with prefix='0'
Assign code: 'C' -> 00
Assign code: 'D' -> 01
Traverse internal node (freq=3) with prefix='1'
Assign code: 'B' -> 10
Assign code: 'A' -> 11

=== FINAL CODEBOOK ===
'C': 00
'D': 01
'B': 10
'A': 11


{'C': '00', 'D': '01', 'B': '10', 'A': '11'}

#### Why is Huffman optimal ?



- Every internal node has exactly 2 children : if a node had only 1 child, that node would be useless, and we could move the child up in its place (shortening its code).


Exam : given a list of characters, give the optimal Huffman encoding.

