# Chapter 22: Greedy Algorithms

> *"A greedy algorithm makes the choice that looks best at the moment, hoping that local optimum will lead to global optimum."* — Anonymous

---

## 22.1 Introduction to Greedy Algorithms

**Greedy algorithms** build a solution piece by piece, always choosing the next piece that offers the most immediate benefit. Unlike backtracking, which explores multiple options and undoes choices that fail, greedy algorithms make a single sequence of irrevocable decisions.

### 22.1.1 When Greedy Works

A greedy algorithm solves a problem correctly when the problem exhibits both of the following:

1. **Greedy Choice Property:** A globally optimal solution can be arrived at by making a locally optimal (greedy) choice.
2. **Optimal Substructure:** An optimal solution to the problem contains optimal solutions to subproblems.

These properties are often proven using **exchange arguments** or **induction**.

### 22.1.2 Why Greedy Algorithms Matter

```
┌─────────────────────────────────────────────────────────────────────┐
│                    IMPORTANCE OF GREEDY ALGORITHMS                    │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  1. SIMPLICITY: Greedy algorithms are often easy to conceptualize   │
│     and implement.                                                   │
│                                                                      │
│  2. EFFICIENCY: They typically run in O(n log n) or O(n) time,      │
│     making them suitable for large inputs.                           │
│                                                                      │
│  3. FOUNDATION: Many classic algorithms are greedy (Dijkstra,       │
│     Prim, Kruskal, Huffman coding).                                 │
│                                                                      │
│  4. APPROXIMATION: For NP-hard problems such as set cover, greedy   │
│     gives a provable approximation bound (a factor of about ln n).  │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘
```

---

## 22.2 Activity Selection Problem

**Problem:** Given n activities with start and finish times, select the maximum number of non-overlapping activities.

**Greedy Strategy:** Always pick the activity with the earliest finish time that doesn't conflict with previously selected activities.

```python
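def activity_selection(activities):
    """
    activities: list of (start, finish) tuples
    Returns the indices (into the original list) of a maximum-size
    set of mutually compatible activities.
    """
    if not activities:
        return []
    # Sort indices by finish time so the caller's list is not mutated
    order = sorted(range(len(activities)), key=lambda i: activities[i][1])
    selected = [order[0]]  # the earliest-finishing activity is always safe
    last_finish = activities[order[0]][1]

    for i in order[1:]:
        start, finish = activities[i]
        if start >= last_finish:  # compatible with everything chosen so far
            selected.append(i)
            last_finish = finish
    return selected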
```

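**Correctness Proof (Exchange Argument):**  
Let G = (g_1, ..., g_k) be the greedy selection and O = (o_1, ..., o_m) an optimal one, both sorted by finish time. Let i be the first index where they differ. Because g_1..g_(i-1) = o_1..o_(i-1), both g_i and o_i are compatible with the activities already chosen, and greedy picked g_i as the earliest-finishing such activity, so g_i finishes no later than o_i. Replacing o_i with g_i keeps O feasible (o_(i+1) starts no earlier than o_i finishes, hence no earlier than g_i finishes) and keeps its size m. Repeating this exchange turns O into an optimal solution whose first k activities are exactly G. If m > k, activity o_(k+1) would be compatible with g_k and greedy would have selected it; therefore m = k and greedy is optimal.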

**Time Complexity:** O(n log n) due to sorting.
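The exchange argument can also be checked empirically. The sketch below repeats a compact version of `activity_selection` so it runs on its own, and compares it with a hypothetical exhaustive helper `brute_force_max` on small random inputs. It assumes activities are half-open intervals of positive length, so one activity may start exactly when another finishes.

```python
import itertools
import random

def activity_selection(activities):
    """Greedy: return indices of a maximum set of compatible activities."""
    order = sorted(range(len(activities)), key=lambda i: activities[i][1])
    selected, last_finish = [], float("-inf")
    for i in order:
        start, finish = activities[i]
        if start >= last_finish:
            selected.append(i)
            last_finish = finish
    return selected

def brute_force_max(activities):
    """Hypothetical helper: size of the largest compatible subset, by trying every subset."""
    n = len(activities)
    for size in range(n, 0, -1):
        for combo in itertools.combinations(activities, size):
            ordered = sorted(combo, key=lambda a: a[1])
            # Assumes positive lengths: consecutive checks imply pairwise compatibility
            if all(ordered[k][1] <= ordered[k + 1][0] for k in range(size - 1)):
                return size
    return 0

# Compare greedy against brute force on small random instances
random.seed(0)
for _ in range(200):
    acts = []
    for _ in range(random.randint(0, 7)):
        s = random.randint(0, 10)
        acts.append((s, s + random.randint(1, 5)))
    assert len(activity_selection(acts)) == brute_force_max(acts)
print("greedy matched brute force on all trials")
```

Exhaustive search is exponential, so this kind of check only works for tiny inputs, but it catches most mistakes in a greedy rule quickly.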

---

## 22.3 Interval Scheduling / Interval Partitioning

### 22.3.1 Interval Scheduling (Maximum number of non-overlapping intervals)

Already covered in Activity Selection.

### 22.3.2 Interval Partitioning (Minimum number of classrooms)

**Problem:** Given n lectures with start and end times, assign each to a classroom such that no two lectures overlap in the same room. Minimize the number of classrooms.

**Greedy Strategy:** Sort lectures by start time. Use a priority queue (min-heap) to track the earliest ending time among current rooms. For each lecture, if it can fit in a room (start >= earliest finish), reuse that room; otherwise, allocate a new room.

```python
import heapq

def min_classrooms(intervals):
    intervals.sort(key=lambda x: x[0])  # sort by start time
    pq = []  # min-heap of end times
    for start, end in intervals:
        if pq and start >= pq[0]:
            heapq.heappop(pq)  # reuse room
        heapq.heappush(pq, end)
    return len(pq)
```

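**Correctness:** Let the *depth* d be the maximum number of lectures in progress at any single time. Every valid assignment needs at least d rooms, because those d lectures must all be in different rooms. The greedy opens a new room only when every room in use is still busy at the new lecture's start (the earliest end time in the heap exceeds that start); since lectures are processed by start time, all of those rooms' current lectures and the new one overlap at that moment. Hence the greedy never opens more than d rooms, so it is optimal.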

**Time Complexity:** O(n log n)
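The proof rests on the number of rooms matching the depth. The sketch below uses two hypothetical helpers: `assign_rooms` extends the heap approach to report which room each lecture gets, and `max_depth` computes the depth independently with a sweep line. It assumes half-open intervals, so a room freed at time t can host a lecture starting at t.

```python
import heapq

def assign_rooms(intervals):
    """Hypothetical helper: room number for each (start, end) interval, in input order."""
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    free_at = []          # min-heap of (end_time, room) for rooms in use
    rooms = [None] * len(intervals)
    next_room = 0
    for i in order:
        start, end = intervals[i]
        if free_at and free_at[0][0] <= start:
            _, room = heapq.heappop(free_at)   # reuse the room that frees earliest
        else:
            room = next_room                   # every room is busy: open a new one
            next_room += 1
        rooms[i] = room
        heapq.heappush(free_at, (end, room))
    return rooms

def max_depth(intervals):
    """Hypothetical helper: most intervals overlapping at any instant (sweep line)."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # At equal times, ends (-1) sort before starts (+1), matching [start, end)
    events.sort()
    depth = best = 0
    for _, delta in events:
        depth += delta
        best = max(best, depth)
    return best

lectures = [(9, 10), (9, 12), (9, 10), (11, 12), (11, 14), (13, 14)]
rooms = assign_rooms(lectures)
print(rooms)                                  # [0, 1, 2, 0, 2, 0]
print(max(rooms) + 1, max_depth(lectures))    # 3 3
```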

---

## 22.4 Huffman Coding

**Problem:** Given a set of characters with frequencies, generate a binary prefix code that minimizes the total encoded length.

**Greedy Strategy:** Repeatedly merge the two lowest-frequency nodes (single characters or previously merged subtrees) into a new node whose frequency is their sum, until one tree remains.

```python
import heapq

class HuffmanNode:
    def __init__(self, char, freq):
        self.char = char
        self.freq = freq
        self.left = None
        self.right = None

    def __lt__(self, other):
        # heapq orders nodes by frequency only
        return self.freq < other.freq

def huffman_codes(chars_freq):
    """chars_freq: list of (char, freq). Returns {char: bit string}."""
    if not chars_freq:
        return {}
    heap = [HuffmanNode(ch, freq) for ch, freq in chars_freq]
    heapq.heapify(heap)

    # Repeatedly merge the two lowest-frequency subtrees
    while len(heap) > 1:
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        merged = HuffmanNode(None, left.freq + right.freq)
        merged.left = left
        merged.right = right
        heapq.heappush(heap, merged)

    root = heap[0]
    if root.left is None:          # only one symbol: give it a 1-bit code
        return {root.char: "0"}

    codes = {}
    def traverse(node, code):
        if node.left is None:      # leaves are the only nodes without children
            codes[node.char] = code
            return
        traverse(node.left, code + '0')
        traverse(node.right, code + '1')

    traverse(root, "")
    return codes
```

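**Correctness Proof:** Two facts combine. (1) Greedy choice: *some* optimal tree has the two lowest-frequency characters x and y as siblings at the deepest level; if an optimal tree does not, swapping x and y with the deepest pair of siblings does not increase the total cost, because the lower frequencies move to greater depth. (2) Optimal substructure: after replacing x and y with a single node of frequency f(x) + f(y), an optimal tree for the reduced alphabet, with that node expanded back into x and y, is optimal for the original alphabet. Applying both facts at every merge shows that Huffman's tree is optimal.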

**Time Complexity:** O(n log n) (heap operations)
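To see the codes in use, the sketch below builds codes for a short string, encodes it, decodes it back, and compares the bit count with fixed 8-bit characters. It swaps in a tuple-based tree with an insertion counter as tie-breaker instead of `HuffmanNode`; `build_codes`, `encode`, and `decode` are hypothetical names, and the input is assumed to be non-empty.

```python
import heapq
from collections import Counter
from itertools import count

def build_codes(freqs):
    """Hypothetical helper. freqs: dict char -> frequency (non-empty)."""
    tiebreak = count()  # avoids comparing trees when frequencies tie
    # A tree is either a single character (leaf) or a (left, right) tuple
    heap = [(f, next(tiebreak), ch) for ch, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (a, b)))
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"  # a one-symbol alphabet still gets a bit
    walk(heap[0][2], "")
    return codes

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    reverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:   # prefix property: the first match is the symbol
            out.append(reverse[current])
            current = ""
    return "".join(out)

text = "abracadabra"
codes = build_codes(Counter(text))
bits = encode(text, codes)
assert decode(bits, codes) == text
print(len(bits), "bits vs", 8 * len(text), "bits in 8-bit ASCII")  # 23 bits vs 88 bits
```

The exact codes depend on how ties are broken, but the total length (23 bits here) is the same for every Huffman tree of these frequencies.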

---

## 22.5 Fractional Knapsack

**Problem:** Given items with weight and value, and a knapsack capacity, fill the knapsack to maximize total value. Items can be taken fractionally.

**Greedy Strategy:** Sort items by value per unit weight (v/w) in descending order. Take as much as possible of the highest value-density item, then move to the next, until the knapsack is full.

```python
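def fractional_knapsack(items, capacity):
    """
    items: list of (value, weight) with weight > 0
    Returns the maximum total value (items may be split)
    """
    # Highest value density first; sorted() leaves the caller's list untouched
    ranked = sorted(items, key=lambda x: x[0] / x[1], reverse=True)
    total_value = 0
    for value, weight in ranked:
        if capacity >= weight:        # the whole item fits
            total_value += value
            capacity -= weight
        else:                         # take the fraction that fills the knapsack
            total_value += value * (capacity / weight)
            break
    return total_value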
```

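**Correctness:** Exchange argument: suppose an optimal solution leaves part of a higher-density item unused while carrying some weight w of a lower-density item. Swapping that weight w of the lower-density item for the same weight of the higher-density item keeps the total weight unchanged and does not decrease the value. Repeating such swaps transforms any optimal solution into the greedy one without losing value.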

**Time Complexity:** O(n log n)

**Note:** The 0/1 knapsack problem (items cannot be split) does not yield to greedy; it requires dynamic programming.
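Running the greedy on the instance from Section 22.11 shows the gap between the two variants: splitting the last item yields 240, while the best 0/1 selection is 220. The sketch below restates `fractional_knapsack` compactly and adds a hypothetical exhaustive helper `best_01_value`, usable only for a handful of items.

```python
from itertools import combinations

def fractional_knapsack(items, capacity):
    ranked = sorted(items, key=lambda x: x[0] / x[1], reverse=True)
    total = 0.0
    for value, weight in ranked:
        take = min(weight, capacity)      # whole item, or whatever still fits
        total += value * take / weight
        capacity -= take
        if capacity == 0:
            break
    return total

def best_01_value(items, capacity):
    """Hypothetical helper: exhaustive 0/1 optimum over all subsets."""
    best = 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            if sum(w for _, w in subset) <= capacity:
                best = max(best, sum(v for v, _ in subset))
    return best

items = [(60, 10), (100, 20), (120, 30)]   # (value, weight)
print(fractional_knapsack(items, 50))      # 240.0
print(best_01_value(items, 50))            # 220
```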

---

## 22.6 Task Scheduling with Deadlines

**Problem:** Given n tasks, each taking unit time and having a deadline dᵢ and a profit pᵢ, schedule them on a single machine to maximize the total profit of the tasks that finish by their deadlines.

**Greedy Strategy (Maximum Profit First):** Sort tasks by profit in descending order. Place each task in the latest free time slot at or before its deadline, and skip it if no such slot exists. The slot can be found by scanning backward, as in the code below, or with a disjoint-set structure.

```python
def task_scheduling(tasks):
    """
    tasks: list of (deadline, profit); deadlines are positive integers
    Returns the maximum total profit
    """
    if not tasks:
        return 0
    # Most profitable tasks first (sorted() leaves the input list untouched)
    ranked = sorted(tasks, key=lambda x: x[1], reverse=True)
    max_deadline = max(deadline for deadline, _ in tasks)
    slot = [None] * (max_deadline + 1)  # None means free; index 0 is unused
    total_profit = 0
    for deadline, profit in ranked:
        # find the latest free slot at or before the deadline
        for t in range(deadline, 0, -1):
            if slot[t] is None:
                slot[t] = profit
                total_profit += profit
                break
    return total_profit
```

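**Optimization with Union-Find:** The backward scan costs up to O(d_max) per task, where d_max is the largest deadline. A disjoint-set structure in which each slot points toward the latest free slot at or before it answers each query in near-constant amortized time (O(α(n)) with union by rank and path compression), so the total running time becomes O(n log n), dominated by the sort.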
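A minimal sketch of the union-find idea, assuming deadlines are positive integers; `task_scheduling_dsu` is a hypothetical name. Each slot's parent chain leads to the latest free slot at or before it, slot 0 is a sentinel meaning "no slot left", and deadlines larger than n are capped at n. This version uses path compression only, which keeps it short.

```python
def task_scheduling_dsu(tasks):
    """Hypothetical name. tasks: list of (deadline, profit). Returns max total profit."""
    n = len(tasks)
    # A deadline beyond n is no tighter than n, since at most n tasks run
    limit = min(n, max((d for d, _ in tasks), default=0))
    parent = list(range(limit + 1))   # find(t) = latest free slot <= t; 0 means none

    def find(t):
        root = t
        while parent[root] != root:
            root = parent[root]
        while parent[t] != root:      # path compression
            parent[t], t = root, parent[t]
        return root

    total = 0
    for deadline, profit in sorted(tasks, key=lambda x: x[1], reverse=True):
        t = find(min(deadline, limit))
        if t > 0:                     # slot t is free: use it
            total += profit
            parent[t] = t - 1         # later queries fall through to earlier slots
    return total

print(task_scheduling_dsu([(2, 100), (1, 19), (2, 27), (1, 25), (3, 15)]))  # 142
```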

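**Correctness:** A set of unit-time tasks can all meet their deadlines exactly when, for every t, at most t of them have deadline ≤ t. These feasible sets form a matroid, and on any weighted matroid, taking elements in decreasing order of weight and keeping each one that leaves the set feasible is optimal; the matroid exchange property supplies the exchange argument. Placing each task in the latest free slot at or before its deadline keeps earlier slots open for tasks with earlier deadlines, so a task is rejected only when adding it would make the set unschedulable.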

---

## 22.7 Minimum Spanning Tree (Revisited)

Kruskal and Prim algorithms are greedy:

- **Kruskal:** Always pick the smallest edge that doesn't form a cycle.
- **Prim:** Always add the smallest edge connecting the current tree to a new vertex.

These were covered in Chapter 18.
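For comparison with the other examples in this chapter, a compact Kruskal sketch follows. It is not Chapter 18's implementation; the edge format `(weight, u, v)` and the function name are assumptions for illustration.

```python
def kruskal(num_vertices, edges):
    """Assumed format: edges are (weight, u, v) with vertices 0..num_vertices-1.
    Returns (total_weight, list of chosen (u, v, weight) edges)."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    total, tree = 0, []
    for weight, u, v in sorted(edges):      # greedy: lightest edge first
        ru, rv = find(u), find(v)
        if ru != rv:                        # different components: no cycle
            parent[ru] = rv
            total += weight
            tree.append((u, v, weight))
    return total, tree

edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
print(kruskal(4, edges))   # (6, [(0, 1, 1), (1, 3, 2), (1, 2, 3)])
```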

---

## 22.8 Dijkstra's Algorithm (Revisited)

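Dijkstra's algorithm for shortest paths is greedy: it repeatedly finalizes the unvisited vertex with the smallest tentative distance and relaxes its outgoing edges. This greedy choice is correct only when all edge weights are non-negative. Covered in Chapter 17.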
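A compact heap-based sketch makes the greedy step visible: the vertex popped with the smallest tentative distance is finalized and never revisited. The adjacency-dict format and the function name are assumptions for illustration, and the code presumes non-negative weights.

```python
import heapq

def dijkstra(graph, source):
    """Assumed format: graph maps vertex -> list of (neighbor, weight), weights >= 0.
    Returns a dict of shortest distances from source to reachable vertices."""
    dist = {source: 0}
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:                 # stale entry: u was already finalized
            continue
        done.add(u)                   # greedy step: smallest tentative distance is final
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

graph = {"s": [("a", 4), ("b", 1)], "b": [("a", 2), ("c", 5)], "a": [("c", 1)]}
print(dijkstra(graph, "s"))   # {'s': 0, 'a': 3, 'b': 1, 'c': 4}
```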

---

## 22.9 Exchange Arguments: Proof Technique

**Exchange Argument** is a common way to prove greedy optimality:

1. Consider an optimal solution O.
2. Show that if O differs from the greedy solution G, we can modify O to be more like G without worsening the objective.
3. By repeated exchanges, we transform O into G, showing G is as good as O.

### Example: Activity Selection

- Let G be the greedy solution and O an optimal solution, both sorted by finish time. At the first index where they differ, G has activity a and O has activity b, with finish_a ≤ finish_b. Replacing b with a in O gives O', which is still feasible and has the same size. Repeating the exchange shows that G is optimal.

### Example: Huffman Coding

- *Some* optimal prefix-code tree has the two lowest-frequency characters as siblings at the deepest level: if an optimal tree does not, swapping them into that position does not increase the total cost. Hence merging them first is consistent with an optimal solution.

---

## 22.10 Summary of Greedy Algorithms

| Problem                    | Greedy Strategy                               | Complexity  |
|----------------------------|-----------------------------------------------|-------------|
| Activity Selection         | Earliest finish time                          | O(n log n)  |
| Interval Partitioning      | Sort by start; reuse room that frees earliest | O(n log n)  |
| Huffman Coding             | Merge two smallest frequencies                | O(n log n)  |
| Fractional Knapsack        | Highest value/weight ratio                    | O(n log n)  |
| Task Scheduling            | Highest profit first, latest free slot        | O(n²) naive; O(n log n) with DSU |
| MST (Kruskal)              | Smallest edge not forming a cycle             | O(E log E)  |
| MST (Prim)                 | Smallest edge connecting tree to outside      | O(E log V)  |
| Dijkstra                   | Smallest tentative distance                   | O((V+E) log V) |

---

## 22.11 When Greedy Fails

Greedy does not always work. Classic counterexamples:

- **0/1 Knapsack:** Greedy by value/weight fails. Example: capacity 50, items (value, weight): (60,10), (100,20), (120,30). Value/weight: 6,5,4. Greedy picks 60+100=160, weight 30, remaining capacity 20 cannot take 120. Optimal is 100+120=220 (weight 50).
- **Minimum Coin Change (some denominations):** Greedy works for US coins (1,5,10,25) but not for arbitrary denominations, e.g., 1,3,4 to make 6: greedy picks 4+1+1 (3 coins) but optimal is 3+3 (2 coins).
- **Traveling Salesman (nearest neighbor):** Greedy nearest neighbor can give arbitrarily bad tours.
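The coin-change counterexample is easy to reproduce. The sketch below compares largest-coin-first greedy with a small dynamic-programming table (the technique of Chapter 23); `greedy_coins` and `optimal_coins` are hypothetical names.

```python
def greedy_coins(denoms, amount):
    """Hypothetical name. Largest coin first; returns coin count, or None if stuck."""
    count = 0
    for coin in sorted(denoms, reverse=True):
        count += amount // coin
        amount %= coin
    return count if amount == 0 else None

def optimal_coins(denoms, amount):
    """Hypothetical name. DP table: fewest coins for every total up to amount."""
    INF = float("inf")
    best = [0] + [INF] * amount
    for total in range(1, amount + 1):
        for coin in denoms:
            if coin <= total and best[total - coin] + 1 < best[total]:
                best[total] = best[total - coin] + 1
    return best[amount] if best[amount] != INF else None

print(greedy_coins([1, 3, 4], 6), optimal_coins([1, 3, 4], 6))                # 3 2
print(greedy_coins([1, 5, 10, 25], 63), optimal_coins([1, 5, 10, 25], 63))    # 6 6
```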

---

## 22.12 Practice Problems

### Problem 1: Maximum Length of Pair Chain (LeetCode 646)
You are given n pairs, where pairs[i] = [left, right] and left < right. A pair p2 can follow pair p1 if p1[1] < p2[0]. Find the length of the longest chain.

**Hint:** Activity selection variant.

### Problem 2: Non-overlapping Intervals (LeetCode 435)
Given a list of intervals, find the minimum number of intervals to remove so that the rest are non-overlapping.

**Hint:** Activity selection: keep as many as possible → total - max kept.

### Problem 3: Minimum Number of Arrows to Burst Balloons (LeetCode 452)
Balloons represented by intervals [x_start, x_end]. An arrow shot at x bursts all balloons with x_start ≤ x ≤ x_end. Find min arrows.

**Hint:** Sort by end; greedy shoot at end of first balloon, remove all overlapping, repeat.

### Problem 4: Gas Station (LeetCode 134)
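Given n gas stations on a circular route, where gas[i] is the fuel available at station i and cost[i] is the fuel needed to reach the next station, find the starting index from which you can complete the circuit, or -1 if none exists.

**Hint:** If total gas < total cost, no start works. Otherwise, scan once with a running tank; whenever the tank drops below zero at station i, restart from station i + 1. The last restart point is the answer.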
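A minimal sketch of the hint, with a hypothetical function name (not LeetCode's required method signature). If the tank goes negative on the way from `start` to station i, no station between them can be a valid start either, which is why the scan can jump ahead.

```python
def can_complete_circuit(gas, cost):
    """Hypothetical name. Returns a valid starting index, or -1 if none exists."""
    if sum(gas) < sum(cost):
        return -1
    start, tank = 0, 0
    for i in range(len(gas)):
        tank += gas[i] - cost[i]
        if tank < 0:          # cannot reach i + 1 from start: restart after i
            start, tank = i + 1, 0
    return start

print(can_complete_circuit([1, 2, 3, 4, 5], [3, 4, 5, 1, 2]))  # 3
```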

### Problem 5: Partition Labels (LeetCode 763)
Partition a string into as many parts as possible so that each letter appears in at most one part.

**Hint:** Record the last occurrence of each character, then extend the current part until it reaches the last occurrence of every letter it contains.

### Problem 6: Queue Reconstruction by Height (LeetCode 406)
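People in a queue are described by [h, k], where h is the person's height and k is the number of people in front who are at least as tall (height ≥ h). Reconstruct the queue.

**Hint:** Sort by height descending, breaking ties by k ascending, then insert each person at index k.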
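A minimal sketch of the hint (hypothetical function name). Processing taller people first means that when a person is inserted, everyone already in the list is at least as tall, so index k is exactly their position; the list insertions make this O(n²).

```python
def reconstruct_queue(people):
    """Hypothetical name. people: list of [h, k]."""
    # Tallest first; among equal heights, smaller k first
    ordered = sorted(people, key=lambda p: (-p[0], p[1]))
    queue = []
    for person in ordered:
        # Everyone already placed is at least as tall, so index k is correct
        queue.insert(person[1], person)
    return queue

print(reconstruct_queue([[7, 0], [4, 4], [7, 1], [5, 0], [6, 1], [5, 2]]))
# [[5, 0], [7, 0], [5, 2], [6, 1], [4, 4], [7, 1]]
```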

### Problem 7: Candy (LeetCode 135)
Each child has a rating. Every child gets at least one candy, and a child with a higher rating than an adjacent neighbor must get more candies than that neighbor. Find the minimum total number of candies.

**Hint:** Two-pass greedy: left to right, then right to left.
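A minimal sketch of the two-pass hint (hypothetical function name): the first pass satisfies every left-neighbor constraint, and the second pass raises counts where right-neighbor constraints require it, taking the maximum so the first pass's work is preserved.

```python
def min_candies(ratings):
    """Hypothetical name. Returns the minimum total number of candies."""
    n = len(ratings)
    candies = [1] * n
    for i in range(1, n):                 # left-neighbor constraints
        if ratings[i] > ratings[i - 1]:
            candies[i] = candies[i - 1] + 1
    for i in range(n - 2, -1, -1):        # right-neighbor constraints
        if ratings[i] > ratings[i + 1]:
            candies[i] = max(candies[i], candies[i + 1] + 1)
    return sum(candies)

print(min_candies([1, 0, 2]), min_candies([1, 2, 2]))  # 5 4
```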

### Problem 8: Remove K Digits (LeetCode 402)
Given a string num representing a non-negative integer, remove k digits so that the remaining number is as small as possible.

**Hint:** Greedy stack: scan left to right, popping the previous digit while it is larger than the current digit and removals remain.

### Problem 9: Jump Game II (LeetCode 45)
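Given an array nums where nums[i] is the maximum jump length from index i, find the minimum number of jumps needed to reach the last index.

**Hint:** Greedy by BFS levels: track the furthest index reachable with one more jump, and take a jump whenever you reach the end of the current range.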
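A minimal sketch of the hint (hypothetical function name), assuming, as the problem guarantees, that the last index is reachable. `current_end` marks the edge of the range reachable with the current number of jumps.

```python
def min_jumps(nums):
    """Hypothetical name. Assumes the last index is reachable."""
    jumps = 0
    current_end = 0    # furthest index reachable with `jumps` jumps
    furthest = 0       # furthest index reachable with one more jump
    for i in range(len(nums) - 1):
        furthest = max(furthest, i + nums[i])
        if i == current_end:     # must jump to go past current_end
            jumps += 1
            current_end = furthest
    return jumps

print(min_jumps([2, 3, 1, 1, 4]))  # 2
```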

### Problem 10: Split Array into Consecutive Subsequences (LeetCode 659)
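Given an integer array sorted in non-decreasing order, check whether it can be split into one or more subsequences of consecutive integers, each of length at least 3.

**Hint:** Greedy with frequency counts and a count of "open" sequences ending at each value: extend an existing sequence before starting a new one.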

---

## 22.13 Further Reading

1. **"Introduction to Algorithms" (CLRS)** – Chapter 16 (Greedy Algorithms)
2. **"Algorithm Design"** by Kleinberg & Tardos – Chapter 4 (Greedy Algorithms)
3. **"The Algorithm Design Manual"** by Steven Skiena – Section 13.2 (Greedy Algorithms)
4. **Original Papers**:
   - Huffman, D. A. (1952) – "A Method for the Construction of Minimum-Redundancy Codes"
   - Dijkstra, E. W. (1959) – "A note on two problems in connexion with graphs"

---

> **Coming in Chapter 23**: **Dynamic Programming** – We'll explore the power of memoization, tabulation, and solving optimization problems by breaking them into overlapping subproblems.

---

**End of Chapter 22**

