# Chapter 10: The Greedy Method

Algorithm that repeatedly makes choices to optimize some objective function, like repeatedly accepting the bid that maximizes price-per-unit

Applied to optimization problems (problems that involve searching through a set of configurations to find one that minimizes or maximizes the objective function)

**Greedy-choice property**: property stating that a globally optimal configuration can be reached by a series of locally optimal choices, starting from a well-defined starting configuration

**Knapsack problem**: given a set of n items, each with a weight and value, the objective is to maximize the total value while not going over the weight capacity of the knapsack

## Fractional Knapsack Problem

**Fractional knapsack problem**: given a set S of n items, such that each item has a positive benefit $b_i$ and a positive weight $w_i$, the objective is to take the maximum-benefit collection of items whose total weight does not exceed a given weight W. Items may be broken into fractions arbitrarily, with the amount taken of item i represented by $x_i$ such that $0 \leq x_i \leq w_i$ for each $i \in S$ and $\sum_{i \in S} x_i \leq W$. This yields the objective function, which determines the total benefit of the items taken:

$$ \sum_{i \in S} b_i \left(\frac{x_i}{w_i}\right) $$

This problem may be solved via the greedy method. The most important decision is choosing the criterion by which the greedy choices are made. A good approach is to rank items by their **value**, defined as the ratio of benefit to weight, $b_i / w_i$, and to take as much as possible of the highest-value item that remains. This yields the following algorithm:

![10-greedy-fractional-knapsack](./res/10-greedy-fractional-knapsack.PNG)

This algorithm runs in $O(n \log n)$ time, with n being the number of items in S. Either a heap-based priority queue may be used to store the items of S, or the items can be sorted by their benefit-to-weight ratios, with both options resulting in the same running time

An **exchange argument** is used to prove that this algorithm is correct, that is, that the greedy solution is optimal. Its general structure is proof by contradiction: assume there is an optimal solution that is better than the one the greedy algorithm produces, then argue that exchanging some of the components of that solution would lead to an even better one, which contradicts its optimality. Here, swapping some amount of a lower-value item for the same amount of a higher-value item can only increase the total benefit. This shows that the greedy method can be used to solve the fractional knapsack problem
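The ratio-based strategy described above can be sketched in Python as follows. This is a minimal illustration using the sorting option, not a transcription of the pseudocode in the figure; the function name `fractional_knapsack` and the `(benefit, weight)` tuple layout are assumptions made for the example.

```python
def fractional_knapsack(items, capacity):
    """Greedy sketch for the fractional knapsack problem.

    items: list of (benefit, weight) pairs, all positive (assumed layout).
    capacity: the weight limit W.
    Returns (total_benefit, amounts), where amounts[i] is x_i.
    """
    # Rank item indices by value b_i / w_i, highest first: O(n log n).
    order = sorted(
        range(len(items)),
        key=lambda i: items[i][0] / items[i][1],
        reverse=True,
    )
    amounts = [0.0] * len(items)
    remaining = capacity
    total = 0.0
    for i in order:
        if remaining <= 0:
            break
        benefit, weight = items[i]
        # Take the whole item if it fits, otherwise the fraction that fits.
        take = min(weight, remaining)
        amounts[i] = take
        total += benefit * take / weight
        remaining -= take
    return total, amounts
```

For example, with items `[(60, 10), (100, 20), (120, 30)]` and a capacity of 50, the sketch takes the first two items whole and 20 of the 30 weight units of the third, for a total benefit of 240.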

## Task Scheduling

**Task scheduling problem**: given a set T of n tasks, each with a start time $s_i$ and finish time $f_i$, wherein each task must start and end at exactly its respective times, schedule all tasks on the fewest machines possible in a nonconflicting way

The greedy method solves this problem using the following algorithm:

![greedy-scheduling](./res/10-greedy-scheduling.PNG)

This algorithm runs in $O(n \log n)$ time. Its correctness is proved using a **lower-bound** argument. This argues that any solution to the problem will require a cost of at least some given parameter; here, any schedule needs at least as many machines as the maximum number of tasks that overlap at any one time. The greedy algorithm is then shown to achieve this lower bound as an upper bound
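One common way to realize a greedy schedule within this time bound is to process tasks in order of start time and reuse the machine that frees up earliest. The sketch below takes that approach; it is not a transcription of the figure, and it assumes that a task finishing at time t does not conflict with a task starting at time t. The name `schedule_tasks` and the `(start, finish)` tuple layout are hypothetical.

```python
import heapq


def schedule_tasks(tasks):
    """Greedy sketch: assign each task to a machine, using few machines.

    tasks: list of (start, finish) pairs (assumed layout).
    Returns (machine_count, assignment), where assignment[i] is the
    machine index given to tasks[i].
    """
    # Consider tasks by increasing start time: O(n log n).
    order = sorted(range(len(tasks)), key=lambda i: tasks[i][0])
    busy = []  # min-heap of (finish_time, machine) for machines in use
    assignment = [None] * len(tasks)
    machine_count = 0
    for i in order:
        start, finish = tasks[i]
        if busy and busy[0][0] <= start:
            # The machine that finishes earliest is free: reuse it.
            _, machine = heapq.heappop(busy)
        else:
            # Every existing machine is still busy: open a new one.
            machine = machine_count
            machine_count += 1
        assignment[i] = machine
        heapq.heappush(busy, (finish, machine))
    return machine_count, assignment
```

A new machine is opened only when the current task overlaps a task on every existing machine, which is what ties the machine count to the lower bound.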

## Text Compression and Huffman Coding

**Text compression**: efficiently encode a character string X into a small binary string Y

**Variable-length encoding schemes**: allow the codes for various characters to have different lengths, with frequently used characters having the shortest codes and infrequently used ones having the longest. To keep decoding unambiguous, the encoding scheme must be a **prefix code**, meaning that no code word is a prefix of any other code word in the scheme
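The prefix property is what lets a decoder read bits left to right and emit a character as soon as a code word is matched. The following small sketch illustrates this; the helper names `is_prefix_code` and `decode`, and the example code table, are made up for illustration.

```python
def is_prefix_code(codes):
    """Return True if no code word is a prefix of another.

    codes: dict mapping each character to its bit string (assumed layout).
    """
    # After sorting, a word that is a prefix of any other word is also a
    # prefix of the word immediately following it.
    words = sorted(codes.values())
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def decode(bits, codes):
    """Decode a bit string produced with a prefix code."""
    inverse = {word: ch for ch, word in codes.items()}
    out = []
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:
            # A complete code word: no longer word can start with it.
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a code word")
    return "".join(out)
```

With the hypothetical table `{"a": "0", "b": "10", "c": "11"}`, the bit string `010110` decodes to `abca`.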

### Huffman Coding

This method produces a variable-length prefix code for X based on the construction of a proper binary tree T that represents the code

Each edge of T represents a bit in a code word, with each edge to a left child representing a "0" and each one to a right child a "1". Each external node v is associated with a specific character, and the code word for that character is defined by the sequence of bits associated with the edges in the path from the root of T to v:

![huffman-tree](./res/10-huffman-tree.PNG)

This process is accomplished via the following algorithm:

![huffman-algo](./res/10-huffman-algo.PNG)

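Given the character frequencies, this algorithm builds the tree in $O(d \log d)$ time, with d representing the number of distinct characters in the string; counting the frequencies takes an additional $O(n)$ time for a string of length n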
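The construction can be sketched with a heap-based priority queue that repeatedly merges the two lowest-weight trees. This is an illustrative sketch rather than a transcription of the figure; the function name `huffman_codes`, the nested-tuple tree representation, and the handling of a single-character string are assumptions.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text):
    """Return a dict mapping each character of text to its code word."""
    freq = Counter(text)  # O(n) frequency count
    if not freq:
        return {}
    tiebreak = count()  # unique counter so the heap never compares trees
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Assumption: a lone character gets the one-bit code "0".
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Greedy step: merge the two trees with the smallest weights.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    # Walk the tree: left edge appends "0", right edge appends "1".
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):  # internal node: (left, right)
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:  # external node: a character
            codes[node] = prefix
    return codes
```

The exact bit patterns depend on how ties are broken, but the total encoded length is the same for any valid Huffman tree over the same frequencies.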

If T is a binary tree with minimum total path weight for a set C of characters, with each c in C having a positive weight f(c), then T is proper, meaning each internal node in T has two children

Given a set C of characters with a positive weight f(c) for each c in C, let b and c be two characters with the smallest two weights. Then there is a binary tree T with minimum total path weight for C in which b and c are associated with sibling nodes at the maximum depth

The Huffman coding algorithm constructs an optimal prefix code for a string of length n with d distinct characters in $O(n + d \log d)$ time

The Huffman algorithm follows the greedy method: it starts from a well-understood starting condition and iteratively makes additional choices, each time identifying the decision that achieves the best cost improvement from all of the choices that are currently possible