This week in discussion section we will work through some review problems.

## Problem 1

Consider the following problems.

1. Given $n$ items with weights $w_1, \dotsc, w_n$ and a knapsack with
capacity $c,$ find the maximal weight you can fit in your knapsack.
2. Same problem as the previous one, but now the items also have values
$v_1, \dotsc, v_n$ and you instead want to maximize the value.
3. Same problem as the previous one, but now you can take any proportion of
any item, e.g. "take 16.5% of the second item."
4. Suppose you are throwing a party for your $n$ friends. Upon arrival at your
party, your $i$ th friend will stay at the party if and only if there are
already at least $b_i$ people at the party, including yourself. Find an
order in which your friends should arrive that maximizes the number of friends
who stay.
5. Let $A$ and $B$ be two lists of numbers. Determine a longest common
subsequence.
6. Let $A$ and $B$ be two lists of numbers. Determine whether a longest
common subsequence is $A$ or $B.$

For each of these problems, come up with at least one reasonable greedy
algorithm, and for each algorithm you come up with, say whether it works (no
proof required) or fails (give an example). Here is how the points work:
- If it works and you say it works, then you get +10 points.
- If it works and you say it fails, then you get -2 points.
- If it fails and you say it fails and give an example, then you get +5 points.
- If it fails and you say it works, then you get -100 points.

#### Solution
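1. The greedy algorithm of taking the heaviest item that still fits fails on the
example $[3, 2, 2]$ with $c = 4$: it takes $3$ and then nothing else fits, while
$2 + 2 = 4.$ Similarly, the greedy algorithm of taking the lightest item that
still fits fails on the example $[1, 1, 3]$ with $c = 3$: it takes $1 + 1 = 2,$
while the $3$ alone fits exactly.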
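2. The greedy algorithm of taking an available item with maximal
value-to-weight ratio fails on the example where the weights are
$[3, 2, 2],$ the values are $[3, 2, 2],$ and $c = 4$: the ratios are all equal,
so breaking the tie toward the first item gives value $3$ instead of $4.$ To
rule out tie-breaking, it also fails on the example where the weights are
$[300, 200, 200],$ the values are $[301, 200, 200],$ and $c = 400$: the first
item has the strictly largest ratio, so greedy takes it for a value of $301,$
after which nothing else fits, while the other two items give $400.$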
3. The greedy algorithm of taking as much of an available item with maximal
value-to-weight-ratio works.
4. The greedy algorithm of ordering the friends by increasing $b_i$ works.
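5. The greedy algorithm of iterating through the items of $A$ and matching each
with its first occurrence in $B$ after the previous match (skipping it if there
is none) fails on the example $A = [5, 1, 1, 1]$ and $B = [1, 1, 1, 5]$: it
matches only the $5,$ while $[1, 1, 1]$ is common to both. Running it in both
directions and keeping the better result also fails on the example
$A = [5, 1, 1, 1, 1, 5]$ and $B = [1, 1, 5, 1, 1]$: both directions find a
common subsequence of length $3,$ while $[1, 1, 1, 1]$ has length $4.$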
6. A longest common subsequence being $A$ or $B$ means that $A$ is a subsequence
of $B$ or $B$ is a subsequence of $A.$ We can check both by using the greedy
algorithm from HW 2 Problem 4, where e.g. to check whether $A$ is a subsequence
of $B,$ we iterate through the items of $A$ and match each with its first
occurrence in $B$ after the previous match; $A$ is a subsequence of $B$ exactly
when every item of $A$ gets matched.
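To double-check the knapsack counterexamples in problems 1 and 2, one option is to compare each greedy rule against an exhaustive search. The sketch below is an illustration rather than part of the original solutions: `knapsack_greedy` and `knapsack_brute_force` are hypothetical names, problem 1 is treated as problem 2 with values equal to weights, and the brute force tries all $2^n$ subsets, so it is only suitable for tiny inputs.

```python
from itertools import combinations

def knapsack_brute_force(weights, values, capacity):
    # Exact 0/1 knapsack optimum by trying every subset (tiny inputs only).
    items = list(zip(weights, values))
    best = 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            if sum(w for w, _ in subset) <= capacity:
                best = max(best, sum(v for _, v in subset))
    return best

def knapsack_greedy(weights, values, capacity, key):
    # Scan items from highest to lowest key, taking each item that still fits.
    # Capacity only shrinks, so a skipped item never fits later; this matches
    # "repeatedly take the best available item that fits".
    total_weight = total_value = 0
    for w, v in sorted(zip(weights, values), key=key, reverse=True):
        if total_weight + w <= capacity:
            total_weight += w
            total_value += v
    return total_value

# Problem 1 (values equal weights): heaviest-first and lightest-first fall short.
assert knapsack_greedy([3, 2, 2], [3, 2, 2], 4, key=lambda item: item[0]) == 3
assert knapsack_brute_force([3, 2, 2], [3, 2, 2], 4) == 4
assert knapsack_greedy([1, 1, 3], [1, 1, 3], 3, key=lambda item: -item[0]) == 2
assert knapsack_brute_force([1, 1, 3], [1, 1, 3], 3) == 3

# Problem 2: the best-ratio rule takes the weight-300 item, then nothing else fits.
assert knapsack_greedy(
    [300, 200, 200], [301, 200, 200], 400, key=lambda item: item[1] / item[0]
) == 301
assert knapsack_brute_force([300, 200, 200], [301, 200, 200], 400) == 400
```

Passing the ordering as a `key` function keeps one greedy loop for all three rules; only the sort order changes.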

Here is some code implementing the solutions to problems 3 and 4.

```python
def n_stay(ordered_b):
    # Simulate arrivals in the given order; the party starts with just you.
    party_size = 1
    for b in ordered_b:
        if party_size >= b:  # enough people are already here, so this friend stays
            party_size += 1
    return party_size - 1  # number of friends who stayed (excluding yourself)

def max_n_stay(b):
    # Greedy: friends arrive in order of increasing threshold b_i.
    ordered_b = sorted(b)
    return n_stay(ordered_b)

assert max_n_stay([0, 0, 0, 0]) == 4
assert max_n_stay([5, 4, 3, 2, 1]) == 5
assert max_n_stay([5, 5, 5, 5, 5, 5, 5, 5, 5]) == 0


def knapsack_proportions(weights, values, capacity):
    # Sort (value-to-weight ratio, weight) pairs so the best ratio is last.
    ratio_and_weights = sorted([(v / w, w) for w, v in zip(weights, values)])
    value_in_knapsack = 0
    while ratio_and_weights and capacity > 0:
        ratio, w = ratio_and_weights.pop()  # largest available ratio
        weight_taken = min(capacity, w)  # the whole item, or as much as fits
        capacity -= weight_taken
        value_in_knapsack += weight_taken * ratio  # value taken
    return value_in_knapsack

assert knapsack_proportions([100], [100], 50) == 50
assert knapsack_proportions([100, 100], [500, 100], 150) == 550
assert knapsack_proportions([5, 3, 1], [2, 4, 6], 6) == 6 + 4 + 2 * (2 / 5)
```
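The code above does not cover problems 5 and 6, so here is a minimal sketch for them. The names `greedy_common_subsequence`, `lcs_length`, `is_subsequence`, and `lcs_equals_a_or_b` are hypothetical; the subsequence check follows the greedy described in problem 6 (the actual HW 2 Problem 4 code may differ), and `lcs_length` is the standard dynamic program, used here only as a reference to confirm the counterexamples from problem 5.

```python
def greedy_common_subsequence(A, B):
    # Greedy from problem 5: match each item of A with its first occurrence in B
    # after the previous match, skipping items with no such occurrence.
    length, pos = 0, 0
    for a in A:
        try:
            pos = B.index(a, pos) + 1
            length += 1
        except ValueError:
            pass
    return length

def lcs_length(A, B):
    # Standard dynamic program: dp[i][j] is the LCS length of A[i:] and B[j:].
    dp = [[0] * (len(B) + 1) for _ in range(len(A) + 1)]
    for i in range(len(A) - 1, -1, -1):
        for j in range(len(B) - 1, -1, -1):
            if A[i] == B[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    return dp[0][0]

def is_subsequence(A, B):
    # Greedy check from problem 6: A is a subsequence of B iff every item matches.
    pos = 0
    for a in A:
        try:
            pos = B.index(a, pos) + 1
        except ValueError:
            return False
    return True

def lcs_equals_a_or_b(A, B):
    # A longest common subsequence is A or B iff one is a subsequence of the other.
    return is_subsequence(A, B) or is_subsequence(B, A)

# Problem 5 counterexamples: greedy falls short of the true LCS length.
assert greedy_common_subsequence([5, 1, 1, 1], [1, 1, 1, 5]) == 1
assert lcs_length([5, 1, 1, 1], [1, 1, 1, 5]) == 3
A, B = [5, 1, 1, 1, 1, 5], [1, 1, 5, 1, 1]
assert max(greedy_common_subsequence(A, B), greedy_common_subsequence(B, A)) == 3
assert lcs_length(A, B) == 4

# Problem 6
assert lcs_equals_a_or_b([1, 3], [1, 2, 3])
assert not lcs_equals_a_or_b([1, 3], [3, 1])
```

`list.index(value, start)` raises `ValueError` when there is no occurrence at or after `start`, which is what both greedy functions rely on.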

## Problem 2
Suppose you have the stock prices of Nvidia over the course of $n$ days and that
you can go back in time and choose a buy day and a sell day (of course, you can
only sell if you have already bought). Design an algorithm to maximize your
profit.

#### Solution
Main idea: We divide and conquer by splitting our array of stock prices in half
and recursing on each half. The best "crossing" pair (buy day in the first half,
sell day in the second half) can be found in $O(n)$ time by choosing the two
days independently: buy at the minimum price of the first half and sell at the
maximum price of the second half.

Code:

```python
def maximize_profits(prices):
    # Base case: with at most one day, the best profit is 0.
    if len(prices) in {0, 1}:
        return 0
    middle = len(prices) // 2
    A, B = prices[:middle], prices[middle:]  # left and right halves
    max_profit_A, max_profit_B = maximize_profits(A), maximize_profits(B)
    max_profit_crossing = max(B) - min(A)  # buy in A, sell in B
    return max(max_profit_A, max_profit_B, max_profit_crossing)

def maximize_profits_brute_force(prices):
    # O(n^2) reference: try every buy day i and every sell day j >= i.
    return max(
        prices[j] - prices[i]
        for i in range(len(prices))
        for j in range(i, len(prices))
    )

prices = [2, 2, 5, 6, 7, 8]
assert maximize_profits(prices) == maximize_profits_brute_force(prices)
prices = [5, 100, 1, 200]
assert maximize_profits(prices) == maximize_profits_brute_force(prices)
prices = [19, 54, 23, 67, 54, 21, 93, 5, 92, 95, 101, 2, 56, 100]
assert maximize_profits(prices) == maximize_profits_brute_force(prices)
prices = [67, 45, 7, 25, 1, 34, 1, 3, 6, 245, 7, 36, 72, 6, 34, 5, 14, 54, 7]
assert maximize_profits(prices) == maximize_profits_brute_force(prices)
```

Runtime: We split the problem into two subproblems, each of size $n / 2,$ and
combine their solutions in $O(n)$ time, since computing `max_profit_crossing`
(and slicing the list) takes $O(n)$ time. The recurrence is
$T(n) = 2T(n / 2) + O(n),$ so by the master theorem the overall runtime is
$O(n \log n).$

Correctness: Call $A$ and $B$ the left and right halves respectively. To see
that the recurrence works, observe that every pair of buy day and sell day is
either both in $A,$ both in $B,$ or crossing (buy day in $A,$ sell day in $B$),
so returning the max of these gives the correct answer. The
`max_profit_crossing` computation works because for a buy day $i$ in $A$ and a
sell day $j$ in $B,$ the buy day always comes first and the profit is
$B[j] - A[i],$ so the best crossing profit satisfies
$$\max_{i, j}(B[j] - A[i]) \leq \max_j B[j] + \max_i(-A[i]) = \max B - \min A.$$
The reverse inequality holds by taking $j$ to maximize $B[j]$ and $i$ to
minimize $A[i],$ so the best crossing profit is exactly $\max B - \min A,$
which is what `max_profit_crossing` computes.
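The identity $\max_{i, j}(B[j] - A[i]) = \max B - \min A$ can also be sanity-checked numerically. A minimal sketch, assuming small random lists of integers; `max_crossing_brute_force` is a hypothetical helper that tries every crossing pair directly:

```python
import random

def max_crossing_brute_force(A, B):
    # Try every buy day in A (left half) with every sell day in B (right half).
    return max(b - a for a in A for b in B)

random.seed(0)  # fixed seed so the check is reproducible
for _ in range(1000):
    A = [random.randint(0, 100) for _ in range(random.randint(1, 8))]
    B = [random.randint(0, 100) for _ in range(random.randint(1, 8))]
    assert max_crossing_brute_force(A, B) == max(B) - min(A)
```

A randomized check like this is not a proof, but it catches sign or indexing mistakes in the crossing step quickly.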

## Problem 3
Given $n$ points $(x_1, y_1), \dotsc, (x_n, y_n) \in \mathbb{R}^2$ in the plane (with
pairwise distinct distances from the origin) and an integer $1 \leq k \leq n,$
design an algorithm to find the $k$ th furthest point from the origin. Analyze
the runtime of your algorithm when $k = 1$ and when $k = n.$

#### Solution
Main idea: We build a heap keyed on the negative distance of each point from the
origin, so that the root of Python's min-heap is the furthest point. Popping off
$k - 1$ elements then leaves the $k$ th furthest point at the root.

Code:

```python
import heapq
import math

def kth_furthest_point(points, k):
    # heapq is a min-heap, so negating distances puts the furthest point at the root.
    distances_and_xy = [
        (-math.sqrt(x_i ** 2 + y_i ** 2), x_i, y_i)
        for x_i, y_i in points
    ]
    heapq.heapify(distances_and_xy)  # O(n)
    for _ in range(k - 1):  # discard the k - 1 furthest points, O(log n) each
        heapq.heappop(distances_and_xy)
    _, x_k, y_k = distances_and_xy[0]  # the root is now the k-th furthest point
    return (x_k, y_k)

assert kth_furthest_point([(5, 5), (10, 10), (15, 15)], 1) == (15, 15)
assert kth_furthest_point([(5, 5), (10, 10), (15, 15)], 2) == (10, 10)
assert kth_furthest_point([(5, 5), (10, 10), (15, 15)], 3) == (5, 5)
assert kth_furthest_point([
    (14, 23), (1, 2), (0, 0), (-3, -4), (-5, -10), (100, 100), (-100, 101)
], 3) == (14, 23)
```

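Correctness: Since `heapq` is a min-heap keyed on negative distance, its root is
always the furthest remaining point. After popping the $k - 1$ furthest points,
the root is therefore the $k$ th furthest point.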

Runtime: Heapifying takes $O(n)$ time, and popping off $k$ elements takes
$O(k \log n)$ time, so overall this is $O(n + k \log n).$ In particular, when
$k = 1$ this is $O(n),$ and when $k = n$ this is $O(n \log n).$

## Problem 4
Let $A$ be a sorted $n \times n$ matrix of numbers, in the sense that
$A[i][j] \leq A[i + 1][j]$ and $A[i][j] \leq A[i][j + 1]$ for all relevant $i, j.$
Consider the problem of determining whether a given number $x$ is in $A.$ Design
a divide and conquer algorithm to do this. Is this optimal?

#### Solution
Main idea: By comparing the middle element of $A$ with $x,$ we can eliminate
one of the four quadrants of $A$ and recurse on the remaining three.

Code:

```python
def binary_search_array(A, x):
    # helper(a, b, c, d) searches the submatrix with rows a..c-1 and columns b..d-1.
    def helper(a, b, c, d):
        # base cases
        if a == c or b == d:  # empty submatrix
            return False
        elif c == a + 1 and d == b + 1:  # single entry
            return A[a][b] == x
        # recurse
        i, j = (a + c) // 2, (b + d) // 2
        if A[i][j] == x:
            return True
        elif A[i][j] < x:
            # every entry at or above-left of (i, j) is < x: drop the top-left quadrant
            return (
                helper(i + 1, b, c, j)
                or helper(i + 1, j, c, d)
                or helper(a, j + 1, i + 1, d)
            )
        else:  # A[i][j] > x
            # every entry at or below-right of (i, j) is > x: drop the bottom-right quadrant
            return (
                helper(a, b, i, j)
                or helper(i, b, c, j)
                or helper(a, j, i, d)
            )
    if not A:
        return False
    return helper(0, 0, len(A), len(A[0]))

def search_array(A, x):
    # Staircase search starting from the top-right corner.
    if not A or not A[0]:
        return False
    curr_i, curr_j = 0, len(A[0]) - 1
    while curr_i < len(A) and curr_j >= 0:
        if A[curr_i][curr_j] == x:
            return True
        elif A[curr_i][curr_j] < x:
            curr_i += 1  # the rest of this row (to the left) is also < x
        else:
            curr_j -= 1  # the rest of this column (below) is also > x
    return False

# This test matrix is 4 x 5; both functions also handle non-square matrices.
A = [
    [2, 4, 8, 10, 11],
    [2, 4, 9, 13, 15],
    [7, 8, 9, 16, 17],
    [13, 14, 20, 22, 24]
]
for solution in (binary_search_array, search_array):
    for x in [2, 4, 7, 8, 9, 10, 13, 14, 15, 16, 17, 20, 22, 24]:
        assert solution(A, x)
    for x in [1, 3, 5, 6, 18, 21, 25]:
        assert not solution(A, x)
```

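Correctness: If $A[i][j] > x,$ then $A[i + k][j + \ell] > x$ for all relevant
$k \geq 0$ and $\ell \geq 0$ because $A$ is sorted. This eliminates the
bottom-right quadrant of $A$ (the one whose top-left corner is $A[i][j]$).
Similarly, if $A[i][j] < x,$ then $A[i - k][j - \ell] < x$ for all relevant
$k, \ell \geq 0,$ which eliminates the top-left quadrant (the one whose
bottom-right corner is $A[i][j]$). Since $x$ is in $A$ if and only if it is in
one of the three remaining quadrants, the recursion works.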

Runtime: We split our problem into three subproblems, each an
$(n / 2) \times (n / 2)$ matrix, and it takes $O(1)$ time to split and combine
solutions. Thus $T(n) = 3T(n / 2) + O(1),$ and by the master theorem the overall
runtime is $O(n^{\log_2(3)}),$ where $\log_2(3) \approx 1.58.$
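Unrolling the recurrence $T(n) = 3T(n / 2) + O(1)$ shows where the exponent comes from: level $\ell$ of the recursion tree has $3^\ell$ subproblems, each doing $O(1)$ work outside its recursive calls, and there are about $\log_2 n$ levels.

$$T(n) = \sum_{\ell = 0}^{\log_2 n} 3^\ell \cdot O(1) = O\left(3^{\log_2 n}\right) = O\left(n^{\log_2(3)}\right).$$

The geometric sum is dominated by its largest term, and $3^{\log_2 n} = 2^{\log_2(3) \log_2 n} = n^{\log_2(3)}.$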

We can do better than this, so the divide and conquer algorithm is not optimal.
For example, we can binary search each row for the element $x,$ which takes
$O(n \log n)$ time. Even better, we can start at the top-right corner of the
array and move down if the current element is less than $x$ or left if it is
greater, which takes $O(n)$ time since each step discards a row or a column.
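The row-by-row binary search mentioned above can be sketched with Python's standard `bisect` module. `search_rows` is a hypothetical name, and this version only relies on each row being sorted, not on the column ordering.

```python
from bisect import bisect_left

def search_rows(A, x):
    # Binary search each sorted row for x: O(log n) per row, O(n log n) in total.
    for row in A:
        idx = bisect_left(row, x)  # first position where x could be inserted
        if idx < len(row) and row[idx] == x:
            return True
    return False

A = [
    [2, 4, 8, 10, 11],
    [2, 4, 9, 13, 15],
    [7, 8, 9, 16, 17],
    [13, 14, 20, 22, 24],
]
assert search_rows(A, 16)
assert not search_rows(A, 18)
```

Because it ignores the column ordering, this approach is simpler to get right than the staircase search but pays an extra $\log n$ factor.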