# CS 4820 Algorithms

These are study notes for CS 4820: Algorithms. They are meant to be a summary of the crucial material in the course. This is the first time I have written notes like this in English.

## 1 Greedy Algorithms

### 1.1 Problem: Interval Scheduling

#### The Problem

There are n requests that occupy time intervals. Request i starts at s(i) and ends at f(i). Only one request can be processed at a time. Find the maximum number of requests that can be scheduled without their time intervals overlapping.

#### The Design

While scheduling, always choose the compatible interval with the smallest finish time f(i), then discard every interval that overlaps it.

#### The Analysis

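Proof: By the "Stays Ahead" method. We prove by induction that, for every r, the r-th interval chosen by our solution finishes no later than the r-th interval of any other solution. Then we prove by contradiction that our solution is optimal: if another solution had more intervals, its next interval would still be compatible with ours, so the greedy rule would not have stopped.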

Run Time: O(n log n), from sorting.
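
The earliest-finish-time rule can be sketched as follows. This is a minimal illustration, not course code: the function name `max_compatible_intervals` and the convention that an interval starting exactly when another ends is compatible are assumptions.

```python
def max_compatible_intervals(intervals):
    """Return a maximum-size list of pairwise compatible (start, finish) pairs.

    Assumption: intervals that only touch at an endpoint are compatible.
    """
    chosen = []
    last_finish = float("-inf")
    # Consider requests in order of increasing finish time f(i).
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        # Keep the request only if it starts after the last chosen one ends.
        if start >= last_finish:
            chosen.append((start, finish))
            last_finish = finish
    return chosen


requests = [(0, 6), (1, 4), (3, 5), (5, 7), (3, 8), (5, 9), (6, 10), (8, 11)]
print(max_compatible_intervals(requests))  # [(1, 4), (5, 7), (8, 11)]
```

The sort dominates the cost; the single pass afterwards is linear.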

### 1.2 Problem: Minimize Lateness Scheduling

#### The Problem

All intervals are available from time s. Interval i has length t(i) and deadline d(i). An interval that finishes at time f(i) after its deadline incurs a lateness penalty equal to the overrun, l(i) = max(0, f(i) - d(i)). Only one interval can be processed at a time, and all intervals must be scheduled. Find a schedule of these intervals such that the maximum lateness over all intervals is minimized.

#### The Design

Always schedule the one with the earliest deadline first.

#### The Analysis

Proof: By the "Exchange" method. First we prove that there is an optimal solution O with no idle time. Call a pair of intervals an inversion if the one with the later deadline is scheduled first. Then we prove that swapping an adjacent inverted pair does not increase the maximum lateness. Since inversions can be removed one swap at a time until the schedule is fully sorted by deadline, our solution is at least as good as O, and hence optimal.
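
A small sketch of the earliest-deadline-first rule is given here. The function name and the `(length, deadline)` tuple layout are assumptions for illustration. The code reports the maximum lateness, which is the objective this greedy rule is known to minimize.

```python
def schedule_earliest_deadline_first(jobs, start=0):
    """Schedule (length, deadline) jobs back to back in deadline order.

    Returns (schedule, max_lateness), where schedule lists
    (start_time, finish_time, deadline) for each job in processing order.
    """
    schedule = []
    time = start
    max_lateness = 0  # lateness is never negative
    for length, deadline in sorted(jobs, key=lambda job: job[1]):
        finish = time + length  # no idle time between jobs
        schedule.append((time, finish, deadline))
        max_lateness = max(max_lateness, finish - deadline)
        time = finish
    return schedule, max_lateness


plan, worst = schedule_earliest_deadline_first([(3, 6), (1, 2), (2, 4)])
print(plan)   # [(0, 1, 2), (1, 3, 4), (3, 6, 6)]
print(worst)  # 0
```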

### 1.3 Problem: Graph Shortest Paths

#### The Problem

In a graph where each edge e has a non-negative length l(e), find a path from s to v with the lowest sum of l(e)'s.

#### The Design: Dijkstra's Algorithm

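Start from s and maintain a set S of explored nodes, initially S = {s} with d(s) = 0. Repeatedly add to S the unexplored node v with the smallest tentative distance d'(v), and set d(v) = d'(v). For each unexplored node v with at least one edge from S, the tentative distance to s is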

\begin{equation*}
d'(v) = \min_{e=(u,v):u\in S} \left( d(u)+l_{e} \right)
\end{equation*}

#### The Analysis

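Proof: By the "Stays Ahead" method. We show by induction on the size of S that d(u) is the true shortest distance for every u in S. Suppose v is added via path P and another path P' from s to v exists. P' must leave S at some point, and because we always pick the least-cost outward extension, the part of P' up to its first node outside S already costs at least as much as P. By the non-negativity of the edge weights, the rest of P' cannot reduce that cost, so P' costs no less than P.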

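Run Time: Using a priority queue, Dijkstra performs n extract-min operations and at most m key updates, each costing O(log n). The total is O(m) + n O(log n) + m O(log n) = O((m + n) log n) for a graph with n nodes and m edges.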
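
Below is a minimal sketch of Dijkstra's Algorithm on top of Python's `heapq`. The adjacency-list layout (`dict` of node to list of `(neighbor, length)`) is an assumption. Because `heapq` has no decrease-key operation, this version pushes a fresh entry and skips stale ones instead, which keeps the same O(m log n) bound.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from source; graph maps node -> [(neighbor, length)].

    Assumes all lengths are non-negative and nodes are mutually comparable
    (needed only to break ties between equal distances in the heap).
    """
    dist = {source: 0}
    explored = set()  # the set S from the text
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in explored:
            continue  # stale entry: u was already added to S with a smaller d
        explored.add(u)
        for v, length in graph.get(u, []):
            candidate = d + length
            if v not in dist or candidate < dist[v]:
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


graph = {
    "s": [("a", 1), ("b", 4)],
    "a": [("b", 2), ("c", 6)],
    "b": [("c", 3)],
    "c": [],
}
print(dijkstra(graph, "s"))  # {'s': 0, 'a': 1, 'b': 3, 'c': 6}
```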

### 1.4 Problem: Minimum Spanning Tree

#### The Problem

A spanning tree of a connected, edge-weighted graph is a subgraph that is a tree and contains every node. A **minimum spanning tree** is a spanning tree with the least total cost possible. Find the minimum spanning tree of a connected edge-weighted graph.

#### The Design

**Kruskal's Algorithm**: Start from no edges at all, and insert edges by increasing cost. We only insert those which do not create cycles.

**Prim's Algorithm**: Start from any node s and build up a set S by repeatedly adding the node v that minimizes the attachment cost $\min_{e=(u,v): u \in S} C_e$. The edges added form a minimum spanning tree.

**Reverse-Delete Algorithm**: Start from the full graph, delete edges by decreasing cost. Only delete those that will not make the graph disconnected.

#### The Analysis

Proof: First, a lemma called the "Cut Property": assuming distinct edge costs, for any nonempty proper subset of nodes S, the lowest-cost edge e with one end in S and the other in the rest of G belongs to every minimum spanning tree of G. This is proven by the "Exchange" method: if a spanning tree T omits e, then adding e creates a cycle, and an edge e' of T on that cycle that crosses the same cut can be exchanged for e, which lowers the cost of T. The counterpart of that lemma for Reverse-Delete is the "Cycle Property": the highest-cost edge e on any cycle C in G belongs to no minimum spanning tree. Finally we prove that the outputs of Kruskal's and Prim's are spanning trees, and that every edge they add is justified by the Cut Property.

Run Time: Prim's Algorithm is implemented in nearly the same way as Dijkstra's, so with a priority queue it also runs in O(m + n log n + m log n) = O(m log n) time on a connected graph.
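
To make the similarity with Dijkstra's concrete, here is a hedged sketch of Prim's Algorithm. The graph layout (undirected, each edge listed from both endpoints) and the function name are assumptions. The only real difference from a heap-based Dijkstra is that the heap key is the edge cost alone rather than a path distance.

```python
import heapq


def prim_mst(graph, start):
    """Return MST edges as (u, v, cost); graph maps node -> [(neighbor, cost)].

    Assumes the graph is connected and undirected (edges appear in both lists).
    """
    in_tree = {start}  # the set S from the text
    mst_edges = []
    heap = [(cost, start, v) for v, cost in graph[start]]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(graph):
        cost, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # both ends already in S, so this edge would close a cycle
        in_tree.add(v)
        mst_edges.append((u, v, cost))
        for w, c in graph[v]:
            if w not in in_tree:
                heapq.heappush(heap, (c, v, w))
    return mst_edges


graph = {
    "a": [("b", 1), ("c", 3)],
    "b": [("a", 1), ("c", 2), ("d", 5)],
    "c": [("a", 3), ("b", 2), ("d", 4)],
    "d": [("b", 5), ("c", 4)],
}
print(prim_mst(graph, "a"))  # [('a', 'b', 1), ('b', 'c', 2), ('c', 'd', 4)]
```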

#### Appendix: The Union Find Data Structure

Kruskal's Algorithm needs a data structure that can find which connected component a node belongs to and union two connected components. The Union Find data structure supports both operations.

The efficient implementation of Union Find gives each node a pointer, initially pointing to itself; the root of a pointer chain names the component. Unioning two components only requires changing one pointer: we always point the root of the smaller component to the root of the larger one. Find follows the pointer chain to the root and, once done, repoints every node on the path directly to the root. Given the two roots, Union takes O(1); Find takes O(log n).

Kruskal's Algorithm, implemented with Union Find, takes O(m log n) time, dominated by sorting the edges.
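
The pointer-based structure and its use in Kruskal's Algorithm might look like the sketch below. The class and function names and the `(cost, u, v)` edge layout are assumptions. Note that `union` here calls `find` on its arguments first, so its O(1) cost applies only to the pointer change itself.

```python
class UnionFind:
    def __init__(self, nodes):
        self.parent = {v: v for v in nodes}  # every node points to itself
        self.size = {v: 1 for v in nodes}

    def find(self, v):
        """Return the root naming v's component, compressing the path."""
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[v] != root:  # repoint the path directly at the root
            next_v = self.parent[v]
            self.parent[v] = root
            v = next_v
        return root

    def union(self, a, b):
        """Merge the components of a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra  # make ra the root of the larger component
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


def kruskal_mst(nodes, edges):
    """Return MST edges as (u, v, cost) from a list of (cost, u, v) edges."""
    components = UnionFind(nodes)
    mst_edges = []
    for cost, u, v in sorted(edges, key=lambda edge: edge[0]):
        # union succeeds only if u and v were in different components,
        # which is exactly the case where the edge creates no cycle.
        if components.union(u, v):
            mst_edges.append((u, v, cost))
    return mst_edges


edges = [(3, "a", "c"), (1, "a", "b"), (5, "b", "d"), (2, "b", "c"), (4, "c", "d")]
print(kruskal_mst("abcd", edges))  # [('a', 'b', 1), ('b', 'c', 2), ('c', 'd', 4)]
```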

#### Appendix: Huffman Code

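A prefix code is a function f that maps each $x \in S$ to a sequence of 0's and 1's such that if $x \neq y$, then f(x) is not a prefix of f(y). Notice that a binary tree can express a prefix code: each leaf is an element of S, and the path from the root to the leaf (0 for left, 1 for right) spells its codeword.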

Since an optimal prefix code gives the lowest-frequency words the longest paths, we can build the tree bottom up. Specifically, merge the two lowest-frequency words into one word (node) whose frequency is the sum of theirs, and build up the tree by adding it as their parent. Continue until all words are fitted into one tree. With a priority queue, the total running time is O(k log k) for k words.
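
The bottom-up construction can be sketched with a heap as the priority queue. The function name and the input layout (a `dict` from word to frequency) are assumptions. Internal nodes are stored as two-element lists, which cannot collide with the words themselves because dictionary keys are never lists.

```python
import heapq
from itertools import count


def huffman_code(frequencies):
    """Map each word in frequencies (word -> frequency) to a bit string."""
    tiebreak = count()  # unique counter so the heap never compares tree nodes
    heap = [(freq, next(tiebreak), word) for word, freq in frequencies.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}  # a lone word still needs one bit
    while len(heap) > 1:
        # Merge the two lowest-frequency nodes under a new parent.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), [left, right]))

    codes = {}

    def assign(node, prefix):
        if isinstance(node, list):  # internal node: recurse into both children
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:  # leaf: the path from the root is the codeword
            codes[node] = prefix

    assign(heap[0][2], "")
    return codes


codes = huffman_code({"a": 32, "b": 25, "c": 20, "d": 18, "e": 5})
print({word: len(bits) for word, bits in codes.items()})
# {'c': 2, 'e': 3, 'd': 3, 'b': 2, 'a': 2}
```

The two rarest words, d and e, end up deepest in the tree with 3-bit codewords, while the others get 2 bits.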

## 2 Divide and Conquer

### 2.1 The Master Theorem

In Divide and Conquer, a recurrence relation

\begin{equation*}T(n) = aT(n/b) + f(n)\end{equation*}

is very common. The complexity of T(n) is given by the Master Theorem:

1. Suppose $\exists \epsilon > 0$ such that $f(n) = O(n^{\log_{b}a - \epsilon})$, then $T(n) = \Theta(n^{\log_{b}a})$.

2. Suppose $\exists k \geq 0$ such that $f(n) = \Theta(n^{\log_{b}a}\log^{k}n)$, then $T(n) = \Theta(n^{\log_{b}a}\log^{k+1}n)$.

3. Suppose $\exists \epsilon > 0$ such that $f(n) = \Omega(n^{\log_{b}a + \epsilon})$, and $\exists c < 1$ such that $af(n/b) \leq cf(n)$ for all sufficiently large $n$, then $T(n) = \Theta(f(n))$.
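
As a quick check of the three cases, here are three standard recurrences worked through. They are common textbook examples chosen for illustration, not taken from the course material.

Case 1: for $T(n) = 3T(n/2) + n$ we have $n^{\log_{b}a} = n^{\log_{2}3} \approx n^{1.585}$, and $f(n) = n = O(n^{\log_{2}3 - \epsilon})$ with $\epsilon = 0.5$, so

\begin{equation*}T(n) = \Theta(n^{\log_{2}3})\end{equation*}

Case 2: for $T(n) = 2T(n/2) + n$ (merge sort) we have $n^{\log_{b}a} = n^{\log_{2}2} = n$, and $f(n) = \Theta(n\log^{0}n)$, so with $k = 0$

\begin{equation*}T(n) = \Theta(n\log n)\end{equation*}

Case 3: for $T(n) = 2T(n/2) + n^{2}$ we have $n^{\log_{b}a} = n$, and $f(n) = n^{2} = \Omega(n^{1 + \epsilon})$ with $\epsilon = 1$. The regularity condition holds because $af(n/b) = 2(n/2)^{2} = n^{2}/2 \leq cf(n)$ with $c = 1/2$, so

\begin{equation*}T(n) = \Theta(n^{2})\end{equation*}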

### 2.2 Problem: Inversion Number

#### The Problem

In a sequence a(i), a pair of indices (i, j) with i < j but a(i) > a(j) is called an inversion. Find the number of inversions in a.

#### Design and Analysis

The number of inversions can be counted while merge sorting the list. While merging two sorted halves, maintain a pointer into each half and advance the one pointing at the smaller value. Whenever the smaller value comes from the second half, add the number of elements remaining in the first half to the inversion counter, since each of them forms an inversion with it. The recurrence is T(n) = 2T(n/2) + O(n), so the run time is O(n log n).
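
The merge-and-count step can be sketched as below. The function name and the choice to return the sorted list alongside the count are assumptions. Equal elements are not counted as inversions, matching the strict condition a(i) > a(j).

```python
def count_inversions(seq):
    """Return (sorted copy of seq, number of inversions in seq)."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, left_count = count_inversions(seq[:mid])
    right, right_count = count_inversions(seq[mid:])

    merged = []
    inversions = left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left,
            # so it forms an inversion with each of them.
            merged.append(right[j])
            j += 1
            inversions += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


print(count_inversions([2, 4, 1, 3, 5]))  # ([1, 2, 3, 4, 5], 3)
```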
