# The Maximum Cut Problem

Given an undirected graph $G=(V,E)$ 

Return a cut $(A,B)$, where $A$ and $B$ are non-empty, that maximises the number of crossing edges.

This problem is $NP$-hard (its decision version is $NP$-complete).

## A Local Search Algorithm

For a cut $(A,B)$ and a vertex $v$, define
$$
C_v(A,B) = \# \; \text{edges incident on } v \text{ that cross}\; (A, B) \\
D_v(A,B) = \# \; \text{edges incident on } v \text{ that do not cross}\; (A, B)
$$

Let $(A,B)$ be an arbitrary cut of $G$.

While there is a vertex $v$ with $D_v(A,B) > C_v(A,B)$, move $v$ to the other side of the cut. This will increase the crossing number by
$$
D_v(A,B) - C_v(A,B)
$$

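The output of this approach is guaranteed to cut at least $50\%$ of the edges, and therefore at least $50\%$ of the maximum possible number of crossing edges. Since every move increases the number of crossing edges by at least one, the algorithm terminates after at most $\lvert E \rvert$ moves.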
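The loop above can be sketched in Python as follows. This is a minimal sketch rather than a tuned implementation: the function name `local_search_max_cut`, the 0-indexed vertex labels and the edge-list input are assumptions made for illustration, and the graph is assumed to have no self-loops.

```python
def local_search_max_cut(n, edges, side=None):
    """Local search for max cut on vertices 0..n-1 (hypothetical helper).

    edges: list of (u, v) pairs of an undirected graph without self-loops.
    side:  optional initial cut as a list of 0/1 labels; defaults to all zeros.
    Returns (side, number of crossing edges).
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    side = [0] * n if side is None else list(side)

    improved = True
    while improved:
        improved = False
        for v in range(n):
            crossing = sum(1 for w in adj[v] if side[w] != side[v])  # C_v(A, B)
            not_crossing = len(adj[v]) - crossing                    # D_v(A, B)
            if not_crossing > crossing:
                side[v] = 1 - side[v]  # move v across: the cut gains D_v - C_v edges
                improved = True

    cut_size = sum(1 for u, v in edges if side[u] != side[v])
    return side, cut_size


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
side, cut = local_search_max_cut(4, edges)
print(side, cut)
```

Starting from the trivial cut with every vertex on one side is fine whenever the graph has at least one edge, because a locally optimal cut crosses at least half the edges and so both sides end up non-empty.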


### Proof of Performance Guarantee

Let $(A,B)$ be a locally optimal cut.

Then for every vertex $v$, 
$$
D_v(A,B) \leq C_v(A,B)
$$
Summing over all $v \in V$
$$
\sum_{v \in V}{D_v(A,B)} \leq \sum_{v \in V}{C_v(A,B)}
$$

Note that the left sum counts each non-crossing edge twice, and the right sum counts each crossing edge twice. Therefore, adding $2 \times [\# \;\text{of crossing edges}]$ to both sides in the second step,
$$
2 \times [\# \;\text{of non-crossing edges}] \leq 2 \times [\# \;\text{of crossing edges}] \\[10pt]
\implies 2 \times \lvert E \rvert \leq 4 \times [\# \;\text{of crossing edges}] \\[10pt]
\implies [\# \;\text{of crossing edges}] \geq \frac{1}{2} \times \lvert E \rvert
$$


## Weighted Maximum Cut

We can assign each edge a nonnegative weight and seek a cut that maximises the total weight of the crossing edges.

Note that we can still
1. Use the local search idea
2. Retain the $50\%$ performance guarantee

However, we can no longer guarantee a polynomial number of iterations: a single move may improve the cut by only a small amount, while the total weight can be exponentially large in the size of the input.

# Principles of Local Search

Let $X$ be a set of candidate solutions to a problem. For each $x \in X$ we can specify which $y \in X$ are its "neighbours"

For example, 

1. in the maximum cut problem
$$
x, y \; \text{are neighbours} \iff \text{differ by moving one vertex}
$$

2. in the TSP problem 
$$
x, y \; \text{are neighbours} \iff \text{differ by 2 edges}
$$

A generic local search algorithm then starts from some initial solution and repeatedly moves to a neighbour that improves the solution, until no such neighbour exists.

In pseudocode,
```
x = some initial solution

While x has superior neighbour y
    x := y

Return x

```
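The pseudocode can be turned into a small generic routine. The following is only a sketch: the names `local_search`, `neighbours` and `score` are made up for illustration, the routine maximises `score`, and it always moves to the best neighbour (one of several possible policies).

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def local_search(initial: T,
                 neighbours: Callable[[T], Iterable[T]],
                 score: Callable[[T], float]) -> T:
    """Move to the best-scoring neighbour until no neighbour is strictly better."""
    x = initial
    while True:
        best = max(neighbours(x), key=score, default=None)
        if best is None or score(best) <= score(x):
            return x  # x is locally optimal
        x = best


# Toy example: maximise -(x - 3)^2 over the integers, where the neighbours of x are x - 1 and x + 1.
result = local_search(0, lambda x: [x - 1, x + 1], lambda x: -(x - 3) ** 2)
print(result)
```

The routine only terminates if the score cannot improve forever, which holds for a finite set of candidate solutions.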

## FAQ

Question: How to pick initial solution $x$?

Answer: 

1. We can use a random solution and then run many independent trials of local search, returning the best locally optimal solution

2. Use your best heuristics, then apply local search as a post-processing step to improve the solution

Question: If there are several superior neighbouring $y$, which to choose?

Answer:

1. Choose $y$ at random
2. Go to the best $y$
3. more complex heuristics

Question: How to define neighbourhoods?

Answer: In general a larger neighbourhood makes each iteration slower (there are more neighbours to check) but results in fewer "bad" local optima, whilst a smaller neighbourhood is faster to search but has more of them. Aim to find a sweet spot between the two.

For questions with regards to general time complexity see [smoothed analysis](https://en.wikipedia.org/wiki/Smoothed_analysis)

# The 2-SAT Problem

Given,
1. an input of $n$ boolean variables $x_1, x_2, x_3, \cdots, x_n$, 
2. $m$ clauses of 2 literals each

Determine if there exists an assignment that simultaneously satisfies every clause

## (In)Tractability of SAT

2-SAT can be solved in polynomial time
- reduction to computing strongly connected components
- "backtracking" works in polynomial time
- randomised local search

3-SAT however is $NP$-complete
- brute-force search $\approx 2^n$ time
- randomised local search $\approx (\frac{4}{3})^n$ time

## Papadimitriou's Algorithm

For $n$ variables,

Repeat $\log_{2}{n}$ times:

1. Choose random initial assignment
2. repeat $2n^2$ times:
    - if the current assignment satisfies all clauses, return "solution exists"
    - else, pick an arbitrary unsatisfied clause and flip the value of one of its two variables, chosen uniformly at random

Return "no solution"

This algorithm is always correct on unsatisfiable instances. On a satisfiable instance, however, there is always some chance of a false negative.
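A direct transcription of these steps into Python might look like the sketch below. The function name `papadimitriou_2sat`, the signed-integer encoding of literals (`k` for $x_k$, `-k` for $\neg x_k$) and the `rng` parameter are assumptions for illustration. The unsatisfied clause is picked at random here, which is one valid way of making the "arbitrary" choice.

```python
import math
import random


def papadimitriou_2sat(n, clauses, rng=None):
    """Randomised local search for 2-SAT (illustrative sketch).

    clauses: list of (a, b) pairs of nonzero ints; k means x_k, -k means not x_k.
    Returns a satisfying assignment {variable: bool}, or None for "no solution".
    """
    if rng is None:
        rng = random.Random()

    def holds(literal, assignment):
        return assignment[abs(literal)] == (literal > 0)

    outer_rounds = max(1, math.ceil(math.log2(n))) if n > 1 else 1
    for _ in range(outer_rounds):
        assignment = {v: rng.random() < 0.5 for v in range(1, n + 1)}
        for _ in range(2 * n * n):
            unsatisfied = [c for c in clauses
                           if not (holds(c[0], assignment) or holds(c[1], assignment))]
            if not unsatisfied:
                return assignment
            a, b = rng.choice(unsatisfied)
            var = abs(rng.choice((a, b)))   # flip one of the two variables at random
            assignment[var] = not assignment[var]
    return None


clauses = [(1, 2), (-1, 3), (-2, -3), (1, -3)]
print(papadimitriou_2sat(3, clauses, random.Random(0)))
```

Rescanning every clause in each inner iteration keeps the sketch short but costs $O(m)$ per step, so a practical version would track the unsatisfied clauses incrementally.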

## Random Walks

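Consider a random walk on the nonnegative integers starting at $0$. At position $0$ the walk always steps right; at any other position it steps left or right with probability $\frac{1}{2}$ each. We seek to analyse the expected value of the number of steps required to reach position $n$.

Let $z_i = $ number of random steps to get to $n$ from $i$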

We know that
1. $E[z_n] = 0$
2. $E[z_0] = 1 + E[z_1]$

Further for $i \in \{1, 2, 3, \cdots, n-1\}$
$$
E[z_i] = P(\text{go left}) \times E[z_i | \text{go left}] + P(\text{go right}) \times E[z_i | \text{go right}] \\[10pt]
\implies E[z_i] = \frac{1}{2} \times (1 + E[z_{i-1}]) + \frac{1}{2} \times (1 + E[z_{i+1}]) \\[10pt]
\implies E[z_i] - E[z_{i+1}] = E[z_{i-1}] - E[z_i] +2
$$

Therefore since
$$
E[z_0] - E[z_1] = 1 \\[10pt]
\implies E[z_1] - E[z_2] = 3
$$
This gives,
$$
\begin{aligned}
E[z_0] - E[z_1] &= 1 \\
E[z_1] - E[z_2] &= 3 \\ 
E[z_2] - E[z_3] &= 5 \\ 
E[z_3] - E[z_4] &= 7 \\
\vdots \\
E[z_{n-1}] - E[z_n] &= 2n-1
\end{aligned} \\[10pt]
\begin{aligned}
\therefore E[z_0] &= 1 + 3 + 5 + \cdots + 2n-1 \\[10pt]
&= 2n \times \frac{n}{2} = n^2
\end{aligned}
$$
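As a sanity check on $E[z_0] = n^2$, the linear system for the $E[z_i]$ can also be solved exactly without using the telescoping argument. The sketch below does this with forward elimination and back substitution over exact fractions; the function name `expected_steps_to_n` and the helper arrays `c` and `m` are made up for illustration.

```python
from fractions import Fraction


def expected_steps_to_n(n):
    """Exact E[z_0] for the walk on {0, ..., n} that always steps right at 0 (n >= 1)."""
    # Forward elimination: write E[z_i] = c[i] + m[i] * E[z_{i+1}].
    c = [Fraction(1)]  # E[z_0] = 1 + E[z_1]
    m = [Fraction(1)]
    for i in range(1, n):
        # E[z_i] = 1 + (E[z_{i-1}] + E[z_{i+1}]) / 2, with E[z_{i-1}] substituted
        denom = 1 - m[i - 1] / 2
        c.append((1 + c[i - 1] / 2) / denom)
        m.append(Fraction(1, 2) / denom)

    # Back substitution, starting from E[z_n] = 0.
    e = Fraction(0)
    for i in range(n - 1, -1, -1):
        e = c[i] + m[i] * e
    return e


for n in (1, 2, 5, 10):
    print(n, expected_steps_to_n(n))
```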

We will use a corollary for the analysis of Papadimitriou's algorithm. Let $T_n = z_0$ denote the number of steps the walk takes to reach position $n$ from $0$. Then
$$
P[T_n > 2n^2] \leq 1/2
$$

To see this, expand the expectation:
$$
E[T_n] = n^2 \\[10pt]
\implies n^2 = \sum_{k=0}^{2n^2}{(k \times P[T_n = k])} + \sum_{k=2n^2 +1}^{\infty}{(k \times P[T_n = k])} \\[10pt]
\geq 2n^2 \times P[T_n > 2n^2]
$$

The inequality holds because the first sum is nonnegative and every term of the second sum has $k > 2n^2$.

Therefore we have
$$
n^2 \geq 2n^2 \times P[T_n > 2n^2] \\[10pt]
\implies P[T_n > 2n^2] \leq 1/2
$$
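The bound can also be checked empirically. The simulation below is a rough sketch under the assumption that the walk always steps right at $0$; the names `steps_to_reach` and `tail_fraction`, the seed and the trial count are arbitrary choices for illustration.

```python
import random


def steps_to_reach(n, rng):
    """Simulate one walk from 0 and return the number of steps taken to reach n."""
    position, steps = 0, 0
    while position < n:
        if position == 0 or rng.random() < 0.5:
            position += 1
        else:
            position -= 1
        steps += 1
    return steps


def tail_fraction(n, trials, seed=0):
    """Fraction of simulated walks that need more than 2 * n^2 steps."""
    rng = random.Random(seed)
    over = sum(steps_to_reach(n, rng) > 2 * n * n for _ in range(trials))
    return over / trials


print(tail_fraction(5, 4000))
```

The printed fraction is an estimate of $P[T_n > 2n^2]$ and should sit below the $1/2$ bound, which is not tight.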

## Analysing Papadimitriou's Algorithm

For a satisfiable 2-SAT instance with $n$ variables, Papadimitriou's algorithm produces a satisfying assignment with probability $\geq 1 - \frac{1}{n}$.

First focus on a single iteration of the outer loop.

Fix an arbitrary satisfying assignment $a^*$. Let $a_t$ be the algorithm's assignment after $t$ inner iterations, where $t \in [0, 2n^2]$.

Let $X_t$ be the number of variables on which $a^*$ and $a_t$ agree.

Suppose $a_t$ is not a satisfying assignment and the algorithm picks an unsatisfied clause with variables $x_i$ and $x_j$. Since $a^*$ satisfies this clause, it makes a different assignment to $x_i$ or $x_j$ or both.

The algorithm will then randomly flip one of $x_i$ or $x_j$
1. if $a^*$ and $a_t$ differ on both $x_i$ and $x_j$, then
$$
X_{t+1} = X_t + 1
$$

2. if $a^*$ and $a_t$ differ on exactly one of $x_i$, $x_j$ then
$$
X_{t+1} =
\begin{cases} 
X_t +1 & 50\% \text{ probability} \\
X_t -1 & 50\% \text{ probability}
\end{cases}
$$

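There! A random-walk-looking thing appears. $X_t$ moves towards $n$ at least as readily as the random walk analysed above, and the algorithm is certain to have found a satisfying assignment by the time $X_t = n$. By the corollary, a single trial of $2n^2$ inner iterations therefore fails with probability at most $1/2$.

Applying this to the $\log_2 n$ independent trials,
$$
\begin{aligned}
P[\text{algorithm fails}] &\leq P[\text{all independent trials fail}] \\
&\leq \left(\frac{1}{2} \right)^{\log_{2}{n}} \\
&= \frac{1}{n}
\end{aligned}
$$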

# Stable Matchings

Consider two node sets $U$ and $V$.

For simplicity assume $\lvert U \rvert = \lvert V \rvert = n$

Each node has a ranked order of the nodes on the other side.

The goal is to return a perfect matching (bijection) such that if $u \in U$ and $v \in V$ are not matched, then either $u$ likes its mate $v^\prime$ better than $v$ or $v$ likes its mate $u^\prime$ better than $u$.

## Gale-Shapley Proposal Algorithm

While there is an unattached man $u$:

- $u$ proposes to the top woman $v$ on his preference list who hasn't rejected him yet

- each woman entertains only the best proposal received so far
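The proposal loop can be sketched in Python as below. The function name `gale_shapley`, the 0-indexed labels and the list-of-lists preference format are assumptions for illustration, and every preference list is assumed to rank all $n$ nodes on the other side.

```python
from collections import deque


def gale_shapley(men_prefs, women_prefs):
    """Proposal algorithm (illustrative sketch).

    men_prefs[u]   = list of women in u's order of preference, best first.
    women_prefs[v] = list of men in v's order of preference, best first.
    Returns a dict mapping each woman to the man she ends up with.
    """
    n = len(men_prefs)
    # rank[v][u] = position of man u in woman v's list (lower is better)
    rank = [{u: r for r, u in enumerate(prefs)} for prefs in women_prefs]
    next_choice = [0] * n   # next index in each man's list to propose to
    fiance = [None] * n     # fiance[v] = man whose proposal v currently holds
    free = deque(range(n))

    while free:
        u = free.popleft()
        v = men_prefs[u][next_choice[u]]  # best woman who has not rejected u yet
        next_choice[u] += 1
        if fiance[v] is None:
            fiance[v] = u
        elif rank[v][u] < rank[v][fiance[v]]:
            free.append(fiance[v])        # v trades up, her old partner is free again
            fiance[v] = u
        else:
            free.append(u)                # v rejects u
    return {v: fiance[v] for v in range(n)}


men = [[0, 1, 2], [1, 0, 2], [0, 1, 2]]
women = [[1, 0, 2], [0, 1, 2], [0, 1, 2]]
print(gale_shapley(men, women))
```

Precomputing `rank` makes each comparison constant time, so the whole run costs $O(n^2)$, in line with the bound on the number of proposals.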

Theorem: this algorithm terminates with a stable matching after $\leq n^2$ iterations

Proof:
1. Each man makes $\leq n$ proposals $\implies$ $\leq n^2$ iterations
2. terminates with a perfect matching.

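Assume for contradiction that some man has been rejected by all women (no perfect matching). A woman only rejects a proposal once she holds a better one, and from then on she is always engaged,

$\implies$ all $n$ women are engaged at the conclusion of the algorithm

$\implies$ all $n$ men are engaged as well, which is a contradiction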

3. terminates with a stable matching.

Consider some $u, v$ not matched to each other. There are two cases

Case 1: $u$ never proposed to $v$

$\implies$ $u$ matched to someone he prefers over $v$

Case 2: $u$ proposed to $v$

$\implies$ $v$ received a preferable offer and ends up with someone she prefers to $u$

# Bipartite Matching

Consider a bipartite graph $G = (U, V, E)$, where $U$ and $V$ are sets of nodes, $E$ is the set of edges, and each $e \in E$ has one endpoint in $U$ and one endpoint in $V$.

The goal is to compute a matching $M \subseteq E$ of maximum size.

This problem is solvable in polynomial time via a reduction to the maximum flow problem.
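One common way to compute such a matching is to search for augmenting paths directly, which corresponds to running an augmenting-path flow algorithm on the unit-capacity network from the reduction. The sketch below does this; the name `max_bipartite_matching` and the adjacency-list input are assumptions for illustration, and the recursion makes it suitable for small graphs only.

```python
def max_bipartite_matching(adj, n_right):
    """Maximum matching via augmenting paths (illustrative sketch).

    adj[u] = list of right-side vertices (0..n_right-1) adjacent to left vertex u.
    Returns (size, match_right) where match_right[v] is the left vertex matched to v, or None.
    """
    match_right = [None] * n_right

    def try_augment(u, seen):
        # Look for an augmenting path starting at the left vertex u.
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current partner can be re-matched elsewhere
            if match_right[v] is None or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    size = sum(try_augment(u, set()) for u in range(len(adj)))
    return size, match_right


size, match_right = max_bipartite_matching([[0, 1], [0], [1, 2]], 3)
print(size, match_right)
```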

# Maximum Flow Problem

We are given a directed graph $G = (V,E)$, a source vertex $s$, a sink vertex $t$, and a capacity $u_e$ for each edge $e \in E$.

Compute the $s \rightarrow t$ flow that sends as much "flow" as possible, respecting the edge capacities and the conservation of flow.

This is solvable in polynomial time via greedy algorithms based on "augmenting paths".
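One well-known augmenting-path method is the Edmonds-Karp algorithm, which always augments along a shortest path in the residual graph. It is sketched below as one possible instance of the idea, not as the specific algorithm these notes have in mind; the name `max_flow`, the 0-indexed vertices and the `(tail, head, capacity)` edge format are assumptions, and $s \neq t$ is assumed.

```python
from collections import deque


def max_flow(n, edges, s, t):
    """Edmonds-Karp maximum flow value (illustrative sketch, assumes s != t).

    edges: list of (tail, head, capacity) on vertices 0..n-1.
    """
    residual = [dict() for _ in range(n)]
    for u, v, cap in edges:
        residual[u][v] = residual[u].get(v, 0) + cap
        residual[v].setdefault(u, 0)  # reverse edge, used to undo flow

    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total  # no augmenting path left

        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


edges = [(0, 1, 3), (0, 2, 2), (1, 2, 1), (1, 3, 2), (2, 3, 3)]
print(max_flow(4, edges, 0, 3))
```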

## Selfish Flow

Given a flow network, we can define a delay function on each edge such that we are able to compute travel time as a function of edge load.

See Braess's paradox. I think I've seen a Steve Mould video on strings and springs about this.

# Linear Programming

Generally we aim to optimise a linear function over the intersection of halfspaces (linear inequalities).

This generalizes the maximum flow problem and a ton of other problems. We can solve linear programs efficiently.
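As an illustration, the maximum flow problem from the earlier section can be written as a linear program. This is a sketch of the standard formulation, with one variable $f_e$ per edge (a symbol introduced here for illustration) and assuming that no edges enter $s$:
$$
\begin{aligned}
\text{maximise} \quad & \sum_{e \text{ out of } s}{f_e} \\
\text{subject to} \quad & \sum_{e \text{ into } v}{f_e} = \sum_{e \text{ out of } v}{f_e} \quad \forall v \in V \setminus \{s, t\} \\
& 0 \leq f_e \leq u_e \quad \forall e \in E
\end{aligned}
$$
The objective and every constraint are linear in the $f_e$, so this is a linear program.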

Extensions: 
1. Convex programming - polynomial time solvable under mild conditions
2. Integer programming - $NP$ hard in general

# Optional Theory Problems

## 1

Prove that in graphs with positive integer edge weights, the local search algorithm for the maximum cut problem is not guaranteed to converge in a polynomial number of iterations.

# Programming Assignment

## 1

In this assignment you will implement one or more algorithms for the 2SAT problem.

Here are 6 different 2SAT instances:

Week 4 PA/

1. 2sat1.txt
2. 2sat2.txt
3. 2sat3.txt
4. 2sat4.txt
5. 2sat5.txt
6. 2sat6.txt  

In each instance, the number of variables and the number of clauses is the same, and this number is specified on the first line of the file. 

Each subsequent line specifies a clause via its two literals, with a number denoting the variable and a "-" sign denoting logical "not". 

For example, the second line of the first data file is "-16808 75250", which indicates the clause $\neg x_{16808} \vee x_{75250}$.

Your task is to determine which of the 6 instances are satisfiable, and which are unsatisfiable.

In the box below, enter a 6-bit string, where the ith bit should be 1 if the $i^{th}$ instance is satisfiable, and 0 otherwise. For example, if you think that the first 3 instances are satisfiable and the last 3 are not, then you should enter the string 111000 in the box below.

DISCUSSION: 

This assignment is deliberately open-ended, and you can implement whichever 2SAT algorithm you want. For example, 2SAT reduces to computing the strongly connected components of a suitable graph (with two vertices per variable and two directed edges per clause, you should think through the details). This might be an especially attractive option for those of you who coded up an SCC algorithm in Part 2 of this specialization. Alternatively, you can use Papadimitriou's randomized local search algorithm. (The algorithm from lecture is probably too slow as stated, so you might want to make one or more simple modifications to it --- even if this means breaking the analysis given in lecture --- to ensure that it runs in a reasonable amount of time.) A third approach is via backtracking.  In lecture we mentioned this approach only in passing; see Chapter 9 of the Dasgupta-Papadimitriou-Vazirani book, for example, for more details.

## Solution

We will implement the SCC solution. We use the equivalence between $a \vee b$ and
$$
(\neg a \implies b) \wedge (\neg b \implies a)
$$
That is, if one of the variables is false then the other variable must be true.

Let's consider the first few entries of 2sat1.txt
$$
\neg x_{16808} \vee x_{75250} \\[10pt]

x_{43659} \vee x_{8931} \\[10pt]

\neg x_{27545} \vee \neg x_{50879} \\[10pt]
$$

This means that we have the implications
$$
x_{16808} \implies x_{75250} \quad \neg x_{75250} \implies \neg x_{16808} \\[10pt]

\neg x_{43659} \implies x_{8931} \quad \neg x_{8931} \implies x_{43659} \\[10pt]

x_{27545} \implies \neg x_{50879} \quad x_{50879} \implies \neg x_{27545}
$$

We can then create a graph whose nodes are $x_i$ and $\neg x_i$, and whose directed edges are the implications listed above.

The satisfiability of the 2-SAT instance then reduces to computing strongly connected components.

If for any $x_i$ there is a path from $x_i$ to $\neg x_i$ and a path from $\neg x_i$ to $x_i$ (that is, both lie in the same SCC), then we have a contradiction and the instance is unsatisfiable.

Otherwise, once we find all SCCs we can construct a solution, since every literal in an SCC must take the same boolean value. We sort the SCCs in topological order and assign $x$ the value $\text{false}$ if its SCC precedes that of $\neg x$, and $\text{true}$ otherwise. This never makes an implication false, since
$$
\begin{array} {c|c|c}
    a & b & a \implies b \\ 
    \hline \text{T} & \text{T} & \text{T} \\ 
    \hline \text{T} & \text{F} & \text{F} \\
    \hline \text{F} & \text{T} & \text{T} \\
    \hline \text{F} & \text{F} & \text{T} \\
\end{array}
$$
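The assignment rule described above can be sketched as a small self-contained function. This is an illustrative sketch that is separate from the solution code in these notes: the name `two_sat_assignment` and the clause format (pairs of signed integers) are assumptions, and it uses a standard two-pass SCC computation in which components are discovered in topological order of the implication graph.

```python
def two_sat_assignment(n, clauses):
    """Return a satisfying assignment {variable: bool}, or None if unsatisfiable.

    clauses: list of (a, b) pairs of nonzero ints; k means x_k, -k means not x_k.
    """
    literals = [lit for v in range(1, n + 1) for lit in (v, -v)]
    g = {lit: [] for lit in literals}
    g_rev = {lit: [] for lit in literals}
    for a, b in clauses:
        # (a or b) is equivalent to (not a => b) and (not b => a)
        for src, dst in ((-a, b), (-b, a)):
            g[src].append(dst)
            g_rev[dst].append(src)

    def dfs_postorder(graph, start, seen, out):
        # Iterative DFS that appends nodes to `out` in finishing order.
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if child not in seen:
                    seen.add(child)
                    stack.append((child, iter(graph[child])))
                    break
            else:
                stack.pop()
                out.append(node)

    # Pass 1: finishing order on the implication graph.
    order, seen = [], set()
    for lit in literals:
        if lit not in seen:
            dfs_postorder(g, lit, seen, order)

    # Pass 2: reversed graph, decreasing finishing time. Components come out
    # in topological order of the implication graph (sources first).
    comp, seen, count = {}, set(), 0
    for lit in reversed(order):
        if lit not in seen:
            members = []
            dfs_postorder(g_rev, lit, seen, members)
            for member in members:
                comp[member] = count
            count += 1

    assignment = {}
    for v in range(1, n + 1):
        if comp[v] == comp[-v]:
            return None  # x_v and not x_v imply each other
        assignment[v] = comp[v] > comp[-v]  # false if x_v's SCC precedes that of not x_v
    return assignment


print(two_sat_assignment(3, [(1, 2), (-1, 3), (-2, -3), (1, -3)]))
```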

```python
class Stack:
    """LIFO stack; iterating over it pops elements until it is empty."""

    def __init__(self):
        self.arr = []
        self.length = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.length == 0:
            raise StopIteration
        return self.pop()

    def push(self, elm):
        self.length += 1
        self.arr.append(elm)

    def get(self):
        # peek at the top element without removing it
        return self.arr[-1]

    def pop(self):
        self.length -= 1
        return self.arr.pop()
```

```python
def load_data(filename: str):
    """Build the implication graph and its reverse from a 2-SAT instance file."""
    with open(filename, 'r') as f:

        num_nodes = int(next(f))

        g = {i: [] for i in range(-num_nodes, num_nodes+1) if i != 0}
        g_rv = {i: [] for i in range(-num_nodes, num_nodes+1) if i != 0}

        for line in f:
            items = line.split()
            if not items:
                continue  # skip blank lines

            # process a disjunction (a or b): it yields the implications
            # (not a => b) and (not b => a)
            a = int(items[0])
            b = int(items[1])

            g[-a].append(b)
            g[-b].append(a)

            g_rv[b].append(-a)
            g_rv[a].append(-b)

    return g, g_rv, num_nodes
```

```python
print_cycles = 1000  # report progress every `print_cycles` explored nodes


def dfs1_loop(num_nodes: int, g_rv: dict[int, list[int]]):
    """First pass: DFS over the reversed graph, recording nodes in finishing order."""
    f_order = Stack()

    explored = {i: False for i in range(-num_nodes, num_nodes+1) if i != 0}
    child_iters = {}  # per-node iterator, so each adjacency list is scanned once
    num_explored = 0
    cycle = 0

    for i in range(-num_nodes, num_nodes+1):
        if i == 0 or explored[i]:
            continue

        nodeStack = Stack()
        nodeStack.push(i)

        while nodeStack.length > 0:

            if num_explored // print_cycles > cycle:
                cycle += 1
                completion = (num_explored / (2*num_nodes)) * 100
                print(f'\t dfs 1 progress: {completion:.2f}%, nodes: {num_explored}', end='\r')

            node = nodeStack.get()

            if not explored[node]:
                explored[node] = True
                num_explored += 1
                child_iters[node] = iter(g_rv[node])

            # Descend into one unexplored child at a time. Pushing and marking
            # every child at once does not give a true DFS finishing order.
            for child in child_iters[node]:
                if not explored[child]:
                    nodeStack.push(child)
                    break
            else:
                # no unexplored children left: the node is finished
                f_order.push(nodeStack.pop())

    completion = (num_explored / (2*num_nodes)) * 100
    print(f'\t dfs 1 progress: {completion:.2f}%, nodes: {num_explored}', end='\n')

    return f_order


def dfs2_loop(num_nodes: int, g: dict[int, list[int]], f_order: Stack):
    """Second pass: DFS over the graph in decreasing finishing order, labelling each node with its SCC leader."""

    explored = {i: False for i in range(-num_nodes, num_nodes+1) if i != 0}
    child_iters = {}
    num_explored = 0
    cycle = 0

    leaders: dict[int, int] = {}

    # iterating over the stack pops nodes, latest finishing time first
    for source in f_order:
        if explored[source]:
            continue

        nodeStack = Stack()
        nodeStack.push(source)

        while nodeStack.length > 0:

            if num_explored // print_cycles > cycle:
                cycle += 1
                completion = (num_explored / (2*num_nodes)) * 100
                print(f'\t dfs 2 progress: {completion:.2f}%, nodes: {num_explored}', end='\r')

            node = nodeStack.get()

            if not explored[node]:
                explored[node] = True
                num_explored += 1
                child_iters[node] = iter(g[node])

            for child in child_iters[node]:
                if not explored[child]:
                    nodeStack.push(child)
                    break
            else:
                # everything reached from `source` belongs to its SCC
                leaders[nodeStack.pop()] = source

    completion = (num_explored / (2*num_nodes)) * 100
    print(f'\t dfs 2 progress: {completion:.2f}%, nodes: {num_explored}', end='\n')

    return leaders


def solve_2SAT(g: dict[int, list[int]], g_rv: dict[int, list[int]], num_nodes: int):
    f_order = dfs1_loop(num_nodes, g_rv)
    leaders = dfs2_loop(num_nodes, g, f_order)

    # unsatisfiable exactly when some x_i shares an SCC with its negation
    for i in range(1, num_nodes+1):
        if leaders[i] == leaders[-i]:
            return False
    return True
```


```python
g, g_rv, num_nodes = load_data('Week 4 PA/2sat4.txt')
solve_2SAT(g, g_rv, num_nodes)
```

```
	 dfs 1 progress: 100.00%, nodes: 1200000
	 dfs 2 progress: 100.00%, nodes: 1200000

True
```

```python
import os

def solve_all():

    grading = {}
    # sort the file names so the instances are solved in a fixed order
    for file in sorted(os.listdir('Week 4 PA')):
        print(f"solving: {file} ---->")
        g, g_rv, num_nodes = load_data(os.path.join('Week 4 PA', file))
        passed = solve_2SAT(g, g_rv, num_nodes)
        grading[file] = passed

    return grading

solve_all()
```

```
solving: 2sat1.txt ---->
	 dfs 1 progress: 100.00%, nodes: 200000
	 dfs 2 progress: 100.00%, nodes: 200000
solving: 2sat2.txt ---->
	 dfs 1 progress: 100.00%, nodes: 400000
	 dfs 2 progress: 100.00%, nodes: 400000
solving: 2sat3.txt ---->
	 dfs 1 progress: 100.00%, nodes: 800000
	 dfs 2 progress: 100.00%, nodes: 800000
solving: 2sat4.txt ---->
	 dfs 1 progress: 100.00%, nodes: 1200000
	 dfs 2 progress: 100.00%, nodes: 1200000
solving: 2sat5.txt ---->
	 dfs 1 progress: 100.00%, nodes: 1600000
	 dfs 2 progress: 100.00%, nodes: 1600000
solving: 2sat6.txt ---->
	 dfs 1 progress: 100.00%, nodes: 2000000
	 dfs 2 progress: 100.00%, nodes: 2000000

{'2sat1.txt': True,
 '2sat2.txt': False,
 '2sat3.txt': True,
 '2sat4.txt': True,
 '2sat5.txt': False,
 '2sat6.txt': False}
```