In [1]:
# setup
from IPython.display import display, HTML
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
display(HTML(open('rise.css').read()))

# imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", font_scale=1.5, rc={'figure.figsize':(12, 6)})
import time


# CMPS 2200
# Introduction to Algorithms

## Recurrences (Example)
 

### Tree Method [Summary]

$$ \begin{equation}
W(n) = \begin{cases}
  \mathcal{O}(c_b), & \text{if $n=c$} \\
  \alpha W(\frac{n}{\beta}) + \mathcal{O}(f(n)) \leq  \alpha W(\frac{n}{\beta}) + c_1f(n) + c_2, & \text{otherwise} 
  \end{cases}
\end{equation}$$


$$ \begin{equation}
S(n) = \begin{cases}
  \mathcal{O}(c_b), & \text{if $n=c$} \\
  S(\frac{n}{\beta}) + \mathcal{O}(f_1(n)) \leq  S(\frac{n}{\beta}) + c_3f_1(n) + c_4, & \text{otherwise} 
  \end{cases}
\end{equation}$$


> Step 1. What is the **input size** on level $i$? 

$$\frac{n}{\beta^i}$$

>Step 2. What is the **cost** of each node on level $i$? 

$$c_1f(\frac{n}{\beta^i})+c_2$$

>Step 3. How many nodes are there on level $i$? 

$$\alpha^i$$


>Step 4. What is the total cost across the level $i$?

$$\alpha^i\big(c_1f(\frac{n}{\beta^i})+c_2\big)$$

>Step 5. How many levels are there in the tree [**Tree Height/Depth**]? Note that span grows with tree depth.

$$\frac{n}{\beta^i} = c~~~ \Rightarrow ~~~ i = \log_{\beta}n ~~ \text{(taking } c = 1\text{)}$$

 

>Step 6. What is the total cost? 

$$\sum_{i=0}^{\log_{\beta}n}\alpha^i\big(c_1f(\frac{n}{\beta^i})+c_2\big)$$

Note that the leaf level has $\alpha^{\log_\beta n}$ nodes, or equivalently, $n^{\log_\beta \alpha}$.
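
The six steps can be checked numerically. The following is a minimal sketch, assuming $c = 1$, $c_1 = 1$, $c_2 = 0$, $\beta \geq 2$, and $n$ an exact power of $\beta$. The helper names `level_costs` and `tree_total` are hypothetical and not part of the course code.

```python
def level_costs(n, alpha, beta, f):
    """Return the total cost of each level of the recursion tree, root first.

    Assumptions for this sketch: c1 = 1, c2 = 0, base-case size c = 1,
    beta >= 2, and n is an exact power of beta.
    """
    costs = []
    size, level = n, 0
    while size >= 1:
        # Step 4: (number of nodes on this level) * (cost of one node)
        costs.append(alpha**level * f(size))
        size //= beta   # Step 1: the input shrinks by a factor of beta
        level += 1
    return costs


def tree_total(n, alpha, beta, f):
    """Step 6: sum the level costs over all log_beta(n) + 1 levels."""
    return sum(level_costs(n, alpha, beta, f))


# W(n) = 2 W(n/2) + n with n = 8: four levels, each costing 8
print(level_costs(8, 2, 2, lambda m: m))   # [8, 8, 8, 8]
print(tree_total(8, 2, 2, lambda m: m))    # 32
```

The last entry of `level_costs` is the leaf level, so with $f(1) = 1$ it equals $\alpha^{\log_\beta n}$.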




### Brick method [Summary]


- **Root dominated**: cost *decreases* geometrically as we descend the tree
  - $W(n) \in O(\hbox{root})$
  - e.g., $W(n) = 2 W(\frac{n}{2}) + n^2$


- **Leaf dominated**: cost *increases* geometrically as we descend the tree
  - $W(n) \in O(\hbox{leaves})$
  - e.g., $W(n) = 2 W(\frac{n}{2}) + \sqrt{n}$
  
  
- **Balanced**: neither of the above is true
  - $W(n) \in O(\hbox{(num levels)*(max cost at any level)})$
  - e.g., $W(n) = W(n-1) + n \in O(n^2)$
  - e.g., $W(n) = 2 W(\frac{n}{2}) + n \in O(n \lg n)$
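
One way to get a feel for the three cases is to compare the cost of the root with the total cost of level 1. The sketch below is a heuristic only, not a proof: it looks at just the first two levels, which happens to be enough for the polynomial examples above. The function name `classify` and the default probe size `n = 2**20` are assumptions made for illustration.

```python
import math


def classify(alpha, beta, f, n=2**20):
    """Heuristically classify W(n) = alpha * W(n / beta) + f(n).

    Compares the root cost f(n) with the level-1 cost alpha * f(n / beta).
    This only inspects two levels, so treat the answer as a hint.
    """
    root = f(n)
    level1 = alpha * f(n / beta)
    ratio = level1 / root
    if math.isclose(ratio, 1.0):
        return "balanced"
    return "leaf-dominated" if ratio > 1 else "root-dominated"


print(classify(2, 2, lambda m: m**2))          # root-dominated
print(classify(2, 2, lambda m: math.sqrt(m)))  # leaf-dominated
print(classify(2, 2, lambda m: m))             # balanced
```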

More examples (some trickier than others): **Pay Attention to Base Case of Recurrence 3/4**

$$ W(n) = W(n - 1) + n $$ 

$$S(n) = S(n/2)+\lg n$$

$$ W(n) = \sqrt{n} W(\sqrt{n}) + n^2 $$

$$ W(n) = W(\sqrt{n}) + W(n/2) + n $$

$$ W(n) = W(n/2) + W(n/3) + 1 $$



$$ W(n) = W(n - 1) + n $$

This is a balanced recurrence: every level costs at most $n$ and there are $n$ levels, so the recurrence is $O(n^2)$. 

$$ S(n) = S(n/2) + \lg n $$

This is also a balanced recurrence: every level costs at most $\lg n$ and there are $\lg n$ levels, so the recurrence is $O(\lg^2 n)$. 

$$ W(n) = \sqrt{n} W(\sqrt{n}) + n^2 $$

This is root-dominated so it is $O(n^2)$.

$$ W(n) = W(\sqrt{n}) + W(n/2) + n $$

This is root-dominated so it is $O(n)$.

$$ W(n) = W(n/2) + W(n/3) + 1 $$

This recurrence is a little tricky: it is leaf-dominated, but we still need to calculate the number of leaves.
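
Before deriving a closed form, the leaves can simply be counted. The sketch below assumes a base case of $W(n) = 1$ for $n \leq 1$ and rounds $n/2$ and $n/3$ down; both choices are assumptions for illustration, and the names `leaves` and `work` are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def leaves(n):
    """Number of leaves in the tree for W(n) = W(n/2) + W(n/3) + 1."""
    if n <= 1:          # assumed base case
        return 1
    return leaves(n // 2) + leaves(n // 3)


@lru_cache(maxsize=None)
def work(n):
    """Total work: every node, internal or leaf, costs 1."""
    if n <= 1:
        return 1
    return work(n // 2) + work(n // 3) + 1


for n in (10, 100, 1000, 10**6):
    print(n, leaves(n), work(n))
```

Because every internal node has exactly two children and every node costs 1, `work(n)` equals `2 * leaves(n) - 1`, which is the sense in which the leaves determine the total. The printed counts also stay below $n$, so the work is sublinear under these assumptions.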



### Integer multiplication

Now that we've come up with a general technique for solving recurrences, let's look at a recursive algorithm. You learned this algorithm in elementary school for integer multiplication:

- Input: $n$ bit integers $x= \langle x_{n-1}, \ldots, x_0\rangle$ and $y = \langle y_{n-1}, \ldots, y_0\rangle$

- Output: $x \cdot y$

- Example: '1001'$\times$'1101'


In [5]:
def int2binary(n):
    return list('{0:b}'.format(n))
 
int2binary(9)

['1', '0', '0', '1']

In [6]:
nine = int2binary(9)
print(nine)
thirteen = int2binary(13)
print(thirteen)  

['1', '0', '0', '1']
['1', '1', '0', '1']


```
       1001   
     x 1101   
     ======
       1001   
      0000    
     1001     
  + 1001      
  =========
    1110101   (117)
```

In [7]:
def binary2int(n): 
    return int(n, 2)
binary2int('1110101')

117

What is the work of the "elementary school" algorithm?

- For two $n$-digit inputs, the work is $O(n^2)$, since for each digit of $x$ we might add a stack of $n$ bits. The total number of bits in the solution is at most $2n$.
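
The shift-and-add layout above can be sketched as follows. This is an illustration only: it takes the inputs as binary strings, and it leans on Python's built-in integers for the shifts and additions rather than manipulating individual bits. The name `gradeschool_multiply` is hypothetical.

```python
def gradeschool_multiply(x_bits, y_bits):
    """Multiply two binary strings with the shift-and-add method.

    For each 1 bit of y (from the least significant end), add a copy
    of x shifted left by that bit's position.
    """
    x = int(x_bits, 2)
    total = 0
    for shift, bit in enumerate(reversed(y_bits)):
        if bit == '1':
            total += x << shift   # one O(n)-bit addition per bit of y
    return total


product = gradeschool_multiply('1001', '1101')
print(product, format(product, 'b'))   # 117 1110101
```

There are $n$ bits in $y$ and each addition touches $O(n)$ bits, which is where the $O(n^2)$ work comes from.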

What does this have to do with recursion and recurrences?

Instead of the elementary school algorithm, consider splitting each $n$-digit input in half. Can we multiply recursively?

Let $x = \langle x_L, x_R\rangle$, $y = \langle y_L, y_R\rangle$. Then,

$\begin{align} 
x &=& 2^{n/2} x_L + x_R \:\:\:\:\:\: \hbox{e.g.,} \: 1001:  2^2 (10) + (01)\\
y &=& 2^{n/2} y_L + y_R \:\:\:\:\:\: \hbox{e.g.,} \: 1101:  2^2 (11) + (01)\\
\end{align}
$

<br><br>

**Wait...Is multiplying by $2^{n/2}$ efficient?**
https://en.wikipedia.org/wiki/Bitwise_operation
<br><br>

Recall: $2^2 [10] \rightarrow [1000] \:\:$ (shift two places to the left).
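
In Python, the same shift can be written with the `<<` operator. The helper name `shift_left` is hypothetical and only wraps the operator for illustration.

```python
def shift_left(x, k):
    """Multiply x by 2**k by shifting its bits k places to the left."""
    return x << k


print(format(shift_left(0b10, 2), 'b'))   # 1000
```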

<br>

So then,

$\begin{align}
x\cdot y &=& (2^{n/2} x_L + x_R)(2^{n/2} y_L + y_R) \\
 &=& 2^n (x_L \cdot y_L) + 2^{n/2} (x_L \cdot y_R + x_R \cdot y_L) + (x_R \cdot y_R) \\
\end{align}
$

<br>

We've converted one multiplication of sizes $(n,n)$ into four multiplications of size $(\frac{n}{2}, \frac{n}{2})$.

What recursive algorithm and recurrence does this observation suggest?

>$W(n) = 4W(n/2) + cn$
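
A minimal sketch of the recursive algorithm with four multiplications, assuming nonnegative inputs of at most $n$ bits with $n$ a power of 2. The names `split` and `multiply4` are hypothetical, and Python integers stand in for the bit sequences.

```python
def split(x, half):
    """Split x into (x_L, x_R) so that x = 2**half * x_L + x_R."""
    return x >> half, x & ((1 << half) - 1)


def multiply4(x, y, n):
    """Multiply nonnegative ints x and y of at most n bits.

    Assumes n is a power of 2. Makes four recursive calls on halves,
    then combines them with shifts and additions (the O(n) term).
    """
    if n == 1:
        return x & y          # product of two single bits
    half = n // 2
    x_L, x_R = split(x, half)
    y_L, y_R = split(y, half)
    ll = multiply4(x_L, y_L, half)
    lr = multiply4(x_L, y_R, half)
    rl = multiply4(x_R, y_L, half)
    rr = multiply4(x_R, y_R, half)
    return (ll << n) + ((lr + rl) << half) + rr


print(multiply4(0b1001, 0b1101, 4))   # 117
```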

What is the solution to this recurrence using the brick method? Is it root-dominated, or leaf-dominated?

### work of recursive multiplication

$C(\hbox{root}) = n$

$C(\hbox{level}\:1) = 4(\frac{n}{2})= 2 \cdot n$

geometrically **increasing** as we descend the recurrence tree.

A recurrence is **leaf-dominated** if there is an $\alpha > 1$ such that for all $v$:

$$C(v) \leq \frac{1}{\alpha} \sum_{u \in D(v)} C(u)$$

let $\alpha \leftarrow 2$

$n \le \frac{1}{2}\cdot 2 \cdot n$

<br>

The cost of a leaf dominated recurrence is $O(L)$, where $L$ is the number of leaves.

### how many leaf nodes are there?

<br><br><br>


nodes per level: $1, 4, 16, \ldots 4^i \ldots 4^{\log_2 n}=(2\cdot2)^{\log_2 n}=n\cdot n = n^2$

> This is a leaf-dominated recurrence that is $O(n^2)$ -- the same as the elementary school algorithm!
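
The leaf count and the total work can be checked by evaluating the recurrence directly. The sketch below assumes $c = 1$, a base case of $W(1) = 1$, and $n$ a power of 2; the name `work4` is hypothetical. Under these assumptions the total works out to exactly $2n^2 - n$.

```python
def work4(n):
    """Evaluate W(n) = 4 W(n/2) + n with W(1) = 1 (n a power of 2)."""
    if n == 1:
        return 1
    return 4 * work4(n // 2) + n


for k in range(6):
    n = 2**k
    num_leaves = 4**k          # 4^(log2 n), which equals n^2
    print(n, num_leaves, n**2, work4(n))
```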

Now, what is the span of this algorithm if implemented in parallel?


### span of recursive multiplication

Assuming each multiplication can be done in parallel, and that addition has span $O(n)$, we get that 

$$S(n) = S(n/2) + cn$$ 

which yields a span of ...


<br><br><br><br>

**brick method**

$S(\hbox{root})=n$


$S(\hbox{level 1}) = \frac{n}{2}$

$\rightarrow$ **root dominated** 

$S(n) \in O(n)$. This is much better than the span of the grade school algorithm, which is $O(n^2)$!
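
As a quick check, the span recurrence can be evaluated directly. The sketch assumes $c = 1$, a base case of $S(1) = 1$, and $n$ a power of 2; the name `span` is hypothetical. With these constants the sum $n + n/2 + \cdots + 1$ is exactly $2n - 1$.

```python
def span(n):
    """Evaluate S(n) = S(n/2) + n with S(1) = 1 (n a power of 2)."""
    if n == 1:
        return 1
    return span(n // 2) + n


for k in range(6):
    n = 2**k
    print(n, span(n), 2 * n - 1)
```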

<br>

**What parallelism is achieved?**

<br>

Parallelism $= \frac{W}{S} = \frac{n^2}{n} = n$

Can we do any better?

### recursive multiplication with less work

Recall that 

$\begin{align}
x\cdot y &=& (2^{n/2} x_L + x_R)(2^{n/2} y_L + y_R) \\
 &=& 2^n (x_L \cdot y_L) + 2^{n/2} (x_L \cdot y_R + x_R \cdot y_L) + (x_R \cdot y_R) \\
\end{align}
$

<br>

Can we reduce this from 4 multiplications to 3??

<br><br><br><br>

Observation:
    
$\begin{align} 
(x_L + x_R)\cdot (y_L + y_R)=(x_L\cdot y_L) + (x_L\cdot y_R) + (x_R\cdot y_L) + (x_R\cdot y_R)\\
\end{align}
$

$\begin{align}
(x_L\cdot y_R) + (x_R\cdot y_L) = (x_L + x_R)\cdot (y_L + y_R) - (x_L\cdot y_L) - (x_R\cdot y_R)\\
\end{align}$




How does our observation help us? 

If we calculate $(x_L\cdot y_L)$, $(x_R\cdot y_R)$, and $(x_L + x_R)\cdot (y_L + y_R)$, that is *three* recursive multiplications. 



So with 3 recursive multiplications and two more "additions", we then get that $W(n) = 3W(n/2) + dn$. What is the running time?
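
A minimal sketch of the three-multiplication algorithm, assuming nonnegative Python integers as inputs. The function name `karatsuba` and the choice of base case (stop when either operand is 0 or 1) are assumptions for illustration, not a tuned implementation.

```python
def karatsuba(x, y):
    """Multiply nonnegative ints using three recursive multiplications."""
    if x < 2 or y < 2:
        return x * y                      # one operand is a single bit
    n = max(x.bit_length(), y.bit_length())
    half = n // 2
    mask = (1 << half) - 1
    x_L, x_R = x >> half, x & mask        # x = 2**half * x_L + x_R
    y_L, y_R = y >> half, y & mask        # y = 2**half * y_L + y_R
    ll = karatsuba(x_L, y_L)
    rr = karatsuba(x_R, y_R)
    # (x_L + x_R)(y_L + y_R) - x_L*y_L - x_R*y_R = x_L*y_R + x_R*y_L
    cross = karatsuba(x_L + x_R, y_L + y_R) - ll - rr
    return (ll << (2 * half)) + (cross << half) + rr


print(karatsuba(0b1001, 0b1101))   # 117
```

The subtraction on the `cross` line is the observation above: the middle term is recovered from one extra multiplication instead of two.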

### work of $W(n) = 3W(n/2) + dn$

**brick method**

$C(\hbox{root}) = n$

$C(\hbox{level 1}) = \frac{3n}{2}$


<br><br><br><br>

$\frac{3}{2} > 1 \Rightarrow$ **leaf dominated**.

But, there are fewer leaves this time. Why?

<br><br><br><br>

nodes per level: $1, 3, 9, \ldots 3^i \ldots 3^{\lg n}=n^{\lg 3} \:\:\:\: (\hbox{by}\: a^{\log_b c} = c^{\log_b a})$


<br><br>


Using the brick method, this is still a leaf-dominated recurrence, and thus the running time is $O(n^{\log_2 3})$ (approximately $O(n^{1.58})$, versus $O(n^2)$ earlier).
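
The same numeric check works here. The sketch assumes $d = 1$, a base case of $W(1) = 1$, and $n = 2^k$; the name `work3` is hypothetical. Under these assumptions the total is exactly $3^{k+1} - 2^{k+1}$, which is $3n^{\lg 3} - 2n$.

```python
import math


def work3(n):
    """Evaluate W(n) = 3 W(n/2) + n with W(1) = 1 (n a power of 2)."""
    if n == 1:
        return 1
    return 3 * work3(n // 2) + n


for k in range(6):
    n = 2**k
    num_leaves = 3**k                      # 3^(lg n)
    print(n, num_leaves, round(n ** math.log2(3)), work3(n))
```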

<br>

This is known as the [**Karatsuba-Ofman**](https://en.wikipedia.org/wiki/Karatsuba_algorithm) algorithm (1962), and is the earliest known divide-and-conquer algorithm for integer multiplication. It is actually implemented in Intel hardware!


<br><br>

So, we've decreased work from $O(n^2)$ to $O(n^{\log_2 3})$. 

Span stays the same: $O(n)\:\:$  Why?

<br><br>

Parallelism $= \frac{W}{S} = \frac{n^{\log_2 3}}{n} \approx n^{.58} < n$
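
To compare the two algorithms side by side, the work and span recurrences can be evaluated for a concrete $n$. The sketch assumes unit constants, base cases of 1, and $n$ a power of 2; the names `work` and `span` are hypothetical. The span function has no `branches` parameter because the recursive multiplications are assumed to run in parallel, so only one of them appears on the critical path.

```python
def work(n, branches):
    """Evaluate W(n) = branches * W(n/2) + n with W(1) = 1."""
    if n == 1:
        return 1
    return branches * work(n // 2, branches) + n


def span(n):
    """Evaluate S(n) = S(n/2) + n with S(1) = 1."""
    if n == 1:
        return 1
    return span(n // 2) + n


n = 1024
for branches in (4, 3):
    print(branches, work(n, branches), span(n), work(n, branches) / span(n))
```

With these constants the four-multiplication version has parallelism exactly $n$, while the three-multiplication version does less total work over the same span.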

<br>

So, our parallelism went down. Is that good or bad?

<br><br>


Schönhage and Strassen (1971) gave an algorithm that runs in $O(n\log n \log\log n)$ time.

In 2007, [Fürer gave an algorithm](https://ivv5hpp.uni-muenster.de/u/cl/WS2007-8/mult.pdf) that runs in $n \log n \, 2^{O(\log^* n)}$ time.

What is the fastest possible sequential algorithm for integer multiplication? In parallel?
