# Divide & conquer

Divide & conquer is a method for designing algorithms that solve a problem by breaking it into smaller subproblems, which are themselves divided further until a base case is reached where the problem is small enough to solve directly. The solutions to the subproblems are then combined to build the solution to the larger problem. Merge sort, which we go over below, is a nice example of this. As you can probably guess from this description, divide & conquer is very closely tied to the concept of recursion.


## Scalar multiplication  

The first problem divide & conquer methods are applied to in {cite:p}`dasgupta2008algorithms` is integer multiplication. 

The starting point is an observation about complex numbers. To multiply two complex numbers $g=a+ib$ and $h=c+id$ we compute

$$
(a+ib)(c+id)= ac-bd + (bc+ad)i
$$

which involves 4 multiplications of real numbers. Carl Gauss, the famous mathematician, discovered that we can actually reduce this to 3 since 

$$
bc+ad = (a+b)(c+d) - ac - bd
$$

our formula now becomes

$$
\begin{align*}
(a+ib)(c+id) &= ac-bd + (bc+ad)i \\
&= ac-bd + ((a+b)(c+d) - ac - bd)i.
\end{align*}
$$

It seems like this involves more multiplications, but we actually only need to compute $ac$, $bd$ and $(a+b)(c+d)$: since $ac$ and $bd$ each appear twice in the expression, we have just 3 *unique* multiplications of real numbers.
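The trick above can be sketched in a few lines of Python. The function name `gauss_complex_mult` and its argument order are illustrative choices, not something taken from the cited text.

```python
def gauss_complex_mult(a, b, c, d):
    """Return (real, imag) of (a + ib)(c + id) using three real multiplications."""
    ac = a * c                  # first multiplication
    bd = b * d                  # second multiplication
    cross = (a + b) * (c + d)   # third multiplication: ac + ad + bc + bd
    return ac - bd, cross - ac - bd


# (1 + 2i)(3 + 4i) = -5 + 10i
print(gauss_complex_mult(1, 2, 3, 4))  # (-5, 10)
```

The saving of one multiplication is paid for with three extra additions and subtractions, which is a good trade only when multiplication is much more expensive than addition.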


This might seem like a minimal improvement, but let's see what happens when we apply recursion and switch to integer multiplication. Suppose $x$ and $y$ are $n$-bit integers where $n$ is a power of 2. Let's start by splitting $x$ and $y$ into two halves of $\frac{n}{2}$ bits each such that 

$$
x = 2^{\frac{n}{2}} x_L + x_R \quad \text{ and } \quad y = 2^{\frac{n}{2}} y_L + y_R.
$$

For example, if $x=10110110_2$ (so $n=8$) then $x_L=1011_2$, $x_R=0110_2$ and $x=(2^{\frac{n}{2}} \times 1011_2) + 0110_2$. (The subscript 2 means the number is written in base 2, i.e. binary.)

```{note}
:class: dropdown
Note that we multiply the leftmost bits by $2^{n/2}$ because they are the $n/2$ most significant bits (we write numbers with the most significant bit first). 
As an explicit example, consider the $8$-bit number 

$$
\begin{align*}
x &= 182_{10}\\
&=10110110_2 \\ 
&= (1\times2^7)+(0\times2^6)+(1\times2^5)+(1\times2^4)+(0\times2^3)+(1\times2^2)+(1\times2^1)+(0\times2^0)
\end{align*}
$$

Since the leftmost bits are the most significant, the left half contributes $2^{4}\times1011_2 =2^{\frac{8}{2}}\left[(1\times2^3)+(0\times2^2)+(1\times2^1)+(1\times2^0)\right]$ to $x$, which together with $x_R=0110_2$ satisfies the equation $2^{\frac{n}{2}}x_L+x_R=x$.
```
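The split in the example above can be checked directly with Python's shift and mask operators. The helper name `split_bits` is a hypothetical name used only for this illustration.

```python
def split_bits(x, n):
    """Split an n-bit non-negative integer into its left and right n/2-bit halves."""
    half = n // 2
    x_left = x >> half                 # drop the rightmost n/2 bits
    x_right = x & ((1 << half) - 1)    # keep only the rightmost n/2 bits
    return x_left, x_right


x = 0b10110110                         # 182 in decimal
x_L, x_R = split_bits(x, 8)
print(bin(x_L), bin(x_R))              # 0b1011 0b110
print((x_L << 4) + x_R == x)           # True
```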

Now the product of $x$ and $y$ is given by

$$
\begin{align*}
xy &= (2^{\frac{n}{2}} x_L + x_R)(2^{\frac{n}{2}} y_L + y_R)  \\
   &= 2^nx_Ly_L + 2^{\frac{n}{2}}(x_Ly_R + x_Ry_L) + x_Ry_R
\end{align*}
$$

The additions take linear time in the number of bits, and so do the multiplications by powers of 2, since each is just a left bit shift, i.e. `a << n` or `a << n//2` in `Python`. The important operations are the 4 multiplications of $\frac{n}{2}$-bit numbers. Notice that we have *divided* the problem into 4 subproblems, each of which is *half* the size, i.e. $\frac{n}{2}$ bits. We can now recursively perform the same routine for the 4 new multiplications, which further divides the problem into subproblems of smaller size. 
We can describe the runtime (time complexity) of these recursive algorithms using *recurrence relations*, which we cover in the next section. The recurrence relation for this algorithm is given by

$$
T(n) = 4T\left(\frac{n}{2}\right) + O(n)
$$

The factor of 4 appears because we create 4 new subproblems, and the argument of $T\left(\frac{n}{2}\right)$ reflects that each new subproblem has size $\frac{n}{2}$. The additional $O(n)$ term captures the linear time complexity of the additions and left bit shifts. Solving this recurrence relation gives $T(n)=O(n^2)$. This is the same complexity as the grade-school multiplication method, so this recursive algorithm offers no real improvement.

We can improve this algorithm by using the insight from Gauss that we laid out earlier. Since $x_Ly_R + x_Ry_L=(x_L+x_R)(y_L+y_R)-x_Ly_L-x_Ry_R$, we can reduce the 4 multiplications down to 3, resulting in a more efficient algorithm. 

The pseudocode for the algorithm is given below.

```{prf:algorithm} Integer Multiplication
:class: dropdown
:label: gauss-int-mult 
**procedure** $\text{gauss_int_mult}(x,y)$:

**Inputs** Given $n$-bit integers $x$ and $y$

**Output** Their integer product $xy$

1. if $n=1$: 
   1. return $xy$
2. $x_L = \text{ leftmost } \lceil \frac{n}{2} \rceil \text{ bits of } x$
3. $x_R = \text{ rightmost } \lfloor \frac{n}{2} \rfloor \text{ bits of } x$
4. $y_L = \text{ leftmost } \lceil \frac{n}{2} \rceil \text{ bits of } y$
5. $y_R = \text{ rightmost } \lfloor \frac{n}{2} \rfloor \text{ bits of } y$
6. $P_1 = \text{gauss_int_mult}(x_L,y_L)$ $\quad \triangleright$ Compute $x_Ly_L$
7. $P_2 = \text{gauss_int_mult}(x_R,y_R)$ $\quad \triangleright$ Compute $x_Ry_R$
8. $P_3 = \text{gauss_int_mult}(x_L+x_R,y_L+y_R)$ $\quad \triangleright$ Compute $(x_L+x_R)(y_L+y_R)$
9. return $2^{n} P_1 + 2^{\frac{n}{2}}(P_3 - P_1 - P_2) + P_2$ $\quad \triangleright$ Uses $P_3 - P_1 - P_2 = x_Ly_R + x_Ry_L$

```
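A minimal Python sketch of the three-multiplication recursion is given below, under a few assumptions that are not in the pseudocode: the bit length is read from the inputs with `int.bit_length()` instead of being fixed in advance, the split point is $\lfloor n/2 \rfloor$ so that odd lengths also work, and the recursion stops as soon as either factor is 0 or 1. Python's built-in `*` is already efficient for large integers, so this is purely illustrative.

```python
def gauss_int_mult(x, y):
    """Multiply non-negative integers x and y using three recursive products."""
    if x < 2 or y < 2:
        return x * y                    # base case: one factor is 0 or 1
    n = max(x.bit_length(), y.bit_length())
    half = n // 2                       # number of bits in the right halves
    mask = (1 << half) - 1
    x_L, x_R = x >> half, x & mask      # x = 2^half * x_L + x_R
    y_L, y_R = y >> half, y & mask      # y = 2^half * y_L + y_R
    p1 = gauss_int_mult(x_L, y_L)
    p2 = gauss_int_mult(x_R, y_R)
    p3 = gauss_int_mult(x_L + x_R, y_L + y_R)
    # p3 - p1 - p2 equals x_L*y_R + x_R*y_L, the middle term of the product
    return (p1 << (2 * half)) + ((p3 - p1 - p2) << half) + p2


print(gauss_int_mult(182, 77) == 182 * 77)  # True
```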

The time complexity for this algorithm is given by the recurrence relation

$$
T(n) = 3T\left(\frac{n}{2}\right) + O(n)
$$

which has the solution $T(n) = O(n^{\log_2 3}) \approx O(n^{1.59})$, a very nice improvement over $O(n^2)$.

## Solving recurrence relations

A recurrence relation, which we'll call a recurrence for short from now on, is an equation that describes a function in terms of its values on other, usually smaller, arguments. There are three main methods for solving these recurrences, described below {cite:p}`cormen2022introduction`.

### Substitution

Given a recursion-based algorithm, we guess the form of the solution to its recurrence and use induction to prove the guess is correct.
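As a worked illustration, consider the recurrence $T(n) = 2T\left(\frac{n}{2}\right) + cn$ with $T(1) = t$, where $n$ is assumed to be a power of 2 and $c$ and $t$ are hypothetical constants introduced only for this example. We guess that $T(n) \leq cn\log_2 n + tn$. The base case holds since $T(1) = t \leq 0 + t$. Assuming the guess holds for $\frac{n}{2}$, we get

$$
\begin{align*}
T(n) &= 2T\left(\frac{n}{2}\right) + cn \\
&\leq 2\left[c\frac{n}{2}\log_2\left(\frac{n}{2}\right) + t\frac{n}{2}\right] + cn \\
&= cn(\log_2 n - 1) + tn + cn \\
&= cn\log_2 n + tn,
\end{align*}
$$

so the guess holds for $n$ as well and $T(n) = O(n\log n)$.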


### Recursion-tree 

Given a recursion-based algorithm, you draw out the recursion as a tree, with the number of branches at each node being the number of subproblems created, the depth being the number of levels of recursion, and each node representing the cost of the work done at that point in the recursion. By examining the tree you then attempt to write out an expression for the time complexity by summing up the costs of the nodes.
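As a worked illustration, take the recurrence $T(n) = 3T\left(\frac{n}{2}\right) + O(n)$ and assume $n$ is a power of 2. At depth $k$ of the tree there are $3^k$ nodes, each a subproblem of size $\frac{n}{2^k}$ costing $O\left(\frac{n}{2^k}\right)$, and the tree has depth $\log_2 n$. Summing the cost level by level gives a geometric series dominated by its last term:

$$
\begin{align*}
T(n) &= \sum_{k=0}^{\log_2 n} 3^k \cdot O\left(\frac{n}{2^k}\right)
= O(n)\sum_{k=0}^{\log_2 n} \left(\frac{3}{2}\right)^k \\
&= O\left(n \left(\frac{3}{2}\right)^{\log_2 n}\right)
= O\left(3^{\log_2 n}\right)
= O\left(n^{\log_2 3}\right).
\end{align*}
$$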

### Master method

Given a recursion-based algorithm, if we can express its runtime as a recurrence of the form

$$
T(n) = aT\left(\left\lceil\frac{n}{b}\right\rceil\right) + O(n^d)
$$

where $n$ is the size of the initial problem, $a$ is the number of subproblems that we divide the problem into at each recursion level, $\frac{n}{b}$ is the size of the subproblems, and $O(n^d)$ is the time needed to combine the solutions of said subproblems into a solution for the larger problem, then we can use the *master theorem* to solve the recurrence. 

```{prf:theorem} Master Theorem
:class: dropdown
Given the recurrence 

$$
T(n) = aT\left(\left\lceil\frac{n}{b}\right\rceil\right) + O(n^d)
$$

for constants $a>0$, $b>1$ and $d\geq 0$, the solution $T(n)$ is given by

$$
T(n) = \begin{cases}
  O(n^d)  & \text{if } d > \log_b(a)\\
  O(n^d\log(n))  & \text{if } d = \log_b(a)\\
  O(n^{\log_b(a)})  & \text{if } d < \log_b(a)\\
\end{cases}
$$
The theorem and its proof are given in {cite:p}`dasgupta2008algorithms`.
```
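The three cases of the theorem can be sketched as a small Python helper. The function name `master_theorem` and its string output format are illustrative choices; the comparison with `math.isclose` is an assumption made to guard against floating-point error when computing $\log_b(a)$.

```python
import math


def master_theorem(a, b, d):
    """Return the bound on T(n) = a T(n/b) + O(n^d) as a string."""
    critical = math.log(a, b)            # log_b(a)
    if math.isclose(d, critical):
        return f"O(n^{d} log n)"         # d = log_b(a)
    if d > critical:
        return f"O(n^{d})"               # d > log_b(a)
    return f"O(n^{critical:.3f})"        # d < log_b(a)


print(master_theorem(4, 2, 1))  # O(n^2.000)
print(master_theorem(3, 2, 1))  # O(n^1.585)
print(master_theorem(2, 2, 1))  # O(n^1 log n)
```

The first two calls correspond to the four-multiplication and three-multiplication integer multiplication recurrences from the previous section.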

## Square matrix multiplication

### Naive matrix multiplication  
Let's go over an example from {cite:p}`cormen2022introduction` now. Let $A$ and $B$ be $n\times n$ matrices. Their product $C=AB$ is given by (recall that this is equivalent to taking the dot product between the rows of $A$ with the columns of $B$)

$$
C_{ij} = \sum_{k=1}^{n} A_{ik}B_{kj}
$$

where $C_{ij}$ is the element of matrix $C$ at row $i$ column $j$. The pseudocode for an algorithm that computes this formula is given below.

From the triply nested for-loops of the naive algorithm we can tell that its runtime is $\Theta(n^3)$, since the innermost statement is executed $n \cdot n \cdot n$ times. 

We can however apply the divide and conquer method to this problem by utilizing *matrix partitioning*. Recall that for two $n\times n$ matrices $A$ and $B$, if $n$ is even, we can partition the matrices as 

$$
A = \begin{bmatrix}
A_{11} & A_{12}\\
A_{21} & A_{22}
\end{bmatrix} 
\text{ and } 
B = \begin{bmatrix}
B_{11} & B_{12}\\
B_{21} & B_{22}
\end{bmatrix} 
$$

where the blocks $A_{ij}$ and $B_{ij}$ are simply $\frac{n}{2} \times \frac{n}{2}$ square submatrices. We can now express the matrix product using the standard matrix product formula, except this time we treat the blocks as if they were scalars

$$
\begin{align*}
AB &= \begin{bmatrix}
A_{11} & A_{12}\\
A_{21} & A_{22}
\end{bmatrix}
\begin{bmatrix}
B_{11} & B_{12}\\
B_{21} & B_{22} 
\end{bmatrix} \\
&= \begin{bmatrix}
A_{11}B_{11} + A_{12}B_{21} &  A_{11}B_{12} + A_{12}B_{22} \\
A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} 
\end{bmatrix}.
\end{align*}
$$

Now we can begin to see how divide and conquer can be applied. Notice that we now have 4 smaller problems, i.e. computing the 4 expressions 

$$
\begin{align*}
C_{11} &= A_{11}B_{11} + A_{12}B_{21}\\
C_{12} &= A_{11}B_{12} + A_{12}B_{22}\\
C_{21} &= A_{21}B_{11} + A_{22}B_{21}\\
C_{22} &= A_{21}B_{12} + A_{22}B_{22}
\end{align*}
$$

which involve 4 additions and 8 multiplications of *smaller* matrices, i.e. matrices of size $\frac{n}{2}\times \frac{n}{2}$. The naive algorithm and the divide and conquer algorithm are given below.

```{prf:algorithm} Naive Matrix Multiplication
:class: dropdown
:label: naive-mat-mult 
**procedure** $\text{naive_mat_mult}(A,B)$:

**Inputs** Given two $n\times n$ matrices $A$ and $B$

**Output** Their matrix product $C=AB$

1. Initialize $C$ to an all zeros matrix of size $n\times n$
2. for $i=0$ to $n-1$:
   1. for $j=0$ to $n-1$:
      1. for $k=0$ to $n-1$:
         1. $C[i][j] = C[i][j] + A[i][k] \times B[k][j]$
3. return $C$

```
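A direct Python translation of the naive procedure, using nested lists as matrices (an assumption made for this sketch; a numerical library would normally be used in practice):

```python
def naive_mat_mult(A, B):
    """Return the product of two n x n matrices given as nested lists."""
    n = len(A)
    C = [[0] * n for _ in range(n)]      # all-zeros n x n result
    for i in range(n):                   # row of A
        for j in range(n):               # column of B
            for k in range(n):           # index summed over
                C[i][j] += A[i][k] * B[k][j]
    return C


print(naive_mat_mult([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```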

```{prf:algorithm} D&C Matrix Multiplication
:class: dropdown
:label: d&c-mat-mult 
**procedure** $\text{dc_mat_mult}(A,B,C, n)$:

**Inputs** Given two $n\times n$ matrices $A$ and $B$, where $n$ is a power of 2, and an all-zeros matrix $C$ of size $n\times n$

**Output** Their matrix product $C=AB$

1. if $n=1$:
   1. $C[0][0] = C[0][0] + A[0][0]\times B[0][0]$ $\quad \triangleright$ These are simply scalars since $n=1$
   2. return
2. Partition $A$, $B$ and $C$ into the 12 $\frac{n}{2}\times \frac{n}{2}$ matrices $A_{11},A_{12},A_{21},A_{22},B_{11},B_{12},B_{21},B_{22},C_{11},C_{12},C_{21}$ and $C_{22}$ $\quad \triangleright$ Divide step
3. $\text{dc_mat_mult}(A_{11},B_{11},C_{11}, \frac{n}{2})$ $\quad \triangleright$ This computes $A_{11}B_{11}$ from $C_{11} = A_{11}B_{11} + A_{12}B_{21}$
4. $\text{dc_mat_mult}(A_{12},B_{21},C_{11}, \frac{n}{2})$ $\quad \triangleright$ This computes $A_{12}B_{21}$ from $C_{11} = A_{11}B_{11} + A_{12}B_{21}$
5. $\text{dc_mat_mult}(A_{11},B_{12},C_{12}, \frac{n}{2})$
6. $\text{dc_mat_mult}(A_{12},B_{22},C_{12}, \frac{n}{2})$
7. $\text{dc_mat_mult}(A_{21},B_{11},C_{21}, \frac{n}{2})$
8. $\text{dc_mat_mult}(A_{22},B_{21},C_{21}, \frac{n}{2})$
9. $\text{dc_mat_mult}(A_{21},B_{12},C_{22}, \frac{n}{2})$
10. $\text{dc_mat_mult}(A_{22},B_{22},C_{22}, \frac{n}{2})$
```
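The procedure above can be sketched in Python by passing index offsets instead of copying blocks, which keeps the partition step constant-time. The offset parameters (`ar`, `ac`, `br`, `bc`, `cr`, `cc`, the top-left row and column of the current block in each matrix) are an assumption of this sketch and do not appear in the pseudocode; `n` is assumed to be a power of 2.

```python
def dc_mat_mult(A, B, C, n, ar=0, ac=0, br=0, bc=0, cr=0, cc=0):
    """Add the product of an n x n block of A and an n x n block of B into a block of C."""
    if n == 1:
        C[cr][cc] += A[ar][ac] * B[br][bc]   # scalar product
        return
    h = n // 2   # partitioning only shifts offsets by h, so it costs O(1)
    # C11 += A11 B11 + A12 B21
    dc_mat_mult(A, B, C, h, ar, ac, br, bc, cr, cc)
    dc_mat_mult(A, B, C, h, ar, ac + h, br + h, bc, cr, cc)
    # C12 += A11 B12 + A12 B22
    dc_mat_mult(A, B, C, h, ar, ac, br, bc + h, cr, cc + h)
    dc_mat_mult(A, B, C, h, ar, ac + h, br + h, bc + h, cr, cc + h)
    # C21 += A21 B11 + A22 B21
    dc_mat_mult(A, B, C, h, ar + h, ac, br, bc, cr + h, cc)
    dc_mat_mult(A, B, C, h, ar + h, ac + h, br + h, bc, cr + h, cc)
    # C22 += A21 B12 + A22 B22
    dc_mat_mult(A, B, C, h, ar + h, ac, br, bc + h, cr + h, cc + h)
    dc_mat_mult(A, B, C, h, ar + h, ac + h, br + h, bc + h, cr + h, cc + h)


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[0, 0], [0, 0]]
dc_mat_mult(A, B, C, 2)
print(C)  # [[19, 22], [43, 50]]
```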

At each level of recursion we divide the problem into 8 subproblems of size $\frac{n}{2}$ (the 8 matrix multiplications of the submatrices). Partitioning the matrices can be made to have complexity $O(1)$ if we just keep track of the indices of the submatrices we are working with, instead of creating a whole new set of matrices in memory each time we partition a matrix into blocks, which would have complexity $O(n^2)$. We can therefore characterize the runtime of this algorithm with the recurrence

$$
T(n) = 8T\left(\frac{n}{2}\right) + O(1).
$$

Using the master theorem with $a=8$, $b=2$ and $d=0$, the solution of this recurrence is $O(n^3)$. This is not an asymptotic improvement over the naive method, but it is a useful example of how to apply divide and conquer to this problem. There is, however, a divide and conquer algorithm for matrix multiplication, devised by Volker Strassen, that is more clever and has a better runtime.


### Strassen's algorithm

Strassen's algorithm achieves a speed up over both the prior methods by utilizing a clever trick akin to the one found by Gauss for integer multiplication. Specifically the algorithm reduces the number of matrix multiplications that need to be computed at each recursion level from 8 to 7! This is done by computing the product as {cite:p}`dasgupta2008algorithms`

$$
AB = \begin{bmatrix}
P_5 + P_4 - P_2 + P_6 & P_1 + P_2\\
P_3 + P_4 & P_1 + P_5 - P_3 - P_7
\end{bmatrix}
$$

where 

$$
\begin{align*}
&P_1=A_{11}(B_{12} - B_{22}), \qquad P_2=(A_{11} + A_{12})B_{22}, \\
&P_3=(A_{21}+A_{22})B_{11}, \qquad P_4=A_{22}(B_{21}-B_{11}), \\
&P_5=(A_{11}+A_{22})(B_{11}+B_{22}), \qquad P_6=(A_{12}-A_{22})(B_{21}+B_{22}), \\
&P_7=(A_{11}-A_{21})(B_{11}+B_{12}).
\end{align*}
$$

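The recurrence for this algorithm's time complexity is now given by

$$
T(n) = 7T\left(\frac{n}{2}\right) + O(n^2),
$$

where the $O(n^2)$ term accounts for the additions and subtractions of $\frac{n}{2}\times\frac{n}{2}$ matrices needed to form the $P_i$ and to combine them. From the master theorem, the solution to this recurrence is $T(n)=O(n^{\log_{2}(7)}) \approx O(n^{2.807})$. This may not seem like a huge improvement, but the gap grows with the matrix size: for $n$ in the millions the ratio $n^3/n^{2.807}=n^{0.193}$ is about 14, so, ignoring constant factors, Strassen's algorithm needs roughly an order of magnitude fewer operations. Large matrix multiplications are very common in fields such as deep learning.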
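A minimal pure-Python sketch of Strassen's recursion follows, assuming $n$ is a power of 2 and matrices are nested lists. The helpers `mat_add`, `mat_sub` and `split` are hypothetical names introduced for this example, and the blocks are copied for clarity instead of being tracked by index. The seven products follow the standard statement of the algorithm, in particular $P_4 = A_{22}(B_{21} - B_{11})$.

```python
def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mat_sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M):
    """Return the four quadrants M11, M12, M21, M22 of a square matrix."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B):
    """Multiply two n x n matrices (n a power of 2) with 7 recursive products."""
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    P1 = strassen(A11, mat_sub(B12, B22))
    P2 = strassen(mat_add(A11, A12), B22)
    P3 = strassen(mat_add(A21, A22), B11)
    P4 = strassen(A22, mat_sub(B21, B11))
    P5 = strassen(mat_add(A11, A22), mat_add(B11, B22))
    P6 = strassen(mat_sub(A12, A22), mat_add(B21, B22))
    P7 = strassen(mat_sub(A11, A21), mat_add(B11, B12))
    C11 = mat_add(mat_sub(mat_add(P5, P4), P2), P6)
    C12 = mat_add(P1, P2)
    C21 = mat_add(P3, P4)
    C22 = mat_sub(mat_sub(mat_add(P1, P5), P3), P7)
    # Stitch the quadrants back together row by row
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom


print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

In practice, implementations usually switch to the naive method below some block size, since the extra additions make the recursion slower on small matrices.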

## Merge sort

Merge sort is one of the best-known divide and conquer algorithms for sorting a list of numbers. The idea behind merge sort is that, given a list $L$ of numbers, we sort it by first dividing the list into two halves of (nearly) equal length. We then recursively sort each half and finally merge the two sorted halves. The merge sort algorithm is given below.

```{prf:algorithm} Merge sort
:class: dropdown
:label: merge-sort
**procedure** $\text{merge_sort}(L)$:

**Inputs** Given a list $L[1,...,n]$ of length $n$

**Output** The sorted list $L$ in ascending order

1. if $n>1$: $\quad \triangleright$ Divide $L$ into two smaller lists
   1. $L_1  = L\left[1,...,\lfloor\frac{n}{2}\rfloor\right]$ $\quad \triangleright$ Get first $\lfloor\frac{n}{2}\rfloor$ elements 
   2. $L_2  = L\left[\lfloor\frac{n}{2}\rfloor+1,...,n\right]$ $\quad \triangleright$ Get last $\lceil\frac{n}{2}\rceil$ elements 
   3. return $\text{merge}( \text{merge_sort}(L_1),  \text{ merge_sort}(L_2) )$
2. else: $\quad \triangleright$ A list with at most 1 element is already sorted
   1. return $L$

```

Now the $\text{merge}$ function is the crucial component here, as it defines how we combine the resulting sorted lists. Let's think about how we would combine two sorted (ascending order) lists $L_1$ and $L_2$ into a list $L$. Since both lists are sorted, we know that the first element of the combined list will be either the first element of $L_1$ or the first element of $L_2$, whichever is smaller. The $\text{merge}$ function therefore repeatedly compares the first remaining elements of the two lists, moves the smaller one to the output, and continues until one list is empty, at which point the rest of the other list is appended.

But why can we assume the lists are sorted? It isn't immediately clear where the lists get sorted in the first place. The key is to think about what happens when $\text{merge_sort}$'s base case is reached: we simply return a list with one element, which is trivially sorted. We then start to move back up the recursion tree, so the first calls to $\text{merge}$ receive single-element lists, and $\text{merge}$ simply compares the two elements and concatenates them with the smaller one first. One level up, $\text{merge}$ is called again, this time on sorted lists of length 2, and so on. By the time we get to the top of the recursion tree we have two sorted lists whose sizes, $n$ and $m$ respectively, add up to the length of the original input list.

Each comparison in $\text{merge}$ moves one element to the output, so $\text{merge}$ does $O(n+m)$ operations, meaning the time for the merge step is linear in the total size of the two lists. So we take a problem of size $n$, divide it into two subproblems of size $\frac{n}{2}$, and combine their solutions in time $O(n)$. Thus the runtime of merge sort is characterized by the recurrence

$$
T(n) = 2T\left(\frac{n}{2}\right) + O(n)
$$

From the master theorem (with $a=2$, $b=2$ and $d=1$) the solution to this recurrence is $T(n)=O(n\log(n))$, which means we can sort a list in time that is log-linear in the size of the list. Not bad, and in fact this is, provably, the best we can do for a *comparison-based* sorting algorithm {cite:p}`dasgupta2008algorithms`. See radix sort for a non-comparison-based sorting algorithm that runs in $O(n)$ time when the keys have a constant number of digits {cite:p}`cormen2022introduction`.

```{prf:algorithm} Merge
:class: dropdown
:label: merge
**procedure** $\text{merge}(L_1, L_2)$:

**Inputs** Given 2 lists $L_1[1,...,n]$ and $L_2[1,...,m]$ of length $n$ and $m$ respectively sorted in ascending order.

**Output** The  combined list $L$ sorted in ascending order

1. if $n=0$:
   1. return $L_2[1,...,m]$
2. if $m=0$:
   1. return $L_1[1,...,n]$
3. if $L_1[1] \leq L_2[1]$:
   1. return $L_1[1] \oplus \text{merge}(L_1[2,...,n], L_2[1,...,m])$
4. else:
   1. return $L_2[1] \oplus \text{merge}(L_1[1,...,n], L_2[2,...,m])$

Note $L_1[1] \oplus \text{merge}(L_1[2,...,n], L_2[1,...,m])$ means we concatenate the list returned from the $\text{merge}$ call to the end of the one element list $L_1[1]$.
```
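A compact Python sketch of both procedures is shown below. Note one deliberate difference from the pseudocode: `merge` is written iteratively with two indices instead of recursively, an assumption made here to avoid Python's recursion limit and the cost of repeated list slicing. It performs the same comparisons.

```python
def merge(left, right):
    """Merge two ascending lists into one ascending list in O(n + m) time."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:          # take the smaller head element
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])              # at most one of these is non-empty
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Return a new list with the elements of items in ascending order."""
    if len(items) <= 1:                  # base case: already sorted
        return list(items)
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))


print(merge_sort([5, 2, 4, 7, 1, 3, 2, 6]))  # [1, 2, 2, 3, 4, 5, 6, 7]
```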

## Medians

## Fast Fourier transform 

The fast Fourier transform (FFT) is an algorithm that has revolutionized signal processing. {cite:p}`dasgupta2008algorithms` explains how one can 

## Practice problems

In [1]:
%load_ext watermark
%watermark -n -u -v -iv

Last updated: Sun Jul 14 2024

Python implementation: CPython
Python version       : 3.10.12
IPython version      : 8.22.2

