# Chapter 8: Merge Sort and Quick Sort

**Inverted file**: a data structure that allows a search engine to quickly return the list of documents that contain a given keyword. It is essentially a lookup table that maps each word to the documents containing it, storing keyword-document pairs to track these relationships.
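
As a rough illustration of the lookup-table idea, the sketch below builds an inverted file from a small in-memory collection. The function name `build_inverted_file`, the sample documents, and the whitespace-based tokenization are all assumptions made for this example, not part of the original notes.

```python
from collections import defaultdict


def build_inverted_file(documents):
    """Map each keyword to the sorted list of ids of the documents containing it.

    `documents` is assumed to be a dict of {doc_id: text}; words are split on
    whitespace and lowercased, which is a simplification.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)  # one keyword-document pair
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical sample data
docs = {
    1: "merge sort is stable",
    2: "quick sort is fast",
    3: "merge two sorted lists",
}
inverted = build_inverted_file(docs)
print(inverted["merge"])  # ids of the documents that contain "merge"
```

A keyword query then becomes a single dictionary lookup instead of a scan over every document.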

## Merge Sort

**Divide and Conquer**: an algorithm design paradigm that consists of the following three steps:

1. **divide**: if the input size is smaller than a certain threshold, solve the problem directly using a straightforward method and return the solution so obtained. Otherwise, divide the input data into two or more disjoint subsets
2. **recur**: recursively solve the sub-problems associated with the subsets
3. **conquer**: take the solutions to the sub-problems and merge them into a solution to the original problem 

Merge sort applies divide and conquer to the sorting problem. Since the size of the input sequence roughly halves at each recursive call, the height of the merge-sort tree is about $\log n$:

![merge-sort-tree](./res/08-merge-sort-tree.PNG)

The merge algorithm is as follows:

![merge](./res/08-merge.PNG)

This algorithm has a running time of $O(n_1 + n_2)$, where $n_1$ is the size of $S_1$ and $n_2$ is the size of $S_2$, because every step moves one element from $S_1$ or $S_2$ into the output sequence.
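
A minimal Python sketch of merging two sorted sequences is shown here. It uses index-based scanning over Python lists rather than the sequence operations in the figure, so the names `merge`, `s1`, and `s2` are illustrative choices only.

```python
def merge(s1, s2):
    """Merge two already-sorted lists into a new sorted list."""
    result = []
    i = j = 0
    # Each iteration appends exactly one element, so the loop and the
    # two extends below do O(len(s1) + len(s2)) work in total.
    while i < len(s1) and j < len(s2):
        if s1[i] <= s2[j]:  # <= takes from s1 on ties, keeping the merge stable
            result.append(s1[i])
            i += 1
        else:
            result.append(s2[j])
            j += 1
    result.extend(s1[i:])  # at most one of these two slices is non-empty
    result.extend(s2[j:])
    return result
```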

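In merge sort, the tree T has $2^i$ nodes at depth i, each handling a sequence of size about $\frac{n}{2^i}$, so the time spent at all nodes of T at depth i is $O(2^i \cdot \frac{n}{2^i})$, which is $O(n)$. Since the height of T is about $\log n$, merge sort has a total running time of $O(n \log n)$.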
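
The three divide-and-conquer steps can be sketched in Python as follows. To keep the example short, the conquer step relies on the standard library's `heapq.merge`, which merges already-sorted inputs; this is a stand-in for the merge algorithm described in these notes, and returning a new list instead of modifying the input is a simplifying assumption.

```python
from heapq import merge  # standard-library merge of already-sorted iterables


def merge_sort(seq):
    """Return a new sorted list with the elements of seq."""
    # divide: sequences of size 0 or 1 are already sorted
    if len(seq) < 2:
        return list(seq)
    mid = len(seq) // 2
    # recur: sort each half independently
    left = merge_sort(seq[:mid])
    right = merge_sort(seq[mid:])
    # conquer: merge the two sorted halves in linear time
    return list(merge(left, right))


print(merge_sort([85, 24, 63, 45, 17, 31, 96, 50]))
```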

## Quick Sort

Quick sort consists of the following steps:

- **divide**: if sequence S has at least two elements, select a specific element x from S which is called the pivot. A common practice is to choose x to be the last element in S. Remove all the elements from S and put them into three sequences:
  - L, storing elements less than x
  - E, storing elements equal to x
  - G, storing elements greater than x
- **recur**: recursively sort sequences L and G
- **conquer**: put the elements back into S in order by first inserting the elements of L, then those of E, and finally those of G
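
The steps above can be sketched in Python as follows. The lists `less`, `equal`, and `greater` play the roles of L, E, and G; unlike the description, this version returns a new list rather than putting the elements back into S, which is a simplification made for the example.

```python
def quick_sort(seq):
    """Return a new sorted list, using the last element as the pivot."""
    if len(seq) < 2:
        return list(seq)
    pivot = seq[-1]
    # divide: split the elements into the three sequences L, E, G
    less = [x for x in seq if x < pivot]
    equal = [x for x in seq if x == pivot]
    greater = [x for x in seq if x > pivot]
    # recur on L and G, then conquer by concatenating L, E, G in order
    return quick_sort(less) + equal + quick_sort(greater)


print(quick_sort([85, 24, 63, 45, 17, 31, 96, 50]))
```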

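In the worst case, the height of the quick-sort tree is linear ($n-1$). This happens when the pivot is always the largest or smallest element, for example with a last-element pivot on an already sorted sequence of distinct elements: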

![quick-sort-tree](./res/08-quick-sort-tree.PNG)

The default quick sort has a worst-case running time of $O(n^2)$, but this may be improved upon via variations of the algorithm.
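
To see where the linear height comes from, the sketch below computes the height of the quick-sort tree for a given input, assuming the last-element pivot rule and counting a sequence with fewer than two elements as a leaf of height 0. The helper name `quick_sort_tree_height` is hypothetical.

```python
def quick_sort_tree_height(seq):
    """Height of the quick-sort tree when the last element is the pivot."""
    if len(seq) < 2:
        return 0  # a leaf: nothing left to partition
    pivot = seq[-1]
    less = [x for x in seq if x < pivot]
    greater = [x for x in seq if x > pivot]
    return 1 + max(quick_sort_tree_height(less),
                   quick_sort_tree_height(greater))


# Already sorted input: every pivot is the maximum, so G is always empty.
sorted_input = list(range(1, 11))
# Input arranged so that each pivot is the median of its subsequence.
balanced_input = [1, 3, 2, 5, 7, 6, 4]
print(quick_sort_tree_height(sorted_input), quick_sort_tree_height(balanced_input))
```

On the sorted input each call removes only the pivot, so the work per level shrinks by one element at a time, which is the source of the quadratic total.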

### Randomized Quick Sort

This algorithm chooses a random element of S as the pivot, with the goal of dividing S almost equally. Its expected running time is $O(n \log n)$, and it runs in $O(n \log n)$ time with high probability.
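
A minimal sketch of the randomized variant, assuming the same three-way split into L, E, and G and using `random.choice` from the standard library to pick the pivot. As before, it returns a new list for simplicity.

```python
import random


def randomized_quick_sort(seq):
    """Return a new sorted list, choosing each pivot uniformly at random."""
    if len(seq) < 2:
        return list(seq)
    pivot = random.choice(seq)  # the only change from the last-element rule
    less = [x for x in seq if x < pivot]
    equal = [x for x in seq if x == pivot]
    greater = [x for x in seq if x > pivot]
    return randomized_quick_sort(less) + equal + randomized_quick_sort(greater)


print(randomized_quick_sort(list(range(10, 0, -1))))
```

Because the pivot no longer depends on the input order, no fixed input (such as an already sorted sequence) consistently triggers the worst case.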

### In-place Quick Sort

Unlike merge sort, quick sort may be performed in-place:

![in-place-quick-sort](./res/08-in-place-quick-sort.PNG)

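This algorithm uses only $O(\log n)$ additional space for the recursion stack, in expectation when pivots are chosen at random. Guaranteeing this bound in the worst case requires recursing on the smaller subsequence first.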
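
One common way to write an in-place quick sort in Python is sketched below. It is not a transcription of the figure: it assumes a last-element pivot and a two-index scan, and it adds the refinement of recursing only on the smaller side while looping on the larger one, which keeps the recursion depth at $O(\log n)$. The parameter names `a` and `b` are illustrative.

```python
def in_place_quick_sort(s, a=0, b=None):
    """Sort the list s in place between indices a and b (inclusive)."""
    if b is None:
        b = len(s) - 1
    while a < b:
        pivot = s[b]
        left, right = a, b - 1
        while left <= right:
            # scan right until an element >= pivot is found
            while left <= right and s[left] < pivot:
                left += 1
            # scan left until an element <= pivot is found
            while left <= right and s[right] > pivot:
                right -= 1
            if left <= right:
                s[left], s[right] = s[right], s[left]
                left += 1
                right -= 1
        # put the pivot into its final position
        s[left], s[b] = s[b], s[left]
        # recurse on the smaller part, keep looping on the larger part
        if left - a < b - left:
            in_place_quick_sort(s, a, left - 1)
            a = left + 1
        else:
            in_place_quick_sort(s, left + 1, b)
            b = left - 1


data = [85, 24, 63, 45, 17, 31, 96, 50]
in_place_quick_sort(data)
print(data)
```

The only extra memory used is a handful of index variables per active call, so the space cost is dominated by the depth of the recursion.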

## Lower Bound on Comparison-based Sorting

The running time of any comparison-based sorting algorithm for sorting an n-element sequence is $\Omega(n \log n)$ in the worst case.

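A comparison-based algorithm can be modeled as a decision tree T, in which each internal node is a comparison and each external node is one possible outcome of the run. Each possible initial permutation of the sequence must lead to a different external node; otherwise two different orderings would be rearranged identically and at least one of them would not end up sorted. The number of permutations of n objects is: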

$$n! = n(n-1)(n-2) \cdots 2 \cdot 1$$

As a result, the algorithm's decision tree T must have at least n! external nodes and therefore a height of at least $\log(n!)$. Since at least n/2 of the factors in the product n! are greater than or equal to n/2:

$$\log(n!) \geq \log\left(\left(\frac{n}{2}\right)^{\frac{n}{2}}\right) = \frac{n}{2}\log\frac{n}{2}$$

This is $\Omega(n \log n)$.
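
The inequality above can be checked numerically for specific values of n. The sketch below assumes base-2 logarithms and uses two hypothetical helpers, `log2_factorial` and `half_bound`, for the two sides of the inequality.

```python
import math


def log2_factorial(n):
    """log2(n!) computed as a sum of logs, avoiding huge intermediate values."""
    return sum(math.log2(k) for k in range(2, n + 1))


def half_bound(n):
    """The lower bound (n/2) * log2(n/2) from the inequality."""
    return (n / 2) * math.log2(n / 2)


for n in (4, 16, 64, 1024):
    print(n, round(log2_factorial(n), 1), round(half_bound(n), 1))
```

In every row the first value, the minimum height of the decision tree, should be at least the second.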

