# Lesson 1 Week 2
#### by Sheng BI

unsorted = []
with open("_bcb5c6658381416d19b01bfc1d3993b5_IntegerArray.txt") as file:
    for line in file:
        if line.strip():  # skip blank lines, e.g. a trailing newline
            unsorted.append(int(line))  # int() ignores surrounding whitespace

# Reference:  https://www.cp.eng.chula.ac.th/~piak/teaching/algo/algo2008/count-inv.htm

def merge_sort_and_count(left, right):
    """Merge two sorted lists and count the split inversions between them."""
    arr_sorted = []
    split_count = 0
    i = 0
    j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # equal elements are not an inversion
            arr_sorted.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every element still left in `left`,
            # so it forms one inversion with each of them.
            arr_sorted.append(right[j])
            j += 1
            split_count += len(left) - i
    arr_sorted.extend(left[i:])
    arr_sorted.extend(right[j:])
    return arr_sorted, split_count

def divide_and_merge_leaves(arr):
    """Return (sorted copy of arr, number of inversions in arr)."""
    if len(arr) < 2:
        return arr, 0  # 0 or 1 element: no inversions
    m = len(arr) // 2  # integer division in Python 3
    A_sorted, A_count = divide_and_merge_leaves(arr[:m])  # left inversions
    B_sorted, B_count = divide_and_merge_leaves(arr[m:])  # right inversions
    L_sorted, split_count = merge_sort_and_count(A_sorted, B_sorted)
    return L_sorted, A_count + B_count + split_count

print("Inversions count: {}".format(divide_and_merge_leaves(unsorted)[1]))

Inversions count: 2407905288


### Video 1 ~ Video 2

#### Divide and conquer reminder
- DIVIDE into smaller subproblems
- CONQUER via recursive calls
- COMBINE solutions of subproblems into one for the original problem.

#### Problem: Count inversions
- Number of inversions = number of pairs $(i,j)$ of array indices with $i<j$ and $A[i]>A[j]$.
- Other interpretations:
    - write the sorted and unsorted arrays in two rows, connect equal elements, and count the crossings.
    - a numerical measure of similarity between two ranked lists (e.g. preferences).
- The largest possible number of inversions is $C_{n}^{2} = \frac{n(n-1)}{2}$ (a reverse-sorted array).
- Brute force takes $O(n^{2})$ time.
- Three types of inversions
    - left inversion if $i,j \leq \frac{n}{2}$
    - right inversion if $i,j > \frac{n}{2}$
    - split inversion if $i\leq \frac{n}{2} < j$
- Concepts:
```
Count(arr, length n)
    if n == 1: return 0
    else
        x = Count(1st half of arr, n/2)
        y = Count(2nd half of arr, n/2)
        z = CountSplitInv(arr, n)
    return x + y + z
Goal: implement CountSplitInv in linear O(n) time
```
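As a slow reference for the brute-force bullet and the three inversion types, here is a sketch that checks every pair in $O(n^{2})$ time and classifies each inversion. The name `count_inversions_by_type` is hypothetical, and it uses 0-based indices (the first half is `arr[:n // 2]`), whereas the notes use 1-based indices. The three counts correspond to `x`, `y`, `z` in `Count`.

```python
def count_inversions_by_type(arr):
    """Brute-force O(n^2) count of (left, right, split) inversions.

    0-based indices: the first half is arr[:n // 2].
    """
    n = len(arr)
    half = n // 2
    left = right = split = 0
    for i in range(n):
        for j in range(i + 1, n):
            if arr[i] > arr[j]:
                if j < half:        # i < j < half: both in the first half
                    left += 1
                elif i >= half:     # half <= i < j: both in the second half
                    right += 1
                else:               # i in the first half, j in the second
                    split += 1
    return left, right, split


x, y, z = count_inversions_by_type([1, 3, 5, 2, 4, 6])
print(x, y, z, x + y + z)
```

For `[1, 3, 5, 2, 4, 6]` both halves are already sorted, so all 3 inversions, (3,2), (5,2), (5,4), are split inversions.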
- Concepts version 2:
```
SortAndCount(arr, length n)
    if n == 1: return (arr, 0)
    else
        (B, x) = SortAndCount(1st half of arr, n/2)
        (C, y) = SortAndCount(2nd half of arr, n/2)
        (D, z) = MergeAndCountSplitInv(B, C, n)
    return (D, x + y + z)
Goal: implement MergeAndCountSplitInv in linear O(n) time
B, C are the sorted halves; D is the sorted version of arr
```
- Merge pseudocode (ignores end cases)
```
D = output array of length n
B = sorted 1st half of arr
C = sorted 2nd half of arr
i = 1
j = 1
for k = 1 to n
    if B[i] < C[j]
        D[k] = B[i]
        i++
    else  // C[j] < B[i]
        D[k] = C[j]
        j++
end
// end cases: once B or C is exhausted, copy the rest of the other into D
```
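A minimal Python sketch of the merge step on plain Python lists (`merge` is a hypothetical name). Unlike the pseudocode, it handles the end case where one half runs out before the other:

```python
def merge(b, c):
    """Merge sorted lists b and c into one sorted list in O(len(b) + len(c)) time."""
    d = []
    i = j = 0
    while i < len(b) and j < len(c):
        if b[i] <= c[j]:
            d.append(b[i])
            i += 1
        else:
            d.append(c[j])
            j += 1
    # One input is exhausted: copy whatever remains of the other.
    d.extend(b[i:])
    d.extend(c[j:])
    return d


print(merge([1, 3, 5], [2, 4, 6]))  # [1, 2, 3, 4, 5, 6]
```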
- When merging, whenever we copy an element from the second array to the output, we discover inversions.
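- **Claim**: the split inversions involving an element $C[j]$ of the 2nd array are precisely the pairs formed by $C[j]$ and the elements still left in the 1st array $B$ at the moment $C[j]$ is copied to the output array $D$.
- Merging with counting still takes $O(n)$ time: each time $C[j]$ is copied, add the number of elements remaining in $B$, an $O(1)$ step.
- Sort and Count takes $O(n\log(n))$ time (the same recurrence as merge sort).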
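To illustrate the claim, this sketch (hypothetical name `merge_and_list_split_inversions`) merges two sorted lists and, each time an element of `c` is copied to the output, records one pair for every element still remaining in `b`. Listing the pairs costs $O(n + \text{number of pairs})$; the linear-time version only adds `len(b) - i`, as in the notebook code at the top.

```python
def merge_and_list_split_inversions(b, c):
    """Merge sorted b and c; return (merged list, list of split inversions)."""
    d, pairs = [], []
    i = j = 0
    while i < len(b) and j < len(c):
        if b[i] <= c[j]:
            d.append(b[i])
            i += 1
        else:
            d.append(c[j])
            # c[j] < b[i] <= b[i+1] <= ...: each remaining b element inverts with c[j].
            pairs.extend((x, c[j]) for x in b[i:])
            j += 1
    d.extend(b[i:])
    d.extend(c[j:])
    return d, pairs


d, pairs = merge_and_list_split_inversions([1, 3, 5], [2, 4, 6])
print(pairs)  # [(3, 2), (5, 2), (5, 4)]
```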

### Strassen's Algorithm, Video 3

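- The straightforward algorithm for multiplying two $n \times n$ matrices takes $O(n^{3})$ time.
- Following Divide and Conquer, it seems natural to decompose each matrix into four blocks, each of size $\frac{n}{2} \times \frac{n}{2}$.
- But the simple decomposition computes 8 recursive block products, $T(n) = 8T(\frac{n}{2}) + O(n^{2})$, which still yields $O(n^{3})$ running time.
- Strassen
    - recursively computes only 7 products.
    - does the necessary additions and subtractions ($O(n^{2})$ work).
    - running time $O(n^{\log_{2}7}) \approx O(n^{2.81})$.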
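A pure-Python sketch of Strassen's method, assuming square matrices stored as lists of lists with $n$ a power of 2 (no padding, and no cutoff to the naive method, so it is slow in practice). The seven products follow the common textbook labeling $M_{1}, \dots, M_{7}$, which may differ from the lecture's naming; the helper names are hypothetical.

```python
def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M):
    """Split an n x n matrix (n even) into quadrants M11, M12, M21, M22."""
    h = len(M) // 2
    return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
            [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])


def join(C11, C12, C21, C22):
    """Reassemble four quadrants into one matrix."""
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom


def strassen(A, B):
    """Multiply n x n matrices A and B; n is assumed to be a power of 2."""
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # 7 recursive products instead of 8.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    # O(n^2) additions and subtractions combine them.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(M1, M2), add(M3, M6))
    return join(C11, C12, C21, C22)


print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```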

### Closest-Pair Problem, Video 4~Video 5
- Input: $n$ points in the plane; output: the pair with the smallest Euclidean distance. Assume all points have distinct $x$- and $y$-coordinates.
    - Brute force takes $\Theta(n^{2})$ time (even in the 1-D case)
- 1-D version of closest pair, using sorting
    - Step 1: sort the points (e.g. with merge sort), $O(n\log(n))$ time
    - Step 2: scan and return the closest pair of adjacent points, $O(n)$ time
- 2-D concepts
    ```
    Make copies of points sorted
    by x-coordinate and
    by y-coordinate
    [O(nlog(n)) time]
    
    Use Divide and Conquer.
    ```
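A sketch of the 1-D algorithm, assuming distinct numbers. It uses Python's built-in `sorted` (Timsort, also $O(n\log n)$) in place of merge sort; `closest_pair_1d` is a hypothetical name.

```python
def closest_pair_1d(xs):
    """Closest pair of distinct numbers: sort, then scan adjacent pairs."""
    if len(xs) < 2:
        raise ValueError("need at least two points")
    s = sorted(xs)  # O(n log n)
    # O(n) scan: the closest pair must be adjacent in sorted order.
    k = min(range(len(s) - 1), key=lambda t: s[t + 1] - s[t])
    return s[k], s[k + 1]


print(closest_pair_1d([7, 1, 10, 4, 12]))  # (10, 12)
```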
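#### 2-D Pseudocode: ClosestPair($P_{x}$, $P_{y}$)
0. Before the first call, sort the points by $x$-coordinate and by $y$-coordinate to get $P_{x}$ and $P_{y}$ ($O(n\log(n))$ time). Base case: if $|P| \leq 3$, use brute force.
1. Let $Q$ = left half of $P$, $R$ = right half of $P$ (by $x$-coordinate).
   Form $Q_{x}$, $Q_{y}$, $R_{x}$, $R_{y}$ (each sorted by $x$ or $y$) in $O(n)$ time by scanning $P_{x}$ and $P_{y}$.
2. ($p_{1}, q_{1}$) = ClosestPair($Q_{x}$, $Q_{y}$)
3. ($p_{2}, q_{2}$) = ClosestPair($R_{x}$, $R_{y}$)
4. Let $\delta = \min(d(p_{1}, q_{1}), d(p_{2}, q_{2}))$
5. ($p_{3}, q_{3}$) = ClosestSplitPair($P_{x}$, $P_{y}$, $\delta$)
6. Return the best of ($p_{1}, q_{1}$), ($p_{2}, q_{2}$), ($p_{3}, q_{3}$)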


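- Outside the recursive calls, forming $Q_{x}$, $Q_{y}$, $R_{x}$, $R_{y}$ and running ClosestSplitPair take $O(n)$ time.
- So $T(n) = 2T(\frac{n}{2}) + O(n)$, the same recurrence as merge sort, which gives $O(n\log(n))$ overall, provided ClosestSplitPair is correct and linear.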

#### ClosestSplitPair
- Let $\bar{x}$ be the largest $x$-coordinate in the left half $Q$ of $P_{x}$ (the median by $x$-coordinate).
- Let $S_{y}$ be the points of $P$ with $x$-coordinate in $[\bar{x} - \delta, \bar{x} + \delta]$, sorted by $y$-coordinate (one $O(n)$ scan of $P_{y}$).
```
initialize best = delta, bestPair = NULL
for i = 1 to |S_y| - 1:
    for j = 1 to min(7, |S_y| - i):
        let p, q = ith, (i+j)th points of S_y
        if d(p, q) < best:
            bestPair = (p, q), best = d(p, q)
return bestPair
```
- The inner loop runs at most 7 times per $i$ (constant time), so the whole nested loop takes $O(n)$ time.
- **Claim**: Let $p \in Q$, $q \in R$ be a split pair with $d(p,q) < \delta$, then
    - $p$ and $q$ are members of $S_{y}$
    - $p$ and $q$ are at most 7 positions apart in $S_{y}$
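- If the claim holds, ClosestSplitPair finds the closest split pair whenever one has distance $< \delta$, in $O(n)$ time, so the whole algorithm is correct and runs in $O(n\log(n))$ time.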
- Key elements in proof
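    - Write $p = (x_{1}, y_{1})$ and $q = (x_{2}, y_{2})$. Since $d(p,q)<\delta$, $|x_{1} - x_{2}| < \delta$ and $|y_{1} - y_{2}| < \delta$; since $x_{1} \leq \bar{x} < x_{2}$, both points lie in $S_{y}$.
    - To show $p$ and $q$ are at most 7 positions apart in $S_{y}$:
        - draw 8 boxes of size $\delta/2 \times \delta/2$ (2 rows by 4 columns), centered horizontally on $\bar{x}$, with bottom edge at $\min(y_{1}, y_{2})$.
    - All points of $S_{y}$ with y-coordinate between those of $p$ and $q$ (inclusive) lie in one of those 8 boxes.
    - At most one point in each of the 8 boxes
        - proof by contradiction: two points in the same box lie on the same side ($Q$ or $R$) at distance at most $\frac{\delta}{\sqrt{2}} < \delta$, contradicting the definition of $\delta$.
    - Hence at most 8 points of $S_{y}$, $p$ and $q$ included, lie in that y-range, so $p$ and $q$ are at most 7 positions apart.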
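Putting the pseudocode, ClosestSplitPair, and the claim together, a Python sketch, assuming the points are distinct `(x, y)` tuples with distinct $x$- and $y$-coordinates. All function names are hypothetical; results are `(distance, pair)` tuples, and the brute-force helper doubles as the base case and as a test oracle.

```python
import math


def dist(p, q):
    """Euclidean distance between points p and q."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def brute_force(pts):
    """O(k^2) search over all pairs; returns (distance, pair)."""
    best = (math.inf, None)
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            d = dist(pts[i], pts[j])
            if d < best[0]:
                best = (d, (pts[i], pts[j]))
    return best


def closest_split_pair(px, py, delta):
    """Best pair in S_y with distance < delta, else (delta, None)."""
    x_bar = px[len(px) // 2 - 1][0]  # largest x-coordinate in the left half Q
    # S_y: points within delta of x_bar, still in y-order (one O(n) scan).
    sy = [p for p in py if x_bar - delta <= p[0] <= x_bar + delta]
    best = (delta, None)
    for i in range(len(sy) - 1):
        # By the claim, only the next 7 points of S_y need checking.
        for j in range(1, min(7, len(sy) - 1 - i) + 1):
            d = dist(sy[i], sy[i + j])
            if d < best[0]:
                best = (d, (sy[i], sy[i + j]))
    return best


def closest_pair_rec(px, py):
    n = len(px)
    if n <= 3:
        return brute_force(px)
    mid = n // 2
    qx, rx = px[:mid], px[mid:]
    in_q = set(qx)                          # relies on distinct points
    qy = [p for p in py if p in in_q]       # O(n) scans keep y-order
    ry = [p for p in py if p not in in_q]
    best_q = closest_pair_rec(qx, qy)
    best_r = closest_pair_rec(rx, ry)
    delta = min(best_q[0], best_r[0])
    best_split = closest_split_pair(px, py, delta)
    # min() keeps the first of equal candidates, so ties favor best_q/best_r.
    return min(best_q, best_r, best_split, key=lambda t: t[0])


def closest_pair(points):
    """Return (distance, (p, q)) for the closest pair of distinct points."""
    px = sorted(points)                      # by x-coordinate
    py = sorted(points, key=lambda p: p[1])  # by y-coordinate
    return closest_pair_rec(px, py)


pts = [(0, 0), (5, 4), (3, 1), (9, 6), (2, 7), (8, 2), (6, 9), (1, 3)]
print(closest_pair(pts))
```

Building $Q_{y}$ and $R_{y}$ by filtering $P_{y}$ against a set of $Q$'s points keeps the work outside the recursive calls linear (in expectation, because of hashing).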

### Master Method ... Video 6 ~ Video 11
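- For a recurrence $T(n) \leq a\,T(\frac{n}{b}) + O(n^{d})$, where $a \geq 1$ is the number of recursive calls, $b > 1$ the factor by which the input size shrinks, and $d \geq 0$ the exponent of the work done outside the recursive calls:
    - $T(n) = O(n^{d}\log(n))$ if $a = b^{d}$
    - $T(n) = O(n^{d})$ if $a < b^{d}$
    - $T(n) = O(n^{\log_{b}(a)})$ if $a > b^{d}$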

Examples: 
- Binary search
    - $a = 1, b=2, d=0$, so $a = b^{d}$ and $T(n) = O(\log(n))$
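A small helper (hypothetical, not from the lecture) that applies the three cases for given $a$, $b$, $d$, assuming $a \geq 1$, $b > 1$, $d \geq 0$. The case-3 exponent $\log_{b}a$ is printed rounded to 3 significant digits.

```python
import math


def _poly(e):
    """Format n^e, writing n^0 as 1 and n^1 as n."""
    if e == 0:
        return "1"
    if e == 1:
        return "n"
    return f"n^{e}"


def master_method(a, b, d):
    """Bound for T(n) <= a*T(n/b) + O(n^d)."""
    if a == b ** d:   # case 1: same work at every level
        return "O(log n)" if d == 0 else f"O({_poly(d)} log n)"
    if a < b ** d:    # case 2: work dominated by the root
        return f"O({_poly(d)})"
    # case 3: work dominated by the leaves; exponent log_b(a)
    return f"O(n^{math.log(a, b):.3g})"


print(master_method(1, 2, 0))  # binary search: O(log n)
print(master_method(2, 2, 1))  # merge sort: O(n log n)
print(master_method(7, 2, 2))  # Strassen: O(n^2.81)
```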

#### Proof
- At each level $j = 0, 1, 2, ..., \log_{b}n$, there are $a^{j}$ subproblems, each of size $\frac{n}{b^{j}}$
    - recall that $\log_{b}n$ is the number of times you can divide $n$ by $b$ before reaching 1.
- total work done at level $j$ $\leq a^{j} \times c \times [\frac{n}{b^{j}}]^{d} = c n^{d} [\frac{a}{b^{d}}]^{j}$, where $c$ is the constant hidden in $O(n^{d})$
- total work done $\leq c n^{d} \times \sum_{j=0}^{\log_{b}(n)}[\frac{a}{b^{d}}]^{j}$
- $a^{\log_{b}n} = n^{\log_{b}a}$ is the number of leaves of the recursion tree!
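To finish the proof, set $r = \frac{a}{b^{d}}$ and bound the geometric sum in the total-work bullet, assuming $n$ is a power of $b$; the three regimes of $r$ give the three cases of the Master Method:

$$
\sum_{j=0}^{\log_{b}n} r^{j} =
\begin{cases}
\log_{b}n + 1 & \text{if } r = 1 \ (a = b^{d}),\\[4pt]
\leq \frac{1}{1-r} = O(1) & \text{if } r < 1 \ (a < b^{d}),\\[4pt]
\leq \frac{r}{r-1}\, r^{\log_{b}n} = O\!\left(r^{\log_{b}n}\right) & \text{if } r > 1 \ (a > b^{d}).
\end{cases}
$$

Multiplying by $c\,n^{d}$ gives $O(n^{d}\log(n))$ and $O(n^{d})$ in the first two cases; in the third,

$$
c\,n^{d}\, r^{\log_{b}n} = c\,n^{d}\,\frac{a^{\log_{b}n}}{\left(b^{\log_{b}n}\right)^{d}} = c\,a^{\log_{b}n} = c\,n^{\log_{b}a}.
$$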

#### Interpretation
- two opposing forces
    - the evil: $a$ = rate of subproblem proliferation (RSP), the number of recursive calls
    - the good: $b^{d}$ = rate of work shrinkage (RWS) per subproblem
    - the greater the RWS, the faster the work per subproblem shrinks
- intuition
    - if RSP = RWS, the same amount of work at each level (like merge sort); we expect $O(n^{d}\log(n))$
    - if RSP < RWS, the work decreases with each level and most work is at the root; might expect $O(n^{d})$
    - if RSP > RWS, the work increases with each level and most work is at the leaves; might expect $O(\text{number of leaves}) = O(a^{\log_{b}n}) = O(n^{\log_{b}a})$
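To see the three regimes numerically, this sketch evaluates the per-level bound $c\,n^{d}(\frac{a}{b^{d}})^{j}$ from the proof, assuming $c = 1$ and $n$ an exact power of $b$ (`work_per_level` is a hypothetical name):

```python
import math


def work_per_level(a, b, d, n, c=1.0):
    """Bound c * n^d * (a / b^d)^j on the work at levels j = 0..log_b(n)."""
    levels = round(math.log(n, b))  # assumes n is an exact power of b
    return [c * n ** d * (a / b ** d) ** j for j in range(levels + 1)]


cases = {
    "RSP = RWS (merge sort)": (2, 2, 1),
    "RSP < RWS": (2, 2, 2),
    "RSP > RWS": (4, 2, 1),
}
for name, (a, b, d) in cases.items():
    print(name, work_per_level(a, b, d, n=16))
```

For $n = 16$ the three rows show constant, halving, and doubling work per level; in the last case the bottom level's 256 equals $a^{\log_{b}n} = 4^{4}$, the number of leaves.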