## recitation-05

In this lab, we'll continue working with sequence functions and sorting.

The sorting algorithms we've discussed so far all work by *comparing* numbers (e.g., `merge_sort`, `insertion_sort`, `selection_sort`). Today, we'll look at an algorithm that sorts without making any pairwise comparisons.

The algorithm is particularly suited for sorting lists with the following properties:
- elements are non-negative integers
- the maximum element is not too big
- many elements are repeated

For example:

$[2,2,1,0,1,0,1,3] \rightarrow [0,0,1,1,1,2,2,3]$

In addition to the input list of length $n$, the algorithm also takes as input the maximum value in the list ($k$). E.g., $k=3$ in the above example.

The algorithm proceeds by first counting how often each value appears. Based on these counts, the algorithm figures out the range of output locations to place each value. For example, because there are two 0s and three 1s, we know that the first 1 goes in the 3rd position, and the final 1 goes in the 5th position.
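As a quick illustration of the counting step on the example above, here is a minimal sketch that uses `collections.Counter` from the standard library as a stand-in for the `count_values` function you will write. The variable names (`example`, `tally`, `first_one`, `last_one`) are only for illustration.

```python
from collections import Counter

example = [2, 2, 1, 0, 1, 0, 1, 3]
k = 3  # maximum value in the list

# Count how often each value 0..k appears.
tally = Counter(example)
counts = [tally[v] for v in range(k + 1)]
print(counts)  # [2, 3, 2, 1]

# The 1s start right after the two 0s and fill three slots.
# As 0-based indices: first 1 at index 2 (3rd position), last 1 at index 4 (5th position).
first_one = counts[0]
last_one = counts[0] + counts[1] - 1
print(first_one, last_one)  # 2 4
```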

1. Implement a simple, sequential version of `count_values` (linear span and work) and test it with `test_count_values`.

.  
.  

2. Continuing our example, `counts` should now be `[2, 3, 2, 1]`. We next need to convert this into a list indicating the location of the first appearance of each value in the output.

E.g., `positions=[0, 2, 5, 7]` means that, in the final output, the first 0 appears at index 0, the first 1 appears at index 2, etc.

We can use `scan` to create the needed list. You may need to adjust the output of `scan` slightly to get it. Complete `get_positions` and test with `test_get_positions`.
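To see what kind of adjustment might be needed, the sketch below uses `itertools.accumulate` as a sequential stand-in for a prefix-sum `scan`. It assumes the scan returns *inclusive* running totals; if yours returns exclusive totals, no shift is needed.

```python
from itertools import accumulate

counts = [2, 3, 2, 1]

# Inclusive running totals: entry i is the number of elements <= i.
inclusive = list(accumulate(counts))
print(inclusive)  # [2, 5, 7, 8]

# Shift right by one to get the index of the FIRST occurrence of each value.
positions = [0] + inclusive[:-1]
print(positions)  # [0, 2, 5, 7]
```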

.  
.  
. 


3. What is the work and span of `get_positions`? (assume our more efficient version of `scan` from class)

> $W(n) \in O(k)$, $S(n) \in O(\lg k)$

.  
.  
. 
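The efficient `scan` is not reproduced in this notebook. As a rough sketch of the contraction idea behind those bounds, and only an assumption about what the class version looks like, the hypothetical `scan_contract` below halves the input at each level. The list comprehensions stand in for parallel maps, which gives $W(k) = W(k/2) + O(k) \in O(k)$ and $S(k) = S(k/2) + O(1) \in O(\lg k)$. Unlike an inclusive scan, this sketch returns exclusive prefixes.

```python
def scan_contract(f, id_, a):
    """Exclusive prefix scan by contraction; returns (prefixes, total)."""
    if len(a) == 0:
        return [], id_
    if len(a) == 1:
        return [id_], a[0]
    # Contract: combine adjacent pairs (a parallel map in the cost model).
    pairs = [f(a[i], a[i + 1]) if i + 1 < len(a) else a[i]
             for i in range(0, len(a), 2)]
    partial, total = scan_contract(f, id_, pairs)
    # Expand: even indices reuse the recursive result directly,
    # odd indices add the element just before them.
    prefixes = [partial[i // 2] if i % 2 == 0 else f(partial[i // 2], a[i - 1])
                for i in range(len(a))]
    return prefixes, total

print(scan_contract(lambda x, y: x + y, 0, [2, 3, 2, 1]))  # ([0, 2, 5, 7], 8)
```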


4. Finally, we'll use this `positions` list to construct the final output. First, we create the output list ($n$ elements). Then we loop through the original input list once more. For each value, we look up in `positions` where the value should go. E.g., for the first value 2, we look up `positions[2]`, which tells us the 2 should go at index 5 in the output. To prepare for future iterations, we then increment the entry of `positions` for the value we just read. E.g., `positions[2]` will increment from 5 to 6, so the next 2 we read will be placed at index 6.

Implement `construct_output` with a simple for loop and test with `test_construct_output`.
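To make the placement step concrete, here is a small trace on the running example. It is a sketch rather than the required implementation; `next_free` and `trace` are illustrative names, and `next_free` starts as a copy of the positions list `[0, 2, 5, 7]`.

```python
a = [2, 2, 1, 0, 1, 0, 1, 3]
next_free = [0, 2, 5, 7]      # next free output index for each value
output = [None] * len(a)
trace = []                    # (value, index it was written to)

for v in a:
    idx = next_free[v]
    output[idx] = v
    next_free[v] += 1         # the next copy of v goes one slot later
    trace.append((v, idx))

print(trace)
# [(2, 5), (2, 6), (1, 2), (0, 0), (1, 3), (0, 1), (1, 4), (3, 7)]
print(output)
# [0, 0, 1, 1, 1, 2, 2, 3]
```

Because each iteration reads and then updates a shared entry of `next_free`, the loop is inherently sequential as written.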

.  
.  
. 


5. What is the work and span of `construct_output`?

> $O(n)$ work and $O(n)$ span

.  
.  
. 


6. What is the work and span of `supersort`?

> $O(n+k)$ work and $O(n+\lg k)$ span

.  
.  
. 
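The three stages of `supersort` run one after another, so their costs add. Assuming the sequential `count_values` and the efficient `scan` from class, the totals work out as follows:

$$W(n,k) = \underbrace{O(n)}_{\text{count}} + \underbrace{O(k)}_{\text{positions}} + \underbrace{O(n)}_{\text{construct}} = O(n+k)$$

$$S(n,k) = \underbrace{O(n)}_{\text{count}} + \underbrace{O(\lg k)}_{\text{positions}} + \underbrace{O(n)}_{\text{construct}} = O(n+\lg k)$$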


7. Our implementation of `count_values` has poor span. Let's instead implement it using map-reduce. Complete `count_map`, `count_reduce`, which are used by `count_values_mr` to construct the `counts` variable using map-reduce. Test with `test_count_values_mr`.
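The shape of the map-reduce computation can be sketched in a self-contained way as below. This is an illustration under assumptions: it uses `functools.reduce` (whose argument order differs from the notebook's own `reduce`) and a plain `defaultdict` for the grouping step, and it runs sequentially.

```python
from collections import defaultdict
from functools import reduce

a = [2, 2, 1, 0, 1, 0, 1, 3]
k = 3

# Map: every element independently emits a (value, 1) pair.
pairs = [(v, 1) for v in a]

# Collect: group the emitted 1s by value.
groups = defaultdict(list)
for key, one in pairs:
    groups[key].append(one)

# Reduce: sum the 1s within each group.
int2count = {key: reduce(lambda x, y: x + y, ones, 0)
             for key, ones in groups.items()}

# Values that never appear get a count of 0.
counts = [int2count.get(i, 0) for i in range(k + 1)]
print(counts)  # [2, 3, 2, 1]
```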

.  
.  
. 


8. What is the work and span of `count_values_mr`?

> map: $W(n) \in O(n)$ $~~~~S(n) \in O(1)$

> reduce: $W(n,k) \in O(n)$ $~~~~S(n,k) \in O(\lg n)$

> total: $W(n,k) \in O(n)$ $~~~~S(n,k)\in O(\lg n)$

.  
.  
. 




In [None]:
from collections import defaultdict

def supersort(a, k):
    """
    The main sorting algorithm. You'll complete the
    three functions count_values, get_positions, and construct_output.
    
    Params:
      a.....the input list
      k.....the maximum element in a
      
    Returns:
      sorted version of a
    """
    counts = count_values(a, k)            # W=n, S=n
    # counts = count_values_mr(a, k)         # map: W=n S=1  reduce: W=n S=lg n  total: W=n S=lg n
    positions = get_positions(counts)      # W=k, S=lg k
    return construct_output(a, positions)  # W=n, S=n

def count_values(a, k):
    """
    Params:
      a.....input list
      k.....maximum value in a
      
    Returns:
      a list of k+1 values; element i of the list indicates
      how often value i appears in a
      
    >>> count_values([2,2,1,0,1,0,1,3], 3)
    [2, 3, 2, 1]
    """
    ###TODO
    counts = [0] * (k+1)
    for v in a:
        counts[v] += 1
    return counts
    ###

def test_count_values():
    assert count_values([2,2,1,0,1,0,1,3], 3) == [2, 3, 2, 1]
    
def get_positions(counts):
    """
    >>> get_positions([2, 3, 2, 1])
    [0, 2, 5, 7]    
    """
    ###TODO
    history, _ = scan(plus, 0, counts)    
    # convert [2,5,7,8] to [0,2,5,7]
    return [0] + history[:-1]
    ###
    
def test_get_positions():
    assert get_positions([2, 3, 2, 1]) == [0, 2, 5, 7]
    
def construct_output(a, positions):
    """
    Construct the final, sorted output.

    Params:
      a...........input list
      positions...list of first location of each value in the output.
      
    Returns:
      sorted version of a
    """
    ###TODO
    output = [0] * len(a)
    for v in a:
        output[positions[v]] = v
        positions[v] += 1        # this can cause race conditions if done in parallel
    return output
    ###

def test_construct_output():
    assert construct_output([2,2,1,0,1,0,1,3], [0, 2, 5, 7]) == [0,0,1,1,1,2,2,3]
    
def count_values_mr(a, k):
    """
    use map-reduce to implement count_values
    """
    # done.
    int2count = dict(run_map_reduce(count_map, count_reduce, a))
    return [int2count.get(i,0) for i in range(k+1)]

def count_map(value):
    ###TODO
    return [(value, 1)]
    ###

def count_reduce(group):
    ###TODO
    return (group[0], reduce(plus, 0, group[1]))
    ###


def test_count_values_mr():
    assert count_values_mr([2,2,1,0,1,0,1,3], 3) == [2, 3, 2, 1]



def run_map_reduce(map_f, reduce_f, mylist):
    # done. 
    pairs = flatten(list(map(map_f, mylist)))
    groups = collect(pairs)
    return [reduce_f(g) for g in groups]

def collect(pairs):
    # done.     
    result = defaultdict(list)
    for pair in sorted(pairs):
        result[pair[0]].append(pair[1])
    return list(result.items())

def plus(x,y):
    # done. 
    return x + y


def scan(f, id_, a):
    # done. 
    return (
            [reduce(f, id_, a[:i+1]) for i in range(len(a))],
             reduce(f, id_, a)
           )

def reduce(f, id_, a):
    # done. do not change me.
    if len(a) == 0:
        return id_
    elif len(a) == 1:
        return a[0]
    else:
        return f(reduce(f, id_, a[:len(a)//2]),
                 reduce(f, id_, a[len(a)//2:]))
    
def iterate(f, x, a):
    # done. do not change me.
    if len(a) == 0:
        return x
    else:
        return iterate(f, f(x, a[0]), a[1:])
    
def flatten(sequences):
    return iterate(plus, [], sequences)

# supersort([9,5,10,5,5,1,2,3,3,6,6,6,8,10,6,6,6,1,2,2], 10)

# supersort([2,2,1,0,1,0,1,3,6], 6)

# test_count_values()
# test_get_positions()
# test_construct_output()

count_values_mr([2,2,1,0,1,0,1,3], 3)


[2, 3, 2, 1]