# MapReduce By Hand
The input to a MapReduce job is just a set of (input_key,input_value) pairs, which we’ll implement as a Python dictionary. In the wordcount example, the input keys will be the filenames of the files we’re interested in counting words in, and the corresponding input values will be the contents of those files:

In [None]:
filenames = ["/data/cs2300/examples/t1.txt", "/data/cs2300/examples/t2.txt", "/data/cs2300/examples/t3.txt"]
# Map each filename (input key) to the file's contents (input value).
i = {}
for filename in filenames:
    with open(filename) as f:
        i[filename] = f.read()
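
If the /data/cs2300 files are not available on your machine, a small in-memory dictionary with the same shape can stand in for them. This sketch keeps the same keys so later cells that look up `i["/data/cs2300/examples/t1.txt"]` still run; the file contents below are invented placeholders, not the real files.

```python
# Hypothetical stand-in for the real input: same keys (filenames),
# made-up values (file contents).
i = {
    "/data/cs2300/examples/t1.txt": "The quick brown fox.",
    "/data/cs2300/examples/t2.txt": "The lazy dog!",
    "/data/cs2300/examples/t3.txt": "A quick dog, a lazy fox.",
}
print(len(i))  # 3
```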

Next we define our own Map and Reduce functions, along with a helper that strips punctuation:

In [None]:
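import string

def mapper(input_key, input_value):
    # Emit a (word, 1) pair for every word in the document; the key (filename) is unused.
    return [(word, 1) for word in remove_punctuation(input_value.lower()).split()]

def reducer(intermediate_key, intermediate_value_list):
    # Sum all the 1s emitted for this word to get its total count.
    return (intermediate_key, sum(intermediate_value_list))

def remove_punctuation(s):
    # Delete every ASCII punctuation character listed in string.punctuation.
    return s.translate(str.maketrans("", "", string.punctuation))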
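
The `remove_punctuation` helper relies on the three-argument form of `str.maketrans`: the third argument lists characters to delete, and the resulting table maps each of them to `None`. The small standalone check below uses a made-up sample string and also shows a side effect worth knowing about: apostrophes are deleted too, so a contraction collapses into a single token.

```python
import string

# Third argument = characters to delete; each one is mapped to None.
table = str.maketrans("", "", string.punctuation)

print(table[ord("!")])                  # None
print("Don't panic!".translate(table))  # Dont panic
```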

In [None]:
mapper("/data/cs2300/examples/t1.txt", i["/data/cs2300/examples/t1.txt"])

In [None]:
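import itertools
from operator import itemgetter

def map_reduce(i, mapper, reducer):
    # Map phase: apply the mapper to every (input_key, input_value) pair.
    intermediate = []
    for key, value in i.items():
        intermediate.extend(mapper(key, value))
    # Shuffle phase: sort so that groupby sees equal keys next to each other,
    # then collect the values for each intermediate key into a list.
    groups = {}
    for key, group in itertools.groupby(sorted(intermediate), key=itemgetter(0)):
        groups[key] = [value for _, value in group]
    # Reduce phase: call the reducer once per intermediate key.
    return [reducer(key, values) for key, values in groups.items()]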
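
The sort followed by `itertools.groupby` implements the shuffle step: `groupby` only merges *adjacent* equal keys, which is why `intermediate` must be sorted first. As an alternative (a swapped-in technique, not what this notebook uses), the same grouping can be done without sorting by appending into a `collections.defaultdict`; `group_by_key` is a hypothetical helper name.

```python
from collections import defaultdict

def group_by_key(pairs):
    """Group (key, value) pairs into {key: [values, ...]} in one pass, no sort."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

pairs = [("the", 1), ("cat", 1), ("the", 1), ("hat", 1)]
print(group_by_key(pairs))  # {'the': [1, 1], 'cat': [1], 'hat': [1]}
```

This trades the O(n log n) sort for a single linear pass, but the groups come back in first-seen order rather than sorted order.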

In [None]:
print(map_reduce(i,mapper,reducer))
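
One way to sanity-check the output is to recompute the counts on a single machine with `collections.Counter`, applying the same lowercase-then-strip-punctuation normalization as `mapper`. The `count_words` function and the sample texts below are hypothetical illustrations, not part of the original exercise.

```python
import string
from collections import Counter

def count_words(texts):
    """Single-machine word count using the same normalization as mapper."""
    table = str.maketrans("", "", string.punctuation)
    counts = Counter()
    for text in texts:
        counts.update(text.lower().translate(table).split())
    return counts

sample = {"a.txt": "The cat sat.", "b.txt": "The hat!"}
print(sorted(count_words(sample.values()).items()))
# [('cat', 1), ('hat', 1), ('sat', 1), ('the', 2)]
```

Because `map_reduce` builds its groups from sorted keys, its result comes back in sorted key order, so `map_reduce(i, mapper, reducer) == sorted(count_words(i.values()).items())` should hold for the real input as well.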

Identify five well-known algorithms from https://en.wikipedia.org/wiki/List_of_algorithms that are likely to be a good fit for the MapReduce model.


## Challenge 1
In the following cell(s), run the word count on different text files in the /data/cs2300/examples directory and benchmark how the running time of this approach scales as the input grows!
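
A minimal timing sketch for the benchmarking, assuming wall-clock time from `time.perf_counter` is an adequate measure. `time_call` is a hypothetical helper, and the repeated filler text only stands in for real files of increasing size.

```python
import time

def time_call(fn, *args, repeats=5):
    """Return the best wall-clock time (in seconds) over several runs of fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

# Example: time a plain whitespace split on inputs of growing size.
for n in (1_000, 10_000, 100_000):
    text = "lorem ipsum dolor " * n
    print(n, time_call(str.split, text))
```

In the notebook you would call it as `time_call(map_reduce, i, mapper, reducer)` after loading larger files into `i`. Keeping the best of several runs reduces noise from other processes sharing the machine.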

## Challenge 2
In the following cell(s), use pandas to perform the same word count as in Challenge 1 and compare its performance with `map_reduce`.

## Challenge 3
In the following cell(s), see if you can use this mapper and reducer to solve the problem from Lab 3/4!