
# MapReduce with Python Functional Programming

#### Python *map* function
Python has a built-in *map* function ([see the Python docs](https://docs.python.org/3/library/functions.html#map)).
* ``map(function, iterable)`` applies a function to every element of the iterable (= data structure) and returns a lazy iterator
* iterable objects include lists, dicts, arrays, and custom data structures (see [here](https://thispointer.com/python-how-to-make-a-class-iterable-create-iterator-class-for-it/))

In [1]:
#example
def Plus1(a):
    return a+1

A = [1,2,3,4]
print(A)
B = list(map(Plus1,A)) #need to cast map output to list
print(B)

[1, 2, 3, 4]
[2, 3, 4, 5]
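
The custom data structures mentioned above work with *map* as long as they implement `__iter__`. A minimal sketch, where the class `Countdown` and the function `double` are hypothetical names chosen only for illustration:

```python
class Countdown:
    """Hypothetical iterable that yields n, n-1, ..., 1."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # Writing __iter__ as a generator makes the class iterable
        # without a separate iterator class
        current = self.n
        while current > 0:
            yield current
            current -= 1


def double(a):
    return 2 * a


doubled = list(map(double, Countdown(4)))
print(doubled)  # [8, 6, 4, 2]
```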


In [2]:
#example 2 - map with a function that takes arguments
from functools import partial

def PlusX(a,x):
    return a+x


A = [1,2,3,4]
print(A)
B = list(map(partial(PlusX,x=2),A)) #use partial to fix parameters 
print(B)

C = [1,1,3,3]
D = list(map(PlusX,A,C)) #or input multiple iterable objects
print(D)

[1, 2, 3, 4]
[3, 4, 5, 6]
[2, 3, 6, 7]
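
One detail worth knowing when passing several iterables: *map* stops as soon as the shortest one is exhausted and does not raise an error. A short sketch (the function `plus` and both lists are illustrative only):

```python
def plus(a, x):
    return a + x

long_list = [1, 2, 3, 4]
short_list = [10, 20]

# Only two pairs can be formed, so the result has two elements
result = list(map(plus, long_list, short_list))
print(result)  # [11, 22]
```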


In [3]:
#example 3 - map with lambda functions
A = [1,2,3,4]
B = list(map(lambda x: x+1, A)) #implement function directly with lambda
print(B)

[2, 3, 4, 5]


In [4]:
#example 4 - NumPy has map "built in"
import numpy as np
A = np.random.rand(10,10)*20
A

array([[9.59419974e+00, 5.44121767e+00, 1.08023428e+01, 1.67568387e+01,
        1.37951909e+01, 4.51400449e+00, 1.09653530e+01, 1.14971327e+01,
        1.00588135e+01, 1.40751610e+01],
       [4.45612463e+00, 1.39344554e+01, 4.79969789e+00, 1.89232195e+01,
        4.93826773e-01, 1.85932389e+01, 1.99873287e+01, 1.93646417e+00,
        1.81544113e+01, 1.12238887e-01],
       [3.47519918e+00, 6.69212792e+00, 9.87426670e+00, 1.89970272e+01,
        1.13054589e+01, 1.59878197e+01, 1.96410080e+01, 1.07116351e+01,
        5.36024887e+00, 3.66519995e+00],
       [2.82328245e+00, 1.71355867e+01, 9.93248825e+00, 1.31822323e+01,
        5.55876188e+00, 6.46144385e+00, 1.17909428e+00, 7.39181088e+00,
        1.02491095e+01, 1.91753789e+01],
       [9.38275373e+00, 9.73044112e+00, 1.61635645e+01, 1.64990174e+01,
        1.38905133e+01, 2.64150848e+00, 2.24379763e+00, 1.27278247e+01,
        3.56249351e+00, 5.55021875e+00],
       [1.68757790e+01, 5.91568710e-01, 1.43259491e+01, 4.48379366e+00,
   

In [5]:
#apply function directly on each element of an array
def isLarger10(x):
    return x>10

B = isLarger10(A)

In [6]:
B

array([[False, False,  True,  True,  True, False,  True,  True,  True,
         True],
       [False,  True, False,  True, False,  True,  True, False,  True,
        False],
       [False, False, False,  True,  True,  True,  True,  True, False,
        False],
       [False,  True, False,  True, False, False, False, False,  True,
         True],
       [False, False,  True,  True,  True, False, False,  True, False,
        False],
       [ True, False,  True, False,  True, False,  True,  True, False,
         True],
       [ True,  True, False,  True,  True,  True, False, False,  True,
        False],
       [ True, False, False, False,  True,  True,  True,  True,  True,
         True],
       [False,  True, False, False,  True,  True, False,  True,  True,
        False],
       [False,  True, False,  True,  True, False, False, False,  True,
        False]])
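
The call `isLarger10(A)` works on the whole array because NumPy applies the comparison element-wise. As a rough illustration on a small, fixed array (the values and the names `small`, `vectorized` and `mapped` are made up for this sketch), the same result can be produced with the built-in *map*:

```python
import numpy as np

def is_larger_10(x):
    return x > 10

small = np.array([3.5, 12.0, 10.0, 18.2])

# Vectorized: the comparison is applied to every element at once
vectorized = is_larger_10(small)

# Explicit map over the elements, then converted back to an array
mapped = np.array(list(map(is_larger_10, small)))

print(vectorized)                          # [False  True False  True]
print(np.array_equal(vectorized, mapped))  # True
```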

#### *Reduce* in Python
*functools* also provides a *reduce* function. It takes a function of two arguments, a single iterable object and an optional initial value, and combines the elements from left to right into one result. (see [API](https://docs.python.org/3/library/functools.html#functools.reduce))

In [7]:
# importing functools for reduce()
import functools

# initializing list
lis = [1, 3, 5, 6, 2]

def addIt(a,b):
    return a+b

# using reduce to compute sum of list
print("The sum of the list elements is : ", end="")
print(functools.reduce(addIt, lis))

# using reduce to compute maximum element from list
print("The maximum element of the list is : ", end="")
print(functools.reduce(lambda a, b: a if a > b else b, lis))

The sum of the list elements is : 17
The maximum element of the list is : 6
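
To see how *reduce* arrives at the sum, it can help to look at the intermediate values. The sketch below uses `itertools.accumulate` from the standard library for that purpose and `operator.add` in place of `addIt`; the variable names are illustrative.

```python
import functools
import itertools
import operator

lis = [1, 3, 5, 6, 2]

# accumulate yields every intermediate value of the running sum
steps = list(itertools.accumulate(lis, operator.add))
print(steps)  # [1, 4, 9, 15, 17]

# reduce returns only the final accumulated value
total = functools.reduce(operator.add, lis)
print(total)  # 17

# With an initial value, reduce also works on an empty list
empty_total = functools.reduce(operator.add, [], 0)
print(empty_total)  # 0
```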


#### Splitting

In [8]:
import more_itertools as mit

A=[1,2,3,4,5,6,7,8,9]
B=list(mit.chunked(A, 3)) #split into lists of max size 3

for i in B: #iterate over the splits
    print(i)

[1, 2, 3]
[4, 5, 6]
[7, 8, 9]


In [9]:
A='this is a short text in form of a python string'
B=list(mit.chunked(A, 5)) #split into lists of max size 5

for i in B: #iterate over the splits
    print(i)

['t', 'h', 'i', 's', ' ']
['i', 's', ' ', 'a', ' ']
['s', 'h', 'o', 'r', 't']
[' ', 't', 'e', 'x', 't']
[' ', 'i', 'n', ' ', 'f']
['o', 'r', 'm', ' ', 'o']
['f', ' ', 'a', ' ', 'p']
['y', 't', 'h', 'o', 'n']
[' ', 's', 't', 'r', 'i']
['n', 'g']
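
*more_itertools* is a third-party package. If it is not installed, a similar split can be sketched with the standard library. The helper `chunked_stdlib` below is a hypothetical stand-in written for this example, not part of *more_itertools*:

```python
from itertools import islice

def chunked_stdlib(iterable, n):
    """Yield successive lists of at most n items (hypothetical helper)."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, n))
        if not chunk:
            return
        yield chunk

parts = list(chunked_stdlib([1, 2, 3, 4, 5, 6, 7], 3))
print(parts)  # [[1, 2, 3], [4, 5, 6], [7]]
```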


## Exercise: build a simple *Character Count* Algorithm based on the above *split, map* and *reduce* operators 

In [10]:
#some text from NYTimes
text = ' Byron Spencer, handing out water and burgers to protesters outside Los Angeles City Hall, said he was both “elated and defeated” by word of the new charges. He said he had seen countless surges of outrage over police brutality against black men, only to have it happen again. “I’m 55, I’m black and I’m male — I’ve seen the cycle,” he said. “It’s almost like PTSD constantly having this conversation with my son.” Cierra Sesay reacted to the charges at a demonstration in the shadow of the State Capitol in Denver. “It’s amazing, it’s another box we can check,” she said. “But it goes up so much higher. It’s about the system.” In San Francisco, Tevita Tomasi — who is of Polynesian descent and described himself as “dark and tall and big” — said he regularly faced racial profiling, evidence of the bigger forces that must be overcome. On Wednesday, he distributed bottled water at what he said was his first demonstration, but one that would not be his last. What would stop him from protesting?'

* HINT: use a list of Python [dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) to store the character counts in the map step
* HINT 2: merge the dicts in the reduce step
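
One possible way to follow both hints, shown on a short sample string rather than the article text. This is a sketch and not the reference solution: the names `split_text`, `count_chars` and `merge_counts`, the sample string and the chunk size are all assumptions made for illustration.

```python
import functools

def split_text(text, size):
    # Split step: cut the string into pieces of at most `size` characters
    return [text[i:i + size] for i in range(0, len(text), size)]

def count_chars(chunk):
    # Map step: one dictionary of character counts per chunk
    counts = {}
    for ch in chunk:
        counts[ch] = counts.get(ch, 0) + 1
    return counts

def merge_counts(a, b):
    # Reduce step: merge two dictionaries by adding up the counts
    merged = dict(a)
    for ch, n in b.items():
        merged[ch] = merged.get(ch, 0) + n
    return merged

sample = 'abca abc'
chunks = split_text(sample, 3)            # ['abc', 'a a', 'bc']
mapped = list(map(count_chars, chunks))   # one dict per chunk
result = functools.reduce(merge_counts, mapped, {})
print(result)  # {'a': 3, 'b': 2, 'c': 2, ' ': 1}
```

Passing `{}` as the initial value keeps the reduce step working even when the input text is empty.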

In [56]:
#a chunk size of len(text) puts all characters into one single chunk
split=list(mit.chunked(text, len(text)))


In [67]:
def IsALetter(a,b):
    #a: running count, b: next character; every character except a space is counted
    if str(b) not in ' ':
        c = 1
    else:
        c = 0
    a = a+c
    return a

In [68]:
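#raises a TypeError: without an initial value, the accumulator a starts as the first character of text (a str)
print (functools.reduce(IsALetter,text))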

TypeError: ignored
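
The TypeError above comes from adding an int to a str: when *reduce* is called without an initial value, it uses the first element of the iterable as the accumulator. A sketch of one fix on a short string, with the hypothetical name `count_non_space` standing in for `IsALetter`:

```python
import functools

def count_non_space(total, ch):
    # total is the running count, ch is the next character
    return total + (0 if ch == ' ' else 1)

short_text = 'a bc d'

# The third argument (0) is the initial accumulator, so total is always an int
n = functools.reduce(count_non_space, short_text, 0)
print(n)  # 4
```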

In [64]:
a = list('sa')

a[0]
if a[0] not in ' ':
  a = 1
else:
  a = 0
print(a)

1


In [35]:
IsALetter(0,'d') #running count first, then the character

1

In [38]:
split[0].index

<function list.index>