This notebook demonstrates how data is passed from one operation to the next when we chain operations together, for example from map to filter in `rdd.map(square).filter(less_than_30)`.  We start by defining a custom `square` function for map and a `less_than_30` predicate for filter.

In [1]:
import numpy as np

In [2]:
def square(el):
    return el * el
    
def less_than_30(el):
    # the comparison already evaluates to True or False
    return el < 30

In the next cell, we define our own rdd class with `my_map`, `my_filter`, and `collect` methods, along with a few extra helper functions.

In [3]:
# additional helper functions
def times_3(x):
    return x * 3

def is_even(x):
    return x % 2 == 0

def my_sum(x1, x2):
    return x1 + x2

# define our own rdd class, kept as simple as possible
class rdd:
    def __init__(self, arg_array):
        self.local_array = arg_array
    
    # define our own map function
    def my_map(self, fun):
        # nothing to map; rdd's are immutable, so returning self is safe
        if len(self.local_array) == 0:
            return self
        
        # execute the user defined function on each element and store the
        # results in a new numpy array
        # remember that rdd's are supposed to be immutable, so local_array is never modified
        # building the array from a list lets numpy infer the result dtype, so a
        # function that returns floats is not truncated to the input's integer dtype
        mapped = np.array([fun(el) for el in self.local_array])
        
        # This is the key to how the data is passed.  The return value is another rdd object!
        # Because we return a new rdd, the next method in the chain has an rdd to be called on
        return rdd(mapped)
    
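    def my_filter(self, fun):
        # create a temporary list to store the elements that pass the test
        transformed = []

        for el in self.local_array:
            # keep the element only if the user defined predicate returns True
            if fun(el):
                transformed.append(el)
                
        # as in my_map, return a new rdd so that further calls can be chained
        # passing the dtype keeps an empty result from defaulting to float64
        return rdd(np.array(transformed, dtype=self.local_array.dtype))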
        
    def collect(self):
        return self.local_array

In [4]:
# create a my_rdd variable with a short list of numbers
my_rdd = rdd(np.array([2,4,6,8]))

# execute map and filter
# Note that my_rdd.my_map(square) returns a temporary rdd.  We then execute the my_filter function on 
# the temporary rdd returned by my_map.
print("map only:\n", my_rdd.my_map(square).collect())
print("map and filter:\n", my_rdd.my_map(square).my_filter(less_than_30).collect())

map only:
 [ 4 16 36 64]
map and filter:
 [ 4 16]
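
To make the hand-off explicit, the chain can be unrolled into named intermediate steps. The sketch below is self-contained, so it redefines a compact stand-in for the class above under the hypothetical name `MiniRDD` (list comprehensions replace the explicit loops). The variable names `source`, `squared`, `filtered`, and `chained` are also only for illustration.

```python
import numpy as np

class MiniRDD:
    """Compact stand-in for the rdd class above (hypothetical name)."""

    def __init__(self, arg_array):
        self.local_array = arg_array

    def my_map(self, fun):
        # apply fun to every element and wrap the result in a NEW MiniRDD
        return MiniRDD(np.array([fun(el) for el in self.local_array]))

    def my_filter(self, fun):
        # keep the elements for which fun returns True, again in a NEW MiniRDD
        kept = [el for el in self.local_array if fun(el)]
        return MiniRDD(np.array(kept, dtype=self.local_array.dtype))

    def collect(self):
        return self.local_array

def square(el):
    return el * el

def less_than_30(el):
    return el < 30

source = MiniRDD(np.array([2, 4, 6, 8]))

# step 1: my_map returns a new MiniRDD holding the squared values
squared = source.my_map(square)

# step 2: my_filter is called on that returned object, not on source
filtered = squared.my_filter(less_than_30)

# the one-line chain does exactly the same thing without naming the temporary
chained = source.my_map(square).my_filter(less_than_30)

print(type(squared).__name__)    # MiniRDD
print(filtered.collect())        # [ 4 16]
print(chained.collect())         # [ 4 16]
print(source.collect())          # [2 4 6 8], the original data is unchanged

# a plain numpy array has no my_filter method, which is why the methods
# return a wrapped object rather than the raw array
print(hasattr(squared.collect(), "my_filter"))   # False
```

The last line shows why the return type matters: if `my_map` returned the raw numpy array, the following `.my_filter(...)` call would fail with an `AttributeError`.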
