In [None]:
%matplotlib inline
import matplotlib
import seaborn as sns
sns.set()
matplotlib.rcParams['figure.dpi'] = 144

# Introduction to Functional Programming Style

Algorithms for distributed computing must work well when data and computation are divided among separate nodes. Since different nodes will process different data, the history of their computation will differ. Many algorithms make use of computation history in some way; in computer science jargon we call such algorithms **stateful**. In distributed computing, we will try to avoid algorithms whose output depends on the **state** of the node executing the code. Distributed computing should be redundant and failure-tolerant, so reproducibility is important: if a node fails, another node should be able to rerun the same work and get the same result.

Functional programming takes the perspective that programs are compositions of stateless functions that operate on immutable data.  This means the output of a function is completely determined by its input.  We will see this can often provide the following benefits:

1. Logic easier to **understand**
1. Code easier to **test**
1. Code easier to **parallelize** (the big one for us)
1. Code can be easier to **prove** correct in a mathematical sense (we won't get much into this, but there is a host of cool math concepts like $\lambda$ calculus related to this idea).

Now of course nothing comes for free and functional programs can sometimes be a bit trickier to write.

## Stateful vs. Stateless Code

Let's consider a very simple example: an object that counts the number of characters in a collection of strings. We implement it in two ways:

1. A counter that maintains an internal count as we pass subsequent containers to its `count` method.
1. A counter that requires a running count to be passed in as an argument to its `count` method.

The `StatefulCounter` will be a full-fledged object, while in contrast we'll see the `StatelessCounter` is essentially just a namespace for a `count` function.

In [None]:
class StatefulCounter(object):

    def __init__(self):
        self.counter = 0

    def count(self, string):
        for _ in string:
            self.counter += 1

        return self.counter

In [None]:
class StatelessCounter(object):

    def count(self, counter, string):
        for _ in string:
            counter += 1

        return counter

In [None]:
strings = ['a' * 12, 'a' * 5, 'a' * 33]
stateful = StatefulCounter()
stateless = StatelessCounter()

In [None]:
for string in strings:
    stateful.count(string)

print(stateful.counter)

In [None]:
count = 0
for string in strings:
    count = stateless.count(count, string)

print(count)

This example is simple enough to seem trivial, but it illustrates a key point: the stateless counter hands its running count back to the caller instead of keeping it inside the object. In a distributed context, that hand-off is the moment at which nodes can exchange data with each other or pass results to nodes that will handle later stages of computation.
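To make that hand-off concrete, here is a minimal sketch that pretends the strings are split across two nodes. The names `count_chars`, `count_partition`, `node_a_data`, and `node_b_data` are illustrative assumptions, not part of the example above, and the "nodes" are just list slices in a single process.

```python
def count_chars(counter, string):
    """Stateless count: the result depends only on the arguments."""
    return counter + len(string)


def count_partition(partition):
    """Count the characters in one partition, starting from zero."""
    count = 0
    for string in partition:
        count = count_chars(count, string)
    return count


strings = ['a' * 12, 'a' * 5, 'a' * 33]

# Hypothetical split of the data between two nodes.
node_a_data = strings[:2]
node_b_data = strings[2:]

# Each partial count is computed independently, then the results are combined.
partial_counts = [count_partition(node_a_data), count_partition(node_b_data)]
total = sum(partial_counts)

print(partial_counts, total)
```

Because no partition depends on hidden state left behind by another, either partial count could be recomputed on a different node without changing the total.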

Let's consider a slightly more complex example.

In [None]:
import numpy as np

In [None]:
class StatefulLinearRegression(object):
    
    def fit(self, X, y):
        self.coef_ = X.dot(y) / X.dot(X)
    
    def predict(self, X):
        return [self.coef_ * x for x in X]
    
    def score(self, X, y):
        sse = np.sum((self.predict(X) - y)**2)
        return np.sqrt(sse / len(y))

In [None]:
class StatelessLinearRegression(object):
    
    def fit(self, X, y):
        coef = X.dot(y) / X.dot(X)

        def predict(X):
            return [coef * x for x in X]

        def score(X, y):
            sse = np.sum((predict(X) - y)**2)
            return np.sqrt(sse / len(y))

        return predict, score

Now we can generate some random data for our `StatefulLinearRegression` and `StatelessLinearRegression` to fit.

In [None]:
X1 = np.random.uniform(size=100)
y1 = 2 * X1 + np.random.uniform(-0.1, 0.1, 100)

X2 = np.random.uniform(size=100)
y2 = 2 * X2 + np.random.uniform(-0.1, 0.1, 100)

In [None]:
stateful_linreg = StatefulLinearRegression()
stateful_linreg.fit(X1, y1)
stateful_linreg.score(X2, y2)

In [None]:
predict, score = StatelessLinearRegression().fit(X1, y1)
score(X2, y2)

Our stateless linear regression object's `fit` method returns a `predict` function and a `score` function. In functional programming, functions are first-class constructs and are often passed into functions as arguments or returned by functions. The returned functions are examples of **closures**, a technique for binding functions to an environment. The `coef` variable, `predict`, and `score` are all defined within the scope of the `fit` method, which effectively means `coef` travels as a constant with the returned `predict` and `score` functions.
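The same binding can be seen in a smaller setting. The sketch below uses a hypothetical `make_scaler` helper (not part of the regression example) to show that each returned function carries its own captured value, which Python exposes through the function's `__closure__` attribute.

```python
def make_scaler(factor):
    """Return a function that multiplies its input by `factor`."""
    def scale(x):
        # `factor` is not an argument of `scale`; it is captured
        # from the enclosing call to `make_scaler`.
        return factor * x
    return scale


double = make_scaler(2)
triple = make_scaler(3)

print(double(5), triple(5))

# The captured value is stored alongside the function object.
print(double.__closure__[0].cell_contents)
```

Each call to `make_scaler` creates a separate environment, so `double` and `triple` do not interfere with each other.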

**Question**: *What are the advantages and disadvantages of each approach?*

## Decorators 

The ubiquitous Python decorator is a nice example of some of the paradigms of functional programming, particularly the treatment of functions as first-class objects. A decorator is a function that operates on other functions, generally to add some extra functionality.  For example, let's write a decorator that times operations.

In [None]:
import time
def timeit(func):
    def f_(*args, **kwargs):
        ts = time.time()
        return_ = func(*args, **kwargs)
        print("Elapsed time for {} : {:.2e}".format(func.__name__, time.time() - ts))
        return return_
    return f_

We can use this function to decorate another function, or in other words create a new function which is the output of the decorator applied to the base function.

In [None]:
@timeit
def multiply_random_matrices(shape):
    return np.random.rand(shape, shape) @ np.random.rand(shape, shape)

In [None]:
multiply_random_matrices(10)
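The `@` syntax is shorthand for calling the decorator and rebinding the result. The sketch below spells that out with a variant of the timing decorator; `timeit_wrapped` and `add` are illustrative names, and the use of `functools.wraps` and `time.perf_counter` is a suggested refinement rather than something the decorator above requires.

```python
import functools
import time


def timeit_wrapped(func):
    """Timing decorator that preserves the wrapped function's metadata."""
    @functools.wraps(func)  # copy __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print("Elapsed time for {} : {:.2e}".format(func.__name__, elapsed))
        return result
    return wrapper


def add(a, b):
    """Add two numbers."""
    return a + b


# Equivalent to writing `@timeit_wrapped` above `def add`.
timed_add = timeit_wrapped(add)

print(timed_add(1, 2))
print(timed_add.__name__)
```

Without `functools.wraps`, the decorated function would report the name of the inner wrapper instead of `add`, which can make debugging and logging harder.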

Now let's see how Python implements the standard functional operations.

## Map, Filter, and Reduce

While Python is usually considered an object-oriented programming language, it also contains features of a functional language, including the commonplace functions `map`, `filter`, and `reduce` (as of Python 3, `reduce` is now part of `functools` instead of a built-in function). We will often structure distributed jobs as a composition of data transformations and aggregations.

For a transformation, we define a function that transforms individual data elements, and `map` that function over the data structure.

In [None]:
text = [
    'Lorem ipsum dolor sit amet, consectetur adipiscing elit.',
    'Morbi iaculis egestas leo, in consectetur diam ornare in. Nulla eleifend cursus turpis in luctus.',
    'Nullam accumsan congue hendrerit.'
    ]

In [None]:
def tokenize(text):
    return text.split()

list(map(tokenize, text)) # wrap in list to display the output; map returns a lazy iterator
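In Python 3, `map` does no work until its result is consumed, and the result can only be consumed once. A small sketch with an illustrative `words` list:

```python
words = ['alpha', 'beta', 'gamma']

lengths = map(len, words)  # nothing has been computed yet

first_pass = list(lengths)   # consumes the iterator
second_pass = list(lengths)  # the iterator is already exhausted

print(first_pass)
print(second_pass)
```

If the mapped values are needed more than once, store them in a list (or call `map` again) rather than reusing the `map` object.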

In Spark we will also encounter `flatMap`, which additionally unpacks the output of the mapped function into a single data structure. Below is an example implementation to illustrate how Spark's `flatMap` works.

In [None]:
def flatMap(f, data):
    return [element for nested in map(f, data) for element in nested]

flatMap(tokenize, text)
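The same flattening can also be written with the standard library's `itertools.chain.from_iterable`. This is an alternative sketch for illustration only (the name `flat_map` and the sample `sentences` are assumptions); it makes no claim about how Spark implements `flatMap` internally.

```python
from itertools import chain


def flat_map(f, data):
    """Apply f to every element, then concatenate the resulting iterables."""
    return list(chain.from_iterable(map(f, data)))


sentences = ['a b', 'c', 'd e f']
print(flat_map(str.split, sentences))
```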

Filtering is very similar, except the transformation function should return a Boolean for each data element, which will then be used to filter the data structure.

In [None]:
def vowel_start(word):
    return word[0] in 'aeiou'

In [None]:
print(tokenize(text[0]))
print(list(filter(vowel_start, tokenize(text[0]))))

When aggregating a data structure, we will use `reduce` to apply our aggregating function.

In [None]:
from functools import reduce

In [None]:
def total_length(accumulator, word):
    if isinstance(accumulator, int):
        return accumulator + len(word)
    else:
        return len(accumulator) + len(word)

reduce(total_length, tokenize(text[0]))

Functions used with `reduce` are always functions of two arguments: an argument that acts as an "accumulator" and an argument that iterates through the data structure being reduced. In the language of imperative programming, the above would be:

In [None]:
data_structure = tokenize(text[0])

accumulator = data_structure[0] # accumulator is initialized to first value in data structure
for word in data_structure[1:]:
    if isinstance(accumulator, int):
        accumulator += len(word)
    else:
        accumulator = len(accumulator) + len(word)

accumulator

The `functools` [documentation](https://docs.python.org/3/library/functools.html#functools.reduce) includes an example implementation of `reduce` for general functions.

In some functional languages, `reduce` is called `foldLeft`, highlighting the "folding" of each subsequent element of the data structure, `word`, into `accumulator`. By default in Python, `reduce` initializes the "accumulator" to the first element of the iterable, but in languages with `foldLeft` (like Scala), we choose the initialization of the accumulator. Python's `reduce` supports the same thing through an optional third argument. This can simplify the function being used for reduction, since it avoids any special cases associated with initializing the accumulator.

In [None]:
def total_length(accumulator, word):
    return accumulator + len(word)

reduce(total_length, tokenize(text[0]), 0)
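These operations compose naturally into a pipeline of transformations followed by an aggregation. The sketch below uses a small illustrative `lines` list and hypothetical helper names (`starts_with_vowel`, `add`) to chain a flattening step, `filter`, `map`, and `reduce`; it also shows that an explicit initial value keeps `reduce` well defined on empty input.

```python
from functools import reduce


def starts_with_vowel(word):
    return word[0] in 'aeiou'


def add(accumulator, value):
    return accumulator + value


lines = ['an apple is red', 'the sky is often blue']

# Transformation steps: flatten lines into words, keep some, measure them.
words = [word for line in lines for word in line.split()]
vowel_words = filter(starts_with_vowel, words)
lengths = map(len, vowel_words)

# Aggregation step: fold the lengths into a single total, starting from 0.
total = reduce(add, lengths, 0)
print(total)

# With an initializer, reducing an empty iterable returns the initial value
# instead of raising a TypeError.
print(reduce(add, [], 0))
```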

We will see these patterns repeated as we learn Spark's API.  

**Questions**:
- *How are map and reduce useful in distributed computing?*
- *What other operations do we need for functional computing?*

## Anonymous Functions

So far we have been using named functions, but writing an entire function definition is often more overhead than we want, both in terms of the number of lines of code and because we may not want to deal with things such as namespace conflicts.

The general solution is to use **anonymous functions**, sometimes called **lambda functions**. Python's support for them is limited (the body of a `lambda` must be a single expression), but they can still be useful for simple things.  Let's first see an example, where we tokenize the text without defining a `tokenize` function.  Since we are calling `list(map(...))` so often, we also wrap it in a function `map_`.

In [None]:
def map_(*args, **kwargs):
    return list(map(*args, **kwargs))

In [None]:
map_(lambda x : x.split(), text)

The keyword here is `lambda`, which is followed by the arguments, then a colon, then the expression whose value is returned. We can pass around the anonymous function object just like any other function object, and can even bind it to a name.

In [None]:
new_tokenize = lambda x : x.split()
new_tokenize(text[0])

Especially for little bits of code, anonymous functions can be useful, and they are used widely in Spark code.  That said, general best practice is to use a named function for anything non-trivial, and the Python community seems to be moving further in that direction: Python 3 removed tuple parameter unpacking (PEP 3113), which makes lambdas that take tuples more awkward to write.
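As a brief illustration of working without tuple parameter unpacking, the sketch below sorts some illustrative `(key, value)` pairs by value. In Python 2 one could write `lambda (key, value): value`; in Python 3 that is a syntax error, so the lambda must index into its argument, while a named function (here the hypothetical `value_of`) can unpack inside its body.

```python
pairs = [('a', 3), ('b', 1), ('c', 2)]

# Python 3 lambda: the pair arrives as a single argument and is indexed.
by_value_lambda = sorted(pairs, key=lambda pair: pair[1])


def value_of(pair):
    """Named alternative: unpack the tuple inside the function body."""
    key, value = pair
    return value


by_value_named = sorted(pairs, key=value_of)

print(by_value_lambda)
print(by_value_named)
```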

*Copyright &copy; 2019 The Data Incubator.  All rights reserved.*