# Functional Programming 

Functional programming in Python - things we will cover:

- map,
- lazy computation,
- lambda functions,
- filter,
- reduce.

Three attributes of functional programs:

1. no side effects,
2. variables don't vary - immutable data,
3. first class functions - no objects and no state.


## Resources

[Functional Programming in Python - Marcus Sanatan](https://stackabuse.com/functional-programming-in-python/)

[Clean Architecture - Uncle Bob Martin](https://www.amazon.co.uk/Clean-Architecture-Craftsmans-Software-Structure/dp/0134494164)

[What is Functional Programming? - Scott Murphy](https://www.youtube.com/watch?v=KHojnWHemO0)

## What is functional programming?

Three things:

1. no side effects,
2. variables don't vary / immutable data,
3. first class functions - no objects and no state.

Examples of functional programming languages include Lisp, Haskell, Erlang, Clojure.

A related idea that is useful when thinking about implementations:

[John Carmack (on Parallel Implementations)](http://sevangelatos.com/john-carmack-on-parallel-implementations/) - it is easier to switch out implementations if ideas are expressed as pure functions.  Internal state & multiple entry points make programming harder.

## 1. No side effects

- same inputs -> same outputs - always,
- deterministic (often loosely called idempotent) - things always run / perform the same way,
- no dependency on the state of the outside world.

Not doing something like:

In [None]:
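def pipeline(data):
    new_data = clean(data)
    #  side effect - writes to state outside the function
    database.save(new_data)
    #  depends on whatever the database currently holds
    features = database.load()
    return features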
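For contrast, here is a minimal sketch of a side-effect-free pipeline. The `clean` and `make_features` bodies are hypothetical stand-ins (the original does not define them); the point is only that the result depends on the argument and nothing else, with any loading and saving left to the caller.

```python
def clean(data):
    # Pure: builds and returns a new list, leaving `data` untouched.
    return [row.strip().lower() for row in data]


def make_features(cleaned):
    # Hypothetical feature step: the output depends only on the argument.
    return [len(row) for row in cleaned]


def pipeline(data):
    # No database reads or writes: same input, same output, every time.
    return make_features(clean(data))


raw = ['  Berlin ', 'Auckland  ']
features = pipeline(raw)
```

Calling `pipeline(raw)` again gives the same `features`, and `raw` itself is unchanged.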

## 2. Variables that don't vary - immutable data

Variables are only ever initialized:

- they are never changed,
- this avoids problems such as race conditions / deadlocks.

Variables being immutable means we can't do:

In [None]:
#  can't do this - for mutable objects (e.g. numpy arrays) `+=` mutates the caller's object in place
def f(x):
    x += 1
    return x

#  can do this - `x + 1` creates a new object and rebinds the local name
def f(x):
    x = x + 1
    return x
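The difference only shows up with mutable arguments (for an `int`, `+=` also creates a new object). A small sketch using a list, with hypothetical function names chosen for illustration:

```python
def add_item_in_place(xs):
    # `+=` on a list extends the existing object, so the caller sees the change.
    xs += [4]
    return xs


def add_item_copy(xs):
    # `xs + [4]` builds a new list; the caller's list is left alone.
    return xs + [4]


original = [1, 2, 3]
copied = add_item_copy(original)
after_copy = list(original)        # snapshot taken before any mutation

mutated = add_item_in_place(original)
same_object = mutated is original  # no new list was created
```

After the second call `original` has grown to four elements, even though the caller never assigned to it.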

An example of immutable data in practice is a data pipeline where each stage reads its input from storage (commonly S3) and writes its output as new data, rather than overwriting what it read.

## 3. Functions are first class

We can pass functions around like other variables.  Functions that take other functions as arguments (or return them) are known as higher order functions.

Below we pass the `sum` function into a generic `controller` function - which just executes the function it is given:

In [None]:
def controller(func, data):
    return func(data)

data = [1, 2, 3]
controller(sum, data)

Passing in another function (`len`) gives different results - the length of our data:

In [None]:
controller(len, data)
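Functions can also be returned from other functions. A minimal sketch, assuming a hypothetical `compose` helper (not part of the standard library) that glues two functions together:

```python
def compose(f, g):
    # Returns a new function that applies g first, then f.
    def composed(x):
        return f(g(x))
    return composed


data = [1, 2, 3]
count_as_text = compose(str, len)  # equivalent to str(len(data))
result = count_as_text(data)
```

Here `count_as_text` is an ordinary function object that could itself be passed into `controller`.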

## Map

Apply a function to each element of an iterable - similar to `df.apply` in pandas and to a `for` loop:

In [None]:
def lower(s):
    return s.lower()

cities = ['Berlin', 'Auckland', 'London', 'Sheffield']
m = map(lower, cities)
m

Python returns a map object (a lazy iterator, which behaves much like a generator), not the transformed data.  This is an example of **lazy computation**, which is a two step process:

1. build a pipeline/graph
2. put data through it when needed

Examples include TensorFlow 1, Spark, Python generators, Prefect flows.
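One way to see the two steps is to record when the mapped function is actually called. This is a sketch only: `lower_logged` is a hypothetical helper, and it deliberately uses a side effect (appending to `calls`) purely to make the laziness visible.

```python
calls = []


def lower_logged(s):
    # Record each call so we can see when the work actually happens.
    calls.append(s)
    return s.lower()


# Step 1: build the pipeline. Nothing is computed here.
lazy = map(lower_logged, ['Berlin', 'Auckland'])
calls_before = list(calls)

# Step 2: pull data through only when it is needed.
first = next(lazy)             # processes exactly one element
calls_after_one = list(calls)

rest = list(lazy)              # consumes whatever is left
```

`calls_before` is empty, and after `next(lazy)` only the first city has been processed.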

As we are more impatient than lazy, we can get all the processed data by calling `list` on the map object:

In [None]:
list(map(lower, cities))

In [None]:
cities

## Lambda functions

Lambda functions are anonymous - meaning the function is defined without being bound to a name.

In Python we can have objects with no variable reference (until they get garbage collected :)

We can do our same `.lower` map using an anonymous lambda function:

In [None]:
list(map(lambda x: x.lower(), cities))

In the expression above we define a lambda function:

In [None]:
lambda x: x.lower()

We can do more complex things in lambdas, such as accessing elements of the input data:

In [None]:
populations = [
    ('Berlin', 3.7, 'eu'),
    ('Auckland', 1.7, 'pac'),
    ('London', 8.9, 'eu'),
    ('Sheffield', 0.5, 'eu')
]

list(map(lambda x: (x[0], x[1] * 1000), populations))

We have total flexibility in **what data structure** we use in the iterable, and **how we interact with it** in the lambda.  We have decomposed and separated the data from the behaviour!

We could map over a sequence of namedtuples, and access elements using the attribute `.` syntax:

In [None]:
from collections import namedtuple

pop = namedtuple('city', ['city', 'population', 'continent'])

populations = [pop(*p) for p in populations]

list(map(lambda x: (x.city, x.population * 1000), populations))

## Reduce

Reduce will aggregate a sequence to single values (either a single value for the entire sequence, or one single value per group).  Also known as aggregation or a groupby.

The Python standard library has its own reduce function - inconveniently (and unlike `map` or `sum`) it is not a builtin - it is hidden away in `functools.reduce`.  This function operates on each element and accumulates, returning a single aggregated value.

We can use `functools.reduce` on our `populations` dataset by passing three arguments:

- a lambda function that adds the population of each city to the running total,
- our sequence `populations`,
- an initial value of `0`.

We can first map this out in normal Python - iterating over a list and incrementing a float:

In [None]:
total = 0
for p in populations:
    total += p.population
total

The equivalent of the above using `functools.reduce` is:

In [None]:
from functools import reduce

reduce(lambda total, pop: total + pop[1], populations, 0)
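To make the accumulation explicit, here is a simplified sketch of what `reduce` does. `my_reduce` is a hypothetical name, and unlike `functools.reduce` it assumes the initial value is always supplied:

```python
from functools import reduce


def my_reduce(func, iterable, initial):
    # Simplified: the real functools.reduce makes the initial value optional.
    acc = initial
    for item in iterable:
        # Each step combines the running result with the next element.
        acc = func(acc, item)
    return acc


values = [3.7, 1.7, 8.9, 0.5]
ours = my_reduce(lambda total, v: total + v, values, 0)
theirs = reduce(lambda total, v: total + v, values, 0)
```

Both calls walk the sequence in the same order, so `ours` and `theirs` are equal.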

In [None]:
reduce?

We can also use `reduce` to group cities by continent (note that `gb` mutates the accumulator in place, which bends the immutable data rule for brevity):

In [None]:
def gb(acc, city):
    acc[city.continent].append(city.city)
    return acc

reduce(
    gb,
    populations,
    {'eu': [], 'pac': []}
)
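A group by that sticks to immutable data is also possible. The sketch below uses a hypothetical `gb_pure` function that returns a new dictionary at every step, and redefines a small `City` namedtuple so the block runs on its own:

```python
from collections import namedtuple
from functools import reduce

City = namedtuple('City', ['city', 'population', 'continent'])
cities = [
    City('Berlin', 3.7, 'eu'),
    City('Auckland', 1.7, 'pac'),
    City('London', 8.9, 'eu'),
]


def gb_pure(acc, city):
    # Build a new dict and a new list on every step; `acc` is never modified.
    existing = acc.get(city.continent, [])
    return {**acc, city.continent: existing + [city.city]}


start = {}
groups = reduce(gb_pure, cities, start)
```

The trade-off is extra copying on every step, which is why the in-place version is common in practice. This version also does not need the continent keys declared up front.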

## Filter

Tests each element, keeps those that pass - similar to boolean masking in pandas/numpy:

In [None]:
list(filter(lambda x: x[1] > 1.0, populations))
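Because `filter` returns a lazy iterator just like `map`, the two can be chained. A small sketch that selects the European cities and then keeps only their names (the data is repeated as plain tuples so the block is self-contained):

```python
populations = [
    ('Berlin', 3.7, 'eu'),
    ('Auckland', 1.7, 'pac'),
    ('London', 8.9, 'eu'),
    ('Sheffield', 0.5, 'eu'),
]

# Nothing runs until list() consumes the chain.
european = filter(lambda x: x[2] == 'eu', populations)
names = list(map(lambda x: x[0], european))
```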

## Practical

Create a data processing pipeline that selects the cities that have populations greater than the average of all cities.

Two steps:
1. `reduce` to find the average of all cities,
2. `filter` to select the cities above that average.

This can be done in a single reduce step - suggest first getting the two step solution working :)

## Practical

Implement the same pipeline using two list comprehensions.

## Question

In Python, any combination of `map` & `filter` can be done with a single list comprehension - why do we need to use two in the example above?

## Practical

Create a data processing pipeline that finds the average population for both continents:
1. reduce to (key, (populations)),
2. map from (key, (populations)) to (key, avg).