Gentle introduction to Functional Programming

What?

  • Programming paradigm based on the lambda calculus (Alonzo Church)
  • Functions are first-class citizens
    • higher-order functions
    • partial application
    • currying
  • Functions are pure
    • immutable data: no state
    • encapsulated side effects
  • Lazy sequences

You shall NOT have

  • Assignments
  • Mutable data structures
  • While/For Loops
  • Control Over Order of Execution
  • Side Effects

./images/wtf.gif

Why?

Give me a reason

  • threading sucks
  • OOP does not compose
  • write better code
  • it’s the next big paradigm shift (and it’s already happening)
  • you are already doing it

OOP is dead

./images/oop_rip.jpg

Object-oriented programming is both anti-modular and anti-parallel by its very nature, and hence unsuitable for a modern CS curriculum.

Robert Harper (professor at Carnegie Mellon University)

What is a function

$f(x) = 2 x + 1$

  • output depends only on the input
  • no side effects -> pure

$f(4) = 9$

  • referential transparency: a function call can always be replaced by its value

Side effects

Side effect:

def adder(x, y):
    res = x + y
    # writing to a file is a side effect
    open('output.txt', 'w').write(str(res))
    return res

Not referentially transparent:

SETTINGS = {'counter': 1}

def increment(inc_value):
    # the result depends on hidden, mutable global state
    return SETTINGS['counter'] + inc_value
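
For contrast, a pure version passes the state in explicitly (a minimal sketch):

def increment(counter, inc_value):
    # depends only on its arguments: same inputs, same output
    return counter + inc_value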

Domains

./images/domain.png

Functional languages

  • Haskell
  • Clojure
  • Scala
  • F#
  • Erlang
  • Elixir
  • Elm

Haskell (1990)

  • pure
  • lazy
  • pattern matching
  • algebraic data types
  • type inference

./images/haskell.png

Fibonacci

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

Fibonacci stream

Or better, as a lazy stream (the naive version above recomputes the same subproblems exponentially):

-- zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
fibs :: [Int]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- get the nth Fibonacci number from the stream
fib :: Int -> Int
fib n = fibs !! n

Which in Python is roughly:

first_fibs = [0, 1, 1, 2, 3, 5]

fibs = [0, 1] + list(map(sum, zip(first_fibs, first_fibs[1:])))
# [0, 1, 1, 2, 3, 5, 8]

Quicksort

Pseudocode:

quicksort(A, lo, hi)
    if lo < hi
        p := pivot(A, lo, hi)
        left, right := partition(A, p, lo, hi)
        quicksort(A, lo, left)
        quicksort(A, right, hi)

Haskell:

quicksort :: (Ord a) => [a] -> [a]
quicksort [] = []
quicksort (x:xs) =
    let smallerSorted = quicksort [a | a <- xs, a <= x]
        biggerSorted  = quicksort [a | a <- xs, a > x]
    in  smallerSorted ++ [x] ++ biggerSorted
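
The same idea translates almost line for line into Python (a sketch; like the Haskell it builds new lists rather than sorting in place):

def quicksort(lst):
    if not lst:
        return []
    x, xs = lst[0], lst[1:]
    smaller_sorted = quicksort([a for a in xs if a <= x])
    bigger_sorted = quicksort([a for a in xs if a > x])
    return smaller_sorted + [x] + bigger_sorted

quicksort([3, 1, 2])  # [1, 2, 3]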

Python vs Haskell

  • ✓ functions as first-class citizens
  • ✓ lazy sequences (kind of)
  • ❌ immutability (partial)
  • ❌ algebraic data types
  • ❌ side effects encapsulation
  • ❌ type inference

Screwed?

./images/screwed.png

Functions

  • function
  • lambda
  • classmethod
  • method
  • staticmethod

Demo time
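
Roughly what the demo shows: each flavour of callable is just a value that can be passed around (the Multiplier class here is made up for illustration):

def double(x):              # plain function
    return x * 2

triple = lambda x: x * 3    # lambda

class Multiplier:
    factor = 10

    def times(self, x):     # method (bound when accessed on an instance)
        return x * self.factor

    @staticmethod
    def ident(x):           # staticmethod
        return x

    @classmethod
    def by_factor(cls, x):  # classmethod
        return x * cls.factor

for f in (double, triple, Multiplier().times,
          Multiplier.ident, Multiplier.by_factor):
    print(f(2))             # 4, 6, 20, 2, 20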

Partial application

functools.partial

import functools, operator

def mul(x, y):
    return x * y

mul(3, 10) # 30

mulby3 = functools.partial(operator.mul, 3)
mulby3(10) # 30

Equivalent to:

def mulby3(x):
    return mul(x, 3)

Lazy sequences

  • generators
  • itertools

import itertools

def fib_gen():
    a, b = 0, 1
    yield a
    while True:
        yield b
        a, b = b, a + b

for num in itertools.islice(fib_gen(), 10, 100):
    print(num)

Map

Standard for loop:

lis = [1, 2, 3, 4]

newlis = []
for l in lis:
    newlis.append(l * 2)

List comprehension:

[l * 2 for l in lis]

Look mum no loop:

import functools, operator
mul_by_two = functools.partial(operator.mul, 2)
map(mul_by_two, lis)  # lazy iterator in Python 3; wrap in list() to realize it

Reduce

Loopy:

lis = [1, 2, 3, 4]

val = 0
for l in lis:
    val += l

Not loopy:

val = functools.reduce(operator.add, lis)  # 10; same as sum(lis)

Immutability and toolz

Toolz is a collection of utility functions inspired by functional languages.

import functools, operator, toolz

bills = {
    "Alice": 0,
    "Bob": 1,
}

MUTABLE:

def change_inline(bills):
    for key, val in bills.items():
        bills[key] = val + 1

IMMUTABLE:

def change_immutable(dic):
    inc = functools.partial(operator.add, 1)
    return toolz.valmap(inc, dic)
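
The original dict is left untouched; the change lives only in the return value:

change_immutable(bills)  # {'Alice': 1, 'Bob': 2}
bills                    # {'Alice': 0, 'Bob': 1} -- unchanged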

Other toolz functions

assoc, dissoc, itemmap, itemfilter,
merge, merge_with, valfilter, valmap,
partition, groupby, juxt, take,
sliding_window, compose, diff, drop,
interpose, interleave
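
A few of these in action (a quick sketch with made-up inputs):

import toolz

toolz.merge({'a': 1}, {'b': 2})             # {'a': 1, 'b': 2}
toolz.merge_with(sum, {'a': 1}, {'a': 10})  # {'a': 11}
toolz.groupby(len, ['hi', 'bye', 'no'])     # {2: ['hi', 'no'], 3: ['bye']}
list(toolz.sliding_window(2, [1, 2, 3]))    # [(1, 2), (2, 3)]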

Simple example

OOP

class Transformer(object):
    def __init__(self, collection):
        self.data = collection

    def func(self, collection):
        return filter(lambda x: x % 2 == 0, collection)

    def transform(self):
        self.data = self.func(self.data)

tr = Transformer(range(10))
tr.transform()
tr.data

FP

def evens(collection):
    return filter(lambda x: x % 2 == 0, collection)

def transform(func, collection):
    return func(collection)

transform(evens, range(10))

Refactor journey

The mess

import subprocess, sqlite3

def long_crappy_function():
    ## launching a shell command
    ls_cmd = 'ls'
    p = subprocess.Popen(ls_cmd,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    ## filtering the output of a shell command
    out, err = p.communicate()
    res = []
    for line in out.decode('utf-8').splitlines():
        if 'to-match' in line:
            res.append(line)

    ## updating the results to database
    dbc = sqlite3.connect("lambda.db")
    cursor = dbc.cursor()

    for r in res:
        cursor.execute('INSERT INTO results VALUES (?)', (r,))  # 'results' is a placeholder table

Extract ‘ls’ execution

def run_ls():
    ## launching a shell command
    ls_cmd = 'ls'
    p = subprocess.Popen(ls_cmd,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    ## reading and splitting the command's output
    out, err = p.communicate()
    return out.decode('utf-8').splitlines()

Extract database update

def update_to_database(res):
    ## updating the results to database
    dbc = sqlite3.connect("lambda.db")
    cursor = dbc.cursor()

    for r in res:
        cursor.execute('INSERT INTO results VALUES (?)', (r,))  # 'results' is a placeholder table
    dbc.commit()

Extract filter output

def filter_output(lines):
    res = []
    for line in lines:
        if 'to-match' in line:
            res.append(line)

    return res

Or even better:

def filter_output(lines):
    return filter(lambda l: 'to-match' in l, lines)

# or using partial application; note that operator.contains(a, b)
# tests ``b in a``, so the arguments must be flipped first
def filter_output2(lines):
    match_function = functools.partial(toolz.flip(operator.contains), 'to-match')
    return filter(match_function, lines)

Now test this

Unit tests becomes possible

def filter_output(out):
    return filter(lambda l: 'to-match' in l, out)

def test_filter_output():
    lines = ['x1: to-match', 'x2', 'x3: to-match..']
    desired = ['x1: to-match', 'x3: to-match..']
    # filter returns a lazy iterator in Python 3, so force it
    assert list(filter_output(lines)) == desired

And finally

def write_filtered_ls_to_db():
    """Do a bit of everything
    """
    out = run_ls()
    res = filter_output(out)
    update_to_database(res)
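
Since every step is now a plain function, the same flow can also be written as a single pipeline, for instance with toolz.pipe, which threads a value through each function in turn:

import toolz

def write_filtered_ls_to_db():
    toolz.pipe(run_ls(), filter_output, update_to_database)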

Find the diamond

In the beginning

def merge_part_files():
    merged_key = output_bucket.initiate_multipart_upload(output_key)
    chunk = StringIO()
    parts = 0
    for key in input_bucket.list(input_prefix):
        if key.name.endswith(".gz"):
            if key.size < FIVE_MEGABYTES:
                chunk.write(key.get_contents_as_string())
                if chunk.len > FIVE_MEGABYTES:
                    chunk.seek(0)
                    parts += 1
                    merged_key.upload_part_from_file(chunk, parts)
                    chunk.close()
                    chunk = StringIO()
            else:
                parts += 1
                merged_key.copy_part_from_key(input_bucket.name, key.name,
                                              parts)
    if chunk.len:
        chunk.seek(0)
        parts += 1
        merged_key.upload_part_from_file(chunk, parts)
        chunk.close()

Fix the bug

while idx < len(key_list):
    key = key_list[idx]
    if key.size < CHUNK_SIZE:
        while chunk.len < CHUNK_SIZE:
            chunk.write(key.get_contents_as_string())
            idx += 1
            key = key_list[idx]

        chunk.seek(0)
        _inc(chunk.len)
        out.mp.upload_part_from_file(chunk, out.parts)
        chunk.close()
        chunk = StringIO()

    elif idx < len(key_list):
        _inc(key.size)
        out.mp.copy_part_from_key(input_bucket.name, key.key, out.parts)
        idx += 1

Extract partitioning

from collections import namedtuple

import pytest

Key = namedtuple('Key', ['name', 'size'])

@pytest.mark.parametrize(('input', 'expected'), [
    ([Key('a', 10), Key('b', 20), Key('c', 50), Key('d', 60)],
     [[Key('a', 10), Key('b', 20), Key('c', 50)], Key('d', 60)]),
    ([Key('a', 40), Key('b', 40), Key('c', 20)],
     [[Key('a', 40), Key('b', 40)], [Key('c', 20)]]),
])
def test_partitioned_list(input, expected):
    assert list(s3_utils.partition_list(input, threshold=50)) == expected

def partition_list(lis, threshold):
    chunk, partial = [], 0
    idx = 0
    while idx < len(lis):
        if lis[idx].size < threshold:
            while partial < threshold and idx < len(lis):
                chunk.append(lis[idx])
                partial += lis[idx].size
                idx += 1

            yield chunk
            chunk, partial = [], 0
        else:
            yield lis[idx]
            idx += 1

Use the new abstraction

def merge_part_files():
    for parts in partition_list(key_list, threshold=CHUNK_SIZE):
        if isinstance(parts, list):
            chunk = StringIO()
            for part in parts:
                chunk.write(part.get_contents_as_string())
                _inc(out, part.size)

            chunk.seek(0)
            out.mp.upload_part_from_file(chunk, out.parts)
        else:
            _inc(out, parts.size)
            out.mp.copy_part_from_key(input_bucket.name, parts.key, out.parts)

Conclusions

Improved

  1. Lock-free concurrency
  2. Brevity (modular code)
  3. Lazy evaluation
  4. Composability
  5. Parallelism
  6. Easier testing
  7. Referential transparency
  8. Fewer bugs

Random tips

  • default to immutability
  • think in terms of transformations, not loops
  • compose your programs bottom-up
  • keep side effects and logic separate
  • write your tests first

Questions

./images/questions.gif

Quotes

10 100

“It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.” —Alan Perlis

Describing

Functional programming is like describing your problem to a mathematician. Imperative programming is like giving instructions to an idiot. - Arcus #scheme

Cloud

OOP cannot save us from the Cloud Monster anymore. - Ju Gonçalves

Functions

Functional Programming is so called because a program consists entirely of functions.

- John Hughes, Why Functional Programming Matters

Python FP

Using Python for FP is like looking at a beautiful view through a dirty window. - Andrea Crotti

Resources

  • Chris Okasaki, Purely Functional Data Structures (persistent data structures)
  • All Rich Hickey talks
  • John Hughes, Why Functional Programming Matters

Extra material

Lambda calculus primer

Formal system for expressing computation based on

  • function abstraction and application
  • variable binding and substitution

Church numerals (s = successor):

$0 ≡ λ sz. z$

$1 ≡ λ sz. s(z)$

$2 ≡ λ sz. s(s(z))$
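
These encodings drop straight into Python lambdas, which makes them easy to play with (a quick sketch; to_int is a helper added for illustration):

zero = lambda s: lambda z: z
one  = lambda s: lambda z: s(z)
two  = lambda s: lambda z: s(s(z))

# collapse a Church numeral back to a Python int
to_int = lambda n: n(lambda x: x + 1)(0)
to_int(two)  # 2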

Lambda calculus 2

Successor

$S ≡ λ wyx. y(wyx)$

$S(0) ≡ (λ wyx. y(wyx))(λ sz. z) = λ yx. y((λ sz. z) yx) = λ yx. y(x) ≡ 1$

Immutability

./images/too_many_objects.png

Persistent data structures 1/2

xs = [0, 1, 2]
ys = [3, 4, 5]

./images/persistent1.png

Persistent data structures 2/2

zs = xs ++ ys

./images/persistent2.png

Pypersistent?
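
In Python, the pyrsistent library provides exactly these persistent data structures (a sketch, assuming pyrsistent is installed):

from pyrsistent import pvector

xs = pvector([0, 1, 2])
zs = xs.extend([3, 4, 5])  # a new vector sharing structure with xs
xs                         # pvector([0, 1, 2]) -- unchanged
zs                         # pvector([0, 1, 2, 3, 4, 5])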

Currying

toolz.curry

mul :: Int -> Int -> Int
mul x y = x * y

mulby3 :: Int -> Int
mulby3 = mul 3
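
And the Python counterpart with toolz.curry (a sketch mirroring the Haskell above):

import toolz

@toolz.curry
def mul(x, y):
    return x * y

mulby3 = mul(3)   # partially applied, like `mul 3` in Haskell
mulby3(10)        # 30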