Gentle introduction to Functional Programming

What?

  • Programming paradigm based on the lambda calculus (Alonzo Church)
  • Functions are first-class citizens
    • higher-order functions
    • partial application
    • currying
  • Functions are pure
    • immutable data: no state
    • encapsulated side effects
  • Lazy sequences

You shall NOT have

  • Assignments
  • Mutable data structures
  • While/For Loops
  • Control Over Order of Execution
  • Side Effects

./images/wtf.gif

Why?

Give me a reason

  • threading sucks
  • OOP does not compose
  • write better code
  • it’s the next big paradigm shift (and it’s already happening)
  • you are already doing it

OOP is dead

./images/oop_rip.jpg

Object-oriented programming is both anti-modular and anti-parallel by its very nature, and hence unsuitable for a modern CS curriculum.

Robert Harper (professor at Carnegie Mellon University)

What is a function

$f(x) = 2 x + 1$

  • output depends only on the input
  • no side effects -> pure

$f(4) = 9$

  • referential transparency: a function call can always be replaced by its value

Side effects

Side effect:

def adder(x, y):
    res = x + y
    # writing to a file is a side effect
    open('output.txt', 'w').write(str(res))
    return res

Not referentially transparent:

SETTINGS = {'counter': 1}

def increment(inc_value):
    # the result depends on hidden, mutable global state
    return SETTINGS['counter'] + inc_value
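
For contrast, a pure version passes the state in explicitly (a minimal sketch):

def increment(counter, inc_value):
    # depends only on its arguments: same inputs, same output
    return counter + inc_value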

Domains

./images/domain.png

Functional languages

  • Haskell
  • Clojure
  • Scala
  • F#
  • Erlang
  • Elixir
  • Elm

Haskell (1990)

  • pure
  • lazy
  • pattern matching
  • algebraic data types
  • type inference

./images/haskell.png

Fibonacci

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

Fibonacci stream

Or better, as a lazy stream (the naive version above recomputes the same subproblems exponentially):

-- zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
fibs :: [Int]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- get the nth Fibonacci number from the stream
fib :: Int -> Int
fib n = fibs !! n

Which in Python is roughly:

first_fibs = [0, 1, 1, 2, 3, 5]

fibs = [0, 1] + list(map(sum, zip(first_fibs, first_fibs[1:])))
# [0, 1, 1, 2, 3, 5, 8]

Quicksort

Pseudocode:

quicksort(A, lo, hi)
    if lo < hi
        p := pivot(A, lo, hi)
        left, right := partition(A, p, lo, hi)
        quicksort(A, lo, left)
        quicksort(A, right, hi)

Haskell:

quicksort :: (Ord a) => [a] -> [a]
quicksort [] = []
quicksort (x:xs) =
    let smallerSorted = quicksort [a | a <- xs, a <= x]
        biggerSorted  = quicksort [a | a <- xs, a > x]
    in  smallerSorted ++ [x] ++ biggerSorted
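
The same idea translates almost line for line into Python (a sketch; like the Haskell it builds new lists rather than sorting in place):

def quicksort(lst):
    if not lst:
        return []
    x, xs = lst[0], lst[1:]
    smaller_sorted = quicksort([a for a in xs if a <= x])
    bigger_sorted = quicksort([a for a in xs if a > x])
    return smaller_sorted + [x] + bigger_sorted

quicksort([3, 1, 2])  # [1, 2, 3]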

Python vs Haskell

  • ✓ functions as first-class citizens
  • ✓ lazy sequences (kind of)
  • ❌ immutability (partial)
  • ❌ algebraic data types
  • ❌ side effects encapsulation
  • ❌ type inference

Screwed?

./images/screwed.png

Functions

  • function
  • lambda
  • classmethod
  • method
  • staticmethod

Demo time
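
Roughly what the demo shows: each flavour of callable is just a value that can be passed around (the Multiplier class here is made up for illustration):

def double(x):              # plain function
    return x * 2

triple = lambda x: x * 3    # lambda

class Multiplier:
    factor = 10

    def times(self, x):     # method (bound when accessed on an instance)
        return x * self.factor

    @staticmethod
    def ident(x):           # staticmethod
        return x

    @classmethod
    def by_factor(cls, x):  # classmethod
        return x * cls.factor

for f in (double, triple, Multiplier().times,
          Multiplier.ident, Multiplier.by_factor):
    print(f(2))             # 4, 6, 20, 2, 20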

Partial application

functools.partial

import functools, operator

def mul(x, y):
    return x * y

mul(3, 10) # 30

mulby3 = functools.partial(operator.mul, 3)
mulby3(10) # 30

Equivalent to:

def mulby3(x):
    return mul(x, 3)

Lazy sequences

  • generators
  • itertools

import itertools

def fib_gen():
    a, b = 0, 1
    yield a
    while True:
        yield b
        a, b = b, a + b

for num in itertools.islice(fib_gen(), 10, 100):
    print(num)

Map

Standard for loop:

lis = [1, 2, 3, 4]

newlis = []
for l in lis:
    newlis.append(l * 2)

List comprehension:

[l * 2 for l in lis]

Look mum no loop:

import functools, operator
mul_by_two = functools.partial(operator.mul, 2)
map(mul_by_two, lis)  # lazy iterator in Python 3; wrap in list() to realize it

Reduce

Loopy:

lis = [1, 2, 3, 4]

val = 0
for l in lis:
    val += l

Not loopy:

val = functools.reduce(operator.add, lis)  # 10; same as sum(lis)

Immutability and toolz

Toolz is a collection of utility functions inspired by functional languages.

import functools, operator, toolz

bills = {
    "Alice": 0,
    "Bob": 1,
}

MUTABLE:

def change_inline(bills):
    for key, val in bills.items():
        bills[key] = val + 1

IMMUTABLE:

def change_immutable(dic):
    inc = functools.partial(operator.add, 1)
    return toolz.valmap(inc, dic)
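
The original dict is left untouched; the change lives only in the return value:

change_immutable(bills)  # {'Alice': 1, 'Bob': 2}
bills                    # {'Alice': 0, 'Bob': 1} -- unchanged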

Other toolz functions

assoc, dissoc, itemmap, itemfilter,
merge, merge_with, valfilter, valmap,
partition, groupby, juxt, take,
sliding_window, compose, diff, drop,
interpose, interleave
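
A few of these in action (a quick sketch with made-up inputs):

import toolz

toolz.merge({'a': 1}, {'b': 2})             # {'a': 1, 'b': 2}
toolz.merge_with(sum, {'a': 1}, {'a': 10})  # {'a': 11}
toolz.groupby(len, ['hi', 'bye', 'no'])     # {2: ['hi', 'no'], 3: ['bye']}
list(toolz.sliding_window(2, [1, 2, 3]))    # [(1, 2), (2, 3)]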

Simple example

OOP

class Transformer(object):
    def __init__(self, collection):
        self.data = collection

    def func(self, collection):
        return filter(lambda x: x % 2 == 0, collection)

    def transform(self):
        self.data = self.func(self.data)

tr = Transformer(range(10))
tr.transform()
tr.data

FP

def evens(collection):
    return filter(lambda x: x % 2 == 0, collection)

def transform(func, collection):
    return func(collection)

transform(evens, range(10))

Refactor journey

The mess

import subprocess, sqlite3

def long_crappy_function():
    ## launching a shell command
    ls_cmd = 'ls'
    p = subprocess.Popen(ls_cmd,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    ## filtering the output of a shell command
    out, err = p.communicate()
    res = []
    for line in out.decode('utf-8').splitlines():
        if 'to-match' in line:
            res.append(line)

    ## updating the results to database
    dbc = sqlite3.connect("lambda.db")
    cursor = dbc.cursor()

    for r in res:
        cursor.execute('INSERT INTO results VALUES (?)', (r,))  # 'results' is a placeholder table

Extract ‘ls’ execution

def run_ls():
    ## launching a shell command
    ls_cmd = 'ls'
    p = subprocess.Popen(ls_cmd,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    ## reading and splitting the command's output
    out, err = p.communicate()
    return out.decode('utf-8').splitlines()

Extract database update

def update_to_database(res):
    ## updating the results to database
    dbc = sqlite3.connect("lambda.db")
    cursor = dbc.cursor()

    for r in res:
        cursor.execute('INSERT INTO results VALUES (?)', (r,))  # 'results' is a placeholder table
    dbc.commit()

Extract filter output

def filter_output(lines):
    res = []
    for line in lines:
        if 'to-match' in line:
            res.append(line)

    return res

Or even better:

def filter_output(lines):
    return filter(lambda l: 'to-match' in l, lines)

# or using partial application; note that operator.contains(a, b)
# tests ``b in a``, so the arguments must be flipped first
def filter_output2(lines):
    match_function = functools.partial(toolz.flip(operator.contains), 'to-match')
    return filter(match_function, lines)

Now test this

Unit tests becomes possible

def filter_output(out):
    return filter(lambda l: 'to-match' in l, out)

def test_filter_output():
    lines = ['x1: to-match', 'x2', 'x3: to-match..']
    desired = ['x1: to-match', 'x3: to-match..']
    # filter returns a lazy iterator in Python 3, so force it
    assert list(filter_output(lines)) == desired

And finally

def write_filtered_ls_to_db():
    """Do a bit of everything
    """
    out = run_ls()
    res = filter_output(out)
    update_to_database(res)
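
Since every step is now a plain function, the same flow can also be written as a single pipeline, for instance with toolz.pipe, which threads a value through each function in turn:

import toolz

def write_filtered_ls_to_db():
    toolz.pipe(run_ls(), filter_output, update_to_database)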

Find the diamond

In the beginning

def merge_part_files():
    merged_key = output_bucket.initiate_multipart_upload(output_key)
    chunk = StringIO()
    parts = 0
    for key in input_bucket.list(input_prefix):
        if key.name.endswith(".gz"):
            if key.size < FIVE_MEGABYTES:
                chunk.write(key.get_contents_as_string())
                if chunk.len > FIVE_MEGABYTES:
                    chunk.seek(0)
                    parts += 1
                    merged_key.upload_part_from_file(chunk, parts)
                    chunk.close()
                    chunk = StringIO()
            else:
                parts += 1
                merged_key.copy_part_from_key(input_bucket.name, key.name,
                                              parts)
    if chunk.len:
        chunk.seek(0)
        parts += 1
        merged_key.upload_part_from_file(chunk, parts)
        chunk.close()

Fix the bug

while idx < len(key_list):
    key = key_list[idx]
    if key.size < CHUNK_SIZE:
        while chunk.len < CHUNK_SIZE:
            chunk.write(key.get_contents_as_string())
            idx += 1
            key = key_list[idx]

        chunk.seek(0)
        _inc(chunk.len)
        out.mp.upload_part_from_file(chunk, out.parts)
        chunk.close()
        chunk = StringIO()

    elif idx < len(key_list):
        _inc(key.size)
        out.mp.copy_part_from_key(input_bucket.name, key.key, out.parts)
        idx += 1

Extract partitioning

from collections import namedtuple

import pytest

Key = namedtuple('Key', ['name', 'size'])

@pytest.mark.parametrize(('input', 'expected'), [
    ([Key('a', 10), Key('b', 20), Key('c', 50), Key('d', 60)],
     [[Key('a', 10), Key('b', 20), Key('c', 50)], Key('d', 60)]),
    ([Key('a', 40), Key('b', 40), Key('c', 20)],
     [[Key('a', 40), Key('b', 40)], [Key('c', 20)]]),
])
def test_partitioned_list(input, expected):
    assert list(s3_utils.partition_list(input, threshold=50)) == expected

def partition_list(lis, threshold):
    chunk, partial = [], 0
    idx = 0
    while idx < len(lis):
        if lis[idx].size < threshold:
            while partial < threshold and idx < len(lis):
                chunk.append(lis[idx])
                partial += lis[idx].size
                idx += 1

            yield chunk
            chunk, partial = [], 0
        else:
            yield lis[idx]
            idx += 1

Use the new abstraction

def merge_part_files():
    for parts in partition_list(key_list, threshold=CHUNK_SIZE):
        if isinstance(parts, list):
            chunk = StringIO()
            for part in parts:
                chunk.write(part.get_contents_as_string())
                _inc(out, part.size)

            chunk.seek(0)
            out.mp.upload_part_from_file(chunk, out.parts)
        else:
            _inc(out, parts.size)
            out.mp.copy_part_from_key(input_bucket.name, parts.key, out.parts)

Conclusions

Improved

  1. Lock-free concurrency
  2. Brevity (modular code)
  3. Lazy evaluation
  4. Composability
  5. Parallelism
  6. Easier testing
  7. Referential transparency
  8. Fewer bugs

Random tips

  • default to immutability
  • think in terms of transformations, not loops
  • compose your programs bottom-up
  • keep side effects and logic separate
  • write your tests first

Questions

./images/questions.gif

Quotes

10 100

“It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.” —Alan Perlis

Describing

Functional programming is like describing your problem to a mathematician. Imperative programming is like giving instructions to an idiot. - Arcus #scheme

Cloud

OOP cannot save us from the Cloud Monster anymore. - Ju Gonçalves

Functions

Functional Programming is so called because a program consists entirely of functions.

- John Hughes, Why Functional Programming Matters

Python FP

Using Python for FP is like looking at a beautiful view through a dirty window. - Andrea Crotti

Resources

  • Chris Okasaki, Purely Functional Data Structures (persistent data structures)
  • All Rich Hickey talks
  • John Hughes, Why Functional Programming Matters

Extra material

Lambda calculus primer

Formal system for expressing computation based on

  • function abstraction and application
  • variable binding and substitution

Church numerals (s = successor):

$0 ≡ λ sz. z$

$1 ≡ λ sz. s(z)$

$2 ≡ λ sz. s(s(z))$
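
These encodings drop straight into Python lambdas, which makes them easy to play with (a quick sketch; to_int is a helper added for illustration):

zero = lambda s: lambda z: z
one  = lambda s: lambda z: s(z)
two  = lambda s: lambda z: s(s(z))

# collapse a Church numeral back to a Python int
to_int = lambda n: n(lambda x: x + 1)(0)
to_int(two)  # 2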

Lambda calculus 2

Successor

$S ≡ λ wyx. y(wyx)$

$S(0) ≡ (λ wyx. y(wyx))(λ sz. z) = λ yx. y((λ sz. z) yx) = λ yx. y(x) ≡ 1$

Immutability

./images/too_many_objects.png

Persistent data structures 1/2

xs = [0, 1, 2]
ys = [3, 4, 5]

./images/persistent1.png

Persistent data structures 2/2

zs = xs ++ ys

./images/persistent2.png

Pypersistent?
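
In Python, the pyrsistent library provides exactly these persistent data structures (a sketch, assuming pyrsistent is installed):

from pyrsistent import pvector

xs = pvector([0, 1, 2])
zs = xs.extend([3, 4, 5])  # a new vector sharing structure with xs
xs                         # pvector([0, 1, 2]) -- unchanged
zs                         # pvector([0, 1, 2, 3, 4, 5])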

Currying

toolz.curry

mul :: Int -> Int -> Int
mul x y = x * y

mulby3 :: Int -> Int
mulby3 = mul 3
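
And the Python counterpart with toolz.curry (a sketch mirroring the Haskell above):

import toolz

@toolz.curry
def mul(x, y):
    return x * y

mulby3 = mul(3)   # partially applied, like `mul 3` in Haskell
mulby3(10)        # 30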